Variable selection #1

rvosa · 2019-01-09T12:24:05Z

The set of uncorrelated variables can be determined in several ways: by hand (too much work for 220 species, I think) or automatically, by iteratively removing variables and running AIC tests. The end result is that, in principle, a different set of variables could be selected for each species. Is that a problem for comparability? For example, if two variables always correlate with each other (say, two proxies related to temperature), it would help interpretation if we always selected the same one of the two. But perhaps it is a good idea anyway to also build models in which all species use the same set of variables? Point of discussion...
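The automatic route mentioned above (iteratively dropping variables and comparing AIC) can be sketched in R with `stats::step()`. This is a minimal sketch, assuming a logistic GLM on simulated presence/background data as a stand-in for the actual MaxEnt model; the variable names (`temp_mean`, `temp_max`, `precip`, `noise`) and the data are hypothetical.

```r
# Minimal sketch: backward elimination of predictors by AIC for one species.
# Assumption: a logistic GLM stands in for the real niche model, and all
# variables and data below are simulated for illustration only.
set.seed(42)
n <- 500
temp_mean <- rnorm(n)
temp_max  <- temp_mean + rnorm(n, sd = 0.2)  # strongly correlated temperature proxy
precip    <- rnorm(n)
noise     <- rnorm(n)                        # predictor unrelated to presence
presence  <- rbinom(n, 1, plogis(1.5 * temp_mean - 1 * precip))

occ <- data.frame(presence, temp_mean, temp_max, precip, noise)

full <- glm(presence ~ temp_mean + temp_max + precip + noise,
            family = binomial, data = occ)

# step() repeatedly drops the term whose removal lowers AIC the most and
# stops when no single removal improves AIC any further.
reduced <- step(full, direction = "backward", trace = 0)

# Names of the predictors that survived the elimination
selected <- attr(terms(reduced), "term.labels")
print(selected)
```

With two near-duplicate temperature proxies, which one survives can depend on the particular sample, so running this separately per species could keep `temp_mean` for one species and `temp_max` for another, which is the comparability issue raised above.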

rvosa · 2019-02-25T13:19:28Z

The method we use is now implemented here:

trait-geo-diverse-ungulates/script/MaxEnt_function.R

Lines 2 to 55 in f2df7fc

    
    library(raster)  # provides getValues() and nlayers()

    removeCollinearity_adjusted <- function(raster.stack, multicollinearity.cutoff = .7,
                                            select.variables = FALSE, sample.points = FALSE,
                                            nb.points = 10000, plot = FALSE)
    {
      # Note: sample.points, nb.points and plot are accepted but not used in this version
      env.df <- getValues(raster.stack)
      # Drop cells with NA in any layer; complete.cases() also behaves correctly
      # when there are no NAs (negative indexing with an empty vector drops all rows)
      env.df <- env.df[stats::complete.cases(env.df), , drop = FALSE]

      # Distance between layers = 1 - |Pearson correlation|
      cor.matrix <- 1 - abs(stats::cor(env.df, method = "pearson"))
      # NA correlations (e.g. constant layers) are treated as distance 0
      cor.matrix[is.na(cor.matrix)] <- 0

      # Ascending hierarchical classification (complete linkage) of the variables;
      # cutting at 1 - cutoff groups variables whose pairwise |r| is at least the cutoff
      dist.matrix <- stats::as.dist(cor.matrix)
      ahc <- stats::hclust(dist.matrix, method = "complete")
      groups <- stats::cutree(ahc, h = 1 - multicollinearity.cutoff)

      if (length(groups) == max(groups))
      {
        message(paste("  - No multicollinearity detected in your data at threshold ",
                      multicollinearity.cutoff, "\n", sep = ""))
        mc <- FALSE
      } else
      { mc <- TRUE }

      if (select.variables)
      {
        # Randomly keep one variable from each group of correlated variables
        sel.vars <- NULL
        for (i in 1:max(groups))
        {
          sel.vars <- c(sel.vars, sample(names(groups[groups == i]), 1))
        }
      } else
      {
        if (mc)
        {
          # Return each group of correlated variables as a list element
          sel.vars <- list()
          for (i in seq_len(max(groups)))
          {
            sel.vars[[i]] <- names(groups)[groups == i]
          }
        } else
        {
          sel.vars <- names(raster.stack)
        }
      }
      return(sel.vars)
    }
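With `select.variables = TRUE` the function above picks a random member of each correlated group via `sample()`, so repeated runs (or different species) can end up with different representatives. A possible deterministic alternative, sketched below under the assumption that a fixed priority order of variables is acceptable, uses the same distance (1 - |r|), complete-linkage clustering and cut height, but always keeps the highest-priority member of each group. `select_by_priority()` and the toy matrix are hypothetical and not part of MaxEnt_function.R.

```r
# Hypothetical helper: same clustering rule as removeCollinearity_adjusted(),
# applied to a plain matrix, but with a fixed (not random) choice per group.
select_by_priority <- function(env.df, priority, cutoff = 0.7) {
  dist.matrix <- stats::as.dist(1 - abs(stats::cor(env.df, method = "pearson")))
  groups <- stats::cutree(stats::hclust(dist.matrix, method = "complete"),
                          h = 1 - cutoff)
  # For each group, keep the member that comes first in 'priority'
  # (variables missing from 'priority' are ranked last)
  picks <- vapply(split(names(groups), groups),
                  function(members) members[order(match(members, priority))][1],
                  character(1))
  unname(picks)
}

# Toy data: x2 is a near-copy of x1, x3 is independent
set.seed(1)
n <- 1000
x1 <- rnorm(n)
env <- cbind(x1 = x1, x2 = x1 + rnorm(n, sd = 0.1), x3 = rnorm(n))

sel_a <- select_by_priority(env, priority = c("x1", "x2", "x3"))
sel_b <- select_by_priority(env, priority = c("x2", "x1", "x3"))
print(sel_a)  # "x1" "x3"
print(sel_b)  # "x2" "x3"
```

Because the priority list is shared, every species whose data show the same correlated group would keep the same representative, which keeps the selected variables interpretable across species.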

This was referenced Jan 14, 2019:

Soil properties #4 (closed)

Niche modelling questions #7 (closed)

Validation #2 (open)

rvosa closed this as completed Feb 25, 2019
