Adding foldID
rvalavi committed Sep 18, 2018
1 parent 1ee9b57 commit 112ef57
Showing 6 changed files with 43 additions and 31 deletions.
18 changes: 12 additions & 6 deletions R/blocking.R
@@ -214,14 +214,15 @@ systematicNum <- function(layer, num=5){
#' Use spatial blocks to separate train and test folds
#'
#' This function creates spatially separated folds based on a pre-specified distance. It assigns blocks to the training and
#' testing folds either randomly or in a systematic manner. The distance (\code{theRange}) should be in \strong{metres},
#' testing folds randomly, systematically or in a checkerboard pattern. The distance (\code{theRange}) should be in \strong{metres},
#' regardless of the unit of the reference system of the input data (for more information see the details section). By default,
#' the function creates blocks according to the extent and shape of the study area, assuming that the user has considered the
#' landscape for the given species and case study.
#' Alternatively, blocks can be created based on species spatial data. This is especially useful when the
#' species data is not evenly dispersed in the whole region. Blocks can also be offset so the origin is not at the outer
#' corner of the rasters. Instead of providing a distance, the blocks can also be created by specifying a number of rows and
#' columns and dividing the study area into vertical or horizontal bins, as presented in Wenger & Olden (2012) and Bahn & McGill (2012).
#' Finally, the blocks can be specified by a user-defined spatial polygon layer.
#'
#'
#' To keep the consistency, all the functions use \strong{metres} as their unit. In this function, when the input map
@@ -285,10 +286,11 @@ systematicNum <- function(layer, num=5){
#' @return An object of class S3. A list of objects including:
#' \itemize{
#' \item{folds - a list containing the folds. Each fold has two vectors with the training (first) and testing (second) indices}
#' \item{foldID - a vector of values indicating the number of the fold for each observation (each number corresponds to the same point in species data)}
#' \item{biomodTable - a matrix with the folds to be used in \pkg{biomod2} package}
#' \item{k - number of the folds}
#' \item{blocks - SpatialPolygon of the blocks}
#' \item{range - the distance band separating training and testing folds}
#' \item{range - the distance band separating training and testing folds, if provided}
#' \item{species - the name of the species (column), if provided}
#' \item{plots - ggplot object}
#' \item{records - a table with the number of points in each category of training and testing}
@@ -440,11 +442,13 @@ spatialBlock <- function(speciesData, species=NULL, blocks=NULL, rasterLayer=NUL
trainTestTable <- base::data.frame(train=rep(0, k), test=0)
}
foldList <- list()
foldNum <- rep(NA, nrow(speciesData))
biomodTable <- data.frame(RUN1=rep(TRUE, length(speciesData)))
for(p in 1:k){
sp.over <- sp::over(speciesData, subBlocks[subBlocks$folds==p, ]) # overlay layers to specify the inside & outside points
trainSet <- which(is.na(sp.over[,1])) # exclude all the data from the buffer area
testSet <- which(!is.na(sp.over[,1]))
foldNum[testSet] <- p
foldList[[p]] <- assign(paste0("fold", p), list(trainSet, testSet))
if(!is.null(species)){
lnPrsences <- length(presences)
@@ -479,6 +483,7 @@ spatialBlock <- function(speciesData, species=NULL, blocks=NULL, rasterLayer=NUL
maxSD <- stats::sd(unlist(trainTestTable))
subBlocks2 <- subBlocks
foldList2 <- foldList
foldNum2 <- foldNum
biomodTable2 <- biomodTable
iter <- i
}
@@ -492,6 +497,7 @@ spatialBlock <- function(speciesData, species=NULL, blocks=NULL, rasterLayer=NUL
subBlocks <- subBlocks2
trainTestTable <- trainTestTable2
foldList <- foldList2
foldNum <- foldNum2
biomodTable <- biomodTable2
print(paste0("The best fold was in iteration ", iter, ":"))
# print(trainTestTable)
@@ -563,11 +569,11 @@ spatialBlock <- function(speciesData, species=NULL, blocks=NULL, rasterLayer=NUL
# save the objects
if(biomod2Format==TRUE){
biomodTable <- as.matrix(biomodTable)
theList <- list(folds=foldList, biomodTable=biomodTable, k=k, blocks=subBlocks, species=species, range=theRange,
plots=p2, records=trainTestTable)
theList <- list(folds=foldList, foldID=foldNum, biomodTable=biomodTable, k=k, blocks=subBlocks, species=species,
range=theRange, plots=p2, records=trainTestTable)
} else{
theList <- list(folds=foldList, biomodTable=NULL, k=k, blocks=subBlocks, species=species, range=theRange,
plots=p2, records=trainTestTable)
theList <- list(folds=foldList, foldID=foldNum, biomodTable=NULL, k=k, blocks=subBlocks, species=species,
range=theRange, plots=p2, records=trainTestTable)
}
class(theList) <- c("SpatialBlock")
return(theList)
7 changes: 5 additions & 2 deletions R/environBlock.R
@@ -64,6 +64,7 @@ normalize <- function(x){
#' @return An object of class S3. A list of objects including:
#' \itemize{
#' \item{folds - a list containing the folds. Each fold has two vectors with the training (first) and testing (second) indices}
#' \item{foldID - a vector of values indicating the number of the fold for each observation (each number corresponds to the same point in species data)}
#' \item{biomodTable - a matrix with the folds to be used in \pkg{biomod2} package}
#' \item{k - number of the folds}
#' \item{species - the name of the species (column), if provided}
@@ -98,6 +99,7 @@ envBlock <- function(rasterLayer, speciesData, species=NULL, k=5, standardizatio
if(methods::is(rasterLayer, 'Raster')){
if(raster::nlayers(rasterLayer) >= 1){
foldList <- list()
foldNum <- rep(NA, nrow(speciesData))
if(!is.null(species)){
presences <- speciesData[speciesData@data[,species]==1,] # creating a layer of presence data
trainTestTable <- base::data.frame(trainPr=rep(0, k), trainAb=0, testPr=0, testAb=0)
@@ -130,6 +132,7 @@ envBlock <- function(rasterLayer, speciesData, species=NULL, k=5, standardizatio
for(i in 1:k){
testSet <- which(speciesData@data$fold == i)
trainSet <- which(speciesData@data$fold != i)
foldNum[testSet] <- i
foldList[[i]] <- assign(paste0("fold", i), list(trainSet, testSet))
if(!is.null(species)){
lnPrsences <- length(presences)
@@ -182,9 +185,9 @@ envBlock <- function(rasterLayer, speciesData, species=NULL, k=5, standardizatio
}
if(biomod2Format==TRUE){
biomodTable <- as.matrix(biomodTable)
theList <- list(folds=foldList, biomodTable=biomodTable, k=k, species=species, records=trainTestTable)
theList <- list(folds=foldList, foldID=foldNum, biomodTable=biomodTable, k=k, species=species, records=trainTestTable)
} else{
theList <- list(folds=foldList, biomodTable=NULL, k=k, species=species, records=trainTestTable)
theList <- list(folds=foldList, foldID=foldNum, biomodTable=NULL, k=k, species=species, records=trainTestTable)
}
} else stop("'The raster layer is empty!'")
} else stop('The input file is not a valid R raster file')
1 change: 1 addition & 0 deletions man/envBlock.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 4 additions & 2 deletions man/spatialBlock.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion vignettes/BlockCV_for_SDM.Rmd
@@ -245,7 +245,7 @@ rangeExplorer(rasterLayer = awt,
Note that the interactive plots cannot be shown here, as they require opening an external window or web browser. When using *rangeExplorer*, slide to the selected block size, and click **Apply Changes** to change the block size.

## Evaluating SDMs with block cross-validation: examples
In this section, we show how to use the folds generated by **blockCV** in the previous sections for the evaluation of species distribution models constructed on the species data available in the package. We show three modelling examples which cover both the use of presence-absence and presence-background methods.
In this section, we show how to use the folds generated by **blockCV** in the previous sections for the evaluation of species distribution models constructed on the species data available in the package. The **blockCV** package stores training and testing folds in three formats. The format common to all three blocking strategies is a list of the indices of the observations in each fold. For *spatialBlock* and *envBlock*, the folds are also stored as a matrix suitable for the **biomod2** package and as a vector giving the fold number of each observation. The length of this vector equals the number of observations in the species spatial data. These three formats are stored in the blocking objects as *folds*, *biomodTable* and *foldID*, respectively. We show three modelling examples which cover both the use of presence-absence and presence-background methods.
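
The following is a minimal sketch, not part of this commit, of how the *folds* list and the *foldID* vector relate and how *foldID* might be used to split data. It runs in base R only. The object `sb` is a hypothetical hand-built stand-in for a blocking object (a real one would come from *spatialBlock* or *envBlock*), and the toy data frame `dat` is likewise an assumption made for illustration.

```r
# Hypothetical stand-in for a blocking object: 12 observations in 3 folds.
n <- 12
k <- 3
foldID <- rep(1:k, each = 4)  # fold number of each observation

# List format: each fold holds training indices first, testing indices second.
folds <- lapply(seq_len(k), function(i) {
  list(which(foldID != i), which(foldID == i))
})
sb <- list(folds = folds, foldID = foldID, k = k)

# Rebuild the vector format from the list format, mirroring the
# `foldNum[testSet] <- p` assignment added in this commit.
rebuilt <- rep(NA, n)
for (p in seq_len(sb$k)) {
  testSet <- sb$folds[[p]][[2]]
  rebuilt[testSet] <- p
}

# Use foldID to split a toy data set; which() drops any NA entries safely.
dat <- data.frame(id = seq_len(n), x = seq(0, 1, length.out = n))
test_sizes <- vapply(seq_len(sb$k), function(i) {
  test  <- dat[which(sb$foldID == i), ]
  train <- dat[which(sb$foldID != i), ]
  stopifnot(nrow(test) + nrow(train) == n)  # every row used exactly once here
  nrow(test)
}, integer(1))
```

Because the vector is initialised with `NA` and only filled for testing points, an observation that never falls in a testing set would keep `NA`. Wrapping the comparisons in `which()` is one way to avoid `NA` rows when subsetting.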

### Evaluating presence-background models
#### maxnet
40 changes: 20 additions & 20 deletions vignettes/BlockCV_for_SDM.html

Large diffs are not rendered by default.
