Improvements to documentation
luismurao committed Jul 19, 2024
1 parent 0368155 commit aeecc5c
Showing 29 changed files with 445 additions and 315 deletions.
3 changes: 2 additions & 1 deletion R/00_classes.R
Expand Up @@ -7,7 +7,8 @@
#' S3 classes to organize data and results of \code{tenm} objects
#' @importFrom methods new
#' @author Luis Osorio-Olvera
#' @return his object is a list comprising four elements: a) A data.frame
#' @return
#' This object is a list comprising four elements: a) A data.frame
#' containing occurrence records and layer information. b) A character vector
#' specifying variable names. c) A character vector indicating the names of
#' longitude and latitude variables. d) A character vector denoting the
Expand Down
36 changes: 23 additions & 13 deletions R/00_methods.R
@@ -1,23 +1,33 @@

#' Predict the potential distribution of \pkg{tenm} based on specific time periods
#' of environmental conditions or averages.
#' Predict the potential distribution of species based on environmental
#' conditions
#' @importFrom methods new
#' @docType methods
#' @param object An object of class sp.temporal.selection
#' @param model_variables A vector with variable names
#' @param model_variables A character vector specifying the variable names used
#' to build the model.
#' @param layers A SpatRaster object or a list where each element is a
#' SpatRaster.
#' @param layers_path Path to layers
#' @param layers_ext Layers extension
#' @param mve If the projection will use the minimum volume ellipsoid algorithm
#' @param level Proportion of data used to fit the minimum volume ellipsoid
#' @param output A character to indicate if model uses suitability values
#' or Mahalanobis distances. Possible values are "suitability" and "mahalanobis"
#' @param layers_path Path to the directory containing raster layers.
#' @param layers_ext File extension of the raster layers.
#' @param mve Logical indicating whether to use the minimum volume
#' ellipsoid algorithm.
#' @param level Proportion of data to include inside the ellipsoid
#' if mve is \code{T}.
#' @param output Character indicating if the model outputs "suitability" values
#' or "mahalanobis" distances.
#' @param ... Additional parameters passed to
#' \code{\link[tenm]{ellipsoid_projection}}
#' @return Returns a SpatRaster of suitability values or Mahalanobis distances
#' to niche center.
#' @details Note that each SpatRaster in the 'layers' parameter should have the
#' \code{\link[tenm]{ellipsoid_projection}}.
#' @return A SpatRaster object representing predicted suitability values or
#' Mahalanobis distances to niche center.
#' @details
#' This function predicts the potential distribution of a species based on
#' environmental conditions represented by raster layers. The prediction is
#' based on the model statistics and environmental variables specified in
#' 'model_variables'. If 'mve' is \code{T}, the minimum volume ellipsoid algorithm
#' is used to model the niche space. The output can be either "suitability",
#' or "mahalanobis", indicating distance to the niche center.
#' Note that each SpatRaster in the 'layers' parameter should have the
#' same number of elements (layers) as 'model_variables'. The predict method
#' assumes that variables in each SpatRaster correspond to those in
#' 'model_variables'. If layers in the 'layers' parameter are given as a
Expand Down
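
The two output types described in the `predict` documentation above can be illustrated with base R alone. This is a minimal sketch, not the package's implementation: the toy data, the variable names `bio1` and `bio12`, and the transform `exp(-0.5 * d2)` from squared Mahalanobis distance to suitability are assumptions made for illustration.

```r
# Toy environmental values at occurrence points (hypothetical data).
set.seed(1)
occ_env <- cbind(bio1 = rnorm(50, 20, 2), bio12 = rnorm(50, 800, 100))

# Niche centre and covariance matrix of the fitted ellipsoid.
centroid <- colMeans(occ_env)
sigma <- stats::cov(occ_env)

# Environmental values of the cells to predict; the first row is the centre.
new_env <- rbind(centroid, centroid + c(4, 200))

# "mahalanobis" output: squared distance of each cell to the niche centre.
d2 <- stats::mahalanobis(new_env, center = centroid, cov = sigma)

# "suitability" output (assumed transform): 1 at the centre, decaying outwards.
suitability <- exp(-0.5 * d2)
```

Under this assumed transform, a cell at the niche centre has distance 0 and suitability 1, and suitability falls monotonically as the distance grows.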
4 changes: 2 additions & 2 deletions R/PartialROC.R
Expand Up @@ -18,11 +18,11 @@
#' @param n_iter Number of bootstrap iterations to perform for partial ROC
#' calculations. Default is 1000.
#' @param rseed Logical. Whether or not to set a random seed for
#' reproducibility. Default is FALSE.
#' reproducibility. Default is \code{F}.
#' @param sub_sample Logical. Indicates whether to use a subsample of
#' the test data. Recommended for large datasets.
#' @param sub_sample_size Size of the subsample to use for computing pROC
#' values when sub_sample is TRUE.
#' values when sub_sample is \code{T}.
#' @return A list of two elements:
#' - "pROC_summary": a data.frame containing the mean
#' AUC value, AUC ratio calculated for each iteration and the p-value of the
Expand Down
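
The `rseed`, `sub_sample` and `sub_sample_size` parameters documented above interact in a simple way, sketched here in base R. The test values and the seed value `111` are hypothetical, and the package may organize this step differently.

```r
# Hypothetical suitability values extracted at test occurrences.
test_values <- seq(0.01, 1, length.out = 5000)

rseed <- TRUE            # mirrors the documented logical flag
sub_sample <- TRUE
sub_sample_size <- 1000

# Fix the random number generator only when reproducibility is requested.
if (rseed) set.seed(111)   # the seed value itself is an assumption

# Draw the subsample without replacement, or keep all test values.
if (sub_sample && length(test_values) > sub_sample_size) {
  test_used <- sample(test_values, size = sub_sample_size)
} else {
  test_used <- test_values
}
```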
38 changes: 23 additions & 15 deletions R/bg_by_date.R
@@ -1,22 +1,30 @@
#' Function to obtain environmental background organized by date
#' @description Function to retrieve background data from occurrence records.
#' The background data is organized as a function of the dated
#' environmental data
#' @param this_species Species Temporal Environmental Data (sp.temporal.env)
#' object from \code{\link[tenm]{ex_by_date}}.
#' @param buffer_ngbs Number of pixel neighbors used to build the buffer.
#' @param buffer_distance Distance used to create a buffer in which background
#' data will be taken.
#' @param n_bg Number of background points.
#' @param process_ngbs_by Numeric. Estimates neighbor cells each x cells. This
#' is for memory management.
#' @return Returns an object of class sp.temporal.bg which is a list that
#' contains a data.frame with longitude, latitude, year, layer_date, layer_path,
#' cell_ids_year and environmental information.
#' @details The buffer is built around the occurrences using a neighborhood
#' distance.
#' environmental data.
#' @param this_species An object of class sp.temporal.env representing species
#' occurrence data organized by date. See \code{\link[tenm]{ex_by_date}}.
#' @param buffer_ngbs Number of pixel neighbors used to build the buffer around
#' each occurrence point.
#' @param buffer_distance Distance (in the same units as raster layers) used to
#' create a buffer around occurrence points to sample background data.
#' @param n_bg Number of background points to sample.
#' @param process_ngbs_by Numeric parameter to improve memory management.
#' It processes neighbor cells in batches of a size specified by the user.
#' @return An object of class sp.temporal.bg containing background data
#' organized by date. The object is a list with the following components:
#' - "bg_df": A data.frame with columns for longitude, latitude, year,
#' layer_date, layer_path, cell_ids_year, and environmental information.
#' - Other metadata relevant to background sampling.
#' @details
#' This function retrieves background data around species occurrence points,
#' sampled based on the dated environmental data provided in `this_species`.
#' Background points are sampled within a buffer around each occurrence point.
#' The function returns an object of class sp.temporal.bg, which contains
#' background data organized by date. This object is the input of the function
#' \code{\link[tenm]{tenm_selection}}.
#' @examples
#' \dontrun{
#' \donttest{
#' library(tenm)
#' data("abronia")
#' tempora_layers_dir <- system.file("extdata/bio",package = "tenm")
Expand Down
36 changes: 20 additions & 16 deletions R/cells2samp.R
@@ -1,21 +1,25 @@
#' Helper function to randomly select cell IDs for generating
#' environmental background data.
#' @description Returns pixel IDs to be sample for generating
#' environmental background data.
#' @param data A data.frame with longitude and latitude data.
#' @param longitude A character vector of the column name of longitude.
#' @param latitude A character vector of the column name of latitude.
#' @param cell_ids A numeric vector indicating the IDs of the cells that
#' be used as the geographic centers of the buffers. The default values NULL.
#' @param buffer_ngbs Number of pixel neighbors around occurrences to be used
#' to build the buffer.
#' @param raster_mask An object of class SpatRaster that will be used to
#' obtain pixel IDs.
#' @param process_ngbs_by Numeric. Estimates neighbor cells each x cells. This
#' is for memory management.
#' @param n_bg Number of background pixels.
#' @param progress Logical. Show computation progress.
#' @return A numeric vector with the IDs of cells to be sampled.
#' @description
#' This function returns pixel IDs to be sampled for generating environmental
#' background data around species occurrence points.
#' @param data A data.frame containing longitude and latitude data of occurrence
#' points.
#' @param longitude A character vector specifying the column name of longitude
#' in 'data'.
#' @param latitude A character vector specifying the column name of latitude
#' in 'data'.
#' @param cell_ids A numeric vector indicating the IDs of cells that serve as
#' geographic centers for buffers. Default is NULL.
#' @param buffer_ngbs Number of neighboring pixels around occurrence points
#' used to build the buffer for sampling.
#' @param raster_mask An object of class SpatRaster used to obtain pixel IDs.
#' @param process_ngbs_by Numeric parameter to improve memory management.
#' It processes neighbor cells in batches of a size specified by the user.
#' @param n_bg Number of background pixels to sample.
#' @param progress Logical. If \code{T}, show computation progress.
#' @return A numeric vector of cell IDs to be sampled for environmental
#' background data.
#' @examples
#' \donttest{
#' # cells to sample
Expand Down
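
The roles of `buffer_ngbs`, `process_ngbs_by` and `n_bg` in the `cells2samp` documentation above can be sketched on a plain grid without any raster package. The helper `neighbor_cells()`, the 10 x 10 grid and the occurrence cells are hypothetical; row-major cell numbering is assumed because that is the convention terra uses.

```r
# Return the IDs of all cells within `buffer_ngbs` pixels of `cell_id`
# on a grid with row-major cell numbering.
neighbor_cells <- function(cell_id, n_row, n_col, buffer_ngbs) {
  row <- (cell_id - 1) %/% n_col + 1
  col <- (cell_id - 1) %% n_col + 1
  rows <- max(1, row - buffer_ngbs):min(n_row, row + buffer_ngbs)
  cols <- max(1, col - buffer_ngbs):min(n_col, col + buffer_ngbs)
  grid <- expand.grid(row = rows, col = cols)
  sort((grid$row - 1) * n_col + grid$col)
}

n_row <- 10
n_col <- 10
occ_cells <- c(1, 45, 100)   # hypothetical cells holding occurrences
process_ngbs_by <- 2         # occurrence cells handled per batch
n_bg <- 5

# Process occurrence cells in batches to keep memory use bounded.
chunks <- split(occ_cells, ceiling(seq_along(occ_cells) / process_ngbs_by))
candidate_cells <- unique(unlist(lapply(chunks, function(ids) {
  unlist(lapply(ids, neighbor_cells, n_row = n_row, n_col = n_col,
                buffer_ngbs = 1))
})))

# Sample the background cells from the pooled buffer cells.
set.seed(42)
cells_to_sample <- sample(candidate_cells,
                          size = min(n_bg, length(candidate_cells)))
```

Batching changes only how many occurrence cells are expanded at once, not the resulting candidate set.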
40 changes: 24 additions & 16 deletions R/clean_dup.R
@@ -1,25 +1,33 @@
#' Function to thin longitude and latitude data
#' @description Thin duplicated or redundant occurrence records that
#' present overlapping longitude and latitude coordinates. The user can thin
#' occurrences using a geographical distance threshold or by a pixel
#' neighborhood.
#' @param data A data.frame with longitude and latitude of occurrence records
#' @description
#' Cleans up duplicated or redundant occurrence records that present overlapping
#' longitude and latitude coordinates. Thinning can be performed using either a
#' geographical distance threshold or a pixel neighborhood approach.
#' @param data A data.frame with longitude and latitude of occurrence records.
#' @param longitude A character vector indicating the column name of the
#' "longitude" variable.
#' @param latitude A character vector indicating the column name of the
#' "latitude" variable.
#' @param threshold A numeric value representing the euclidean distance between
#' coordinates to be considered as a duplicate.
#' @param by_mask Logical. If TRUE data thinning will be done
#' using a raster layer as a mask.
#' @param raster_mask An object of class SpatRaster that will be used a
#' reference to identify duplicates.
#' @param threshold A numeric value representing the distance threshold between
#' coordinates to be considered duplicates. Units depend on whether
#' `by_mask` is \code{T} or \code{F}. If \code{T}, the user needs to specify the number
#' of pixels that define the neighborhood of duplicates (see n_ngbs parameter).
#' @param by_mask Logical. If \code{T}, the thinning process will use a raster layer
#' as a mask for defining distance in pixel units.
#' @param raster_mask An object of class SpatRaster that serves as a reference
#' to thin the occurrence data. Required if `by_mask` is \code{T}.
#' @param n_ngbs Number of pixels used to define the neighborhood matrix that
#' helps to determine which occurrences are duplicates.
#' - A value of 0 removes occurrences within the same pixel, keeping one.
#' - A value of 1 considers as duplicates all occurrences within a
#' distance of one pixel.
#' @return Returns a data.frame with coordinate data from a species
#' helps determine which occurrences are duplicates:
#' - 0 removes occurrences within the same pixel, keeping one.
#' - 1 considers duplicates all occurrences within a distance of one pixel.
#' - n considers duplicates all occurrences within a distance of n pixels.
#' @return Returns a data.frame with cleaned occurrence records, excluding
#' duplicates based on the specified criteria.
#' @details
#' This function cleans up duplicated occurrences based on the specified
#' distance threshold. If `by_mask` is \code{T}, the distance is interpreted as
#' pixel distance using the provided raster_mask; otherwise, it is interpreted
#' as geographic distance.
#' @examples
#' data(abronia)
#' tempora_layers_dir <- system.file("extdata/bio",package = "tenm")
Expand Down
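
The two thinning modes described for `clean_dup` above can be approximated in base R as follows. This is a sketch under stated assumptions, not the package's code: `thin_by_distance()` is a hypothetical greedy filter using Euclidean distance in coordinate units, and the pixel mode is imitated with a made-up cell size of 0.5 degrees, corresponding to `n_ngbs = 0`.

```r
# Greedy thinning: keep a record only if it is farther than `threshold`
# from every record already kept.
thin_by_distance <- function(data, longitude, latitude, threshold) {
  xy <- as.matrix(data[, c(longitude, latitude)])
  keep <- integer(0)
  for (i in seq_len(nrow(xy))) {
    if (length(keep) == 0) {
      keep <- i
      next
    }
    d <- sqrt((xy[keep, 1] - xy[i, 1])^2 + (xy[keep, 2] - xy[i, 2])^2)
    if (all(d > threshold)) keep <- c(keep, i)
  }
  data[keep, , drop = FALSE]
}

occs <- data.frame(lon = c(-99.10, -99.10, -99.11, -98.00),
                   lat = c(19.40, 19.40, 19.41, 20.00))

# Distance mode: records closer than 0.05 degrees count as duplicates.
thinned <- thin_by_distance(occs, "lon", "lat", threshold = 0.05)

# Pixel mode with n_ngbs = 0: keep one record per grid cell.
res <- 0.5   # hypothetical cell size in degrees
cell_key <- paste(floor(occs$lon / res), floor(occs$lat / res))
one_per_cell <- occs[!duplicated(cell_key), ]
```

With these toy records both modes keep the first and the last row, because the second and third records lie within the threshold of the first and fall in the same cell.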
55 changes: 33 additions & 22 deletions R/clean_dup_by_date.R
@@ -1,26 +1,33 @@
#' Function to thin occurrence data
#' @description Cleans up duplicated longitude and latitude data by year using a
#' specified distance threshold. The distance can be specified as a geographic
#' distance or, if a raster_mask is provided, as a pixel distance.
#' @param this_species Species Temporal Data object
#' see \code{\link[tenm]{sp_temporal_data}}.
#' @param threshold A numeric value representing the distance
#' between coordinates to be considered as a duplicate.
#' @param by_mask Logical. If TRUE the thinning process will be done
#' using a raster layer as a mask.
#' @param raster_mask An object of class SpatRaster that will be used as
#' reference to thin the data.
#' Cleans up duplicated longitude and latitude data by year using a specified
#' distance threshold. The distance can be specified as a geographic distance
#' or, if a raster_mask is provided, as a pixel distance.
#' @param this_species An object of class sp.temporal.modeling representing
#' species occurrence data organized by date.
#' See \code{\link[tenm]{sp_temporal_data}}.
#' @param threshold A numeric value representing the distance threshold between
#' coordinates to be considered duplicates. Units depend on whether
#' `by_mask` is \code{T} or \code{F}. If \code{T}, the user needs to specify the number
#' of pixels that define the neighborhood of duplicates (see n_ngbs parameter).
#' @param by_mask Logical. If \code{T}, the thinning process will use a raster layer
#' as a mask for defining distance in pixel units.
#' @param raster_mask An object of class SpatRaster that serves as a reference
#' to thin the occurrence data. Required if `by_mask` is \code{T}.
#' @param n_ngbs Number of pixels used to define the neighborhood matrix that
#' helps to determine which occurrences are duplicates.
#' - A value of 0 removes occurrences within the same pixel, keeping one.
#' - A value of 1 considers as duplicates all occurrences within a
#' distance of one pixel.
#' @return A sp.temporal.modeling object that contains a temporal data.frame.
#' This table has five columns: longitude, latitude, date variable,
#' layers_dates and layers_path.
#' helps determine which occurrences are duplicates:
#' - 0 removes occurrences within the same pixel, keeping one.
#' - 1 considers duplicates all occurrences within a distance of one pixel.
#' - n considers duplicates all occurrences within a distance of n pixels.
#' @return An object of class sp.temporal.modeling containing a temporal
#' data.frame with cleaned occurrence data, including columns for
#' longitude, latitude, date variable, layers_dates, and layers_path.
#'
#' @details This function is build on the basis of
#' \code{\link[tenm]{clean_dup}}. See the help of the function for more examples
#' @details
#' This function is based on \code{\link[tenm]{clean_dup}}. It cleans up
#' duplicated occurrences based on the specified threshold. If `by_mask`
#' is \code{T}, the distance is interpreted as pixel distance using the provided
#' raster_mask; otherwise, it is interpreted as geographic distance.

#' @examples
#' library(tenm)
#' data("abronia")
Expand Down Expand Up @@ -63,7 +70,7 @@ clean_dup_by_date <- function(this_species,threshold,by_mask = FALSE,
df_occs_date <- this_species$temporal_df
df_occs_dateL <- split(df_occs_date,df_occs_date$layers_path,drop=T)
clean_by_date <- seq_along(df_occs_dateL) |>
purrr::map_df(function(x){
furrr::future_map_dfr(function(x){
dd <- tenm::clean_dup(data = df_occs_dateL[[x]],
longitude = this_species$lon_lat_vars[1],
latitude = this_species$lon_lat_vars[2],
Expand All @@ -72,7 +79,11 @@ clean_dup_by_date <- function(this_species,threshold,by_mask = FALSE,
raster_mask = raster_mask,
n_ngbs = n_ngbs)
return(dd)
})
},.progress = TRUE,
.options = furrr::furrr_options(globals = c("this_species",
"df_occs_date",
"df_occs_dateL"),
seed = NULL))

sp.temp.data.clean <- list(temporal_df = clean_by_date,
sp_date_var = this_species$sp_date_var,
Expand Down
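
The body of `clean_dup_by_date` shown above follows a split, apply, combine pattern: the temporal table is split by `layers_path`, each group is cleaned, and the groups are bound back together. A sequential base R sketch of that pattern is given below; the toy table and the stand-in `drop_exact_dups()` (used here in place of `tenm::clean_dup`) are hypothetical.

```r
# Hypothetical temporal table: two layer dates with duplicated records.
temporal_df <- data.frame(
  lon = c(-99.1, -99.1, -98.0, -99.1, -99.1),
  lat = c(19.4, 19.4, 20.0, 19.4, 19.4),
  layers_path = c("bio/1990", "bio/1990", "bio/1990", "bio/2000", "bio/2000")
)

# Stand-in for clean_dup(): drop exact coordinate duplicates within a group.
drop_exact_dups <- function(df) df[!duplicated(df[, c("lon", "lat")]), ]

# Split by layer date, clean each group, then bind the groups back together.
by_date <- split(temporal_df, temporal_df$layers_path, drop = TRUE)
clean_by_date <- do.call(rbind, lapply(by_date, drop_exact_dups))
rownames(clean_by_date) <- NULL
```

Because duplicates are removed within each date only, the same coordinates can survive once per date. The commit's move from `purrr::map_df` to `furrr::future_map_dfr` keeps this logic and allows the per-date step to run on parallel workers when a future plan has been set.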
33 changes: 21 additions & 12 deletions R/correlation_finder.R
@@ -1,14 +1,24 @@
#' Function to find out strong correlations in a correlation matrix
#' @description The function finds out which variables have strong
#' correlations according to a correlation threshold.
#' @param environmental_data A matrix or a data.frame of environmental data
#' @param method A method to estimate correlation matrix. Possible options are
#' "spearman", "pearson" or "kendall".
#' @param threshold Correlation value used to filter variables.
#' @param verbose Verbose output.
#' @return Returns a list of two elements: the first is a vector with the names
#' of not correlated variables; the second is a list with the correlation values
#' of all variables.
#' Function to find strong correlations within environmental predictors
#' @description
#' This function identifies variables with strong correlations based on a
#' specified threshold.
#' @param environmental_data A matrix or data.frame containing
#' environmental data.
#' @param method Method used to estimate the correlation matrix. Possible
#' options include "spearman" (Spearman's rank correlation),
#' "pearson" (Pearson's correlation),
#' or "kendall" (Kendall's tau correlation).
#' @param threshold Correlation threshold value. Variables with absolute
#' correlation values greater than or equal to this threshold are considered
#' strongly correlated.
#' @param verbose Logical. If \code{T}, prints verbose output detailing
#' correlations.
#' @return A list with two elements:
#' - `not_correlated_vars`: A vector containing names of variables that are
#' not strongly correlated.
#' - `correlation_values`: A list with correlation values for all pairs of
#' variables.

#' @export
#' @examples
#' \donttest{
Expand Down Expand Up @@ -92,7 +102,6 @@ correlation_finder <- function(environmental_data,method="spearman",threshold,
print(list_cor[[i]])
cat('----------------------------------------------------------------\n\n')
}
return()
}
return(list(descriptors=descriptors,list_cor=list_cor))
}
Expand Down
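
The thresholding described in the `correlation_finder` documentation above can be sketched with `stats::cor`. The toy variables and the greedy selection order are assumptions; the package may pick the retained variables differently.

```r
set.seed(10)
x <- rnorm(200)
env <- data.frame(bio1 = x,
                  bio2 = x + rnorm(200, sd = 0.05),  # nearly a copy of bio1
                  bio3 = rnorm(200))

threshold <- 0.8
cor_mat <- stats::cor(env, method = "spearman")

# For each variable, list the others whose |correlation| >= threshold.
strong <- lapply(colnames(cor_mat), function(v) {
  r <- cor_mat[v, colnames(cor_mat) != v]
  r[abs(r) >= threshold]
})
names(strong) <- colnames(cor_mat)

# Greedy pass: keep a variable unless it is strongly correlated
# with one that has already been kept.
kept <- character(0)
for (v in colnames(cor_mat)) {
  if (all(abs(cor_mat[v, kept]) < threshold)) kept <- c(kept, v)
}
```

Here `bio2` is dropped because it is almost identical to `bio1`, while `bio3` is retained.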
40 changes: 21 additions & 19 deletions R/cov_center.R
@@ -1,23 +1,25 @@

#' Function to compute the covariance matrix that defines an ellipsoid
#' niche model.
#' @description Function to compute the covariance matrix, the niche centroid
#' and volume of an ellipsoid model. It uses the values of the niche variables
#' of the occurrences points.
#' @param data A data.frame or a matrix with the numeric values of the variables
#' that will be used to model the niche.
#' @param mve A logical value. If TRUE a minimum volume ellipsoid will be
#' computed using
#' the function \code{\link[MASS]{cov.mve}} of the \pkg{MASS} package. If FALSE
#' the covariance matrix of the input data will be used.
#' @param level A numerical value specifying the proportion of the data to be
#' used to compute the ellipsoid.
#' @param vars A numeric or a string vector specifying the columns indexes/names
#' of the variables of the input data which will be used to fit the ellipsoid
#' model.
#' @return Returns a list containing the centroid of the ellipsoid, the
#' covariance matrix based on the input data, ellipsoid volume, semi-axis length
#' and axis coordinates.
#' Function to compute the covariance matrix of an ellipsoid niche model.
#' @description
#' Computes the covariance matrix, niche centroid, volume, and other
#' ellipsoid parameters based on the values of niche variables from
#' occurrence points.
#' @param data A data.frame or matrix containing numeric values of variables
#' used to model the niche.
#' @param mve Logical. If \code{T}, computes a minimum volume ellipsoid using
#' the \code{\link[MASS]{cov.mve}} function from the MASS package. If \code{F},
#' uses the covariance matrix of the input data.
#' @param level Proportion of data to be used for computing the ellipsoid,
#' applicable when mve is \code{T}.
#' @param vars Vector specifying column indexes or names of variables in
#' the input data used to fit the ellipsoid model.
#' @return A list containing the following components:
#' - `centroid`: Centroid (mean vector) of the ellipsoid.
#' - `covariance_matrix`: Covariance matrix based on the input data.
#' - `volume`: Volume of the ellipsoid.
#' - `semi_axes_lengths`: Lengths of semi-axes of the ellipsoid.
#' - `axis_coordinates`: Coordinates of ellipsoid axes.

#' @examples
#' \donttest{
#' library(tenm)
Expand Down
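
The components listed in the `cov_center` return value above can be derived from a covariance matrix with base R. This sketch covers only the `mve = FALSE` case. Scaling the semi-axes by a chi-square quantile at `level` is a common convention assumed here, not something the source confirms, and the toy data are hypothetical.

```r
set.seed(7)
env <- data.frame(bio1 = rnorm(100, 20, 2), bio12 = rnorm(100, 800, 100))
vars <- c("bio1", "bio12")
level <- 0.95

x <- as.matrix(env[, vars])
centroid <- colMeans(x)
sigma <- stats::cov(x)

# Semi-axis lengths: sqrt(eigenvalue * chi-square quantile at `level`).
n_dim <- length(vars)
eig <- eigen(sigma, symmetric = TRUE)
semi_axes <- sqrt(eig$values * stats::qchisq(level, df = n_dim))

# Volume of an n-dimensional ellipsoid with those semi-axes.
volume <- pi^(n_dim / 2) / gamma(n_dim / 2 + 1) * prod(semi_axes)

# End points of each axis: centroid +/- semi-axis along each eigenvector.
axis_coordinates <- lapply(seq_len(n_dim), function(i) {
  rbind(centroid + semi_axes[i] * eig$vectors[, i],
        centroid - semi_axes[i] * eig$vectors[, i])
})
```

For the minimum volume ellipsoid case, the documentation points to `MASS::cov.mve`, whose centre and covariance estimates would replace `centroid` and `sigma` in the same calculation.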