Improvements to documentation

luismurao · Jul 19, 2024 · aeecc5c
1 parent 0368155

Showing 29 changed files with 445 additions and 315 deletions.
diff --git a/R/00_classes.R b/R/00_classes.R
@@ -7,7 +7,8 @@
 #' S3 classes to organize data and results of \code{tenm} objects
 #' @importFrom methods new
 #' @author Luis Osorio-Olvera
-#' @return his object is a list comprising four elements: a) A data.frame
+#' @return
+#' This object is a list comprising four elements: a) A data.frame
 #' containing occurrence records and layer information. b) A character vector
 #' specifying variable names. c) A character vector indicating the names of
 #' longitude and latitude variables. d) A character vector denoting the

diff --git a/R/00_methods.R b/R/00_methods.R
@@ -1,23 +1,33 @@
 
-#' Predict the potential distribution of \pkg{tenm} based on specific time periods
-#' of environmental conditions or averages.
+#' Predict the potential distribution of species based on environmental
+#' conditions
 #' @importFrom methods new
 #' @docType methods
 #' @param object An object of class sp.temporal.selection
-#' @param model_variables A vector with variable names
+#' @param model_variables A character vector specifying the variable names used
+#' to build the model.
 #' @param layers A SpatRaster object or a list where each element is a
 #' SpatRaster.
-#' @param layers_path Path to layers
-#' @param layers_ext Layers extension
-#' @param mve If the projection will use the minimum volume ellipsoid algorithm
-#' @param level Proportion of data used to fit the minimum volume ellipsoid
-#' @param output A character to indicate if model uses suitability values
-#' or Mahalanobis distances. Possible values are "suitability" and "mahalanobis"
+#' @param layers_path Path to the directory containing raster layers.
+#' @param layers_ext File extension of the raster layers.
+#' @param mve Logical indicating whether to use the minimum volume
+#' ellipsoid algorithm.
+#' @param level Proportion of data to include inside the ellipsoid
+#' if mve is \code{T}.
+#' @param output Character indicating if the model outputs "suitability" values
+#'  or "mahalanobis" distances.
 #' @param ... Additional parameters passed to
-#' \code{\link[tenm]{ellipsoid_projection}}
-#' @return Returns a SpatRaster of suitability values or Mahalanobis distances
-#' to niche center.
-#' @details Note that each SpatRaster in the 'layers' parameter should have the
+#' \code{\link[tenm]{ellipsoid_projection}}.
+#' @return A SpatRaster object representing predicted suitability values or
+#' Mahalanobis distances to the niche center.
+#' @details
+#' This function predicts the potential distribution of a species based on
+#' environmental conditions represented by raster layers. The prediction is
+#' based on the model statistics and environmental variables specified in
+#' 'model_variables'. If 'mve' is \code{T}, the minimum volume ellipsoid algorithm
+#' is used to model the niche space. The output can be either "suitability"
+#' or "mahalanobis", the latter indicating distance to the niche center.
+#' Note that each SpatRaster in the 'layers' parameter should have the
 #' same number of elements (layers) as 'model_variables'. The predict method
 #' assumes that variables in each SpatRaster correspond to those in
 #' 'model_variables'. If layers in the 'layers' parameter are given as a

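The two `output` options can be illustrated with base R. This is a minimal sketch, not tenm's code: the centroid, covariance matrix and environmental values are made up, and it assumes suitability is computed as `exp(-0.5 * D^2)` from the squared Mahalanobis distance `D^2`, a common convention for ellipsoid niche models that may differ from what `ellipsoid_projection` actually does.

```r
# Hypothetical sketch: "mahalanobis" vs "suitability" output for three sites.
# Assumes suitability = exp(-0.5 * D^2); not taken from tenm's source.
centroid <- c(bio1 = 20, bio12 = 1000)
cov_mat <- matrix(c(4, 0, 0, 10000), nrow = 2,
                  dimnames = list(names(centroid), names(centroid)))

# Environmental values at three sites (one row per site)
env <- rbind(c(20, 1000), c(22, 1100), c(26, 1300))
colnames(env) <- names(centroid)

# stats::mahalanobis() returns squared distances to the center
d2 <- mahalanobis(env, center = centroid, cov = cov_mat)
suitability <- exp(-0.5 * d2)
```

A site at the centroid gets suitability 1, and suitability decays as the distance to the niche center grows.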
diff --git a/R/PartialROC.R b/R/PartialROC.R
@@ -18,11 +18,11 @@
 #' @param n_iter Number of bootstrap iterations to perform for partial ROC
 #' calculations. Default is 1000.
 #' @param rseed Logical. Whether or not to set a random seed for
-#' reproducibility. Default is FALSE.
+#' reproducibility. Default is \code{F}.
 #' @param sub_sample Logical. Indicates whether to use a subsample of
 #'  the test data. Recommended for large datasets.
 #' @param sub_sample_size Size of the subsample to use for computing pROC
-#' values when sub_sample is TRUE.
+#' values when sub_sample is \code{T}.
 #' @return A list of two elements:
 #' - "pROC_summary": a data.frame containing the mean
 #'   AUC value, AUC ratio calculated for each iteration and the p-value of the

diff --git a/R/bg_by_date.R b/R/bg_by_date.R
@@ -1,22 +1,30 @@
 #' Function to obtain environmental background organized by date
 #' @description Function to retrieve background data from occurrence records.
 #' The background data is organized as a function of the dated
-#' environmental data
-#' @param this_species Species Temporal Environmental Data (sp.temporal.env)
-#' object from \code{\link[tenm]{ex_by_date}}.
-#' @param buffer_ngbs Number of pixel neighbors used to build the buffer.
-#' @param buffer_distance Distance used to create a buffer in which background
-#' data will be taken.
-#' @param n_bg Number of background points.
-#' @param process_ngbs_by Numeric. Estimates neighbor cells each x cells. This
-#' is for memory management.
-#' @return Returns an object of class sp.temporal.bg which is a list that
-#' contains a data.frame with longitude, latitude, year, layer_date, layer_path,
-#' cell_ids_year and environmental information.
-#' @details The buffer is built around the occurrences using a neighborhood
-#' distance.
+#' environmental data.
+#' @param this_species An object of class sp.temporal.env representing species
+#' occurrence data organized by date. See \code{\link[tenm]{ex_by_date}}.
+#' @param buffer_ngbs Number of pixel neighbors used to build the buffer around
+#' each occurrence point.
+#' @param buffer_distance Distance (in the same units as the raster layers) used
+#' to create a buffer around occurrence points to sample background data.
+#' @param n_bg Number of background points to sample.
+#' @param process_ngbs_by Numeric parameter to improve memory management.
+#' Neighbor cells are processed in batches of the size specified by the user.
+#' @return An object of class sp.temporal.bg containing background data
+#' organized by date. The object is a list with the following components:
+#'   - "bg_df": A data.frame with columns for longitude, latitude, year,
+#'     layer_date, layer_path, cell_ids_year, and environmental information.
+#'   - Other metadata relevant to background sampling.
+#' @details
+#' This function retrieves background data around species occurrence points,
+#' sampled based on the dated environmental data provided in `this_species`.
+#' Background points are sampled within a buffer around each occurrence point.
+#' The function returns an object of class sp.temporal.bg, which contains
+#' background data organized by date. This object is used as input to the
+#' function \code{\link[tenm]{tenm_selection}}.
 #' @examples
-#' \dontrun{
+#' \donttest{
 #' library(tenm)
 #' data("abronia")
 #' tempora_layers_dir <- system.file("extdata/bio",package = "tenm")

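The memory-management idea behind `process_ngbs_by` can be pictured as chunking: occurrence cells are split into batches and neighbors are computed one batch at a time. This is an assumption about the general approach, shown with base R `split()`; it is not taken from tenm's source.

```r
# Hypothetical illustration of batching (not tenm's implementation):
# split 23 occurrence cell IDs into batches of at most process_ngbs_by cells.
occ_cells <- 1:23
process_ngbs_by <- 10
batches <- split(occ_cells, ceiling(seq_along(occ_cells) / process_ngbs_by))

# Each batch would then be handled separately, e.g. lapply(batches, ...),
# so only one batch of neighbor cells needs to be held at a time.
batch_sizes <- unname(lengths(batches))
```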
diff --git a/R/cells2samp.R b/R/cells2samp.R
@@ -1,21 +1,25 @@
 #' Helper function to randomly select cell IDs for generating
 #' environmental background data.
-#' @description Returns pixel IDs to be sample for generating
-#' environmental background data.
-#' @param data A data.frame with longitude and latitude data.
-#' @param longitude A character vector of the column name of longitude.
-#' @param latitude A character vector of the column name of latitude.
-#' @param cell_ids A numeric vector indicating the IDs of the cells that
-#' be used as the geographic centers of the buffers. The default values NULL.
-#' @param buffer_ngbs Number of pixel neighbors around occurrences to be used
-#' to build the buffer.
-#' @param raster_mask An object of class SpatRaster that will be used to
-#' obtain pixel IDs.
-#' @param process_ngbs_by Numeric. Estimates neighbor cells each x cells. This
-#' is for memory management.
-#' @param n_bg Number of background pixels.
-#' @param progress Logical. Show computation progress.
-#' @return A numeric vector with the IDs of cells to be sampled.
+#' @description
+#' This function returns pixel IDs to be sampled for generating environmental
+#' background data around species occurrence points.
+#' @param data A data.frame containing longitude and latitude data of occurrence
+#' points.
+#' @param longitude A character vector specifying the column name of longitude
+#' in 'data'.
+#' @param latitude A character vector specifying the column name of latitude
+#' in 'data'.
+#' @param cell_ids A numeric vector indicating the IDs of cells that serve as
+#' geographic centers for buffers. Default is NULL.
+#' @param buffer_ngbs Number of neighboring pixels around occurrence points
+#' used to build the buffer for sampling.
+#' @param raster_mask An object of class SpatRaster used to obtain pixel IDs.
+#' @param process_ngbs_by Numeric parameter to improve memory management.
+#' Neighbor cells are processed in batches of the size specified by the user.
+#' @param n_bg Number of background pixels to sample.
+#' @param progress Logical. If \code{T}, show computation progress.
+#' @return A numeric vector of cell IDs to be sampled for environmental
+#' background data.
 #' @examples
 #' \donttest{
 #' # cells to sample

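How `buffer_ngbs` and `n_bg` interact can be sketched on a plain grid. This is a base R illustration under stated assumptions, not tenm's code: the helper `buffer_cells()` is hypothetical, the buffer is taken to be a square window of `buffer_ngbs` pixels around each occurrence cell, and cell IDs follow terra's 1-based, row-by-row numbering.

```r
# Hypothetical helper (not part of tenm): IDs of all cells within
# buffer_ngbs pixels (a square window) of each occurrence cell.
# Cell IDs are 1-based and numbered row by row, the convention terra uses.
buffer_cells <- function(occ_cells, nrows, ncols, buffer_ngbs) {
  rows <- (occ_cells - 1) %/% ncols + 1
  cols <- (occ_cells - 1) %% ncols + 1
  offsets <- -buffer_ngbs:buffer_ngbs
  ids <- lapply(seq_along(occ_cells), function(i) {
    r <- rows[i] + offsets
    cc <- cols[i] + offsets
    # drop neighbors that fall outside the grid
    r <- r[r >= 1 & r <= nrows]
    cc <- cc[cc >= 1 & cc <= ncols]
    win <- expand.grid(r = r, cc = cc)
    (win$r - 1) * ncols + win$cc
  })
  # overlapping buffers share cells, so keep each ID once
  unique(unlist(ids))
}

# Two occurrences on a 10 x 10 grid: a corner cell (1) and an interior cell (55)
cells <- buffer_cells(occ_cells = c(1, 55), nrows = 10, ncols = 10,
                      buffer_ngbs = 1)
set.seed(1)
n_bg <- 5
# sample without replacement, never more cells than the buffer contains
bg_cells <- sample(cells, size = min(n_bg, length(cells)))
```

The corner occurrence contributes only 4 cells because its window is clipped at the grid edge, while the interior one contributes 9. If the buffer could ever contain a single cell, `sample()` would need a guard, since `sample(x)` with a length-one numeric `x` draws from `1:x`.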
diff --git a/R/clean_dup.R b/R/clean_dup.R
@@ -1,25 +1,33 @@
 #' Function to thin longitude and latitude data
-#' @description Thin duplicated or redundant occurrence records that
-#' present overlapping longitude and latitude coordinates. The user can thin
-#' occurrences using a geographical distance threshold or by a pixel
-#' neighborhood.
-#' @param data A data.frame with longitude and latitude of occurrence records
+#' @description
+#' Cleans up duplicated or redundant occurrence records that present overlapping
+#' longitude and latitude coordinates. Thinning can be performed using either a
+#' geographical distance threshold or a pixel neighborhood approach.
+#' @param data A data.frame with longitude and latitude of occurrence records.
 #' @param longitude A character vector indicating the column name of the
 #' "longitude" variable.
 #' @param latitude A character vector indicating the column name of the
 #' "latitude" variable.
-#' @param threshold A numeric value representing the euclidean distance between
-#' coordinates to be considered as a duplicate.
-#' @param by_mask Logical. If TRUE data thinning will be done
-#' using a raster layer as a mask.
-#' @param raster_mask An object of class SpatRaster that will be used a
-#' reference to identify duplicates.
+#' @param threshold A numeric value representing the distance threshold between
+#' coordinates to be considered duplicates. Units depend on whether
+#' `by_mask` is \code{T} or \code{F}. If \code{T}, the user needs to specify the number
+#' of pixels that define the neighborhood of duplicates (see n_ngbs parameter).
+#' @param by_mask Logical. If \code{T}, the thinning process will use a raster layer
+#' as a mask for defining distance in pixel units.
+#' @param raster_mask An object of class SpatRaster that serves as a reference
+#' to thin the occurrence data. Required if `by_mask` is \code{T}.
 #' @param n_ngbs Number of pixels used to define the neighborhood matrix that
-#' helps to determine which occurrences are duplicates.
-#' - A value of 0 removes occurrences within the same pixel, keeping one.
-#' - A value of 1 considers as duplicates all occurrences within a
-#' distance of one pixel.
-#' @return Returns a data.frame with coordinate data from a species
+#' helps determine which occurrences are duplicates:
+#'   - 0 removes occurrences within the same pixel, keeping one.
+#'   - 1 considers duplicates all occurrences within a distance of one pixel.
+#'   - n considers duplicates all occurrences within a distance of n pixels.
+#' @return Returns a data.frame with cleaned occurrence records, excluding
+#' duplicates based on the specified criteria.
+#' @details
+#' This function cleans up duplicated occurrences based on the specified
+#' distance threshold. If `by_mask` is \code{T}, the distance is interpreted as
+#' pixel distance using the provided raster_mask; otherwise, it is interpreted
+#' as geographic distance.
 #' @examples
 #' data(abronia)
 #' tempora_layers_dir <- system.file("extdata/bio",package = "tenm")

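For the `by_mask = FALSE` case, threshold-based thinning can be sketched as a greedy pass over the records. This is not tenm's implementation: the helper `thin_by_distance()` and the data are hypothetical, and it assumes plain Euclidean distance in coordinate units (as the earlier version of this documentation stated), keeping a record only if it is farther than `threshold` from every record already kept.

```r
# Hypothetical sketch of distance-threshold thinning (not tenm's code).
thin_by_distance <- function(data, longitude, latitude, threshold) {
  xy <- as.matrix(data[, c(longitude, latitude)])
  keep <- logical(nrow(xy))
  for (i in seq_len(nrow(xy))) {
    kept <- xy[keep, , drop = FALSE]
    # Euclidean distance from record i to every record kept so far
    d <- sqrt((kept[, 1] - xy[i, 1])^2 + (kept[, 2] - xy[i, 2])^2)
    # all() of an empty vector is TRUE, so the first record is always kept
    keep[i] <- all(d > threshold)
  }
  data[keep, , drop = FALSE]
}

occs <- data.frame(lon = c(-98.10, -98.11, -97.50, -97.50),
                   lat = c(19.20, 19.21, 19.00, 19.00))
thinned <- thin_by_distance(occs, "lon", "lat", threshold = 0.05)
```

Records 2 and 4 are dropped: record 2 lies about 0.014 units from record 1, and record 4 duplicates record 3. The result depends on row order, which is typical of greedy thinning.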
diff --git a/R/clean_dup_by_date.R b/R/clean_dup_by_date.R
@@ -1,26 +1,33 @@
 #' Function to thin occurrence data
-#' @description Cleans up duplicated longitude and latitude data by year using a
-#' specified distance threshold. The distance can be specified as a geographic
-#' distance or, if a raster_mask is provided, as a pixel distance.
-#' @param this_species Species Temporal Data object
-#' see \code{\link[tenm]{sp_temporal_data}}.
-#' @param threshold A numeric value representing the distance
-#' between coordinates to be considered as a duplicate.
-#' @param by_mask Logical. If TRUE the thinning process will be done
-#' using a raster layer as a mask.
-#' @param raster_mask An object of class SpatRaster that will be used as
-#' reference to thin the data.
+#' @description Cleans up duplicated longitude and latitude data by year using
+#' a specified distance threshold. The distance can be specified as a
+#' geographic distance or, if a raster_mask is provided, as a pixel distance.
+#' @param this_species An object of class sp.temporal.modeling representing
+#' species occurrence data organized by date.
+#' See \code{\link[tenm]{sp_temporal_data}}.
+#' @param threshold A numeric value representing the distance threshold between
+#' coordinates to be considered duplicates. Units depend on whether
+#' `by_mask` is \code{T} or \code{F}. If \code{T}, the user needs to specify the number
+#' of pixels that define the neighborhood of duplicates (see n_ngbs parameter).
+#' @param by_mask Logical. If \code{T}, the thinning process will use a raster layer
+#' as a mask for defining distance in pixel units.
+#' @param raster_mask An object of class SpatRaster that serves as a reference
+#' to thin the occurrence data. Required if `by_mask` is \code{T}.
 #' @param n_ngbs Number of pixels used to define the neighborhood matrix that
-#' helps to determine which occurrences are duplicates.
-#' - A value of 0 removes occurrences within the same pixel, keeping one.
-#' - A value of 1 considers as duplicates all occurrences within a
-#' distance of one pixel.
-#' @return A sp.temporal.modeling object that contains a temporal data.frame.
-#' This table has five columns: longitude, latitude, date variable,
-#' layers_dates and layers_path.
+#' helps determine which occurrences are duplicates:
+#'   - 0 removes occurrences within the same pixel, keeping one.
+#'   - 1 considers duplicates all occurrences within a distance of one pixel.
+#'   - n considers duplicates all occurrences within a distance of n pixels.
+#' @return An object of class sp.temporal.modeling containing a temporal
+#' data.frame with cleaned occurrence data, including columns for
+#'  longitude, latitude, date variable, layers_dates, and layers_path.
 #'
-#' @details This function is build on the basis of
-#' \code{\link[tenm]{clean_dup}}. See the help of the function for more examples
+#' @details
+#' This function is based on \code{\link[tenm]{clean_dup}}. It cleans up
+#' duplicated occurrences based on the specified threshold. If `by_mask`
+#' is \code{T}, the distance is interpreted as pixel distance using the provided
+#' raster_mask; otherwise, it is interpreted as geographic distance.
+
 #' @examples
 #' library(tenm)
 #' data("abronia")
@@ -63,7 +70,7 @@ clean_dup_by_date <- function(this_species,threshold,by_mask = FALSE,
   df_occs_date <- this_species$temporal_df
   df_occs_dateL <- split(df_occs_date,df_occs_date$layers_path,drop=T)
   clean_by_date <- seq_along(df_occs_dateL) |>
-    purrr::map_df(function(x){
+    furrr::future_map_dfr(function(x){
       dd <- tenm::clean_dup(data = df_occs_dateL[[x]],
                             longitude = this_species$lon_lat_vars[1],
                             latitude = this_species$lon_lat_vars[2],
@@ -72,7 +79,11 @@ clean_dup_by_date <- function(this_species,threshold,by_mask = FALSE,
                             raster_mask = raster_mask,
                             n_ngbs = n_ngbs)
       return(dd)
-    })
+    },.progress = TRUE,
+    .options = furrr::furrr_options(globals = c("this_species",
+                                                "df_occs_date",
+                                                "df_occs_dateL"),
+                                    seed = NULL))
 
   sp.temp.data.clean <- list(temporal_df = clean_by_date,
                              sp_date_var = this_species$sp_date_var,

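`clean_dup_by_date` splits the records by `layers_path`, thins each group, and row-binds the results; `furrr::future_map_dfr` follows the same split-apply-combine pattern and may run in parallel depending on the active `future::plan()`. The base R sketch below shows only the pattern: the paths and data are made up, and exact duplicate removal stands in for the threshold-based `clean_dup` call.

```r
# Hypothetical sketch of split-apply-combine by layer date (not tenm's code).
occs <- data.frame(lon = c(-98.1, -98.1, -98.1, -97.5),
                   lat = c(19.2, 19.2, 19.2, 19.0),
                   layers_path = c("bio/2000", "bio/2000",
                                   "bio/2010", "bio/2010"))
groups <- split(occs, occs$layers_path, drop = TRUE)
cleaned <- do.call(rbind, lapply(groups, function(g) {
  # stand-in for clean_dup(): drop exact coordinate duplicates within a group
  g[!duplicated(g[, c("lon", "lat")]), , drop = FALSE]
}))
```

Because thinning happens within each group, the same coordinates recorded in 2000 and 2010 are both kept; only the repeated 2000 record is dropped.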
diff --git a/R/correlation_finder.R b/R/correlation_finder.R
@@ -1,14 +1,24 @@
-#' Function to find out strong correlations in a correlation matrix
-#' @description The function finds out which variables have strong
-#' correlations according to a correlation threshold.
-#' @param environmental_data A matrix or a data.frame of environmental data
-#' @param method A method to estimate correlation matrix. Possible options are
-#' "spearman", "pearson" or "kendall".
-#' @param threshold Correlation value used to filter variables.
-#' @param verbose Verbose output.
-#' @return Returns a list of two elements: the first is a vector with the names
-#' of not correlated variables; the second is a list with the correlation values
-#' of all variables.
+#' Function to find strong correlations within environmental predictors
+#' @description
+#' This function identifies variables with strong correlations based on a
+#' specified threshold.
+#' @param environmental_data A matrix or data.frame containing
+#' environmental data.
+#' @param method Method used to estimate the correlation matrix. Possible
+#' options include "spearman" (Spearman's rank correlation),
+#' "pearson" (Pearson's correlation),
+#' or "kendall" (Kendall's tau correlation).
+#' @param threshold Correlation threshold value. Variables with absolute
+#' correlation values greater than or equal to this threshold are considered
+#' strongly correlated.
+#' @param verbose Logical. If \code{T}, prints verbose output detailing
+#' correlations.
+#' @return A list with two elements:
+#'   - `descriptors`: A vector containing names of variables that are
+#'      not strongly correlated.
+#'   - `list_cor`: A list with correlation values for all pairs of
+#'      variables.
+
 #' @export
 #' @examples
 #' \donttest{
@@ -92,7 +102,6 @@ correlation_finder <- function(environmental_data,method="spearman",threshold,
         print(list_cor[[i]])
         cat('----------------------------------------------------------------\n\n')
       }
-      return()
     }
     return(list(descriptors=descriptors,list_cor=list_cor))
   }

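The thresholding rule described for `correlation_finder` (absolute correlation greater than or equal to `threshold`) can be sketched with `stats::cor()`. This sketch only lists the strongly correlated pairs; how tenm then chooses which variables to keep is not reproduced here, and the data are made up.

```r
# Hypothetical sketch (not tenm's implementation): list variable pairs whose
# absolute correlation is >= threshold.
env <- data.frame(bio1  = 1:10,
                  bio5  = (1:10) * 2 + 0.5,   # same ranks as bio1
                  bio12 = c(5, 3, 8, 1, 9, 2, 10, 4, 7, 6))
threshold <- 0.8
cor_mat <- cor(env, method = "spearman")
# upper.tri() keeps each pair once and skips the diagonal
idx <- which(abs(cor_mat) >= threshold & upper.tri(cor_mat), arr.ind = TRUE)
strong_pairs <- data.frame(var1 = rownames(cor_mat)[idx[, "row"]],
                           var2 = colnames(cor_mat)[idx[, "col"]],
                           r = cor_mat[idx])
```

Only `bio1` and `bio5` exceed the threshold (Spearman correlation of 1); `bio12` has a Spearman correlation of about 0.22 with both.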
diff --git a/R/cov_center.R b/R/cov_center.R
@@ -1,23 +1,25 @@
 
-#' Function to compute the covariance matrix that defines an ellipsoid
-#' niche model.
-#' @description Function to compute the covariance matrix, the niche centroid
-#' and volume of an ellipsoid model. It uses the values of the niche variables
-#' of the occurrences points.
-#' @param data A data.frame or a matrix with the numeric values of the variables
-#' that will be used to model the niche.
-#' @param mve A logical value. If TRUE a minimum volume ellipsoid will be
-#' computed using
-#' the function \code{\link[MASS]{cov.mve}} of the \pkg{MASS} package. If FALSE
-#' the covariance matrix of the input data will be used.
-#' @param level A numerical value specifying the proportion of the data to be
-#' used to compute the ellipsoid.
-#' @param vars A numeric or a string vector specifying the columns indexes/names
-#' of the variables of the input data which will be used to fit the ellipsoid
-#' model.
-#' @return Returns a list containing the centroid of the ellipsoid, the
-#' covariance matrix based on the input data, ellipsoid volume, semi-axis length
-#'  and axis coordinates.
+#' Function to compute the covariance matrix of an ellipsoid niche model.
+#' @description
+#' Computes the covariance matrix, niche centroid, volume, and other
+#' ellipsoid parameters based on the values of niche variables from
+#' occurrence points.
+#' @param data A data.frame or matrix containing numeric values of variables
+#'   used to model the niche.
+#' @param mve Logical. If \code{T}, computes a minimum volume ellipsoid using
+#'   the \code{\link[MASS]{cov.mve}} function from the MASS package. If \code{F},
+#'   uses the covariance matrix of the input data.
+#' @param level Proportion of data to be used for computing the ellipsoid,
+#'   applicable when mve is \code{T}.
+#' @param vars Vector specifying column indexes or names of variables in
+#' the input data used to fit the ellipsoid model.
+#' @return A list containing the following components:
+#'   - `centroid`: Centroid (mean vector) of the ellipsoid.
+#'   - `covariance_matrix`: Covariance matrix based on the input data.
+#'   - `volume`: Volume of the ellipsoid.
+#'   - `semi_axes_lengths`: Lengths of semi-axes of the ellipsoid.
+#'   - `axis_coordinates`: Coordinates of ellipsoid axes.
+
 #' @examples
 #' \donttest{
 #' library(tenm)
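
The components listed under `@return` can be made concrete with a base R sketch of the `mve = FALSE` case. It is an illustration under assumptions, not tenm's code: the data are made up, and it assumes the ellipsoid boundary is set by the chi-square quantile at a probability called `boundary_level` here (a hypothetical name), a common convention for Mahalanobis-based ellipsoids. For `mve = TRUE`, `MASS::cov.mve()` returns `center` and `cov` elements that could replace `colMeans()` and `cov()`.

```r
# Hypothetical sketch: ellipsoid parameters from occurrence values using the
# sample covariance. boundary_level is an assumed chi-square probability.
x <- cbind(bio1  = c(18, 20, 22, 19, 21),
           bio12 = c(900, 1000, 1100, 1050, 950))
boundary_level <- 0.95
p <- ncol(x)

centroid <- colMeans(x)
covariance <- cov(x)
eig <- eigen(covariance)

# semi-axis lengths: sqrt(eigenvalue * chi-square quantile with p df)
semi_axes <- sqrt(eig$values * qchisq(boundary_level, df = p))
# axis directions are the eigenvectors (columns of eig$vectors)

# volume of a p-dimensional ellipsoid:
# volume of the unit p-ball times the product of the semi-axes
volume <- pi^(p / 2) / gamma(p / 2 + 1) * prod(semi_axes)
```

In two dimensions this reduces to the area of an ellipse, `pi * a * b`, which equals `pi * sqrt(det(covariance)) * qchisq(boundary_level, 2)`.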