Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/rm sp group lines #54

Merged
merged 23 commits into from Aug 17, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
205 changes: 112 additions & 93 deletions R/group_lines.R
@@ -1,82 +1,93 @@
#' Group Lines
#'
#' \code{group_lines} groups rows into spatial groups by creating trajectories
#' and grouping based on spatial overlap. The function accepts a
#' \code{data.table} with relocation data, individual identifiers and a
#' \code{threshold}. The relocation data is transformed into \code{SpatialLines}
#' and overlapping \code{SpatialLines} are grouped. The \code{threshold}
#' argument is used to specify the criteria for distance between lines.
#' Relocation data should be in two columns representing the X and Y
#' coordinates.
#' `group_lines` groups rows into spatial groups by generating LINESTRINGs and
#' grouping based on spatial intersection. The function accepts a `data.table`
#' with relocation data, individual identifiers and a distance threshold. The
#' relocation data is transformed into sf LINESTRINGs using [build_lines] and
#' intersecting LINESTRINGs are grouped. The threshold argument is used to
#' specify the distance criteria for grouping. Relocation data should be in two
#' columns representing the X and Y coordinates.
#'
#' The \code{DT} must be a \code{data.table}. If your data is a
#' \code{data.frame}, you can convert it by reference using
#' \code{\link[data.table:setDT]{data.table::setDT}}.
#' ## R-spatial evolution
#'
#' The \code{id}, \code{coords}, \code{sortBy} (and optional \code{timegroup}
#' and \code{splitBy}) arguments expect the names of respective columns in
#' \code{DT} which correspond to the individual identifier, X and Y coordinates,
#' sorting, timegroup (generated by \code{group_times}) and additional grouping
#' Please note, spatsoc has followed updates from R spatial, GDAL and PROJ for
#' handling projections, see more at
#' <https://r-spatial.org/r/2020/03/17/wkt.html>.
#'
#' In addition, `group_lines` (and [build_lines]) previously used
#' [sp::SpatialLines], [rgeos::gIntersects], [rgeos::gBuffer] but have been
#' updated to use [sf::st_as_sf], [sf::st_linestring], [sf::st_intersects], and
#' [sf::st_buffer] according to the R-spatial evolution, see more at
#' <https://r-spatial.org/r/2022/04/12/evolution.html>.
#'
#' ## Notes on arguments
#' The `DT` must be a `data.table`. If your data is a
#' `data.frame`, you can convert it by reference using [data.table::setDT].
#'
#' The `id`, `coords`, `sortBy` (and optional `timegroup`
#' and `splitBy`) arguments expect the names of respective columns in
#' `DT` which correspond to the individual identifier, X and Y coordinates,
#' sorting, timegroup (generated by [group_times]) and additional grouping
#' columns.
#'
#' The \code{projection} argument expects a character string defining
#' the EPSG code. For example, for UTM zone 36N (EPSG 32736), the projection
#' argument is "EPSG:32736". See \url{https://spatialreference.org}
#' for a list of EPSG codes. Please note, R spatial has followed updates
#' to GDAL and PROJ for handling projections, see more at
#' \url{https://r-spatial.org/r/2020/03/17/wkt.html}.
#' The `projection` argument expects a numeric or character defining the
#' coordinate reference system. For example, for UTM zone 36N (EPSG 32736), the
#' projection argument is either `projection = 'EPSG:32736'` or `projection =
#' 32736`. See details in [`sf::st_crs()`] and
#' <https://spatialreference.org> for a list of EPSG codes.
#'
#' The \code{sortBy} is used to order the input \code{data.table} when creating
#' \code{SpatialLines}. It must a \code{POSIXct} to ensure the rows are sorted
#' by date time.
#' The `sortBy` argument is used to order the input `DT` when creating sf
#' LINESTRINGs. It must a column in the input `DT` of type POSIXct to ensure the
#' rows are sorted by date time.
#'
#' The \code{threshold} must be provided in the units of the coordinates. The
#' \code{threshold} can be equal to 0 if strict overlap is required, else it
#' needs to be greater than 0. The coordinates must be planar coordinates (e.g.:
#' UTM). In the case of UTM, a \code{threshold} = 50 would indicate a 50m
#' distance threshold.
#' The `threshold` must be provided in the units of the coordinates. The
#' `threshold` can be equal to 0 if strict overlap is intended, otherwise it
#' should be some value greater than 0. The coordinates must be planar
#' coordinates (e.g.: UTM). In the case of UTM, a `threshold = 50` would
#' indicate a 50m distance threshold.
#'
#' The \code{timegroup} argument is optional, but recommended to pair with
#' \code{\link{group_times}}. The intended framework is to group rows temporally
#' with \code{\link{group_times}} then spatially with \code{group_lines} (or
#' \code{\link{group_pts}}, \code{\link{group_polys}}). With \code{group_lines},
#' pick a relevant \code{group_times} \code{threshold} such as \code{'1 day'} or
#' \code{'7 days'} which is informed by your study species and system.
#' The `timegroup` argument is optional, but recommended to pair with
#' [group_times]. The intended framework is to group rows temporally with
#' [group_times] then spatially with [group_lines] (or [group_pts],
#' [group_polys]). With [group_lines], pick a relevant [group_times] threshold
#' such as `'1 day'` or `'7 days'` which is informed by your study species,
#' system or question.
#'
#' The \code{splitBy} argument offers further control over grouping. If within
#' your \code{DT}, you have multiple populations, subgroups or other distinct
#' parts, you can provide the name of the column which identifies them to
#' \code{splitBy}. The grouping performed by \code{group_lines} will only
#' consider rows within each \code{splitBy} subgroup.
#' The `splitBy` argument offers further control building LINESTRINGs. If in
#' your input `DT`, you have multiple temporal groups (e.g.: years) for example,
#' you can provide the name of the column which identifies them and build
#' LINESTRINGs for each individual in each year. The grouping performed by
#' [group_lines] will only consider rows within each `splitBy` subgroup.
#'
#' @return \code{group_lines} returns the input \code{DT} appended with a
#' \code{group} column.
#' @return `group_lines` returns the input `DT` appended with a "group"
#' column.
#'
#' This column represents the spatial (and if \code{timegroup} was provided -
#' spatiotemporal) group calculated by overlapping lines. As with the other
#' This column represents the spatial (and if `timegroup` was provided -
#' spatiotemporal) group calculated by intersecting lines. As with the other
#' grouping functions, the actual value of group is arbitrary and represents
#' the identity of a given group where 1 or more individuals are assigned to a
#' group. If the data was reordered, the group may change, but the contents of
#' each group would not.
#'
#' A message is returned when a column named \code{group} already exists in
#' the input \code{DT}, because it will be overwritten.
#'
#' A message is returned when a column named "group" already exists in the
#' input `DT`, because it will be overwritten.
#'
#' @inheritParams group_pts
#' @inheritParams build_polys
#' @param threshold The width of the buffer around the lines in the units of the
#' projection. Supply 0 to compare intersection without buffering.
#' @param spLines Alternatively to providing a DT, provide a SpatialLines object
#' created with the sp package. If a spLines object is provided, groups cannot
#' be calculated by a timegroup or splitBy.
#' projection. Use `threshold = 0` to compare intersection without buffering.
#' @param sfLines Alternatively to providing a DT, provide a simple feature
#' LINESTRING object generated with the sf package. The id argument is
#' required to provide the identifier matching each LINESTRING.
#' If an sfLines object is provided, groups cannot be calculated by timegroup
#' or splitBy.
#' @param sortBy Character string of date time column(s) to sort rows by. Must
#' be a POSIXct.
#'
#' @export
#'
#' @family Spatial grouping
#' @seealso \code{\link{build_lines}} \code{\link{group_times}}
#' @seealso [build_lines] [group_times]
#'
#' @examples
#' # Load data.table
Expand All @@ -92,10 +103,10 @@
#' DT[, datetime := as.POSIXct(datetime, tz = 'UTC')]
#'
#' # EPSG code for example data
#' utm <- 'EPSG:32736'
#' utm <- 32736
#'
#' \donttest{group_lines(DT, threshold = 50, projection = utm, sortBy = 'datetime',
#' id = 'ID', coords = c('X', 'Y'))}
#' group_lines(DT, threshold = 50, projection = utm, sortBy = 'datetime',
#' id = 'ID', coords = c('X', 'Y'))
#'
#' ## Daily movement tracks
#' # Temporal grouping
Expand All @@ -114,7 +125,6 @@
#' id = 'ID', coords = c('X', 'Y'),
#' timegroup = 'timegroup', sortBy = 'datetime',
#' splitBy = 'population')

group_lines <-
function(DT = NULL,
threshold = NULL,
Expand All @@ -124,7 +134,7 @@ group_lines <-
timegroup = NULL,
sortBy = NULL,
splitBy = NULL,
spLines = NULL) {
sfLines = NULL) {

# due to NSE notes in R CMD check
group <- ..coords <- ..id <- ..sortBy <- withinGroup <- NULL
Expand All @@ -138,11 +148,37 @@ group_lines <-
stop('cannot provide a negative threshold')
}

if (!is.null(spLines) && !is.null(DT)) {
stop('cannot provide both DT and spLines')
} else if (is.null(spLines) && is.null(DT)) {
stop('must provide either DT or spLines')
} else if (is.null(spLines) && !is.null(DT)) {
if (!is.null(sfLines) && !is.null(DT)) {
stop('cannot provide both DT and sfLines')
} else if (is.null(sfLines) && is.null(DT)) {
stop('must provide either DT or sfLines')
} else if (!is.null(sfLines) && is.null(DT)) {
if (!inherits(sfLines, 'sf') ||
!'LINESTRING' %in% sf::st_geometry_type(sfLines)) {
stop('sfLines provided must be a sf object with LINESTRINGs')
}
if (is.null(id)) {
stop('id must be provided')
}

if (uniqueN(sfLines[[id]]) != nrow(sfLines)) {
stop('number of unique values in sfLines does not match nrow(sfLines)')
}

if (threshold == 0) {
inter <- sf::st_intersects(sfLines, sfLines, sparse = FALSE)
} else {
buffered <- sf::st_buffer(sfLines, dist = threshold)
inter <- sf::st_intersects(sfLines, buffered, sparse = FALSE)
}
dimnames(inter) <- list(sfLines[[id]], sfLines[[id]])
g <- igraph::graph_from_adjacency_matrix(inter)
ovr <- igraph::clusters(g)$membership
out <- data.table::data.table(names(ovr),
unlist(ovr))
data.table::setnames(out, c('ID', 'group'))
return(out[])
} else if (is.null(sfLines) && !is.null(DT)) {
if (is.null(projection)) {
stop('projection must be provided when DT is')
}
Expand Down Expand Up @@ -173,48 +209,32 @@ group_lines <-
message('group column will be overwritten by this function')
set(DT, j = 'group', value = NULL)
}
} else if (!is.null(spLines) && is.null(DT)) {
if (!('SpatialLines' %in% class(spLines) && isS4(spLines))) {
stop('spLines provided must be a SpatialLines object')
}

if (threshold == 0) {
inter <- rgeos::gIntersects(spLines, spLines, byid = TRUE)
} else {
buffered <- rgeos::gBuffer(spLines, width = threshold,
byid = TRUE)
inter <- rgeos::gIntersects(spLines, buffered, byid = TRUE)
}
g <- igraph::graph_from_adjacency_matrix(inter)
ovr <- igraph::clusters(g)$membership
out <- data.table::data.table(names(ovr),
unlist(ovr))
data.table::setnames(out, c('ID', 'group'))
return(out[])
}

if (is.null(timegroup)) {
withCallingHandlers({
spLines <- build_lines(
lns <- build_lines(
DT = DT,
projection = projection,
coords = coords,
id = id,
sortBy = sortBy
sortBy = sortBy,
splitBy = splitBy
)},
warning = function(w){
if (startsWith(conditionMessage(w), 'some rows dropped')) {
invokeRestart('muffleWarning')
}
}
)
if (!is.null(spLines)) {
if (nrow(lns) != 0) {
if (threshold == 0) {
inter <- rgeos::gIntersects(spLines, spLines, byid = TRUE)
inter <- sf::st_intersects(lns, lns, sparse = FALSE)
} else {
buffered <- rgeos::gBuffer(spLines, width = threshold, byid = TRUE)
inter <- rgeos::gIntersects(spLines, buffered, byid = TRUE)
buffered <- sf::st_buffer(lns, dist = threshold)
inter <- sf::st_intersects(lns, buffered, sparse = FALSE)
}
dimnames(inter) <- list(lns[[id]], lns[[id]])
g <- igraph::graph_from_adjacency_matrix(inter)
ovr <- igraph::clusters(g)$membership
ovrDT <- data.table::data.table(ID = names(ovr),
Expand Down Expand Up @@ -248,7 +268,7 @@ group_lines <-
ovrDT <-
DT[, {
withCallingHandlers({
spLines <- build_lines(
lns <- build_lines(
DT = .SD,
projection = projection,
coords = ..coords,
Expand All @@ -261,15 +281,14 @@ group_lines <-
}
}
)
if (!is.null(spLines)) {
if (!is.null(lns)) {
if (threshold == 0) {
inter <- rgeos::gIntersects(spLines, spLines, byid = TRUE)
inter <- sf::st_intersects(lns, lns, sparse = FALSE)
} else {
buffered <- rgeos::gBuffer(spLines, width = threshold,
byid = TRUE)
inter <- rgeos::gIntersects(spLines, buffered, byid = TRUE)

buffered <- sf::st_buffer(lns, dist = threshold)
inter <- sf::st_intersects(lns, buffered, sparse = FALSE)
}
dimnames(inter) <- list(lns[[id]], lns[[id]])
g <- igraph::graph_from_adjacency_matrix(inter)
ovr <- igraph::clusters(g)$membership
out <- data.table::data.table(names(ovr),
Expand Down
11 changes: 8 additions & 3 deletions man/build_lines.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion man/build_lines_sp.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 3 additions & 3 deletions man/build_polys.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.