[draft] tar_terra_rast_wrap: multi-target method to preserve SpatRaster metadata #63
… metadata
- Source data file is written using `terra::wrapCache()` to a user-specified cache directory
- Target is created for PackedSpatRaster based on cache
- Target is created for cache files
- Add cache-managing functions `geotargets_destroy_cache()` and `geotargets_init_cache()`
Sorry it has taken so long to get to this. I'm excited about this PR because it seems to me like this could actually make sense to just be the default way that geotargets works with terra, and that would solve a lot of issues.
#' @export
geotargets_init_cache <- function(name = NULL) {
cachedir <- geotargets_option_get("cache.dir")
target_cache_dir <- file.path(cachedir %||% "geotargets_cache", name %||% "")
Suggested change:
- target_cache_dir <- file.path(cachedir %||% "geotargets_cache", name %||% "")
+ target_cache_dir <- file.path(cachedir %||% "_geotargets", name %||% "")
Maybe? Just for consistency with `_targets/`, since both are directories you shouldn't edit manually.
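To make the suggestion concrete, here is a minimal sketch of how the fallback resolves under the proposed default. `resolve_cache_dir()` is a hypothetical stand-in for the path logic inside `geotargets_init_cache()`, and `%||%` is defined locally so the snippet runs on its own (the package may import it from elsewhere).

```r
# Null-coalescing helper, defined here only so the sketch is self-contained.
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical stand-in for the path logic in geotargets_init_cache().
# `cachedir` plays the role of geotargets_option_get("cache.dir").
resolve_cache_dir <- function(cachedir = NULL, name = NULL) {
  file.path(cachedir %||% "_geotargets", name %||% "")
}

# With no option set, the per-target directory sits under "_geotargets".
default_dir <- resolve_cache_dir(name = "dem")

# A user-supplied cache directory takes precedence over the default.
custom_dir <- resolve_cache_dir(cachedir = "my_cache", name = "dem")
```

When `name` is `NULL`, the `""` fallback means the call resolves to the cache root itself rather than a per-target subdirectory.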
geotargets_destroy_cache <- function(name = NULL, init = FALSE) {
cachedir <- geotargets_option_get("cache.dir")
target_cache_dir <- file.path(cachedir %||% "geotargets_cache", name %||% "")
res <- unlink(target_cache_dir, recursive = TRUE)
if (init) geotargets_init_cache(name = name)
invisible(res)
}
I wonder if there is a way that this could set some "flag" that could be used to invalidate the "upstream" target through a custom cue? Or perhaps it runs `tar_invalidate()` on all targets created with `tar_terra_rast_wrap()` when run? I think it's fine that manually deleting a file from the cache breaks the pipeline, but I think any "official" way of deleting the cache should correctly invalidate targets.
Actually, now that I think of it, the directory names inside the cache are target names, yeah? So if this could get all those dir names and pass them to `tar_invalidate(any_of(dirnames))`, I think it would make this function a lot more useful.
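A rough sketch of that idea, assuming each top-level directory in the cache is named after a target. Both `cache_target_names()` and `destroy_cache_and_invalidate()` are hypothetical names, not functions from this PR, and the second one assumes a targets data store already exists in the project.

```r
# Hypothetical helper: names of the per-target subdirectories in the cache.
cache_target_names <- function(cachedir = "_geotargets") {
  if (!dir.exists(cachedir)) {
    return(character(0))
  }
  list.dirs(cachedir, full.names = FALSE, recursive = FALSE)
}

# Hypothetical variant of geotargets_destroy_cache(): collect the target
# names before deleting the cache, then invalidate those targets so the
# next tar_make() rebuilds them.
destroy_cache_and_invalidate <- function(cachedir = "_geotargets") {
  dirnames <- cache_target_names(cachedir)
  res <- unlink(cachedir, recursive = TRUE)
  if (length(dirnames) > 0) {
    # any_of() ignores names that are not in the pipeline metadata.
    targets::tar_invalidate(tidyselect::any_of(dirnames))
  }
  invisible(res)
}
```

Collecting the names before calling `unlink()` matters, since the directory listing is the only record of which targets were cached.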
resources = targets::tar_option_get("resources"),
storage = targets::tar_option_get("storage"),
retrieval = targets::tar_option_get("retrieval"),
cue = targets::tar_option_get("cue")) {
Suggested change:
- cue = targets::tar_option_get("cue")) {
+ cue = targets::tar_option_get("cue"),
+ description = targets::tar_option_get("description")) {
full.names = TRUE,
recursive = TRUE
)")),
format = "file_fast",
You could allow the option of either "file" or "file_fast" here, but I'm guessing it doesn't really matter since it seems like nothing will ever depend on this target.
rast_cache_files <- targets::tar_target_raw(
paste0(name, "_cache_files"),
str2expression(paste0("
list.files(
file.path(", shQuote(cachedir), ", ", shQuote(name), "),
full.names = TRUE,
recursive = TRUE
)")),
format = "file_fast",
deps = name
)
I'm questioning whether this target even needs to exist. Unless there is a way to make it be "upstream" of the `wrapCache` target, it doesn't really serve a purpose. Invalidating this target will never do anything, and there's no reason to use this target rather than the upstream one in a pipeline. So maybe this doesn't need to return multiple targets.
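For reference, this is roughly what the command of the `_cache_files` target evaluates to. The sketch below rebuilds the same kind of expression with `str2expression()` and runs it against a throwaway directory; the cache location, the target name `dem`, and the file `dem.tif` are all made up for illustration.

```r
# Throwaway cache with one per-target directory holding a single file.
# Forward slashes keep the pasted path a valid R string literal on Windows.
cachedir <- file.path(
  normalizePath(tempdir(), winslash = "/"),
  "_geotargets_demo"
)
name <- "dem"
dir.create(file.path(cachedir, name), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(cachedir, name, "dem.tif"))

# Build the command as text, as the PR does, then parse it into an expression.
cmd <- str2expression(paste0(
  "list.files(file.path(", shQuote(cachedir), ", ", shQuote(name), "), ",
  "full.names = TRUE, recursive = TRUE)"
))

# Evaluating the command lists the files currently in that target's cache.
cache_files <- eval(cmd[[1]])
```

The command only reads the directory after the upstream target has written to it, which is why the resulting target sits downstream and cannot influence whether the caching target reruns.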
extension <- switch(filetype,
"GTiff" = ".tif",
"GPKG" = ".gpkg",
"")
Is this necessary? We don't need file extensions when writing to `_targets/`, so why do we need them here?
This is a draft PR that might be able to address #58
This is a completely different way of managing target files: the target file in `_targets/objects/` is an RDS file (like ordinary targets) containing a PackedSpatRaster, which is backed by a cached geospatial data file (and any sidecars) held in a user-specified folder.
- Source data file is written using `terra::wrapCache()` to a user-specified cache directory
- `geotargets_destroy_cache()`, `geotargets_init_cache()` and env option for cache path `GEOTARGETS_CACHE_DIR` (and associated methods)

For now this only works for SpatRaster, but I think a similar solution could be developed for SpatVectorProxy (although this would require either changes to `wrapCache()` in terra, or a custom `wrapCache()`-like method developed for this case).

Current "issue" is that you can modify the cache (intentionally or unintentionally) and the main target will not be invalidated. I tried tracking the cache directory before running the caching target, but then this leads to the caching having to run twice before it is skipped.
Example of storing units and categories: