Ideas on generalization of spatial package backends and file sources using GDAL (terra, sf, stars, etc.) #4
Comments
Regarding Item 2 (/vsizip/), it should be possible to drop the .zip extension from file and target name. I was not aware of the alternate syntax! From https://gdal.org/user/virtual_file_systems.html#vsizip-zip-archives:
These are great ideas and I totally agree that the ideal situation would be one in which users can provide |
Excellent description, @brownag. Note there is also {gdalraster} now, which already supports SOZip creation and file management and has nascent 'gdalvector' support: https://usdaforestservice.github.io/gdalraster/reference/addFilesInZip.html https://usdaforestservice.github.io/gdalraster/articles/gdalvector-draft.html gdalraster has become very richly featured very quickly, and could be the GDAL API that's otherwise entirely missing from R atm.
I lean towards this conclusion also-- so that the choice is explicit and not changeable through magical options or settings. How these operations would be handled on the backend could/should be more generic, but for reproducibility and clarity it is probably best to give users options that require explicit choices. This might mean many combinations of thin wrapper methods around the core functions, but I don't think that is inherently bad as long as there is some overall order to how they are named.
Sweet! I have seen {gdalraster} and watched some of the (rapid) progress on that with interest... but I don't think I was aware of the plans to provide bindings for the OGR vector API! I have used your vapour package for some of my generic/vector GDAL needs that go beyond terra/sf. Something truly generic, mirroring the GDAL API/"close to the GDAL metal", would allow for all sorts of capabilities and customization. Perhaps {gdalraster} would be a good choice for an imported package doing the core backend work, with plumbing to the various user-facing types/formats.
Thanks for this, @brownag! Following on from @Aariq's #7, I quite like the idea of

I'm still learning about a lot of spatial things, so there is a bit of this I don't quite understand, but I think this issue could eventually be split into multiple components, as there are a few threads in here. Overall, my preference for syntax would be something like:

    tar_terra_raster(
      new_raster,
      raster_creation_function(args)
    )

But overall this would work the same as:

    tar_target(
      new_raster,
      raster_creation_function(args),
      format = "format_terra_raster"
    )

Or something? With:

    tar_terra_shapefile(
      my_shape,
      create_shapefile(args),
      filetype = "parquet"
    )

But then I wonder if

    tar_parquet_shapefile(
      my_shape,
      create_shapefile(args)
    )

would be better? Naming things is hard. But I think it is worthwhile thinking about the API design: once we have ideas on how we want the user to interact with the package, my experience is that it is usually easier to write the code.
I agree we should try to think through this and make some decisions before getting too far. Super helpful discussion here.
I think everything in here is either in the package already, a PR, or a separate issue. Thanks for the contributions @brownag!
I wanted to throw up some ideas for discussion, might be a bit rambling for a single issue. Happy to break off any particular items as new issues or address in specific PRs; I will submit some draft PRs once I have fleshed these ideas out. I say "we" a lot in here but ultimately I am just one interested opinion and welcome any thoughts or alternatives.
The current target storage format functions are file-format centric. This is great, because GDAL is the library behind the scenes providing a common interface to a variety of different file formats. GDAL is used in several R spatial packages, notably sf, terra, and stars. I think this project should abstract out the functionality for GDAL data source paths and support multiple R package/object type interfaces for the result the user sees.
In my opinion, {geotargets} should provide default behavior based on type of spatial data, i.e. vector geometry vs. raster--this is so the user doesn't have to think too much about the formats in their target store, just that they are able to roundtrip an R object equivalent to what they started with. If they care about the format, they should have the ability to choose.
I'd like to make (or suggest others make) a couple PRs to implement:
These should provide some room for discussion about the specifics of how the group wants to abstract or break out functionality.
Spatial backends based on GDAL
I imagine some users don't care so much what file format their target store contains, but likely will care more about the object types that are returned and the associated packages. The object type matters because of chosen dependencies and preferred workflows of the user. The file type may matter especially when it comes time to read targets back in, in part or in full, when they start taking up a lot of disk space, or some step in the process requires a specific format.
We may not want to require users to load both {sf} and {terra}, for example.
Package usage gated by requireNamespace() and having all of these types of packages that produce the user-facing object in Suggests seems like a good strategy. The alternative would be to, say, pick {terra} for use internally and then provide conversion methods for compatibility with other ({sf}/{stars}) objects as input/output.
I personally am a big fan of {terra}, but still use {sf} for quite a few things. {terra} is great in that it can do both vector and raster data, but there are many R spatial users and a much broader R ecosystem built around {sf}. I think users should be able to avoid one or the other, or interchange them as needed, in their workflows.
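The gating idea above could look roughly like the following base R sketch. The helper names `check_backend()` and `read_vector()` are hypothetical and not part of {geotargets}; the only assumption about the backends is that `terra::vect()` and `sf::st_read()` are used to read a vector data source.

```r
# Hypothetical helper: fail early with a clear message when an optional
# backend package (listed under Suggests) is not installed.
check_backend <- function(package) {
  if (!requireNamespace(package, quietly = TRUE)) {
    stop(
      "Package '", package, "' is required for this target format. ",
      "Install it with install.packages(\"", package, "\").",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Hypothetical reader: only the backend the user asked for is ever touched,
# so a {terra}-only user never needs {sf} installed (and vice versa).
read_vector <- function(path, package = c("terra", "sf")) {
  package <- match.arg(package)
  check_backend(package)
  switch(
    package,
    terra = terra::vect(path),
    sf = sf::st_read(path, quiet = TRUE)
  )
}
```

Because the `terra::` and `sf::` calls sit inside `switch()`, they are evaluated only for the selected branch, which is what keeps both packages optional.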
Specific result types (e.g. sf data.frame, or lazy tbl, vs SpatVector/SpatVectorProxy) would be customized with options set for the whole pipeline, for a target factory, or in wrapper functions.
For example:
In addition to a tar_geotiff() with multiple options set, we could have functions like tar_geotiff_stars() and tar_geotiff_terra(). More generic functions would be possible if we abstract out the file type for all GDAL drivers; you might have tar_vector_sf(filetype="parquet") or tar_vector(filetype="ESRI Shapefile", package="terra").
Target factories and formats could utilize default arguments, possibly customized based on selected filetype; they might read a targets option or environment variable, or be settable through a function {geotargets} would offer.
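As a rough sketch of how such defaults might be resolved, the function below checks an explicit argument first, then an R option, then an environment variable, and finally falls back to a built-in default. The names `resolve_backend()`, `geotargets.vector.backend`, and `GEOTARGETS_VECTOR_BACKEND` are made up for illustration and are not an existing {geotargets} interface.

```r
# Resolve the backend package in this order of precedence:
# explicit argument > R option > environment variable > built-in default.
# The option and environment variable names are hypothetical.
resolve_backend <- function(package = NULL) {
  if (is.null(package)) {
    package <- getOption(
      "geotargets.vector.backend",
      default = Sys.getenv("GEOTARGETS_VECTOR_BACKEND", unset = "terra")
    )
  }
  # Reject anything that is not a supported backend.
  match.arg(package, choices = c("terra", "sf"))
}
```

A target factory could then call `resolve_backend(package)` once and pass the result on, so that a pipeline-wide setting and a per-target override share one code path.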
tar_shapefile_*() helper method. Perhaps such a function would be better named tar_shapefile_zip() to indicate that it only works with target names ending in a ".zip" suffix (which is something users may prefer to avoid).
Generalization of compression for spatial targets with GDAL
The /vsizip/ GDAL virtual file system functionality used in format_shapefile() is an example of something that can be generalized further with a focus on generic GDAL data source paths. I think the idea of being able to compress files that are in the target store (and keep them compressed) is attractive for spatial data, which can be quite large, even if targets are not composed of multiple files.
utils::zip()
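To make the data source path idea concrete, here is a minimal base R sketch of a path builder. `vsizip_path()` is a hypothetical helper, not an existing function. It relies on the two forms described in the GDAL virtual file system documentation: the plain `/vsizip/archive.zip/inner` form, and the brace form `/vsizip/{archive}/inner` (GDAL 2.2 and later), which does not depend on a .zip extension.

```r
# Build a GDAL /vsizip/ data source path. `inner` is the path of a file
# inside the archive; leave it NULL to point at the archive itself.
vsizip_path <- function(archive, inner = NULL) {
  has_zip_ext <- grepl("\\.zip$", archive, ignore.case = TRUE)
  # Without a .zip extension GDAL cannot tell where the archive name ends,
  # so wrap the archive path in braces (alternate syntax, GDAL >= 2.2).
  base <- if (has_zip_ext) {
    paste0("/vsizip/", archive)
  } else {
    paste0("/vsizip/{", archive, "}")
  }
  if (is.null(inner)) base else paste0(base, "/", inner)
}

vsizip_path("_targets/objects/my_shape.zip", "my_shape.shp")
# "/vsizip/_targets/objects/my_shape.zip/my_shape.shp"
vsizip_path("_targets/objects/my_shape", "my_shape.shp")
# "/vsizip/{_targets/objects/my_shape}/my_shape.shp"
```

The second call is the case that matters for a target store, where stored objects normally carry no file extension.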