Using targets with geoarrow? #1

Closed
njtierney opened this issue Feb 21, 2024 · 4 comments

@njtierney
Owner

Heya @anthonynorth, @MilesMcBain mentioned that you use geoarrow with targets to help store the data. I was just wondering if you have any pointers/thoughts on storing rasters and shapefiles in targets?

@anthonynorth

Vector

Using {geoarrow} with {targets} should be pretty painless now with the latest geoarrow/geoarrow-r. You'll just need to ensure {geoarrow} is loaded before reading / writing (requireNamespace() is enough, as in the reprex below). Minimal reprex with any old vector geometry data:

library(targets)
tar_script({
    requireNamespace("geoarrow")
    list(tar_target(sa4s, as.data.frame(strayr::read_absmap("sa42021")), 
        format = "parquet"))
})
requireNamespace("geoarrow")
#> Loading required namespace: geoarrow
tar_make()
#> Loading required namespace: geoarrow
#> > dispatched target sa4s
#> trying URL 'https://github.com/wfmackey/absmapsdata/blob/master/data/absmapsdata_file_list.rda?raw=true'
#> Content type 'application/octet-stream' length 407 bytes
#> ==================================================
#> downloaded 407 bytes
#> 
#> trying URL 'https://github.com/wfmackey/absmapsdata/raw/master/data/sa42021.rda'
#> Content type 'application/octet-stream' length 3044178 bytes (2.9 MB)
#> ==================================================
#> downloaded 2.9 MB
#> 
#> o completed target sa4s [3.2 seconds]
#> > ended pipeline [4.02 seconds]
str(tar_read(sa4s))
#> tibble [108 × 10] (S3: tbl_df/tbl/data.frame)
#>  $ sa4_code_2021  : chr [1:108] "101" "102" "103" "104" ...
#>  $ sa4_name_2021  : chr [1:108] "Capital Region" "Central Coast" "Central West" "Coffs Harbour - Grafton" ...
#>  $ gcc_code_2021  : chr [1:108] "1RNSW" "1GSYD" "1RNSW" "1RNSW" ...
#>  $ gcc_name_2021  : chr [1:108] "Rest of NSW" "Greater Sydney" "Rest of NSW" "Rest of NSW" ...
#>  $ state_code_2021: chr [1:108] "1" "1" "1" "1" ...
#>  $ state_name_2021: chr [1:108] "New South Wales" "New South Wales" "New South Wales" "New South Wales" ...
#>  $ areasqkm_2021  : num [1:108] 51896 1681 70297 13230 339356 ...
#>  $ cent_lat       : num [1:108] -35.6 -33.3 -33.2 -29.8 -31 ...
#>  $ cent_long      : num [1:108] 149 151 148 153 145 ...
#>  $ geometry       : geoarrow_vctr[1:108] <MULTIPOLYGON (((150.311 -35.666, 150.313 -35.668, 15
#>  - attr(*, "sf_column")= chr "geometry"
#>  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA
#>   ..- attr(*, "names")= chr [1:9] "sa4_code_2021" "sa4_name_2021" "gcc_code_2021" "gcc_name_2021" ...

Created on 2024-02-21 with reprex v2.0.2
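
For anyone curious what format = "parquet" is doing with the geometry column here, a rough sketch outside of {targets}, under some assumptions: {arrow}, {geoarrow} and {sf} are installed, and the parquet format is approximated by plain arrow::write_parquet() / arrow::read_parquet() calls. The two points are made-up illustrative data, not the SA4 boundaries.

```r
# Loading the namespace is what lets arrow handle the geometry column.
requireNamespace("geoarrow")

# Made-up stand-in data: a plain data frame with an sfc geometry column.
pts <- data.frame(id = 1:2)
pts$geometry <- sf::st_sfc(sf::st_point(c(0, 1)), sf::st_point(c(2, 3)))

# Round trip through a parquet file (assumed to approximate the targets format).
path <- tempfile(fileext = ".parquet")
arrow::write_parquet(pts, path)
out <- arrow::read_parquet(path)

# Inspect what the geometry column comes back as.
class(out$geometry)
```

If this behaves like the reprex above, the geometry column should come back as a geoarrow vector rather than an sfc.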

Raster

I don't use rasters much, so I don't have much input there.

@njtierney
Owner Author

Awesome, thanks so much @anthonynorth! That looks like it will work great for me.

Cheers!

@anthonynorth

anthonynorth commented Feb 21, 2024

Oh, looks like there's no friendly conversion between geoarrow and sfc or other formats yet. It probably makes sense to wrap whatever <-> geoarrow conversion you need in a tar_format().

Something like this (untested), using wkb as the output format type. I didn't check if a geoarrow/geoparquet tar format exists!

library(targets)
tar_script({
    requireNamespace("geoarrow")
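    # Custom storage format: parquet on disk, geometry handed back as WKB.
    tar_format_geoparquet = tar_format(
      read = function(path) {
        # Convert every geometry (wk-handleable) column to WKB after reading.
        arrow::read_parquet(path) |>
          dplyr::mutate(dplyr::across(tidyselect::where(wk::is_handleable), wk::as_wkb))
      },
      write = function(object, path) {
        arrow::write_parquet(object, path)
      },
      # marshal / unmarshal are used when the object is sent between processes.
      marshal = function(object) {
        arrow::as_arrow_table(object)
      },
      unmarshal = function(object) {
        as.data.frame(object) |>
          dplyr::mutate(dplyr::across(tidyselect::where(wk::is_handleable), wk::as_wkb))
      }
    )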

    list(tar_target(sa4s, as.data.frame(strayr::read_absmap("sa42021")), 
        format = tar_format_geoparquet))
})
tar_make()
#> Loading required namespace: geoarrow
#> > dispatched target sa4s
#> trying URL 'https://github.com/wfmackey/absmapsdata/blob/master/data/absmapsdata_file_list.rda?raw=true'
#> Content type 'application/octet-stream' length 407 bytes
#> ==================================================
#> downloaded 407 bytes
#> 
#> trying URL 'https://github.com/wfmackey/absmapsdata/raw/master/data/sa42021.rda'
#> Content type 'application/octet-stream' length 3044178 bytes (2.9 MB)
#> ==================================================
#> downloaded 2.9 MB
#> 
#> o completed target sa4s [2.42 seconds]
#> > ended pipeline [3.41 seconds]
requireNamespace("geoarrow")
#> Loading required namespace: geoarrow
str(tar_read(sa4s))
#> tibble [108 × 10] (S3: tbl_df/tbl/data.frame)
#>  $ sa4_code_2021  : chr [1:108] "101" "102" "103" "104" ...
#>  $ sa4_name_2021  : chr [1:108] "Capital Region" "Central Coast" "Central West" "Coffs Harbour - Grafton" ...
#>  $ gcc_code_2021  : chr [1:108] "1RNSW" "1GSYD" "1RNSW" "1RNSW" ...
#>  $ gcc_name_2021  : chr [1:108] "Rest of NSW" "Greater Sydney" "Rest of NSW" "Rest of NSW" ...
#>  $ state_code_2021: chr [1:108] "1" "1" "1" "1" ...
#>  $ state_name_2021: chr [1:108] "New South Wales" "New South Wales" "New South Wales" "New South Wales" ...
#>  $ areasqkm_2021  : num [1:108] 51896 1681 70297 13230 339356 ...
#>  $ cent_lat       : num [1:108] -35.6 -33.3 -33.2 -29.8 -31 ...
#>  $ cent_long      : num [1:108] 149 151 148 153 145 ...
#>  $ geometry       : wk_wkb[1:108] <MULTIPOLYGON (((150.3113 -35.66587, 150.3126 -35.66813, 150
#>  - attr(*, "sf_column")= chr "geometry"
#>  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA
#>   ..- attr(*, "names")= chr [1:9] "sa4_code_2021" "sa4_name_2021" "gcc_code_2021" "gcc_name_2021" ...

Created on 2024-02-21 with reprex v2.0.2
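
Downstream targets that want sf objects would need one more step, since the format above hands the geometry back as wk_wkb. A minimal sketch, assuming {sf} and {wk} are installed and relying on the st_as_sfc() method that {wk} provides for its vectors. The two-row data frame is a made-up stand-in for tar_read(sa4s).

```r
# Hypothetical stand-in for tar_read(sa4s): a data frame with a wk_wkb column.
sa4s_like <- data.frame(name = c("a", "b"))
sa4s_like$geometry <- wk::as_wkb(wk::wkt(c("POINT (150 -35)", "POINT (151 -33)")))

# Convert the WKB column to sfc, then promote the data frame to sf.
sa4s_like$geometry <- sf::st_as_sfc(sa4s_like$geometry)
sa4s_sf <- sf::st_as_sf(sa4s_like)

class(sa4s_sf)
```

The same conversion could probably live inside the read function of the custom format instead, if sf is the type every downstream target expects.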

@njtierney
Owner Author

njtierney commented Mar 4, 2024

Thanks again for your help with this @anthonynorth! I'm currently working on a geotargets extension package, https://github.com/njtierney/geotargets, which should hopefully implement these features. I've linked this issue - looking forward to working on this.
