Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Facilitate targets that are a list of geospatial rasters #3

Closed
njtierney opened this issue Mar 4, 2024 · 7 comments · Fixed by #50
Closed

Facilitate targets that are a list of geospatial rasters #3

njtierney opened this issue Mar 4, 2024 · 7 comments · Fixed by #50

Comments

@njtierney
Copy link
Owner

For example, in:

targets::tar_script({
  library(geodata)
  library(targets)
  agcrop_area <- function(crop ) {
    
    the_raster <- crop_spam(
      crop = crop,
      var = "area",
      path = "data/rasters",
      africa = TRUE
    )
    
    the_raster
    
  }
  
  format_geotiff <- tar_format(
    read = function(path) terra::rast(path),
    write = function(object, path) {
      terra::writeRaster(
        x = object,
        filename = path,
        filetype = "GTiff",
        overwrite = TRUE
      )
    },
    marshal = function(object) terra::wrap(object),
    unmarshal = function(object) terra::unwrap(object)
  )
  
  tar_target(raster_coffee  = agcrop_area(crop = "acof"))
  tar_target(raster_veg = agcrop_area(crop = "vege"))
  
  tar_target(
    raster_countries,
    command = list(
      raster_coffee,
      raster_veg
    ),
    format = format_geotiff
  )
  
    
})

tar_target(
  raster_countries,
  command = list(
    raster_coffee,
    raster_veg
  ),
  format = format_geotiff
)
#> Error in tar_target(raster_countries, command = list(raster_coffee, raster_veg), : could not find function "tar_target"

Created on 2024-03-04 with reprex v2.1.0

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.3 (2024-02-29)
#>  os       macOS Sonoma 14.3.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Australia/Melbourne
#>  date     2024-03-04
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  backports     1.4.1   2021-12-13 [1] CRAN (R 4.3.0)
#>  base64url     1.4     2018-05-14 [2] CRAN (R 4.3.0)
#>  callr         3.7.5   2024-02-19 [1] CRAN (R 4.3.1)
#>  cli           3.6.2   2023-12-11 [1] CRAN (R 4.3.1)
#>  codetools     0.2-19  2023-02-01 [2] CRAN (R 4.3.3)
#>  data.table    1.15.0  2024-01-30 [1] CRAN (R 4.3.1)
#>  digest        0.6.34  2024-01-11 [1] CRAN (R 4.3.1)
#>  evaluate      0.23    2023-11-01 [1] CRAN (R 4.3.1)
#>  fansi         1.0.6   2023-12-08 [1] CRAN (R 4.3.1)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.0)
#>  glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.1)
#>  htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.3.1)
#>  igraph        2.0.2   2024-02-17 [1] CRAN (R 4.3.1)
#>  knitr         1.45    2023-10-30 [1] CRAN (R 4.3.1)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.1)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
#>  processx      3.8.3   2023-12-10 [1] CRAN (R 4.3.1)
#>  ps            1.7.6   2024-01-18 [1] CRAN (R 4.3.1)
#>  purrr         1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0  2022-07-21 [2] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2   2022-06-13 [2] CRAN (R 4.3.0)
#>  R.oo          1.26.0  2024-01-24 [2] CRAN (R 4.3.1)
#>  R.utils       2.12.3  2023-11-18 [2] CRAN (R 4.3.1)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
#>  reprex        2.1.0   2024-01-11 [2] CRAN (R 4.3.1)
#>  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.3.1)
#>  rmarkdown     2.25    2023-09-18 [1] CRAN (R 4.3.1)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
#>  secretbase    0.3.0   2024-02-21 [1] CRAN (R 4.3.1)
#>  sessioninfo   1.2.2   2021-12-06 [2] CRAN (R 4.3.0)
#>  styler        1.10.2  2023-08-29 [2] CRAN (R 4.3.0)
#>  targets       1.5.1   2024-02-15 [1] CRAN (R 4.3.1)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
#>  utf8          1.2.4   2023-10-22 [1] CRAN (R 4.3.1)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.3.1)
#>  withr         3.0.0   2024-01-16 [1] CRAN (R 4.3.1)
#>  xfun          0.42    2024-02-08 [1] CRAN (R 4.3.1)
#>  yaml          2.3.8   2023-12-11 [1] CRAN (R 4.3.1)
#> 
#>  [1] /Users/nick/Library/R/arm64/4.3/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@njtierney
Copy link
Owner Author

njtierney commented Mar 4, 2024

I was hoping that

format_geotiffs <- tar_format(
  read = function(path) purrr::map(path, terra::rast),
  write = function(object, path) {
    purrr::map2(
      .x = object,
      .y = path,
      .f = function(x, y) {
        terra::writeRaster(
          x = x,
          filename = y,
          filetype = "GTiff",
          overwrite = TRUE
        )
      }
    )
  },
  marshal = function(object) purrr::map(object, terra::wrap),
  unmarshal = function(object) purrr::map(object, terra::unwrap)
)

would work but it only returns the first element in the list when loading it :(

@brownag
Copy link
Contributor

brownag commented Mar 4, 2024

If the rasters are conformal (i.e. same resolution, extent, CRS) then it would probably be best to call rast() on the list, and write a multiband GeoTIFF.

If the rasters have different resolution, extent, or CRS, you can create a SpatRasterCollection object with sprc(). The same code should approximately work for an ordinary list. There is no writeRaster(<SpatRasterCollection>) method, but you can achieve the desired writing of various grids a single file by looping and toggling the GDAL creation option "APPEND_SUBDATASET=YES" for the 2nd subdataset on to the end. For example:

targets::tar_script({
    library(terra)
    library(targets)

    elev_scale <- function(z = 1, projection = "EPSG:4326") {
        terra::project(terra::rast(system.file("ex", "elev.tif", package = "terra")) * z,
                       projection)
    }

    format_sprc_geotiff <- tar_format(
        read = function(path) terra::sprc(path),
        write = function(object, path) {
            for (i in seq(object)) {
                if (i > 1) {
                    opt <- "APPEND_SUBDATASET=YES"
                } else opt <- ""
                terra::writeRaster(
                    x = object[i],
                    filename = path,
                    filetype = "GTiff",
                    overwrite = (i == 1),
                    gdal = opt
                )
            }
        },
        marshal = function(object) terra::wrap(object),
        unmarshal = function(object) terra::unwrap(object)
    )

    tar_target(
        raster_elevs,
        # two rasters, one unaltered, one scaled by factor of 2 and reprojected to interrupted good homolosine
        command = terra::sprc(list(elev_scale(1), 
                                   elev_scale(2, "+proj=igh"))),
        format = format_sprc_geotiff
    )
})
targets::tar_make()
#> terra 1.7.73
#> ▶ dispatched target raster_elevs
#> ● completed target raster_elevs [0.177 seconds]
#> ▶ completed pipeline [0.33 seconds]
#> Warning message:
#> [rast] skipped sub-datasets (see 'describe(sds=TRUE)'):
#> /tmp/RtmpX6R4Zb/reprex-26a50397dc957-flat-ram/_targets/scratch/raster_elevs
targets::tar_load("raster_elevs")
raster_elevs[1]
#> class       : SpatRaster 
#> dimensions  : 90, 95, 1  (nrow, ncol, nlyr)
#> resolution  : 0.008333333, 0.008333333  (x, y)
#> extent      : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source      : raster_elevs 
#> name        : elevation 
#> min value   :       141 
#> max value   :       547
raster_elevs[2]
#> class       : SpatRaster 
#> dimensions  : 115, 114, 1  (nrow, ncol, nlyr)
#> resolution  : 683.4048, 683.4048  (x, y)
#> extent      : 1480982, 1558890, 5478149, 5556741  (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=igh +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs 
#> source      : raster_elevs 
#> name        : elevation 
#> min value   :  282.8344 
#> max value   : 1087.3800

Created on 2024-03-03 with reprex v2.0.2

@Aariq
Copy link
Collaborator

Aariq commented Mar 4, 2024

The other option would be dynamic branching that creates a downstream target for each element of SpatRasterCollection which might require some custom option for the iteration argument. Not sure if SpatRasterCollection acts like a list or vector well enough for dynamic branching to work "out of the box".

@njtierney
Copy link
Owner Author

Thanks for providing examples here @brownag ! I am new to spatial stuff and didn't know that you could combine rasters like that. That's really neat.

Looking forward to making a nice rich set of vignettes to demonstrate all these examples!

@Aariq
Copy link
Collaborator

Aariq commented Mar 19, 2024

Confirmed that dynamic branching does not work "out of the box". An example (not reproducible, sorry, will investigate reprex later)

tar_plan(
  years = 2022:2023,
  tar_target(
    name = prism_tmean,
    command = get_prism_tmean(years),
    pattern = map(years),
    format = "file"
  ),
  tar_terra_rast(
    gdd_doy,
    calc_gdd_doy(dir = prism_tmean, gdd_threshold = 200),
    pattern = map(prism_tmean)
  )
)

This pipeline runs without error, but tar_read(gdd_doy) gives: ! gdd_doy_1dda6797 must be a vector, not a <SpatRaster> object. This is because the default iteration is "vector" which uses vctrs::vec_c(). We actually want it to use the c() method for SpatRaster objects here. For example, this works:

> c(tar_read(gdd_doy_1dda6797), tar_read(gdd_doy_14d2965f))
class       : SpatRaster 
dimensions  : 621, 1405, 2  (nrow, ncol, nlyr)
resolution  : 0.04166667, 0.04166667  (x, y)
extent      : -125.0208, -66.47917, 24.0625, 49.9375  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat NAD83 (EPSG:4269) 
sources     : gdd_doy_1dda6797  
              gdd_doy_14d2965f  
names       : 2022, 2023 
min values  :    8,    9 
max values  :  214,  211 

As a workaround, I can use iteration = "list" and then do rast(tar_read(gdd_doy)) to get a 2-layer SpatRaster, but the names of the layers then come from the names of the targets, not the name of the layer in each SpatRaster, so it's not ideal.

This kind of stuff should at least be in a vignette.

@Aariq
Copy link
Collaborator

Aariq commented Mar 28, 2024

I've now run into a situation where I have a large multi-layer raster (1.4GB) that doesn't seem to download correctly when S3 bucket storage is used. It would be nice if there was a frictionless way of optionally saving each layer as a separate target (i.e. file in the _targets/ store) with dynamic branching.

@njtierney
Copy link
Owner Author

i've just used the format_sprc_geotiff as @brownag wrote - I'm keen to implement this in a PR and we can iterate on things there - for example adding documentation for these different examples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants