-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nc_proxy #499
nc_proxy #499
Conversation
… Start playing with vignette.
…blodgett-usgs-main
…blodgett-usgs-main
Looks great, and works for me too! |
It's coming along. I found some snags when testing the call_list stuff a bit last night. Think I have a solution there. Will update with more testing and examples in a bit. |
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1; sf_use_s2() is TRUE
# Load up the same file from two sources.
nc_gdal <- read_stars(system.file("nc/bcsd_obs_1999.nc", package = "stars"),
proxy = TRUE)
nc_nc <- read_ncdf(system.file("nc/bcsd_obs_1999.nc", package = "stars"),
proxy = TRUE)
#> no 'var' specified, using pr, tas
#> other available variables:
#> latitude, longitude, time
#> No projection information found in nc file.
#> Coordinate variable units found to be degrees,
#> assuming WGS84 Lat/Lon.
sf::st_crs(nc_gdal) <- sf::st_crs(nc_nc)
nc_sf <- sf::st_transform(
read_sf(system.file("gpkg/nc.gpkg", package="sf")),
sf::st_crs(nc_nc))
# Crop with collect = TRUE adds to the call_list
(nc_gdal2 <- st_crop(nc_gdal, nc_sf[10, ], collect = TRUE))
#> stars_proxy object with 2 attributes in 2 file(s):
#> $pr
#> [1] "[...]/bcsd_obs_1999.nc:pr"
#>
#> $tas
#> [1] "[...]/bcsd_obs_1999.nc:tas"
#>
#> dimension(s):
#> from to offset delta refsys point values x/y
#> x 37 40 -85 0.125 WGS 84 NA NULL [x]
#> y 5 7 37.125 -0.125 WGS 84 NA NULL [y]
#> time 1 12 NA NA POSIXct NA 1999-01-31,...,1999-12-31
#> call_list:
#> [[1]]
#> st_crop(x = x, y = y, crop = crop, epsilon = epsilon)
#> attr(,".Environment")
#> <environment: 0x0000000013397a60>
(nc_nc2 <- st_crop(nc_nc, nc_sf[10, ], collect = TRUE))
#> netcdf source stars proxy object from:
#> [1] "[...]/bcsd_obs_1999.nc"
#>
#> Available nc variables:
#> pr
#> tas
#>
#> dimension(s):
#> from to offset delta refsys point values x/y
#> longitude 37 40 -85 0.125 WGS 84 NA NULL [x]
#> latitude 27 29 33 0.125 WGS 84 NA NULL [y]
#> time 1 12 NA NA POSIXct NA 1999-01-31,...,1999-12-31
#>
#> call_list:
#> [[1]]
#> st_crop(x = x, y = y, crop = crop, epsilon = epsilon)
#> attr(,".Environment")
#> <environment: 0x0000000022140550>
# Convert to stars and plot.
(nc_gdal3 <- st_as_stars(nc_gdal2))
#> stars object with 3 dimensions and 2 attributes
#> attribute(s):
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> pr [mm/m] 38.669998 56.425001 77.11500 88.78302 113.44250 227.21001 48
#> tas [C] 4.377258 6.257742 13.77183 13.97364 20.12717 25.98855 48
#> dimension(s):
#> from to offset delta refsys point values x/y
#> x 37 40 -85 0.125 WGS 84 NA NULL [x]
#> y 5 7 37.125 -0.125 WGS 84 NA NULL [y]
#> time 1 12 NA NA POSIXct NA 1999-01-31,...,1999-12-31
(nc_nc3 <- st_as_stars(nc_nc2))
#> Will return stars object with 144 cells.
#> stars object with 3 dimensions and 2 attributes
#> attribute(s):
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> pr [mm/m] 38.669998 56.425001 77.11500 88.78302 113.44250 227.21001 48
#> tas [C] 4.377258 6.257742 13.77183 13.97364 20.12717 25.98855 48
#> dimension(s):
#> from to offset delta refsys point values x/y
#> longitude 1 4 -80.5 0.125 WGS 84 NA NULL [x]
#> latitude 1 3 36.25 0.125 WGS 84 NA NULL [y]
#> time 1 12 NA NA POSIXct NA 1999-01-31,...,1999-12-31
plot(nc_sf$geom[10])
plot(nc_gdal3["pr",,,1], add = TRUE) plot(nc_sf$geom[10])
plot(nc_nc3["pr",,,1], add = TRUE) # Crop with collect = FALSE does not add to the call list.
(nc_gdal2 <- st_crop(nc_gdal, nc_sf[10, ], collect = FALSE))
#> stars_proxy object with 2 attributes in 2 file(s):
#> $pr
#> [1] "[...]/bcsd_obs_1999.nc:pr"
#>
#> $tas
#> [1] "[...]/bcsd_obs_1999.nc:tas"
#>
#> dimension(s):
#> from to offset delta refsys point values x/y
#> x 37 40 -85 0.125 WGS 84 NA NULL [x]
#> y 5 7 37.125 -0.125 WGS 84 NA NULL [y]
#> time 1 12 NA NA POSIXct NA 1999-01-31,...,1999-12-31
(nc_nc2 <- st_crop(nc_nc, nc_sf[10, ], collect = FALSE))
#> netcdf source stars proxy object from:
#> [1] "[...]/bcsd_obs_1999.nc"
#>
#> Available nc variables:
#> pr
#> tas
#>
#> dimension(s):
#> from to offset delta refsys point values x/y
#> longitude 37 40 -85 0.125 WGS 84 NA NULL [x]
#> latitude 27 29 33 0.125 WGS 84 NA NULL [y]
#> time 1 12 NA NA POSIXct NA 1999-01-31,...,1999-12-31
(nc_gdal3 <- st_as_stars(nc_gdal2))
#> stars object with 3 dimensions and 2 attributes
#> attribute(s):
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> pr [mm/m] 38.560001 58.087500 76.69500 89.63785 114.58000 227.21001
#> tas [C] 4.377258 6.216976 13.72325 13.99577 20.12717 25.98855
#> dimension(s):
#> from to offset delta refsys point values x/y
#> x 37 40 -85 0.125 WGS 84 NA NULL [x]
#> y 5 7 37.125 -0.125 WGS 84 NA NULL [y]
#> time 1 12 NA NA POSIXct NA 1999-01-31,...,1999-12-31
(nc_nc3 <- st_as_stars(nc_nc2))
#> Will return stars object with 144 cells.
#> stars object with 3 dimensions and 2 attributes
#> attribute(s):
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> pr [mm/m] 38.560001 58.087500 76.69500 89.63785 114.58000 227.21001
#> tas [C] 4.377258 6.216976 13.72325 13.99577 20.12717 25.98855
#> dimension(s):
#> from to offset delta refsys point values x/y
#> longitude 1 4 -80.5 0.125 WGS 84 NA NULL [x]
#> latitude 1 3 36.25 0.125 WGS 84 NA NULL [y]
#> time 1 12 NA NA POSIXct NA 1999-01-31,...,1999-12-31
plot(nc_sf$geom[10])
plot(nc_gdal3["pr",,,1], add = TRUE) plot(nc_sf$geom[10])
plot(nc_nc3["pr",,,1], add = TRUE) # Also works with a slightly deeper call list.
(nc_gdal2 <- st_crop(nc_gdal, nc_sf[10, ]))
#> stars_proxy object with 2 attributes in 2 file(s):
#> $pr
#> [1] "[...]/bcsd_obs_1999.nc:pr"
#>
#> $tas
#> [1] "[...]/bcsd_obs_1999.nc:tas"
#>
#> dimension(s):
#> from to offset delta refsys point values x/y
#> x 37 40 -85 0.125 WGS 84 NA NULL [x]
#> y 5 7 37.125 -0.125 WGS 84 NA NULL [y]
#> time 1 12 NA NA POSIXct NA 1999-01-31,...,1999-12-31
#> call_list:
#> [[1]]
#> st_crop(x = x, y = y, crop = crop, epsilon = epsilon)
#> attr(,".Environment")
#> <environment: 0x0000000029aaf038>
(nc_nc2 <- st_crop(nc_nc, nc_sf[10, ]))
#> netcdf source stars proxy object from:
#> [1] "[...]/bcsd_obs_1999.nc"
#>
#> Available nc variables:
#> pr
#> tas
#>
#> dimension(s):
#> from to offset delta refsys point values x/y
#> longitude 37 40 -85 0.125 WGS 84 NA NULL [x]
#> latitude 27 29 33 0.125 WGS 84 NA NULL [y]
#> time 1 12 NA NA POSIXct NA 1999-01-31,...,1999-12-31
#>
#> call_list:
#> [[1]]
#> st_crop(x = x, y = y, crop = crop, epsilon = epsilon)
#> attr(,".Environment")
#> <environment: 0x0000000029fda550>
# NOTE: the first call in the call list will read multiple variables
# then this subset will be applied.
(nc_gdal2 <- nc_gdal2["pr", , , 1])
#> stars_proxy object with 2 attributes in 2 file(s):
#> $pr
#> [1] "[...]/bcsd_obs_1999.nc:pr"
#>
#> $tas
#> [1] "[...]/bcsd_obs_1999.nc:tas"
#>
#> dimension(s):
#> from to offset delta refsys point values x/y
#> x 37 40 -85 0.125 WGS 84 NA NULL [x]
#> y 5 7 37.125 -0.125 WGS 84 NA NULL [y]
#> time 1 12 NA NA POSIXct NA 1999-01-31,...,1999-12-31
#> call_list:
#> [[1]]
#> st_crop(x = x, y = y, crop = crop, epsilon = epsilon)
#> attr(,".Environment")
#> <environment: 0x0000000029aaf038>
#>
#> [[2]]
#> x[i = i, , , 1, drop = drop, crop = crop]
#> attr(,".Environment")
#> <environment: 0x000000002a51fb20>
(nc_nc2 <- nc_nc2["pr", , , 1])
#> netcdf source stars proxy object from:
#> [1] "[...]/bcsd_obs_1999.nc"
#>
#> Available nc variables:
#> pr
#> tas
#>
#> dimension(s):
#> from to offset delta refsys point values x/y
#> longitude 37 40 -85 0.125 WGS 84 NA NULL [x]
#> latitude 27 29 33 0.125 WGS 84 NA NULL [y]
#> time 1 12 NA NA POSIXct NA 1999-01-31,...,1999-12-31
#>
#> call_list:
#> [[1]]
#> st_crop(x = x, y = y, crop = crop, epsilon = epsilon)
#> attr(,".Environment")
#> <environment: 0x0000000029fda550>
#>
#> [[2]]
#> x[i = i, , , 1, drop = drop, crop = crop]
#> attr(,".Environment")
#> <environment: 0x000000002a9d62a0>
plot(nc_sf$geom[10])
plot(nc_gdal2, add = TRUE)
plot(nc_sf$geom[10])
plot(nc_nc2, add = TRUE)
#> Plotting first variable only.
#> Will return stars object with 144 cells. nc_nc
#> netcdf source stars proxy object from:
#> [1] "[...]/bcsd_obs_1999.nc"
#>
#> Available nc variables:
#> pr
#> tas
#>
#> dimension(s):
#> from to offset delta refsys point values x/y
#> longitude 1 81 -85 0.125 WGS 84 NA NULL [x]
#> latitude 1 33 33 0.125 WGS 84 NA NULL [y]
#> time 1 12 NA NA POSIXct NA 1999-01-31,...,1999-12-31
## Look at order of call_list operations.
# This will put two calls in the call list and modify dimensions.
(nc2 <- st_crop(nc_nc, nc_sf[10, ], collect = TRUE))
#> netcdf source stars proxy object from:
#> [1] "[...]/bcsd_obs_1999.nc"
#>
#> Available nc variables:
#> pr
#> tas
#>
#> dimension(s):
#> from to offset delta refsys point values x/y
#> longitude 37 40 -85 0.125 WGS 84 NA NULL [x]
#> latitude 27 29 33 0.125 WGS 84 NA NULL [y]
#> time 1 12 NA NA POSIXct NA 1999-01-31,...,1999-12-31
#>
#> call_list:
#> [[1]]
#> st_crop(x = x, y = y, crop = crop, epsilon = epsilon)
#> attr(,".Environment")
#> <environment: 0x000000002a7cb068>
(nc2 <- nc2["pr", , , 1])
#> netcdf source stars proxy object from:
#> [1] "[...]/bcsd_obs_1999.nc"
#>
#> Available nc variables:
#> pr
#> tas
#>
#> dimension(s):
#> from to offset delta refsys point values x/y
#> longitude 37 40 -85 0.125 WGS 84 NA NULL [x]
#> latitude 27 29 33 0.125 WGS 84 NA NULL [y]
#> time 1 12 NA NA POSIXct NA 1999-01-31,...,1999-12-31
#>
#> call_list:
#> [[1]]
#> st_crop(x = x, y = y, crop = crop, epsilon = epsilon)
#> attr(,".Environment")
#> <environment: 0x000000002a7cb068>
#>
#> [[2]]
#> x[i = i, , , 1, drop = drop, crop = crop]
#> attr(,".Environment")
#> <environment: 0x000000002aceb400>
# This will put one call in the call list and modify dimensions more.
(nc3 <- nc_nc["pr", , , 1])
#> netcdf source stars proxy object from:
#> [1] "[...]/bcsd_obs_1999.nc"
#>
#> Available nc variables:
#> pr
#>
#> dimension(s):
#> from to offset delta refsys point values x/y
#> longitude 1 81 -85 0.125 WGS 84 NA NULL [x]
#> latitude 1 33 33 0.125 WGS 84 NA NULL [y]
#> time 1 1 NA NA POSIXct NA 1999-01-31 UTC
(nc3 <- st_crop(nc3, nc_sf[10, ], collect = TRUE))
#> netcdf source stars proxy object from:
#> [1] "[...]/bcsd_obs_1999.nc"
#>
#> Available nc variables:
#> pr
#>
#> dimension(s):
#> from to offset delta refsys point values x/y
#> longitude 37 40 -85 0.125 WGS 84 NA NULL [x]
#> latitude 27 29 33 0.125 WGS 84 NA NULL [y]
#> time 1 1 NA NA POSIXct NA 1999-01-31 UTC
#>
#> call_list:
#> [[1]]
#> st_crop(x = x, y = y, crop = crop, epsilon = epsilon)
#> attr(,".Environment")
#> <environment: 0x000000002b51a060>
# Same outcome but nc3 reads much less data.
(nc2 <- st_as_stars(nc2))
#> Will return stars object with 144 cells.
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> pr [mm/m] 122.24 122.9675 123.83 125.84 127.805 134.97 4
#> dimension(s):
#> from to offset delta refsys point values x/y
#> longitude 1 4 -80.5 0.125 WGS 84 NA NULL [x]
#> latitude 1 3 36.25 0.125 WGS 84 NA NULL [y]
#> time 1 1 NA NA POSIXct NA 1999-01-31 UTC
(nc3 <- st_as_stars(nc3))
#> Will return stars object with 12 cells.
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> pr [mm/m] 122.24 122.9675 123.83 125.84 127.805 134.97 4
#> dimension(s):
#> from to offset delta refsys point values x/y
#> longitude 1 4 -80.5 0.125 WGS 84 NA NULL [x]
#> latitude 1 3 36.25 0.125 WGS 84 NA NULL [y]
#> time 1 1 NA NA POSIXct NA 1999-01-31 UTC Created on 2022-01-12 by the reprex package (v2.0.1) |
I started working with curvilinear coordinates and am realizing there's not much (any?) curvilinear handling in |
Absolutely. The low hanging fruit here is where we do read the long and lat matrices in memory - not proxy those -, then proxy the array, and read selections of that in memory when needed (or else loop over chunks). |
OK -- that's what I was thinking it would do. I'll put some time into it when I get a chance. |
…h Z and T coordinates.
…d some debugging of coordinate/dimension handling
A little progress -- I can proxy the stageIV precip attributes now. st_as_stars.stars isn't much now that I factored out a stand along |
David, LGTM - is this ready to go? |
I certainly have more I would like to accomplish, but I think this is a positive step, so if you are comfortable merging with somewhat limited functionality compared to the long term, yes. |
I will refine this description tomorrow, but I'm excited to share progress.
As of January 10th, this needs a lot more testing, but the core of it is shown in the reprex below. The
stars_proxy
is now nearly 1:1 with the regularstars_proxy
. I think they could likely be combined with a little effort on both sides.Most of the heavy lifting is done in
read_ncdf
by updating the NetCDF request based on the nc_proxydimensions
passed in. I found a number of edge case bugs that are included and should help clean up some of the regularstars_proxy
methods.Created on 2022-01-10 by the reprex package (v2.0.1)