Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nc_proxy #499

Merged
merged 30 commits into from
Mar 8, 2022
Merged

nc_proxy #499

merged 30 commits into from
Mar 8, 2022

Conversation

dblodgett-usgs
Copy link
Contributor

I will refine this description tomorrow, but I'm excited to share progress.

As of January 10th, this needs a lot more testing, but the core of it is shown in the reprex below. The stars_proxy is now nearly 1:1 with the regular stars_proxy. I think they could likely be combined with a little effort on both sides.

Most of the heavy lifting is done in read_ncdf by updating the NetCDF request based on the nc_proxy dimensions passed in. I found a number of edge case bugs that are included and should help clean up some of the regular stars_proxy methods.

library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1; sf_use_s2() is TRUE

# We can get a proxy object from an OPeNDAP source:
f <- "https://cida.usgs.gov/thredds/dodsC/bcsd_obs"

nc <- stars::read_ncdf(f, proxy = TRUE)
#> no 'var' specified, using pr, prate, tas, tasmax, tasmin, wind
#> other available variables:
#>  longitude, latitude, time, longitude_bnds, latitude_bnds
#> No projection information found in nc file. 
#>  Coordinate variable units found to be degrees, 
#>  assuming WGS84 Lat/Lon.

# the print method shows what's available.
nc
#> netcdf source stars proxy object from:
#> [1] "[...]/bcsd_obs"
#> 
#> Available nc variables:
#> pr
#> prate
#> tas
#> tasmax
#> tasmin
#> wind
#> 
#> dimension(s):
#>           from  to  offset delta  refsys point                    values x/y
#> longitude    1 462 -124.75 0.125  WGS 84    NA                      NULL [x]
#> latitude     1 222  25.125 0.125  WGS 84    NA                      NULL [y]
#> time         1 600      NA    NA POSIXct    NA 1950-01-31,...,1999-12-31

# the plot method plots some number of timesteps.
plot(nc, max_times = 9)
#> No projection information found in nc file. 
#>  Coordinate variable units found to be degrees, 
#>  assuming WGS84 Lat/Lon.
#> Will return stars object with 923076 cells.

# we can use subsetting as normal.
plot(nc["tasmax"], max_times = 1)
#> No projection information found in nc file. 
#>  Coordinate variable units found to be degrees, 
#>  assuming WGS84 Lat/Lon.
#> Will return stars object with 102564 cells.

states <- sf::read_sf("https://reference.geoconnex.us/collections/states/items")

states <- dplyr::filter(states, !STUSPS %in% c("AS", "AK", "HI", "PR", "VI", "GU", "MP"))


IA <- dplyr::filter(states, STUSPS == "IA")
plot(sf::st_geometry(IA))
plot(sf::st_geometry(states), add = TRUE)

# We can use the sf selector...
nc_crop <- nc[sf::st_bbox(IA)]

# Or just use st_crop
nc_crop <- st_crop(nc, sf::st_bbox(IA), collect = FALSE)

# Other options cause a call list, but the above don't...
attr(nc_crop, "call_list")
#> NULL

# Plot one time step of just the cropped proxy pulls data
plot(nc_crop[,,,1], add = TRUE)
#> No projection information found in nc file. 
#>  Coordinate variable units found to be degrees, 
#>  assuming WGS84 Lat/Lon.
#> Will return stars object with 1378 cells.

# We can also just subset directly...
plot(sf::st_geometry(states))
plot(nc["wind", 100:300, 100:200, 25], add = TRUE)
#> No projection information found in nc file. 
#>  Coordinate variable units found to be degrees, 
#>  assuming WGS84 Lat/Lon.
#> Will return stars object with 20301 cells.

# Normal stuff works too...
f <- system.file("nc/reduced.nc", package = "stars")

nc <- read_ncdf(f, proxy = TRUE)
#> no 'var' specified, using sst, anom, err, ice
#> other available variables:
#>  lon, lat, zlev, time
#> No projection information found in nc file. 
#>  Coordinate variable units found to be degrees, 
#>  assuming WGS84 Lat/Lon.

nc["sst"]
#> netcdf source stars proxy object from:
#> [1] "[...]/reduced.nc"
#> 
#> Available nc variables:
#> sst
#> 
#> dimension(s):
#>      from  to offset delta  refsys point         values x/y
#> lon     1 180     -1     2  WGS 84    NA           NULL [x]
#> lat     1  90    -90     2  WGS 84    NA           NULL [y]
#> zlev    1   1     NA    NA      NA    NA              0    
#> time    1   1     NA    NA POSIXct    NA 1981-12-31 UTC

nc[[5]] <- nc[[4]]

nc
#> netcdf source stars proxy object from:
#> [1] "[...]/reduced.nc"
#> 
#> Available nc variables:
#> sst
#> anom
#> err
#> ice
#> 
#> 
#> dimension(s):
#>      from  to offset delta  refsys point         values x/y
#> lon     1 180     -1     2  WGS 84    NA           NULL [x]
#> lat     1  90    -90     2  WGS 84    NA           NULL [y]
#> zlev    1   1     NA    NA      NA    NA              0    
#> time    1   1     NA    NA POSIXct    NA 1981-12-31 UTC

c(nc["sst"], nc["ice"])
#> netcdf source stars proxy object from:
#> [1] "[...]/reduced.nc"
#> 
#> Available nc variables:
#> sst
#> ice
#> 
#> dimension(s):
#>      from  to offset delta  refsys point         values x/y
#> lon     1 180     -1     2  WGS 84    NA           NULL [x]
#> lat     1  90    -90     2  WGS 84    NA           NULL [y]
#> zlev    1   1     NA    NA      NA    NA              0    
#> time    1   1     NA    NA POSIXct    NA 1981-12-31 UTC

Created on 2022-01-10 by the reprex package (v2.0.1)

@edzer
Copy link
Member

edzer commented Jan 12, 2022

Looks great, and works for me too!

@dblodgett-usgs
Copy link
Contributor Author

It's coming along. I found some snags when testing the call_list stuff a bit last night. Think I have a solution there. Will update with more testing and examples in a bit.

@dblodgett-usgs
Copy link
Contributor Author

library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1; sf_use_s2() is TRUE

# Load up the same file from two sources.
nc_gdal <- read_stars(system.file("nc/bcsd_obs_1999.nc", package = "stars"),
                 proxy = TRUE)

nc_nc <- read_ncdf(system.file("nc/bcsd_obs_1999.nc", package = "stars"),
                proxy = TRUE)
#> no 'var' specified, using pr, tas
#> other available variables:
#>  latitude, longitude, time
#> No projection information found in nc file. 
#>  Coordinate variable units found to be degrees, 
#>  assuming WGS84 Lat/Lon.

sf::st_crs(nc_gdal) <- sf::st_crs(nc_nc)

nc_sf <- sf::st_transform(
    read_sf(system.file("gpkg/nc.gpkg", package="sf")),
    sf::st_crs(nc_nc))

# Crop with collect = TRUE adds to the call_list
(nc_gdal2 <- st_crop(nc_gdal, nc_sf[10, ], collect = TRUE))
#> stars_proxy object with 2 attributes in 2 file(s):
#> $pr
#> [1] "[...]/bcsd_obs_1999.nc:pr"
#> 
#> $tas
#> [1] "[...]/bcsd_obs_1999.nc:tas"
#> 
#> dimension(s):
#>      from to offset  delta  refsys point                    values x/y
#> x      37 40    -85  0.125  WGS 84    NA                      NULL [x]
#> y       5  7 37.125 -0.125  WGS 84    NA                      NULL [y]
#> time    1 12     NA     NA POSIXct    NA 1999-01-31,...,1999-12-31    
#> call_list:
#> [[1]]
#> st_crop(x = x, y = y, crop = crop, epsilon = epsilon)
#> attr(,".Environment")
#> <environment: 0x0000000013397a60>
(nc_nc2 <- st_crop(nc_nc, nc_sf[10, ], collect = TRUE))
#> netcdf source stars proxy object from:
#> [1] "[...]/bcsd_obs_1999.nc"
#> 
#> Available nc variables:
#> pr
#> tas
#> 
#> dimension(s):
#>           from to offset delta  refsys point                    values x/y
#> longitude   37 40    -85 0.125  WGS 84    NA                      NULL [x]
#> latitude    27 29     33 0.125  WGS 84    NA                      NULL [y]
#> time         1 12     NA    NA POSIXct    NA 1999-01-31,...,1999-12-31    
#> 
#> call_list:
#> [[1]]
#> st_crop(x = x, y = y, crop = crop, epsilon = epsilon)
#> attr(,".Environment")
#> <environment: 0x0000000022140550>

# Convert to stars and plot.
(nc_gdal3 <- st_as_stars(nc_gdal2))

#> stars object with 3 dimensions and 2 attributes
#> attribute(s):
#>                Min.   1st Qu.   Median     Mean   3rd Qu.      Max. NA's
#> pr [mm/m] 38.669998 56.425001 77.11500 88.78302 113.44250 227.21001   48
#> tas [C]    4.377258  6.257742 13.77183 13.97364  20.12717  25.98855   48
#> dimension(s):
#>      from to offset  delta  refsys point                    values x/y
#> x      37 40    -85  0.125  WGS 84    NA                      NULL [x]
#> y       5  7 37.125 -0.125  WGS 84    NA                      NULL [y]
#> time    1 12     NA     NA POSIXct    NA 1999-01-31,...,1999-12-31
(nc_nc3 <- st_as_stars(nc_nc2))
#> Will return stars object with 144 cells.
#> stars object with 3 dimensions and 2 attributes
#> attribute(s):
#>                Min.   1st Qu.   Median     Mean   3rd Qu.      Max. NA's
#> pr [mm/m] 38.669998 56.425001 77.11500 88.78302 113.44250 227.21001   48
#> tas [C]    4.377258  6.257742 13.77183 13.97364  20.12717  25.98855   48
#> dimension(s):
#>           from to offset delta  refsys point                    values x/y
#> longitude    1  4  -80.5 0.125  WGS 84    NA                      NULL [x]
#> latitude     1  3  36.25 0.125  WGS 84    NA                      NULL [y]
#> time         1 12     NA    NA POSIXct    NA 1999-01-31,...,1999-12-31

plot(nc_sf$geom[10])
plot(nc_gdal3["pr",,,1], add = TRUE)

plot(nc_sf$geom[10])
plot(nc_nc3["pr",,,1], add = TRUE)

# Crop with collect = FALSE does not add to the call list.  
(nc_gdal2 <- st_crop(nc_gdal, nc_sf[10, ], collect = FALSE))
#> stars_proxy object with 2 attributes in 2 file(s):
#> $pr
#> [1] "[...]/bcsd_obs_1999.nc:pr"
#> 
#> $tas
#> [1] "[...]/bcsd_obs_1999.nc:tas"
#> 
#> dimension(s):
#>      from to offset  delta  refsys point                    values x/y
#> x      37 40    -85  0.125  WGS 84    NA                      NULL [x]
#> y       5  7 37.125 -0.125  WGS 84    NA                      NULL [y]
#> time    1 12     NA     NA POSIXct    NA 1999-01-31,...,1999-12-31
(nc_nc2 <- st_crop(nc_nc, nc_sf[10, ], collect = FALSE))
#> netcdf source stars proxy object from:
#> [1] "[...]/bcsd_obs_1999.nc"
#> 
#> Available nc variables:
#> pr
#> tas
#> 
#> dimension(s):
#>           from to offset delta  refsys point                    values x/y
#> longitude   37 40    -85 0.125  WGS 84    NA                      NULL [x]
#> latitude    27 29     33 0.125  WGS 84    NA                      NULL [y]
#> time         1 12     NA    NA POSIXct    NA 1999-01-31,...,1999-12-31

(nc_gdal3 <- st_as_stars(nc_gdal2))

#> stars object with 3 dimensions and 2 attributes
#> attribute(s):
#>                Min.   1st Qu.   Median     Mean   3rd Qu.      Max.
#> pr [mm/m] 38.560001 58.087500 76.69500 89.63785 114.58000 227.21001
#> tas [C]    4.377258  6.216976 13.72325 13.99577  20.12717  25.98855
#> dimension(s):
#>      from to offset  delta  refsys point                    values x/y
#> x      37 40    -85  0.125  WGS 84    NA                      NULL [x]
#> y       5  7 37.125 -0.125  WGS 84    NA                      NULL [y]
#> time    1 12     NA     NA POSIXct    NA 1999-01-31,...,1999-12-31
(nc_nc3 <- st_as_stars(nc_nc2))
#> Will return stars object with 144 cells.
#> stars object with 3 dimensions and 2 attributes
#> attribute(s):
#>                Min.   1st Qu.   Median     Mean   3rd Qu.      Max.
#> pr [mm/m] 38.560001 58.087500 76.69500 89.63785 114.58000 227.21001
#> tas [C]    4.377258  6.216976 13.72325 13.99577  20.12717  25.98855
#> dimension(s):
#>           from to offset delta  refsys point                    values x/y
#> longitude    1  4  -80.5 0.125  WGS 84    NA                      NULL [x]
#> latitude     1  3  36.25 0.125  WGS 84    NA                      NULL [y]
#> time         1 12     NA    NA POSIXct    NA 1999-01-31,...,1999-12-31

plot(nc_sf$geom[10])
plot(nc_gdal3["pr",,,1], add = TRUE)

plot(nc_sf$geom[10])
plot(nc_nc3["pr",,,1], add = TRUE)

# Also works with a slightly deeper call list.
(nc_gdal2 <- st_crop(nc_gdal, nc_sf[10, ]))
#> stars_proxy object with 2 attributes in 2 file(s):
#> $pr
#> [1] "[...]/bcsd_obs_1999.nc:pr"
#> 
#> $tas
#> [1] "[...]/bcsd_obs_1999.nc:tas"
#> 
#> dimension(s):
#>      from to offset  delta  refsys point                    values x/y
#> x      37 40    -85  0.125  WGS 84    NA                      NULL [x]
#> y       5  7 37.125 -0.125  WGS 84    NA                      NULL [y]
#> time    1 12     NA     NA POSIXct    NA 1999-01-31,...,1999-12-31    
#> call_list:
#> [[1]]
#> st_crop(x = x, y = y, crop = crop, epsilon = epsilon)
#> attr(,".Environment")
#> <environment: 0x0000000029aaf038>
(nc_nc2 <- st_crop(nc_nc, nc_sf[10, ]))
#> netcdf source stars proxy object from:
#> [1] "[...]/bcsd_obs_1999.nc"
#> 
#> Available nc variables:
#> pr
#> tas
#> 
#> dimension(s):
#>           from to offset delta  refsys point                    values x/y
#> longitude   37 40    -85 0.125  WGS 84    NA                      NULL [x]
#> latitude    27 29     33 0.125  WGS 84    NA                      NULL [y]
#> time         1 12     NA    NA POSIXct    NA 1999-01-31,...,1999-12-31    
#> 
#> call_list:
#> [[1]]
#> st_crop(x = x, y = y, crop = crop, epsilon = epsilon)
#> attr(,".Environment")
#> <environment: 0x0000000029fda550>

# NOTE: the first call in the call list will read multiple variables
# then this subset will be applied.
(nc_gdal2 <- nc_gdal2["pr", , , 1])
#> stars_proxy object with 2 attributes in 2 file(s):
#> $pr
#> [1] "[...]/bcsd_obs_1999.nc:pr"
#> 
#> $tas
#> [1] "[...]/bcsd_obs_1999.nc:tas"
#> 
#> dimension(s):
#>      from to offset  delta  refsys point                    values x/y
#> x      37 40    -85  0.125  WGS 84    NA                      NULL [x]
#> y       5  7 37.125 -0.125  WGS 84    NA                      NULL [y]
#> time    1 12     NA     NA POSIXct    NA 1999-01-31,...,1999-12-31    
#> call_list:
#> [[1]]
#> st_crop(x = x, y = y, crop = crop, epsilon = epsilon)
#> attr(,".Environment")
#> <environment: 0x0000000029aaf038>
#> 
#> [[2]]
#> x[i = i, , , 1, drop = drop, crop = crop]
#> attr(,".Environment")
#> <environment: 0x000000002a51fb20>
(nc_nc2 <- nc_nc2["pr", , , 1])
#> netcdf source stars proxy object from:
#> [1] "[...]/bcsd_obs_1999.nc"
#> 
#> Available nc variables:
#> pr
#> tas
#> 
#> dimension(s):
#>           from to offset delta  refsys point                    values x/y
#> longitude   37 40    -85 0.125  WGS 84    NA                      NULL [x]
#> latitude    27 29     33 0.125  WGS 84    NA                      NULL [y]
#> time         1 12     NA    NA POSIXct    NA 1999-01-31,...,1999-12-31    
#> 
#> call_list:
#> [[1]]
#> st_crop(x = x, y = y, crop = crop, epsilon = epsilon)
#> attr(,".Environment")
#> <environment: 0x0000000029fda550>
#> 
#> [[2]]
#> x[i = i, , , 1, drop = drop, crop = crop]
#> attr(,".Environment")
#> <environment: 0x000000002a9d62a0>

plot(nc_sf$geom[10])
plot(nc_gdal2, add = TRUE)

plot(nc_sf$geom[10])
plot(nc_nc2, add = TRUE)
#> Plotting first variable only.
#> Will return stars object with 144 cells.

nc_nc
#> netcdf source stars proxy object from:
#> [1] "[...]/bcsd_obs_1999.nc"
#> 
#> Available nc variables:
#> pr
#> tas
#> 
#> dimension(s):
#>           from to offset delta  refsys point                    values x/y
#> longitude    1 81    -85 0.125  WGS 84    NA                      NULL [x]
#> latitude     1 33     33 0.125  WGS 84    NA                      NULL [y]
#> time         1 12     NA    NA POSIXct    NA 1999-01-31,...,1999-12-31

## Look at order of call_list operations.

# This will put two calls in the call list and modify dimensions.
(nc2 <- st_crop(nc_nc, nc_sf[10, ], collect = TRUE))
#> netcdf source stars proxy object from:
#> [1] "[...]/bcsd_obs_1999.nc"
#> 
#> Available nc variables:
#> pr
#> tas
#> 
#> dimension(s):
#>           from to offset delta  refsys point                    values x/y
#> longitude   37 40    -85 0.125  WGS 84    NA                      NULL [x]
#> latitude    27 29     33 0.125  WGS 84    NA                      NULL [y]
#> time         1 12     NA    NA POSIXct    NA 1999-01-31,...,1999-12-31    
#> 
#> call_list:
#> [[1]]
#> st_crop(x = x, y = y, crop = crop, epsilon = epsilon)
#> attr(,".Environment")
#> <environment: 0x000000002a7cb068>

(nc2 <- nc2["pr", , , 1])
#> netcdf source stars proxy object from:
#> [1] "[...]/bcsd_obs_1999.nc"
#> 
#> Available nc variables:
#> pr
#> tas
#> 
#> dimension(s):
#>           from to offset delta  refsys point                    values x/y
#> longitude   37 40    -85 0.125  WGS 84    NA                      NULL [x]
#> latitude    27 29     33 0.125  WGS 84    NA                      NULL [y]
#> time         1 12     NA    NA POSIXct    NA 1999-01-31,...,1999-12-31    
#> 
#> call_list:
#> [[1]]
#> st_crop(x = x, y = y, crop = crop, epsilon = epsilon)
#> attr(,".Environment")
#> <environment: 0x000000002a7cb068>
#> 
#> [[2]]
#> x[i = i, , , 1, drop = drop, crop = crop]
#> attr(,".Environment")
#> <environment: 0x000000002aceb400>

# This will put one call in the call list and modify dimensions more.
(nc3 <- nc_nc["pr", , , 1])
#> netcdf source stars proxy object from:
#> [1] "[...]/bcsd_obs_1999.nc"
#> 
#> Available nc variables:
#> pr
#> 
#> dimension(s):
#>           from to offset delta  refsys point         values x/y
#> longitude    1 81    -85 0.125  WGS 84    NA           NULL [x]
#> latitude     1 33     33 0.125  WGS 84    NA           NULL [y]
#> time         1  1     NA    NA POSIXct    NA 1999-01-31 UTC

(nc3 <- st_crop(nc3, nc_sf[10, ], collect = TRUE))
#> netcdf source stars proxy object from:
#> [1] "[...]/bcsd_obs_1999.nc"
#> 
#> Available nc variables:
#> pr
#> 
#> dimension(s):
#>           from to offset delta  refsys point         values x/y
#> longitude   37 40    -85 0.125  WGS 84    NA           NULL [x]
#> latitude    27 29     33 0.125  WGS 84    NA           NULL [y]
#> time         1  1     NA    NA POSIXct    NA 1999-01-31 UTC    
#> 
#> call_list:
#> [[1]]
#> st_crop(x = x, y = y, crop = crop, epsilon = epsilon)
#> attr(,".Environment")
#> <environment: 0x000000002b51a060>

# Same outcome but nc3 reads much less data.
(nc2 <- st_as_stars(nc2))
#> Will return stars object with 144 cells.
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#>             Min.  1st Qu. Median   Mean 3rd Qu.   Max. NA's
#> pr [mm/m] 122.24 122.9675 123.83 125.84 127.805 134.97    4
#> dimension(s):
#>           from to offset delta  refsys point         values x/y
#> longitude    1  4  -80.5 0.125  WGS 84    NA           NULL [x]
#> latitude     1  3  36.25 0.125  WGS 84    NA           NULL [y]
#> time         1  1     NA    NA POSIXct    NA 1999-01-31 UTC
    
(nc3 <- st_as_stars(nc3))
#> Will return stars object with 12 cells.
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#>             Min.  1st Qu. Median   Mean 3rd Qu.   Max. NA's
#> pr [mm/m] 122.24 122.9675 123.83 125.84 127.805 134.97    4
#> dimension(s):
#>           from to offset delta  refsys point         values x/y
#> longitude    1  4  -80.5 0.125  WGS 84    NA           NULL [x]
#> latitude     1  3  36.25 0.125  WGS 84    NA           NULL [y]
#> time         1  1     NA    NA POSIXct    NA 1999-01-31 UTC

Created on 2022-01-12 by the reprex package (v2.0.1)

@dblodgett-usgs
Copy link
Contributor Author

I started working with curvilinear coordinates and am realizing there's not much (any?) curvilinear handling in stars_proxy. I want to start building that out for nc_proxy. Is that something you have plans for @edzer ?

@edzer
Copy link
Member

edzer commented Jan 21, 2022

Is that something you have plans for @edzer ?

Absolutely. The low hanging fruit here is where we do read the long and lat matrices in memory - not proxy those -, then proxy the array, and read selections of that in memory when needed (or else loop over chunks).

@dblodgett-usgs
Copy link
Contributor Author

OK -- that's what I was thinking it would do. I'll put some time into it when I get a chance.

@dblodgett-usgs
Copy link
Contributor Author

A little progress -- I can proxy the stageIV precip attributes now. st_as_stars.stars isn't much now that I factored out a stand along add_curvilinear() function.

@edzer
Copy link
Member

edzer commented Mar 8, 2022

David, LGTM - is this ready to go?

@dblodgett-usgs
Copy link
Contributor Author

I certainly have more I would like to accomplish, but I think this is a positive step, so if you are comfortable merging with somewhat limited functionality compared to the long term, yes.

@edzer edzer merged commit e99abcb into r-spatial:main Mar 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants