Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for reading station data timeseries (NetCDF example) #30

Closed
adrfantini opened this issue Apr 14, 2018 · 32 comments
Closed

Support for reading station data timeseries (NetCDF example) #30

adrfantini opened this issue Apr 14, 2018 · 32 comments

Comments

@adrfantini
Copy link

adrfantini commented Apr 14, 2018

Followup of our discussions at EGU: stars aims at reading station data interpreting the X-Y coordinates correctly as sf point features. I did a bit of digging on the topic, although I can only speak for NetCDF data since that 's what I know best.

The CF conventions (ch. 9) define a set of supported features: point, timeSeries, trajectory, profile, timeSeriesProfile, trajectoryProfile. The convention goes into the details on how these should be defined, see here for timeSeries for example. There are several recommendations and a few mandatory attributes.

Here is how to create an example timeseries file for 10 stations over 20 timesteps using R:

library(ncdf4)

time <- ncdim_def("time"   , "days since 1970-01-01 00:00:00 UTC", vals=as.integer(1:20), longname="time", calendar="gregorian")
stat <- ncdim_def("station", ""                                  , vals=as.integer(1:10), create_dimvar=FALSE)

num  <- ncvar_def("num" , ""              , list(stat)      , longname="Station number", prec="integer")
var  <- ncvar_def("pr"  , "kg m-2 s-1"    , list(time, stat), longname="Total precipitation flux", missval=-10)
lat  <- ncvar_def("lat" , "degrees_north" , list(stat)      , longname="Station latitude")
lon  <- ncvar_def("lon" , "degrees_east"  , list(stat)      , longname="Station longitude")
alt  <- ncvar_def("alt" , "m"             , list(stat)      , longname="Vertical distance above the surface")

nc <- nc_create("timeseries.nc", list(num, var, lat, lon, alt), force_v4=TRUE)

ncvar_put(nc, num, vals=as.integer(1:10))
ncvar_put(nc, var, vals=runif(20 * 10, 0, 100/86400))
ncvar_put(nc, lat, vals=40:49)
ncvar_put(nc, lon, vals=10:19)
ncvar_put(nc, alt, vals=100*(1:10))

ncatt_put(nc, var, "coordinates", "lat lon alt num")
ncatt_put(nc, num, "cf_role", "timeseries_id")
ncatt_put(nc, var, "standard_name", "precipitation_flux")
ncatt_put(nc, lat, "standard_name", "latitude")
ncatt_put(nc, lon, "standard_name", "longitude")
ncatt_put(nc, alt, "standard_name", "height")

ncatt_put(nc, 0, "featureType", "timeSeries")
ncatt_put(nc, 0, "Conventions", "CF-1.7")

nc_close(nc)

The created file has metadata (ncdump -h timeseries.nc):

netcdf timeseries {
dimensions:
        station = 10 ;
        time = 20 ;
variables:
        int num(station) ;
                num:long_name = "Station number" ;
                num:cf_role = "timeseries_id" ;
        int time(time) ;
                time:units = "days since 1970-01-01 00:00:00 UTC" ;
                time:long_name = "time" ;
                time:calendar = "gregorian" ;
        float pr(station, time) ;
                pr:units = "kg m-2 s-1" ;
                pr:_FillValue = -10.f ;
                pr:long_name = "Total precipitation flux" ;
                pr:coordinates = "lat lon alt num" ;
                pr:standard_name = "precipitation_flux" ;
        float lat(station) ;
                lat:units = "degrees_north" ;
                lat:long_name = "Station latitude" ;
                lat:standard_name = "latitude" ;
        float lon(station) ;
                lon:units = "degrees_east" ;
                lon:long_name = "Station longitude" ;
                lon:standard_name = "longitude" ;
        float alt(station) ;
                alt:units = "m" ;
                alt:long_name = "Vertical distance above the surface" ;
                alt:standard_name = "height" ;

// global attributes:
                :featureType = "timeSeries" ;
                :Conventions = "CF-1.7"
}

This AFAICT should be 100% CF-compliant.

stars currently does not understand what to do with this dataset:

library(stars)
(a <- read_stars("timeseries.nc"))
stars object with 2 dimensions and 1 attribute
attribute(s):
 timeseries.nc      
 Min.   :6.871e-07  
 1st Qu.:3.588e-04  
 Median :6.584e-04  
 Mean   :6.104e-04  
 3rd Qu.:8.768e-04  
 Max.   :1.151e-03  
dimension(s):
  from to offset delta refsys point values
x    1 20     NA    NA     NA    NA   NULL
y    1 10     NA    NA     NA    NA   NULL
Warning messages:
1: In CPL_read_gdal(x, options, driver, read_data, NA_value) :
  GDAL Message 1: dimension #1 (time) is not a Longitude/X dimension.
2: In CPL_read_gdal(x, options, driver, read_data, NA_value) :
  GDAL Message 1: dimension #0 (station) is not a Latitude/Y dimension.

Obviously we would want to have single sf point dimension (lat, lon, elevation), plus a time dimension.

Whether this needs to be implemented by GDAL or by stars directly I do not know.

@adrfantini adrfantini changed the title Support for reading station data timeseries Support for reading station data timeseries (NetCDF example) Apr 14, 2018
@mdsumner
Copy link
Member

You probably know this, but just fyi you can see what GDAL thinks of it with gdalinfo at the command line and rgdal::readGDAL in R, and compare raster::brick for a high-level ncdf4 wrapper. Otherwise panoply is a common favourite tool for exploring NetCDF outside of R.

For gdalinfo you also need gdalinfo NETCDF:"timeseries.nc":pr which shows more detail about that particular subdataset rather than the file.

@adrfantini
Copy link
Author

@mdsumner thanks, yes unfortunately as far as I can tell none of the software I use, gdal and panoply included, natively recognizes the created file as a station timeseries file.

@mdsumner
Copy link
Member

mdsumner commented Apr 14, 2018

Ok great, thanks. I think we'll always be able to find examples that GDAL can't handle generally, the worst part is that it has to unroll 3D+ arrays as bands (fields, columns, layers in other words) and anything using it must re-infer that structure. I'll be pushing for more general NetCDF support, and tidync is pretty well suited for this, it's on my radar to spend some time on tidync/stars as a possible way to use NetCDF more directly. The tidync parts used to get stuff into stars will be relatively simple to port out.

The question for stars will be how far to push GDAL as a convenient way to shortcut NetCDF support, versus when we bang into inherent limitations. I think GDAL is not a good choice, we made this mistake in a commercial context several years ago - in hindsight it would have been better to use generic NetCDF support, because ultimately the geographic assumptions about the first two dimensions bite, and having to infer dimensionality from spread-out band metadata is kind of flaky. Interestingly direct use of ncdf4 is what raster did - and that allows a lot more general use than GDAL does across many sources - but raster is lot more limiting than stars aspires to.

@adrfantini
Copy link
Author

From our discussions at EGU I gathered that @edzer is fully aware of GDAL's limitations in this sense and is thinking on how to push this matter forward (upstream?).

As far as NetCDF goes, NetCDF's power and weakness is that its possibilities are almost infinite, I agree there's no way to cover all use-cases. However the CF-Conventions are very complete and useful. In my ideal world I would like to cover all the use-cases listed in the Conventions with stars (and eventually linked packages). Even limiting to that it's a huge™ amount of use-cases tho.

@edzer
Copy link
Member

edzer commented Apr 15, 2018

I had an idle hope that OGR would be able to deal with such data.

I am totally with @mdsumner that it is better to read NetCDF directly rather than through GDAL if we want to represent data cubes generally. I took the GDAL route because it seemed easiest, so far, but it was a lot of work because one needs to get pretty much everything from the GDAL metadata tags. Nasty and messy.

@mdsumner maybe time for a call to discuss aligning stars and tidync?

@mdsumner
Copy link
Member

Yes, I think they are already pretty complementary - I found it smoother than expected to align and a stars-like approach to metadata is definitely a gap for tidync. Here's some examples

ropensci/tidync#68 (comment)

@mdsumner
Copy link
Member

mdsumner commented Apr 16, 2018

A minor addition, my attempts at convenient metadata extraction from NetCDF is ncmeta (on CRAN). It has obvious functions nc_vars, nc_dims, etc. and some extras - all the metadata is extracted with one function nc_meta, but the details in all the attributes (units and so on) are not used in any deep way yet.Performance is pretty good, not as fast as ncdf::nc_open, but way easier to navigate. That's what I'll be exploring next for integrating with stars, how to summarize all that with a units approach.

@edzer
Copy link
Member

edzer commented Apr 16, 2018

Cool! units is one, coordinate reference systems / datums another, and 360-day years a third!

@dblodgett-usgs
Copy link
Contributor

Just starting to wrap my head around stars ... I think I've got some functions over here that might be useful to the package? https://usgs-r.github.io/ncdfgeom/dev/reference/index.html Specifically, the projection handling functions and the read/write point and geometry time series?

@adrfantini
Copy link
Author

adrfantini commented Dec 22, 2018

@dblodgett-usgs That looks very interesting, thanks for the package!
Maybe @mdsumner might be interested in taking a look at it.

@dblodgett-usgs
Copy link
Contributor

Just a point of reference for the discussion here. Recent contributions I've made to ncmeta include logic to identify coordinate variables using COARDS and CF conventions and the ability to get projection information out of a CF grid_mapping variable.

@edzer
Copy link
Member

edzer commented Mar 1, 2019

@dblodgett-usgs is ncdfgeom something you'd like to bring to CRAN, or would you alternatively like to integrate it in some package already on CRAN?

@dblodgett-usgs
Copy link
Contributor

I would like to, but time constraints are heavy. Let me ask around. We usually have to write a peer reviewed paper to release a package but since the package implements the CF conventions -- which was heavily peer reviewed, maybe that will fly.

That said, if there were a package I could contribute to and someone else wanted to be the longer-term owner of it... I would not hesitate to go that way!

@dblodgett-usgs
Copy link
Contributor

I've initiated the review process to get the package approved for CRAN. I'll need to convert it to use RNetCDF and ncmeta, but that won't be hard. Will keep this issue updated on the progress.

@dblodgett-usgs
Copy link
Contributor

FYI, I've got https://usgs-r.github.io/ncdfgeom/ reviewed and did some refactoring to switch over to RNetCDF and ncmeta. I can submit to CRAN now but would be curious if there are dependencies or other issues with the package that need to be cleaned up first? Anyone willing to do a little code / package review? -- it's in flux, uses sf and sp, has some ncdf4 calls in the tests still, etc. But I think it's in a pretty good place. Feedback very welcome.

@dblodgett-usgs
Copy link
Contributor

@mdsumner -- I think I can get ncdfgeom on cran if you can push the latest ncmeta. With that, I can start looking into getting read_ncdf to work with ncdfgeom's timeseries.

@mdsumner
Copy link
Member

I don't think I can do that until stars itself is released - there's a breaking change in read_ncdf and CRAN stars won't pass check. Is it ok to break like that?

@dblodgett-usgs
Copy link
Contributor

Oh, I see. What's the time-line for the dual release?

@edzer
Copy link
Member

edzer commented Apr 16, 2019

Ah, I didn't know. Planning stars submission for the end of this week!

@mdsumner
Copy link
Member

That's cool, been meaning to ask - that's perfect

@edzer
Copy link
Member

edzer commented Apr 19, 2019

FYI - stars submitted to CRAN.

@mohsinrazadanish
Copy link

mohsinrazadanish commented Apr 19, 2019 via email

@adrfantini
Copy link
Author

@dblodgett-usgs I've stumbled across the need for stars to understand timeseries netCDF objects again. Do you happen to have a timeline for this?

@dblodgett-usgs
Copy link
Contributor

I'd love to get to it but a few other projects are blocking me from getting to this. Maybe in August?

@adrfantini
Copy link
Author

I'd love to get to it but a few other projects are blocking me from getting to this. Maybe in August?

It's OK, I just wanted an approx idea on that. No pressure. Thanks a lot!

@dblodgett-usgs
Copy link
Contributor

dblodgett-usgs commented Aug 16, 2019

Starting to play around with this.

The object returned by ncdfgeom::read_timeseries_dsg has basically all we need to create a stars object. I see a couple ways I could implement this.

  1. Add a class to the response from ncdfgeom::read_timeseries_dsg() and write a method to st_as_stars() that class to a stars object.
  2. Add the ability to return a stars object directly from ncdfgeom::read_timeseries_dsg().
  3. Add embedded handling of a call to ncdfgeom::read_timeseries_dsg() to stars::read_ncdf().

There are probably other options? What would be preferred?

I dropped some prototype code below creating stars objects two ways. Is one better or more sustainable than the other? Am I handling geometry correctly? I kind of expected some kind of spatial plotting in the plot method -- am I missing something or is that just what there is so far?

The file is linked to this issue so this code should be runnable as long as you install.packages("ncdfgeom")

download.file("https://github.com/r-spatial/stars/files/3507773/timeseries.nc.zip", destfile = "timeseries.nc.zip")
unzip("timeseries.nc.zip", files = "timeseries.nc")

ts <- ncdfgeom::read_timeseries_dsg("timeseries.nc")

crs <- st_crs(4326)
ts_points <- tibble::tibble(X = ts$lons, Y = ts$lats, Z = ts$alts)
ts_points <- sf::st_as_sf(ts_points, coords = c("X", "Y", "Z"), crs = crs)

data <- ts$data_frames[[1]])
data[["T"]] <- ts$time

dim <- stars:::create_dimensions(c(geometry = nrow(ts_points), 
                                   time = nrow(data)), raster = NULL)

gdim <- stars:::create_dimension(from = 1, to = length(ts$lats), 
                                refsys = crs$proj4string, point = TRUE, 
                                values = ts_points$geometry)
tdim <- stars:::create_dimension(from = 1, to = length(ts$time), 
                                 refsys = "POSIXct", point = FALSE, 
                                 values = as.POSIXct(ts$time))
dim_2 <- stars:::create_dimensions(list(time = tdim, geometry = gdim))

dim$geometry$from <- 1
dim$geometry$to <- length(ts$lats)
dim$geometry$refsys <- crs
dim$geometry$point <- TRUE
dim$geometry$values <- ts_points$geometry

dim$time$from <- 1
dim$time$to <- length(ts$time)
dim$time$refsys <- "POSIXct"
dim$geometry$point <- FALSE
dim$time$values <- as.POSIXct(ts$time)

stars_data <- stars:::st_stars(x = setNames(list(as.matrix(ts$data_frames[[1]])), 
                                            ts$varmeta[[1]]$name), 
                               dimensions = dim)

stars_data_2 <- stars:::st_stars(x = setNames(list(as.matrix(ts$data_frames[[1]])), 
                                              ts$varmeta[[1]]$name), 
                                 dimensions = dim_2)
plot(stars_data$pr)
plot(stars_data$pr$`1`)
plot(st_dimensions(stars_data)$geometry$values)
plot(stars_data_2$pr)
plot(stars_data_2$pr$`1`)
plot(st_dimensions(stars_data_2)$geometry$values)

timeseries.nc.zip

@edzer
Copy link
Member

edzer commented Aug 16, 2019

As of your question, I'd suggest both 1 and 3. 1 will allow others to do similar things on potentially more simple data structures; with 1, 3 becomes easy. The st_as_stars method could go into stars, or in ncdfgeom but would there create a dependency on stars (for importing the generic).

I did some heavy editing in your script, which now runs kind of:

  • raster objects are a list of matrix or array, not data.frame
  • $values for spatial must be of class sfc, not sf
  • refsys is a proj4string, not a crs object (although we could allow both; the problem is that some proj4strings cannot be passed to st_crs, since that validates through GDAL, not PROJ)
  • the matrix with array values has no dimnames, maybe it should
  • create_dimensions is needed to create a dimensions object

Plots are still work to do; there's quite some stuff here but that was all lattice based, we'd now go the ggplot2 path.

@dblodgett-usgs
Copy link
Contributor

Thanks for the guidance on how to get this working. I just opened a PR that seems to be the core of it. The issue I'm going to have is that st_stars, create_dimension and create_dimensions aren't exported by stars. I suppose this method could just be contributed directly to stars? There's nothing other than the ncdfgeom class that is specific to the ncdfgeom package.

The next step here would be to have read_ncdf call out to ncdfgeom in the case that it finds a NetCDF-CF DSG file. I don't want to get ahead of myself with the writer functionality, but that will be easy to add to with the writer functions in ncdfgeom.

@edzer
Copy link
Member

edzer commented Aug 27, 2019

Looks good! I guess that the st_as_stars method then should go into stars, unless you can bring up a strong argument to export these three functions - I'd be happy to adopt it!

@dblodgett-usgs
Copy link
Contributor

OK. No, I don't think those should be exported unless you want an ecosystem of packages that can build stars objects easily.

So if I added the st_as_stars.ncdfgeom method in "stars.R" and called out to ncdfgeom::read_timeries_dsg from read_ncdf you are ok with it? That would mean a dependency on ncdfgeom in stars. Just want to make sure that is an ok path with you.

@edzer
Copy link
Member

edzer commented Aug 27, 2019

Yes, but we keep it in Suggests: -- see e.g. file raster.R how this is done with the raster package (and many others): use requireNamespace(...), and prefix functions used with ncdfgeom::

@adrfantini
Copy link
Author

adrfantini commented Aug 29, 2019

I quickly tested this and it works for me, although I only tried with a couple of files, both created by me with correct CF attributes.

I guess we'll also need a method for plotting this kind of stars objects. Currently I get Error in names(df) <- as.character(e[-ix][[1]]) : 'names' attribute [20] must be the same length as the vector [10] when plotting the file created in the first message of this issue.
geom_stars also plots nothing.

Also I'm sorry I've been quite absent recently - i'm afraid this is not going to change soon. I'll start with a new job shortly and it is likely won't use stars as often, although this will depend on a few factors.

@edzer edzer closed this as completed in 1234182 Sep 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants