Support for reading station data timeseries (NetCDF example) #30

adrfantini · 2018-04-14T13:23:25Z

Followup of our discussions at EGU: stars aims at reading station data interpreting the X-Y coordinates correctly as sf point features. I did a bit of digging on the topic, although I can only speak for NetCDF data since that 's what I know best.

The CF conventions (ch. 9) define a set of supported features: point, timeSeries, trajectory, profile, timeSeriesProfile, trajectoryProfile. The convention goes into the details on how these should be defined, see here for timeSeries for example. There are several recommendations and a few mandatory attributes.

Here is how to create an example timeseries file for 10 stations over 20 timesteps using R:

library(ncdf4)

time <- ncdim_def("time"   , "days since 1970-01-01 00:00:00 UTC", vals=as.integer(1:20), longname="time", calendar="gregorian")
stat <- ncdim_def("station", ""                                  , vals=as.integer(1:10), create_dimvar=FALSE)

num  <- ncvar_def("num" , ""              , list(stat)      , longname="Station number", prec="integer")
var  <- ncvar_def("pr"  , "kg m-2 s-1"    , list(time, stat), longname="Total precipitation flux", missval=-10)
lat  <- ncvar_def("lat" , "degrees_north" , list(stat)      , longname="Station latitude")
lon  <- ncvar_def("lon" , "degrees_east"  , list(stat)      , longname="Station longitude")
alt  <- ncvar_def("alt" , "m"             , list(stat)      , longname="Vertical distance above the surface")

nc <- nc_create("timeseries.nc", list(num, var, lat, lon, alt), force_v4=TRUE)

ncvar_put(nc, num, vals=as.integer(1:10))
ncvar_put(nc, var, vals=runif(20 * 10, 0, 100/86400))
ncvar_put(nc, lat, vals=40:49)
ncvar_put(nc, lon, vals=10:19)
ncvar_put(nc, alt, vals=100*(1:10))

ncatt_put(nc, var, "coordinates", "lat lon alt num")
ncatt_put(nc, num, "cf_role", "timeseries_id")
ncatt_put(nc, var, "standard_name", "precipitation_flux")
ncatt_put(nc, lat, "standard_name", "latitude")
ncatt_put(nc, lon, "standard_name", "longitude")
ncatt_put(nc, alt, "standard_name", "height")

ncatt_put(nc, 0, "featureType", "timeSeries")
ncatt_put(nc, 0, "Conventions", "CF-1.7")

nc_close(nc)

The created file has metadata (ncdump -h timeseries.nc):

netcdf timeseries {
dimensions:
        station = 10 ;
        time = 20 ;
variables:
        int num(station) ;
                num:long_name = "Station number" ;
                num:cf_role = "timeseries_id" ;
        int time(time) ;
                time:units = "days since 1970-01-01 00:00:00 UTC" ;
                time:long_name = "time" ;
                time:calendar = "gregorian" ;
        float pr(station, time) ;
                pr:units = "kg m-2 s-1" ;
                pr:_FillValue = -10.f ;
                pr:long_name = "Total precipitation flux" ;
                pr:coordinates = "lat lon alt num" ;
                pr:standard_name = "precipitation_flux" ;
        float lat(station) ;
                lat:units = "degrees_north" ;
                lat:long_name = "Station latitude" ;
                lat:standard_name = "latitude" ;
        float lon(station) ;
                lon:units = "degrees_east" ;
                lon:long_name = "Station longitude" ;
                lon:standard_name = "longitude" ;
        float alt(station) ;
                alt:units = "m" ;
                alt:long_name = "Vertical distance above the surface" ;
                alt:standard_name = "height" ;

// global attributes:
                :featureType = "timeSeries" ;
                :Conventions = "CF-1.7"
}

This AFAICT should be 100% CF-compliant.

stars currently does not understand what to do with this dataset:

library(stars)
(a <- read_stars("timeseries.nc"))
stars object with 2 dimensions and 1 attribute
attribute(s):
 timeseries.nc      
 Min.   :6.871e-07  
 1st Qu.:3.588e-04  
 Median :6.584e-04  
 Mean   :6.104e-04  
 3rd Qu.:8.768e-04  
 Max.   :1.151e-03  
dimension(s):
  from to offset delta refsys point values
x    1 20     NA    NA     NA    NA   NULL
y    1 10     NA    NA     NA    NA   NULL
Warning messages:
1: In CPL_read_gdal(x, options, driver, read_data, NA_value) :
  GDAL Message 1: dimension #1 (time) is not a Longitude/X dimension.
2: In CPL_read_gdal(x, options, driver, read_data, NA_value) :
  GDAL Message 1: dimension #0 (station) is not a Latitude/Y dimension.

Obviously we would want to have single sf point dimension (lat, lon, elevation), plus a time dimension.

Whether this needs to be implemented by GDAL or by stars directly I do not know.

The text was updated successfully, but these errors were encountered:

mdsumner · 2018-04-14T14:04:32Z

You probably know this, but just fyi you can see what GDAL thinks of it with gdalinfo at the command line and rgdal::readGDAL in R, and compare raster::brick for a high-level ncdf4 wrapper. Otherwise panoply is a common favourite tool for exploring NetCDF outside of R.

For gdalinfo you also need gdalinfo NETCDF:"timeseries.nc":pr which shows more detail about that particular subdataset rather than the file.

adrfantini · 2018-04-14T17:16:23Z

@mdsumner thanks, yes unfortunately as far as I can tell none of the software I use, gdal and panoply included, natively recognizes the created file as a station timeseries file.

mdsumner · 2018-04-14T23:35:54Z

Ok great, thanks. I think we'll always be able to find examples that GDAL can't handle generally, the worst part is that it has to unroll 3D+ arrays as bands (fields, columns, layers in other words) and anything using it must re-infer that structure. I'll be pushing for more general NetCDF support, and tidync is pretty well suited for this, it's on my radar to spend some time on tidync/stars as a possible way to use NetCDF more directly. The tidync parts used to get stuff into stars will be relatively simple to port out.

The question for stars will be how far to push GDAL as a convenient way to shortcut NetCDF support, versus when we bang into inherent limitations. I think GDAL is not a good choice, we made this mistake in a commercial context several years ago - in hindsight it would have been better to use generic NetCDF support, because ultimately the geographic assumptions about the first two dimensions bite, and having to infer dimensionality from spread-out band metadata is kind of flaky. Interestingly direct use of ncdf4 is what raster did - and that allows a lot more general use than GDAL does across many sources - but raster is lot more limiting than stars aspires to.

adrfantini · 2018-04-15T08:01:27Z

From our discussions at EGU I gathered that @edzer is fully aware of GDAL's limitations in this sense and is thinking on how to push this matter forward (upstream?).

As far as NetCDF goes, NetCDF's power and weakness is that its possibilities are almost infinite, I agree there's no way to cover all use-cases. However the CF-Conventions are very complete and useful. In my ideal world I would like to cover all the use-cases listed in the Conventions with stars (and eventually linked packages). Even limiting to that it's a huge™ amount of use-cases tho.

edzer · 2018-04-15T19:11:59Z

I had an idle hope that OGR would be able to deal with such data.

I am totally with @mdsumner that it is better to read NetCDF directly rather than through GDAL if we want to represent data cubes generally. I took the GDAL route because it seemed easiest, so far, but it was a lot of work because one needs to get pretty much everything from the GDAL metadata tags. Nasty and messy.

@mdsumner maybe time for a call to discuss aligning stars and tidync?

mdsumner · 2018-04-15T19:50:01Z

Yes, I think they are already pretty complementary - I found it smoother than expected to align and a stars-like approach to metadata is definitely a gap for tidync. Here's some examples

ropensci/tidync#68 (comment)

mdsumner · 2018-04-16T00:30:13Z

A minor addition, my attempts at convenient metadata extraction from NetCDF is ncmeta (on CRAN). It has obvious functions nc_vars, nc_dims, etc. and some extras - all the metadata is extracted with one function nc_meta, but the details in all the attributes (units and so on) are not used in any deep way yet.Performance is pretty good, not as fast as ncdf::nc_open, but way easier to navigate. That's what I'll be exploring next for integrating with stars, how to summarize all that with a units approach.

edzer · 2018-04-16T08:32:39Z

Cool! units is one, coordinate reference systems / datums another, and 360-day years a third!

dblodgett-usgs · 2018-12-21T14:57:39Z

Just starting to wrap my head around stars ... I think I've got some functions over here that might be useful to the package? https://usgs-r.github.io/ncdfgeom/dev/reference/index.html Specifically, the projection handling functions and the read/write point and geometry time series?

adrfantini · 2018-12-22T11:46:50Z

@dblodgett-usgs That looks very interesting, thanks for the package!
Maybe @mdsumner might be interested in taking a look at it.

dblodgett-usgs · 2019-01-04T14:27:39Z

Just a point of reference for the discussion here. Recent contributions I've made to ncmeta include logic to identify coordinate variables using COARDS and CF conventions and the ability to get projection information out of a CF grid_mapping variable.

edzer · 2019-03-01T19:32:40Z

@dblodgett-usgs is ncdfgeom something you'd like to bring to CRAN, or would you alternatively like to integrate it in some package already on CRAN?

dblodgett-usgs · 2019-03-03T03:39:20Z

I would like to, but time constraints are heavy. Let me ask around. We usually have to write a peer reviewed paper to release a package but since the package implements the CF conventions -- which was heavily peer reviewed, maybe that will fly.

That said, if there were a package I could contribute to and someone else wanted to be the longer-term owner of it... I would not hesitate to go that way!

dblodgett-usgs · 2019-03-05T22:40:59Z

I've initiated the review process to get the package approved for CRAN. I'll need to convert it to use RNetCDF and ncmeta, but that won't be hard. Will keep this issue updated on the progress.

dblodgett-usgs · 2019-03-27T00:56:12Z

FYI, I've got https://usgs-r.github.io/ncdfgeom/ reviewed and did some refactoring to switch over to RNetCDF and ncmeta. I can submit to CRAN now but would be curious if there are dependencies or other issues with the package that need to be cleaned up first? Anyone willing to do a little code / package review? -- it's in flux, uses sf and sp, has some ncdf4 calls in the tests still, etc. But I think it's in a pretty good place. Feedback very welcome.

dblodgett-usgs · 2019-04-15T19:45:47Z

@mdsumner -- I think I can get ncdfgeom on cran if you can push the latest ncmeta. With that, I can start looking into getting read_ncdf to work with ncdfgeom's timeseries.

mdsumner · 2019-04-15T21:30:43Z

I don't think I can do that until stars itself is released - there's a breaking change in read_ncdf and CRAN stars won't pass check. Is it ok to break like that?

dblodgett-usgs · 2019-04-15T23:26:24Z

Oh, I see. What's the time-line for the dual release?

edzer · 2019-04-16T07:04:36Z

Ah, I didn't know. Planning stars submission for the end of this week!

mdsumner · 2019-04-16T07:54:56Z

That's cool, been meaning to ask - that's perfect

edzer · 2019-04-19T15:26:54Z

FYI - stars submitted to CRAN.

mohsinrazadanish · 2019-04-19T15:45:51Z

That's good to know. I shall be looking forward to test it. Regards, Mohsin Raza

…

On Fri, Apr 19, 2019 at 5:26 PM Edzer Pebesma ***@***.***> wrote: FYI - stars submitted to CRAN. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#30 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ACLLHGLMQA53XHSTXCVJY5LPRHQD7ANCNFSM4E2UIPEA> .

adrfantini · 2019-06-30T13:21:20Z

@dblodgett-usgs I've stumbled across the need for stars to understand timeseries netCDF objects again. Do you happen to have a timeline for this?

dblodgett-usgs · 2019-06-30T14:59:25Z

I'd love to get to it but a few other projects are blocking me from getting to this. Maybe in August?

adrfantini · 2019-06-30T15:09:56Z

I'd love to get to it but a few other projects are blocking me from getting to this. Maybe in August?

It's OK, I just wanted an approx idea on that. No pressure. Thanks a lot!

dblodgett-usgs · 2019-08-16T02:25:37Z

Starting to play around with this.

The object returned by ncdfgeom::read_timeseries_dsg has basically all we need to create a stars object. I see a couple ways I could implement this.

Add a class to the response from ncdfgeom::read_timeseries_dsg() and write a method to st_as_stars() that class to a stars object.
Add the ability to return a stars object directly from ncdfgeom::read_timeseries_dsg().
Add embedded handling of a call to ncdfgeom::read_timeseries_dsg() to stars::read_ncdf().

There are probably other options? What would be preferred?

I dropped some prototype code below creating stars objects two ways. Is one better or more sustainable than the other? Am I handling geometry correctly? I kind of expected some kind of spatial plotting in the plot method -- am I missing something or is that just what there is so far?

The file is linked to this issue so this code should be runnable as long as you install.packages("ncdfgeom")

download.file("https://github.com/r-spatial/stars/files/3507773/timeseries.nc.zip", destfile = "timeseries.nc.zip")
unzip("timeseries.nc.zip", files = "timeseries.nc")

ts <- ncdfgeom::read_timeseries_dsg("timeseries.nc")

crs <- st_crs(4326)
ts_points <- tibble::tibble(X = ts$lons, Y = ts$lats, Z = ts$alts)
ts_points <- sf::st_as_sf(ts_points, coords = c("X", "Y", "Z"), crs = crs)

data <- ts$data_frames[[1]])
data[["T"]] <- ts$time

dim <- stars:::create_dimensions(c(geometry = nrow(ts_points), 
                                   time = nrow(data)), raster = NULL)

gdim <- stars:::create_dimension(from = 1, to = length(ts$lats), 
                                refsys = crs$proj4string, point = TRUE, 
                                values = ts_points$geometry)
tdim <- stars:::create_dimension(from = 1, to = length(ts$time), 
                                 refsys = "POSIXct", point = FALSE, 
                                 values = as.POSIXct(ts$time))
dim_2 <- stars:::create_dimensions(list(time = tdim, geometry = gdim))

dim$geometry$from <- 1
dim$geometry$to <- length(ts$lats)
dim$geometry$refsys <- crs
dim$geometry$point <- TRUE
dim$geometry$values <- ts_points$geometry

dim$time$from <- 1
dim$time$to <- length(ts$time)
dim$time$refsys <- "POSIXct"
dim$geometry$point <- FALSE
dim$time$values <- as.POSIXct(ts$time)

stars_data <- stars:::st_stars(x = setNames(list(as.matrix(ts$data_frames[[1]])), 
                                            ts$varmeta[[1]]$name), 
                               dimensions = dim)

stars_data_2 <- stars:::st_stars(x = setNames(list(as.matrix(ts$data_frames[[1]])), 
                                              ts$varmeta[[1]]$name), 
                                 dimensions = dim_2)
plot(stars_data$pr)
plot(stars_data$pr$`1`)
plot(st_dimensions(stars_data)$geometry$values)
plot(stars_data_2$pr)
plot(stars_data_2$pr$`1`)
plot(st_dimensions(stars_data_2)$geometry$values)

timeseries.nc.zip

edzer · 2019-08-16T15:16:04Z

As of your question, I'd suggest both 1 and 3. 1 will allow others to do similar things on potentially more simple data structures; with 1, 3 becomes easy. The st_as_stars method could go into stars, or in ncdfgeom but would there create a dependency on stars (for importing the generic).

I did some heavy editing in your script, which now runs kind of:

raster objects are a list of matrix or array, not data.frame
$values for spatial must be of class sfc, not sf
refsys is a proj4string, not a crs object (although we could allow both; the problem is that some proj4strings cannot be passed to st_crs, since that validates through GDAL, not PROJ)
the matrix with array values has no dimnames, maybe it should
create_dimensions is needed to create a dimensions object

Plots are still work to do; there's quite some stuff here but that was all lattice based, we'd now go the ggplot2 path.

dblodgett-usgs · 2019-08-27T14:02:26Z

Thanks for the guidance on how to get this working. I just opened a PR that seems to be the core of it. The issue I'm going to have is that st_stars, create_dimension and create_dimensions aren't exported by stars. I suppose this method could just be contributed directly to stars? There's nothing other than the ncdfgeom class that is specific to the ncdfgeom package.

The next step here would be to have read_ncdf call out to ncdfgeom in the case that it finds a NetCDF-CF DSG file. I don't want to get ahead of myself with the writer functionality, but that will be easy to add to with the writer functions in ncdfgeom.

edzer · 2019-08-27T15:19:34Z

Looks good! I guess that the st_as_stars method then should go into stars, unless you can bring up a strong argument to export these three functions - I'd be happy to adopt it!

dblodgett-usgs · 2019-08-27T17:52:43Z

OK. No, I don't think those should be exported unless you want an ecosystem of packages that can build stars objects easily.

So if I added the st_as_stars.ncdfgeom method in "stars.R" and called out to ncdfgeom::read_timeries_dsg from read_ncdf you are ok with it? That would mean a dependency on ncdfgeom in stars. Just want to make sure that is an ok path with you.

edzer · 2019-08-27T21:48:12Z

Yes, but we keep it in Suggests: -- see e.g. file raster.R how this is done with the raster package (and many others): use requireNamespace(...), and prefix functions used with ncdfgeom::

adrfantini · 2019-08-29T08:47:18Z

I quickly tested this and it works for me, although I only tried with a couple of files, both created by me with correct CF attributes.

I guess we'll also need a method for plotting this kind of stars objects. Currently I get Error in names(df) <- as.character(e[-ix][[1]]) : 'names' attribute [20] must be the same length as the vector [10] when plotting the file created in the first message of this issue.
geom_stars also plots nothing.

Also I'm sorry I've been quite absent recently - i'm afraid this is not going to change soon. I'll start with a new job shortly and it is likely won't use stars as often, although this will depend on a few factors.

adrfantini changed the title ~~Support for reading station data timeseries~~ Support for reading station data timeseries (NetCDF example) Apr 14, 2018

adrfantini mentioned this issue Mar 4, 2019

Handle non-canonical NetCDF axis order #89

Closed

dblodgett-usgs mentioned this issue Apr 19, 2019

No more sp, rgdal, or rgeos. + heavy refactor to improve code style. DOI-USGS/ncdfgeom#58

Merged

adrfantini mentioned this issue Jun 30, 2019

length(geotransform) == 6 is not TRUE when setting dimension as sfc column #190

Closed

dblodgett-usgs mentioned this issue Aug 9, 2019

read_ncdf refactor and improvements #205

Merged

dblodgett-usgs mentioned this issue Aug 28, 2019

Time series and geometry from ncdfgeom. #215

Merged

edzer closed this as completed in 1234182 Sep 14, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for reading station data timeseries (NetCDF example) #30

Support for reading station data timeseries (NetCDF example) #30

adrfantini commented Apr 14, 2018 •

edited

Loading

mdsumner commented Apr 14, 2018

adrfantini commented Apr 14, 2018

mdsumner commented Apr 14, 2018 •

edited

Loading

adrfantini commented Apr 15, 2018

edzer commented Apr 15, 2018

mdsumner commented Apr 15, 2018

mdsumner commented Apr 16, 2018 •

edited

Loading

edzer commented Apr 16, 2018

dblodgett-usgs commented Dec 21, 2018

adrfantini commented Dec 22, 2018 •

edited

Loading

dblodgett-usgs commented Jan 4, 2019

edzer commented Mar 1, 2019

dblodgett-usgs commented Mar 3, 2019

dblodgett-usgs commented Mar 5, 2019

dblodgett-usgs commented Mar 27, 2019

dblodgett-usgs commented Apr 15, 2019

mdsumner commented Apr 15, 2019

dblodgett-usgs commented Apr 15, 2019

edzer commented Apr 16, 2019

mdsumner commented Apr 16, 2019

edzer commented Apr 19, 2019

mohsinrazadanish commented Apr 19, 2019 via email

adrfantini commented Jun 30, 2019

dblodgett-usgs commented Jun 30, 2019

adrfantini commented Jun 30, 2019

dblodgett-usgs commented Aug 16, 2019 •

edited by edzer

Loading

edzer commented Aug 16, 2019

dblodgett-usgs commented Aug 27, 2019

edzer commented Aug 27, 2019

dblodgett-usgs commented Aug 27, 2019

edzer commented Aug 27, 2019

adrfantini commented Aug 29, 2019 •

edited

Loading

Support for reading station data timeseries (NetCDF example) #30

Support for reading station data timeseries (NetCDF example) #30

Comments

adrfantini commented Apr 14, 2018 • edited Loading

mdsumner commented Apr 14, 2018

adrfantini commented Apr 14, 2018

mdsumner commented Apr 14, 2018 • edited Loading

adrfantini commented Apr 15, 2018

edzer commented Apr 15, 2018

mdsumner commented Apr 15, 2018

mdsumner commented Apr 16, 2018 • edited Loading

edzer commented Apr 16, 2018

dblodgett-usgs commented Dec 21, 2018

adrfantini commented Dec 22, 2018 • edited Loading

dblodgett-usgs commented Jan 4, 2019

edzer commented Mar 1, 2019

dblodgett-usgs commented Mar 3, 2019

dblodgett-usgs commented Mar 5, 2019

dblodgett-usgs commented Mar 27, 2019

dblodgett-usgs commented Apr 15, 2019

mdsumner commented Apr 15, 2019

dblodgett-usgs commented Apr 15, 2019

edzer commented Apr 16, 2019

mdsumner commented Apr 16, 2019

edzer commented Apr 19, 2019

mohsinrazadanish commented Apr 19, 2019 via email

adrfantini commented Jun 30, 2019

dblodgett-usgs commented Jun 30, 2019

adrfantini commented Jun 30, 2019

dblodgett-usgs commented Aug 16, 2019 • edited by edzer Loading

edzer commented Aug 16, 2019

dblodgett-usgs commented Aug 27, 2019

edzer commented Aug 27, 2019

dblodgett-usgs commented Aug 27, 2019

edzer commented Aug 27, 2019

adrfantini commented Aug 29, 2019 • edited Loading

adrfantini commented Apr 14, 2018 •

edited

Loading

mdsumner commented Apr 14, 2018 •

edited

Loading

mdsumner commented Apr 16, 2018 •

edited

Loading

adrfantini commented Dec 22, 2018 •

edited

Loading

dblodgett-usgs commented Aug 16, 2019 •

edited by edzer

Loading

adrfantini commented Aug 29, 2019 •

edited

Loading