Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coordinates with bounds not correctly read for ncdf #175

Closed
edzer opened this issue May 2, 2019 · 19 comments
Closed

coordinates with bounds not correctly read for ncdf #175

edzer opened this issue May 2, 2019 · 19 comments

Comments

@edzer
Copy link
Member

edzer commented May 2, 2019

See here: https://stat.ethz.ch/pipermail/r-sig-geo/2019-May/027288.html

When read with r = read_ncdf(f, ncsub = cbind(start = c(1, 1, 1, 1), count = c(100, 100, 3, 1))) this gives an object with invalid dimensions (not cut to count).

@mdsumner
Copy link
Member

mdsumner commented May 3, 2019

I don't see the problem?

f <- "tas_rcp85_land-rcm_eur_12km_01_mon_198012-208011.nc" ## 850 Mb, requires login, and manual download
 read_ncdf(f, ncsub = cbind(start = c(1, 1, 1, 1), count = c(100, 100, 3, 1)))
stars object with 4 dimensions and 1 attribute
attribute(s):
    tasC]
 Min.   :-3.648
 1st Qu.: 9.904
 Median :12.657
 Mean   :12.235
 3rd Qu.:15.287
 Max.   :20.234
dimension(s):
                from  to     offset   delta refsys point
grid_longitude     1 100         NA      NA     NA FALSE
grid_latitude      1 100         NA      NA     NA FALSE
time               1   3 1980-12-16 30 days  PCICt    NA
ensemble_member    1   1         NA      NA     NA    NA
                                            values
grid_longitude  [331.9,332.01),...,[377.77,377.88) [x]
grid_latitude     [-23.1,-22.99),...,[21.45,21.56) [y]
time                                          NULL
ensemble_member                                  1
> str(read_ncdf(f, ncsub = cbind(start = c(1, 1, 1, 1), count = c(100, 100, 3, 1))))
List of 1
 $ tas:Object of class units:
 num [1:100, 1:100, 1:3, 1] 14.8 14.9 15 15.1 15.1 ...
  ..- attr(*, "units")=List of 2
  .. ..$ numerator  : chr "°C"
  .. ..$ denominator: chr(0)
  .. ..- attr(*, "class")= chr "symbolic_units"



sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] stars_0.3-1 sf_0.7-4    abind_1.4-5

@mdsumner
Copy link
Member

mdsumner commented May 3, 2019

Another thing is that these grid_longitude/grid_latitude vectors are only rectilinear via numeric noise, noise that raster ignores fwiw. I haven't tried dealing with the rotated grid part at all ...

@edzer
Copy link
Member Author

edzer commented May 3, 2019

If you safe the object to r, then

> dim(st_dimensions(r))
 grid_longitude   grid_latitude            time ensemble_member 
            418             406               3               1 
> dim(r)
 grid_longitude   grid_latitude            time ensemble_member 
            100             100               3               1 

and trying to plot it breaks (for this reason, I guess). This is a mess I created, because I extended code to read in dimension bounds, but ignored dimension selection after doing so.

@edzer
Copy link
Member Author

edzer commented May 3, 2019

btw raster will also ignore it if it is not noise, because it doesn't support rectilinear. But indeed:

> r= read_ncdf(f, ncsub = cbind(start = c(1, 1, 1, 1), count = c(100, 100, 3, 1)), eps=1e-3)
> r
stars object with 4 dimensions and 1 attribute
attribute(s):
    tasC]     
 Min.   :-3.648  
 1st Qu.: 9.904  
 Median :12.657  
 Mean   :12.235  
 3rd Qu.:15.287  
 Max.   :20.234  
dimension(s):
                from  to     offset   delta refsys point values    
grid_longitude     1 100      331.9    0.11     NA    NA   NULL [x]
grid_latitude      1 100      -23.1    0.11     NA    NA   NULL [y]
time               1   3 1980-12-16 30 days  PCICt    NA   NULL    
ensemble_member    1   1         NA      NA     NA    NA      1    

@mdsumner
Copy link
Member

mdsumner commented May 3, 2019

Oh I see, I have similar todo with read_ncdf and proxy

With raster I guess it has a coarser eps, definitely agree with how stars behaves there. I think generally it's when netcdf coordinates are stored in float but need double (degenerate rectilinear so to say)

1 similar comment
@mdsumner
Copy link
Member

mdsumner commented May 3, 2019

Oh I see, I have similar todo with read_ncdf and proxy

With raster I guess it has a coarser eps, definitely agree with how stars behaves there. I think generally it's when netcdf coordinates are stored in float but need double (degenerate rectilinear so to say)

@dblodgett-usgs
Copy link
Contributor

For the record, I've found this to be very common in NetCDF lat/lon coordinates. Over in intersectr I actually just added a "regularize" flag to make it easy for users to say, "no really, this is a regular coordinate variable." rather than making people guess about an appropriate e-# epsilon input. Not sure that's an approach worth taking here, but it seems like a friendly way to handle it.

@edzer
Copy link
Member Author

edzer commented May 3, 2019

I'd be OK. But also eps should have a better default than 1e-12.

@mdsumner
Copy link
Member

mdsumner commented May 3, 2019

It's the broken cases that muck it up, all regular but the first, or last - just broken because it should have been an offset/scale, where intent is clear

@edzer
Copy link
Member Author

edzer commented May 3, 2019

This is what I can get:

f = "tas_rcp85_land-rcm_eur_12km_01_mon_198012-208011.nc"
library(stars)
r = read_ncdf(f, ncsub = cbind(start = c(1, 1, 1, 1), count = c(418, 406, 3, 1)), eps=1e-3)
rx = read_stars(f, proxy = TRUE) # only for the crs!
st_crs(r) = st_crs(rx)
r0 = stars:::st_transform_proj.stars(r, 4326)
png("x.png")
plot(r0[,,,1], border = NA, axes = TRUE, reset = FALSE)
library(rnaturalearth)
plot(ne_coastline(returnclass = "sf"), add = TRUE, col = 'orange')

x

(now with units in plot title: fdc5613)

@mdsumner
Copy link
Member

mdsumner commented May 3, 2019

Huh, that's not what I was expecting! So

  • the original read (r = read_ncdf) is straightforward with a geotransform (but complex crs)
  • the grid_longitude/grid_latitude are degenerate rectilinear (trivially converted to offset/scale, assuming eps)
  • those regular coordinates are expanded out to every pixel (X-Y)
  • under transformation from rotated pole - they are curvilinear

That's why ob_trans? They rotate the pole so it's close enough to regular at a local equator? (this is a new one for me)

@edzer
Copy link
Member Author

edzer commented May 3, 2019

A similar case was found in #52

@adrfantini
Copy link

That's why ob_trans? They rotate the pole so it's close enough to regular at a local equator? (this is a new one for me)

Yep! Not the best approach imho, but it's simpler to implement than proper projected coordinates, which not all models have.

@mdsumner
Copy link
Member

mdsumner commented May 4, 2019

Funny definition of "simpler". It's definitely cultural, arbitrary ;)

@adrfantini
Copy link

Funny definition of "simpler". It's definitely cultural, arbitrary ;)

Remember that most climate scientists still use GrADS for plotting...

@maurizio85
Copy link

Dear all, many thanks for your quick replies. Anyway I still have some differences with your output, as well as some errors in the R console. Actually, and practically speaking, in my .png plot the origin of the axes is still close to Germany and the left and right borders of the image are still rectilinear.

Here my R output using your code with the sessionInfo() at the end. I hope it helps
R version 3.6.0 (2019-04-26) -- "Planting of a Tree"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
[...removed text...]

f = 'tas_rcp85_land-rcm_eur_12km_01_mon_198012-208011.nc'
library(stars)
Loading required package: abind
Loading required package: sf
Linking to GEOS 3.6.2, GDAL 2.2.3, PROJ 4.9.3
r = read_ncdf(f, ncsub = cbind(start = c(1, 1, 1, 1), count = c(418, 406, 3, 1)), eps=1e-3)
rx = read_stars(f, proxy = TRUE) # only for the crs!
Warning messages:
1: In CPL_read_gdal(as.character(x), as.character(options), as.character(driver), :
GDAL Message 1: The dataset has several variables that could be identified as vector fields, but not all share the same primary dimension. Consequently they will be ignored.
2: In CPL_read_gdal(as.character(x), as.character(options), as.character(driver), :
GDAL Message 1: dimension #1 (time) is not a Time dimension.
3: In CPL_read_gdal(as.character(x), as.character(options), as.character(driver), :
GDAL Message 1: dimension #0 (ensemble_member) is not a Time dimension.
st_crs(r) = st_crs(rx)
r0 = stars:::st_transform_proj.stars(r, 4326)
plot(r0[,,,1], border = NA, axes = TRUE, reset = FALSE)
Warning messages:
1: In classInt::classIntervals(na.omit(values), min(nbreaks - 1, n.unq), :
N is large, and some styles will run very slowly; sampling imposed
2: In st_is_longlat(x) :
bounding box has potentially an invalid value range for longlat data
3: In st_is_longlat(x) :
bounding box has potentially an invalid value range for longlat data
4: In st_is_longlat(x) :
bounding box has potentially an invalid value range for longlat data
5: In st_is_longlat(x) :
bounding box has potentially an invalid value range for longlat data
6: In st_is_longlat(x) :
bounding box has potentially an invalid value range for longlat data
library(rnaturalearth)
plot(ne_coastline(returnclass = "sf"), add = TRUE, col = 'orange')
Warning message:
In plot.sf(ne_coastline(returnclass = "sf"), add = TRUE, col = "orange") :
ignoring all but the first attribute
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=it_IT.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=it_IT.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=it_IT.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rnaturalearth_0.1.0 stars_0.3-1 sf_0.7-4 abind_1.4-5

loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 magrittr_1.5 units_0.6-3 tidyselect_0.2.5 lattice_0.20-38 R6_2.4.0 rlang_0.3.4
[8] rnaturalearthdata_0.2.0 dplyr_0.8.0.1 tools_3.6.0 parallel_3.6.0 grid_3.6.0 KernSmooth_2.23-15 ncmeta_0.0.4
[15] e1071_1.7-1 DBI_1.0.0 class_7.3-15 assertthat_0.2.1 tibble_2.1.1 RNetCDF_1.9-1 crayon_1.3.4
[22] tidyr_0.8.3 purrr_0.3.2 PCICt_0.5-4.1 glue_1.3.1 sp_1.3-1 compiler_3.6.0 pillar_1.3.1
[29] classInt_0.3-3 pkgconfig_2.0.2

@dblodgett-usgs
Copy link
Contributor

Any chance we can get a test file for this issue? Also, it seems that the issue title is no longer accurate? I'd like to test this against my latest work on read_ncdf and harden up the bounds handling but I'm not quite sure what the issue really is and would rather not sign up to download the sample data.

@edzer
Copy link
Member Author

edzer commented Aug 16, 2019

I sent you a link to the file over email.

@dblodgett-usgs
Copy link
Contributor

dblodgett-usgs commented Aug 16, 2019

OK -- I see now. This should be a good opportunity to take the next step with what I've been thinking about to get the NetCDF dimensions, coordinate variables, and canonical stars axes aligned.

I've added a couple things to the dims tibble that will help here and have been planning on building it up so that we can use it for NetCDF proxy objects that support spatio-temporal subsetting. The trick is that you have to subset the coordinate variables, transfer that to NetCDF dimension indexes, then transfer the index subset to data variable access. Then you've got to make sure the shape of the returned data is appropriate for the stars dimension object. -- it's a lot of complexity but I think I've got it more or less handled. I've got a few other things in motion here but think I can work up a fix for this next week.

For the record, the dims tibble looks like:
dims

# A tibble: 4 x 7
     id name            length unlim coord_dim coord_var      axis 
  <dbl> <chr>            <dbl> <lgl> <lgl>     <chr>          <chr>
1     3 grid_longitude     418 FALSE TRUE      grid_longitude X    
2     2 grid_latitude      406 FALSE TRUE      grid_latitude  Y    
3     1 time              1200 FALSE TRUE      ""             Z    
4     0 ensemble_member      1 FALSE TRUE      ""             T    

@edzer edzer closed this as completed Mar 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants