Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the sgrid variable API #76

Open
ocefpaf opened this issue Jan 21, 2016 · 16 comments
Open

Improve the sgrid variable API #76

ocefpaf opened this issue Jan 21, 2016 · 16 comments

Comments

@ocefpaf
Copy link
Member

ocefpaf commented Jan 21, 2016

@kwilcox mentioned in #72

sci-wms needs to know too much information about what an sgrid is:

Anything that closes the implementation gap between pysgrid and potential users. I'd love it if sci-wms never had:

  • Rotate angles
  • Recenter any variables on to another grid
  • Figure out padding when re-centering
  • Subset in space (I should be able to provide a bbox and get back another Sgrid object read to go)
  • Subset in time (I don't want to have to write the bisect code... pysgrid could do that for me).

Basically, all data access should assume the user wants it back on the centers and pad/slice/rotate as needed. If a user wants the raw data they can just use netCDF4.

@ocefpaf
Copy link
Member Author

ocefpaf commented Jan 21, 2016

@kwilcox thanks for the sci-wms references above. The first and last should be included in pysgrid!
The second I believe is a matter of polishing pysgrid's API.

Here is a prototype API where the center averaging function became a method of the sgrid variable. There is plenty room for improvement, but that already removes the burden of applying the center_slice (padding) before averaging. We need to make that lazy and more robust to work on some edge case.

Note that rotation is not a good candidate to become an sgrid variable method, because it dependeds on 3 (or 4) variables: u, v, and angles or u, v, lon, and lat. How ugly is u.rotate(v) ?

@jay-hennen
Copy link
Contributor

I'm working on the cell tree grid searching and interpolation routines, and refactoring them now to deal with the API changes. I've got two questions relating to the staggered grids and what the current momentum is:

  1. One area of uncertainty I'm trying to deal with is variable names. Conceptually:
    psi grid == (self.node_lon, self.node_lat)
    rho grid == (self.center_lon, self.center_lat)
    u grid == ??
    v grid == ??
  2. Does one SGrid object contain/represent all four grids? (psi, rho, u, v) If so, which is the 'default' grid? I have been assuming the psi grid since it covers the 'valid' physical space.

@jay-hennen
Copy link
Contributor

I would like to suggest that the grid coordinates NOT be lazy loaded, and stored as numpy arrays (either raw or wrapped in a SGridVariable. I don't think they're large enough or infrequently used enough to warrant the data transfer/memory usage saved by lazy loading.

If they still must be lazy loaded, then upon first use (data retrieval) they should be fully loaded for the rest of the life of the object.

@ocefpaf
Copy link
Member Author

ocefpaf commented Feb 1, 2016

I'm working on the cell tree grid searching and interpolation routines,

Cool. Can you point us to a repo/branch where you are working in this?

and refactoring them now to deal with the API changes. I've got two questions relating to the staggered grids and what the current momentum is:

Be aware that pysgrid is not stable yet and we are planning a few more changes in the near future. Let's work together to avoid the pain of big refactoring!

  1. One area of uncertainty I'm trying to deal with is variable names. Conceptually:
    psi grid == (self.node_lon, self.node_lat)
    rho grid == (self.center_lon, self.center_lat)
    u grid == ??
    v grid == ??

Not sure what you are calling u, v grid, but take a look at the SGRID conventions, they might clear a few of your doubts.

Note that center will be renamed. (See #73).

  1. Does one SGrid object contain/represent all four grids? (psi, rho, u, v) If so, which is the 'default' grid? I have been assuming the psi grid since it covers the 'valid' physical space.

The sgrid needs some work, but in a way everything is there. You can see all the data as "grids" when coming from a ROMS point of view, but the SGRID conventions was written from a point of view where the velocity is defined at the edges and the other properties are defined at the faces of a square-ish cell. I believe that this approach was taken to approximate pysgrid to pyugrid ideas.

(@ayan-usgs can probably say more about this.)

I don't know about valid space because one can average on the faces or on the edges depending on the application. (Although with high resolution models this does not really matter.)

Usually, the faces are preferred because then all variables are in the same grid. (In ROMS language that is rho).

I would like to suggest that the grid coordinates NOT be lazy loaded, and stored as numpy arrays (either raw or wrapped in a SGridVariable. I don't think they're large enough or infrequently used enough to warrant the data transfer/memory usage saved by lazy loading.

👎 We do have applications where sub-setting before downloading is highly desired. The numpy-like lazy object is not defined yet, but it should not be a problem to any application you have in mind.

If they still must be lazy loaded, then upon first use (data retrieval) they should be fully loaded for the rest of the life of the object.

That is how a good lazy object should work 😄

Again this is not a blocker to any application that needs eager loading. All you have to do is to trigger the loading and keep doing your thing.

Sorry I could not help more. I need to get back to pysgrid and finish the API design to answer you better. Hope I was of some help though 😉

@jay-hennen
Copy link
Contributor

Thanks a lot Filipe.

https://github.com/jay-hennen/pysgrid

I don't have an example notebook up yet...the lazy loading of grids issue is the block at the moment. A lot of this interpolation depends on the grids being not only loaded, but use numpy-style advanced indexing. The grids should also persist between interpolation calls, if you want any semblance of efficiency.

Again this is not a blocker to any application that needs eager loading. All you have to do is to trigger the loading and keep doing your thing.

It's a little tougher than that, because it's the sgrid functions that need eager loading.

I'm working from a ROMS C-grid perspective where there are 4 distinct grids you can work on. Most data is on the 'rho' grid (cell centers + 1 value on all edges; this is like the faces+padding:both option?) but the u,v components of velocity have their own grids on the edges of the cell. I asked about the names because my variables are currently a hybrid mess of ROMS and SGrid.

Be aware that pysgrid is not stable yet and we are planning a few more changes in the near future. Let's work together to avoid the pain of big refactoring!

It was big this time because nodes was split into nodes_lon and nodes_lat. As long as paradigm shifts like that is over it should be okay :)

I don't know about valid space because one can average on the faces or on the edges depending on the application. (Although with high resolution models this does not really matter.)

When I say valid space I mean whichever grid is meant to define the physical boundary of the grid. The way I understand it that's the psi grid, so that when you place the variables on the rho/u/v grids you can get nice boundary conditions because they are all slightly larger.

@hrajagers
Copy link

Hi,

Just a few random comments on the issues you raise. I haven't had time to
read all the pys/ugrid posta thatI see pass by daily.

The sgrid conventions have indeed been inspired by the ugrid conventions.
The ugrid conventions were indeed specifically drafted to get away from
only data stored at "corner" nodes; data at cell centres and edges is quite
common although not all models use the same distribution of quantities
across the various locations. The sgrid conventions have been drafted to be
able to get rid of the multiple separate grid definitions for centre,
corner, u, v "points". However, the sgrid conventions were intended to
allow not just for data storage based on Arikawa C staggering but also A,
B, D, and E staggering. So, I would be slightly careful with assuming that
all quantities should be interpolated to the cell centres (faces), although
I agree that it would be a most pragmatic approach for the short term to
get a code/service working.

The idea was thata quantity defined at face-dimension in first direction
and node-dimension in second direction, would be a quantity defined at the
edges separating the faces in the second direction. Whether it's a quantity
along such edges or perpendicular to such edge would depend on the standard
name, and only few of these names exist. I'm not sure whether it's possible
(or easy) to determine whether the first or second dimension is associated
with the U or V direction (this may depend on the coding language used for
writing the file).

Regards,

Bert
Op 2 feb. 2016 17:08 schreef "jay-hennen" notifications@github.com:

Thanks a lot Filipe.

https://github.com/jay-hennen/pysgrid

I don't have an example notebook up yet...the lazy loading of grids issue
is the block at the moment. A lot of this interpolation depends on the
grids being not only loaded, but use numpy-style advanced indexing. The
grids should also persist between interpolation calls, if you want any
semblance of efficiency.

Again this is not a blocker to any application that needs eager loading.
All you have to do is to trigger the loading and keep doing your thing.

It's a little tougher than that, because it's the sgrid functions that
need eager loading.

I'm working from a ROMS C-grid perspective where there are 4 distinct
grids you can work on. Most data is on the 'rho' grid (cell centers + 1
value on all edges; this is like the faces+padding:both option?) but the
u,v components of velocity have their own grids on the edges of the cell. I
asked about the names because my variables are currently a hybrid mess of
ROMS and SGrid.

Be aware that pysgrid is not stable yet and we are planning a few more
changes in the near future. Let's work together to avoid the pain of big
refactoring!

It was big this time because nodes was split into nodes_lon and nodes_lat.
As long as paradigm shifts like that is over it should be okay :)

I don't know about valid space because one can average on the faces or on
the edges depending on the application. (Although with high resolution
models this does not really matter.)

When I say valid space I mean whichever grid is meant to define the
physical boundary of the grid. The way I understand it that's the psi grid,
so that when you place the variables on the rho/u/v grids you can get nice
boundary conditions because they are all slightly larger.


Reply to this email directly or view it on GitHub
#76 (comment).

@jay-hennen
Copy link
Contributor

So, I would be slightly careful with assuming that all quantities should be interpolated to the cell centres (faces)

Definitely not taking that approach. Our use case is interpolating a variable to arbitrary points on the psi grid. The variable is on a completely different grid (rho, u, v), which necessitates either additional point location calls (slow) or index translation (faster) so you can get the right values to interpolate with.

If edge1_coordinates is going to become edge1_lon and edge1_lat etc, I can work fine with that (since they would contain the staggered Arakawa C grids, right? Just have to keep which one is u and which is v straight.

@rsignell-usgs
Copy link
Contributor

@hrajagers, Bert, good to see you here!
Do you have or are you planning any code development using the SGRID conventions?

@hrajagers
Copy link

Hi Rich,

Delft3D 4 writes out files NetCDF files following SGRID conventions (set
output format to NetCDF: FlNcdf=#map his#).
Delft3D FM has an option to write out NetCDF files following UGRID
conventions (set MapFormat=4).
We have uploaded some time ago examples of both files onto our OPeNDAP
server, but I think that I didn't inform you yet about it:
http://opendap.deltares.nl/thredds/catalog/opendap/deltares/Delft3D/netcdf_example_files/catalog.html

I have started to implemented to support UGRID in the MATLAB code of
Delft3D-QUICKPLOT; I have so far mostly tested 2D output.
https://svn.oss.deltares.nl/repos/delft3d/trunk/src/tools_lgpl/matlab/quickplot/progsrc
(accessible via Deltares community account which can be created via
http://oss.deltares.nl)
Support for SGRID is planned, but I haven't started yet to work on it.

Cheers,

Bert

On Tue, Feb 2, 2016 at 6:48 PM, Rich Signell notifications@github.com
wrote:

@hrajagers https://github.com/hrajagers, Bert, good to see you here!

Do you have or are you planning any code development using the SGRID
conventions?


Reply to this email directly or view it on GitHub
#76 (comment).

@hrajagers
Copy link

OK, your psi grid is the grid of corner points.
I don't see psi, rho, u, v grids as completely different grids. For me
there is only one grid with data defined at different stagger locations.
If you see it as completely different grids, then you'll have to do a lot
of generic interpolations.
If you see it as one grid, then it should be possible to use index
translations.
By the way, is any other model than ROMS using the terms psi-grid and
rho-grid?

On Tue, Feb 2, 2016 at 6:44 PM, jay-hennen notifications@github.com wrote:

So, I would be slightly careful with assuming that all quantities should
be interpolated to the cell centres (faces)

Definitely not taking that approach. Our use case is interpolating a
variable to arbitrary points on the psi grid. The variable is on a
completely different grid (rho, u, v), which necessitates either additional
point location calls (slow) or index translation (faster) so you can get
the right values to interpolate with.

If edge1_coordinates is going to become edge1_lon and edge1_lat etc, I can
work fine with that (since they would contain the staggered Arakawa C
grids, right? Just have to keep which one is u and which is v straight.


Reply to this email directly or view it on GitHub
#76 (comment).

@jay-hennen
Copy link
Contributor

If you see it as completely different grids, then you'll have to do a lot
of generic interpolations.

Not sure what you mean by that. It's also still possible to do index translation if you think about them as separate grids, you just have to check that your point is within the new index and search the alternatives if it's wrong.

As for the ROMS names...I don't know. I'll probably convert the ones I have been using to what sgrid is using so it won't be an issue.

@hrajagers
Copy link

Not totally sure what you mean by checking whether a point is inside an
index, but I understand that you only have to generate an index table for
the interpolation only once. After that it's simple, but for in the
interpretation of one grid the searching indices of the enclosing points
wouldn't be necesary since they would follow simply from the neighbour
relationship relations. If I make myself clear.
Op 2 feb. 2016 22:13 schreef "jay-hennen" notifications@github.com:

If you see it as completely different grids, then you'll have to do a lot
of generic interpolations.

Not sure what you mean by that. It's also still possible to do index
translation if you think about them as separate grids, you just have to
check that your point is within the new index and search the alternatives
if it's wrong.

As for the ROMS names...I don't know. I'll probably convert the ones I
have been using to what sgrid is using so it won't be an issue.


Reply to this email directly or view it on GitHub
#76 (comment).

@jay-hennen
Copy link
Contributor

It's a bit tough to think about. Drawing a picture is helpful but it can also get cluttered fast! The gist of the problem is that if you pick a random point in a psi cell, that point could be in one of 4 different rho cells. If the point is in psi cell [0, 0], then it could be in rho cell [0, 0], [0, 1], [1, 0], or [1, 1] and you have to find out which. In the rectilinear case this can be done with simple less-than/greater-than comparisons, but it gets more complicated in the curvilinear case.

If you only work with rectangular grids you might also think that a point in a psi cell could be in one of two u or v cells, but that's not true in the curvilinear case either. It's actually six; two are very likely, but if the difference in grid shape could put it in one of the other four.

The whole point of finding the cell, btw, is to find the correct 4 values to use for interpolation.

@ocefpaf
Copy link
Member Author

ocefpaf commented Feb 3, 2016

This discussion is good, but it is drifting away from the pysgrid API topic. I suggest we open a new issue if you want to keep discussing grids and sgrid interpretation.

@jay-hennen I don't understand your worries about lazy loading. Maybe I will once I see the code/implementation of your ideas. (I checked your pysgrid fork but I could not find any new code to comment or experiment on. I also searched for the pyugrid version of that with no avail.)

Note that the lazy loading is virtually the same thing you would get by loading your data with netcdf4-python and then slicing the netCDF variable, to get the numpy array, with the syntax [:]. Even if we choose to wrap the netCDF variable in a dask (or biggus) object we can still be smart about it and load the numpy array into memory before doing your thing. (And yes, once loaded they will persist. Check iris cubes for an example. Everything is lazy loaded and became a numpy array in memory once you request it.)

I will be back working on this soon. Stay tuned...

@hrajagers Thanks for the clarifications and the links! I will put together a notebook example for pysgrid with DELFT data soon.

PS: I could not see the code from https://svn.oss.deltares.nl/repos/delft3d/trunk/src/tools_lgpl/matlab/quickplot/progsrc. I got a login dialog.

@jay-hennen
Copy link
Contributor

Most of the changes are in here Filipe:
d31bacf

After I slice a netCDF variable, how long does that information persist? If I make subsequent calls will the previous calls remain?

self.node_lon[:]
self.node_lon[8,5]
self.node_lon[:]

Would that translate into three lazy loads or just two? If we skip the middle line, is it two or one? Or does it work like I think it does, where you have to explicitly save the array you get back from the call, otherwise it doesn't persist?

EDIT: Nevermind, I figured it out. It works fine. I just have to be careful to slice it ASAP

On an API note, are edge1_lon/lat and edge2_lon/lat expected to be introduced? If so I can update everything to use them.

@ocefpaf
Copy link
Member Author

ocefpaf commented Feb 3, 2016

Most of the changes are in here Filipe:
d31bacf

I see you committed d31bacf yesterday, thanks! Feel free to send it as a PR so @ayan-usgs and @kwilcox can comment too.

EDIT: Nevermind, I figured it out. It works fine. I just have to be careful to slice it ASAP

Yep. That is what I meant be load it with [:]. We might introduce a .points (or .data, .values ?) property that will trigger the load and persist after the first load. (Akin to xray, iris, and virtually every other package out there.)

On an API note, are edge1_lon/lat and edge2_lon/lat expected to be introduced? If so I can update everything to use them.

I will defer that question to @ayan-usgs and I am still reading the code and getting familiar with his work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants