Design: active versus passive coordinate assignment #2
Comments
This is a great start for the discussion. I like how you have clearly laid out everything and have some great points. I have a couple of items to add to your comments that could potentially aid in the discussion.
Here is an example of how multiple CRS in the same dataset might work in the active method:
And on the variables:
@snowman2 Thanks. For your first point, metpy currently does this by creating a CRS stating it is handling lat/lon data. Really it looks like it is trying to see if the coordinates for a variable have coordinates that are lon/lat variables: https://github.com/Unidata/MetPy/blob/master/metpy/xarray.py#L187
For 2, I could see one big issue with this being that attributes tend to disappear by default in xarray so any arithmetic on these DataArrays/Dataset and the
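To make the concern about disappearing attributes concrete, here is a minimal sketch. The variable name, the `grid_mapping` value, and the `"EPSG:4326"` string are placeholders chosen for illustration, not taken from MetPy or geoxarray. `keep_attrs` is set explicitly so the outcome does not depend on the default of the installed xarray version.

```python
import numpy as np
import xarray as xr

# Hypothetical variable: the attribute and coordinate values are placeholders.
temperature = xr.DataArray(
    np.zeros((2, 3)),
    dims=("y", "x"),
    attrs={"grid_mapping": "crs"},
).assign_coords(crs="EPSG:4326")

# Set keep_attrs explicitly instead of relying on the library default.
with xr.set_options(keep_attrs=False):
    dropped = temperature + 1
with xr.set_options(keep_attrs=True):
    kept = temperature + 1

print(dropped.attrs)                  # attributes are gone
print(kept.attrs)                     # attributes are preserved
print(dropped.coords["crs"].item())   # the scalar coordinate is still attached
```

The scalar coordinate comes through the arithmetic in both cases, which is one argument for carrying CRS information as a coordinate rather than only as an attribute.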
Also related: Unidata/MetPy#960
Yep, that is definitely something to be wary about. If you choose to go down this route, you would have to have it well documented and maybe raise warnings/exceptions if it is missing when more than one crs-like coordinate is present. If you want to produce a CF compliant file, the
@djhoese MetPy's handling of simultaneous x/y and longitude/latitude might not be optimal right now. Currently, the line you referenced should only come into play if the
In terms of coordinate recognition, if both projection x/y coordinates and lon/lat coordinates are present for a variable, MetPy will try to default to the projection coordinates, and if those are not unique, then to dimension coordinates, as in this function: https://github.com/Unidata/MetPy/blob/master/metpy/xarray.py#L345-L368
I'm not sure if this is the best approach, but this is what we had settled on this past summer.
Thanks for the summary @djhoese ! My 2 cents:
I guess an extension of this issue's general question is: should users be expected to know what "structure" their data is in? Should they know that geolocation can be retrieved from conventional CF variables (
I'm going to close this issue as I think in #24 and the initial 0.1.0 version (now released), geoxarray follows a kind of half-way design between active and passive behavior. On one hand, doing
Until I/we/geoxarray have more use cases (data loading all the way to final usage), I don't think we can say for certain which way each possible feature of geoxarray should be implemented. The basic functionality have an
People can feel free to continue to comment here, but in my opinion specific cases should probably be discussed in new issues.
I'm going to start making issues here on this repository so pydata/xarray#2288 doesn't get any longer. This issue is regarding what the best and most useful way is for geoxarray to add or provide information to xarray objects. This library will likely be heavily influenced by metpy's handling (@dopplershift) of CF metadata: https://github.com/Unidata/MetPy/blob/master/metpy/xarray.py#L152. From talking with @dopplershift on gitter, he mentioned @shoyer had suggested adding a crs coordinate to DataArrays.
The main question is: should geoxarray assign things to the user-provided xarray object (what I'm calling "active" behavior), or should geoxarray make things available for other libraries to use by calculating things when needed ("passive")? I haven't fully laid out the design for the passive behavior; that's what this issue is for.
Active
Basic behavior:
new_ds = my_dataset.metpy.parse_cf()
my_data_arr.coords['crs'] = CRS(...)
x and y coordinates are in meters and that a crs coordinate defines the CRS.
Pros:
Cons:
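As a rough illustration of the active behavior described in this section, the sketch below attaches a scalar crs coordinate and then lets a stand-in "downstream library" function rely on it. `assign_crs` and `describe_for_plotting` are hypothetical helpers invented here, and a plain string stands in for a real CRS object; none of this is geoxarray's or MetPy's actual API.

```python
import numpy as np
import xarray as xr


def assign_crs(data_arr, crs):
    """Hypothetical parse step: return a copy with a scalar "crs" coordinate."""
    return data_arr.assign_coords(crs=crs)


def describe_for_plotting(data_arr):
    """Stand-in for a downstream library that relies on the active contract."""
    if "crs" not in data_arr.coords:
        raise ValueError("expected a 'crs' coordinate; run the parse step first")
    return {
        "crs": data_arr.coords["crs"].item(),
        "x_extent": (float(data_arr["x"].min()), float(data_arr["x"].max())),
        "y_extent": (float(data_arr["y"].min()), float(data_arr["y"].max())),
    }


raw = xr.DataArray(
    np.zeros((2, 3)),
    dims=("y", "x"),
    coords={"x": [0.0, 1000.0, 2000.0], "y": [0.0, 1000.0]},
)
parsed = assign_crs(raw, "EPSG:32633")  # placeholder CRS identifier
info = describe_for_plotting(parsed)
print(info)
```

The user has to run the parse step once, after which every consumer can make the same assumptions about the object without asking geoxarray again.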
Passive
Basic behavior:
(my_data_arr.geo.crs where geo is the geoxarray accessor)
(my_data_arr.geo.set_xy_dims('x2', 'y2')) where handling all variants of dimension/coordinate names would be difficult otherwise.
Pros:
Cons:
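The passive behavior can be sketched with xarray's accessor mechanism. This is only an assumption about how it might look: the accessor is registered as `geo_sketch` (not `geo`) to avoid clashing with the real package, the list of known dimension names is made up, and the CRS is read from a placeholder `crs` attribute or coordinate rather than derived from CF metadata.

```python
import numpy as np
import xarray as xr

# Assumed list of recognizable (x, y) dimension names, for illustration only.
_KNOWN_XY_NAMES = [("x", "y"), ("lon", "lat"), ("longitude", "latitude")]


@xr.register_dataarray_accessor("geo_sketch")
class GeoSketchAccessor:
    """Hypothetical passive accessor: computes on request, assigns nothing."""

    def __init__(self, data_arr):
        self._obj = data_arr
        self._xy_dims = None

    def set_xy_dims(self, x, y):
        """Let the user name the x/y dimensions when guessing would fail."""
        missing = [dim for dim in (x, y) if dim not in self._obj.dims]
        if missing:
            raise ValueError(f"dimensions not found: {missing}")
        self._xy_dims = (x, y)

    @property
    def xy_dims(self):
        """Return the user override if set, otherwise guess from known names."""
        if self._xy_dims is not None:
            return self._xy_dims
        for x, y in _KNOWN_XY_NAMES:
            if x in self._obj.dims and y in self._obj.dims:
                return (x, y)
        raise ValueError("could not determine x/y dims; call set_xy_dims()")

    @property
    def crs(self):
        """Look the CRS up when asked; the wrapped object is left untouched."""
        if "crs" in self._obj.coords:
            return self._obj.coords["crs"].item()
        return self._obj.attrs.get("crs")


my_data_arr = xr.DataArray(
    np.zeros((2, 3)), dims=("y2", "x2"), attrs={"crs": "EPSG:4326"}
)
my_data_arr.geo_sketch.set_xy_dims("x2", "y2")
print(my_data_arr.geo_sketch.xy_dims)
print(my_data_arr.geo_sketch.crs)
```

One design consequence worth noting: xarray creates an accessor once per object, so the override stored by `set_xy_dims` in this sketch lives only on that instance, and a new object produced by arithmetic or indexing would start without it.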
Example usages
See also MetPy's example.
Conclusion
I think I like active better, even though passive was my original plan for solving things. I think both methods require more information from the user in special cases (accepting non-CF/non-standard coordinate/dimension names). Both require the user or the using library to expect certain things about the data, or require the user to know how to pass things to that library (cartopy_crs, etc). Some of the metpy CRS handling may also be handled by the pycrs library.
My biggest flexibility issue was the active method not being able to handle multiple CRS in one Dataset, but I realize now that the user just needs to split the dataset up by CRS.
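Splitting a Dataset up by CRS could look roughly like the sketch below. It assumes each data variable names its CRS through a CF-style grid_mapping attribute that points at another variable in the same Dataset. `split_by_grid_mapping` and all variable names are hypothetical.

```python
from collections import defaultdict

import numpy as np
import xarray as xr


def split_by_grid_mapping(ds):
    """Hypothetical helper: one sub-Dataset per distinct grid_mapping value."""
    # Names of variables that only exist to describe a CRS.
    referenced = {
        var.attrs["grid_mapping"]
        for var in ds.data_vars.values()
        if "grid_mapping" in var.attrs
    }
    groups = defaultdict(list)
    for name, var in ds.data_vars.items():
        if name in referenced:
            continue  # CRS variables are attached to their group below
        groups[var.attrs.get("grid_mapping")].append(name)

    parts = {}
    for grid_mapping, names in groups.items():
        keep = list(names)
        if grid_mapping in ds.data_vars:
            keep.append(grid_mapping)
        parts[grid_mapping] = ds[keep]
    return parts


ds = xr.Dataset(
    {
        "temperature": (("y", "x"), np.zeros((2, 3)), {"grid_mapping": "crs_a"}),
        "wind_speed": (("y2", "x2"), np.zeros((4, 5)), {"grid_mapping": "crs_b"}),
        "crs_a": xr.DataArray(0, attrs={"grid_mapping_name": "latitude_longitude"}),
        "crs_b": xr.DataArray(0, attrs={"grid_mapping_name": "lambert_conformal_conic"}),
    }
)
parts = split_by_grid_mapping(ds)
for key, part in parts.items():
    print(key, sorted(part.data_vars))
```

Each resulting Dataset then carries a single CRS, so the active approach can assign one crs coordinate per piece.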
Obviously feedback and concerns are welcome (@mraspaud, @pnuu, @leouieda, @fmaussion, @snowman2, @karimbahgat).