Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

leverage cf_xarray to get x_coords (lon) and y_coords (lat)? #364

Closed
mathause opened this issue Aug 1, 2022 · 7 comments · Fixed by #393
Closed

leverage cf_xarray to get x_coords (lon) and y_coords (lat)? #364

mathause opened this issue Aug 1, 2022 · 7 comments · Fixed by #393
Labels
API design dependencies Pull requests that update a dependency file enhancement mask

Comments

@mathause
Copy link
Member

mathause commented Aug 1, 2022

Currently - if the coordinates are not named "lat" and "lon" we need to pass them manually, e.g.:

import regionmask
import xarray as xr

rasm = xr.tutorial.load_dataset("rasm")

regionmask.defined_regions.srex.mask(rasm.xc, rasm.yc)

cf_xarray uses attrs to determine cf compliant "coordinates" - irrespective of their name. We could use this to allow

import cf_xarray # require manual import?

regionmask.defined_regions.srex.mask(rasm)

by using

x_coords = data.cf["longitude"]
y_coords = data.cf["latitude"]

internally.

@jbusecke @aaronspring that may be of interest of you.

@mathause
Copy link
Member Author

mathause commented Aug 1, 2022

These thoughts are not finished but I need to stop now.


Coding this is not very difficult but to decide how "magical" this should behave because there will be 3 ways to determine which coords to use which is sub-optimal.

a. without cf_xarray: region.mask(ds) uses ds.lon
b. with cf_xarray: region.mask(ds) would use ds.cf["longitude"] (which could be - but does not have to be - different from ds.lon)
c. region.mask(ds.longitude, ds.latitude) would use whatever is passed.

The problem is that (a) and (b) can lead to different results depending on whether cf_xarray is imported, which is bad.


In more detail:

  1. I really like that region.mask(ds) works for many cases.

  2. I don't want to require cf_xarray as dependency.

  3. I think regionmask should not attach the cf accessor. (I am really not sure about this one).

    import regionmask
    import xarray as xr
    
    rasm = xr.tutorial.load_dataset("rasm")
    
    # SHOULD ERROR
    regionmask.defined_regions.srex.mask(rasm)

    Potential error message:
    "coordinates not found. Either pass them manually as e.g. ``region.mask(ds.longitude, ds.latitude)`` or ``import cf_xarray`` to try and infer them automatically."

    import cf_xarray
    
    # SHOULD WORK
    regionmask.defined_regions.srex.mask(rasm)
  4. Do not use cf accessor if the coords are passed manually.

    Example:

    regionmask.defined_regions.srex.mask(rasm.xc, rasm.yc)
  5. import cf_xarray should not lead to silent behavior change.

  6. Therefore, warn (or error) if "old way" and cf_xarray would select different coordinates:

    rasm_renamed = rasm.rename(x="lon", y="lat")
    
    regionmask.defined_regions.srex.mask(rasm_renamed)
    • "old way" would select rasm_renamed["lon"]
    • "new way" would select rasm_renamed["xc"]
    • Can happen e.g. if the y-dim is named "lat" and the y-coords are named "latitude" - so no good to assume this is a seldom edge case.
  7. However, it is more likely that cf_xarray is right than my dumb way to do it.

    • I am not so sure what to do here
  8. It is somewhat tempting to make my algorithm more clever - however that would kind of mean implementing cf_xarray which is not the idea.

@mathause
Copy link
Member Author

mathause commented Aug 8, 2022

My current feeling is to implement #293 first and then add an option (either regionmask.set_options(use_cf=None) or mask(..., use_cf=None) with the options:

  • None -> use cf_xarray if it's unambiguous
  • "always"
  • "never"

@aaronspring
Copy link
Contributor

I like the general idea and the mask(ds, use_cf=bool) implementation

@dcherian
Copy link

Can't help with the automatic accessor registration, but you could use custom criteria to make the existing lat, lon a synonym for latitude, longitude. Just to add one more possibility to the mix... ;)

This is unambiguous in that if multiple variables satisfy the criteria for "latitude", then ds.cf["latitude"] will raise an error.

@jbusecke
Copy link
Contributor

I like this general idea a lot. I would always raise a warning in the case of potential ambiguity (this might be done by catching errors from cf_xarray or some logic in regionmask itself). This would mean that in simple cases "the magic just works" and if there are options to consider, the user is warned. To me that seems ideal.

@mathause mathause added enhancement mask API design dependencies Pull requests that update a dependency file labels Aug 25, 2022
@mathause
Copy link
Member Author

I finally came around to implementing this. The logic is a slightly complicated, but I think it is mostly internal and should not bother the users too much.

@mathause
Copy link
Member Author

And thanks for the feedback b.t.w.!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API design dependencies Pull requests that update a dependency file enhancement mask
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants