Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Region definitions assume [0, 360] longitudes range #229

Closed
spencerahill opened this issue Nov 8, 2017 · 12 comments
Closed

Region definitions assume [0, 360] longitudes range #229

spencerahill opened this issue Nov 8, 2017 · 12 comments

Comments

@spencerahill
Copy link
Owner

spencerahill commented Nov 8, 2017

Just noticed this while using Region.ts on data that is on a [-180, 180] longitude grid, using the following region:

sahel = Region(
    name='sahel',
    description='African Sahel',
    mask_bounds=[((10, 20), (0, 40)), ((10, 20), (342, 360))],
    do_land_mask=True
)

The (342, 360) bit gets dropped, since it is out of the array's (-180, 180) range, causing the region to be truncated to (0, 40). This is physically incorrect and thus a bug.

We need to translate all region definitions and the longitude arrays on which the regional reductions are operating to a standard longitude array, which by convention should be either (-180, 180) or (0, 360). This can be accomplished by noting that (0, 180) + 360*i corresponds to the Eastern Hemisphere and (180, 360) + 360*j corresponds to the Western hemisphere, where i and j are integers. E.g. i=0, j=-1 corresponds to the (-180, 180) grid.

So the steps would be

  1. Convert the region's longitude bounds to aospy's chosen reference longitude.
  2. Convert the array's longitudes to aospy's chosen reference longitude.
  3. Perform the reduction.

Once implemented, we should be able to support longitude arrays with arbitrary start and end values.

@spencerahill
Copy link
Owner Author

Returning to this now. It occurs to me that once this is implemented, there's no need to use mask_bounds for simple lat-lon rectangles, since they will always be cast to a standard longitude array. However, mask_bounds is still useful for non-contiguous regions and thus shouldn't be deprecated. We should note these in the docs.

Separately, we shouldn't modify the longitude values of the input data. So it's at the Region.mask_var step where this logic should go.

@spencerahill
Copy link
Owner Author

So we need to make some design choices about what Region will accept as arguments to lon_bounds, lat_bounds, and thus also mask_bounds. Currently, we have no checks in Region.__init__ about these:

aospy/aospy/region.py

Lines 151 to 154 in dded426

if lon_bounds and lat_bounds and not mask_bounds:
self.mask_bounds = [(lat_bounds, lon_bounds)]
else:
self.mask_bounds = mask_bounds

First, there is the obvious problem of things other than length-2 sequences being passed in. Of greater concern, currently the region will be silently empty if the two bounds are not in ascending order, given the logic in _add_to_mask:

aospy/aospy/region.py

Lines 9 to 14 in dded426

def _add_to_mask(data, lat_bounds, lon_bounds):
"""Add mask spanning given lat-lon rectangle."""
mask_lat = ((data[internal_names.LAT_STR] > lat_bounds[0]) &
(data[internal_names.LAT_STR] < lat_bounds[1]))
return mask_lat & ((data[internal_names.LON_STR] > lon_bounds[0]) &
(data[internal_names.LON_STR] < lon_bounds[1]))

If e.g. lon_bounds[0] > lon_bounds[1], this will always be False.

This is easy enough to fix in the current paradigm where longitudes are assumed to be [0, 360) both in the region definition and the input data, e.g. assert lon_bounds[0] < lon_bounds[1] etc. But it becomes trickier when we try to relax those assumptions, which is the point of this issue.

Suppose the region was defined with longitude bounds (-20, 10). If these get shifted to the 0-360 grid, they become (340, 10). So this will yield the empty region as just discussed. Sorting the longitudes also doesn't help, because then you've inverted the region: turning (340, 10) into (10, 340) makes the region span 330 degrees longitude (from 10E to 20W), rather than 30 degrees (from 20W to 10E)

So perhaps the solution is to not shift the region definitions to the (0, 360) grid, and instead shift the input data's longitudes to whatever 360 degree interval on which the Region is defined.

@spencerkclark
Copy link
Collaborator

Suppose the region was defined with longitude bounds (-20, 10). If these get shifted to the 0-360 grid, they become (340, 10).

What if we went with the convention that the left value always represents the "western" boundary of the region and the right value always represents the "eastern" boundary? Essentially we would need to add logic internally to interpret things like (340, 10) as (340, 360) and (0, 10). I think you should shift the region definitions to whatever convention the underlying data uses (so in this example in the -180 to 180 convention, the region becomes simpler; it is just (-20, 10) in longitude).

What do you have in mind for how to determine which longitude convention the underlying data uses? In simple cases, I think you could do it automatically (e.g. if data from the global domain is present), but in a limited domain setup where not all longitudes are represented (to be fair these use-cases are uncommon), you could run into ambiguities. A keyword argument (something like lon_format) on the Model object could be used to disambiguate things.

@spencerahill
Copy link
Owner Author

In simple cases, I think you could do it automatically (e.g. if data from the global domain is present), but in a limited domain setup where not all longitudes are represented (to be fair these use-cases are uncommon), you could run into ambiguities.

This is a fair point. We already have one user (@chuaxr) running limited domain. (And not even on a lon/lat grid, which points to a bigger issue we should tackle at some point of generalizing our coordinates. @chuaxr, have you been doing any regional averaging? If so, what do your Region object definitions look like?)

Separately, even for global domains, we can't count on (0, 360) and (-180, 180) being the only ones, as some GFDL ocean models use (-280, 80). See e.g. /archive/Hannah.Zanowski/CM2.6/ocean/annual_1yr/ocean.0001.ann.nc

What if we went with the convention that the left value always represents the "western" boundary of the region and the right value always represents the "eastern" boundary?

That's not a bad idea.

A keyword argument (something like lon_format) on the Model object could be used to disambiguate things.

Whether or not the longitudes are global/cyclic, we can always unambiguously identify W. Hem. and E. Hem. points. If they are cyclic, then we should adapt regions to the grid accordingly. If not, then we'll have to check that the region is fully defined on the given grid, c.f. #230.

So perhaps something like lon_cyclic? I agree that Model seems like the appropriate place for it.

@spencerkclark
Copy link
Collaborator

Whether or not the longitudes are global/cyclic, we can always unambiguously identify W. Hem. and E. Hem. points. If they are cyclic, then we should adapt regions to the grid accordingly. If not, then we'll have to check that the region is fully defined on the given grid, c.f. #230.

Good point! Yes, I think lon_cyclic would be a good name for the attribute on a Model object which would toggle this behavior.

@spencerahill
Copy link
Owner Author

Just thought of an additional wrinkle: Region objects can be useful outside of the standard aospy main script pipeline --- I have a project where I'm using them to compute regional averages on aospy ts output after the fact (because I first needed to reinterpolate the gridded data to a new grid before regionally averaging). There's no reason this couldn't be generalized to non-aospy-outputted data: the only things that the Region methods require of the input data array are: sfc_area and land_mask attributes, and LON_STR and LAT_STR as the names of the lon and lat dimensions, respectively.

I still think that lon_cyclic makes sense as a Proj/Model/Run attribute, but I think we should also add it as a kwarg to the Region methods, e.g. lon_cyclic=None:

  • If None, assume it's aospy data, and find it accordingly. If it can't be found, then raise.
  • If True or False, adapt the region mask accordingly.

While I'm at it, I'll add checks for sfc_area, land_mask, and the lat/lon strings also. The end result will be that the mask_var, ts, av, and std methods of Region can all accept non-aospy data.

Does that sound like a good idea?

@spencerkclark
Copy link
Collaborator

Yeah I like it; that sounds great!

@chuaxr
Copy link

chuaxr commented Mar 24, 2018

At the moment, the only Region I have defined is the whole domain. I mainly use this for the purpose of domain-averaged time-series. (As I recall, I ran into problems with the regular time series average trying to group by year.)

 domain = Region(
     name='domain',
     description='Entire domain',
     lat_bounds=(-1,2), #larger than 0,1
     lon_bounds=(-1,2), #larger than 0,1
     do_land_mask=False
  )

Note that I do the following in my pre-processing function:

ds.coords['lat']=('south_north',np.linspace(0,1,sn_size))
ds.coords['lon']=('west_east',np.linspace(0,1,we_size))

@spencerahill
Copy link
Owner Author

Thanks @chuaxr , that's helpful to know.

Let us know if you come across the problem again you're referring to.

@chuaxr
Copy link

chuaxr commented Mar 26, 2018

@spencerahill I encountered the time-averaging problem within the last month, and used the domain region as a workaround. This is probably related to #252.

@spencerahill
Copy link
Owner Author

What if we went with the convention that the left value always represents the "western" boundary of the region and the right value always represents the "eastern" boundary?

I am considering a breaking change to make this more explicit: switch from Region.lon_bounds and Region.lat_bounds to west_bound, east_bound, south_bound, and north_bound. This will prevent any confusion. But it also means that all existing Region objects anyone has defined will have to be modified. I think it's worth it, though...I have ~20 Region objects defined, and I suspect it will take <1 min per object to update.

Another consideration: As I mentioned above, at some point we probably want to generalize from the inherently lat-lon framework to be more general. Hard-coding in west, east, etc. would not help in that regard. But I think that' effort is far away enough to stick with the existing convention for now.

I'll go ahead with this approach unless anybody strongly objects.

@spencerkclark
Copy link
Collaborator

I am considering a breaking change to make this more explicit: switch from Region.lon_bounds and Region.lat_bounds to west_bound, east_bound, south_bound, and north_bound. This will prevent any confusion.

Yes, I like that idea; it shouldn't be too hard for me to modify things in my object library either.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants