Region definitions assume [0, 360] longitudes range #229

spencerahill · 2017-11-08T22:43:04Z

Just noticed this while using Region.ts on data that is on a [-180, 180] longitude grid, using the following region:

sahel = Region(
    name='sahel',
    description='African Sahel',
    mask_bounds=[((10, 20), (0, 40)), ((10, 20), (342, 360))],
    do_land_mask=True
)

The (342, 360) bit gets dropped, since it is out of the array's (-180, 180) range, causing the region to be truncated to (0, 40). This is physically incorrect and thus a bug.

We need to translate all region definitions and the longitude arrays on which the regional reductions are operating to a standard longitude array, which by convention should be either (-180, 180) or (0, 360). This can be accomplished by noting that (0, 180) + 360*i corresponds to the Eastern Hemisphere and (180, 360) + 360*j corresponds to the Western hemisphere, where i and j are integers. E.g. i=0, j=-1 corresponds to the (-180, 180) grid.

So the steps would be

Convert the region's longitude bounds to aospy's chosen reference longitude.
Convert the array's longitudes to aospy's chosen reference longitude.
Perform the reduction.

Once implemented, we should be able to support longitude arrays with arbitrary start and end values.

The text was updated successfully, but these errors were encountered:

spencerahill · 2018-03-21T18:52:02Z

Returning to this now. It occurs to me that once this is implemented, there's no need to use mask_bounds for simple lat-lon rectangles, since they will always be cast to a standard longitude array. However, mask_bounds is still useful for non-contiguous regions and thus shouldn't be deprecated. We should note these in the docs.

Separately, we shouldn't modify the longitude values of the input data. So it's at the Region.mask_var step where this logic should go.

spencerahill · 2018-03-23T03:43:30Z

So we need to make some design choices about what Region will accept as arguments to lon_bounds, lat_bounds, and thus also mask_bounds. Currently, we have no checks in Region.__init__ about these:

aospy/aospy/region.py

Lines 151 to 154 in dded426

    
           if lon_bounds and lat_bounds and not mask_bounds: 
        
               self.mask_bounds = [(lat_bounds, lon_bounds)] 
        
           else: 
        
               self.mask_bounds = mask_bounds

First, there is the obvious problem of things other than length-2 sequences being passed in. Of greater concern, currently the region will be silently empty if the two bounds are not in ascending order, given the logic in _add_to_mask:

aospy/aospy/region.py

Lines 9 to 14 in dded426

    
           def _add_to_mask(data, lat_bounds, lon_bounds): 
        
               """Add mask spanning given lat-lon rectangle.""" 
        
               mask_lat = ((data[internal_names.LAT_STR] > lat_bounds[0]) & 
        
                           (data[internal_names.LAT_STR] < lat_bounds[1])) 
        
               return mask_lat & ((data[internal_names.LON_STR] > lon_bounds[0]) & 
        
                                  (data[internal_names.LON_STR] < lon_bounds[1]))

If e.g. lon_bounds[0] > lon_bounds[1], this will always be False.

This is easy enough to fix in the current paradigm where longitudes are assumed to be [0, 360) both in the region definition and the input data, e.g. assert lon_bounds[0] < lon_bounds[1] etc. But it becomes trickier when we try to relax those assumptions, which is the point of this issue.

Suppose the region was defined with longitude bounds (-20, 10). If these get shifted to the 0-360 grid, they become (340, 10). So this will yield the empty region as just discussed. Sorting the longitudes also doesn't help, because then you've inverted the region: turning (340, 10) into (10, 340) makes the region span 330 degrees longitude (from 10E to 20W), rather than 30 degrees (from 20W to 10E)

So perhaps the solution is to not shift the region definitions to the (0, 360) grid, and instead shift the input data's longitudes to whatever 360 degree interval on which the Region is defined.

spencerkclark · 2018-03-23T12:49:20Z

Suppose the region was defined with longitude bounds (-20, 10). If these get shifted to the 0-360 grid, they become (340, 10).

What if we went with the convention that the left value always represents the "western" boundary of the region and the right value always represents the "eastern" boundary? Essentially we would need to add logic internally to interpret things like (340, 10) as (340, 360) and (0, 10). I think you should shift the region definitions to whatever convention the underlying data uses (so in this example in the -180 to 180 convention, the region becomes simpler; it is just (-20, 10) in longitude).

What do you have in mind for how to determine which longitude convention the underlying data uses? In simple cases, I think you could do it automatically (e.g. if data from the global domain is present), but in a limited domain setup where not all longitudes are represented (to be fair these use-cases are uncommon), you could run into ambiguities. A keyword argument (something like lon_format) on the Model object could be used to disambiguate things.

spencerahill · 2018-03-23T19:06:29Z

In simple cases, I think you could do it automatically (e.g. if data from the global domain is present), but in a limited domain setup where not all longitudes are represented (to be fair these use-cases are uncommon), you could run into ambiguities.

This is a fair point. We already have one user (@chuaxr) running limited domain. (And not even on a lon/lat grid, which points to a bigger issue we should tackle at some point of generalizing our coordinates. @chuaxr, have you been doing any regional averaging? If so, what do your Region object definitions look like?)

Separately, even for global domains, we can't count on (0, 360) and (-180, 180) being the only ones, as some GFDL ocean models use (-280, 80). See e.g. /archive/Hannah.Zanowski/CM2.6/ocean/annual_1yr/ocean.0001.ann.nc

What if we went with the convention that the left value always represents the "western" boundary of the region and the right value always represents the "eastern" boundary?

That's not a bad idea.

A keyword argument (something like lon_format) on the Model object could be used to disambiguate things.

Whether or not the longitudes are global/cyclic, we can always unambiguously identify W. Hem. and E. Hem. points. If they are cyclic, then we should adapt regions to the grid accordingly. If not, then we'll have to check that the region is fully defined on the given grid, c.f. #230.

So perhaps something like lon_cyclic? I agree that Model seems like the appropriate place for it.

spencerkclark · 2018-03-24T11:07:57Z

Whether or not the longitudes are global/cyclic, we can always unambiguously identify W. Hem. and E. Hem. points. If they are cyclic, then we should adapt regions to the grid accordingly. If not, then we'll have to check that the region is fully defined on the given grid, c.f. #230.

Good point! Yes, I think lon_cyclic would be a good name for the attribute on a Model object which would toggle this behavior.

spencerahill · 2018-03-24T17:00:13Z

Just thought of an additional wrinkle: Region objects can be useful outside of the standard aospy main script pipeline --- I have a project where I'm using them to compute regional averages on aospy ts output after the fact (because I first needed to reinterpolate the gridded data to a new grid before regionally averaging). There's no reason this couldn't be generalized to non-aospy-outputted data: the only things that the Region methods require of the input data array are: sfc_area and land_mask attributes, and LON_STR and LAT_STR as the names of the lon and lat dimensions, respectively.

I still think that lon_cyclic makes sense as a Proj/Model/Run attribute, but I think we should also add it as a kwarg to the Region methods, e.g. lon_cyclic=None:

If None, assume it's aospy data, and find it accordingly. If it can't be found, then raise.
If True or False, adapt the region mask accordingly.

While I'm at it, I'll add checks for sfc_area, land_mask, and the lat/lon strings also. The end result will be that the mask_var, ts, av, and std methods of Region can all accept non-aospy data.

Does that sound like a good idea?

spencerkclark · 2018-03-24T20:21:32Z

Yeah I like it; that sounds great!

chuaxr · 2018-03-24T23:16:47Z

At the moment, the only Region I have defined is the whole domain. I mainly use this for the purpose of domain-averaged time-series. (As I recall, I ran into problems with the regular time series average trying to group by year.)

 domain = Region(
     name='domain',
     description='Entire domain',
     lat_bounds=(-1,2), #larger than 0,1
     lon_bounds=(-1,2), #larger than 0,1
     do_land_mask=False
  )

Note that I do the following in my pre-processing function:

ds.coords['lat']=('south_north',np.linspace(0,1,sn_size))
ds.coords['lon']=('west_east',np.linspace(0,1,we_size))

spencerahill · 2018-03-25T21:26:45Z

Thanks @chuaxr , that's helpful to know.

Let us know if you come across the problem again you're referring to.

chuaxr · 2018-03-26T01:44:22Z

@spencerahill I encountered the time-averaging problem within the last month, and used the domain region as a workaround. This is probably related to #252.

spencerahill · 2018-04-23T16:28:42Z

What if we went with the convention that the left value always represents the "western" boundary of the region and the right value always represents the "eastern" boundary?

I am considering a breaking change to make this more explicit: switch from Region.lon_bounds and Region.lat_bounds to west_bound, east_bound, south_bound, and north_bound. This will prevent any confusion. But it also means that all existing Region objects anyone has defined will have to be modified. I think it's worth it, though...I have ~20 Region objects defined, and I suspect it will take <1 min per object to update.

Another consideration: As I mentioned above, at some point we probably want to generalize from the inherently lat-lon framework to be more general. Hard-coding in west, east, etc. would not help in that regard. But I think that' effort is far away enough to stick with the existing convention for now.

I'll go ahead with this approach unless anybody strongly objects.

spencerkclark · 2018-04-23T16:33:08Z

I am considering a breaking change to make this more explicit: switch from Region.lon_bounds and Region.lat_bounds to west_bound, east_bound, south_bound, and north_bound. This will prevent any confusion.

Yes, I like that idea; it shouldn't be too hard for me to modify things in my object library either.

spencerahill added bug Region labels Nov 8, 2017

spencerahill added this to the before v1.0 milestone Nov 8, 2017

spencerahill mentioned this issue Nov 8, 2017

Regional reductions should raise if the region isn't fully defined on the given array #230

Open

spencerahill mentioned this issue Mar 25, 2018

Add to docs description of using aospy w/ non-global lat/lon data #261

Open

chuaxr mentioned this issue Mar 31, 2018

Unintuitive behavior in taking a time-series of sub-monthly data #262

Open

This was referenced Apr 23, 2018

Region: explicit W/E/S/N bounds and cyclic longitude support; Longitude class and longitude module #266

Merged

Proposal: utils.longitude module and Longitude class #272

Closed

spencerahill closed this as completed in #266 May 2, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Region definitions assume [0, 360] longitudes range #229

Region definitions assume [0, 360] longitudes range #229

spencerahill commented Nov 8, 2017 •

edited

Loading

spencerahill commented Mar 21, 2018

spencerahill commented Mar 23, 2018

spencerkclark commented Mar 23, 2018

spencerahill commented Mar 23, 2018

spencerkclark commented Mar 24, 2018

spencerahill commented Mar 24, 2018

spencerkclark commented Mar 24, 2018

chuaxr commented Mar 24, 2018 •

edited by spencerahill

Loading

spencerahill commented Mar 25, 2018

chuaxr commented Mar 26, 2018

spencerahill commented Apr 23, 2018

spencerkclark commented Apr 23, 2018

Region definitions assume [0, 360] longitudes range #229

Region definitions assume [0, 360] longitudes range #229

Comments

spencerahill commented Nov 8, 2017 • edited Loading

spencerahill commented Mar 21, 2018

spencerahill commented Mar 23, 2018

spencerkclark commented Mar 23, 2018

spencerahill commented Mar 23, 2018

spencerkclark commented Mar 24, 2018

spencerahill commented Mar 24, 2018

spencerkclark commented Mar 24, 2018

chuaxr commented Mar 24, 2018 • edited by spencerahill Loading

spencerahill commented Mar 25, 2018

chuaxr commented Mar 26, 2018

spencerahill commented Apr 23, 2018

spencerkclark commented Apr 23, 2018

spencerahill commented Nov 8, 2017 •

edited

Loading

chuaxr commented Mar 24, 2018 •

edited by spencerahill

Loading