Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the "0 degree meridian" issue in subsetting #35

Closed
cehbrecht opened this issue Jul 24, 2020 · 7 comments
Closed

Fix the "0 degree meridian" issue in subsetting #35

cehbrecht opened this issue Jul 24, 2020 · 7 comments
Labels
bug Something isn't working

Comments

@cehbrecht
Copy link
Collaborator

  • clisops version: 0.3.0
  • Python version:
  • Operating System:

Description

The core.subset module is missing the implementation for "crossing 0 degree meridian".

f"Input longitude bounds ({kwargs[lon]}) cross the 0 degree meridian but"

What I Did

@cehbrecht cehbrecht added the bug Something isn't working label Jul 24, 2020
@cehbrecht
Copy link
Collaborator Author

@Zeitsperre why do we need a special implementation for this? I would suppose it works on xarray ...

@Zeitsperre
Copy link
Collaborator

From our experience, it doesn't work right out of the box. You could try implementing a cleaner version without the handling but I think your results will be cut at the median line or the dateline, depending on the GCS of the netCDF data.

@agstephens
Copy link
Collaborator

We might be able to use the xarray method roll. This allows you view the coordinate space from a different frame of reference:

http://xarray.pydata.org/en/stable/generated/xarray.DataArray.roll.html

Hence you could:

  1. Detect it longitude was 0 to 359 in the input dataset
  2. Roll it to -180 to 180 if required
  3. Then run the subset

@Zeitsperre
Copy link
Collaborator

If I recall correctly, the reason why we didn't want to use the "roll" approach was because of the high processing load needed when processing large (>250GB) NetCDF files. It made much more sense to slice up the vector file to better suit the shape of the NetCDF than to change all the NetCDF data with roll.

@ellesmith88
Copy link
Collaborator

@agstephens I've been implementing this but haven't opened a PR yet as I've spotted a few issues with what I've done and I'm looking into them. Here are the changes I've made https://github.com/roocs/clisops/compare/cross_prime_meridian_fix

The issues are:

  1. res = lon.values[1] - lon.values[0] # this doesn't work for test data

    I'm not sure how well this will work with the test data as they have only a few longitudes e.g. 0, 250

  2. # check if the request is in bounds - return ds if it is
    if lon_min <= low and lon_max >= high:
    return ds

    Here I am checking whether the request is within the bounds of the dataset - however it is very strict and if I ask for a subset (0, 100) but the minimum longitude is 0.01 this tries to roll the dataset. The same issue might arise in the case of subsetting again after a first subset in a workflow.

  3. diff = -180 - lon.values[0]

    I'm calculating how to roll to end up with longitude -180 to 180, so I think I need to change this to be more generic.

@ellesmith88
Copy link
Collaborator

Here's what I've come up with:

  1. If there's only one longitude - return the dataset as it is and don't try to roll. Don't think this solves the test issue but does look at the possible case of one longitude.

  2. Change the check to something like if (lon_min <= low or np.isclose(low, lon_min, atol=0.5)) and (lon_max >= high or np.isclose(high, lon_max, atol=0.5)):

  3. Instead of rolling the first longitude value to -180, roll it to the lower bound of the requested subset longitude bounds e.g.

low, high = lon_bnds

first_element_value = low
diff, offset = calculate_offset(lon, first_element_value)
ds_roll = ds.roll(shifts={f"{lon.name}": offset}, roll_coords=False)
ds_roll.coords[lon.name] = ds_roll.coords[lon.name] + diff
return ds_roll

with

def calculate_offset(lon, first_element_value):

    # get resolution of data
    res = lon.values[1] - lon.values[0]

    # calculate how many degrees to move by to have lon[0] of rolled subset as lower bound of request
    diff = first_element_value - lon.values[0]

    # work out how many elements to roll by to roll data by 1 degree
    index = 1 / res

    # calculate the corresponding offset needed to change data by diff
    offset = int(diff * index)

    return diff, offset

@agstephens
Copy link
Collaborator

@ellesmith88: I've read through the code and it's looking great!

Here are my responses to your three points:

  1. If there is only one longitude: it's a special case, which could probably be solved by:

    • if actual_lon is not in lon_bnds:
      • if lon_bnds[0] > actual_lon: new_lon = actual_lon + 360
      • if lon_bnds[1] < actual_lon: new_lon = actual_lon - 360
      • then reassign the lon coord([new_lon])
      • NOTE: no need to roll here because data array remains the same
  2. Change the check to something like if (lon_min <= low or np.isclose(low, lon_min, atol=0.5)) and (lon_max >= high or np.isclose(high, lon_max, atol=0.5)):

  • looks like a great plan
  1. Instead of rolling the first longitude value to -180, roll it to the lower bound of the requested subset longitude bounds:
  • yes, I think that is the best approach, it should deal with all cases.

Great stuff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants