Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modifications to CRS handling in xarray_.py provider #1578

Open
sjordan29 opened this issue Mar 4, 2024 · 4 comments
Open

Modifications to CRS handling in xarray_.py provider #1578

sjordan29 opened this issue Mar 4, 2024 · 4 comments
Labels
bug Something isn't working
Milestone

Comments

@sjordan29
Copy link
Contributor

Description
These lines of code in the xarray provider are causing issues for some of the EDR datasets I'm trying to serve.

I'm working with CF-compliant zarrs/netCDFs, and I see a few issues as the current code base and its ability to work with a broad range of datasets.

  1. epsg_code is not a required attribute for CF compliance, so this section of the code can result in a failure to build if you're working with a dataset with a CRS variable that does not have this attribute.
  2. The link we're filling in with EPSG codes ( http://www.opengis.net/def/crs/OGC/1.3/) doesn't actually reference any EPSG codes. Should this instead be referring to https://www.opengis.net/def/crs/EPSG/0/?
  3. Some datasets we would like to serve with pygeoapi are WGS84 and they still have a crs variable. These lines of code would change the crs_type attribute to 'ProjectedCRS', even though it should remain as GeographicCRS.
  4. CF conventions also don't require any particular name for the data variable that holds the CRS information. I'd recommend (potentially after checking to see if there's a variable named crs), looking to see if any of the spatio-temporal variables have a grid_mapping attribute, and use the information instead. This would allow us to dynamically search for the variable that contains coordinate reference system information, in case it is not named crs.

Screenshots/Tracebacks
Here's the error I got:

2024-03-01 13:05:15   File "/mambaforge/lib/python3.11/site-packages/pygeoapi/provider/xarray_.py", line 94, in __init__
2024-03-01 13:05:15     raise ProviderConnectionError(err)
2024-03-01 13:05:15 pygeoapi.provider.base.ProviderConnectionError: 'DataArray' object has no attribute 'epsg_code'
2024-03-01 13:05:15 ERROR: openapi.yml could not be generated ERROR
@sjordan29 sjordan29 added the bug Something isn't working label Mar 4, 2024
@sjordan29
Copy link
Contributor Author

I've started drafting a solution to this on our fork within _get_coverage_properties with the following

        grid_mapping = self.parse_grid_mapping()
        if grid_mapping is not None:
            crs_attrs = pyproj.CRS.to_cf(self._data[grid_mapping].attrs)
            epsg_code = crs_attrs.to_epsg()
            if epsg_code == 4326:
                pass
            elif epsg_code is None:
                LOGGER.debug('No epsg code parsed. Default CRS used.')
            else:
                properties['bbox_crs'] = f'https://www.opengis.net/def/crs/EPSG/0/{epsg_code}'  # noqa
                properties['inverse_flattening'] = \
                    crs_attrs['inverse_flattening']
                if crs_attrs['grid_mapping_name'] != 'latitude_longitude':
                    properties['crs_type'] = 'ProjectedCRS'

        elif 'crs' in self._data.variables.keys():
            try:
                properties['bbox_crs'] = f'https://www.opengis.net/def/crs/EPSG/0/{self._data.crs.epsg_code}'  # noqa

                properties['inverse_flattening'] = self._data.crs.\
                    inverse_flattening

                properties['crs_type'] = 'ProjectedCRS'
            except KeyError:
                LOGGER.debug('Unable to parse CRS. Assuming default WGS84.')

Where

    def parse_grid_mapping(self):
        spatiotemporal_dims = (self.time_field, self.y_field, self.x_field)
        grid_mapping_name = None
        for var_name, var in self._data.variables.items():
            if all(dim in var.dims for dim in spatiotemporal_dims):
                try:
                    grid_mapping_name = self._data[var_name].attrs['grid_mapping']
                except KeyError:
                    LOGGER.debug()
        return grid_mapping_name

Some key changes / notes:

  • Rather than assuming the variable containing coordinate reference system information is in a variable named crs, we instead first look to see if there is grid_mapping information contained within the spatiotemporal variables. If there is, look for CRS information within that variable. This is aligned with CF conventions.
    • We can then try to parse the EPSG code using pyproj, which is already a requirement.
  • If no grid mapping exists, then I return to the original code where we look for a variable named crs, and pull the epsg code and inverse flattening from there. I added try/except logic, so that this would not necessarily fail if these attributes were not included, and it would just default back to WGS84.
    • I'm still wondering if we should not necessarily assume crs_type of ProjectedCRS in this case -- an xarray dataset could include a crs variable that is still a GeographicCRS.
  • I updated the opengis link when using EPSG codes to f'https://www.opengis.net/def/crs/EPSG/0/{self._data.crs.epsg_code} -- it was previously http://www.opengis.net/def/crs/OGC/1.3/{self._data.crs.epsg_code}, which doesn't appear to lead anywhere.

Question moving forward:

  • What if we can't parse an EPSG code from the CRS variable -- for example, in the case of custom CRS information (e.g., with a custom datum). Could we put use WKT or ProjJson representation in that case? This information is ultimately used in the coverage json response for the id (see spec here). All the spec says is the value of id MUST be a string and SHOULD be a common identifier for the reference system, so perhaps a WKT string representation would suffice?
    • I'm also considering datasets with longitudes ranging from 0-360 under this case (rather than -180 to 180) -- I'm struggling to find an EPSG code for such datasets.

Would love thoughts/ideas/feedback on this solution @dblodgett-usgs @tomkralidis. Also tagging @ptomasula @aufdenkampe and @kieranbartels for input / visibility.

@dblodgett-usgs
Copy link

Keep it simple is going to rule the day here. We should support a fairly narrow (yet common) set of conventions and allow a configuration override of either an epsg code or a WKT string. I think the current implementation is that fairly narrow (yet common) set of conventions atleast as a starting point.

Re: ProjectedCRS -- we should not assume projected if there is a CRS -- only if it is not a lon/lat CRS.

IMHO, 0-360 should nearly always be converted to -180 - 180 for use. If there isn't a set of utility functions that make that translation on the fly we'll either need to drop support for those datasets in EDR or get the utilities written.

If we can't parse any CRS info, we have to assume that the data are WGS84 or fall back on a configuration element that supports an epsg code or WKT string.

@sjordan29
Copy link
Contributor Author

Great -- sounds good @dblodgett-usgs. I will continue down this path, and will add in that configuration functionality.

I don't think utility functions exist to make the translation to -180-180, so those will need to get added. We can exclude 0-360 datasets in the short-term.

@webb-ben
Copy link
Member

@dblodgett-usgs pretty easy to implement that in python.

def normalize(angle):
  angle = angle % 360
  return angle if angle < 180 else angle - 360

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants