Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of CRS objects in coordinates #8

Closed
djhoese opened this issue May 21, 2019 · 9 comments
Closed

Handling of CRS objects in coordinates #8

djhoese opened this issue May 21, 2019 · 9 comments
Assignees
Labels
question Further information is requested

Comments

@djhoese
Copy link
Contributor

djhoese commented May 21, 2019

I started looking more at 2D coordinate arrays (especially when they are dask arrays) and using a pyproj CRS object as a coordinate and ran in to a weird situation. I'm hoping @snowman2 or someone else can help me decipher what is happening here.

from pyproj import CRS
import xarray as xr
import dask.array as da

crs1 = CRS.from_string('+proj=lcc +datum=WGS84 +lon_0=-95 +lat_0=25 +lat_1=25')
crs2 = CRS.from_string('+proj=lcc +datum=WGS84 +lon_0=-95 +lat_0=35 +lat_1=35')

a = xr.DataArray(da.zeros((5, 5), chunks=2), dims=('y', 'x'), coords={'yc': (('y', 'x'), da.zeros((5, 5), chunks=3)), 'xc': (('y', 'x'), da.zeros((5, 5), chunks=3)), 'crs': crs1})
b = xr.DataArray(da.zeros((5, 5), chunks=2), dims=('y', 'x'), coords={'yc': (('y', 'x'), da.zeros((5, 5), chunks=3)), 'xc': (('y', 'x'), da.zeros((5, 5), chunks=3)), 'crs': crs2})
c = xr.DataArray(da.zeros((5, 5), chunks=2), dims=('y', 'x'), coords={'yc': (('y', 'x'), da.zeros((5, 5), chunks=3)), 'xc': (('y', 'x'), da.zeros((5, 5), chunks=3)), 'crs': crs1})

# Adding DataArrays with different CRS coordinates
a + b

# Results in:
# <xarray.DataArray 'zeros-e5723e7f9121b7ac546f61c19dabe786' (y: 5, x: 5)>
# dask.array<shape=(5, 5), dtype=float64, chunksize=(2, 2)>
# Coordinates:
#     yc       (y, x) float64 dask.array<shape=(5, 5), chunksize=(3, 3)>
#     xc       (y, x) float64 dask.array<shape=(5, 5), chunksize=(3, 3)>
# Dimensions without coordinates: y, x

# Adding DataArrays with the same CRS coordinates
a + c

# Results in:
# <xarray.DataArray 'zeros-e5723e7f9121b7ac546f61c19dabe786' (y: 5, x: 5)>
# dask.array<shape=(5, 5), dtype=float64, chunksize=(2, 2)>
# Coordinates:
#     yc       (y, x) float64 dask.array<shape=(5, 5), chunksize=(3, 3)>
#     xc       (y, x) float64 dask.array<shape=(5, 5), chunksize=(3, 3)>
#     crs      object +proj=lcc +datum=WGS84 +lon_0=-95 +lat_0=25 +lat_1=25 +type=crs
# Dimensions without coordinates: y, x

I would have expected a + b to not be allowed since they differ and are not equal. Thoughts? Is this expected?

@djhoese djhoese added the question Further information is requested label May 21, 2019
@djhoese djhoese self-assigned this May 21, 2019
@snowman2
Copy link
Collaborator

I would have expected a + b to not be allowed since they differ and are not equal.

Interesting. It appears that the crs coordinate was lost in that operation. So, in that sense it failed. It must have dropped the parts that weren't equal and kept the parts that were equal.

Well, that is the extent of my insight on this one for now. Glad to see this moving forward 👍

@djhoese
Copy link
Contributor Author

djhoese commented May 29, 2019

New discovery tonight while working on something else, xarray makes the coordinates DataArrays no matter what and in the case of an object it becomes an object "scalar" array. So you have to do data_arr.coords['crs'].item() to get the original CRS object back. Maybe this is a problem and/or the reason the CRS gets dropped in the calculations described above.

<xarray.DataArray 'crs' ()>
array(<CRS: +proj=geos +a=6378137.0 +b=6356752.31414 +h=357860 ...>
Name: unknown
Ellipsoid:
- semi_major_metre: 6378137.00
- semi_minor_metre: nan
- inverse_flattening: 298.26
Area of Use:
- UNDEFINED
Prime Meridian:
- longitude: 0.0000
- unit_name: degree
- unit_conversion_factor: 0.01745329
Axis Info:
- Easting[E] (east) EPSG:9001 (metre)
- Northing[N] (north) EPSG:9001 (metre)
, dtype=object)
Coordinates:
    crs      object +proj=geos +a=6378137.0 +b=6356752.31414 +h=35786023.0 +lon_0=-89.5 +sweep=x +units=m +type=crs

@djhoese
Copy link
Contributor Author

djhoese commented May 29, 2019

Another data point on figuring this out:

In [27]: b = xr.DataArray(da.zeros((5, 5), chunks=2), dims=('y', 'x'), coords={'y': da.arange(1, 6, chunks=3), 'x': da.arange(2, 7, chunks=3), 'crs': crs2, 'test': 2, 'test2': 2})         

In [28]: a = xr.DataArray(da.zeros((5, 5), chunks=2), dims=('y', 'x'), coords={'y': da.arange(1, 6, chunks=3), 'x': da.arange(2, 7, chunks=3), 'crs': crs1, 'test': 1, 'test2': 2})         

In [29]: a + b                                                                                                                                                                              
Out[29]: 
<xarray.DataArray 'zeros-e5723e7f9121b7ac546f61c19dabe786' (y: 5, x: 5)>
dask.array<shape=(5, 5), dtype=float64, chunksize=(2, 2)>
Coordinates:
  * y        (y) int64 1 2 3 4 5
  * x        (x) int64 2 3 4 5 6
    test2    int64 2

Basically, xarray drops any non-dimensional coordinates that are not equal. So, as xarray sits right now, there is no easy way to force the CRS coordinate to be checked as far as I can tell.

@dopplershift
Copy link

This seems like a bug in xarray.

@djhoese
Copy link
Contributor Author

djhoese commented Jun 3, 2019

@dopplershift Do you know if you've gotten coordinates to behave like this in the past? I created this issue a couple days ago: pydata/xarray#2996

@dopplershift
Copy link

No, but I remember the coordinate approach being advocated because it shouldn't disappear like that.

@djhoese
Copy link
Contributor Author

djhoese commented Jun 5, 2019

@dopplershift This page http://xarray.pydata.org/en/stable/computation.html#coordinates seems to match what I'm seeing. Specifically non-dimensional coordinates (like my crs and test and test2 coordinates above) will not be preserved if there are conflicts.

@dopplershift
Copy link

😠

The rationale in the docs makes sense, but that doesn't make me happy about it for this case.

@djhoese
Copy link
Contributor Author

djhoese commented Jul 10, 2023

I'm going to close this. I merged #26 today and it copies the behavior of rioxarray regarding CRS handling. That is, .geo.crs will give you a pyproj CRS object, but this is dynamically determined in most cases. If the exact instance of the accessor is still alive then a private ._crs will contain the pyproj CRS object that was found before.

If you, the user, would like to persist this discovery you can use .geo.write_crs to store the CRS information in a CF-friendly way (a grid_mapping variable/coordinate with WKT information). I'm still working on the documentation to reflect the changes done in #26, but while I wait for CI on other PRs I thought I'd start cleaning up some of these issues.

@djhoese djhoese closed this as completed Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants