Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use rasterio's transform instead of homemade coordinates #1712

Merged
merged 20 commits into from Jan 26, 2018

Conversation

fmaussion
Copy link
Member

@fmaussion fmaussion commented Nov 12, 2017

@fmaussion
Copy link
Member Author

OK, I think this is ready for review (@shoyer or @jhamman)

@shoyer
Copy link
Member

shoyer commented Nov 13, 2017

@maaleske would you mind taking a look at this to see if it looks right to you?

if LooseVersion(rasterio.__version__) < '1.0':
transform = riods.affine
else:
transform = riods.transform
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use attribute lookup fallback instead, e.g.,

try:
    transform = riods.transform
except AttributeError:
    transform = riods.affine

That's usually slightly more robust than explicitly checking versions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transform attribute exists on the older version as well, but there it is a tuple specifying the transform in GDAL format. Maybe it'd be better to check whether transform is an instance of Affine instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it'd be better to check whether transform is an instance of Affine instead?

I don't know... Reading the transform attribute with rio < 1.0 raises a warning, reading the affine attribute with rio >= 1.0 raises a warning, so we'll have to catch the warnings in order to test this. Would that be more elegant?

coords['x'] = np.linspace(start=x0 + dx/2, num=nx,
stop=(x0 + (nx - 1) * dx) + dx/2)
x, _ = (np.arange(nx), np.zeros(nx)) * transform
_, y = (np.zeros(ny), np.arange(ny)) * transform
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hardly an expert here... but is there a reason why this "obvious" version doesn't work?

x, y = (np.arange(nx), np.arange(ny)) * transform

(it might be worth explaining in a comment!)

Copy link
Contributor

@maaleske maaleske Nov 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would only work if the image was square (width == height), in which case the transform would be applied to the coordinates on the diagonal. The np.zeros version transforms the coordinate points of the edges (that could also probably be np.zeros_like?)

There's a slightly deeper problem here actually, since given a rotated /sheared dataset, we can in any case only generate correct coordinates for the edge pixels:

import numpy as np
from affine import Affine

transform = Affine.rotation(30.0)
x = np.arange(5)

(x, np.zeros_like(x)) * transform
# (array([ 0.        ,  0.8660254 ,  1.73205081,  2.59807621,  3.46410162]),
#  array([ 0. ,  0.5,  1. ,  1.5,  2. ]))

(x, np.ones_like(x)) * transform
# (array([-0.5       ,  0.3660254 ,  1.23205081,  2.09807621,  2.96410162]),
#  array([ 0.8660254,  1.3660254,  1.8660254,  2.3660254,  2.8660254]))

Unless there's some way to implement spatial indexing using the Affine transform, I think there should be a warning thrown when the transform has non-zero cross-coordinate elements.

Edit: The most general way would probably be this:

xy = np.meshgrid(np.arange(nx), np.arange(ny)) * transform

and make it a multidimensional coordinate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most general way would probably be this:
xy = np.meshgrid(np.arange(nx), np.arange(ny)) * transform
and make it a multidimensional coordinate.

Yes, but I'd like to avoid this when the coordinates are regular (again, I haven't met a single geotiff file which isn't regular in it's reference coordinate system: when defined in UTM for example, the x, y coordinates are 1D, the lon lats are not, as in the documentation example: http://xarray.pydata.org/en/latest/auto_gallery/plot_rasterio.html#sphx-glr-auto-gallery-plot-rasterio-py).

Do you know a way to know for sure if we need 2d coordinates or not?

(another compelling reason is that many geotiff files are huge, and having non-lazy 2d coordinates isn't an option)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could check for transform.is_rectilinear to determine whether 1d coordinates can be used.

I still think it would be best to always distinguish the geocoordinates from the pixel coordinates regardless of the dataset, i.e. x, y should always be from 0.5 to width/height, with possible other coordinates added if crs/transform is found. Is there a way to build a lazy 2d coordinate array that would work as a DataArray?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could check for transform.is_rectilinear to determine whether 1d coordinates can be used.

Thanks! Will have a look

I still think it would be best to always distinguish the geocoordinates from the pixel coordinates regardless of the dataset, i.e. x, y should always be from 0.5 to width/height

I don't understand why this would be useful. For pixel based selection xarray has .isel, and the coordinates are the correct representation of the data in space. Do you have an example?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this would be useful. For pixel based selection xarray has .isel, and the coordinates are the correct representation of the data in space. Do you have an example?

You're right that isel is enough for pixel selection (though then you don't have center-of-pixel pixel coordinates). I guess I'm used to having x and y always denote pixel coordinates, with any other coordinates being explicitly labeled, and I'm a bit worried about the meaning of the coordinate changing if I change the dataset transform without it being reflected in the coordinate name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I change the dataset transform without it being reflected in the coordinate name

If geotiffs had a general way to name the coordinates I would be glad to include it here. But the way I see it the [x, y] pair we are producing here can be understood as Cartesian coordinates in a given reference system.

Most of the cases I'm dealing with, the reference system is a proper map projection (in this case [x, y] are "eastings" and "northings", in m) or an equirectangular projection (in which case [x, y] are "longitudes" and "latitudes").

We agreed before (this comment) that xarray should not care about coordinate transforms because they're too specialized, but I think that parsing the cartesian coordinates is OK (provided that we are able to compute them in a robust way and know if the coords are 1D or 2D).

Would you be in favor of removing the coordinates altogether? (this is possible in the xarray data model).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am leaning slightly towards removal; I mostly deal with non-referenced images where each person will place (0,0) in a different corner of the image by default, so having less coordinates to worry about is usually for the better (especially since the rasterio convention on dealing with transformless data is not really clearly defined).

We agreed before (this comment) that xarray should not care about coordinate transforms because they're too specialized, but I think that parsing the cartesian coordinates is OK (provided that we are able to compute them in a robust way and know if the coords are 1D or 2D).

I agree on not caring about specialized stuff in xarray, but it does make it necessary to check that all the required data is accessible for doing it later (which is what I'm struggling a bit with #1583). If you do use the transform, I think you should commit to it; If I were doing stuff involving both rectilinear and non-rectilinear stuff, not having the code work on one of the two would raise my suspicions on it's correctness for the other case (and I'm not too sure if we aren't still missing something important w.r.t. the transform).

Also: There is probably a desync happening currently with the pixel center coordinates and the transform attribute; Since the coordinates are shifted by a half, shouldn't the reverse be applied to the relevant elements in the transform? Otherwise when trying to use the included transform for generating the lon/lat/etc. coordinates the results will be off.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am leaning slightly towards removal;

I think that for most geoscientific purposes, the coordinates are quite useful. Would a parse_coordinates=True/False kwarg be a possible compromise?

Also: There is probably a desync happening currently with the pixel center coordinates and the transform attribute; Since the coordinates are shifted by a half, shouldn't the reverse be applied to the relevant elements in the transform? Otherwise when trying to use the included transform for generating the lon/lat/etc. coordinates the results will be off.

I needed some time to understand what you mean here, but now I get it. I guess you are right, but at the moment users start to play with the transform attribute, they probably don't need our coordinates either (a possible case for the parse_coordinates kwarg).

Btw I replaced the code with the simpler:

        x, _ = (np.arange(nx)+0.5, np.zeros(nx)+0.5) * transform
        _, y = (np.zeros(ny)+0.5, np.arange(ny)+0.5) * transform

(i.e. no Affine shift), which does the same thing.

See also: rasterio/rasterio#1199

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmaussion Sorry, I had missed the shift you had in the transform previously (transform.translate(0.5, 0.5)); That was more correct, while the new one matches the old code with the desync issue. You are right that if the transform has already been used to generate the coordinates it should not really be needed anymore, and vice versa, but I would still expect it to match the coordinates I'm seeing regardless of the case, but doing the shift should be documented clearly.

I think that for most geoscientific purposes, the coordinates are quite useful. Would a parse_coordinates=True/False kwarg be a possible compromise?

I'm really fine with any version that doesn't generate incorrect coordinates, regardless of parameters. If you do add a parameter it might be better to name it geo_coords to be clear on what you are actually generating.

@fmaussion
Copy link
Member Author

I added a parse_coordinates=True kwarg and added a check for 2D coordinates.

I proposed an implementation for the 2D case, but it feels a bit clunky. A second option would be to not parse 2D coords and raise a Warning in case of parse_coordinates=True and 2D coordinates.
(I don't think that 2D coords happen very often).

Thoughts?

@snowman2
Copy link
Contributor

Would it be possible to put the coordinate generation into a separate function? It would be nice to pass in the affine, width, height, and is_2d as args and get back the spatial coordinates.

@fmaussion
Copy link
Member Author

@snowman2 I added an example in the code on how to do this - can be useful for downstream libraries

@maaleske @jhamman this is ready for review / opinions. This PR still closes #1686 in a backwards compatible manner and adds a special case for irregular coordinates. I don't think that 2D coordinates happen very often (never encountered one in my work), but xarray should handle them now. The PR also makes some minor other modifications.

@maaleske
Copy link
Contributor

@fmaussion Seems okay to me. The rasterio version check is a bit iffy but I can't think of a way to do it better.

I do agree with @snowman2 though, having the coordinate generation separately exposed would be nice (and easier to write unit tests for). Something like

def xy_pixel_coordinates(nx, ny, transform=None)
    transform = Affine.identity if not transform else transform

    # xarray coordinates are pixel centered
    x, _ = (np.arange(nx)+0.5, np.zeros(nx)+0.5) * transform
    _, y = (np.zeros(ny)+0.5, np.arange(ny)+0.5) * transform
    return x,y

would probably suffice, though I don't know which module this would fall under. I'd imagine it might be useful for backends besides rasterio, though.

@fmaussion
Copy link
Member Author

Thanks @maaleske for having a look.

having the coordinate generation separately exposed would be nice

I agree, but this could be done in a separate PR as this needs another kind of discussion: do we want these kind of domain-specific tools in the codebase (and where?), and is Affine sufficiently used to justify this? Here we use Affine because it is a rasterio dependency, but I don't know if it is used outside of it.

@fmaussion fmaussion requested a review from jhamman January 3, 2018 16:44
@fmaussion
Copy link
Member Author

I wonder if someone from @pydata/xarray wants to have a look -- if possible I'd like to get this in before the next point release. Thanks!

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have minor comments but generally this looks good to me

transform = riods.transform
if transform.is_rectilinear:
# 1d coordinates
parse = True if (parse_coordinates is None) else parse_coordinates
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you could remove parentheses if you like here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# Create a geotiff file
with warnings.catch_warnings():
# This warning is expected
warnings.filterwarnings('ignore', category=UserWarning)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which specific warning is expected here?

pytest.assert_warns() might be more appropriate, if we want to test the warning as part of the API

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's rasterio which sends a warning here. I explain it in more detail know, and catch only this specific warning.

@fmaussion
Copy link
Member Author

Thanks for the review @shoyer ! I will wait for #1817 to be merged before going on here.

@fmaussion fmaussion merged commit 015daca into pydata:master Jan 26, 2018
@fmaussion
Copy link
Member Author

All green! Will try to get to the tests in the upcoming days

@fmaussion fmaussion mentioned this pull request Feb 6, 2018
4 tasks
snowman2 added a commit to erdc/gazar that referenced this pull request Jul 15, 2018
snowman2 added a commit to erdc/gazar that referenced this pull request Jul 15, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants