
New infer_intervals keyword for pcolormesh #1079

Merged (9 commits) on Nov 10, 2016

Conversation

@fmaussion (Member):

Addresses #781

@jhamman what do you think? I'm not sure infer_interval_breaks is the best name for the keyword, but I couldn't come up with anything better. (Maybe infer_coord_intervals?)

@shoyer (Member) commented Nov 4, 2016

I would call this just infer_intervals. The "breaks" in the helper function name refers to an implementation detail of how we store the intervals (as breaks).
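
For context, here is a minimal 1D sketch of how N cell centers become N+1 breaks, by taking midpoints between neighbours and extrapolating half a step at each end (the function name is made up; the logic mirrors the 1D branch visible in the diff further down):

import numpy as np

def interval_breaks_1d(coord):
    coord = np.asarray(coord)
    # midpoints between neighbouring centers
    deltas = 0.5 * np.diff(coord)
    first = coord[0] - deltas[0]
    last = coord[-1] + deltas[-1]
    return np.r_[[first], coord[:-1] + deltas, [last]]

print(interval_breaks_1d([0., 10., 20., 30.]))  # [-5.  5. 15. 25. 35.]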

@ocefpaf (Contributor) commented Nov 4, 2016

Is infer_intervals creating coordinate bounds for plotting? It looks like it. What about when we do have cell boundaries in the dataset? (See http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#cell-boundaries). Will those be ignored and the inferred ones used?

(Sorry if I am only making noise here and this does not make sense. In that case just ignore my comment.)

@fmaussion (Member, Author):

Is infer_intervals creating coordinate bounds for plotting? It looks like it.

Yes, it is a special case for pcolormesh though

What about when we do have cell boundaries in the dataset?

Currently the entire xarray workflow is built upon the implicit assumption that the coordinates are always at the grid-point center (@shoyer, correct me if I'm wrong). I'm not sure if the additional complexity added by cell boundaries is on the xarray devs' priority list...

@ocefpaf (Contributor) commented Nov 4, 2016

Yes, it is a special case for pcolormesh though

A special case that I love. In the old Matlab days we had horrible hacks to get pcolor to work properly 😄

I'm not sure if the additional complexity added by cell boundaries is on the xarray devs' priority list...

That is fine. Maybe a warning in the docs would be nice, so people can revert to "manual" plotting to get the boundaries right.
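
For readers who do have CF-style cell boundaries in their files, the "manual" route could look roughly like this (a sketch only: the file name, the tas variable, and the lat_bnds/lon_bnds names are hypothetical placeholders following the CF convention linked above):

import numpy as np
import matplotlib.pyplot as plt
import xarray as xr

ds = xr.open_dataset('data_with_bounds.nc')  # hypothetical file with CF bounds

# turn the (N, 2) CF bounds arrays into N+1 cell-edge vectors
lon_edges = np.append(ds['lon_bnds'].values[:, 0], ds['lon_bnds'].values[-1, 1])
lat_edges = np.append(ds['lat_bnds'].values[:, 0], ds['lat_bnds'].values[-1, 1])

# bypass da.plot and give matplotlib the true cell edges directly
plt.pcolormesh(lon_edges, lat_edges, ds['tas'].isel(time=0).values)
plt.colorbar()
plt.show()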

@fmaussion (Member, Author):

In the old Matlab days we had horrible hacks to get pcolor to work properly

pcolormesh and imshow are indeed very consistent in xarray:

import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import xarray as xr

da = xr.DataArray(np.arange(20).reshape(4, 5), dims=['lat', 'lon'],
                  coords={'lat': np.linspace(0, 30, 4),
                          'lon': np.linspace(-20, 20, 5)})

f = plt.figure(figsize=(5, 8))

ax = plt.subplot(3, 1, 1, projection=ccrs.PlateCarree())
da.plot.imshow(ax=ax, transform=ccrs.PlateCarree())
lon, lat = np.meshgrid(da.lon.values, da.lat.values)
ax.scatter(lon, lat, transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_title('imshow')

ax = plt.subplot(3, 1, 2, projection=ccrs.PlateCarree())
da.plot.pcolormesh(ax=ax, transform=ccrs.PlateCarree())
ax.scatter(lon, lat, transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_title('pcolormesh')

ax = plt.subplot(3, 1, 3, projection=ccrs.PlateCarree())
da.plot.pcolormesh(ax=ax, transform=ccrs.PlateCarree(), infer_intervals=False)
ax.scatter(lon, lat, transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_title('pcolormesh - no intervals')

[figure_1: imshow, pcolormesh, and pcolormesh without inferred intervals]

In most cases setting infer_intervals=False won't be a good idea, but for some exotic projections (at the pole in @jhamman's example) it might be easier this way (albeit a bit "wrong").

@shoyer (Member) commented Nov 4, 2016

We definitely ignore cell boundaries -- they don't (yet) have any place in the xarray data model.

It would be nice to have a built-in "interval" data type, along with an IntervalIndex, that we could use for storing and doing lookups along interval axes. At one point I was working on this for pandas (see pandas-dev/pandas#7640), but I abandoned the effort when it seemed like it would be too much work to finish.
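
For what it's worth, pandas did eventually grow an IntervalIndex. A minimal illustration of the kind of interval lookup being described (plain pandas, not something the xarray data model supports here):

import pandas as pd

# intervals stored via their breaks, as discussed above
idx = pd.IntervalIndex.from_breaks([0.0, 10.0, 20.0, 30.0])
print(idx.get_loc(12.5))  # 1 -> the interval (10.0, 20.0]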

@ocefpaf (Contributor) commented Nov 4, 2016

We definitely ignore cell boundaries -- they don't (yet) have any place in the xarray data model.

Even though I would love to have that functionality, I do not believe it is high priority. Cell boundaries are a corner case, at least in my field of work, and 99% of the time it is OK to infer the intervals. Also, the name of the keyword is quite clear: infer_intervals.

Maybe you could just add a note in the docs mentioning that cell boundaries might exist?

"""
Pseudocolor plot of 2d DataArray

Wraps matplotlib.pyplot.pcolormesh
"""

if not hasattr(ax, 'projection'):
if infer_intervals:

Member:

I have mixed feelings about this change. I'm not sure I like infer_intervals defaulting to True. This is going to break a bunch of our plotting code. My preference, and I'm open to other ideas at this point, would be to default to infer_intervals=None with something like this logic:

if (infer_intervals is None) and hasattr(ax, 'projection'):
    infer_intervals = False
    warnings.warn("infer_intervals=None is deprecated, please specify either True or False",
                  DeprecationWarning)
if infer_intervals:
    ...

Member (Author):

I'm not sure I understand... infer_intervals=True was the de facto default until v0.7.2, and from the example above it is clear that setting it to False is almost never what you want to do. Can you provide an example where this is a good thing?

I share your concerns about breaking old code, though. How about keeping the current behavior for one more version, but issuing a warning that in the next version (probably 0.9.1 or 0.10) it is going to be set to True by default?

For my purposes this means I'll be using imshow() rather than pcolormesh for the time being.

Member:

Setting infer_intervals=False is exactly what you want if you have irregularly spaced coordinates (see example here: #781 (comment)).

I think something like the logic I showed above may be a good compromise. I'm happy to transition to always specifying infer_intervals explicitly, but we should be more transparent about the change (hence the warning, along with None as the default value). Would you be happier with this logic?

if infer_intervals is None:
    if hasattr(ax, 'projection'):
        infer_intervals = False
        warnings.warn("infer_intervals=None is deprecated, please specify either True or False",
                      DeprecationWarning)
    else:
        infer_intervals = True
if infer_intervals:
    ...

Member (Author):

OK, with irregular coordinates none of the solutions is satisfying. infer_intervals=True crops one column, while infer_intervals=False crops one column and one row of data:

import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import xarray as xr

lon, lat = np.linspace(-20, 20, 5), np.linspace(0, 30, 4)
lon, lat = np.meshgrid(lon, lat)

da = xr.DataArray(np.arange(20).reshape(4, 5), dims=['y', 'x'],
                  coords={'lat': (('y', 'x'), lat),
                          'lon': (('y', 'x'), lon)})

f = plt.figure()

ax = plt.subplot(2, 1, 1, projection=ccrs.PlateCarree())
da.plot.pcolormesh('lon', 'lat', ax=ax, transform=ccrs.PlateCarree(), infer_intervals=True)
ax.scatter(lon, lat, transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_title('pcolormesh')

ax = plt.subplot(2, 1, 2, projection=ccrs.PlateCarree())
da.plot.pcolormesh('lon', 'lat', ax=ax, transform=ccrs.PlateCarree(), infer_intervals=False)
ax.scatter(lon, lat, transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_title('pcolormesh - no intervals')

plt.show()

[figure_1-1: pcolormesh with and without inferred intervals on 2D coordinates]

I'll try to have a closer look later next week

@fmaussion changed the title from "New infer_interval_breaks keyword for pcolormesh" to "New infer_intervals keyword for pcolormesh" on Nov 4, 2016
@fmaussion (Member, Author) commented Nov 6, 2016

After giving this some more thought:

  1. If the coordinates are 1D (i.e. the grid is regular in the scipy sense: the coordinates don't have to be evenly spaced), infer_intervals should be set to True by default. This should make everybody happy (including @jhamman and me), and it will cover the vast majority of geoscientific grids.
  2. If the coordinates are 2D, what xarray was doing until now was wrong: _infer_interval_breaks was inferring intervals on the first axis only and ignoring the second, so regardless of whether we were plotting on a projection or not, the plot output was cropped (see example above).

So what do we do with this? @jhamman's suggestion is to set infer_intervals to False, which is equivalent to saying: "this plot is going to be a bit wrong, but this is not xarray's problem". I guess that's fine to do, but we'd have to be happy with the following:

import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import xarray as xr

lon, lat = np.linspace(-20, 20, 5), np.linspace(0, 30, 4)
lon, lat = np.meshgrid(lon, lat)
lon += lat/10
lat += lon/10
da = xr.DataArray(np.arange(20).reshape(4, 5), dims=['y', 'x'],
                  coords={'lat': (('y', 'x'), lat),
                          'lon': (('y', 'x'), lon)})

ax = plt.subplot(1, 2, 1, projection=ccrs.PlateCarree())
da.plot.pcolormesh('lon', 'lat', ax=ax, transform=ccrs.PlateCarree(), infer_intervals=False)
ax.scatter(lon, lat, transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_title('2d irreg case - pcolor')

ax = plt.subplot(1, 2, 2, projection=ccrs.PlateCarree())
da.plot.contourf('lon', 'lat', ax=ax, transform=ccrs.PlateCarree())
ax.scatter(lon, lat, transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_title('2d irreg case - contourf')

[figure_1-3: 2D irregular case, pcolormesh vs. contourf]

Note that one row and one column of data disappeared (so the colorbar range doesn't correspond to what we see) and the coordinates are not understood as being at the cell center. The advantages of this choice are twofold:

  • it's broadly consistent with how contourf interprets the coordinates (see example above)
  • if we agree on "1D = infer by default" and "2D = don't infer by default", we might even get rid of the new keyword altogether.

Let's call this option "Option A".

[edited]

I've been playing around with an "Option B", which includes making _infer_interval_breaks accept an axis kwarg. Regardless of the implementation, we could reach the following result:

[figure: Option B result, with intervals inferred on both axes]

[/edited]

This plot is closer to what I think the grid actually represents, but I am biased by my own usage of xarray (gridded geoscientific models). The disadvantages of option B are:

  • it adds a bit of complexity
  • if you think about it, the problem of plotting irregular grids is not really an xarray problem. If there were a perfect way to handle irregular grids without bounded coordinates, I guess matplotlib would have a tool for it.

I currently lean towards option A, since option B is a bit awkward and users who have bounded coordinates available somewhere (as suggested by @ocefpaf) will have to implement their own plotting routine anyway.

@fmaussion (Member, Author):

I've come up with a cleaner implementation which doesn't require scipy and produces a very accurate plot:
[figure: result of the cleaner implementation]

I'm going to clean all this up later today and update my PR; I think we can come to something nice.

@jhamman: I wonder if the problem you are having in your example is related to floating point issues at the pole. Would you mind sending me the file and the code that produces the plot so that I can have a look at it?

@jhamman (Member) commented Nov 7, 2016

You can get data on the same grid (different variable, same coords) from the sample data ds = xr.tutorial.load_dataset('rasm').isel(time=0). From there, my example should work.
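
For anyone wanting to reproduce this, a sketch of what that could look like (the Tair variable and the 2D xc/yc coordinate names are assumptions based on the tutorial dataset, and the projection is just an example):

import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import xarray as xr

ds = xr.tutorial.load_dataset('rasm').isel(time=0)

ax = plt.axes(projection=ccrs.PlateCarree())
# plot against the assumed 2D coordinates, without inferring intervals
ds.Tair.plot.pcolormesh('xc', 'yc', ax=ax, transform=ccrs.PlateCarree(),
                        infer_intervals=False)
ax.coastlines()
plt.show()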

@fmaussion (Member, Author):

@jhamman @shoyer this is ready for another review (the test failure seems unrelated).

@ocefpaf I added a note to the docs

@jhamman OK, now I get why it clearly doesn't make sense to infer anything in your dataset (plot below). Just out of curiosity: does RASM have a map projection on which its coordinates are regular, too?

[figure_1-1: plot of @jhamman's RASM dataset]

@fmaussion (Member, Author):

Actually, you wouldn't want to infer intervals on these coordinates at all, regardless of whether you are on a map or not. Should I remove the if hasattr(ax, 'projection') altogether?

@fmaussion (Member, Author):

Should I remove the if hasattr(ax, 'projection') altogether?

In the end, I think it's OK to keep it as it is now, since data like the above will almost always be plotted on a map anyway.

@rabernat (Contributor) commented Nov 8, 2016

This looks like a really nice PR.

There is some intersection with my "multidimensional coordinates" tutorial:
http://xarray.pydata.org/en/stable/examples/multidimensional-coords.html

At the least, we should standardize the terminology in the docs. Are these "irregular coordinates" or "multidimensional coordinates"? I don't particularly care, but we should be consistent.

@fmaussion (Member, Author):

@rabernat thanks for the link! I'll update the docs with a link to your example and with "multidimensional" instead of "irregular".

@fmaussion (Member, Author) commented Nov 8, 2016

@rabernat: couldn't your tutorial also use the IPython directive and run at build time instead of using parsed literals? I see that the other examples are also static; is there a reason for this?

@rabernat (Contributor) commented Nov 8, 2016

is there a reason for this?

My laziness / ignorance of the documentation build process.

@fmaussion (Member, Author):

OK. Let's see what the gurus say about this; maybe there is a reason to keep it like this.

@shoyer (Member) commented Nov 8, 2016

Yes, we could certainly run this at doc build time instead of using a notebook. The advantage of the notebook is that it's easier to edit/assemble, and also easier to download and adjust.

In an ideal world, I think we would check in a notebook with cleared output and then run it and convert it to HTML at doc build time. It looks like that's exactly what nbsphinx does, if anyone wants to have a go at setting that up. It could be a much nicer way to write the docs than RST.

@fmaussion (Member, Author) commented Nov 8, 2016

Note that in that case, multidimensional-coords and monthly-means are already formatted in rst.

@shoyer (Member) commented Nov 8, 2016

Right, we used a static ipynb -> rst converter

@jhamman (Member) left a review:

I'm fine with this fix. I know my suggestions led to a bunch of complicated logic, but I think this is the best solution for now. Thanks for putting in the hard work to get here.

last = coord[-1] + deltas[-1]
return np.r_[[first], coord[:-1] + deltas, [last]]
if axis == 0:
    deltas = 0.5 * (coord[1:, ...] - coord[:-1, ...])

Member:

You could write an axis-agnostic version of this with np.diff, np.take and np.concatenate, e.g.,

deltas = 0.5 * np.diff(coord, axis)
first = np.take(coord, [0], axis)
last = np.take(coord, [-1], axis)
trim_last = tuple(slice(None, -1) if n == axis else slice(None)
                  for n in range(coord.ndim))
return np.concatenate([first, coord[trim_last] + deltas, last], axis)

Maybe a slight improvement? (Up to you on whether to use it)

first = coord[0] - deltas[0]
last = coord[-1] + deltas[-1]
return np.r_[[first], coord[:-1] + deltas, [last]]
if axis == 0:

Member:

Add an assert or error to verify that axis is one of the expected values.
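
A minimal version of the suggested guard might be (hypothetical; the axis-agnostic rewrite discussed below made an explicit branch on axis unnecessary):

if axis not in (0, 1):
    raise ValueError('axis must be 0 or 1, got %r' % axis)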

else:
    # we have to infer the intervals on both axes
    x = _infer_interval_breaks(x, axis=1)
    x = _infer_interval_breaks(x, axis=0)

Member:

clever!

@fmaussion (Member, Author):

I had to modify your code a little bit for this to work, but it now accepts any axis. Quite a useful function, actually!

last = np.take(coord, [-1], axis=axis) + np.take(deltas, [-1], axis=axis)
trim_last = tuple(slice(None, -1) if n == axis else slice(None)
                  for n in range(coord.ndim))
return np.concatenate([first, coord[trim_last] + deltas, last], axis=axis)

Contributor:

Nifty! Loved the use of axis here!
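
Putting the fragments together, the axis-aware helper presumably ends up looking roughly like this (a sketch reconstructed from the snippets in this thread, not necessarily identical to the merged code):

import numpy as np

def _infer_interval_breaks(coord, axis=0):
    # infer N+1 interval breaks from N cell centers along `axis`
    coord = np.asarray(coord)
    deltas = 0.5 * np.diff(coord, axis=axis)
    first = np.take(coord, [0], axis=axis) - np.take(deltas, [0], axis=axis)
    last = np.take(coord, [-1], axis=axis) + np.take(deltas, [-1], axis=axis)
    trim_last = tuple(slice(None, -1) if n == axis else slice(None)
                      for n in range(coord.ndim))
    return np.concatenate([first, coord[trim_last] + deltas, last], axis=axis)

# 2D usage, as in the diff above: infer breaks along each axis in turn
lon2d, lat2d = np.meshgrid(np.linspace(-20, 20, 5), np.linspace(0, 30, 4))
x = _infer_interval_breaks(lon2d, axis=1)
x = _infer_interval_breaks(x, axis=0)
print(x.shape)  # (5, 6): one more break than centers along each axis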

@@ -112,6 +112,17 @@ def test__infer_interval_breaks(self):
        self.assertArrayEqual(pd.date_range('20000101', periods=4) - np.timedelta64(12, 'h'),
                              _infer_interval_breaks(pd.date_range('20000101', periods=3)))

        # make a bounded 2D array that we will center and re-infer
        xref, yref = np.meshgrid(np.arange(6), np.arange(5))
        cx = (xref[1:, 1:] + xref[:-1, :-1]) / 2

Member:

On Python 2, I think this will still have integer dtype. Can you divide by 2.0 instead?

Member (Author):

I added a from __future__ import division instead
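
For anyone wondering why that matters: on Python 2, dividing integers (or integer NumPy arrays) with / performs floor division unless the __future__ import is in effect.

from __future__ import division

print((3 + 4) / 2)  # 3.5 with the import (and on Python 3);
                    # without it, Python 2 prints 3 (floor division)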

@@ -1,3 +1,5 @@
from __future__ import division

Member:

Can you add this to the top of literally every single Python file in xarray?

That will keep this easier to reason about.

Member (Author):

:D sure

Member:

Actually:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

Member (Author):

Yes, OK. Shouldn't I do this in a separate PR? I'll do it tomorrow though; it's getting late in Innsbruck...

@shoyer (Member) commented Nov 10, 2016

Yes, thanks!
