New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Image should warn about unevenly spaced dimensions #1869

Closed
rabernat opened this Issue Sep 12, 2017 · 27 comments

Comments

Projects
None yet
5 participants
@rabernat

rabernat commented Sep 12, 2017

Gridded scientific datasets commonly have dimensions that are unevenly spaced. (For example, an ocean temperature dataset has more vertical levels near the surface than in the deep.) We can easily visualize these datasets correctly with matlplotlib using a pcolormesh or contour plot. But I am unsure about the status of support for such datasets in holoviews. The documentation mentions support for irregularly sampled continuous coordinates, but I couldn't find a clear example.

Here is an example of what I mean using xarray. Consider the following dataset

import xarray as xr
import numpy as np
# monotonically increasing but unevenly spaced coordinates
x = np.arange(20)**2
y = np.arange(10)**2
z = y[:,np.newaxis] * x
data = xr.DataArray(z,
            dims=['y', 'x'],
            coords={'y': y, 'x': x},
            name='randomdata')
data.plot()

xarray renders this correctly, with grid boxes of variable size
image

However, in holoviews, the irregular spacing seems to be lost when I convert to image

import holoviews as hv
%%opts Image (cmap='viridis')
hv.Image(data)

image
all the grid boxes are the same size.

Am I missing something obvious?

@jbednar

This comment has been minimized.

Member

jbednar commented Sep 12, 2017

@rabernat

This comment has been minimized.

rabernat commented Sep 12, 2017

Great @jbednar, hv.QuadMesh((x, y, z[:-1,:-1])) works! I would say that qualifies as "something obvious". 😝

From a user perspective, coming from xarray to holoviews, I guess I would have liked to get some kind of error or warning when trying to create an image with unevenly spaced dimensions.

@jbednar

This comment has been minimized.

Member

jbednar commented Sep 12, 2017

No problem. That does look like a case we should warn, which is even clearer if you increase the nonlinearity in your example (e.g. **7 instead of **2).

@philippjfr philippjfr changed the title from images with unevenly spaced dimensions to Image should warn about unevenly spaced dimensions Sep 12, 2017

@philippjfr

This comment has been minimized.

Member

philippjfr commented Sep 12, 2017

Repurposed this issue to add such a warning, the warning/error should presumably also mention QuadMesh.

@jbednar

This comment has been minimized.

Member

jbednar commented Sep 12, 2017

I'm not sure how strict we should be on "evenly spaced"; to avoid spurious floating-point issues there presumably needs to be some tolerance on just how even they need to be, but that means we'll need some arbitrary threshold. Maybe warn if any cell's width is more than 10% different from the average spacing?

@rabernat

This comment has been minimized.

rabernat commented Sep 12, 2017

Note that xarray had to confront similar issues...for example, inferring intervals for pcolormesh when only cell-center coordinates are given. There could be some useful code snippets here:
https://github.com/pydata/xarray/blob/master/xarray/plot/plot.py#L645
https://github.com/pydata/xarray/blob/master/xarray/plot/plot.py#L620

@jlstevens

This comment has been minimized.

Member

jlstevens commented Sep 12, 2017

This topic has come up a number of times and personally, I've come to the conclusion I would rather make the expectation of even sampling clear to the user rather than try to enforce it (which always bumps against some ad hoc heuristic and judgements about what is 'evenly spaced enough' as well as floating point issues).

@jbednar

This comment has been minimized.

Member

jbednar commented Sep 12, 2017

How would you make that clear to the user except via a warning? Putting it in the docs is not enough, in my opinion.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Sep 12, 2017

I consider this sort of thing to be a part of the semantics of the element, which would be declared in the docs. There is no easy solution but I'm not sure how much trying to validate the input is going to be helpful given there is no way to do it that is always the correct thing to do.

@jbednar

This comment has been minimized.

Member

jbednar commented Sep 12, 2017

Seems to me like there should a threshold where it is quite clear that the spacing is not uniform, and that more people will be helped by a warning at that level than would be annoyed by it. I don't think we should assume people will study all the docs, which is unrealistic.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Sep 12, 2017

It also seems a fairly expensive check to run, assuming users are using elements appropriately most of the time.

Maybe warn if any cell's width is more than 10% different from the average spacing?

This seems reasonable except for the ad hoc threshold. Why is 10% more reasonable than 5% or 1%? Maybe our reasoning should be based on what is clearly visually different as opposed to the exact data values? Where is this threshold set, or is it hard coded?

@jbednar

This comment has been minimized.

Member

jbednar commented Sep 12, 2017

Worst case, the threshold is a parameter on Image, and the user can change it.

I'm not particularly defending 10%; maybe it's 50%, but I do defend the idea that there is some level above which everyone reasonable would agree that this is clearly nonuniform spacing and not the intended usage of Image. So if there is any ambiguity, increase the threshold. But that will still catch nearly every actual problem.

Seems like this check is only run on the x and y axes, not the data, so it doesn't seem particularly expensive to me.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Sep 12, 2017

Worst case, the threshold is a parameter on Image, and the user can change it.

That isn't an option but it could be a config parameter - ideally one which allows this checking to be turned off.

Honestly, there are lots of places in holoviews where I think we can have clear wins in terms of validation, better exceptions etc but I don't think this is one of them. Our time would be better spent fixing issues such as #1867 and #1868 where arbitrary thresholds aren't needed. I really don't like this kind of 'soft' validation that may or may not be appropriate given the properties of the input data.

@rabernat

This comment has been minimized.

rabernat commented Sep 12, 2017

Hopefully I can offer some perspective as a new user. (BTW, thanks to all of the developers for your excellent and hard work on this amazing project.)

I am coming to holoviews as an experienced xarray user and developer. I wanted to bring my xarray data into holoviews, so I read the user guide on gridded datasets. In fact, the docs do not appear to have a direct example of how to do this. However the very first example said, "Let us start by declaring an Image with explicit x- and y-coordinates." I recognized this as similar to xarray's data model, so I tried creating an image from my xarray data as hv.Image(data). It worked, as in there were no errors or warnings. However, I soon discovered that holoviews.Image was misrepresenting my data.

I would rather make the expectation of even sampling clear to the user

This was most certainly not clear to me. There is an indirect reference to "regularly sampled arrays" in the getting started Gridded Data page, but the interpretation of "regular" is ambiguous. Morever, my thinking was why does Image even bother keeping track of explicit x- and y-coordinates if they are ignored in visualization? The option to pass explicit coordinates to Image creates the expectation that they will be respected.

I think the solution is to actually forbid the creation of Image objects with explicit x and y coordinates if you don't plan to render such images correctly.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Sep 12, 2017

Thanks for your perspective!

User feedback is really invaluable for making such decisions.

why does Image even both keeping track of explicit x- and y-coordinates if they are ignored in visualization?

That is a good point.

Originally, Image only supported numpy arrays where this wasn't an issue. Once it was extended to support xarray, the explicit x- and y- coordinates became redundant (except at the corners to define the overall bounds).

The option to pass explicit coordinates to Image creates the expectation that they will be respected.

I agree! One 'solution' would be to say that Image can only accept numpy arrays and only QuashMesh supports xarrays with explicit coordinates. This would be rather limiting which is why we let Image operate the way it does (ignoring the exact coordinates) leading to this particular problem.

I wouldn't object to a percentage threshold parameter in hv.config. It won't be a suitable threshold all of the time and I suppose if we only use it to issue warnings, then it might help people enough of the time to be worthwhile.

Jim suggested 10% or 50% but really, I am thinking that more than a 1% deviation suggests the data isn't really 'regular'. My issue with a threshold is there really isn't a 'correct' answer to pick!

@jbednar

This comment has been minimized.

Member

jbednar commented Sep 13, 2017

I have no opinion about the actual threshold, because I think that people usually only switch to irregular sampling when regular sampling becomes very inappropriate for what they are doing. Regular sampling is just usually so much easier! So I would expect datasets to mainly fall into either the "exactly or very nearly exactly regular sampling" or "very irregular sampling" camps, with a nearly excluded middle. Sure, the middle ground isn't entirely empty, because e.g. one could have a nonlinear projection that is very nearly regular over the range you are looking at, but still, people will probably eventually get an warning that lets them know they are doing it wrong with just about any reasonable threshold we set. So I'm not bothered about the actual value of the threshold, just that there is one.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Sep 13, 2017

Sure, the middle ground isn't entirely empty, because e.g. one could have a nonlinear projection that is very nearly regular over the range you are looking at.

That is exactly the case I had in mind and why I suggested a lower threshold. The idea is that if there is no warning, we can be confident that the output will be visually 'good enough' i.e the Quadmesh output should look identical. I think 1% might be a reasonable value as this should be generous compared to the variation in spacing when regularly sampling at float32 precision.

@jbednar

This comment has been minimized.

Member

jbednar commented Sep 13, 2017

Sounds good to me!

@philippjfr philippjfr added this to the v1.9 milestone Sep 23, 2017

@philippjfr philippjfr self-assigned this Oct 7, 2017

@philippjfr philippjfr modified the milestones: v1.9, v1.10 Oct 31, 2017

@philippjfr

This comment has been minimized.

Member

philippjfr commented Nov 30, 2017

So I'm coming back to this because we will now have a solid alternative for non-equally and irregularly sampled datasets in the form of the revamped QuadMesh (see #2160). IMO even 1% is too large, the sampling differences should not be larger than numerical precision, I'll do some testing but this should be enough:

assert np.unique(np.round(np.diff(coords), sys.float_info.dig)).size == 1
@jbednar

This comment has been minimized.

Member

jbednar commented Nov 30, 2017

All I was arguing was that there should be some threshold; the tighter it can be without spurious warnings, the better!

@jlstevens

This comment has been minimized.

Member

jlstevens commented Dec 11, 2017

The tighter the threshold, the better. @philippjfr Do you think we will be able to close this issue when PR #2160 is merged?

@philippjfr

This comment has been minimized.

Member

philippjfr commented Dec 11, 2017

Yes, that's the goal.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Dec 20, 2017

Here's the actual validation I think I'll be going with:

xvals = np.unique(np.diff(self.dimension_values(0, expanded=False)))
if len(xvals)>1 and np.abs(xvals.min()-xvals.max()) > sys.float_info.epsilon*10:
    raise ValueError("%s dimension %s is not evenly sampled, "
                     "please use the QuadMesh element for "
                     "unevenly or irregularly sampled data." %
                     (type(self).__name__, self.get_dimension(0)))

ValueError: Image dimension x is not evenly sampled, please use the QuadMesh element for unevenly or irregularly sampled data.

@ea42gh

This comment has been minimized.

Contributor

ea42gh commented Dec 20, 2017

so how would I turn the warning off if I actually wanted to use hv.Image?

@philippjfr

This comment has been minimized.

Member

philippjfr commented Dec 20, 2017

It's not a warning, it's an error. You shouldn't use hv.Image for irregularly or unevenly sampled data, you should use hv.QuadMesh instead.

@jbednar

This comment has been minimized.

Member

jbednar commented Dec 20, 2017

Or, if you really wanted to use Image, you could strip out the coordinate information from the xarray, right? But you should really know what you are doing, in that case.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Feb 5, 2018

This has now been implemented for a while and seems to be working okay after some teething problems with validation that was too strict.

@philippjfr philippjfr closed this Feb 5, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment