Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement cnorm option #4567

Merged
merged 98 commits into from Dec 1, 2020
Merged

Implement cnorm option #4567

merged 98 commits into from Dec 1, 2020

Conversation

philippjfr
Copy link
Member

@philippjfr philippjfr commented Aug 25, 2020

Implements cnorm option to switch between 'linear', 'log' and 'eqhist'.

Screen Shot 2020-08-25 at 4 09 15 PM

  • Add tests
  • Update docs
  • Handle clashes with logz option

@philippjfr philippjfr added the type: enhancement Minor feature or improvement to an existing feature label Aug 25, 2020
@jlstevens
Copy link
Contributor

jlstevens commented Aug 25, 2020

Summarizing a discussion between @jbednar @philippjfr and myself:

  • We can (eventually) deprecate clipping_colors with options for minimum, maximum and nan colors.
  • These have clear semantics but won't ever apply when auto-ranging: only with explicit ranges can you be above/below the highest/lowest value (really the semantics of 'over' and 'under' are clearer than 'max' and 'min')
  • For rasterized output, specifically with the 'count' aggregator we can set the dimension range to (1, None) where we have defined 1 as the lowest true count because 0 really is missing data (this is datashader semantics).
  • We can set default options for image/raster types in HoloViews such that values under/over the declared range are transparent (instead of extending the colormap which is Bokeh's default behavior).
  • Note, we wouldn't set these defaults for any other elements (which would continue to follow Bokeh semantics) and we already set the nan color to transparent for image types in HoloViews (overriding Bokeh's default which is gray).
  • The naming is up for debate with suggestions including 1) following bokeh's low_color, high_color and nan_color 2) max_color, min_color and nan_color or 3) cunder, cover and cnan.
  • Personally I would like some variant of the third option where the concept of being above/below the max/min is clear with the c prefix to group with the other colormap options like cnorm, clim, cmap, cformatter etc. Another variant of this might be cbelow, cabove and cnan or clower, cupper and cnan.
  • The value of None would mean (follow the defaults) and we would keep the string 'transparent' for transparent pixels. That way, someone who wants the Bokeh default behavior for an image could use image.opts(cbelow=None) or similar.
  • We are happy with cnorm and the default will be 'linear'.

Open issue:

  • One problem we have is that 'fire' is our default datashader colormap which has a white peak value. If you have transparent pixels, you will show the plot background which is typically white. This looks quite horrible! Two options would be to 1) set the bgcolor to something other than white for images (not a great solution!) or 2) use a different colormap than fire which we have discussed in the past but would break backwards compatibility.

Other than option naming and the issue with the 'fire' colormap, I think this sounds like a good plan!

@jbednar
Copy link
Member

jbednar commented Aug 25, 2020

Sounds good. Yes, I think there's already an open issue for dealing with fire; it's really not a suitable colormap for the default white-background pages, particularly not when missing values are possible. I also prefer a variant of option 3 for the naming, because "low"/"high" and "max"/"min" are ambiguous or misleading; the specified behavior does not occur at the low or min value, it occurs below or under the min value. So names like below or under are more accurate and easier to reason about; it's clearer that you're defining what happens to values below or under the range being colormapped, not what happens at that lower bound. Given that matplotlib uses over and under for this purpose, I'd favor those terms, but note that unfortunately cover is going to be read as "cover" (cuvver, not "c"-over), by any new person. So I guess I slightly favor "cbelow" and "cabove", ugly as those are.

@jlstevens
Copy link
Contributor

Looking for alternatives, the only other one I'll throw out there is chigher and clower...

@jlstevens
Copy link
Contributor

One issue where colormaps is discussed is #2487. Here is the relevant bit:

image

@jbednar
Copy link
Member

jbednar commented Aug 26, 2020

Sure, I have no preference between chigher/clower and cabove/cbelow. Yes, #2487 (comment) is the prior issue about colormaps, which remains unaddressed.

@philippjfr
Copy link
Member Author

You have the strongest opinions here so would you mind coming up with a proposal for the colormap changes? Then we can veto if we hate it 😆

@jbednar
Copy link
Member

jbednar commented Aug 28, 2020

No, I just want to complain! :-) Sigh. I'll try to think of something suitable, but please ping me if I haven't done so in a reasonable time.

@jlstevens
Copy link
Contributor

jlstevens commented Sep 7, 2020

I thought a bit about this issue and here is the best suggestion I have right now:

  1. Pick a colormap that is suitable for use with transparency and make it available in HoloViews.
  2. Add an hv.config option that when set, enables this new colormap as the default for the datashader shading operations and for Images (off by default). We could also target the Images affected using the group/label mechanism...
  3. Have a page describing the issue that we can be linked to and update the pages where we need the better colormap to use hv.config to switch to the new cmap.
  4. Mention this option in the release notes and state that we will switch away from fire sometime in future.
  5. Eventually switch away from fire by toggling the hv.config default.

Don't love any of this but that seems the best approach to me right now...

Edit: Pinging @jbednar as requested!

@jlstevens
Copy link
Contributor

After discussion with @jbednar, @philippjfr we came to an agreement that clipping_colors is currently being abused for the purpose of making zero counts transparent, resulting in needless discussion about the min and max clipping colors.

The new suggestion is to remove setting a dimension range at all and to be explicit and clear about what we want to express: a special value that indicates missing/no data. This isn't needed for floats which naturally have a no data value (NaN) but is needed for integer data which has no such special value. Zero counts mean no data for the datashader count aggregator but there is plenty of data in the wild that use special 'nodata' values such as -32767 or -9999.

Therefore, if you need transparency for a count aggregate after using rasterize, you would typically do something like:

rasterize(element).opts(cnorm='eqhist', nodata=0)

I think this is clear, explicit with no more mysterious implicit behavior or abuse of clipping options.

@jbednar
Copy link
Member

jbednar commented Sep 15, 2020

Sounds good! We'll have to decide what to about integers vs. floats for nodata, given that float comparison is unreliable. I think there are two plausible approaches: (1) don't support floats as nodata values, and silently skip applying nodata to float arrays, or (2) accept floats as nodata values and apply nodata to float arrays, but in our docs warn that only some floats are likely to be useful (including NaN, which we'd have to check for explicitly, and up to 7-digit integers; see https://en.wikipedia.org/wiki/Single-precision_floating-point_format).

I can see arguments for either approach. For (1), if it can be achieved, only applying the nodata translation to integer arrays would mean that users (or we) could set this option to 0 and never mess with it. Not sure if rasterize sets a unique HoloViews group and label by default, but if it does, seems like the option could be enabled quite specifically for Datashader output and almost never interfere with anything else (just signed ints, if those occur). But it would require detecting integer arrays and handling them specially.

For (2), I can easily imagine data originally integer showing up in workflows as floats, and given that short integers work fine as floats, it may be more convenient to simply ignore whether it's float or not and accept e.g. nodata=-9999 or nodata=-9999.0 and not make any distinction.

I don't think I have much preference, except to not prefer option (3) of not allowing floats while also not selectively applying nodata to integers, which seems like a restriction without a benefit.

@jlstevens
Copy link
Contributor

jlstevens commented Sep 16, 2020

I think there is an option (that makes sense to me) that isn't 1) or 2):

  1. Declare the plot option nodata = param.Integer(None, allow_None=True) so only integer values can be specified. Then whether we apply it to float arrays is orthogonal to that i.e nodata=1 is allowed but nodata=1.1 isn't (regardless of the array dtype). Given that I don't see any reason not to apply it to float arrays, assuming the float/int comparison is sensible.

I have no opinion on that but nodata = param.Integer(None, allow_None=True) as the plot option declaration is what makes sense to me.

@jbednar
Copy link
Member

jbednar commented Sep 16, 2020

My only opinion about option 4 is that it doesn't let people declare the obvious nodata = np.nan; instead we have to tell people that nodata = None means use np.nan for floats and nothing for integers, and that there's no way to turn off np.nan handling?

@jlstevens
Copy link
Contributor

That is right - I'm of the opinion that as float have a dedicated NaN value, there is no need for nodata for floats (the semantics is clearly defined without needing magic values). Any thoughts @philippjfr ?

@jlstevens
Copy link
Contributor

@jbednar @philippjfr Regarding the default datashader colormap issue, why not just use the blueish colormap that is already the default for hvplot? If we are breaking backwards compatibility of colormaps in HoloViews, might as well make it consistent with something else.

Unless the default for hvplot is also problematic?

@philippjfr
Copy link
Member Author

For reference here's the colormap in various scenarios:

bokeh_plot - 2020-09-21T122532 514

bokeh_plot - 2020-09-21T122556 128

bokeh_plot - 2020-09-21T122559 228

@jlstevens
Copy link
Contributor

What about some raster with transparency overlaid on a tile source?

@jbednar
Copy link
Member

jbednar commented Sep 25, 2020

Jean-Luc and I discussed this issue today and came to the conclusion that any changes to the colormap are independent of this PR in its current state. Now that we have decided that rasterize won't generate transparent Images for the default count aggregation, then most of the colormap problems go away, because fire is already a suitable colormap for cases with no transparency. With the default nodata=None, the plot is actually independent of the page background color, and so it can continue to use fire; ambiguity only happens for nodata=0, which a user is only likely to add when they want to overlay the plot onto something like a web tile server, and many of those may already be dark enough that fire is again appropriate.

To summarize, with the defaults (nodata=None, cmap=Fire) rasterize should work well (no ambiguity of peak color vs. background showing through, no sharp edge between white background and weak data points). The defaults will not allow overlaying the results on a web tile source or other data, but the user must explicitly enable nodata=0 in that case and can at the same time select a suitable colormap for whatever they are overlaying on, if it's not something dark.

@jbednar
Copy link
Member

jbednar commented Sep 25, 2020

Note that there will still be issues for non-uint aggregations (not Count or Any), where NaN will automatically be transparent and a fire colormap may be inappropriate on a white background. Even so, in that case users will very often want to select their own colormap anyway, e.g. a diverging one if values can be both positive and negative, as they can be for many non-Count aggregations. So I'm not particularly concerned about what may need to be adjusted once people are using a non-uint aggregation; once they mess with agg then they will also often need to set cmap appropriately.

@jlstevens
Copy link
Contributor

One thing to point out about nodata: it is a shortcut to replace a value with NaN which means you get the following behavior with hover:

image

image

Ideally, the hover would show the original value (i.e 0) but Bokeh doesn't support transparency masks afaik. This also means in the float case, the nodata values map to NaN which will show up in hover the same way as the original NaN values (if any).

@jlstevens
Copy link
Contributor

jlstevens commented Sep 28, 2020

I suppose for the integer case, you could smooth over the hover issue with a custom hover tool (i.e formatter) that knows that NaN really is the nodata value? If we did this, it might be a reason to disallow nodata for the float case?

Deciding to implement this (if we do) is orthogonal to the rest of the PR though even if it is a nice extra to consider.

@philippjfr
Copy link
Member Author

One thing to point out about nodata: it is a shortcut to replace a value with NaN which means you get the following behavior with hover:

You could use a custom hover formatter here.

@jlstevens
Copy link
Contributor

You could use a custom hover formatter here.

That is exactly what I was suggesting in my previous comment. Do you think we should?

@philippjfr
Copy link
Member Author

Ah sorry, missed that. It's not a huge amount of effort so seems worth doing.

@jlstevens
Copy link
Contributor

jlstevens commented Dec 1, 2020

@jbednar When you pull you will want to change normalization to cnorm in the notebook to avoid the new warnings.

Edit: With the new cmap, the gif needs updating too:

image

@jbednar
Copy link
Member

jbednar commented Dec 1, 2020

Remaining to-do:

  • HeatMap on each of the different backends is using acmap of RdBu or RdYlBu_r instead of kbc_r. I think this should change, as those two colormaps differ, are not perceptually uniform, and assume symmetrically divergent data which isn't a valid assumption.
  • Jim to finish up his local Large Data and Styling changes and push, either in this PR if done in time or separately.

Remaining issues not expected to be addressed in this PR, but worth noting for later:

  • It would be nice to have a custom hover formatter so that a count of 0 doesn't show up as NaN on hover.
  • We should still consider whether the default aggregator for Path and Curve should be any; for isolated lines any will work much better, but for large numbers of overlaid paths or curves it won't. We can worry about that independently of this PR.
  • We might want to put shade_cmap = mpl.colors.LinearSegmentedColormap.from_list("shade_cmap", ["lightblue", "darkblue"]) somewhere in our colormap definitions so that people can more easily port datashade() calls to use rasterize() or to replicate plots first done in Datashader's own API.
  • We could consider mapping a single color from Curve, Path, Points, etc. down to cmap using Datashader's support for mapping single-colors as values using the alpha channel. That way if someone overlays red, blue, yellow, etc. lines and calls rasterize, the resulting images will still have the same base color per element. But we don't have time to evaluate this idea now.
  • We have not addressed the idea of transferring options from Elements to their rasterized (or potentially datashaded) equivalents. There's a mechanism for it, but someone would need to work out all the implications, and the current mechanism of transfer_options doesn't allow partial mappings, transformed mappings, etc. that would be needed in practice.

@jlstevens
Copy link
Contributor

Tests are passing. Time to hit merge on this rather large and long running PR!

@jlstevens jlstevens merged commit 9db19b5 into master Dec 1, 2020
@jbednar jbednar mentioned this pull request Dec 2, 2020
8 tasks
@philippjfr philippjfr deleted the philippjfr/cnorm branch April 25, 2022 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement Minor feature or improvement to an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants