Fix BoundaryNorm for multiple colors and one region #17830

dstansby · 2020-07-03T15:48:41Z

Fixes #17579. I don't understand why the scaling is scalefac = (self.Ncmap - 1) / (self._n_regions - 1), and the logic when the number of colours is > the number of regions is not well defined in the dosctring so I don't know what it should be. This prevents the case with 1 region and > 1 colours from crashing though, and gives a sensible result.

I also changed a variable name from _N to make the code more readable.

tacaswell · 2020-07-03T17:45:26Z

I think @efiring is the local expert on BoundaryNorm. It looks like this been the scaling logic from the beginning (83614eb).

The logic for this is:

the np.digitize is going to return, for values within the valid range, going to return values in [1, self._n_regions]
we shift the sesult of diigitize down by 1 to get it back to 0-indexed so our value is in [0, self._n_regions-1]
if we have more colors than bins we need to "stretch" the values to cover the full range (which is going to drop some of the colors in the middle) so we divide b self._n_regions - 1 to get our values to [0, 1] and then multiply by self._Ncmap -1 to take use to [0, ncolors-1] and cast as int to take us to the space we can use to directly slice into a lookup table to get the color

The reason this code path is important is so that we don't make user down-sample the default color maps for use with boundary norm, the re scaling implicitly down samples (as some bins will just never be hit). The opposite case (where the user gives us more bins that colors) we raise because implicitly merging the users bins is something we should not do.

dstansby · 2020-07-03T17:57:44Z

Thanks for the clear explanation!

tacaswell · 2020-07-03T18:30:04Z

@dstansby I took the liberty of pushing that last GH post as in-line comments and did a bit more internal re-naming to make it clearer.

efiring · 2020-07-03T21:18:38Z

lib/matplotlib/colors.py

-            scalefac = (self.Ncmap - 1) / (self._N - 1)
+        # if we have more colors than regions, stretch the region index
+        # computed above to full range of the color bins.  This will
+        # make use skip some of the colors in the middle of the


Maybe change "This will...full range" to "The first region is mapped to the first color in the colormap, and the last region to the last color." Or leave the sentence out entirely.

efiring · 2020-07-03T21:43:53Z

lib/matplotlib/colors.py

+                # special case the 1 region case.  What we should do here
+                # is a bit undefined (as _any_ color in the colormap is
+                # justifiable) we go with not adjusting the value (which will
+                # either be 0 or 1 depending on if we are extending down)


I think this comment is also confusing. The divisor will be 0 only if there are no extensions and 2 boundaries, or one extension and only 1 boundary. So we have to pick one color out of the map. BoundaryNorm is supposed to work like contourf, so we should do what contourf does, which is pick the middle of the colormap. But we can't do that with scalefac, because iret is filled with zeros. It has to be directly filled with (self.Ncmap -1) // 2.

efiring · 2020-07-03T21:49:12Z

Actually, filled contours require a minimum of 2 boundaries, and I think that BoundaryNorm should do likewise.

tacaswell · 2020-07-03T22:43:18Z

I pushed the changes to the docstrings, but the change in behavior to go to the middle rather than the bottom for the 1 region case seems like something we should think about a bit more?

efiring · 2020-07-03T23:34:13Z

@tacaswell, thinking more is fine, but at the moment I don't see any reasonable arguments for anything other than the middle color in the one-region case without extensions, hence matching contourf. Before it raised an exception, so we are free to choose anything. In the one-region case with one boundary and one extension, we could

Block it by requiring at least 2 boundaries.
Choose the middle, to keep thing simple.
Get fancy, and choose the last color if it is a "max" extension and the first color if it is a "min" extension.

I haven't checked to see whether the colorbar would even work with nothing but an extension. I expect it wouldn't. I favor option 1; I don't see that a BoundaryNorm with only one boundary makes sense. It's marginal even with 2 boundaries.

tacaswell · 2020-07-03T23:43:47Z

Before it raised an exception, so we are free to choose anything.

🤦 I had missed that detail and was focused on preserving the API from @dstansby 's first commit.

I don't think we should allow the 1 boundary but with extensions case so option 1 here (already enforced), but we will still need to adjust the return value for "in bin".

efiring · 2020-07-03T23:58:04Z

I think this will do it:

if self.Ncmap > self._n_regions:
    if self._n_regions == 1:
        iret = np.full(iret.shape, (self.Ncmap - 1) // 2, dtype=np.int16)
    else:
        scalefac = (self.Ncmap - 1) / (self._n_regions - 1)
        iret = (iret * scalefac).astype(np.int16)

tacaswell · 2020-07-04T22:04:46Z

Sorry I hijacked your relatively simple PR @dstansby .

This never worked before so is not an API change.

tacaswell · 2020-07-04T22:23:39Z

One concern I have about this is that if you add an extension just up or down you move the color drastically:

In [4]: from matplotlib.colors import BoundaryNorm

In [5]: edges = [2, 5]

In [7]: import numpy as np

In [8]: data = np.arange(10)

In [9]: BoundaryNorm(edges, 7)(data)
Out[9]: 
masked_array(data=[-1, -1, 3, 3, 3, 7, 7, 7, 7, 7],
             mask=[False, False, False, False, False, False, False, False,
                   False, False],
       fill_value=999999,
            dtype=int16)

In [10]: BoundaryNorm(edges, 7, extend='both')(data)
Out[10]: 
masked_array(data=[-1, -1, 3, 3, 3, 7, 7, 7, 7, 7],
             mask=[False, False, False, False, False, False, False, False,
                   False, False],
       fill_value=999999,
            dtype=int16)

In [11]: BoundaryNorm(edges, 7, extend='min')(data)
Out[11]: 
masked_array(data=[-1, -1, 6, 6, 6, 7, 7, 7, 7, 7],
             mask=[False, False, False, False, False, False, False, False,
                   False, False],
       fill_value=999999,
            dtype=int16)

In [12]: BoundaryNorm(edges, 7, extend='max')(data)
Out[12]: 
masked_array(data=[-1, -1, 0, 0, 0, 7, 7, 7, 7, 7],
             mask=[False, False, False, False, False, False, False, False,
                   False, False],
       fill_value=999999,
            dtype=int16)

In [13]:

but the behavior of the extend='min' and extend='max' is consistent with master so maybe we should not be worried about that (or if we do want to change that we need to be more careful about deprecation etc).

efiring

Slight docstring enhancement suggestion, but also fine as-is.

efiring · 2020-07-04T22:47:59Z

lib/matplotlib/colors.py

@@ -1429,9 +1429,9 @@ def __init__(self, boundaries, ncolors, clip=False, *, extend='neither'):
        Parameters
        ----------
        boundaries : array-like
-            Monotonically increasing sequence of boundaries
+            Monotonically increasing sequence of boundaries.


Might be worth saying "...sequence of at least 2 boundaries."

efiring · 2020-07-04T22:59:46Z

@tacaswell, the single-region case is not very useful, and there are many ways it could be handled, with or without adding extensions, so I think the present version is fine. Anyone who wants fine control over a small number of colors should be using their own ListedColormap so they can exactly match colors to regions. The stretching logic is there to make things easy, so people can use the standard 256-entry colormaps, not to match whatever any particular person might actually think they want--which would be impossible, of course.

I now have a bunch of commits in this branch and should not review it.

tacaswell · 2020-07-05T19:10:52Z

I am still 👍 on these changes, but someone who is not me should do a final review and merge (I am 👍 if @dstansby wants to review my changes and merge the wholething)

dstansby

Can't approve through the UI since this is technically my PR, but I approve 👍

dstansby added 2 commits July 3, 2020 16:17

Make variable name clearer

4e6a206

Fix BoundaryNorm for multiple colors and one region

590f401

dstansby changed the title ~~Single boundary~~ Fix BoundaryNorm for multiple colors and one region Jul 3, 2020

tacaswell added this to the v3.4.0 milestone Jul 3, 2020

tacaswell previously approved these changes Jul 3, 2020

View reviewed changes

DOC: add inline explanation of the re-scaling logic in BoundrayNorm

f8017d4

QuLogic requested a review from efiring July 3, 2020 20:32

efiring requested changes Jul 3, 2020

View reviewed changes

tacaswell added 2 commits July 3, 2020 18:31

DOC: re-word comments

20ef927

API: add a check that we have at least one region

64fff76

tacaswell requested a review from efiring July 3, 2020 22:50

ENH: pick the middle color only 1 region with BoundaryNorm

a5f5317

This never worked before so is not an API change.

efiring approved these changes Jul 4, 2020

View reviewed changes

DOC: make doc string slightly more precise

39b5dfc

tacaswell mentioned this pull request Jul 7, 2020

BoundaryNorm yield a ZeroDivisionError: division by zero #17579

Closed

dstansby commented Jul 12, 2020

View reviewed changes

dstansby merged commit c12dafd into matplotlib:master Jul 12, 2020

dstansby deleted the single-boundary branch July 12, 2020 17:14

keewis mentioned this pull request Jul 15, 2020

failing upstream-dev CI: matplotlib pydata/xarray#4226

Closed

QuLogic mentioned this pull request Feb 28, 2021

⚠️ Nightly upstream-dev CI failed ⚠️ pydata/xarray#4947

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix BoundaryNorm for multiple colors and one region #17830

Fix BoundaryNorm for multiple colors and one region #17830

dstansby commented Jul 3, 2020

tacaswell commented Jul 3, 2020

dstansby commented Jul 3, 2020

tacaswell commented Jul 3, 2020

efiring Jul 3, 2020

efiring Jul 3, 2020

efiring commented Jul 3, 2020

tacaswell commented Jul 3, 2020

efiring commented Jul 3, 2020

tacaswell commented Jul 3, 2020

efiring commented Jul 3, 2020

tacaswell commented Jul 4, 2020

tacaswell commented Jul 4, 2020

efiring left a comment

efiring Jul 4, 2020

efiring commented Jul 4, 2020

tacaswell commented Jul 5, 2020

dstansby left a comment

Fix BoundaryNorm for multiple colors and one region #17830

Fix BoundaryNorm for multiple colors and one region #17830

Conversation

dstansby commented Jul 3, 2020

tacaswell commented Jul 3, 2020

dstansby commented Jul 3, 2020

tacaswell commented Jul 3, 2020

efiring Jul 3, 2020

Choose a reason for hiding this comment

efiring Jul 3, 2020

Choose a reason for hiding this comment

efiring commented Jul 3, 2020

tacaswell commented Jul 3, 2020

efiring commented Jul 3, 2020

tacaswell commented Jul 3, 2020

efiring commented Jul 3, 2020

tacaswell commented Jul 4, 2020

tacaswell commented Jul 4, 2020

efiring left a comment

Choose a reason for hiding this comment

efiring Jul 4, 2020

Choose a reason for hiding this comment

efiring commented Jul 4, 2020

tacaswell commented Jul 5, 2020

dstansby left a comment

Choose a reason for hiding this comment