Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix BoundaryNorm for multiple colors and one region #17830

Merged
merged 7 commits into from
Jul 12, 2020

Conversation

dstansby
Copy link
Member

@dstansby dstansby commented Jul 3, 2020

Fixes #17579. I don't understand why the scaling is scalefac = (self.Ncmap - 1) / (self._n_regions - 1), and the logic when the number of colours is > the number of regions is not well defined in the dosctring so I don't know what it should be. This prevents the case with 1 region and > 1 colours from crashing though, and gives a sensible result.

I also changed a variable name from _N to make the code more readable.

@dstansby dstansby changed the title Single boundary Fix BoundaryNorm for multiple colors and one region Jul 3, 2020
@tacaswell tacaswell added this to the v3.4.0 milestone Jul 3, 2020
@tacaswell
Copy link
Member

I think @efiring is the local expert on BoundaryNorm. It looks like this been the scaling logic from the beginning (83614eb).

The logic for this is:

  • the np.digitize is going to return, for values within the valid range, going to return values in [1, self._n_regions]
  • we shift the sesult of diigitize down by 1 to get it back to 0-indexed so our value is in [0, self._n_regions-1]
  • if we have more colors than bins we need to "stretch" the values to cover the full range (which is going to drop some of the colors in the middle) so we divide b self._n_regions - 1 to get our values to [0, 1] and then multiply by self._Ncmap -1 to take use to [0, ncolors-1] and cast as int to take us to the space we can use to directly slice into a lookup table to get the color

The reason this code path is important is so that we don't make user down-sample the default color maps for use with boundary norm, the re scaling implicitly down samples (as some bins will just never be hit). The opposite case (where the user gives us more bins that colors) we raise because implicitly merging the users bins is something we should not do.

tacaswell
tacaswell previously approved these changes Jul 3, 2020
@dstansby
Copy link
Member Author

dstansby commented Jul 3, 2020

Thanks for the clear explanation!

@tacaswell
Copy link
Member

@dstansby I took the liberty of pushing that last GH post as in-line comments and did a bit more internal re-naming to make it clearer.

@QuLogic QuLogic requested a review from efiring July 3, 2020 20:32
scalefac = (self.Ncmap - 1) / (self._N - 1)
# if we have more colors than regions, stretch the region index
# computed above to full range of the color bins. This will
# make use skip some of the colors in the middle of the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change "This will...full range" to "The first region is mapped to the first color in the colormap, and the last region to the last color." Or leave the sentence out entirely.

# special case the 1 region case. What we should do here
# is a bit undefined (as _any_ color in the colormap is
# justifiable) we go with not adjusting the value (which will
# either be 0 or 1 depending on if we are extending down)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is also confusing. The divisor will be 0 only if there are no extensions and 2 boundaries, or one extension and only 1 boundary. So we have to pick one color out of the map. BoundaryNorm is supposed to work like contourf, so we should do what contourf does, which is pick the middle of the colormap. But we can't do that with scalefac, because iret is filled with zeros. It has to be directly filled with (self.Ncmap -1) // 2.

@efiring
Copy link
Member

efiring commented Jul 3, 2020

Actually, filled contours require a minimum of 2 boundaries, and I think that BoundaryNorm should do likewise.

@tacaswell
Copy link
Member

I pushed the changes to the docstrings, but the change in behavior to go to the middle rather than the bottom for the 1 region case seems like something we should think about a bit more?

@tacaswell tacaswell requested a review from efiring July 3, 2020 22:50
@efiring
Copy link
Member

efiring commented Jul 3, 2020

@tacaswell, thinking more is fine, but at the moment I don't see any reasonable arguments for anything other than the middle color in the one-region case without extensions, hence matching contourf. Before it raised an exception, so we are free to choose anything. In the one-region case with one boundary and one extension, we could

  1. Block it by requiring at least 2 boundaries.
  2. Choose the middle, to keep thing simple.
  3. Get fancy, and choose the last color if it is a "max" extension and the first color if it is a "min" extension.

I haven't checked to see whether the colorbar would even work with nothing but an extension. I expect it wouldn't. I favor option 1; I don't see that a BoundaryNorm with only one boundary makes sense. It's marginal even with 2 boundaries.

@tacaswell
Copy link
Member

Before it raised an exception, so we are free to choose anything.

🤦 I had missed that detail and was focused on preserving the API from @dstansby 's first commit.

I don't think we should allow the 1 boundary but with extensions case so option 1 here (already enforced), but we will still need to adjust the return value for "in bin".

@efiring
Copy link
Member

efiring commented Jul 3, 2020

I think this will do it:

if self.Ncmap > self._n_regions:
    if self._n_regions == 1:
        iret = np.full(iret.shape, (self.Ncmap - 1) // 2, dtype=np.int16)
    else:
        scalefac = (self.Ncmap - 1) / (self._n_regions - 1)
        iret = (iret * scalefac).astype(np.int16)

@tacaswell
Copy link
Member

Sorry I hijacked your relatively simple PR @dstansby .

This never worked before so is not an API change.
@tacaswell
Copy link
Member

One concern I have about this is that if you add an extension just up or down you move the color drastically:

In [4]: from matplotlib.colors import BoundaryNorm

In [5]: edges = [2, 5]

In [7]: import numpy as np

In [8]: data = np.arange(10)

In [9]: BoundaryNorm(edges, 7)(data)
Out[9]: 
masked_array(data=[-1, -1, 3, 3, 3, 7, 7, 7, 7, 7],
             mask=[False, False, False, False, False, False, False, False,
                   False, False],
       fill_value=999999,
            dtype=int16)

In [10]: BoundaryNorm(edges, 7, extend='both')(data)
Out[10]: 
masked_array(data=[-1, -1, 3, 3, 3, 7, 7, 7, 7, 7],
             mask=[False, False, False, False, False, False, False, False,
                   False, False],
       fill_value=999999,
            dtype=int16)

In [11]: BoundaryNorm(edges, 7, extend='min')(data)
Out[11]: 
masked_array(data=[-1, -1, 6, 6, 6, 7, 7, 7, 7, 7],
             mask=[False, False, False, False, False, False, False, False,
                   False, False],
       fill_value=999999,
            dtype=int16)

In [12]: BoundaryNorm(edges, 7, extend='max')(data)
Out[12]: 
masked_array(data=[-1, -1, 0, 0, 0, 7, 7, 7, 7, 7],
             mask=[False, False, False, False, False, False, False, False,
                   False, False],
       fill_value=999999,
            dtype=int16)

In [13]: 

but the behavior of the extend='min' and extend='max' is consistent with master so maybe we should not be worried about that (or if we do want to change that we need to be more careful about deprecation etc).

Copy link
Member

@efiring efiring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slight docstring enhancement suggestion, but also fine as-is.

@@ -1429,9 +1429,9 @@ def __init__(self, boundaries, ncolors, clip=False, *, extend='neither'):
Parameters
----------
boundaries : array-like
Monotonically increasing sequence of boundaries
Monotonically increasing sequence of boundaries.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth saying "...sequence of at least 2 boundaries."

@efiring
Copy link
Member

efiring commented Jul 4, 2020

@tacaswell, the single-region case is not very useful, and there are many ways it could be handled, with or without adding extensions, so I think the present version is fine. Anyone who wants fine control over a small number of colors should be using their own ListedColormap so they can exactly match colors to regions. The stretching logic is there to make things easy, so people can use the standard 256-entry colormaps, not to match whatever any particular person might actually think they want--which would be impossible, of course.

@tacaswell tacaswell dismissed their stale review July 5, 2020 19:09

I now have a bunch of commits in this branch and should not review it.

@tacaswell
Copy link
Member

I am still 👍 on these changes, but someone who is not me should do a final review and merge (I am 👍 if @dstansby wants to review my changes and merge the wholething)

Copy link
Member Author

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't approve through the UI since this is technically my PR, but I approve 👍

@dstansby dstansby merged commit c12dafd into matplotlib:master Jul 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BoundaryNorm yield a ZeroDivisionError: division by zero
3 participants