
Refactor hist for less numerical errors #22773

Closed
wants to merge 1 commit

Conversation

@oscargus (Contributor) commented Apr 3, 2022

PR Summary

Should help with #22622

The idea is to do the computation on the edges rather than on the widths, and then take the diff of the result. This may be numerically better (or not...). More likely it is numerically worse, but it gives visually better results...
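The kind of rounding error at stake can be sketched in a few lines. This is a constructed float16 example (the values are chosen to hit a rounding tie, they are not from the PR):

```python
import numpy as np

# Two adjacent bin edges, both exactly representable in float16.
left = np.float16(0.5)
right = np.float16(1025.0)

# Widths-first: the width is rounded to float16 (the spacing of float16
# values near 1024 is 1.0), so reconstructing the right edge from
# left + width misses the true edge and leaves a visible gap.
width = right - left             # 1024.5 rounds to 1024.0 in float16
print(left + width)              # 1024.0, not 1025.0
print(left + width == right)     # False
```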

The alternative approach, providing a flag to bar/barh that makes sure adjacent bars are actually exactly adjacent, is probably better, but I wanted to see what comes out of this first...

PR Checklist

Tests and Styling

  • Has pytest style unit tests (and pytest passes).
  • Is flake8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

  • New features are documented, with examples if plot related.
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).

@jklymak (Member) commented Apr 3, 2022

Is the numerical problem the diff? Would it make sense to just convert the numpy bin edges to float64 before the diff?

@oscargus (Contributor, Author) commented Apr 3, 2022

Is the numerical problem the diff?

Hard to say. The problem is that quite a bit of computation happens, and at some stage rounding errors lead to overlaps or gaps between edges. Postponing the diff reduces the risk of that happening. (On the other hand, one may get cancellation as a result, but I do not think that will happen more often now, since the only things added here are of about the same order of magnitude.)
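The "postpone the diff" idea can be sketched outside of Matplotlib (hypothetical variable names, not the PR's actual code):

```python
import numpy as np

edges = np.float16([0.5, 1024.0, 1025.0, 2050.0])  # float16 bin edges

# Widths-first (old ordering): each right edge is recomputed as
# left + width, which can round away from the next bar's left edge.
widths = np.diff(edges)
rights_recomputed = edges[:-1] + widths

# Edges-first (this PR's ordering): lefts and rights are views of the
# same edge array, so adjacent bars share each interior edge exactly,
# by construction; no gap or overlap is possible.
lefts, rights = edges[:-1], edges[1:]
assert (rights[:-1] == lefts[1:]).all()
```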

Would it make sense to just convert the numpy bin edges to float64 before the diff?

Yes, or even float32. But as argued in the issue, one tends to use float16 in memory-limited environments, so it is not clear that one can afford the conversion.

Here, I am primarily trying to see the effect of it. Since not all the involved computations happen here (some are in bar/barh), the better approach may be a flag, "fill" or something, that makes sure all edges are adjacent when set. (I'm quite sure a similar problem can arise when feeding bar edges in float16 as well.)

@oscargus (Contributor, Author) commented Apr 3, 2022

It seems like we do not have any test images that are negatively affected by this, at least... But it may indeed not be the best solution to the problem.

@oscargus (Contributor, Author) commented Apr 3, 2022

Ahh, but even if the data passed to hist is float16, the actual histogram array doesn't have to be... And the histogram is probably much smaller than the data. So a simpler fix is probably to change the dtype of the histogram data before starting to process it...
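The size argument can be checked directly: the edge array has one entry per bin, so promoting it is cheap compared to the data. A small sketch (not PR code; the array sizes are illustrative):

```python
import numpy as np

data = np.zeros(1_000_000, dtype=np.float16)        # 2 MB of sample data
edges16 = np.linspace(0, 1, 11, dtype=np.float16)   # 10 bins -> 11 edges

# Promoting only the edges to float64 costs 88 bytes, not megabytes.
edges = np.asarray(edges16, dtype=float)
assert edges.nbytes == 88

# For these float16-derived values, float64 has enough precision that
# left + width reconstructs the right edge exactly.
assert np.all(edges[:-1] + np.diff(edges) == edges[1:])
```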

@jklymak (Member) commented Apr 4, 2022

I think you just want another type cast here (I guess I'm not sure about the difference between float and "float64"); at least this fixes the problem for me.

diff --git a/lib/matplotlib/axes/_axes.py b/lib/matplotlib/axes/_axes.py
index f1ec9406ea..88d90294a3 100644
--- a/lib/matplotlib/axes/_axes.py
+++ b/lib/matplotlib/axes/_axes.py
@@ -6614,6 +6614,7 @@ such objects
             m, bins = np.histogram(x[i], bins, weights=w[i], **hist_kwargs)
             tops.append(m)
         tops = np.array(tops, float)  # causes problems later if it's an int
+        bins = np.array(bins, float)  # causes problems if float16!
         if stacked:
             tops = tops.cumsum(axis=0)
             # If a stacked density plot, normalize so the area of all the
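A minimal sketch of what that extra cast does, assuming the returned edges keep the input's float16 dtype (which is what the linked issue reports):

```python
import numpy as np

x = np.linspace(0, 1, 1000).astype(np.float16)  # float16 input data
tops, bins = np.histogram(x, bins=10)

tops = np.array(tops, float)  # existing cast: avoids integer trouble later
bins = np.array(bins, float)  # proposed cast: avoids float16 rounding later
assert tops.dtype == np.float64 and bins.dtype == np.float64
```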

@timhoffm (Member) commented Apr 5, 2022

I guess I'm not sure the difference between float and "float64"

NumPy accepts built-in Python types and maps them to NumPy types:

https://numpy.org/doc/stable/reference/arrays.dtypes.html#specifying-and-constructing-data-types
(scroll a bit to "Built-in Python types").

The mapping can be platform specific. E.g. int maps to np.int64 on Linux but np.int32 on Windows.
float maps to np.float64 on x86 Linux and Windows, but I don't know whether that holds on ARM etc.
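The mapping can be inspected directly, since np.dtype accepts the built-in types:

```python
import numpy as np

# Built-in float maps to the platform's C double, i.e. np.float64 on
# mainstream platforms; built-in int maps to C long, which (per the
# comment above) has historically differed between Linux and Windows.
print(np.dtype(float))   # float64
print(np.dtype(int))     # platform dependent: int64 or int32
```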

3 participants