[DOC]: clarify that `markevery[float]` considers path length along drawn line #27842
Comments
From the docs:
So I think you should be using a value > 1 (instead of 0.1). If you set e.g. Regardless, it would be nice to either warn or error if a value < 1 is passed here.
@dstansby um, no. Float has separate semantics.
But that said, I still believe the plot is correct. `markevery` still always chooses actual data points, and the noise just shows through in them.
Well, 0.1 means about 10 data points, but as the noise increases there are clearly far more than 10 data points. That said, I consider this a pretty bad visualization technique. I'm not shocked it doesn't work well, particularly for logarithmic axes. So I agree this is a bug, but I'm not sure how fixable it is.
There's a real use case: the loss during training of an ML model is usually smooth at the beginning and progressively begins to fluctuate, which makes it a bit noisy. Plus, the loss tends to slow down very quickly and so must be plotted on a log-scale x-axis. This means that you get exponentially more points towards the end. On top of that, when plotting different training scenarios in a single figure, the jiggles make reading and interpreting the plots very difficult. It's very useful in such cases to only see a finite number of markers for each plot.
I can appreciate signals with different signal to noise ratios, but I don't think counting on Matplotlib's heuristic here is the right approach. I would plot the raw data, and then decimate manually based on what works properly for the data interpretation.
If I had this problem, I'd smooth out the jiggles rather than subsampling them, as you are just going to alias the jiggles to the subsampled data.
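The smoothing-then-marking alternative suggested here can be sketched with a simple moving average; the window size and test signal below are illustrative assumptions, not anything from the issue:

```python
import numpy as np

def smooth(y, window=25):
    """Moving-average smoothing; mode='same' keeps the array length."""
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode="same")

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 500)
y = np.sin(x) + 0.3 * rng.normal(size=x.size)  # noisy signal
y_smooth = smooth(y)
# Plotting y_smooth with markevery avoids the noise inflating the
# drawn path length, so the markers stay roughly evenly spaced.
```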
Fair enough, but even ignoring the jiggles, for even slightly noisy data that are linearly sampled but plotted on log scale,
Agreed that it's not reliable! Someone could try and fix, but…
Without having time to dig into this, I suspect that we are sampling with a fixed distance along the data curve. While a smooth curve has a total length of approximately the width of the Axes, a noisy curve is much longer and thus gets more data points.
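That hypothesis is easy to check numerically. A sketch comparing the path length of a flat curve with a noisy one (in data coordinates; Matplotlib actually measures distance in screen space after the transform stack, so this is only a rough analogue):

```python
import numpy as np

def path_length(x, y):
    """Total Euclidean length of the polyline through (x, y)."""
    return np.hypot(np.diff(x), np.diff(y)).sum()

rng = np.random.default_rng(42)
x = np.linspace(0, 1, 1000)
flat_len = path_length(x, np.zeros_like(x))                # ~1.0
noisy_len = path_length(x, 0.1 * rng.normal(size=x.size))
# The noisy curve is far longer, so markevery's fixed step along the
# path lands proportionally more markers on noisy stretches.
```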
That sounds likely. To my point that this should be done manually by the user: that is one heuristic that works for data where spacing along the line is reasonable. Here the OP wants the spacing in x, which is a different algorithm.
I'll label this as feature request, as existing options are working as intended. |
Changing to distance along the x-axis will do weird things for spirals and other cases where the y values are not a function (in the math sense) of x. |
(Issue title changed from "markevery[float] doesn't work as expected when y-data is noisy" to "markevery[float] considers path length not distance along x-axis")
One action here is to better document that [float] subsamples along the path. I'm -0.5 on adding subsampling along x. First, subsampling is generally a questionable operation and needs to be employed with great care; there are often better aggregation techniques. Second, there's likely only a very small subset of cases where subsampling is reasonable and subsampling along x is better than the existing subsampling along the path. Third, it's not too hard to create a numpy mask for that yourself. And finally, I don't see how we can fit these additional semantics into the existing markevery API (and IMHO additional keywords would be overboard here).
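The "create a numpy mask yourself" route could look like the following sketch, which picks the data points nearest to marker positions evenly spaced along x (optionally in log space). The function name and marker count are illustrative assumptions, not an existing API:

```python
import numpy as np

def markevery_x(x, n_markers=10, log=False):
    """Indices of the data points closest to n_markers positions
    evenly spaced along the x-axis (in log space if log=True)."""
    xv = np.log10(x) if log else np.asarray(x, dtype=float)
    targets = np.linspace(xv[0], xv[-1], n_markers)
    idx = np.searchsorted(xv, targets).clip(0, len(xv) - 1)
    return np.unique(idx)

x = np.geomspace(1, 1000, 300)         # dense on a log axis
idx = markevery_x(x, n_markers=5, log=True)
# ax.plot(x, y, "-"); ax.plot(x[idx], y[idx], "o")  # markers only at idx
```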
I agree - I think expanding this functionality is the wrong direction, and if anything I think we should discourage it, and rather encourage folks to figure out their subsampling on their own. But I could be convinced if there is prior art that shows this being done in a robust way.... |
I agree with @timhoffm that the right path here is better documentation.
I'm reading this as the markers are spaced by approximately I think we do have to have this functionality internally, because getting it right (leaving aside that small noise on top of a big value on log scale makes it go funny) requires knowing the details of our transform stack and the current view limits. With
(Issue title changed from "markevery[float] considers path length not distance along x-axis" to "markevery[float] considers path length along drawn line")
You all make valid points. As an outsider (user vs. MPL dev), though, I see
Bug summary
`markevery[float]` is supposed to result in evenly distributed markers, but when y-data is noisy, its effect seems to "gradually" fade away.

Code for reproduction
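The code block from the original report was not captured in this page. A minimal sketch that should reproduce the reported behavior; the decay curve, noise levels, and figure layout are assumptions consistent with the discussion, not the reporter's exact script:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted use
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.linspace(1, 100, 1000)

fig, axs = plt.subplots(1, 4, figsize=(16, 3), sharey=True)
for ax, noise in zip(axs, [0.0, 0.1, 0.5, 1.0]):
    y = np.exp(-x / 30) + noise * rng.normal(size=x.size)
    # markevery=0.1: a marker roughly every 10% of the drawn path length
    ax.plot(x, y, "-o", markevery=0.1)
    ax.set_xscale("log")
    ax.set_title(f"noise level {noise}")
fig.savefig("markevery_noise.png")
```

On the noisy subplots the markers are no longer evenly spread, which is the behavior discussed in this issue.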
Actual outcome
Expected outcome
All subplots should look like the one with 0 noise level.
Additional information
No response
Operating system
Ubuntu
Matplotlib Version
3.8.3
Matplotlib Backend
matplotlib_inline.backend_inline
Python version
3.10.13
Jupyter version
No response
Installation
pip