
[DOC]: clarify that markevery[float] considers path length along drawn line #27842

Open
ma-sadeghi opened this issue Mar 1, 2024 · 16 comments
Labels
Documentation, Documentation: API (files in lib/ and doc/api)

Comments

@ma-sadeghi

Bug summary

markevery[float] is supposed to result in evenly distributed markers, but when y-data is noisy, its effect seems to "gradually" fade away.

Code for reproduction

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(1e3, 1e7, 1000)

fig, ax = plt.subplots(nrows=3, ncols=3, figsize=(9, 9), layout="constrained")

for i, ax_i in enumerate(ax.flatten()):
    noise = i / 2
    ax_i.set_title(f"noise level = {noise:.1f}")
    y = np.exp(x**0.1) + np.random.rand(len(x)) * noise
    ax_i.plot(x, y, 'b.-', markevery=0.1, ms=10, markerfacecolor="r")
    ax_i.set_xscale("log")
    ax_i.set_yscale("log")

Actual outcome

[figure: 3×3 grid of subplots from the reproduction code; marker spacing degrades as the noise level increases]

Expected outcome

All subplots should look like the one with noise level 0.

Additional information

No response

Operating system

Ubuntu

Matplotlib Version

3.8.3

Matplotlib Backend

matplotlib_inline.backend_inline

Python version

3.10.13

Jupyter version

No response

Installation

pip

@dstansby
Member

dstansby commented Mar 1, 2024

From the docs:

e.g., if every=5, every 5-th marker will be plotted.

So I think you should be using a value > 1 (instead of 0.1). If you set e.g. markevery=4 does it behave as expected?

Regardless, it would be nice to either warn or error if a value < 1 is passed here.

@timhoffm
Member

timhoffm commented Mar 1, 2024

@dstansby um, no. Float has separate semantics:

every=0.1, (i.e. a float): markers will be spaced at approximately equal visual distances along the line; the distance along the line between markers is determined by multiplying the display-coordinate distance of the axes bounding-box diagonal by the value of every.

But that said, I still believe the plot is correct. Markevery still always chooses actual data points, and the noise just shows through in them.
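For reference, a minimal sketch contrasting the two semantics (the data here are made up purely for illustration): an integer marks every N-th data point, while a float spaces markers approximately evenly along the drawn line.

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, 200)
y = np.sin(4 * np.pi * x) * np.exp(-2 * x)

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(8, 3), layout="constrained")
# Integer: mark every 20th data point, regardless of where the points land visually.
ax1.plot(x, y, '-o', markevery=20)
ax1.set_title("markevery=20 (every N-th point)")
# Float: mark data points spaced ~10% of the Axes bounding-box diagonal apart
# along the drawn line.
ax2.plot(x, y, '-o', markevery=0.1)
ax2.set_title("markevery=0.1 (spacing along the line)")
plt.show()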

@jklymak
Member

jklymak commented Mar 1, 2024

Well, 0.1 means about 10 marked data points, but as the noise increases there are clearly far more than 10.

That said, I consider this a pretty bad visualization technique. I'm not shocked it doesn't work well, particularly for logarithmic axes. So I agree this is a bug, but I'm not sure how fixable it is.

@ma-sadeghi
Author

There's a real use case: the loss while training an ML model is usually smooth at the beginning and progressively begins to fluctuate, which makes it a bit noisy. Plus, the loss tends to slow down very quickly, so it must be plotted on a log-scale x-axis, which means you get exponentially more points towards the end. On top of that, when plotting different training scenarios in a single figure, the jiggles make reading and interpreting the plots very difficult. It's very useful in such cases to only see a finite number of markers for each plot.

@jklymak
Member

jklymak commented Mar 1, 2024

I can appreciate signals with different signal to noise ratios, but I don't think counting on Matplotlib's heuristic here is the right approach. I would plot the raw data, and then decimate manually based on what works properly for the data interpretation.

the jiggles make reading and interpreting the plots very difficult. It's very useful in such cases to only see a finite number of markers for each plot.

If I had this problem, I'd smooth out the jiggles rather than subsampling them, as you are just going to alias the jiggles to the subsampled data.
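A minimal sketch of that approach, using data similar to the reproduction above (the moving-average window size is an arbitrary choice for illustration):

import matplotlib.pyplot as plt
import numpy as np

# Noisy, exponentially growing signal, similar to the reproduction above.
x = np.linspace(1e3, 1e7, 1000)
y = np.exp(x**0.1) + np.random.rand(len(x)) * 2

# Smooth with a simple moving average before plotting; the window size would
# need tuning for real data.
window = 25
y_smooth = np.convolve(y, np.ones(window) / window, mode="same")

fig, ax = plt.subplots()
ax.plot(x, y, color="0.8", label="raw")
ax.plot(x, y_smooth, 'b.-', markevery=0.1, ms=10, markerfacecolor="r",
        label="smoothed")
ax.set_xscale("log")
ax.set_yscale("log")
ax.legend()
plt.show()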

@ma-sadeghi
Author

Fair enough, but even ignoring the jiggles: for data that are linearly sampled but plotted on a log scale, markevery is pretty helpful, yet it seems to be sensitive to even slight noise, and hence not reliable.

@jklymak
Member

jklymak commented Mar 1, 2024

Agreed that it's not reliable! Someone could try to fix it, but....

@timhoffm
Member

timhoffm commented Mar 2, 2024

Without having had time to dig into this, I suspect that we are sampling with a fixed distance along the data curve. While a smooth curve has a total length of approximately the width of the Axes, a noisy curve is much longer and thus gets more markers.
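A rough way to see this (measuring the path length of the drawn line in display coordinates, which is close to what the float handling in _mark_every_path is based on):

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(1e3, 1e7, 1000)
fig, axs = plt.subplots(ncols=2, figsize=(8, 4), layout="constrained")

for noise, ax in zip([0.0, 2.0], axs):
    y = np.exp(x**0.1) + np.random.rand(len(x)) * noise
    ax.plot(x, y, 'b.-', markevery=0.1)
    ax.set_xscale("log")
    ax.set_yscale("log")

fig.canvas.draw()  # make sure the transforms reflect the final view limits

for noise, ax in zip([0.0, 2.0], axs):
    xd, yd = ax.lines[0].get_data()
    # Total path length of the drawn line in display (pixel) coordinates.
    disp = ax.transData.transform(np.column_stack([xd, yd]))
    length = np.hypot(*np.diff(disp, axis=0).T).sum()
    ax.set_title(f"noise = {noise:.0f}, path length ~ {length:.0f} px")

plt.show()

The noisy line is longer in display space, so at a fixed spacing along the path it collects more markers.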

@jklymak
Member

jklymak commented Mar 2, 2024

That sounds likely. To my point that this should be done manually by the user: the current behaviour is one heuristic that works for data where spacing along the line is reasonable. Here the OP wants spacing in x, which is a different algorithm.

@dstansby
Member

dstansby commented Mar 4, 2024

The above comments are correct. If the distance is measured parallel to the x-axis instead of along the curve (it's a one-line diff, see the bottom of this comment), the output is what I think is desired:

[screenshot: the reproduction output with the patch below applied; markers are spaced evenly along the x-axis]

I guess we could add another input type to markevery to do this, perhaps allow dict[Literal["x", "y"], float] so something like {"x": 0.1} would do spacing evenly parallel to the x-axis?

diff --git a/lib/matplotlib/lines.py b/lib/matplotlib/lines.py
index 5e5c19c9f6..752b805e4f 100644
--- a/lib/matplotlib/lines.py
+++ b/lib/matplotlib/lines.py
@@ -179,7 +179,8 @@ def _mark_every_path(markevery, tpath, affine, ax):
             delta = np.empty((len(disp_coords), 2))
             delta[0, :] = 0
             delta[1:, :] = disp_coords[1:, :] - disp_coords[:-1, :]
-            delta = np.hypot(*delta.T).cumsum()
+            # Calculate distance parallel to x-axis
+            delta = delta[:, 0].cumsum()
             # calc distance between markers along path based on the Axes
             # bounding box diagonal being a distance of unity:
             (x0, y0), (x1, y1) = ax.transAxes.transform([[0, 0], [1, 1]])

@dstansby
Member

dstansby commented Mar 4, 2024

I'll label this as a feature request, as the existing options are working as intended.

@tacaswell
Member

Changing to distance along the x-axis will do weird things for spirals and other cases where the y values are not a function (in the math sense) of x.

@tacaswell tacaswell changed the title from "[Bug]: markevery[float] doesn't work as expected when y-data is noisy" to "[FEAT]: markevery[float] considers path length not distance along x-axis" on Mar 5, 2024
@timhoffm
Copy link
Member

timhoffm commented Mar 5, 2024

One action here is to better document that [float] subsamples along the path.

I’m -0.5 on adding subsampling along x. First, subsampling is generally a questionable operation and needs to be employed with great care - there are often better aggregation techniques. Second, there’s likely only a very small subset of cases where subsampling is reasonable and subsampling along x is better than the existing subsampling along the path. Third, it’s not too hard to create a numpy mask for that yourself. And finally, I don’t see how we can fit these additional semantics into the existing markevery API (and IMHO additional keywords would be overboard here).
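For example, a rough do-it-yourself sketch for the log-x case in the original report: markevery also accepts an explicit list of indices (or a boolean mask), so the subsampling can be computed up front. The target of roughly 10 markers and the spacing in log10(x) are arbitrary choices here.

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(1e3, 1e7, 1000)
y = np.exp(x**0.1) + np.random.rand(len(x)) * 2

# Pick ~10 data points roughly evenly spaced in log10(x) and hand their
# indices to markevery.
targets = np.linspace(np.log10(x[0]), np.log10(x[-1]), 10)
idx = np.unique(np.searchsorted(np.log10(x), targets).clip(0, len(x) - 1))

fig, ax = plt.subplots()
ax.plot(x, y, 'b.-', markevery=list(idx), ms=10, markerfacecolor="r")
ax.set_xscale("log")
ax.set_yscale("log")
plt.show()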

@jklymak
Copy link
Member

jklymak commented Mar 5, 2024

I agree - I think expanding this functionality is the wrong direction, and if anything I think we should discourage it, and rather encourage folks to figure out their subsampling on their own. But I could be convinced if there is prior art that shows this being done in a robust way....

@tacaswell
Member

I agree with @timhoffm that the right path here is better documentation:

distances along the line; the distance along the line between markers is determined by multiplying the display-coordinate distance of the axes bounding-box diagonal by the value of every.

I'm reading this as: the markers are spaced by approximately every*axes_diagonal_in_screen_space pixels of path length along the line. This is not the clearest, but I also would not say it is wrong.


I think we do have to have this functionality internally, because getting it right (leaving aside that small noise on top of a big value on a log scale makes it go funny) requires knowing the details of our transform stack and the current view limits. With markevery=0.1, as you pan/zoom around, which points we mark will dynamically update.

import matplotlib.pyplot as plt
import numpy as np

r = np.linspace(1, 10, 256)
theta = np.linspace(0, 6*np.pi, 256)
plt.plot(r*np.cos(theta), r*np.sin(theta), markevery=.1, marker='o')
plt.show()

@tacaswell tacaswell changed the title from "[FEAT]: markevery[float] considers path length not distance along x-axis" to "[DOC]: clarify that markevery[float] considers path length along drawn line" on Mar 5, 2024
@tacaswell tacaswell added the "Documentation" and "Documentation: API (files in lib/ and doc/api)" labels and removed the "New feature" label on Mar 5, 2024
@ma-sadeghi
Author

You all make valid points. As an outsider (user vs. MPL dev) though, I see markevery simply as a way to declutter a multi-line figure with many markers. From this perspective, the implementation shouldn't get in the way of the goal.
