Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Nondeterminism in SVG clipPath element id attributes #27831

Open
jayaddison opened this issue Feb 28, 2024 · 6 comments · May be fixed by #27833
Open

[Bug]: Nondeterminism in SVG clipPath element id attributes #27831

jayaddison opened this issue Feb 28, 2024 · 6 comments · May be fixed by #27833
Labels
backend: svg Difficulty: Medium https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues Good first issue Open a pull request against these issues if there are no active ones! status: confirmed bug
Milestone

Comments

@jayaddison
Copy link

jayaddison commented Feb 28, 2024

Bug summary

Hello - I'm trying to make the astroplan documentation build reproducibly in Debian, and have found a snag: despite configuring the svg.hashsalt to successfully make path identifiers in some SVG plots generated by matplotlib deterministic, there is a remaining problem that clipPath identifiers are nondeterministic.

The cause appears to be the use of Python object ID (via id(...) calls here) when generating clippath IDs.

Code for reproduction

import matplotlib as mpl
import matplotlib.pyplot as plt
import sys

mpl.rcParams["svg.hashsalt"] = "fixed-salt"

fig = plt.gcf()
fig.add_subplot(projection="polar")

plt.savefig(sys.stdout.buffer, format="svg")

Actual outcome

Differences appear in the clipPath identifiers and their references from elsewhere in the SVG output.

Expected outcome

When an svg.hashsalt value is configured, the SVG output should be deterministic.

Additional information

This could arguably be an enhancement request rather than a bug.

Configuring a static PYTHONHASHSEED value does not help to produce deterministic results.

Operating system

Debian GNU/Linux (trixie)

Matplotlib Version

3.6.3

Matplotlib Backend

TkAgg

Python version

Python 3.11.8

Jupyter version

N/A

Installation

Linux package manager

@tacaswell tacaswell added this to the v3.10.0 milestone Feb 28, 2024
@tacaswell tacaswell added Good first issue Open a pull request against these issues if there are no active ones! Difficulty: Medium https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues labels Feb 28, 2024
Copy link

Good first issue - notes for new contributors

This issue is suited to new contributors because it does not require understanding of the
Matplotlib internals. To get started, please see our contributing
guide
.

We do not assign issues. Check the Development section in the sidebar for linked pull
requests (PRs). If there are none, feel free to start working on it. If there is an open PR, please
collaborate on the work by reviewing it rather than duplicating it in a competing PR.

If something is unclear, please reach out on any of our communication
channels
.

@tacaswell
Copy link
Member

tacaswell commented Feb 28, 2024

I agree with the analysis. We need a way to uniquely identify the clip paths within a single rendering (so that we only write the clip path once and can reuse it by reference).

Options that won't work:

  • use hash instead of id because we do not define a custom __eq__ or __hash__ on Path objects (and if we did, we would have to declare Path object un-hashable because they can be mutable) so this will still not be deterministic.
  • use str instead of id because that will eventually call repr on numpy arrays which can truncate for very long paths which could lead to incorrectly re-using a path.

I think the right path here is to add an additional bit of state tracking that maps (clip)paths to a monotonically increasing integer. We would then use that integer in the dictkey and eventually the url(...) id in the svg.

self._clip_path_count = defaultdict(lambda c=count(): next(c))
....
clip_id = self._clip_path_count[clippath] 

may too cute, but I think would work. It looks like we set up all of the state in __init__ and create a new render in the print_svg method so we can do the same with this.


This needs tests, both of the determinism (see the existing determinism tests for how to get the test suite do to that) and that we continue to correctly re-use paths. These tests cases should also be extended to the pdf/ps tests to make sure we do not have the same issue with non-rectangular clipping there. Finally, in addition to the polar case, the determinism tests should probably include one of the complex clip demo examples and something with multiple clip paths (a 2x2 grid of axes each with fun clipping would work).


Marking as good first issue as fixing this will only require touching a small section of the code and does not have any API implications, but medium difficulty as it requires a good grasp of OO and understanding a relatively complex implementation.

jayaddison added a commit to jayaddison/matplotlib that referenced this issue Feb 29, 2024
…tangular clip-paths

This is intended to reduce disruption/changes for existing SVGs that _are_ already rendered reproducibily; it is the Python id(...) calls in particular that bugreport matplotlib#27831 indicates are a problem.
@DimitrisAntoniou99
Copy link

Hello, I would like to work on this issue if it's not entirely finished! I noticed that it's still open.

@jayaddison
Copy link
Author

the determinism tests should probably include one of the complex clip demo examples and something with multiple clip paths (a 2x2 grid of axes each with fun clipping would work)

I missed this quoted part of the task description previously; it seems important, but I'm not familiar enough yet with matplotlib to figure out how to add this to the test coverage in #27833.

@jayaddison
Copy link
Author

the determinism tests should probably include one of the complex clip demo examples and something with multiple clip paths (a 2x2 grid of axes each with fun clipping would work)

@tacaswell could you clarify what the 2x2 grid-of-axes test coverage should do? I think I have #27833 nearly ready to bring back out of 'draft' status, but haven't yet implemented this test.

@tacaswell
Copy link
Member

I had the image from

def test_clipping():
exterior = mpath.Path.unit_rectangle().deepcopy()
exterior.vertices *= 4
exterior.vertices -= 2
interior = mpath.Path.unit_circle().deepcopy()
interior.vertices = interior.vertices[::-1]
clip_path = mpath.Path.make_compound_path(exterior, interior)
star = mpath.Path.unit_regular_star(6).deepcopy()
star.vertices *= 2.6
fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True, sharey=True)
col = mcollections.PathCollection([star], lw=5, edgecolor='blue',
facecolor='red', alpha=0.7, hatch='*')
col.set_clip_path(clip_path, ax1.transData)
ax1.add_collection(col)
patch = mpatches.PathPatch(star, lw=5, edgecolor='blue', facecolor='red',
alpha=0.7, hatch='*')
patch.set_clip_path(clip_path, ax2.transData)
ax2.add_patch(patch)
ax1.set_xlim([-3, 3])
ax1.set_ylim([-3, 3])
in my head as one of them.

I suggest looking at git grep set_clip_path in the examples and tests and pick out a few example to make sure that it works with complex cases (or make the case that your existing tests are exhaustive enough).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend: svg Difficulty: Medium https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues Good first issue Open a pull request against these issues if there are no active ones! status: confirmed bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants