Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Log scaled barplots in PDF result in infinite rectangles #28175

Open
eabila opened this issue May 6, 2024 · 5 comments
Open

[Bug]: Log scaled barplots in PDF result in infinite rectangles #28175

eabila opened this issue May 6, 2024 · 5 comments

Comments

@eabila
Copy link

eabila commented May 6, 2024

Bug summary

When saving log scaled barplots as PDFs, the bars get saved as rectangles with infinite size.
Log_scale_barplot

Code for reproduction

import matplotlib.pyplot as plt
import numpy as np

categories = ['A', 'B', 'C', 'D', 'E']
values = [800, 1200, 2000, 3000, 5000]

plt.bar(categories, values)
plt.yscale('log')
plt.savefig('Log_scale_barplot.pdf')

Actual outcome

Rectangles with infinite size

Expected outcome

Rectangles size capped at axis

Additional information

No response

Operating system

Linux, OS/X

Matplotlib Version

3.8.4

Matplotlib Backend

No response

Python version

No response

Jupyter version

No response

Installation

None

@oscargus
Copy link
Contributor

oscargus commented May 6, 2024

Which tool are you using? I imported the plot in Inkscape and in neither of the import options (poppler/internal) it happens.

Next step is to actually check the PDF content, but that will have to wait.

(I also tried SVG to try to get some idea of where the problem happens, and the SVG also looked OK.)

@eabila
Copy link
Author

eabila commented May 6, 2024

Sorry, I forgot to mention this: I am using Adobe Illustrator here.

@ksunden
Copy link
Member

ksunden commented May 7, 2024

As far as I can tell, it isn't actually rendering incorrectly, correct?

A bar plot on a log scale is an odd thing because there is an implied zero, and zero on log scale cannot exist. You CAN set a baseline e.g. plt.bar(..., bottom=1) which will set it to something that is bounded on the axis. Note, however, that adding bottom explicitly will change the automatic limits to include the bottom value.

I actually can see the path extending when I load the PDF into inkscape using the internal importer. The generated groups/top level thing that you click on is clipped, but if you go down to the thing labeled pathXX (the leaf of the object tree) it does show as extending beyond. The SVG directly created from MPL requires a little more finesse to see, but it does actually have a large (but finite) stop value (~300k pts) if you look into what is actually stored in the path in the xml, just the clip path is composed with it at a low level, so even in inkscape you don't see it as extending unless you know where to look (I was able to see it in the XML editor). The PDF imported path is quite close (though there does seem to be a difference in what is considered the origin here, so it is actually -300k pts, rather than +300k pts).

That said, it is correct as far as I'm concerned, and absent additional context as to why it is problematic (e.g. a common pdf program actually rendering it incorrectly or something that is explicitly out of PDF spec... here it seems to display fine, it is only in the editor that you see it is extended, which is the correct fundamental bar for what was plotted.), I'm inclined to say it is fine. A bar graph (by default) is from zero to a number, and zero on a log scale is at -inf... and it draws correctly.

I do not see a particular reason we should be clipping paths when the vector formats we support already have clipping provided (and we use it, and it renders correctly). I believe we do something for lines where we remove some points, but that is about not bloating file size with points that are outside of the frame (IIRC, we clip to the Figure, not the axes to make sure we don't miss anything). There is no such file size concern for these paths, though.

@ceesem
Copy link

ceesem commented May 20, 2024

While they display correctly, these objects are difficult to work with when editing. Often in biological sciences, for example, it is necessary to build complex figures out of a number of panels in a manner that is difficult to do within matplotlib alone. Bar plots with these infinite points become difficult to work with in this context for things like snapping alignment or auto-sizing figure boards. While some of this could be blamed on decisions made by Adobe, I do think that pdf objects with points at infinity are unusual and, while mathematically correct, not what one expects to work with.

@tacaswell
Copy link
Member

TL;DR I see two paths to get this done that may be reasonable (but obviously that would need to be re-evaluated based on someone actually doing it and seeing what it looks like). We won't know if the cost of carrying the implementation is worth the pay off, my gut is that if someone tries to push this it has a 50/50 chance of getting in (the risks I see are the complexity spinning out of control, the issues with sorting out what the right pre-clip region is, and once it is done it being harder / more complicated for the user to use than just manually setting a bottom on the bar call).


I also agree we are not doing a "wrong" thing, but in some contexts we could be doing a better thing. This issue rhymes with the bug reports of "I plotted 1M points with scatter, zoomed in so I can see 10 and my pdf is huge" because all of the points are included in the output (not just the visible ones) in that we are doing less work on the mpl side and deferring to the program that consumes the vector output to do the clipping.

We cannot clip at artist creation time (the user may change back to linear scale or change the limits so any value we pick will be "wrong"). In principle we may know enough at draw time, but it is not obvious to me under what conditions we should clip and to what value we should clip. Pre-clipping should be opt-in however we do it.

We can not naively clip to the bounding box as that would lead to funny results if the bar has a different edge color. e.g. you might expect these three bars to look the same:

fig, ax = plt.subplots()
ax.bar([1, 2, 3], [4, 4+.05, 4+10], bottom=[0,-.05, -10], lw=15, edgecolor='k')
ax.set_ylim(0, 5)

so

but they do not because the stroking on the edge of the path. It is possible (but not trivial) to sort out how far out we have to go to make sure we don't falsely add an edge (it has to be computed from the lw as for any fixed expansion of the clip box, I can pick a view limit + lw combination that breaks).

That said, we do have the code to clip a path and

def draw(self, renderer):
# docstring inherited
if not self.get_visible():
return
path = self.get_path()
transform = self.get_transform()
tpath = transform.transform_path_non_affine(path)
affine = transform.get_affine()
self._draw_paths_with_artist_properties(
renderer,
[(tpath, affine,
# Work around a bug in the PDF and SVG renderers, which
# do not draw the hatches if the facecolor is fully
# transparent, but do if it is None.
self._facecolor if self._facecolor[3] else None)])
is probably the right place to put in an (optional) call to that code. We would have to add a bit of state to each artist to indicate if it should pre-clip in vector backends. The pro of this is that it gives you artist-level granularity and it "just works" with all vector backends. The con is that now we have another bit of state floating around that only matters in some contexts and we have to do this to each artist we want to have this functionality.

It may also be true that we know enough in

def draw_path(self, gc, path, transform, rgbFace=None):
# docstring inherited
self.check_gc(gc, rgbFace)
self.file.writePath(
path, transform,
rgbFace is None and gc.get_hatch_path() is None,
gc.get_sketch_params())
self.file.output(self.gc.paint())
to do it there (the clip state is someplace in the gc I hope we can either pull it back out and if not keep a second copy around) and then the state if paths should be pre-clipped lives on the renderer. The pro of this is that it keeps the state for the vector backends with the vector backends (and we have pillow_kwargs for backend-specific inputs to fig.savefig already so adding one for pre-clipping is reasonable) and it "just works" for anything that is a Path (which is most things eventually), but we have to implement this per-backend we want to add it to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants