Speeding up the PDF backend #992
Comments
Thanks for the suggestion! There's an initial attempt at jkseppan:pdf-context, which isn't complete yet (doesn't do hatching, probably breaks usetex). Unfortunately, that version is slower and produces larger output than master.
I tried with the test_speed2.py script provided by Gökhan Sever, and on my laptop (with nums=2, i.e. two pages of output), master takes about 16.3 seconds:
    python ../test_speed2.py 15.87s user 0.44s system 99% cpu 16.339 total
and the refactored code takes about 17.8 seconds:
    python ../test_speed2.py 17.32s user 0.45s system 99% cpu 17.797 total
Looking at the output, it seems that the code setting up the graphics context is doing quite a bit of repeated work (e.g. setting alpha to one value, then immediately to another; and it seems to wrap every tick mark in multiple layers of contexts, with identical setup for each one).
Because the pdf backend has an explicit stack of graphics contexts, it can output only the part of the setup that is different from the previous drawing command. This saves on I/O, which is probably a net win, even though keeping track of the stack means more computation.
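To make the "output only what is different" idea concrete, here is a minimal stand-alone sketch. It is not the backend's code: the dictionary-based state, the state_delta helper and the property names are all hypothetical, chosen only to illustrate the principle.

```python
def state_delta(previous, current):
    """Return only the properties whose values differ from `previous`."""
    return {key: value for key, value in current.items()
            if previous.get(key) != value}


# Hypothetical graphics states for three consecutive drawing commands.
tick1 = {"linewidth": 0.5, "alpha": 1.0, "capstyle": "butt"}
tick2 = {"linewidth": 0.5, "alpha": 1.0, "capstyle": "butt"}
label = {"linewidth": 1.0, "alpha": 1.0, "capstyle": "butt"}

print(state_delta({}, tick1))     # the first command needs the full setup
print(state_delta(tick1, tick2))  # an identical tick needs no setup at all
print(state_delta(tick2, label))  # only the line width has to be written
```

The comparison costs some computation per drawing command, which is the trade-off mentioned above: more bookkeeping in Python in exchange for fewer bytes written.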
Hi Jouni,
So I found pdf.compression in matplotlibrc and set it to 0 for uncompressed PDF, which gave me PDF output that is more or less human-readable. Then I could compare the PDF output of the current PDF backend with that of the refactored one. Most if not all of the difference is coming from the tick marks. Skipping the tick marks reduces the running time from about 17.2 seconds to 13.1 seconds for the current PDF backend; the refactored PDF backend has approximately the same running time if we skip the tick marks. The tick marks are drawn in the draw() method of the Axis class in lib/matplotlib/axis.py, one tick at a time, so each tick is drawn independently. We may be able to speed up the PDF backend and other backends if we can make use of what is conserved between the ticks.
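A toy model may show why this matters. The sketch below is only an illustration under made-up assumptions: the command strings ("save", "set ...", "tick ...", "restore") are invented for the example and are neither PDF operators nor matplotlib API, and both function names are hypothetical.

```python
def draw_ticks_independently(positions, style):
    """Emit the full style setup before every single tick."""
    commands = []
    for x in positions:
        commands.append("save")
        commands.extend(f"set {name} {value}" for name, value in style.items())
        commands.append(f"tick {x}")
        commands.append("restore")
    return commands


def draw_ticks_shared(positions, style):
    """Emit the style setup once, then draw all ticks under it."""
    commands = ["save"]
    commands.extend(f"set {name} {value}" for name, value in style.items())
    commands.extend(f"tick {x}" for x in positions)
    commands.append("restore")
    return commands


style = {"linewidth": 0.5, "color": "black"}
positions = range(10)
print(len(draw_ticks_independently(positions, style)))  # 50 commands
print(len(draw_ticks_shared(positions, style)))         # 14 commands
```

With ten ticks and two style properties, the independent version emits five commands per tick, while the shared version pays for the setup only once.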
@mdehoon Is this still a problem?
Yes I think so, but I haven't found time to look at this in more detail. It may be better to keep this open until we can resolve it. But if you prefer to close issues that are not immediately actionable, then that is OK with me too.
This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!
Recently there was a thread on the matplotlib-users mailing list about the speed of the PDF backend:
http://sourceforge.net/mailarchive/forum.php?thread_name=4FF926B8.1030202%40hawaii.edu&forum_name=matplotlib-users
Whereas the Mac OS X and the Cairo backends make use of new_gc and gc.restore to keep track of the graphics context, the PDF backend uses check_gc and an internal stack of graphics contexts. Since matplotlib nowadays has gc.restore functionality, I don't think the internal stack is needed any more. Removing these bits of code from the PDF backend may speed up this backend and also result in smaller PDF files.
See this revision for when gc.restore was added to matplotlib:
http://matplotlib.svn.sourceforge.net/viewvc/matplotlib?view=revision&revision=7112
The principles behind graphics with Mac OS X and Cairo are very similar to those of PDF, so one could simply look at those backends and use the corresponding code in the PDF backend.
Probably the Cairo backend is easiest to follow, as it is in pure Python.
Below is an outline of what is needed. The key point is that since nowadays each call to new_gc is balanced by a call to gc.restore, we can simply print out Op.gsave and Op.grestore directly without having to push and pop GraphicsContextPdf instances. The other operations on the GraphicsContextPdf object can also be executed directly.
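As a rough illustration of this key point, here is a stand-alone sketch. SketchGC and SketchRenderer are hypothetical stand-ins, not the real backend classes, and the sketch writes the raw PDF operators q (save graphics state) and Q (restore graphics state) to a text buffer instead of using the backend's Op values.

```python
import io


class SketchGC:
    """Hypothetical stand-in for GraphicsContextPdf."""

    def __init__(self, stream):
        self.stream = stream

    def restore(self):
        # Undo the most recent save; 'Q' is the PDF restore operator.
        self.stream.write("Q\n")


class SketchRenderer:
    """Hypothetical stand-in for the PDF renderer."""

    def __init__(self, stream):
        self.stream = stream
        self.gc = SketchGC(stream)  # a single context object, created once

    def new_gc(self):
        # Save the graphics state instead of creating a new object;
        # 'q' is the PDF save operator.
        self.stream.write("q\n")
        return self.gc


out = io.StringIO()
renderer = SketchRenderer(out)
gc = renderer.new_gc()
out.write("0 0 m 10 0 l S\n")  # a stroked line drawn inside the saved state
gc.restore()
print(out.getvalue())
```

Because every new_gc call is paired with a restore call, the q and Q operators in the output nest correctly without any stack being kept on the Python side.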
In the RendererPdf class:
In the __init__ method,
self.gc = self.new_gc()
should be replaced with
self.gc = GraphicsContextPdf(self.file)
The check_gc method and all calls to this method can be removed.
In the draw_image method, we need to still do the clipping, and we need to add something like
clippath, clippath_trans = gc.get_clip_path()
and output Op.clip appropriately somewhere between the Op.gsave and Op.grestore. (It may help to look at the Cairo backend to see how this method is implemented there).
In the draw_path method, if rgbFace is not None, then between an Op.gsave and an Op.grestore we need to change the fill color by Op.setrgb_nonstroke. (Comparing to the Cairo backend may help here).
In the new_gc method, instead of creating a new GraphicsContextPdf object, we simply apply Op.gsave to the current graphics context object. This is the important part.
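For the draw_path step, something along the following lines is what I have in mind. This is a sketch only: draw_path_sketch is a hypothetical function, it takes the path as a ready-made string of PDF path operators, and it writes raw PDF operators (q, Q, rg for the non-stroking color, B for fill-and-stroke, S for stroke) rather than the backend's Op values.

```python
import io


def draw_path_sketch(stream, path_ops, rgbFace=None):
    """Write a path, filling it only when a face color is given."""
    stream.write("q\n")  # save the graphics state
    if rgbFace is not None:
        r, g, b = rgbFace[:3]
        stream.write(f"{r:g} {g:g} {b:g} rg\n")  # set the fill color
    stream.write(path_ops + "\n")
    # 'B' fills and strokes the path, 'S' only strokes it.
    stream.write("B\n" if rgbFace is not None else "S\n")
    stream.write("Q\n")  # restore, so the fill color does not leak out


filled = io.StringIO()
draw_path_sketch(filled, "0 0 10 10 re", rgbFace=(1.0, 0.0, 0.0))
outlined = io.StringIO()
draw_path_sketch(outlined, "0 0 10 10 re")
print(filled.getvalue())
print(outlined.getvalue())
```

Keeping the color change between the save and the restore means the next drawing command starts from the previous fill color without any explicit bookkeeping.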
In the GraphicsContextPdf class:
We need to add a method "restore" which emits Op.grestore. Since in matplotlib each call to new_gc is balanced with a call to restore, each Op.gsave is balanced by an Op.grestore.
The following methods should be renamed, and the corresponding PDF operation can now be executed directly:
capstyle_cmd ---> set_capstyle
joinstyle_cmd ---> set_joinstyle
linewidth_cmd ---> set_linewidth
dash_cmd ---> set_dashes
alpha_cmd ---> set_alpha
rgb_cmd ---> set_foreground
For the method hatch_cmd, I am not quite sure how to implement it for PDF. Hatching was not implemented for the Cairo backend, but it was implemented for the MacOSX backend (in the C code), so it must be possible in the PDF backend also.
The push and pop methods can be removed.
Instead of the method clip_cmd, we now need a method set_clip_path. This method can apply Op.clip directly, without having to search for the appropriate graphics context.
The delta method and the copy_properties methods can be removed.
The __repr__ method can be removed.
The .parent attribute can be removed.
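To illustrate what "executed directly" could look like for the renamed setters, here is a stand-alone sketch. DirectGC is a hypothetical class, not the real GraphicsContextPdf; it writes raw PDF operators (w for line width, J for line cap, j for line join, RG for the stroking color, Q for restore) to a text buffer. set_alpha and set_dashes are left out to keep it short; alpha in particular has to go through an extended graphics state in PDF, so it is not a one-line operator.

```python
import io


class DirectGC:
    """Hypothetical graphics context that writes each change immediately."""

    # PDF numeric codes for the cap and join style names.
    _CAPS = {"butt": 0, "round": 1, "projecting": 2}
    _JOINS = {"miter": 0, "round": 1, "bevel": 2}

    def __init__(self, stream):
        self.stream = stream

    def set_linewidth(self, width):
        self.stream.write(f"{width:g} w\n")

    def set_capstyle(self, style):
        self.stream.write(f"{self._CAPS[style]} J\n")

    def set_joinstyle(self, style):
        self.stream.write(f"{self._JOINS[style]} j\n")

    def set_foreground(self, rgb):
        r, g, b = rgb
        self.stream.write(f"{r:g} {g:g} {b:g} RG\n")

    def restore(self):
        self.stream.write("Q\n")


out = io.StringIO()
out.write("q\n")  # what new_gc would emit in the proposed scheme
gc = DirectGC(out)
gc.set_linewidth(0.5)
gc.set_capstyle("round")
gc.set_foreground((0, 0, 1))
gc.restore()
print(out.getvalue())
```

Note that this version writes every setting even if it has not changed, so it gives up the delta logic in exchange for simpler code; whether that is a net win is exactly what would need to be measured.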