Make tests faster #778

Merged
merged 8 commits into from over 1 year ago
 +54,225 57,609

5 participants

Michael Droettboom, Paul Ivanov, Thomas Robitaille, Jouni K. Seppänen, Phil Elson
Michael Droettboom
Owner

This is just a placeholder for some work that would be nice to get done.

1) The baseline images are converted from PDF and SVG to PNG every run. These could be cached between runs to save a good portion of runtime

2) Look into running tests in parallel. Nose supports this, but it wasn't working well last time I tried.

3) Maybe we could use lower resolution PNGs and convert the PDFs and SVGs at lower resolution to make things faster. This would also hopefully gloss over some of the text-related false positives we see on a regular basis.

Any other ideas? Let's brainstorm!

Paul Ivanov
Collaborator

This sounds great. A year ago we made the tests run faster, but since then I think the fine-grained nature of the text/mathtext tests has made them pretty slow again (most of the time is spent in test_mathtext on my machine).

There's also the issue (related to 1) of calling inkscape multiple times. I think we've looked into this before: leaving a process running and batching those calls together.

It would also be nice to limit the testing to one of PDF, SVG, or PNG; I do this by hand to speed things up.

Thomas Robitaille

If (and this is a big if) we can make freetype behave predictably (as we've started to in #779, though more work needs to be done), then would one not expect the SVG files to be identical, in which case we could just compare the contents of the file and not convert them to PNG? (same for PDF)

Michael Droettboom
Owner

The freetype issue is totally orthogonal to SVG and PDF, since the freetype renderer is not used by vector backends. The SVG backend only uses the freetype TrueType file parser to get the curves, which come directly from the file and shouldn't be impacted by the version of freetype. The PDF backend doesn't even use that -- it uses a library called ttconv.

The conversion of SVG and PDF to PNG will be necessary for the foreseeable future.

Since both PDF and SVG use dictionaries and hash-based ids to some extent, the ordering of the elements changes between runs. The backends are constantly tweaked to fix bugs, improve efficiency, etc., which can change the structure of the file in radical ways while (if all goes well) not impacting the final result. The only way to compare the final results objectively is to render them to bitmaps and compare in that space.

Michael Droettboom
Owner

Here's the first chunk of changes to make the tests faster. As these things tend to be finicky across systems, I'd appreciate as much testing as possible.

1) The conversions from PDF and SVG to PNG are cached. By default, the cache lives in ~/.matplotlib/test-cache, but the location can be configured (as usual) with the MPLCONFIGDIR environment variable. The files are named after the MD5 hash of the source file, so the cache remains useful even when flipping between different git branches. For this reason, only the baseline images are cached, since they have a fixed MD5 hash. Test result images cannot be cached, because the ordering of the elements in them may change between identical runs at the whim of dictionary ordering, which would make the size of the cache explode. Still, I get about a 33% speedup overall with this technique.

2) For a smaller gain, the histogram of the images used for comparison is computed with bincount on Numpy >= 1.6 (the first version to have a minlength argument to bincount). This was the largest time sink after ghostscript and inkscape when running the tests.

3) If available, the tests will use the Python bindings to the Ghostscript API (python-ghostscript). This avoids firing off separate processes and reinitializing ghostscript many times. The speedup is pretty modest, unfortunately, but it still may be worth leaving in. Run "pip install ghostscript" to get the bindings.

4) Lastly, unrelated to speed, this changes the file-naming scheme so rather than expected-foo.png, we have foo-expected.png. When displaying the files in alphabetical order in a thumbnail viewer (such as gthumb) the relevant files are grouped together. This helps to more efficiently scan for problems.
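A rough sketch of the caching in point 1 (illustrative names and layout, not the actual compare.py code; the key idea is that the MD5 of the source file names the cached PNG):

```python
import hashlib
import os
import shutil


def get_cached_conversion(source_path, cache_dir, convert_func):
    """Return a PNG for *source_path*, reusing a cached copy when possible.

    The cache key is the MD5 hash of the source file's bytes, so the same
    baseline image maps to the same cache entry across git branches.
    convert_func is the expensive gs/inkscape call, run only on a miss.
    """
    with open(source_path, 'rb') as f:
        digest = hashlib.md5(f.read()).hexdigest()
    cached = os.path.join(cache_dir, digest + '.png')
    if not os.path.exists(cached):
        os.makedirs(cache_dir, exist_ok=True)
        shutil.copyfile(convert_func(source_path), cached)
    return cached
```

Only baseline images would go through this path; result images get converted fresh every run, for the dictionary-ordering reason given in point 1.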
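The bincount-for-histogram swap in point 2 can be sketched like this (illustrative helpers, not the exact compare.py code). Both return the same 256 per-channel counts, but bincount skips histogram's bin-edge search:

```python
import numpy as np


def channel_histogram(channel):
    """256-bin histogram of one 8-bit image channel via bincount.

    minlength=256 pads the result to a fixed size even when some pixel
    values never occur, so it matches np.histogram bin-for-bin.
    Requires NumPy >= 1.6 (the first version with minlength).
    """
    return np.bincount(channel.ravel().astype(np.intp), minlength=256)


def channel_histogram_old(channel):
    """The slower np.histogram equivalent for older NumPy."""
    counts, _ = np.histogram(channel.ravel(), bins=np.arange(257))
    return counts
```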

Jouni K. Seppänen
Collaborator

It might be useful to leave ghostscript and inkscape running as separate processes. For ghostscript, you'd start it with a command line like

gs -sDEVICE=png16m -dNOPAUSE -sOutputFile=page%d.png

and send input lines like

(foo/bar/baz.pdf) run
(another/path/to/file.pdf) run

Trying this from the command line, it seems to work, but I don't know if there are subtle problems where e.g. one pdf file causes a font to be loaded that affects the rendering of the next file. Error recovery is another big question. I think you'd want to use something like pexpect to wait for gs prompts to know that the latest file has been completely written.

Inkscape has a --shell option that allows you to send it input lines like "input.svg --export-png=output.png".
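The batching idea above can be sketched as a few small helpers (names are illustrative; the gs and inkscape invocations follow the lines given above, and a real driver would still need prompt-watching and error recovery):

```python
import subprocess  # a driver would Popen(gs_batch_command(), stdin=PIPE)


def gs_batch_command(output_pattern='page%d.png'):
    """Command line for one long-lived gs process (assumes gs on PATH)."""
    return ['gs', '-sDEVICE=png16m', '-dNOPAUSE',
            '-sOutputFile=' + output_pattern]


def gs_run_line(pdf_path):
    """One stdin line asking the running gs to interpret *pdf_path*."""
    return '(%s) run\n' % pdf_path


def inkscape_shell_line(svg_path, png_path):
    """One inkscape --shell input line exporting *svg_path* to PNG."""
    return '%s --export-png=%s\n' % (svg_path, png_path)
```

A driver would write one gs_run_line per file to the gs process's stdin and, as noted above, would have to wait for the gs prompt (e.g. with pexpect) before assuming the output file is complete.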

Jouni K. Seppänen
Collaborator

Our tests are mainly end-to-end regression tests: you run a script and check that the output stays the same as it was before. That's useful for noticing regressions, but tests of a smaller scope would be faster to run. We could for example set up a test suite that includes predefined sequences of backend function calls and checks that the output for each sequence stays the same. That would allow skipping the frontend code when you are working on a specific backend. Likewise, we could have a stub backend that only records the calls made by the frontend code, and compares that sequence to a predefined "correct" sequence for a given test case.
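The call-recording stub backend could be as small as this sketch (a hypothetical class, not part of matplotlib; a real one would subclass backend_bases.RendererBase):

```python
class RecordingRenderer(object):
    """Stub renderer that records the names of the drawing calls the
    frontend makes, so a test can diff that sequence against a stored
    "correct" sequence without rasterizing anything."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any draw_* method looked up on the renderer just logs its name.
        def record(*args, **kwargs):
            self.calls.append(name)
        return record
```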

A related thought: if the freetype version differences are problematic, we could make most test cases not include any text output (disable tick labels, etc), and have a smaller number of cases that specifically test text rendering, and include a larger set of correct answers for just those tests.

Jouni K. Seppänen
Collaborator

Hey, it shouldn't be too hard to make a checksummer that's aware of pdf syntax and tolerates trivial changes like reordering of dictionary elements and ignores things like the CreationDate timestamp. That might speed the usual case up quite a bit.
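A toy version of that tolerant checksummer, using regexes rather than a real PDF parser (a proper implementation would parse object syntax and canonicalize dictionary key order too):

```python
import hashlib
import re


def tolerant_pdf_checksum(pdf_bytes):
    """Checksum a PDF while ignoring content-free differences.

    Strips the /CreationDate and /ModDate timestamps and the /ID
    trailer entry before hashing, so two files that differ only in
    those produce the same digest.
    """
    cleaned = re.sub(rb'/(CreationDate|ModDate)\s*\(.*?\)', b'', pdf_bytes)
    cleaned = re.sub(rb'/ID\s*\[.*?\]', b'', cleaned, flags=re.DOTALL)
    return hashlib.md5(cleaned).hexdigest()
```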

Michael Droettboom
Owner
mdboom commented

@jkseppan: These are some good ideas.

I had implemented using Inkscape around a year ago, but ultimately reverted it. Inkscape leaked memory with each render, so it needed to be restarted periodically, and I wasn't terribly comfortable with coming up with a heuristic for how often to do that. See 9086ef1 for the original kick at that can. It may be that current versions of Inkscape are less leaky, and it's also possible that my implementation could be improved. I can't find my old numbers, but I don't recall that it made a huge improvement in runtime, but that may have been that the memory leaks were causing problems on a memory-constrained machine.

I haven't attempted a similar approach with ghostscript, but it's nice to know it can be done.

I think removing text except where it is necessary is a good idea. There are cases where the ticking itself is being tested and we want to make sure the numbers and layout of the ticks are correct, but for most tests the ticking is superfluous.

Michael Droettboom
Owner
mdboom commented

I have an initial commit that converts all of the PDFs to PNGs using the same gs process. The speedup isn't terribly dramatic: it went from 4:30 to 4:15 wall clock time on my machine. It would be great if this could be tested -- I have no idea how robust this is to gs crashing etc.

Michael Droettboom
Owner
mdboom commented

Rebased on current master

Jouni K. Seppänen
Collaborator

Tests pass on a Macbook Pro running OS X 10.6.8, Python 2.7.2, GPL Ghostscript 8.71:

Ran 1090 tests in 615.499s

OK (KNOWNFAIL=2)

Jouni K. Seppänen
Collaborator

I tried this out on Amazon EC2. I compared the runtime of python -c 'import matplotlib; matplotlib.test(1)' of this commit ("after") and this commit with e062691 reverted ("before"). I used an Amazon Linux AMI that has Python 2.6, installed various prerequisites and compiled matplotlib from git. I couldn't find a version of Inkscape for that distribution, so the numbers don't include svg conversion.

On a small instance (single core, pretty slow), "before" took 702 seconds and "after" took 706 seconds, so it was slightly slower but probably within measurement noise.

On a high-cpu medium instance (dual core, each supposedly 2.5 times faster than the small instance), "before" took 452 seconds and "after" took 357 seconds. Just to be sure, I ran "before" again, and it took 446 seconds, so I don't think it was a fluke.

Based on this, it looks like this makes sense on dual-core but doesn't help on single-core. Perhaps the overhead from starting a new process is similar to that from context switches on single-core.

Phil Elson
Collaborator
pelson commented

Since both PDF and SVG use dictionaries and hash-based ids to some extent, the ordering of the elements changes between runs.

Depending on how much of an overhead the conversion/storage of PDF and SVG to PNG is, we could always use an OrderedDict on Python >= 2.7 (there are loads of OrderedDict recipes for earlier versions) to give the SVG/PDF output a consistent ordering.
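The effect of that suggestion can be sketched with a hypothetical attribute-serializing helper: with a plain dict on old Pythons the attribute order (and hence the file's bytes) was unspecified, while an OrderedDict pins it to insertion order:

```python
from collections import OrderedDict


def svg_attrs(attrs):
    """Serialize element attributes in a stable, insertion-defined order.

    *attrs* is a sequence of (name, value) pairs; an OrderedDict keeps
    them in the order the backend added them, run after run.
    """
    od = OrderedDict(attrs)
    return ' '.join('%s="%s"' % (k, v) for k, v in od.items())
```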

Michael Droettboom added some commits:

b58aa94: Change generated filenames. Put the phrases "expected" and "failed-diff" toward the end of the filename so files are more appropriately grouped when viewed in file managers and thumbnail tools (e.g. gthumb).
2ad49ae: Use bincount rather than histogram for comparing images on Numpy >= 1.6. Continue to use histogram for earlier Numpy versions.
b3c7bcd: Cache results of previous {PDF|SVG} -> PNG conversions. We can only cache the baseline images, since the test results are never the same twice given different dictionary orderings in the file.
d638c9b: Use python-ghostscript to convert PDF and EPS to PNG (if available).
bf23f41: Use the same gs process to convert all of the images from PDF to PS.
342de60: Add a new kwarg to the image_comparison decorator, remove_text, which removes titles and ticks from a figure before comparison. Use this throughout to make more of the tests less dependent on text differences.
bd8eaeb: Remove obsolete docs about using ghostscript library since using it doesn't seem to offer any speed advantage.
16f450f: Fix tests after rebase.
Phil Elson
Collaborator
pelson commented

@mdboom: What kind of state is this in? I'm prepared to go through these changes if it's going to save us all time in the long run.

Michael Droettboom
Owner

@jkseppan: The real advantage of this PR is on subsequent runs, since it caches the results of conversions. For a real speed comparison, you should run each test run twice (cleaning ~/.matplotlib before each pair of runs).

@pelson: I think this is good to merge. It would be nice to be able to get some predictable ordering in SVG and PDF such that the PNG conversions of the test results (rather than just the baseline images) could also be done. But there's no reason that can't be done as a follow-up in a separate PR.

Phil Elson pelson commented on the diff
lib/matplotlib/testing/compare.py

@@ -127,7 +172,7 @@ def comparable_formats():
     on this system.'''
     return ['png'] + converter.keys()

-def convert(filename):
+def convert(filename, cache):

Phil Elson (Collaborator) added a note:

Some high level comments about cache wouldn't go amiss here.

Phil Elson pelson commented on the diff
lib/matplotlib/testing/compare.py

((8 lines not shown))

    # compare the resulting image histogram functions
-   rms = 0
-   bins = np.arange(257)
-   for i in xrange(0, 3):
-      h1p = expectedImage[:,:,i]
-      h2p = actualImage[:,:,i]
+   expected_version = version.LooseVersion("1.6")
+   found_version = version.LooseVersion(np.__version__)
+
+   # On Numpy 1.6, we can use bincount with minlength, which is much faster than
+   # using histogram
+   if found_version >= expected_version:
+      rms = 0
+
+      for i in xrange(0, 3):

Phil Elson (Collaborator) added a note:

magic number. Perhaps use the length of the last dimension.

Michael Droettboom (Owner) added a note:

It's not really a magic number -- all of this code is predicated on the images being RGB, and there's not much point in being more general.

Phil Elson pelson commented on the diff
lib/matplotlib/testing/compare.py

@@ -216,24 +279,42 @@ def compare_images( expected, actual, tol, in_decorator=False ):
    actualImage, expectedImage = crop_to_same(actual, actualImage, expected, expectedImage)

    # normalize the images
-   expectedImage = image_util.autocontrast( expectedImage, 2 )
-   actualImage = image_util.autocontrast( actualImage, 2 )
+   # expectedImage = image_util.autocontrast( expectedImage, 2 )

Phil Elson (Collaborator) added a note:

How come these got commented out?

Michael Droettboom (Owner) added a note:

It didn't seem to be necessary, particularly after the tests were changed to use non-antialiased text. This was just slowing things down. But I agree it might be better to just remove the lines of code.

Phil Elson pelson commented on the diff
lib/matplotlib/testing/compare.py

@@ -216,24 +279,42 @@ def compare_images( expected, actual, tol, in_decorator=False ):
    actualImage, expectedImage = crop_to_same(actual, actualImage, expected, expectedImage)

    # normalize the images
-   expectedImage = image_util.autocontrast( expectedImage, 2 )
-   actualImage = image_util.autocontrast( actualImage, 2 )
+   # expectedImage = image_util.autocontrast( expectedImage, 2 )
+   # actualImage = image_util.autocontrast( actualImage, 2 )

    # compare the resulting image histogram functions

Phil Elson (Collaborator) added a note:

Is a histogram function a helpful thing outside of this testing framework? Could/Should it be factored into the main code?

Phil Elson pelson commented on the diff
lib/matplotlib/testing/decorators.py

@@ -98,6 +100,16 @@ def setup_class(cls):

        cls._func()

+    @staticmethod
+    def remove_text(figure):

Phil Elson (Collaborator) added a note:

docstring. also, does the axes.texts list need to be cleared?

Phil Elson (Collaborator) added a note:

Ignore that, the docstring below explains.

Phil Elson
Collaborator

On the whole I agree with removing the text from some of the test results. However, I wonder if the text in the symlog2 test should remain. (There may be more as I go through them.)

Phil Elson
Collaborator

I've got to the stage of reviewing the actual test_*.py files, which I will try to do tomorrow, but I have no problems with the rest of the changes.

Phil Elson
Collaborator

@mdboom: I'm going to merge this while it is still fresh (and mergeable). For some reason GitHub isn't showing all the diffs (the test files, for instance), so I had to read the diffs on my clone. There are no show stoppers, but I do think it is worth addressing some of my comments either on this PR or in a separate one (if they need code).

Good stuff.

Phil Elson pelson merged commit 1456f05 into from
Phil Elson pelson closed this
Michael Droettboom mdboom referenced this pull request
Merged

Test framework cleanups #1056


Showing 8 unique commits by 1 author.

Jun 13, 2012