Fix image comparison #1291

Merged
merged 20 commits into matplotlib:master from mgiuca-google:fix-image-comparison on Feb 28, 2013

Conversation

Contributor

mgiuca-google commented Sep 21, 2012

Fixes the compare_images RMS calculation algorithm, so that it computes the RMS of the difference between corresponding pixels, as opposed to the RMS of the difference between the two images' histograms.

See discussion on Issue 1287.

Note: This is not yet ready to merge, since it breaks a lot of tests. Some negotiation is required to figure out whether to update the expected output for each test, or bump up the tolerance.
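
Roughly, the new calculation looks like this (a simplified sketch of the approach, not the exact code in the diff):

import numpy as np

def per_pixel_rms(expected, actual):
    # Convert to a signed type so the subtraction cannot wrap around
    # (uint8 minus uint8 would overflow for negative differences).
    diff = expected.astype(np.int16) - actual.astype(np.int16)
    # Root mean square of the per-pixel differences, on a 0-255 scale.
    return np.sqrt(np.mean(diff.astype(np.float64) ** 2))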

@pelson pelson and 1 other commented on an outdated diff Sep 21, 2012

lib/matplotlib/testing/compare.py
actualImage, expectedImage = crop_to_same(actual, actualImage, expected, expectedImage)
- # compare the resulting image histogram functions
- expected_version = version.LooseVersion("1.6")
- found_version = version.LooseVersion(np.__version__)
+ # convert to signed integers, so that the images can be subtracted without
+ # overflow
+ expectedImage = expectedImage.astype(np.int32)
+ actualImage = actualImage.astype(np.int32)
+
+ # calculate the per-pixel errors, then compute the root mean square error
+ num_values = reduce(operator.mul, expectedImage.shape)
@pelson

pelson Sep 21, 2012

Member

np.prod(expectedImage.shape) would do the trick here (obviously your version works, but feels less numpy-y).

@mgiuca-google

mgiuca-google Sep 24, 2012

Contributor

Ah much better.

@pelson pelson and 1 other commented on an outdated diff Sep 21, 2012

lib/matplotlib/testing/compare.py
actualImage, expectedImage = crop_to_same(actual, actualImage, expected, expectedImage)
- # compare the resulting image histogram functions
- expected_version = version.LooseVersion("1.6")
- found_version = version.LooseVersion(np.__version__)
+ # convert to signed integers, so that the images can be subtracted without
+ # overflow
+ expectedImage = expectedImage.astype(np.int32)
+ actualImage = actualImage.astype(np.int32)
@pelson

pelson Sep 21, 2012

Member

Wouldn't int16 do?

>>> np.int16(0) - np.int16(255)
-255
@mgiuca-google

mgiuca-google Sep 24, 2012

Contributor

Yeah. I was trying to make it work for 16-bit PNGs as well, but the rest of the code won't support that anyway. Changed to int16.

@pelson pelson and 1 other commented on an outdated diff Sep 21, 2012

lib/matplotlib/testing/compare.py
else:
- rms = 0
- bins = np.arange(257)
-
- for i in xrange(0, 3):
- h1p = expectedImage[:,:,i]
- h2p = actualImage[:,:,i]
-
- h1h = np.histogram(h1p, bins=bins)[0]
- h2h = np.histogram(h2p, bins=bins)[0]
-
- rms += np.sum(np.power((h1h-h2h), 2))
+ histogram = np.histogram(absDiffImage, bins=np.arange(257))[0]
@pelson

pelson Sep 21, 2012

Member

Could this not be np.arange(256) rather than 257?

@mgiuca-google

mgiuca-google Sep 24, 2012

Contributor

No, because the bins argument to histogram expects an array of all the boundary values, from the minimum value of the first bin to the maximum value of the last bin. The resulting histogram has len(bins)-1 values, so passing an array of 257 elements produces a histogram of 256 elements, one per possible byte value, counting the occurrences of each.

np.arange(256) would mean that the 254 and 255 values would be merged into a single bin.
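
A quick interactive check (toy values chosen for illustration):

>>> import numpy as np
>>> values = np.array([0, 254, 255, 255], dtype=np.uint8)
>>> counts = np.histogram(values, bins=np.arange(257))[0]
>>> counts.size                     # one bin per possible byte value
256
>>> counts[[0, 254, 255]].tolist()  # occurrences of the values 0, 254 and 255
[1, 1, 2]
>>> np.histogram(values, bins=np.arange(256))[0].size  # 254 and 255 share a bin
255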

@pelson pelson and 1 other commented on an outdated diff Sep 21, 2012

lib/matplotlib/testing/compare.py
- rms = np.sqrt(rms / (256 * 3))
+ sumOfSquares = sum(count*(i**2) for i, count in enumerate(histogram))
@pelson

pelson Sep 21, 2012

Member

Nice line. I think it can be simplified with np.sum(histogram * np.arange(len(histogram))**2)

e.g:

>>> a = np.array([0, 3, 2, 4]) 
>>> i = np.arange(len(a))
>>> a
array([0, 3, 2, 4])
>>> i
array([0, 1, 2, 3])
>>> i**2
array([0, 1, 4, 9])
>>> a * i**2
array([ 0,  3,  8, 36])
>>> np.sum(a * i**2)
47
@mgiuca-google

mgiuca-google Sep 24, 2012

Contributor

Awesome.

@pelson pelson and 1 other commented on an outdated diff Sep 21, 2012

lib/matplotlib/testing/compare.py
- rms = np.sqrt(rms / (256 * 3))
+ sumOfSquares = sum(count*(i**2) for i, count in enumerate(histogram))
+ rms = np.sqrt(float(sumOfSquares) / num_values)
@pelson

pelson Sep 21, 2012

Member

This is the standard rms, right? So another step would be needed to scale the rms to have a standard magnitude irrespective of the number of pixels?

@mgiuca-google

mgiuca-google Sep 24, 2012

Contributor

That's what / num_values is doing. It averages the squared errors over all pixels before taking the square root, so the result is already normalised by the number of pixels. To test this, see my all-127 vs all-128 test, which has an RMS error of exactly 1.0, because every pixel differs by exactly 1.
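
To illustrate with toy arrays rather than the actual test images:

>>> import numpy as np
>>> a = 127 * np.ones((10, 10, 3), dtype=np.int16)
>>> b = 128 * np.ones((10, 10, 3), dtype=np.int16)
>>> float(np.sqrt(np.sum((a - b) ** 2) / float(a.size)))
1.0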

@pelson pelson and 2 others commented on an outdated diff Sep 21, 2012

lib/matplotlib/tests/test_compare_images.py
+ im2 = 'cosine_peak-nn-img-minorchange.png'
+ image_comparison_expect_rms(im1, im2, tol=10, expect_rms=None)
+
+ # Now test with no tolerance.
+ image_comparison_expect_rms(im1, im2, tol=0, expect_rms=2.99949)
+
+def test_image_compare_scrambled():
+ """Test comparison of an image and the same image scrambled."""
+ # This expects the images to compare completely different, with a very large
+ # RMS.
+ # Note: The image has been scrambled in a specific way, by having each
+ # colour component of each pixel randomly placed somewhere in the image. It
+ # contains exactly the same number of pixels of each colour value of R, G
+ # and B, but in a totally different position.
+ im1 = 'cosine_peak-nn-img.png'
+ im2 = 'cosine_peak-nn-img-scrambled.png'
@pelson

pelson Sep 21, 2012

Member

We could compute this file, rather than having to store it. Not sure if we would want to do such a thing (the only reason I have is to reduce the number of extra files needed in the repo). @mdboom : any feeling on whether we need to be particularly cautious of growing the repo too much?

@mgiuca-google

mgiuca-google Sep 24, 2012

Contributor

Yep, I could replace this with the algorithm I used to scramble the file. If you are concerned about the repo size, I probably should not be copying the other image either -- note that it is an exact copy of baseline_images/test_delaunay/cosine_peak-nn-img.png. Let me know if you want me to make either of these changes.
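
For reference, a sketch of the kind of scrambling involved (not necessarily the exact script that produced the checked-in image):

import numpy as np

def scramble_components(img, seed=0):
    # Shuffle each colour channel independently: the per-channel histograms
    # are unchanged, but every component ends up at a random position.
    rng = np.random.RandomState(seed)
    out = np.empty_like(img)
    for c in range(img.shape[2]):
        flat = img[:, :, c].ravel()
        out[:, :, c] = rng.permutation(flat).reshape(img.shape[:2])
    return out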

@pelson

pelson Jan 17, 2013

Member

any feeling on whether we need to be particularly cautious of growing the repo too much?

IMHO we should be cautious, but that doesn't mean we have to be frugal. Do what you think is best here - if it's practical and not overly prohibitive, then I would prefer to programmatically generate some of the data - otherwise what's here is fine.

@mdboom

mdboom Jan 17, 2013

Owner

We don't need to worry about going over github quotas. The only reason for concern is the extra disk space/bandwidth for everyone's local clones, but I don't think we need to worry at the moment.

@pelson pelson commented on the diff Sep 21, 2012

lib/matplotlib/tests/test_compare_images.py
@@ -0,0 +1,74 @@
+from __future__ import print_function
+from matplotlib.testing.compare import compare_images
+from matplotlib.testing.decorators import _image_directories
+import os
+import shutil
+
+baseline_dir, result_dir = _image_directories(lambda: 'dummy func')
+
@pelson

pelson Sep 21, 2012

Member

I would like to see a pixel shift test (e.g. the whole image shifted across by 1 pixel).

Similarly, it would be nice to test a sub-region with a pixel shift. (i.e. move the x axis across by 1 pixel).

@mgiuca-google

mgiuca-google Sep 24, 2012

Contributor

Done. The RMS errors for the particular cases I chose are 22 (for the whole image) and 13 (for the sub-image), out of 255. This is above the default tolerance, which is what I would have expected given that a lot of pixels don't match - but it's unfortunate, since from a human standpoint the images are not very different.
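
For the record, a shifted input can be generated along these lines (an illustrative helper, not the code in the test):

import numpy as np

def shifted_copy(img, pixels=1):
    # Move every pixel one column to the right (wrapping at the edge).
    # Per-pixel RMS penalises this heavily -- roughly the 22/255 quoted above
    # for the whole-image case -- even though a human barely notices it.
    return np.roll(img, pixels, axis=1)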

Member

pelson commented Sep 21, 2012

@mgiuca-google : This is really good stuff, thank you!

As you can see, I have raised a couple of questions, but in principle, I think this will be a beneficial change. As I hinted at in my comment on the original issue, I probably wouldn't call the original image comparison test "broke", just that it has some characteristics which may not be ideal for our image testing requirements. On that basis, I wonder if it is worth us maintaining the two functions side by side, primarily so that other users who may want to do image comparison could decide which algorithm to use. This may be a contentious issue, as inevitably it will increase the amount of code that mpl has to maintain...

One nitpick observation: you have built on code which is obviously not PEP8 compliant, resulting in your own code not being strictly PEP8 compliant (although you have followed the guiding principle: "A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is most important."). I would certainly find it an improvement if you were to rename the variables you have added/touched to be more PEP8-y (i.e. from camelCase to underscored_variables).

On the whole, pretty awesome!

Member

WeatherGod commented Sep 21, 2012

Just something I have come across today in my work that might be relevant is the MapReady toolkit: http://www.asf.alaska.edu/downloads/software_tools

In it, there is a program called "diffimage" (which, because this is a geoprocessing tool, does a bit more than we are looking for), but has the following description:

DESCRIPTION:
   1. diffimage calculates image statistics within each input image
      and calculates the peak signal-to-noise (PSNR) between the two
      images.
   2. diffimage then lines up the two images, to slightly better
      than single-pixel precision, and determines if any geolocation
      shift has occurred between the two and the size of the shift.
      Because an fft-based match is utilized it will work with images of
      any size, but is most efficient when the image dimensions are
      near a power of 2.  The images need not be square.
   3. diffimage then compares image statistics and geolocation shift
      (if it occurred) and determines if the two images are different from
      each other or not.
   4. If there are no differences, the output file will exist but will be
      of zero length.  Otherwise, a summary of the differences will be placed
      in both the output file and the log file (if specified.)

So, what is interesting is the use of the fourier-transform as part of the image differentiating technique. Don't know if that might be an interesting avenue to pursue or not. Cheers!
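
For the curious, the basic FFT trick (plain phase correlation) could look something like this -- just an illustration of how a whole-image shift might be estimated, not anything diffimage or matplotlib actually does:

import numpy as np

def estimate_shift(a, b):
    # Phase correlation between two same-sized greyscale images: the peak of
    # the inverse FFT of the normalised cross-power spectrum sits at the
    # translation between them.
    A = np.fft.fft2(a.astype(np.float64))
    B = np.fft.fft2(b.astype(np.float64))
    cross_power = A * np.conj(B)
    cross_power /= np.abs(cross_power) + 1e-12
    corr = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Fold wrap-around indices into signed shifts.
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return dy, dx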

Member

dmcdougall commented Sep 21, 2012

Interesting! Good find.

Contributor

mgiuca-google commented Sep 24, 2012

Thanks for your comments, @pelson. I have taken care of them.

I would certainly find it an improvement if you were to rename the variables you have added/touched to be more PEP8-y (i.e. from camelCase to underscored_variables).

I have renamed absDiffImage and sumOfSquares to PEP-8 style, but I didn't want to touch expectedImage and actualImage since that will make my patch look much bigger than it should.

@WeatherGod good find. I will have a look at that tool later on. The main improvement I'd be interested in over the RMSE algorithm I implemented is whether it can detect minor pixel shifts and assign only a small penalty (whereas RMS assigns a large penalty, because it just sees that all of the pixels have changed). It sounds like step 2 (lining up the two images) is designed to solve this, but again, we need to be able to deal with sub-image shifts, not just whole-image shifts. The new test cases Phil suggested I add are helpful in judging this requirement. They currently output 22 and 13 respectively; I'd expect them to output some positive but much smaller value, perhaps about 4 and 2, respectively.

Contributor

mgiuca-google commented Oct 18, 2012

@WeatherGod wrote:

Just something I have come across today in my work that might be relevant is the MapReady toolkit:
http://www.asf.alaska.edu/downloads/software_tools

I'm not sure if you're advocating using this tool or just borrowing the idea. If you meant the former, I had a brief look at the license agreement and it is incompatible with Matplotlib. It seems to be basically the BSD license, but with the additional BSD-incompatible clause:

Redistribution and use of source and binary forms are for noncommercial purposes only.

Member

pelson commented Jan 14, 2013

@mgiuca-google - if you wouldn't mind rebasing this, I'd like to see if we can get this merged in the next couple of weeks. Previous commenters on #1287 were @mdboom, @dmcdougall and @WeatherGod, so ideally we would get either a 👍, a 👎, or an explicit abstention from each of those before we actually press the merge button (other commenters more than welcome too!).

Cheers,

Owner

mdboom commented Jan 14, 2013

I'm definitely in favor of this in principle. Once this is rebased and we have something to test again, I'd like to kick the tires one more time (since accidentally breaking the test suite would be a major problem). Assuming all that goes well, I'd say this is good to go.

@NelleV NelleV and 1 other commented on an outdated diff Jan 14, 2013

lib/matplotlib/tests/test_compare_images.py
+
+baseline_dir, result_dir = _image_directories(lambda: 'dummy func')
+
+# Tests of the image comparison algorithm.
+def image_comparison_expect_rms(im1, im2, tol, expect_rms):
+ """Compare two images, expecting a particular RMS error.
+
+ im1 and im2 are filenames relative to the baseline_dir directory.
+
+ tol is the tolerance to pass to compare_images.
+
+ expect_rms is the expected RMS value, or None. If None, the test will
+ succeed if compare_images succeeds. Otherwise, the test will succeed if
+ compare_images fails and returns an RMS error almost equal to this value.
+ """
+ from nose.tools import assert_almost_equal
@NelleV

NelleV Jan 14, 2013

Contributor

It would be better to have those imports at the top of the file

@mgiuca-google

mgiuca-google Jan 16, 2013

Contributor

Done. Thanks for spotting.

@NelleV NelleV and 2 others commented on an outdated diff Jan 14, 2013

lib/matplotlib/testing/compare.py
@@ -280,44 +280,35 @@ def compare_images( expected, actual, tol, in_decorator=False ):
# open the image files and remove the alpha channel (if it exists)
expectedImage = _png.read_png_int( expected )
actualImage = _png.read_png_int( actual )
+ expectedImage = expectedImage[:,:,:3]
@NelleV

NelleV Jan 14, 2013

Contributor

Minor PEP8 nitpick: you don't have spaces after ','. You could run flake8 on this file, but as I don't think it is pep8 compliant at all, you'll have a lot of warnings that are not on lines of code you changed.

@mgiuca-google

mgiuca-google Jan 16, 2013

Contributor

I don't think PEP8 has an opinion on this either way; it's fairly specialized syntax for Numpy. The Numpy manual explicitly uses this syntax with no spaces: see z[1,:,:,2]. Besides, as you say, this file is not PEP8-compliant to begin with (see the previous two lines).

@NelleV

NelleV Jan 16, 2013

Contributor

I don't know about the numpy manual, but numpy's code itself doesn't follow the pep8 convention. If you want a foolproof pep8 codebase, you should check out scikit-learn.

I can fix the non pep8 compliance when I pep8-tify this file. It's just that merging pep8 compliance patches starting now gives me less work to do later :)

@pelson

pelson Jan 17, 2013

Member

I agree with @NelleV on this - it's not your responsibility to PEP8 this file, but ideally any new code should, within reason, be as PEP8 as possible. My take on this example is that there should be spaces after the commas (PEP8 quote: "Avoid extraneous whitespace in the following situations: Immediately before a comma, semicolon, or colon" - there is no explicit mention of whitespace after the comma, but generally that is the accepted form).

Member

dmcdougall commented Jan 14, 2013

I'm definitely in favor of this in principle. Once this is rebased and we have something to test again, I'd like to kick the tires one more time (since accidentally breaking the test suite would be a major problem). Assuming all that goes well, I'd say this is good to go.

I agree with this. I think the negotiation @mgiuca-google mentions in the PR description should be carefully considered, particularly given the issues we have comparing images of rasterised text across different versions of freetype.

Contributor

mgiuca-google commented Jan 16, 2013

Hey guys,

Thanks for bumping this, Phil. I've done a merge up to the current head. (You said "rebase" and I'm not sure if you actually prefer a rebase instead of a merge -- I'm personally not a fan but if you really want a rebase, let me know and I'll do that.) You guys should be able to pull this branch and run the tests.

The other change I made was in commit 5e22c11: I deleted the section which Michael added in 1283feb, which retries a failed comparison after removing any pixel differences of 1. That change was presumably made to work around the fact that if you have a lot of pixels with a slightly different colour, you get a big error, such as in my all-127 versus all-128 case. My branch fixes that issue, so I don't think we need this extra case. Let me know if there is another good reason for it.

Now this still fails a lot of tests due to RMS failures. As I said in the original PR, we will have to go through and update either the expected output, or the tolerance, for each test. I can do this but it would be good to come to a policy decision first.

Member

pelson commented Jan 16, 2013

I can do this but it would be good to come to a policy decision first.

That is perfectly reasonable - we don't want you doing a lot of tedious work if it only goes stale again.

I'm confident that if we can blitz through a review of what's here this week, @mgiuca-google can then go through rms values next (or if there are other volunteers to help with that process, it can be shared appropriately).

I've done a merge up to the current head. You said "rebase" and I'm not sure if you actually prefer a rebase instead of a merge

I did mean rebase, which is generally our preferred way of bringing branches up to date, but the reason why this is preferred over merge eludes me (for a linear history on master???). I'm sure others can fill in the details on that and whether or not to undo the merge and rebase instead.

Cheers,

Owner

mdboom commented Jan 16, 2013

Yes -- we definitely want a rebase, not a merge. The merge creates clutter in the history, and it makes it look like the old master is not the "trunk".

Why are there more RMS failures with this change? The images should either be identical to the input (in which case they pass) or any differences should be handled by this new algorithm. If not, then updating the baselines will only cause the comparisons to work for you but fail for me (who produced most of the existing baseline images). Or am I missing something? I hope to find some time shortly to check this out and poke at it a bit.

Owner

mdboom commented Jan 16, 2013

Ok -- I see what's happening. It seems like most of these tests are failing due to a subtle text positioning improvement, which shows up mostly in the vector formats. I don't see any failures that look problematic -- in fact one failure is due to a baseline image still showing an old bug. I think the thing to do here is "reset" all of the baselines by updating all of them. I'll file a PR against this one shortly to do just that.

Owner

mdboom commented Jan 16, 2013

@mgiuca-google : github won't let me file a PR against your repo (???). Perhaps you could just manually merge mdboom/matplotlib:update_baseline_images.

Contributor

mgiuca-google commented Jan 17, 2013

Yes -- we definitely want a rebase, not a merge. The merge creates clutter in the history, and it makes it look like the old master is not the "trunk".

Well, for what it's worth, if you merge the branch into the master with git merge --no-ff, you don't get that bad history. The merge commit's first parent will be the previous commit to master, so that anyone doing a git log --first-parent will see only the trunk, and not the individual commits to the branch. Note: I'm still going to do the rebase, since you asked me to, but I still recommend you merge with --no-ff to avoid splatting my branch commits (at this point, dozens) into the trunk.

Thanks for going to the effort of resetting all of those images. I wasn't sure you'd want to do that, but I think it's the best outcome. I was able to manually merge it, but it doesn't seem directly relevant to my branch. Wouldn't it be better to cherry-pick fb68c58 into master (since it should not break with the existing comparison function)? Then this branch is just about fixing the comparison function, and not the images themselves.

Contributor

mgiuca-google commented Jan 17, 2013

Okay, I have done the rebase. Now all of my commits are applied to the current HEAD.

I am not sure whether I've done a "rebase" as you intended, though. Did you just want my commits applied to HEAD, or did you actually want me to go back through the history and fix up the commits so that they are all in logical order and pristine? For example, removing the "Use int16 instead of int32 arrays" commit and just using int16 from the start. Also, should it be the case that the tests pass on every commit (so, don't commit a failing test case before fixing the code)? I'm just trying to get an idea of what style of branch you want to accept.

If you intend to do a merge --no-ff, then it shouldn't matter if the history is a bit buggy, as long as the final product is fine. If you intend to do a fast-forward merge, then all of the commits need to be sensible.

@pelson pelson and 1 other commented on an outdated diff Jan 17, 2013

lib/matplotlib/testing/compare.py
@@ -316,37 +300,25 @@ def compare_images( expected, actual, tol, in_decorator=False ):
# open the image files and remove the alpha channel (if it exists)
expectedImage = _png.read_png_int( expected )
actualImage = _png.read_png_int( actual )
+ expectedImage = expectedImage[:,:,:3]
+ actualImage = actualImage[:,:,:3]
@pelson

pelson Jan 17, 2013

Member

Hmmm. Is there a reason for not taking the alpha channel into account?

@mdboom

mdboom Jan 17, 2013

Owner

I think for the vast majority of our tests, it doesn't matter. But it's conceivable it might matter for a test that, for example, checks that the background of the figure is transparent. Stripping it probably saves some time too, which is important when considering how long the tests currently take to run.

Note also that for PDF comparisons, Ghostscript never gives an alpha channel, so it's completely redundant there.

Maybe it should be a kwarg compare_alpha (defaulting to False)? I wouldn't want that to hold up this PR because of it.
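
Something like this is all I have in mind (a hypothetical sketch, explicitly not part of this PR):

def _maybe_strip_alpha(expected_image, actual_image, compare_alpha=False):
    # Drop the alpha channel unless the caller explicitly asks to compare it.
    if not compare_alpha:
        expected_image = expected_image[:, :, :3]
        actual_image = actual_image[:, :, :3]
    return expected_image, actual_image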

Contributor

mgiuca-google commented Jan 18, 2013

Updating the PR (note: I didn't rebase again but I will after further discussion). Here are the remaining issues. Let me know if I've missed some:

  • PEP8 compliance. I've fixed the issue that was pointed out.
  • Removing the alpha channel. Interesting that the current code says "remove the alpha channel" but doesn't actually do so. I believe it used to do this, but it was lost in one of the refactorings over the past six months. I think that removing the alpha channel is correct at the moment, because none of the images have alpha and we want to make sure we can compare two images if one has alpha and the other does not. I'm happy to add a compare_alpha argument, but probably not in this PR.
  • Automatically generating some of the expected output images. I don't really want to generate the image algorithmically because a) that makes the tests non-deterministic (since it is a random scramble), and b) it involves lots of new code in the testing infrastructure with new ways to go wrong. The new images total 1.6 MB which is fairly hefty. Instead, I've replaced that rather large image with a tiny one that makes the same point, for a total of 4.3 kB of new images. Note that the old images are still in my commit history, but I'll rebase them away before we merge the PR.
  • The per-test tolerance settings. I decided to reset all of the custom tolerances, since they are basically meaningless against the new algorithm (and some of them were huge). Obviously it's hard to choose an appropriate tolerance without having lots of computers to test it on, so maybe we can just set them low to begin with and creep them up as necessary. I've set the default tolerance to 10 out of 255, which allows for small changes in the kerning. For images with lots of text, the tolerance may need to be higher (for example, one image gives me an RMS error of 13.3 compared to Michael's updated baseline image). I'm getting huge RMS errors on the mathtext tests (presumably because Michael and I have different TeX renderers) -- in some cases up to 50 out of 255. I've set the tolerance to 50 for those tests, but this worries me, because with such a high tolerance it won't detect a lot of problems with those images. (A short usage sketch of the tolerance scheme follows below.)
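
For clarity, the intended usage of the tolerance (placeholder file names; compare_images returns None when the images match within tol, and with in_decorator=True I believe a failure comes back as a dict describing the error, including the RMS):

from __future__ import print_function
from matplotlib.testing.compare import compare_images

# tol is on the same 0-255 colour-value scale as the RMS error, so 10
# tolerates small kerning shifts while the freetype-sensitive mathtext
# images need a much larger value.
err = compare_images('baseline/legend.png', 'result/legend.png',
                     tol=10, in_decorator=True)
if err is not None:
    print('image comparison failed:', err)
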
Member

pelson commented Jan 18, 2013

Obviously it's hard to choose an appropriate tolerance without having lots of computers to test it on, so maybe we can just set them low to begin with and creep them up as necessary.

I agree with that approach. I'm prepared to accept that some developers' test suites (depending on machine/OS) will fail after merging this PR - it's easy for us to iteratively increase tolerances as needed.

I've set the tolerance to 50 for those tests, but this worries me, because with such a high tolerance, it won't detect a lot of problems with those images.

Perhaps, in an ideal world, we would do well to be able to specify regions of different tolerances in the same image. But not here. 😄

Member

pelson commented Jan 18, 2013

for a total of 4.3 kB of new images

Excellent saving! Thanks for doing this.

@pelson pelson and 2 others commented on an outdated diff Jan 18, 2013

lib/matplotlib/testing/compare.py
@@ -299,8 +282,9 @@ def compare_images( expected, actual, tol, in_decorator=False ):
= INPUT VARIABLES
- expected The filename of the expected image.
- actual The filename of the actual image.
- - tol The tolerance (a unitless float). This is used to
- determine the 'fuzziness' to use when comparing images.
+ - tol The tolerance (a colour value difference, where 255 is the
@pelson

pelson Jan 18, 2013

Member

That's my kind of spelling - but is probably inconsistent with the rest of the docs/codebase. Would you mind dropping the "u" from colour? (It feels very alien asking you to do this...)

@NelleV

NelleV Jan 18, 2013

Contributor

@pelson in fact:

nvaroqua@u900-bdd-1-156t-6917:~/Projects/matplotlib$ git grep color | wc
   15543   81737 1965493
nvaroqua@u900-bdd-1-156t-6917:~/Projects/matplotlib$ git grep colour | wc
    59     396    4876
@dmcdougall

dmcdougall Jan 18, 2013

Member

That's probably my fault...

Contributor

mgiuca-google commented Jan 20, 2013

Changed. (Don't worry, I'm used to writing "color" to be consistent with the code around me. I'm usually less careful in comments, but fixed for consistency.)

Owner

mdboom commented Jan 23, 2013

Agreed about alpha channel -- best to ignore it for now.

The difference in the mathtext tests is most likely the version of freetype. None of our tests use the tex renderer -- they use the built-in mathtext renderer. It would be nice to find an algorithm that would ignore those differences.

Note that we switched to using non-antialiased text at one point to be more robust with the old image comparison algorithm. Perhaps switching back to antialiased text would actually be more robust with the new algorithm. @mgiuca-google: I'll send you my output for the mathtext tests with antialiasing turned back on, and then perhaps you can experiment comparing those to the antialiased tests generated on your platform.

Owner

mdboom commented Jan 23, 2013

@mgiuca-google: I've put antialiased text results on my antialiased_text_test branch. If you merge that into this and run on your local machine, we should know pretty quickly whether this new approach is more robust against differences in antialiased text or aliased text (as it is now).

Contributor

mgiuca-google commented Jan 24, 2013

@mdboom Good idea. I'd say that it is a net improvement, though not as good as I had hoped. It doesn't obviate the need to bump up the tolerances in a few places, but I was able to reduce the tolerance on mathtext from 50 down to 32, so I think it's a win.

Your antialiased_text_test branch has a few issues:

  • The test_arrow_patches baseline images are once again blank. I'm not sure what it is about your machine that is doing that. In any case, they don't have text, so it is safe to simply revert them.
  • The test_backend_svg baseline images are shifted slightly to the left, causing test failures. Again, no text, so safe to revert.

Maybe you should revert all of the pdf and svg files, since they shouldn't have changed anyway. Then I would say it is safe to go ahead and commit that (independent of this branch). After reverting the above files, it passes on my machine even with the old comparison algorithm. Oh, as of 6cf6cbb, there are some changes to the baseline images so you might need to generate them again!

Owner

mdboom commented Jan 25, 2013

The image comparison tests fail with the old algorithm and non-antialiased text on Travis-CI (which must be using yet another version of freetype) -- so I don't think I want to commit the antialiased images independent of this PR. What I think would be best is for me to fix up the problems as you noted (the test_arrow_patches and revert all of the non-PNG baselines), have you merge that into this branch, get Travis to pick it up and see how it does. I'll post another note here when I've had a chance to do that (and I might try to get to the bottom of the test_arrow_patches failure while I'm in there).

Owner

mdboom commented Jan 25, 2013

Ok -- I've updated my antialiased_text_test branch (removing the old commit and replacing it with a new one that only updates PNG files).

Contributor

mgiuca-google commented Jan 30, 2013

OK sorry for the delay. I was away. Now I have merged your branch and it seems to pass on my machine. I've also merged from master and there were a few new baseline images, so I updated those to be antialiased as well. We'll see how Travis likes it =)

Owner

efiring commented Feb 19, 2013

I did a manual merge and tested, with the result that the usual stackplot failures are still there, but in addition exceptions are being generated by test_backend_pgf.test_xelatex, test_pdflatex, and test_rcupdate. It looks like these tests need to be updated, or a default tolerance provided by converting the third argument of compare_images() to a kwarg. The failure is the same on all, e.g.:

======================================================================
ERROR: matplotlib.tests.test_backend_pgf.test_xelatex
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/tests/test_backend_pgf.py", line 45, in backend_switcher
    result = func(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/tests/test_backend_pgf.py", line 86, in test_xelatex
    compare_figure('pgf_xelatex.pdf')
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/tests/test_backend_pgf.py", line 60, in compare_figure
    err = compare_images(expected, actual)
TypeError: compare_images() takes at least 3 arguments (2 given)
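
The kwarg option would look roughly like this (hypothetical default value shown; updating the three affected tests to pass a tolerance explicitly would work just as well):

# In lib/matplotlib/testing/compare.py -- a default lets callers such as
# test_backend_pgf's compare_figure() omit the argument.
def compare_images(expected, actual, tol=10, in_decorator=False):
    """Same behaviour as before; only the signature gains a default tol."""
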
Owner

mdboom commented Feb 25, 2013

We at long last have the tests passing cleanly on Travis again. What's left to get this in at this point? This is a great piece of work, and it's high time to polish the details and get it merged. Thanks again.

Contributor

mgiuca-google commented Feb 25, 2013

Did efiring's issues get fixed? If so, is there another branch with that fix which I should be merging into mine?

Owner

efiring commented Feb 26, 2013

I think that what we need now is for you to rebase your branch onto the latest master, since @mdboom has all tests passing there. Then, if your rebased branch also passes all tests, it is probably ready to go. If not, it will at least be ready for tracking down any remaining problems.

Owner

mdboom commented Feb 26, 2013

I think we'll probably need a rebase plus a regeneration of the images with antialiased text (since I think we determined that works better with this approach). And then we cross our fingers and hope Travis likes it ;)

mgiuca-google and others added some commits Aug 28, 2012

Added new test suite, test_compare_images.
This tests the image comparison function itself. Currently, all three cases
fail, due to a buggy comparison algorithm.

In particular, test_image_compare_scrambled shows the algorithm massively
under-computing the error, and test_image_compare_shade_difference shows the
algorithm massively over-computing the error.
Remove the alpha channel from the expected and actual images.
(This regressed when the PIL code was migrated to numpy.)
testing/compare: Fix image comparison RMS calculation.
The previous implementation did not compute the RMS error. It computed the RMS
in the difference of the number of colour components of each value. While this
computes 0 for equal images, it is incorrect in general. In particular, it does
not detect differences in images with the same pixels in different places. It
also cannot distinguish small changes in the colour of a pixel from large ones.
Do not divide RMS by 10000 when testing against tolerance.
This was arbitrary and made no sense. Increased all tolerances by a factor of
10000. Note that some are ridiculously large (e.g., 200 out of 255).
testing/compare: Remove "retry ignoring pixels with differences of only 1."

This was introduced in 1283feb, presumably to hack around the fact that 1-pixel
differences can make a very large error value. This is not necessary any more,
since the root cause has been fixed.
test_compare_images: Replace cosine_peak-nn test images with new baseline images derived from basn3p02 in pngsuite tests.

These are much smaller images than the cosine tests.
tests: Removed existing custom tolerance values, since they are irrelevant under the new algorithm.

Added a few new tolerance values, for output I am seeing that is valid but slightly different to the baseline image.
Contributor

mgiuca-google commented Feb 27, 2013

I have rebased from master and there was just one image that had changed in the meantime and needed to be regenerated (test_legend/legend_various_labels.png). I've regenerated that image with anti-aliased text and rolled that into mdboom's "Update all of the images to use antialiased text" commit. All tests seem to pass; we'll see what Travis says.

Owner

mdboom commented Feb 27, 2013

The Travis failure on 2.6 appears to be due to a network failure, not really any fault of ours. Ideally, we'd do something to push Travis to try again, but I'm reasonably confident that we are ok here, given that we have 2.7 and 3.2 working.

Owner

mdboom commented Feb 28, 2013

I'm just going to bite the bullet and merge this. I'm reasonably confident that the 2.6 test will pass once this is merged.

Thanks for all of this work -- I know this was a long-lived PR, being so pervasive and fundamental to our workflow, but I think it represents a real improvement.

mdboom added a commit that referenced this pull request Feb 28, 2013

@mdboom mdboom merged commit f5d86b1 into matplotlib:master Feb 28, 2013

1 check failed

default: The Travis build could not complete due to an error
Contributor

mgiuca-google commented Feb 28, 2013

Sweet! Thanks for dealing with this Michael. It's a relief to have it done.

@mgiuca-google mgiuca-google deleted the mgiuca-google:fix-image-comparison branch Feb 28, 2013

Member

pelson commented Feb 28, 2013

Sweet!

Indeed. Very nice work @mgiuca-google - thanks for this.
