
Switch to content based comparison #854

Merged
merged 6 commits into rst2pdf:master from content-based-comparison on Jun 4, 2020

Conversation

@stephenfin (Contributor) commented Apr 19, 2020

Time to drop the JSON-based testing. MuPDF is a "lightweight PDF, XPS, and E-book viewer [that] consists of a software library, command line tools, and viewers for various platforms" [1]. The PyMuPDF library provides Python bindings for MuPDF and gives us the ability to introspect documents [2]. Start using this to provide a content-based comparison of the PDFs, comparing reference PDFs with those generated by our test suite.

This adds one more dependency, but should result in far more reliable, deterministic test behavior across different environments. Fuzzy matching is used to abstract away minor differences.
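For a sense of what that introspection looks like, here is a minimal illustrative example (not the code added in this PR): PyMuPDF, imported as `fitz`, exposes each page's content as a list of blocks, where each block is a tuple of `(x0, y0, x1, y1, text, block_no, block_type)`.

```python
# Illustrative only -- not the code added in this PR.
import fitz  # PyMuPDF

def get_page_blocks(path):
    """Return the text blocks of each page as (bbox, text) pairs."""
    doc = fitz.open(path)
    pages = []
    for page in doc:
        # get_text('blocks') is spelled getText('blocks') in older
        # PyMuPDF releases; each block carries its bounding box and text.
        pages.append([(b[:4], b[4]) for b in page.get_text('blocks')])
    doc.close()
    return pages
```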

Note that there is some overlap between this PR and #853, since that was mostly created from patches that I had previously included in this series but decided were non-essential. Expect a merge conflict (which I can resolve) for whichever ends up being merged later.

Also note that I haven't actually deleted the rst2pdf/tests/md5 directory. This PR is already large enough and that can be done in a follow-up.

@stephenfin stephenfin mentioned this pull request Apr 19, 2020
Just hope things are working as expected on my machine.

Signed-off-by: Stephen Finucane <stephen@that.guru>
There are three issues to resolve here, one with an input file and two
with reference PDFs.

rst2pdf/tests/input/test_issue_175.txt
  This attempts to insert a large enough spacer to force a header to the
  bottom of the first page and the text onto the next page, in order to
  ensure the header will be moved to the next page to group it with its
  text. However, the spacer wasn't large enough and led to both the
  header and text being included on the first page.

rst2pdf/tests/reference/test_aafigure.pdf
  It doesn't appear possible to format a backtick as a literal without
  including the escape character. The escape character wasn't included
  in the reference PDF.

rst2pdf/tests/reference/test_issue_288.pdf
  This contained a different number of block elements on Python 3 than
  on Python 2. Update the PDF but disable the build until Python 2 is
  dropped.

Signed-off-by: Stephen Finucane <stephen@that.guru>
Previously, we were assessing whether a PDF was correct by comparing a
checksum of the file against a list of known valid checksums. When a
different checksum was seen, which was quite often given the number of
moving pieces here, it was up to the user to determine whether there was
a bug or simply a small but valid change in the output PDF. Let's
simplify this. Instead of storing checksums, use the pymupdf library to
compare the PDFs programmatically. We need to use a bit of fuzzing here,
since blocks can get moved around and text rewrapped, but it should be
mostly bulletproof otherwise.

Signed-off-by: Stephen Finucane <stephen@that.guru>
Resolve issues with the Sphinx tests.

Signed-off-by: Stephen Finucane <stephen@that.guru>
@stephenfin (Contributor Author)

This is failing because there isn't a Python 2.7 wheel available for PyMuPDF at the moment. I've reported the issue (pymupdf/PyMuPDF#511), but until that's fixed this remains blocked on Travis. @akrabat @lornajane How close are we to dropping Python 2.7 support? Perhaps I could simply drop Python 2.7 testing here and side-step the whole issue.

@lornajane (Contributor)

We can drop Python 2 support. Please do everything else you need to do to get this working for 3 and if that's the only reason the build is failing, we can review with that in mind and/or bite the bullet to remove support for 2.

At a bigger picture level (and I probably said this already), I'm not at all sold on doing only image comparisons since not all the tests are purely visual. But having the ability to have "unknown md5 but honestly this PDF looks fine to the machine" as a test output: incredibly valuable!!!! We had some imagemagick thing that did this before (it was a bit half baked). Image comparison is a very welcome addition to our testing setup but I don't think it replaces what we have.

Side question: how does the performance compare for running the test suites through the image comparison over the existing md5 process? (the answer might be: run it yourself and see I suppose!)

@akrabat (Member) commented May 22, 2020

#841 takes us to Python 3 only. I'll update PR 841, though, as its dependencies probably need checking.

@stephenfin (Contributor Author)

> We can drop Python 2 support. Please do everything else you need to do to get this working for 3 and if that's the only reason the build is failing, we can review with that in mind and/or bite the bullet to remove support for 2.

Okay, I'll drop Python 2.7 tests from Travis and get this passing.

> At a bigger picture level (and I probably said this already), I'm not at all sold on doing only image comparisons since not all the tests are purely visual. But having the ability to have "unknown md5 but honestly this PDF looks fine to the machine" as a test output: incredibly valuable!!!! We had some imagemagick thing that did this before (it was a bit half baked). Image comparison is a very welcome addition to our testing setup but I don't think it replaces what we have.

So, interestingly enough, this isn't actually doing image-based comparison at all (that was #849, which I've abandoned in favour of this approach). Instead, it's inspecting the generated PDF using the PyMuPDF library. If you look at the compare_pdfs function I've added, you'll see that I'm comparing each element on each page individually, albeit with some minor fuzzing, because output from ReportLab seems to change subtly across versions and environments.
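In rough outline, the approach looks something like the following. This is a deliberately simplified sketch rather than the PR's actual compare_pdfs; in particular, the FUZZ tolerance value and the exact assertions are illustrative assumptions.

```python
# Simplified, illustrative sketch of a content-based PDF comparison
# with PyMuPDF; not the exact implementation added in this PR.
import fitz  # PyMuPDF

FUZZ = 3  # positional tolerance in points (illustrative value)

def compare_pdfs(path_a, path_b):
    doc_a, doc_b = fitz.open(path_a), fitz.open(path_b)
    assert len(doc_a) == len(doc_b), 'page counts differ'
    for page_a, page_b in zip(doc_a, doc_b):
        # get_text('blocks') is spelled getText('blocks') in older
        # PyMuPDF releases.
        blocks_a = page_a.get_text('blocks')
        blocks_b = page_b.get_text('blocks')
        assert len(blocks_a) == len(blocks_b), 'block counts differ'
        for block_a, block_b in zip(blocks_a, blocks_b):
            # Allow bounding boxes to drift slightly, since ReportLab's
            # output shifts subtly across versions and environments.
            for coord_a, coord_b in zip(block_a[:4], block_b[:4]):
                assert abs(coord_a - coord_b) <= FUZZ, 'block moved'
            # Normalise whitespace so rewrapped text still compares equal.
            assert ' '.join(block_a[4].split()) == ' '.join(block_b[4].split())
```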

> Side question: how does the performance compare for running the test suites through the image comparison over the existing md5 process? (the answer might be: run it yourself and see I suppose!)

A run before this change yields the following:

$ time pytest
========================================== test session starts ==========================================                                                                                                          
platform linux -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1                                                                                                                                              
rootdir: /home/stephenfin/Development/innovation/rst2pdf                                                                                                                                                           
collected 308 items

...

======================================== short test summary info ========================================                                                                                                          
FAILED rst2pdf/tests/input/source.txt::source                                                                                                                                                                      
FAILED rst2pdf/tests/input/test_box_widths.txt::test_box_widths                                                                                                                                                    
FAILED rst2pdf/tests/input/test_fancytitles.txt::test_fancytitles - AssertionError: File test_fancytit...                                                                                                          
FAILED rst2pdf/tests/input/test_inkscape.txt::test_inkscape - AssertionError: File test_inkscape.pdf n...                                                                                                          
FAILED rst2pdf/tests/input/test_issue_239_2.txt::test_issue_239_2 - AssertionError: File test_issue_23...                                                                                                          
FAILED rst2pdf/tests/input/test_issue_310_2.txt::test_issue_310_2                                                                                                                                                  
FAILED rst2pdf/tests/input/test_issue_311.txt::test_issue_311                                                                                                                                                      
FAILED rst2pdf/tests/input/test_issue_473.txt::test_issue_473                                                                                                                                                      
FAILED rst2pdf/tests/input/test_issue_83.txt::test_issue_83                                                                                                                                                        
FAILED rst2pdf/tests/input/test_repeat-table-rows.txt::test_repeat-table-rows                                                                                                                                      
FAILED rst2pdf/tests/input/test_slides.txt::test_slides                                                                                                                                                            
FAILED rst2pdf/tests/input/sphinx-issue172/conf.py::sphinx-issue172                                                                                                                                                
FAILED rst2pdf/tests/input/sphinx-issue252/conf.py::sphinx-issue252                                                                                                                                                
FAILED rst2pdf/tests/input/sphinx-issue318/conf.py::sphinx-issue318                                                                                                                                                
FAILED rst2pdf/tests/input/sphinx-issue319/conf.py::sphinx-issue319                                                                                                                                                
FAILED rst2pdf/tests/input/sphinx-issue320/conf.py::sphinx-issue320                                                                                                                                                
FAILED rst2pdf/tests/input/sphinx-repeat-table-rows/conf.py::sphinx-repeat-table-rows                                                                                                                              
======================== 17 failed, 265 passed, 26 skipped in 229.76s (0:03:49) =========================                                                                                                          
                                                                                                                                                                                                                   
real    3m50.167s                                                                                                                                                                                                  
user    4m6.070s                                                                                                                                                                                                   
sys     0m54.214s

After this change, I see:

$ time pytest
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1
rootdir: /home/stephenfin/Development/innovation/rst2pdf
collected 308 items

...

============================================================================================= short test summary info =============================================================================================
FAILED rst2pdf/tests/input/test_fancytitles.txt::test_fancytitles - AssertionError: File test_fancytitles.pdf not generated
FAILED rst2pdf/tests/input/test_inkscape.txt::test_inkscape - AssertionError: File test_inkscape.pdf not generated
FAILED rst2pdf/tests/input/test_issue_239_2.txt::test_issue_239_2 - AssertionError: File test_issue_239_2.pdf not generated
FAILED rst2pdf/tests/input/sphinx-issue172/conf.py::sphinx-issue172
FAILED rst2pdf/tests/input/sphinx-issue252/conf.py::sphinx-issue252 - Failed: Call failed with -6:
FAILED rst2pdf/tests/input/sphinx-issue318/conf.py::sphinx-issue318
FAILED rst2pdf/tests/input/sphinx-issue319/conf.py::sphinx-issue319
FAILED rst2pdf/tests/input/sphinx-issue320/conf.py::sphinx-issue320
FAILED rst2pdf/tests/input/sphinx-repeat-table-rows/conf.py::sphinx-repeat-table-rows
============================================================================== 9 failed, 272 passed, 27 skipped in 261.33s (0:04:21) ==============================================================================

real    4m21.616s
user    4m22.740s
sys     0m52.636s

So that's roughly a 14% increase in runtime (albeit with one additional skipped test). I think that's acceptable given that this approach is more deterministic.

Also, yes, this is still not passing completely on my machine but I'm working on it. It's passing in the gate which is key for now.

@stephenfin (Contributor Author)

Turns out I wasn't constraining my versions, which was resulting in a number of those failures. I ran again with the requirements from requirements.txt and got the following:

$ time pytest
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1
rootdir: /home/stephenfin/Development/innovation/rst2pdf
collected 308 items

...

============================================================================================= short test summary info =============================================================================================
FAILED rst2pdf/tests/input/test_fancytitles.txt::test_fancytitles - AssertionError: File test_fancytitles.pdf not generated
FAILED rst2pdf/tests/input/test_inkscape.txt::test_inkscape - AssertionError: File test_inkscape.pdf not generated
FAILED rst2pdf/tests/input/test_issue_239_2.txt::test_issue_239_2 - AssertionError: File test_issue_239_2.pdf not generated
FAILED rst2pdf/tests/input/sphinx-issue252/conf.py::sphinx-issue252 - Failed: Call failed with -6:
============================================================================== 4 failed, 277 passed, 27 skipped in 288.35s (0:04:48) ==============================================================================

real    4m48.676s
user    4m47.132s
sys     0m52.288s

These timings are quite variable, though: yet another run gave me a runtime of 265.66s (0:04:25), so I wouldn't put a whole lot of weight on them.

@lornajane (Contributor)

These numbers are reliable enough to answer my original concerns. I was expecting a performance hit, but I had no idea if it was going to make the tests take ten times as long to run or what. Less than double seems like a perfectly good ballpark figure and I am reassured!

It seems there are issues with the Python 2.7 wheels for PyMuPDF [1]. We
could wait for a fix, but given the next version of rst2pdf is expected
to be Python 3 only, it's just as easy to drop testing for it now.

[1] pymupdf/PyMuPDF#511

Signed-off-by: Stephen Finucane <stephen@that.guru>
@stephenfin (Contributor Author)

Bump 😄 I'm available in case there's anything more you need from me.

@akrabat (Member) commented Jun 3, 2020

This works on my Mac and seems logical. Do we know if it's more resistant to changes in ReportLab versions?

Also, presumably we no longer need tests/md5?

@stephenfin (Contributor Author) commented Jun 3, 2020

It should be, yes. I'm planning to extend this further to do things like combining/splitting blocks due to ReportLab differences (seen locally on newer versions) [*], but I figured that wasn't necessary for a first pass. The tests/md5 directory was left because I wanted to avoid merge conflicts while this was discussed and can be removed as soon as this merges.

[*] In short, consider the case where there are two blocks in PDF A - blocks A1 and A2 - and one block in PDF B - block B. If coordinates x0 and y0 of block A1 equal x0 and y0 of block B, and coordinates x1 and y1 of block A2 equal x1 and y1 of block B, then we can say the combination of blocks A1 and A2 equals block B. Again, I'll explain this better in a future PR.
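In code, that merge check might look something like the following hypothetical sketch; the helper names and the tolerance are mine, not part of this PR.

```python
# Hypothetical sketch of the planned block-merging check; blocks are
# (x0, y0, x1, y1) bounding boxes with the origin at the top left.
FUZZ = 3  # illustrative positional tolerance, in points

def _close(p, q):
    return abs(p - q) <= FUZZ

def merged_blocks_match(a1, a2, b):
    """True if blocks A1 and A2 together occupy block B's bounding box."""
    return (
        _close(a1[0], b[0]) and _close(a1[1], b[1])      # A1/B top-left
        and _close(a2[2], b[2]) and _close(a2[3], b[3])  # A2/B bottom-right
    )
```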

@akrabat (Member) commented Jun 4, 2020

That makes sense. The interesting bit is when we can avoid invalidating all the tests because a minor release of ReportLab reorganised the internals of the PDF file.

We do need to ensure that the tests pass on multiple ReportLab versions (or whatever other dependencies change the PDF internals), though, as we currently have that capability with the multiple hashes.

@akrabat akrabat added this to the 1.0 milestone Jun 4, 2020
@akrabat (Member) left a comment

👍

@akrabat akrabat closed this in 1ba0875 Jun 4, 2020
@akrabat akrabat merged commit 1ba0875 into rst2pdf:master Jun 4, 2020
@akrabat akrabat mentioned this pull request Jun 5, 2020
@stephenfin stephenfin deleted the content-based-comparison branch June 5, 2020 10:39
@stephenfin (Contributor Author)

🙌
