
Switch to content based comparison #854

Merged
merged 6 commits into rst2pdf:master from content-based-comparison on Jun 4, 2020

Conversation

@stephenfin (Contributor) commented Apr 19, 2020

Time to drop the JSON-based testing. MuPDF is a "lightweight PDF, XPS, and E-book viewer [that] consists of a software library, command line tools, and viewers for various platforms" [1]. The PyMuPDF library provides Python bindings for MuPDF and gives us the ability to introspect documents [2]. Start using this to provide a content-based comparison of the PDFs, comparing reference PDFs with those generated by our test suite.

This adds one more dependency, but should result in far more reliable, deterministic test behavior across different environments. Fuzzy matching is used to abstract away minor differences.
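For a sense of what that introspection looks like, here is a minimal illustrative example (not the code added in this PR): PyMuPDF, imported as `fitz`, exposes each page's content as a list of blocks, where each block is a tuple of `(x0, y0, x1, y1, text, block_no, block_type)`.

```python
# Illustrative only -- not the code added in this PR.
import fitz  # PyMuPDF

def get_page_blocks(path):
    """Return the text blocks of each page as (bbox, text) pairs."""
    doc = fitz.open(path)
    pages = []
    for page in doc:
        # get_text('blocks') is spelled getText('blocks') in older
        # PyMuPDF releases; each block carries its bounding box and text.
        pages.append([(b[:4], b[4]) for b in page.get_text('blocks')])
    doc.close()
    return pages
```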

Note that there is some overlap between this PR and #853, since that was mostly created from patches that I had previously included in this series but decided were non-essential. Expect a merge conflict (which I can resolve) for whichever ends up being merged later.

Also note that I haven't actually deleted the rst2pdf/tests/md5 directory. This PR is already large enough and that can be done in a follow-up.

@stephenfin stephenfin mentioned this pull request Apr 19, 2020
Just hope things are working as expected on my machine.

Signed-off-by: Stephen Finucane <stephen@that.guru>
There are three issues to resolve here, one with an input file and two
with reference PDFs.

rst2pdf/tests/input/test_issue_175.txt
  This attempts to insert a large enough spacer to force a header to the
  bottom of the first page and the text onto the next page, in order to
  ensure the header will be moved to the next page to group it with its
  text. However, the spacer wasn't large enough and led to both the
  header and text being included on the first page.

rst2pdf/tests/reference/test_aafigure.pdf
  It doesn't appear possible to format a backtick as a literal without
  including the escape character. The escape character wasn't included
  in the reference PDF.

rst2pdf/tests/reference/test_issue_288.pdf
  This contained a different number of block elements on Python 3 than
  on Python 2. Update the PDF but disable the build until Python 2 is
  dropped.

Signed-off-by: Stephen Finucane <stephen@that.guru>
Previously, we were assessing whether a PDF was correct by comparing a
checksum of the file against a list of known valid checksums. When a
different checksum was seen, which was quite often given the number of
moving pieces here, it was up to the user to determine whether there was
a bug or simply a small but valid change in the output PDF. Let's
simplify this. Instead of storing checksums, use the pymupdf library to
compare the PDFs programmatically. We need to use a bit of fuzzing here,
since blocks can get moved around and text rewrapped, but it should be
mostly bulletproof otherwise.

Signed-off-by: Stephen Finucane <stephen@that.guru>
Resolve issues with the Sphinx tests.

Signed-off-by: Stephen Finucane <stephen@that.guru>
@stephenfin (Contributor Author)

This is failing because there isn't a Python 2.7 wheel available for PyMuPDF at the moment. I've reported the issue (pymupdf/PyMuPDF#511), but until that's fixed this remains blocked on Travis. @akrabat @lornajane How close are we to dropping Python 2.7 support? Perhaps I could simply drop Python 2.7 testing here and side-step the whole issue.

@lornajane (Contributor)

We can drop Python 2 support. Please do everything else you need to do to get this working for 3 and if that's the only reason the build is failing, we can review with that in mind and/or bite the bullet to remove support for 2.

At a bigger picture level (and I probably said this already), I'm not at all sold on doing only image comparisons since not all the tests are purely visual. But having the ability to have "unknown md5 but honestly this PDF looks fine to the machine" as a test output: incredibly valuable!!!! We had some imagemagick thing that did this before (it was a bit half baked). Image comparison is a very welcome addition to our testing setup but I don't think it replaces what we have.

Side question: how does the performance compare for running the test suites through the image comparison over the existing md5 process? (the answer might be: run it yourself and see I suppose!)

@akrabat (Member) commented May 22, 2020

#841 takes us to Python 3 only. I'll update PR 841, though, as its dependencies probably need checking.

@stephenfin (Contributor Author)

> We can drop Python 2 support. Please do everything else you need to do to get this working for 3 and if that's the only reason the build is failing, we can review with that in mind and/or bite the bullet to remove support for 2.

Okay, I'll drop Python 2.7 tests from Travis and get this passing.

> At a bigger picture level (and I probably said this already), I'm not at all sold on doing only image comparisons since not all the tests are purely visual. But having the ability to have "unknown md5 but honestly this PDF looks fine to the machine" as a test output: incredibly valuable!!!! We had some imagemagick thing that did this before (it was a bit half baked). Image comparison is a very welcome addition to our testing setup but I don't think it replaces what we have.

So, interestingly enough, this isn't actually doing image-based comparison at all (that was #849, which I've abandoned in favour of this approach). Instead, it's inspecting the generated PDF using the PyMuPDF library. If you look at the compare_pdfs function I've added, you'll see that I'm comparing each element on each page individually, albeit with some minor fuzzing, because output from ReportLab seems to change subtly across versions and environments.
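In rough outline, the approach looks something like the following. This is a deliberately simplified sketch rather than the PR's actual compare_pdfs; in particular, the FUZZ tolerance value and the exact assertions are illustrative assumptions.

```python
# Simplified, illustrative sketch of a content-based PDF comparison
# with PyMuPDF; not the exact implementation added in this PR.
import fitz  # PyMuPDF

FUZZ = 3  # positional tolerance in points (illustrative value)

def compare_pdfs(path_a, path_b):
    doc_a, doc_b = fitz.open(path_a), fitz.open(path_b)
    assert len(doc_a) == len(doc_b), 'page counts differ'
    for page_a, page_b in zip(doc_a, doc_b):
        # get_text('blocks') is spelled getText('blocks') in older
        # PyMuPDF releases.
        blocks_a = page_a.get_text('blocks')
        blocks_b = page_b.get_text('blocks')
        assert len(blocks_a) == len(blocks_b), 'block counts differ'
        for block_a, block_b in zip(blocks_a, blocks_b):
            # Allow bounding boxes to drift slightly, since ReportLab's
            # output shifts subtly across versions and environments.
            for coord_a, coord_b in zip(block_a[:4], block_b[:4]):
                assert abs(coord_a - coord_b) <= FUZZ, 'block moved'
            # Normalise whitespace so rewrapped text still compares equal.
            assert ' '.join(block_a[4].split()) == ' '.join(block_b[4].split())
```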

> Side question: how does the performance compare for running the test suites through the image comparison over the existing md5 process? (the answer might be: run it yourself and see I suppose!)

A run before this change yields the following:

$ time pytest
========================================== test session starts ==========================================                                                                                                          
platform linux -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1                                                                                                                                              
rootdir: /home/stephenfin/Development/innovation/rst2pdf                                                                                                                                                           
collected 308 items

...

======================================== short test summary info ========================================                                                                                                          
FAILED rst2pdf/tests/input/source.txt::source                                                                                                                                                                      
FAILED rst2pdf/tests/input/test_box_widths.txt::test_box_widths                                                                                                                                                    
FAILED rst2pdf/tests/input/test_fancytitles.txt::test_fancytitles - AssertionError: File test_fancytit...                                                                                                          
FAILED rst2pdf/tests/input/test_inkscape.txt::test_inkscape - AssertionError: File test_inkscape.pdf n...                                                                                                          
FAILED rst2pdf/tests/input/test_issue_239_2.txt::test_issue_239_2 - AssertionError: File test_issue_23...                                                                                                          
FAILED rst2pdf/tests/input/test_issue_310_2.txt::test_issue_310_2                                                                                                                                                  
FAILED rst2pdf/tests/input/test_issue_311.txt::test_issue_311                                                                                                                                                      
FAILED rst2pdf/tests/input/test_issue_473.txt::test_issue_473                                                                                                                                                      
FAILED rst2pdf/tests/input/test_issue_83.txt::test_issue_83                                                                                                                                                        
FAILED rst2pdf/tests/input/test_repeat-table-rows.txt::test_repeat-table-rows                                                                                                                                      
FAILED rst2pdf/tests/input/test_slides.txt::test_slides                                                                                                                                                            
FAILED rst2pdf/tests/input/sphinx-issue172/conf.py::sphinx-issue172                                                                                                                                                
FAILED rst2pdf/tests/input/sphinx-issue252/conf.py::sphinx-issue252                                                                                                                                                
FAILED rst2pdf/tests/input/sphinx-issue318/conf.py::sphinx-issue318                                                                                                                                                
FAILED rst2pdf/tests/input/sphinx-issue319/conf.py::sphinx-issue319                                                                                                                                                
FAILED rst2pdf/tests/input/sphinx-issue320/conf.py::sphinx-issue320                                                                                                                                                
FAILED rst2pdf/tests/input/sphinx-repeat-table-rows/conf.py::sphinx-repeat-table-rows                                                                                                                              
======================== 17 failed, 265 passed, 26 skipped in 229.76s (0:03:49) =========================                                                                                                          
                                                                                                                                                                                                                   
real    3m50.167s                                                                                                                                                                                                  
user    4m6.070s                                                                                                                                                                                                   
sys     0m54.214s

After this change, I see:

$ time pytest
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1
rootdir: /home/stephenfin/Development/innovation/rst2pdf
collected 308 items

...

============================================================================================= short test summary info =============================================================================================
FAILED rst2pdf/tests/input/test_fancytitles.txt::test_fancytitles - AssertionError: File test_fancytitles.pdf not generated
FAILED rst2pdf/tests/input/test_inkscape.txt::test_inkscape - AssertionError: File test_inkscape.pdf not generated
FAILED rst2pdf/tests/input/test_issue_239_2.txt::test_issue_239_2 - AssertionError: File test_issue_239_2.pdf not generated
FAILED rst2pdf/tests/input/sphinx-issue172/conf.py::sphinx-issue172
FAILED rst2pdf/tests/input/sphinx-issue252/conf.py::sphinx-issue252 - Failed: Call failed with -6:
FAILED rst2pdf/tests/input/sphinx-issue318/conf.py::sphinx-issue318
FAILED rst2pdf/tests/input/sphinx-issue319/conf.py::sphinx-issue319
FAILED rst2pdf/tests/input/sphinx-issue320/conf.py::sphinx-issue320
FAILED rst2pdf/tests/input/sphinx-repeat-table-rows/conf.py::sphinx-repeat-table-rows
============================================================================== 9 failed, 272 passed, 27 skipped in 261.33s (0:04:21) ==============================================================================

real    4m21.616s
user    4m22.740s
sys     0m52.636s

So that's roughly a 14% increase in runtime (albeit with one additional skipped test). I think that's acceptable given that this approach is more deterministic.

Also, yes, this is still not passing completely on my machine but I'm working on it. It's passing in the gate which is key for now.

@stephenfin (Contributor Author)

Turns out I wasn't constraining my versions, which was resulting in a number of those failures. I ran again with the requirements from requirements.txt and got the following:

$ time pytest
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1
rootdir: /home/stephenfin/Development/innovation/rst2pdf
collected 308 items

...

============================================================================================= short test summary info =============================================================================================
FAILED rst2pdf/tests/input/test_fancytitles.txt::test_fancytitles - AssertionError: File test_fancytitles.pdf not generated
FAILED rst2pdf/tests/input/test_inkscape.txt::test_inkscape - AssertionError: File test_inkscape.pdf not generated
FAILED rst2pdf/tests/input/test_issue_239_2.txt::test_issue_239_2 - AssertionError: File test_issue_239_2.pdf not generated
FAILED rst2pdf/tests/input/sphinx-issue252/conf.py::sphinx-issue252 - Failed: Call failed with -6:
============================================================================== 4 failed, 277 passed, 27 skipped in 288.35s (0:04:48) ==============================================================================

real    4m48.676s
user    4m47.132s
sys     0m52.288s

These timings are quite variable, though: yet another run gave me a runtime of 265.66s (0:04:25), so I wouldn't put a whole lot of weight on them.

@lornajane (Contributor)

These numbers are reliable enough to answer my original concerns. I was expecting a performance hit, but I had no idea if it was going to make the tests take ten times as long to run or what. Less than double seems like a perfectly good ballpark figure and I am reassured!

It seems there are issues with the Python 2.7 wheels for PyMuPDF [1]. We
could wait for a fix, but given the next version of rst2pdf is expected
to be Python 3 only, it's just as easy to drop testing for it now.

[1] pymupdf/PyMuPDF#511

Signed-off-by: Stephen Finucane <stephen@that.guru>
@stephenfin (Contributor Author)

Bump 😄 I'm available in case there's anything more you need from me.

@akrabat (Member) commented Jun 3, 2020

This works on my Mac and seems logical. Do we know if it's more resistant to changes in ReportLab versions?

Also, presumably we no longer need tests/md5?

@stephenfin (Contributor Author) commented Jun 3, 2020

It should be, yes. I'm planning to extend this further to do things like combining/splitting blocks due to ReportLab differences (seen locally on newer versions) [*], but I figured that wasn't necessary for a first pass. The tests/md5 directory was left because I wanted to avoid merge conflicts while this was discussed and can be removed as soon as this merges.

[*] In short, consider the case where there are two blocks in PDF A - blocks A1 and A2 - and one block in PDF B - block B. If coordinates x0 and y0 of block A1 equal x0 and y0 of block B, and coordinates x1 and y1 of block A2 equal x1 and y1 of block B, then we can say the combination of blocks A1 and A2 equals block B. Again, I'll explain this better in a future PR.
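In code, that merge check might look something like the following hypothetical sketch; the helper names and the tolerance are mine, not part of this PR.

```python
# Hypothetical sketch of the planned block-merging check; blocks are
# (x0, y0, x1, y1) bounding boxes with the origin at the top left.
FUZZ = 3  # illustrative positional tolerance, in points

def _close(p, q):
    return abs(p - q) <= FUZZ

def merged_blocks_match(a1, a2, b):
    """True if blocks A1 and A2 together occupy block B's bounding box."""
    return (
        _close(a1[0], b[0]) and _close(a1[1], b[1])      # A1/B top-left
        and _close(a2[2], b[2]) and _close(a2[3], b[3])  # A2/B bottom-right
    )
```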

@akrabat (Member) commented Jun 4, 2020

That makes sense. The interesting bit is when we can avoid invalidating all the tests because a minor release of ReportLab reorganised the internals of the PDF file.

We do need to ensure that the tests pass on multiple ReportLab versions (or whatever other dependencies change the PDF internals), though, as we currently have that capability with the multiple hashes.

@akrabat akrabat added this to the 1.0 milestone Jun 4, 2020
@akrabat (Member) left a comment

👍

@akrabat akrabat closed this in 1ba0875 Jun 4, 2020
@akrabat akrabat merged commit 1ba0875 into rst2pdf:master Jun 4, 2020
@akrabat akrabat mentioned this pull request Jun 5, 2020
@stephenfin stephenfin deleted the content-based-comparison branch June 5, 2020 10:39
@stephenfin (Contributor Author)

🙌
