Update regression tests #122
Conversation
Adobe Naskh is a lot prettier, especially with mixed Arabic RTL text and LTR numerals (which seems to be the point of this test), but it's non-free and hard to come by. The Noto series, on the other hand, is easier to install on Travis for the test suite. I have copies of both and compared the output to see that it was laying out the same apart from the different font metrics.
The original Minion Pro font for the chapter break might not have shown this problem, but on systems without that font the fallback might have a different x-height. On my system this test would wrap pages, which introduced a new problem that may need a test of its own; in the spirit of testing one thing at a time, this reduces the total content to fit on one page even if the selected font isn't around.
In the spirit of testing one thing at a time I pared this test down to just the code that is essential for the test, notably typesetting mixed LTR/RTL frames and the same Unicode in different Arabic scripts. I also added the expected output as a benchmark.
Also trim out the now-obsolete trailing comments used to suppress whitespace.
This was generated on an OSX machine with the TTC font in question, so it should be the _correct_ output. Note I was unable to duplicate the fail state on this because the TTC loader only wants to work for me. Also note the test that checks this needs to be skipped if not on OSX.
Although it works in parallel, Travis launches jobs from the top down in the matrix. During busy times, and if availability is low, this can mean the early ones in the list finish some time before the others. Since I'm most interested in seeing the tests run on the latest versions and can wait to hear about regression issues later, sort the matrix in reverse order with the newest versions on top.
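A hypothetical `.travis.yml` fragment illustrating the reordering (the env variable name and version list are illustrative, not the project's actual matrix):

```yaml
# Newest interpreters first so the results that matter most
# come back earliest from the Travis queue.
env:
  - LUA=lua5.3
  - LUA=lua5.2
  - LUA=lua5.1
  - LUA=luajit
```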
Newer systems can use ~/.local/share/fonts, but in spite of the documentation, Ubuntu 12.04 as used by Travis CI does not actually look there by default. It does check /home/<username>/.fonts, so...
The Ubuntu 12.04 LTS repos have very, very old versions of the SIL fonts (e.g. 1.001 from 2009 instead of 2.020 from 2014), and the metrics have changed quite a bit, especially for the Arabic Scheherazade font, though less so for some of the others. This makes the regression tests hard to run because it's not the font version any developer is likely running on their system. Rather than benchmark the old ones, this will just install the newest available versions into the Travis test environment. (Note mkdir is not necessary for the manual font install since unzip will create up to one directory on extract.)
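A sketch of what that manual install might look like in the Travis config; the URL is a placeholder, not the actual release location:

```yaml
before_install:
  # Hypothetical URL: fetch the current SIL release rather than the
  # ancient package in the Precise repos. No mkdir needed: unzip -d
  # creates the target directory itself on extract.
  - wget -O /tmp/scheherazade.zip "<sil-release-url>/Scheherazade.zip"
  - unzip /tmp/scheherazade.zip -d "$HOME/.fonts"
  - fc-cache -f "$HOME/.fonts"
```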
Whatever is wrong with the toolchain that generated the last one (see issue sile-typesetter#124) it's definitely broken (the diacritics are positioned all wrong) and this one is—at the very least—less broken.
This is a package from Trusty because the one in the Precise repos is way out of date and has some bad metrics, which kills some of the regression tests that depend on it. Since there isn't much in the package besides the font, the deb file installs fine without much help.
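The deb-based variant might look like this; the pool URL and package name are placeholders, not the actual ones used:

```yaml
before_install:
  # Hypothetical package URL: pull the newer .deb straight from the
  # Trusty pool. The package contains little besides the font files,
  # so installing it directly needs no dependency resolution.
  - wget -O /tmp/font.deb "<trusty-pool-url>/fonts-sil-package.deb"
  - sudo dpkg -i /tmp/font.deb
```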
The Travis environment is really hard (and slow) to debug, so it's more useful to get data about all the regression tests at once rather than waiting to find out others failed after fixing one. This will:

* Still fail immediately for any failure in the normal compile checks.
* Keep processing regression checks after a failure, but output the right exit code in the end.
* Not be quite as verbose. The actual/expected data is in the file system anyway, and the diff is far more useful for debugging from Travis logs than the full dumps, which are hard to even scroll past.
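A minimal sketch of the "fail at the end" behavior, assuming a layout where each test leaves a `.actual` file next to its `.expected` benchmark (the names and layout are illustrative):

```shell
# Run every regression comparison, remember any failures, and only
# report the combined status once all of them have had a chance to run.
run_regressions() {
    status=0
    for expected in tests/*.expected; do
        actual="${expected%.expected}.actual"
        if ! diff -u "$expected" "$actual"; then
            echo "FAIL: $expected"
            status=1    # remember the failure but keep going
        fi
    done
    return $status
}
```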
Not that we want to go breaking examples and such, but these will need a different testing system than the one we run using attack.pl, triggered by the existence of .expected files. Here we actually have the PDFs in the repository anyway. If there _are_ regression tests that need to be run on the content of these examples, they should be moved into tests/.
The logic of checking whether all the stuff compiles vs. whether it exactly matches the reference implementation was getting too convoluted. This separates the concerns so that only documentation and examples get compile checks while the tests get compile and regression checks (depending on the mode flag at run time). Travis, for example, runs both modes, but separately, so what failed is more apparent than if they were lumped in together.
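The split might look something like this sketch, where `$SILE_CMD` stands in for the real compile step; the mode names and file layout are assumptions, not the actual harness:

```shell
# compile mode: only check that the document builds at all;
# regression mode: also compare the output against the benchmark.
check_file() {
    mode="$1"
    src="$2"
    ${SILE_CMD:-sile} "$src" > "${src%.sil}.actual" || return 2  # compile check
    [ "$mode" = compile ] && return 0        # docs/examples stop here
    diff -u "${src%.sil}.expected" "${src%.sil}.actual"  # tests also compare
}
```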
Having a callout to a debug function is great in local code, but it shouldn't be in the committed code unless it is behind a switch of some sort. This particular instance breaks the way the test suite works because of the extra unexpected output.
Oley oley! Happy dance—I see green! This should be ready for merge. Finally. But @simoncozens, you might say, “It looks red to me?!” And you'd be right. But if you look closer you'll find it's actually supposed to be red, because there actually is a regression issue (just reported as #125) between Lua versions: Lua 5.3 handles float math differently than earlier versions. As fixing the actual existing regression is something for the master branch rather than the test code, I'll leave that to be fixed in master or a different PR. This set of changes is, to the best of my knowledge, doing what it should be doing, and merging it to master makes sense even with the red X, because the issue it found actually exists in master.
This is incredibly cool (and pretty scary). Thank you very much for all this work. I came up with the idea of regression tests when I was mucking about with the linebreaking algorithm - that is some very scary code and I didn't want to disturb it unintentionally - but I think this will also help us catch plenty of other problems along the way. (Or we could be drowning in a sea of red crosses for the near future. I don't know. Let's find out. :-) )
The sea of red crosses is a definite possibility. I think it can be managed, however. I have some shell scripts I wrote while setting this up that might help. One of them, for example, runs through the regression tests, catches ones that fail, generates overlay images for visual review, pops up the diff and the overlay image, asks if it's okay, then regenerates the expected output if it gets an okay. The debug output is pretty cool actually, and I think as long as we pay attention to keeping it deterministic and have the right tooling on hand, it shouldn't be too bad to manage. In any event it will be easier if we start now rather than trying to add this on farther down the road.
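As a rough sketch of that review loop: the overlay-generation step is commented out since it needs ImageMagick, and `$VIEWER` plus the file layout are assumptions, not the actual scripts:

```shell
# Walk the regression tests; for each failure, show the diff (and,
# optionally, an overlay image), then re-bless the expected output
# only if the reviewer approves.
review_failures() {
    for expected in tests/*.expected; do
        actual="${expected%.expected}.actual"
        diff -u "$expected" "$actual" && continue       # still passing
        # compare "$expected.png" "$actual.png" overlay.png  # visual diff
        # ${VIEWER:-xdg-open} overlay.png
        printf 'Accept new output for %s? [y/N] ' "$expected"
        read -r answer
        [ "$answer" = y ] && cp "$actual" "$expected"   # regenerate benchmark
    done
}
```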
So this includes a whole bunch of changes to get the test suite gears running a bit more smoothly. Here's the basic idea.
* Adds `--regression` testing for the build tests in Travis. If changes to the core change the output of existing documents, there should be a reason for it. This will mean updating the expected outputs a lot more as things change, but that extra review work should be done. I have some tooling I've been using locally for updating tests that I'll try to adapt for broader use and submit later.
* Several expected outputs needed updating because the `debug-output` format, or the default margins for the class, or the way the folio was arranged changed for one reason or another.
* Fonts in `tests/*`: I would like to be able to compile anything coming out of the core repository without needing proprietary fonts on my system. None of the items in the test suite seem to depend on the licensed fonts, so I've replaced them with reasonable free ones that can be installed in the Travis environment as well as by anyone wanting them. The only exception to this is the OSX-specific font issue. As this font format is only used on OSX and it's a default system font, I left that one alone and marked the test to run only on Darwin (which means it won't be part of Travis's tests for now).

Most of these are fully tested in that I actually reproduced the errors running old versions of SILE and made sure the expected output was actually the desired thing. There are a couple of exceptions (marked "fail state untested") where I couldn't figure out the fail state (or it looked like I didn't have a working state).
Most of the commits have been isolated to individual tests. I squashed a few of them that were closely related, but if problems ever come up with the tests I wanted `git blame` and `git revert` to be useful, so they are mostly test-specific.