
Update regression tests #122

Merged
merged 38 commits into from
Jul 2, 2015

Conversation

alerque
Member

@alerque alerque commented Jun 30, 2015

So this includes a whole bunch of changes to get the test suite gears running a bit more smoothly. Here's the basic idea.

  • Enable --regression testing for the build tests in Travis. If changes to the core change output of existing documents, there should be a reason for it. This will mean updating the expected outputs a lot more as things change, but that extra review work should be done. I have some tooling I've been using locally for updating tests that I'll try to adapt for broader use and submit later.
  • Update tests that have fallen behind. This includes a number of them that weren't using the current debug-output format, or where the default margins for the class or the folio arrangement had changed for one reason or another.
  • Add tests that were just plain missing.
  • Reformat tests to A5 where possible. The smaller (and consistent) size is much easier to test. Besides the debug output I've been using visual PDF comparison tools (including one I wrote a while back for another project) that make it easier to scan for problems. When doing this on a whole bunch of tests (and across a whole bunch of commits with git-bisect!) it's much easier if the size is small and consistent. Note I didn't change them all yet because it wasn't apparent to me how some of them might be affected by a layout change.
  • Strip non-essential parts of the tests out. With the idea that each test should be checking for exactly one problem, I tried to eliminate as many extra factors and code as possible.
  • Use only freely available fonts. So far these changes are only in tests/*, but I would like to be able to compile anything coming out of the core repository without needing proprietary fonts on my system. None of the items in the test suite seem to depend on the licensed fonts, so I've replaced them with reasonable free ones that can be installed in the Travis environment as well as by anyone wanting them. The only exception to this is the OSX-specific font issue. As this font format is only used on OSX and it's a default system font, I left that one alone and marked the test to only run on Darwin (which means it won't be part of Travis's tests for now).

Most of these are fully tested in that I actually reproduced the errors running old versions of SILE and made sure the expected output was actually the desired thing. There are a couple of exceptions (marked "fail state untested") where I couldn't figure out the fail state (or it looked like I didn't have a working state).

Most of the commits have been isolated to individual tests. I squashed a few of them that were closely related, but if problems ever come up with the tests I wanted git blame and git revert to be useful so they are mostly test-specific.

Adobe Naskh is a lot prettier, especially with mixed Arabic RTL text and
LTR numerals (which seems to be the point of this test), but it's
non-free and hard to come by. The Noto series, on the other hand, is
easier to install on Travis for the test suite. I have a copy of both
and compared the output to confirm it was laying out the same apart
from the different font metrics.
The original Minion Pro font for the chapter break might not have shown
this problem, but on systems without that font the fallback might have a
different x-height. On my system this test would wrap pages, which
introduced a new problem... that may need a test of its own, but in the
spirit of testing one thing at a time this reduces the total content to
fit on one page even if the selected font isn't around.
In the spirit of testing one thing at a time I pared this test down to
just the code that is essential for the test, notably the typesetting of
mixed LTR/RTL frames and the same Unicode in different Arabic scripts.

I also added the expected output as a benchmark.
alerque added 12 commits July 2, 2015 08:31
This was generated on an OSX machine with the TTC font in question, so
it should be the _correct_ output. Note I was unable to duplicate the
fail state on this because the TTC loader stubbornly works for me. Also
note the test that checks this needs to be skipped if not on OSX.
Although it works in parallel, Travis launches jobs from the top down
in the matrix. During busy times and if availability is low this can
mean the early ones on the list finish some time before the others.
Since I'm most interested in seeing the tests run in the latest versions
and can wait to hear about regression issues later, sort the matrix in
reverse order with the newest versions on top.
Newer systems can use ~/.local/share/fonts, but in spite of the
documentation Ubuntu 12.04 as used by Travis CI does not actually look
there by default. It does check /home/<username>/.fonts so...
The Ubuntu 12.04 LTS repos have very, very old versions of the SIL fonts
(e.g. 1.001 from 2009 instead of 2.020 from 2014), and the metrics have
changed quite a bit, especially for the Arabic Scheherazade font, but to
a lesser extent for some of the others. This makes the regression tests
hard to run because it's not the font version any developer is likely
running on their system. Rather than benchmark the old ones, this will
just install the newest available versions into the Travis test
environment.

(Note mkdir is not necessary for the manual font install since unzip
 will create up to 1 directory on extract.)
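The manual install step might look roughly like this hypothetical CI install fragment (the URL and archive name are placeholders, not the actual ones used):

```shell
# Unpack straight into ~/.fonts, which Ubuntu 12.04 fontconfig actually
# scans; no mkdir is needed because unzip creates one directory level on
# extract. (The URL and file name below are placeholders.)
wget -q https://example.org/fonts/SomeFont.zip
unzip -q SomeFont.zip -d "$HOME/.fonts"
fc-cache -f "$HOME/.fonts"    # refresh the font cache so SILE can find it
```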
Whatever is wrong with the toolchain that generated the last one (see
issue sile-typesetter#124) it's definitely broken (the diacritics are positioned all
wrong) and this one is—at the very least—less broken.
This is a package from Trusty because the one in the Precise repos is
way out of date and has some bad metrics, which breaks some of the
regression tests that depend on it. Since there isn't much in the
package besides the font, the deb file installs fine without much help.
The Travis environment is really hard (and slow) to debug, so it's more
useful to get data about all the regression tests at once rather than
waiting to find out others failed after fixing one. This will

* Still fail immediately for any failure in the normal compile checks
* Keep processing regression checks after a failure, but output the
  right exit code in the end.
* Not be quite as verbose. The actual/expected data is in the file
  system anyway, and the diff is far more useful for debugging from
  Travis logs than the full dumps, which are hard to even scroll past.
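The keep-going behaviour can be sketched roughly like so (the test data and file names here are fabricated for illustration; the real runner operates on the repository's tests/*.expected files):

```shell
# Keep checking regression tests after a failure, but report the overall
# result through a single status at the end (demo data is made up).
status=0
dir=$(mktemp -d)
printf 'ok\n' > "$dir/good.expected"; printf 'ok\n' > "$dir/good.actual"
printf 'ok\n' > "$dir/bad.expected";  printf 'NO\n' > "$dir/bad.actual"
for expected in "$dir"/*.expected; do
    actual="${expected%.expected}.actual"
    if diff -u "$expected" "$actual" > /dev/null; then
        echo "PASS: ${expected##*/}"
    else
        echo "FAIL: ${expected##*/}"   # a real run would print the diff
        status=1                       # remember the failure but keep going
    fi
done
echo "overall exit status would be: $status"
```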
Not that we want to go breaking examples and such, but these will need a
different testing system than the ones we run using attack.pl triggered
by the existence of .expected files.

Here we actually have the PDFs in the repository anyway. If there _are_
regression tests that need to be run on the content of these examples,
that content should be moved into tests/.
The logic of checking whether all the stuff compiles vs whether it
exactly matches the reference implementation was getting too convoluted.
This separates the concerns so that only documentation and examples get
compile checks while the tests get compile and regression checks
(depending on the mode flag at run time). Travis, for example, runs both
modes, but separately, so what failed is more apparent than if they were
lumped in together.
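As a rough sketch of that separation (the function names and the fake "compile" step here are invented for illustration, not the repository's actual scripts):

```shell
dir=$(mktemp -d)

# Stand-in for compiling a document; the real suite would invoke sile here.
compile_doc () {
    printf 'rendered %s\n' "$1" > "$dir/$1.actual"
}

# Compile-only check: documentation and examples only get this.
compile_check () {
    compile_doc "$1" && echo "compiled: $1"
}

# Compile plus regression check: tests/* additionally get diffed
# against a stored .expected reference.
regression_check () {
    compile_doc "$1" || return 1
    diff -u "$dir/$1.expected" "$dir/$1.actual"
}

printf 'rendered doc1\n' > "$dir/doc1.expected"
compile_check doc1
regression_check doc1 && echo "regression ok: doc1"
```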
Having a callout to a debug function is great in local code but it
shouldn't be in the committed code unless it is behind a switch of some
sort. This particular instance breaks the way the test suite works
because of the extra unexpected output.
@alerque
Member Author

alerque commented Jul 2, 2015

Oley oley! Happy dance—I see green! This should be ready for merge. Finally.

But @simoncozens, you might say, “It looks red to me?!” And you'd be right. But if you look closer you'll find it's actually supposed to be red, because there really is a regression issue (just reported as #125) between Lua versions. Lua 5.3 handles float math differently than earlier versions, and SILE.outputters.debug needs to be fixed somehow so that the output of the debugger is deterministic. The test suite itself is, for the first time, actually testing for regressions and expected to succeed. As soon as the math bug is fixed it should show green across the whole matrix.

As fixing the actual existing regression is something for the master branch rather than the test code, I'll leave that to be fixed in master or a different PR. This set of changes is, to the best of my knowledge, doing what it should be doing, and merging it to master makes sense even with the red x because the issue it found actually exists in master.

simoncozens added a commit that referenced this pull request Jul 2, 2015
@simoncozens simoncozens merged commit 5b671c7 into sile-typesetter:master Jul 2, 2015
@simoncozens
Member

This is incredibly cool (and pretty scary). Thank you very much for all this work. I came up with the idea of regression tests when I was mucking about with the linebreaking algorithm - that is some very scary code and I didn't want to disturb it unintentionally - but I think this will also help us catch plenty of other problems along the way. (Or we could be drowning in a sea of red crosses for the near future. I don't know. Let's find out. :-) )

@alerque
Member Author

alerque commented Jul 2, 2015

The sea of red crosses is a definite possibility. I think it can be managed, however. I have some shell scripts I wrote while setting this up that might help. One of them, for example, runs through the regression tests, catches the ones that fail, generates overlay images for visual review, pops up the diff and the overlay image and asks if it's okay, then regenerates the expected output if it gets an okay.
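The core of that review helper might look like this minimal sketch (all names are invented, and the overlay-image step is omitted): show the diff for a failed test, ask whether the new output is actually correct, and regenerate the .expected file on a "y" answer. The answer is read from stdin so it can also be driven non-interactively.

```shell
workdir=$(mktemp -d)

review_test () {
    expected=$1
    actual="${expected%.expected}.actual"
    diff -u "$expected" "$actual" || true   # show what changed
    printf 'Accept new output as correct? [y/N] '
    read -r answer
    if [ "$answer" = y ]; then
        cp "$actual" "$expected"            # bless the new output
    fi
}

# Demo run with made-up data, answering "y" non-interactively.
printf 'old\n' > "$workdir/demo.expected"
printf 'new\n' > "$workdir/demo.actual"
echo y | review_test "$workdir/demo.expected"
```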

The debug output is pretty cool actually and I think as long as we pay attention to keeping it deterministic and have the right tooling on hand it shouldn't be too bad to manage. In any event it will be easier if we start now rather than trying to add this on farther down the road.

@alerque alerque deleted the feature/test-regressions branch July 2, 2015 16:05