
Add support for testing based on uncompressed PDF streams #10

Closed
josephwright opened this issue Jun 26, 2017 · 24 comments

@josephwright
Member

It is possible to produce uncompressed PDF streams, which allow those versed in PDF to examine the detailed output of a TeX run and to check aspects that may be difficult or impossible to test or debug from the macro layer or with \tracingall. Adding support in this area would enhance both testing and debugging.

The interface for this is still to be decided, but it seems likely to involve a marker in the .lvt which tells l3build to read the PDF, which is then manipulated to extract the 'useful' parts.

@wspr
Contributor

wspr commented Jun 26, 2017 via email

@josephwright
Member Author

A suggested test environment here is based around

   \STREAMTEST#1#2 =>
     \pdfliteral direct {\%\space l3build test \space"#1"}
     #2
     \pdfliteral direct {\%\space l3build test end}

with some marker written to the .log to tell l3build to look at the PDF (something like \CHECKPDF or perhaps \CHECKPDFSTREAM).
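
As a purely hypothetical sketch of how such a test might then look in an .lvt file (the \CHECKPDFSTREAM and \STREAMTEST names are only the proposals above, not an existing interface; \START/\END are the usual regression-test markers; the label text is arbitrary):

   \START
   \CHECKPDFSTREAM % hypothetical marker written to the .log so l3build knows to examine the PDF
   \STREAMTEST{red-fill}{%
     \pdfliteral direct {1 0 0 rg}% low-level material under test
     Some text
     \pdfliteral direct {0 g}%
   }
   \END

The material between the two \pdfliteral comment markers would then be extracted from the uncompressed PDF and compared against a saved reference, much as a .tlg comparison works for the .log.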

@josephwright
Member Author

The -p switch would probably be dropped in this approach, likely replaced by \CHECKPDFBINARY or something similar.

@blefloch
Member

blefloch commented Jul 2, 2017

Yes, a command in the test source seems better than a switch that one has to remember (and presumably have texlua build.lua ctan pass along correctly).

@josephwright
Member Author

@blefloch Good point: I'd not considered that (it also applies to the 'mixed' testing situation).

I'll try to come up with some proposals today if I have time: probably another branch so we can argue over the detail.

@FrankMittelbach
Member

FrankMittelbach commented Jul 2, 2017 via email

@josephwright
Member Author

OK, I need some names for the relevant macros. I was thinking \CHECKPDFSTREAM and \CHECKPDFBINARY. (I considered a two-part \GENERATEPDF + \CHECKSTREAM/\CHECKBINARY, but that is more work for the user, and what happens if the first marker is missed?) Names OK?

josephwright added a commit that referenced this issue Jul 2, 2017
This is the first part of implementing PDF stream based tests:
adjusting to use .log-based markers.
@blefloch
Member

blefloch commented Jul 2, 2017

The names sound good, assuming they do roughly the following:

  • \CHECKPDFBINARY instructs l3build to compare the binary pdf resulting from the run to some saved binary pdf result (and with texlua build.lua save -exetex test001 say, the result of the xetex run is presumably saved as test001.xetex.ext for some extension ext)

  • \CHECKPDFSTREAM instructs l3build to turn off pdf compression (it's also fine if that's always the case) and to extract from the resulting pdf the regions delimited by markers in the pdf (I didn't follow the discussions you had with Javier on these markers).

If both \CHECKPDFBINARY and \CHECKPDFSTREAM are given, that should be an error.

@car222222

This could need a new issue?

As Will was perhaps suggesting, it is essential to test exactly what is written out to external files, including the .pdf file, by any mechanism: writes, pdfliteral, etc. This should, as Will said, be done without explicitly reproducing the material and writing it to the log file; and it is better done without explicitly inspecting the external file. Except for \immediate operations, this information is to be found within the boxes traced by \showoutput.

Reasons: for output such as a pdfliteral or a pdf: special, testing at this stage (without looking into the .pdf file) tests precisely 'what LaTeX does', and this is what the test suite should test. If you only compare what gets into the PDF file then, in the case of a diff, you do not know whether something in the LaTeX setup has caused the change or whether it is caused by a problem in the process that produces the content of the PDF file (and that is external to LaTeX).

Note that \writes, \specials (if not rejected) and \pdfliterals appear in the box data from \showoutput.
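
As a minimal plain TeX illustration (not from the thread) of that point, the \special whatsits appear in the shipped-out box listing once output tracing is on:

   \tracingonline=0
   \tracingoutput=1
   \showboxbreadth=\maxdimen
   \showboxdepth=\maxdimen
   \shipout\hbox{\special{color push Blue}x\special{color pop}}
   \end

The .log then lists the box contents, including \special{color push Blue} and \special{color pop} as whatsit nodes, so a log-based comparison can see them.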

I did not know how to trace the (expanded) content of an \immediate \write.

Are there other primitives for output that need to be traced?

@josephwright
Member Author

@car222222 I'm not sure I follow: Javier's request was related to areas that are hard/impossible to debug at the macro end as one has to be sure that the binaries are 'behaving'. (The ordering and exact nature of \special instructions can be vital in this regard, and that's simply not visible to the macro layer.)

@car222222

I said that it might be a new issue.

I do not understand the relevance of 'binaries', as Javier needs an uncompressed PDF for his tests.

With the correct settings, the content of a \special is written to the .log file as part of the contents of the box that is shipped out. Thus visibility at the macro layer is not required to test this output. The same applies to \pdfliteral and \write, but the contents of an \immediate \write do not appear to be traceable.

@FrankMittelbach
Member

FrankMittelbach commented Jul 3, 2017 via email

@josephwright
Member Author

@car222222 What we ask for in specials and what the binaries (pdfTeX, dvips, dvipdfmx, ...) do aren't necessarily the same, so if you are seeking a particular outcome you have to check the 'end result'. This shows up once one starts doing colour/graphics/hyperlink/bookmark stuff of any complexity. So my understanding is that Javier wants a way to test what the user actually gets for some particular cases, not what we think the macro layer asks for. It's a somewhat specialist area but it's not unreasonable to cover it.

@josephwright
Member Author

@FrankMittelbach Yup, that's more or less it: \pdfliteral here is just being used to do the equivalent of \OMIT/\TIMO or \TEST, but for the PDF stream rather than the .log.

@car222222

car222222 commented Jul 3, 2017

Sorry: clearly I should have made it a new issue. I was not suggesting that this could be used for Javier's case. I was mainly responding to Will, as he seemed to be implying that it was impossible to get such things into the .log file. Also to note that the contents of immediate things are, I think, unloggable, or are they?

@car222222

What we ask for in specials and what the binaries (pdfTeX, dvips, dvipdfmx, ...) do aren't necessarily the same, so if you are seeking a particular outcome you have to check the 'end result'.

That is what I said earlier. It is the reason why one should also, in many cases and where possible, test what LaTeX itself outputs. Then you can tell whether a fail is due to LaTeX or due to some change in what 'the binaries' have done to the output from LaTeX.

(BTW: not sure I like executables/processes being called 'binaries', as they may or may not be in binary form; is that standard jargon now?)

@josephwright
Member Author

@car222222 One can of course (and should) test the macro layer, but that's what we already have, so I'm not sure what the issue is there.

On 'binaries', I tend to use that to differentiate from the macro layer: they are executable programs, not scripts. (To me, as a 'Windows person', the latter won't in the main be executables either ...)

@car222222

So is 'the macro layer' what I call 'LaTeX'? Just getting this clear.

In that case the only issues left are how to log the contents of \immediate output; and whether I interpreted Will's input correctly:-).

Am I correct that \immediate \write etc cannot be traced in detail?

@josephwright
Member Author

A couple of notes from the TeX Live list re. behaviour of (x)dvipdfmx in this area.

dvipdfmx deliberately drops literal lines starting with % and also drops whitespace. @davidcarlisle therefore suggested something like

\special{pdf:literal direct (OMIT THIS LINE) pop}

as an approach that would work (with care).

To get no PDF compression you need a couple of not-really documented specials:

\special{dvipdfmx:config z 0}% ~ \pdfcompresslevel
\special{dvipdfmx:config C 0x40}% ~ \pdfobjcompresslevel

which avoids needing to manually run xdvipdfmx for XeTeX and does seem to have the desired effect.
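
Putting those pieces together, a minimal sketch for the (x)dvipdfmx route might look like the following (the marker text is purely illustrative; a plain %-comment line would be dropped, hence the string-and-pop form suggested above):

   \shipout\hbox{%
     \special{dvipdfmx:config z 0}%    ~ \pdfcompresslevel
     \special{dvipdfmx:config C 0x40}% ~ \pdfobjcompresslevel
     \special{pdf:literal direct (l3build test "demo") pop}%
     Some material under test
     \special{pdf:literal direct (l3build test end) pop}%
   }
   \end

Run through XeTeX, the specials pass on to xdvipdfmx, so no manual run of the driver should be needed.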

@josephwright
Member Author

Also worth noting that pdfTeX and dvipdfmx produce rather different results at the low level: dvipdfmx doesn't start a new line for the special and surrounds it with ET/BT, whereas for pdfTeX it is inserted literally, directly into the text, but does stand on a line of its own. All probably workable, as the results will be highly route-dependent anyway ...

@josephwright
Member Author

josephwright commented Jul 28, 2018

I've now looked at binary-based PDF comparison in a137f8c. That shows that, whilst you can make PDFs automatically, the binary files are not platform-independent. It may well be, therefore, that the best set-up for us is stream-based testing. That would simplify some aspects here, as we then don't need two separate PDF routes.

If no one objects, I'll drop binary comparison and re-work for stream support instead.

@wspr
Contributor

wspr commented Jul 29, 2018 via email

@josephwright
Member Author

@wspr Well, 'reproducible build' use cases do work at the binary level, but those are single-platform. That was the original motivation for looking at PDF-based testing, but I agree that for us it probably doesn't help so much.

josephwright added a commit that referenced this issue Jul 31, 2018
See issue #10 and issue #61 for discussions.

This first pass only strips out a minimal amount of data:
more normalization may well be needed.
@josephwright
Member Author

This is now done.
