
Add support for testing based on uncompressed PDF streams #10

Closed
josephwright opened this issue Jun 26, 2017 · 24 comments

@josephwright
Member

It is possible to produce uncompressed PDF streams, which allow those versed in PDF to examine the detailed output of a TeX run and to check aspects that may be difficult or impossible to test or debug from the macro layer or with \tracingall. Adding support in this area would enhance both testing and debugging.

The interface for this is still to be decided, but it seems likely to involve a marker in the .lvt which tells l3build to read the PDF, which is then manipulated to extract the 'useful' parts.

@wspr
Contributor

wspr commented Jun 26, 2017 via email

@josephwright
Member Author

A suggested test environment here is based around

   \STREAMTEST#1#2 =>
     \pdfliteral direct {\%\space l3build test \space"#1"}
     #2
     \pdfliteral direct {\%\space l3build test end}

with some marker written to the .log to tell l3build to look at the PDF (something like \CHECKPDF or perhaps \CHECKPDFSTREAM).
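
As a purely hypothetical sketch of how such a test might then look in an .lvt file (the \CHECKPDFSTREAM and \STREAMTEST names are only the proposals above, not an existing interface; \START/\END are the usual regression-test markers; the label text is arbitrary):

   \START
   \CHECKPDFSTREAM % hypothetical marker written to the .log so l3build knows to examine the PDF
   \STREAMTEST{red-fill}{%
     \pdfliteral direct {1 0 0 rg}% low-level material under test
     Some text
     \pdfliteral direct {0 g}%
   }
   \END

The material between the two \pdfliteral comment markers would then be extracted from the uncompressed PDF and compared against a saved reference, much as a .tlg comparison works for the .log.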

@josephwright
Member Author

The -p switch would probably be dropped in this approach, likely replaced by \CHECKPDFBINARY or something similar.

@blefloch
Member

blefloch commented Jul 2, 2017

Yes, a command in the test source seems better than a switch that one has to remember (and presumably have texlua build.lua ctan pass along correctly).

@josephwright
Member Author

@blefloch Good point: I'd not considered that (it also applies to the 'mixed' testing situation).

I'll try to come up with some proposals today if I have time: probably another branch so we can argue over the detail.

@FrankMittelbach
Member

FrankMittelbach commented Jul 2, 2017 via email

@josephwright
Member Author

OK, I need some names for the relevant macros. I was thinking \CHECKPDFSTREAM and \CHECKPDFBINARY. (I considered a two-part \GENERATEPDF + \CHECKSTREAM/\CHECKBINARY, but that is more work for the user, and what happens if the first marker is missed?) Names OK?

josephwright added a commit that referenced this issue Jul 2, 2017
This is the first part of implementing PDF stream based tests:
adjusting to use .log-based markers.
@blefloch
Member

blefloch commented Jul 2, 2017

The names sound good, assuming they do roughly the following:

  • \CHECKPDFBINARY instructs l3build to compare the binary pdf resulting from the run to some saved binary pdf result (and with texlua build.lua save -exetex test001 say, the result of the xetex run is presumably saved as test001.xetex.ext for some extension ext)

  • \CHECKPDFSTREAM instructs l3build to turn off pdf compression (it's also fine if that's always the case) and to extract from the resulting pdf the regions delimited by markers in the pdf (I didn't follow the discussions you had with Javier on these markers).

If both \CHECKPDFBINARY and \CHECKPDFSTREAM are given, that should be an error.

@car222222

This could need a new issue?

As Will was perhaps suggesting, it is essential to test exactly what is written out to external files, including the .pdf file, by any mechanism: writes, pdfliteral, etc. This should, as Will said, be done without explicitly reproducing the material and writing it to the log file; and it is better done without explicitly inspecting the external file. Except for \immediate operations, this information is to be found within the boxes traced by \showoutput.

Reasons: for output such as a pdfliteral or a pdf: special, testing at this stage (without looking into the .pdf file) tests precisely 'what LaTeX does', and this is what the test suite should test. If you only compare what gets into the PDF file then, in the case of a diff, you do not know whether something in the LaTeX setup has caused the change or whether it is caused by a problem in the process that produces the content of the PDF file (and that is external to LaTeX).

Note that \writes, \specials (if not rejected) and \pdfliterals appear in the box data from \showoutput.
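
As a minimal plain TeX illustration (not from the thread) of that point, the \special whatsits appear in the shipped-out box listing once output tracing is on:

   \tracingonline=0
   \tracingoutput=1
   \showboxbreadth=\maxdimen
   \showboxdepth=\maxdimen
   \shipout\hbox{\special{color push Blue}x\special{color pop}}
   \end

The .log then lists the box contents, including \special{color push Blue} and \special{color pop} as whatsit nodes, so a log-based comparison can see them.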

I did not know how to trace the (expanded) content of an \immediate \write.

Are there other primitives for output that need to be traced?

@josephwright
Member Author

@car222222 I'm not sure I follow: Javier's request was related to areas that are hard/impossible to debug at the macro end as one has to be sure that the binaries are 'behaving'. (The ordering and exact nature of \special instructions can be vital in this regard, and that's simply not visible to the macro layer.)

@car222222

I said that it might be a new issue.

I do not understand the relevance of 'binaries', as Javier needs an uncompressed PDF for his tests.

With the correct settings, the content of a \special is written to the .log file as part of the contents of the box that is shipped out. Thus visibility at the macro layer is not required to test this output. The same applies to \pdfliteral and \write, but the contents of an \immediate \write do not appear to be traceable.

@FrankMittelbach
Member

FrankMittelbach commented Jul 3, 2017 via email

@josephwright
Member Author

@car222222 What we ask for in specials and what the binaries (pdfTeX, dvips, dvipdfmx, ...) do aren't necessarily the same, so if you are seeking a particular outcome you have to check the 'end result'. This shows up once one starts doing colour/graphics/hyperlink/bookmark stuff of any complexity. So my understanding is that Javier wants a way to test what the user actually gets for some particular cases, not what we think the macro layer asks for. It's a somewhat specialist area but it's not unreasonable to cover it.

@josephwright
Member Author

@FrankMittelbach Yup, that's more or less it: \pdfliteral here is just being used to do the equivalent of \OMIT/\TIMO or \TEST, but for the PDF stream rather than the .log.

@car222222

car222222 commented Jul 3, 2017

Sorry: clearly I should have made it a new issue. I was not suggesting that this could be used for Javier's case. I was mainly responding to Will, as he seemed to be implying that it was impossible to get such things into the .log file. Also to note that the contents of immediate things are, I think, unloggable, or are they?

@car222222

What we ask for in specials and what the binaries (pdfTeX, dvips, dvipdfmx, ...) do aren't necessarily the same, so if you are seeking a particular outcome you have to check the 'end result'.

That is what I said earlier. It is the reason why one should also, in many cases and where possible, test what LaTeX itself outputs. Then you can tell whether a fail is due to LaTeX or due to some change in what 'the binaries' have done to the output from LaTeX.

(BTW: not sure I like executables/processes being called 'binaries', as they may or may not be in binary form; is that standard jargon now?)

@josephwright
Member Author

@car222222 One can of course (and should) test the macro layer, but that's what we already have, so I'm not sure what the issue is there.

On 'binaries', I tend to use that to differentiate from the macro layer: they are executable programs, not scripts. (To me, as a 'Windows person', the latter won't in the main be executables either ...)

@car222222

So is 'the macro layer' what I call 'LaTeX'? Just getting this clear.

In that case the only issues left are how to log the contents of \immediate output; and whether I interpreted Will's input correctly:-).

Am I correct that \immediate \write etc cannot be traced in detail?

@josephwright
Member Author

A couple of notes from the TeX Live list re. behaviour of (x)dvipdfmx in this area.

dvipdfmx deliberately drops literal lines starting with % and also drops whitespace. @davidcarlisle therefore suggested something like

\special{pdf:literal direct (OMIT THIS LINE) pop}

as an approach that would work (with care).

To get no PDF compression you need a couple of not-really documented specials:

\special{dvipdfmx:config z 0}% ~ \pdfcompresslevel
\special{dvipdfmx:config C 0x40}% ~ \pdfobjcompresslevel

which avoids needing to manually run xdvipdfmx for XeTeX and does seem to have the desired effect.
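
Putting those pieces together, a minimal sketch for the (x)dvipdfmx route might look like the following (the marker text is purely illustrative; a plain %-comment line would be dropped, hence the string-and-pop form suggested above):

   \shipout\hbox{%
     \special{dvipdfmx:config z 0}%    ~ \pdfcompresslevel
     \special{dvipdfmx:config C 0x40}% ~ \pdfobjcompresslevel
     \special{pdf:literal direct (l3build test "demo") pop}%
     Some material under test
     \special{pdf:literal direct (l3build test end) pop}%
   }
   \end

Run through XeTeX, the specials pass on to xdvipdfmx, so no manual run of the driver should be needed.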

@josephwright
Member Author

Also worth noting that pdfTeX and dvipdfmx produce rather different results at the low level: dvipdfmx doesn't start a new line for the special and surrounds it with ET/BT, whereas for pdfTeX it is inserted literally, directly into the text, but does stand on a line of its own. All probably workable, as the results will be highly route-dependent anyway ...

@josephwright
Member Author

josephwright commented Jul 28, 2018

I've now looked at binary-based PDF comparison in a137f8c. That shows that, whilst you can make PDFs automatically, the binary files are not platform-independent. It may well be, therefore, that the best set-up for us is stream-based testing. That would simplify some aspects here, as we then don't need two separate PDF routes.

If no one objects, I'll drop binary comparison and re-work for stream support instead.

@wspr
Contributor

wspr commented Jul 29, 2018 via email

@josephwright
Member Author

@wspr Well, 'reproducible build' use cases do work at the binary level, but those are single-platform. That was the original motivation for looking at PDF-based testing, but I agree that for us it probably doesn't help so much.

josephwright added a commit that referenced this issue Jul 31, 2018
See issue #10 and issue #61 for discussions.

This first pass only strips out a minimal amount of data:
more normalization may well be needed.
@josephwright
Member Author

This is now done.
