Listings break book PDFs/EPUBs without any warning #7699

Closed
ijmitch opened this issue Nov 25, 2023 · 37 comments
Labels: books, bug, listings

@ijmitch

ijmitch commented Nov 25, 2023

Bug description

I was turning a website into a book and struggled to understand why Quarto (1.4 as it happens) was producing PDFs and EPUBs that macOS Preview refuses to open, complaining that the file is broken or in an invalid format.

I finally discovered that removing the listing options from index.qmd would give me a working PDF and EPUB.

Searching got me as far as the Discussion #4266 where it's stated that listings are only supported for websites. But with https://quarto.org/docs/books/ saying:

> HTML books are actually just a special type of Quarto Website and consequently support all of the same features as websites including full-text search.

and with the listing working in the HTML of the book's website, I think at least a doc update is needed. I couldn't find any qualification that listings don't work with PDFs/EPUBs of books. This should perhaps also be diagnosed during quarto render so people don't scratch their heads as much.

Apologies if I've missed a statement in the docs.

Steps to reproduce

I've reproduced by using the boilerplate book project and making the index.qmd a listing page thus:

---
listing: default
---
# Preface {.unnumbered}

This is a Quarto book.

To learn more about Quarto books visit <https://quarto.org/docs/books>.

Expected behavior

Preview gives me HTML as expected:

image

Actual behavior

but the PDF is unusable

Your environment

Quarto 1.4
macOS Sonoma 14.1.1

Quarto check output

Quarto 1.4.506
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.1.9: OK
      Dart Sass version 1.69.5: OK
      Deno version 1.37.2: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.4.506
      Path: /Applications/quarto/bin

[✓] Checking tools....................OK
      TinyTeX: v2023.11
      Chromium: (not installed)

[✓] Checking LaTeX....................OK
      Using: TinyTex
      Path: /Users/ijmitch/Library/TinyTeX/bin/universal-darwin
      Version: 2023

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.9.13
      Path: /Users/ijmitch/.pyenv/versions/3.9.13/bin/python3
      Jupyter: (None)

      Jupyter is not available in this Python installation.
      Install with python3 -m pip install jupyter

[✓] Checking R installation...........(None)

      Unable to locate an installed version of R.
      Install R from https://cloud.r-project.org/
ijmitch added the bug label Nov 25, 2023
@ijmitch
Author

ijmitch commented Nov 25, 2023

My original listing page was with type: grid but I simplified it for the steps to reproduce above. I just checked that adding type: grid to the boilerplate index.qmd results in the same problem.

@ijmitch
Author

ijmitch commented Nov 25, 2023

I guess in most cases it's reasonable that listings as used, for example, for blogs are not valid for books, but then that makes the statement that books "support all of the same features as websites" questionable.

In my case, the type: grid was only on the homepage to provide a more grand way to get to the three major starting points for different types of consumer of the website.

I wondered if I could suppress the listing by giving it an id and putting that inside some conditional content:

---
listing:
  id: contents-listing
  type: grid
---
# Preface {.unnumbered}

This is a Quarto book.

::: {.content-visible when-format="html"}
::: {#contents-listing}
:::
:::

To learn more about Quarto books visit <https://quarto.org/docs/books>.

but that still produces a broken PDF.

@mcanouil
Collaborator

mcanouil commented Nov 25, 2023

> I guess in most cases it's reasonable that listings as used, for example, for blogs are not valid for books, but then that makes the statement that books "support all of the same features as websites" questionable.

Websites are not LaTeX. The shared features are obviously those of the shared format, meaning HTML.
Quarto never stated that listings work in non-HTML documents.
Even the documentation you quoted says "HTML book".
Also, the listing feature is documented under "website" and nowhere else.

So, if you want to use it, use conditional content to keep an HTML feature out of LaTeX-based documents.
See https://quarto.org/docs/authoring/conditional.html.

I am going ahead and closing this, as it is not a bug but rather a misuse of the feature depending on the format.

mcanouil added books and removed bug labels Nov 25, 2023
@ijmitch
Author

ijmitch commented Nov 25, 2023

I'm new to Quarto. I was very happy with some content as a website (originally rendered with Docusaurus), but we identified that some users might benefit from the entire thing as a PDF or EPUB, so I was just hoping that converting to a book would be a reasonable means to achieve that.

I missed the point that 'HTML books' are not guaranteed to be PDF-able.

@mcanouil
Collaborator

I perfectly understand the use case, but many HTML features will never come to PDF because the tools are too different; hence the conditional content feature, which allows users to use LaTeX- or HTML-specific features without compromising the document's compatibility with both formats.

cscheid reopened this Nov 27, 2023
@cscheid
Collaborator

cscheid commented Nov 27, 2023

Be that as it may, Quarto should strive to never generate malformed content.

A few options:

  • At the very least, we should silently not emit listing output where it would break the document.
  • Ideally, we'd support listing, but that's a hard thing to do in general.
  • I'd be happy if we warned that listing: default is not supported in PDF.

cscheid added the bug label Nov 27, 2023
@ijmitch
Author

ijmitch commented Nov 27, 2023

Thank you @cscheid - I was intending to come back this morning and appeal that simply getting a bad PDF without warning is pretty unhelpful.

I also found #5782 touches on some of the same things.

You might also see that I did try to get the listing content within a ::: {.content-visible when-format="html"} div which I believe anticipated what @mcanouil was suggesting, but the way I approached it didn't solve the problem of the bad PDF - perhaps I was taking a wrong approach there?

@cscheid
Collaborator

cscheid commented Nov 27, 2023

> perhaps I was taking a wrong approach there?

What you did is completely reasonable, but doesn't solve the bug: the content-visible feature doesn't work quite as well as you'd hope for.

Two things happen when you add listings: the listing contents itself, and all of the HTML dependencies of the listing contents that we have to add. The content-visible technique you tried only removes the former (and you have no way to know that the latter is there.)

One can argue that the "HTML detritus" of the listings feature sticking around when content-visible removes the only listing in the PDF document is actually the bug. But that is honestly a very hard bug for us to fix in general with how our code is set up right now, and so the best we can do is to not allow the document to arrive at such a state to begin with.

cscheid added the triaged-to label Nov 27, 2023
@ijmitch
Author

ijmitch commented Nov 27, 2023

In other SSGs (well, mainly MkDocs) before I recently landed here (and was greatly impressed with the maturity of the approach), I'd resorted to simply adding a script with suitable Pandoc commands to convert whichever set of source files into a PDF or EPUB, so I'm not averse to reverting to that.

I'm trying to wean people off even thinking about needing such things for the use case I have, so was hoping not to put any/much effort into this - but I certainly appreciate the discussion here.

@mcanouil
Collaborator

mcanouil commented Nov 27, 2023

> Be that as it may, Quarto should strive to never generate malformed content.
>
> A few options:
>
>   • At the very least, we should silently not emit listing output where it would break the document.
>   • Ideally, we'd support listing, but that's a hard thing to do in general.
>   • I'd be happy if we warned that listing: default is not supported in PDF.

Then the issue is way more general than PDF/EPUB as Typst, Word, etc are likely to be affected as well, and possibly other HTML-based features in Quarto behave like listings.

@ijmitch
Author

ijmitch commented Nov 27, 2023

Being a novice here (at least with respect to Quarto, if not SSGs, structuring of source projects, YAML etc), might I just describe what might be my ideal solution?

I'd be very happy to stay with type: website but then have an ability to define a composite PDF including multiple .qmd files to be rendered in the output. My first delight with Quarto was PDF under the 'other formats' for a page, but some people seem to want a larger single document.

The way of specifying the .qmd files to include in the output could even be similar to how listings work for HTML. So rather than assume all the content is present in the PDF for a type: book project, just allow a 'pdf listing' to specify some pdf output. Obviously then I could exclude the file with the offending HTML listing content.

I have no idea if that's easier to contemplate within the constraints of the implementation... it's just a suggestion.

@cscheid
Collaborator

cscheid commented Nov 27, 2023

> I'd be very happy to stay with type: website but then have an ability to define a composite PDF including multiple .qmd files to be rendered in the output.

We'd like to support that as well, but "composite PDF" formats is, as of today, just "book through the LaTeX toolchain". That format is stricter than type: website, and so we can't simply make that "just work".

I'll note that if you use a "book" project, you still get a "website" and you can get a PDF composite. Your website, of course, will have some limitations in the format, but many books have been written and published this way: https://quarto.org/docs/books/index.html

@ijmitch
Author

ijmitch commented Nov 27, 2023

@cscheid - that's fine, and it was the temptation saying that "ah, if people really want a pdf of the collection of material then that's really a book" which was the beginning of this, but I ran myself into the wall of the differences between the types. I will stick with type: website and work harder to tell the consumers "this is not the PDF you're looking for" ;-)

dragonstyle added this to the v1.4 milestone Nov 27, 2023
@ijmitch
Author

ijmitch commented Nov 27, 2023

Sorry if I'm labouring a point here, but I just changed the _quarto.yml of the book boilerplate project to type: website and with quarto render index.qmd where that file has:

---
listing:
  id: contents-listing
  type: grid
---
# Preface {.unnumbered}

This is a Quarto book.

To learn more about Quarto books visit <https://quarto.org/docs/books>.

this gives me an index.pdf where it's simply ignored the listing - which is more reasonable than the broken pdf with type: book. It's a shame book projects trying to put multiple source content into one pdf can't do the same.

@mcanouil
Collaborator

mcanouil commented Nov 27, 2023

Thank you.
As you might not have noticed, the issue is open as a bug and assigned, which means the team has understood the issue and will resolve it in due time.
Now you just have to wait and be patient.

@dragonstyle
Collaborator

Thanks for the detailed reporting - I'm sorry to say that when I try to reproduce the issue, I'm not able to (as I would expect, knowing how listings work). Listing processing is tied directly to HTML output - when a PDF is being generated, no code runs that processes listings, so they are ignored.

I confirmed this by creating a default book project and then modifying index.qmd to contain the various kinds of listing that you suggested. In all cases, I was able to properly render and view a PDF file.

I've attached a zip of my attempt to reproduce the issue (including the outputs). If you are able to consistently reproduce this issue, it would be great if you could provide the complete reproducible case so I can test locally - it would be very unexpected if listing content were ending up in PDF output!

book.zip

dragonstyle closed this as not planned Dec 4, 2023
dragonstyle added the needs-repro label Dec 4, 2023
@ijmitch
Author

ijmitch commented Dec 4, 2023

@dragonstyle ... well, this is interesting/embarrassing - I unzipped your archive and checked my Mac could open the generated PDF and all was fine.

However if I rebuild it with quarto render then the new PDF is broken as I experienced before which provoked this issue.

So the problem is with my Mac's stack of software... which part, I don't know. Surely this would just be down to my version of Quarto and TinyTeX?

Does:

Rendering PDF
running xelatex - 1
  This is XeTeX, Version 3.141592653-2.6-0.999995 (TeX Live 2023) (preloaded format=xelatex)
   restricted \write18 enabled.
  entering extended mode

running xelatex - 2
  This is XeTeX, Version 3.141592653-2.6-0.999995 (TeX Live 2023) (preloaded format=xelatex)
   restricted \write18 enabled.
  entering extended mode

tell you anything suspicious?

@mcanouil
Collaborator

mcanouil commented Dec 4, 2023

You could try to use Quarto to remove and reinstall TinyTex.
quarto remove tinytex (in theory) and quarto install tinytex.

mcanouil removed the triaged-to label Dec 4, 2023
@dragonstyle
Collaborator

dragonstyle commented Dec 4, 2023

That log output seems completely fine - You might want to try a simple document as a PDF and see whether that works (e.g. standalone document like):

---
title: Hello World
format: pdf
---

## Hello World

This is a PDF

If that renders fine, then perhaps there is something about books triggering this that differs between our environments.

One other useful thing to do would be to include the option keep-tex under your pdf format, which will keep the generated LaTeX file around. This would let us see if something sketchy/unexpected is showing up in the LaTeX output, which might give us a clue.

Generally if we were making invalid LaTeX, I would expect an error while rendering the LaTeX to PDF (for example if we mixed HTML and LaTeX), so this is unusual for sure. Can you share one of these broken pdfs and we can see if I can open it?

@ijmitch
Author

ijmitch commented Dec 4, 2023

B.pdf

Here's the broken (at least for me) PDF.

PDF rendering is absolutely fine when the project is not a book, or when no file includes a listing.

So just removing the listing front-matter from index.qmd gives me this PDF which opens fine...

B.pdf

@dragonstyle
Collaborator

I can confirm that on my machine the first is broken (though not in some obvious way, sadly), and the second is not...

Could you share the exact project that is reproducing this? Alternatively, can you try using keep-tex to keep the intermediate LaTeX and share that?

@ijmitch
Author

ijmitch commented Dec 4, 2023

B.tex.zip

contains the tex which gives the broken PDF.

@ijmitch
Author

ijmitch commented Dec 4, 2023

I've been doing all this this evening just modifying the copy of the project which you sent me in the book.zip earlier.

@dragonstyle
Collaborator

I'm perplexed. When I render that tex file to a pdf using xelatex B.tex I end up with a valid pdf file.

I can't explain how the presence of the listing key is causing this :( - listing processing is hidden behind a check of the output format (being HTML). The tex file doesn't really have any evidence of listings being processed either, so I am really at a loss.

@ijmitch
Author

ijmitch commented Dec 4, 2023

We must be looking in the wrong place.... here are good-B.tex and broken-B.tex each from a quarto render, one producing a good PDF and the other a broken PDF. They are identical, I think.
Archive.zip

If the .tex files are actually the same, there must be something else downstream of that which breaks the PDF.

But I have also, belatedly, noticed that the broken quarto render emits these WARNINGs:

❯ quarto render
[1/4] index.qmd
WARNING: File /Users/ijmitch/Downloads/b/intro.qmd in the listing 'contents-listing' contains no metadata.
WARNING: File /Users/ijmitch/Downloads/b/summary.qmd in the listing 'contents-listing' contains no metadata.
WARNING: File /Users/ijmitch/Downloads/b/references.qmd in the listing 'contents-listing' contains no metadata.
[2/4] intro.qmd
[3/4] summary.qmd
[4/4] references.qmd

whereas, perhaps obviously, the good case, where no front matter contains a listing, does not.

@ijmitch
Author

ijmitch commented Dec 4, 2023

And, yes - xelatex broken-B.tex gives me a good PDF too!

@ijmitch
Author

ijmitch commented Dec 4, 2023

Archive-2.zip

This ZIP has a B.pdf which is the broken result from quarto render, and broken-B.pdf which is actually GOOD since it came from xelatex broken-B.tex. You can see they are quite considerably different in size. I've tried a file compare in VSCode, but I can't tell which variations in the PDF contents are significant. Certainly there's lots in common and all the variation is in binary data inside a subset of the stream objects.
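
For a comparison like this outside an editor, a small script can report the sizes, whether the checksums match, and the first byte offset at which two files diverge. This is only a sketch: compare_files is a hypothetical helper, the file names are the ones mentioned above, and two valid PDFs from separate runs may still differ in some bytes (for example in embedded timestamps), so a difference alone does not show which file is broken.

```python
import hashlib
from pathlib import Path


def compare_files(path_a, path_b):
    """Summarize how two files differ at the byte level (hypothetical helper)."""
    data_a = Path(path_a).read_bytes()
    data_b = Path(path_b).read_bytes()

    # Offset of the first differing byte within the common prefix, if any.
    first_diff = next(
        (i for i, (x, y) in enumerate(zip(data_a, data_b)) if x != y),
        None,
    )
    # If one file is a strict prefix of the other, they diverge where the
    # shorter one ends.
    if first_diff is None and len(data_a) != len(data_b):
        first_diff = min(len(data_a), len(data_b))

    return {
        "sizes": (len(data_a), len(data_b)),
        "sha256_equal": hashlib.sha256(data_a).digest()
        == hashlib.sha256(data_b).digest(),
        "first_diff_offset": first_diff,
    }


if __name__ == "__main__":
    # File names taken from the comment above; adjust the paths as needed.
    if Path("B.pdf").exists() and Path("broken-B.pdf").exists():
        print(compare_files("B.pdf", "broken-B.pdf"))
```

A first_diff_offset of None together with sha256_equal being True means the files are byte-for-byte identical.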

@mcanouil
Collaborator

mcanouil commented Dec 4, 2023

May I suggest to use a Git repository instead of many zip archives?
At least you get diffs and possibly can use codespaces, etc.

@dragonstyle
Collaborator

I can reproduce it locally (woo hoo!) - the key is to use the command quarto render (with no --to pdf), which mixes HTML and PDF rendering, likely causing the issue somehow. Investigating now... Thanks for your persistence in narrowing this down!

dragonstyle reopened this Dec 4, 2023
dragonstyle removed the needs-repro label Dec 4, 2023
@ijmitch
Author

ijmitch commented Dec 4, 2023

Progress!

I just shoved the project into https://github.com/ijmitch/quarto-book-debug and made it public.

@dragonstyle
Collaborator

OK, that was pretty straightforward - there are global postprocessors that handle last-minute tasks during a project render, and the listing postprocessors were not expecting PDF output to appear in the list of outputs to process (though that is actually expected). I added a check that filters the outputs down to HTML only, and that should resolve it.

I'll start a fresh pre-release build and this should be testable within 10-15 minutes. Once again, thanks for your persistence; this was a good one that defied my expectations!
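
The filtering idea described in this comment can be sketched roughly as follows. This is an illustration in Python, not Quarto's actual code, which is not shown in this thread: RenderedOutput, html_outputs_only and run_listing_postprocessor are hypothetical names, and deciding by format name plus file extension is an assumption about how such a check might look.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class RenderedOutput:
    """Hypothetical record of one rendered file; not Quarto's real structure."""
    path: Path
    format: str  # e.g. "html", "pdf", "epub"


def html_outputs_only(outputs: list[RenderedOutput]) -> list[RenderedOutput]:
    """Keep only outputs that are declared as HTML and are .html files on disk."""
    return [
        out
        for out in outputs
        if out.format == "html" and out.path.suffix.lower() == ".html"
    ]


def run_listing_postprocessor(
    outputs: list[RenderedOutput],
    process_html: Callable[[Path], None],
) -> list[Path]:
    """Run an HTML-only post-processing step, skipping every non-HTML output.

    Without the filter, process_html would also be handed PDF or EPUB files,
    which it has no business rewriting.
    """
    processed = []
    for out in html_outputs_only(outputs):
        process_html(out.path)
        processed.append(out.path)
    return processed
```

With a mixed render that yields index.html, B.pdf and B.epub, only index.html would reach process_html in this sketch.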

@ijmitch
Author

ijmitch commented Dec 5, 2023

@dragonstyle - that's great - many thanks!

@ijmitch
Author

ijmitch commented Dec 5, 2023

hmm... @dragonstyle - did you test this for epubs as well as pdfs?

@dragonstyle
Collaborator

I didn't test the mixed render case :( - will check now!

@ijmitch
Author

ijmitch commented Dec 5, 2023

I added:

  epub:
     title: "B"

to _quarto.yml and got an EPUB out, but it didn't open with the macOS 'Books' app (whereas other EPUBs from Quarto without a listing do).

@dragonstyle
Collaborator

Yeah, it was the same problem - I needed the check to be even more strict. A fresh build is on the way!

6eedc68

@ijmitch
Author

ijmitch commented Dec 5, 2023

Many thanks!
