Support LaTex/PDF output #152

Closed
umarcor opened this issue Aug 6, 2020 · 14 comments
Comments

@umarcor

umarcor commented Aug 6, 2020

I found no reference to whether generating LaTeX/PDF output is either supported or planned.

@jjallaire
Member

No, this format is just targeted at web output. Of course, you can use the same R Markdown source to create PDF output using e.g. the Tufte Handout format (https://github.com/rstudio/tufte) or any of the rticles formats (https://github.com/rstudio/rticles).

@umarcor
Author

umarcor commented Aug 9, 2020

@jjallaire thanks for clarifying. Unfortunately, I'm not sure I understand the ecosystem...

I'm currently using bookdown for generating HTML and PDF output from a set of Rmd files in a flat hierarchy. Now, I'd like to use distill's style for the HTML output. However, I tried writing a distill document, and it feels quite different:

  • _site.yml is required, instead of _bookdown.yml.
  • index.Rmd seems to be the only source. Other sources are rendered to separate HTML pages, without the expected format.
  • Rscript -e "rmarkdown::render_site('index.Rmd', 'distill::distill_article')" needs to be executed, instead of Rscript -e "bookdown::render_book('00-index.Rmd', 'bookdown::gitbook')".

Then, I tried with tufte. It seems to be equivalent to distill's workflow (all the content needs to be written in a single file), but the generation command requires the output to be specified: Rscript -e "rmarkdown::render('index.Rmd', 'tufte::tufte_html')", and _site.yml is not required. Hence, I'm afraid my current use case might not be supported.

In contrast, according to https://bookdown.org/yihui/rmarkdown/rticles-bookdown.html, rticles can be linked with bookdown. That is very desirable for the PDF output, and I will try it. Yet, I'm currently concerned about the HTML.

Summarizing, I'd like to have a bookdown document which uses distill for the HTML output, and rticles for the PDF output. Is it possible?

EDIT

@umarcor
Author

umarcor commented Aug 9, 2020

For completeness, I tried using tufte styles with bookdown:

  • HTML:
    • bookdown::tufte_html_book works, but:
      • 'References' is empty (it seems that bibliography is not supported).
      • The TOC is shown at the top of the page, instead of being a sidebar.
  • PDF:
    • base_format: tufte::tufte_handout does not work because \subsubsection is undefined in LaTeX class tufte-handout.
    • base_format: tufte::tufte_book: the build is successful (using a simple 'author' field), but 'References' are not shown (\nobibliography*), and the cover page doesn't look good.
    • base_format: tufte_book2: same as tufte::tufte_book.

@jjallaire
Member

Bookdown formats are their own self-contained entity (they deal with not just formatting but also how to combine chapters together, how to do navigation, etc). So you can't plug arbitrary HTML or PDF formats into bookdown, rather you need to use the existing bookdown formats (https://bookdown.org/yihui/bookdown/output-formats.html) or alternatively write a new one using the formats defined within the bookdown package as a guide.

@umarcor
Author

umarcor commented Aug 13, 2020

So, is rticles a repository containing special templates that can be used with or without bookdown? or is https://bookdown.org/yihui/rmarkdown/rticles-bookdown.html outdated/incorrect?

@jjallaire
Member

jjallaire commented Aug 13, 2020 via email

@yihui
Member

yihui commented Aug 13, 2020

@umarcor https://bookdown.org/yihui/rmarkdown/rticles-bookdown.html is still up-to-date and correct: bookdown can be used with other output formats, including those from rticles. If you have problems with specific formats, you may report them to the specific GitHub repos.
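A minimal sketch of that combination, assuming a bookdown project whose entry file is named `index.Rmd` (a hypothetical name) and picking `rticles::ieee_article` as the base format because it is the template named elsewhere in this thread. Whether a given rticles template builds cleanly under bookdown is not guaranteed.

```shell
# Write a bookdown _output.yml that keeps gitbook for HTML and wraps an
# rticles template in bookdown::pdf_book for PDF.
# Assumption: ieee_article is used only as an example base format.
cat > _output.yml <<'EOF'
bookdown::gitbook: default
bookdown::pdf_book:
  base_format: rticles::ieee_article
EOF

# Render the PDF format if R is available; otherwise only the config is written.
# 'index.Rmd' is a hypothetical entry file for the book.
if command -v Rscript >/dev/null 2>&1; then
  Rscript -e "bookdown::render_book('index.Rmd', 'bookdown::pdf_book')" \
    || echo "render failed; check that bookdown, rticles and LaTeX are installed"
else
  echo "Rscript not found; skipping render"
fi
```

The `base_format` option is what lets bookdown features such as cross-references run on top of a template that bookdown does not ship itself.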

BTW, the following is by design (Edward Tufte doesn't use sub-subsections in his books):

  • base_format: tufte::tufte_handout does not work because \subsubsection is undefined in LaTeX class tufte-handout.

And if you want to generate PDF from HTML output formats, you may consider using Chrome to print the HTML page. It has been automated in pagedown: https://slides.yihui.org/2020-genentech-rmarkdown.html#19

@umarcor
Author

umarcor commented Aug 13, 2020

@jjallaire, @yihui thanks for helping me understand. I'm sorry it's taking me time to wrap my head around it, since I'm not an R user.

If you have problems with specific formats, you may report them to the specific Github repos.

@yihui see rstudio/rticles#309. There, all the rticles "skeletons" are built using the default output/format/builder. That works ok for all except three templates (rstudio/rticles#309 (comment)). Apart from that, a minimal bookdown project is added, and it is tested with all the templates. 5 have an acceptable result, but most of them fail due to missing styles (rstudio/rticles#309 (comment)). Hence, the PR is a reproducible example per se. You can see all the runs in https://github.com/umarcor/rticles/actions, and for each of them, you can download the artifacts (the skeletons and bookdown+rticles builds that were successful). For example: https://github.com/umarcor/rticles/actions/runs/202005154. I'm quite sure I might be missing something very obvious, but a few of the errors seem legit.

And if you want to generate PDF from HTML output formats, you may consider using Chrome to print the HTML page. It has been automated in pagedown: slides.yihui.org/2020-genentech-rmarkdown.html#19

Thanks for the reference. I've been using other projects such as Hugo (golang), Sphinx (Python) or AsciiDoc (Ruby). Hugo's and AsciiDoc's approach to PDF generation seems to be equivalent to the Chrome print solution you propose. However, my use case is that I want to provide HTML versions of documents that need to be written in LaTeX, because journals/publishers require it, or because other colleagues are going to use it for writing parts of the whole report/book. So, I am investigating alternatives that generate the same LaTeX content that someone would write by hand, but using markdown instead.

  • Sphinx does use LaTeX for PDF output. However, the template is not designed to be replaced/redefined.
  • AsciiDoc might seem a good fit, because AsciiDoc -> DocBook -> LaTeX seems very customizable. Still, I found no intuitive guide/reference about how to write the LaTeX template. For instance, DocBook + IEEEtrans.
  • Pandoc is a very capable toolbox and writing custom LaTeX templates is relatively easy. However, markdown is a poorly defined language for this task, so most required features for technical/scientific documents are implemented as extensions.

Consequently, the work that you both have done these last years is the closest to what I need: combining an "extended markdown frontend" with an "easy" LaTeX templating system (pandoc), that can produce nice HTML output too. Congratulations!

Now, I am trying to understand where the current limitations of your infrastructure are. Ideally, I'd like to have a single (r)markdown source and produce four outputs:

  • A fancy HTML website, with interactive elements (as distill).
  • A GitHub flavoured markdown that can be pasted in issues or visualized in repos.
  • A fancy LaTeX template, with useful features such as tocs/refs per chapter, backrefs, etc.
  • A LaTeX template required by some publisher/entity.

Naturally, if I use some non-standard feature (say, <aside>), I need to ensure that all four output formats can properly handle it. I can do that, as long as the underlying codebase provides the features to do so. Hence, I guess that's my main question: are rmarkdown/bookdown/blogdown/pagedown designed to reuse a single article in multiple books/sites with different styles/templates? Or is each rmarkdown source expected to be adapted for some specific template/output only?

For now, I want to use rticles ieee_article for LaTeX/PDF and distill for HTML only. Well, I'd like to see a combination between distill and bookdown's gitbook because each has useful features that are missing in the other. But, if I have to choose one to match a scientific article, that's distill.
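A minimal sketch of the single-source, multi-output setup described above, assuming a standalone file named `article.Rmd` (hypothetical) rather than a bookdown project. The front matter lists three of the formats mentioned in this comment; a real `ieee_article` or `distill_article` document needs more metadata than shown here, and format-specific markup in the body will not carry over between outputs.

```shell
# Write a single R Markdown source that declares several output formats.
# Assumption: file name and title are placeholders for illustration.
cat > article.Rmd <<'EOF'
---
title: "Example article"
output:
  distill::distill_article: default
  rmarkdown::github_document: default
  rticles::ieee_article: default
---

# Introduction

Plain Markdown only, so that every output format can handle it.
EOF

# Render every format declared in the front matter, if R is available.
if command -v Rscript >/dev/null 2>&1; then
  Rscript -e "rmarkdown::render('article.Rmd', output_format = 'all')" \
    || echo "render failed; check that distill, rticles and LaTeX are installed"
else
  echo "Rscript not found; skipping render"
fi
```

Keeping the body to plain Markdown is the point of the sketch: the more raw HTML or LaTeX the source contains, the fewer of the declared formats will render it as intended.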

@yihui
Member

yihui commented Aug 14, 2020

Hence, I guess that's my main question: are rmarkdown/bookdown/blogdown/pagedown designed to reuse a single article in multiple books/sites with different styles/templates?

Good question. Short answer is, unfortunately, no. It's always a matter of trade-off. If you aim at a specific style/template, you will almost surely lose portability (i.e., the ability to adapt to other styles/templates). If you look for portability, you can't go very far with styling/theming. Think about how difficult it is for an author to submit the same paper to two different journals. I don't believe there is one ring to rule them all. I gave a talk on this topic this year and showed 14 demos based on the same source document. However, I might not have emphasized enough that you must keep the source document simple (e.g., only Markdown syntax but not HTML or LaTeX) in order to gain portability.

Personally I could only see hope in the HTML format. That's also the motivation behind the pagedown package.

@umarcor
Author

umarcor commented Aug 21, 2020

@yihui, thanks a lot for your honest answer. Much appreciated.

I am gathering some notes and thoughts in the following article, and I added some of your refs: https://dbhi.github.io/mdpaper/. I'll be glad to fix or enhance it, should you find anything wrong or not exact.

Personally I could only see hope in the HTML format.

Since my background is academia (LaTeX and Word), Sphinx and Hugo, just for the sake of discussion and sharing ideas, let me elaborate on this.

I understand your desire for having something easier to learn than LaTeX and yet providing a similar high quality result. I would also love to see an ecosystem where writers define the content once, and multiple visualizations can be used. Nevertheless, I fear that some of the arguments that compare the future of HTML and LaTeX are slightly biased.

My main point is that LaTeX does not compare to HTML. LaTeX is a typesetting system; TeX is Turing complete, and thus so is LaTeX. HTML is only a markup language. Hence, HTML should be compared to LaTeX's common outputs: PS, DVI or PDF. On the other hand, the macro language that users need to write when defining the content should be compared to Markdown, reStructuredText or AsciiDoc. Last, but not least, pdflatex, xelatex, lualatex, etc. are to be compared with asciidoctor, pandoc, rstudio, relaxed, sphinx, etc.

Your expectation is for LaTeX to survive for 20-30 years. In that time, you expect the typesetting quality of web pages to catch up with LaTeX, but "LaTeX will probably never be able to catch up with HTML in other aspects". I beg to differ here. HTML and CSS have been around for 20+ years, and yet not a single publisher that I am aware of provides templates for papers, as they do with LaTeX or Word. It is surprising because most of them do provide HTML versions of the papers on their websites (even if academics submit them in PDF). That is probably because writing HTML and CSS from scratch is neither easier nor more versatile than using LaTeX. Hence, an HTML/CSS generator needs to be used, typically with a templating engine. Are any of the existing tools close feature-wise? Hardly.

It took a decade for Donald Knuth to write TeX. His main conclusion after doing so was that "software is hard; it's harder than anything else I've ever had to do". LaTeX was not written until almost a decade later. And another 10-15 years were required until latex2e was released. All the people involved in the process were, and still are, brilliant people. And equally brilliant people have been contributing packages to the ecosystem for 20-50 years. Although programming languages are arguably better today and machines are undoubtedly more powerful, the intrinsic complexity of the logical problems to be solved did not change. Hence, anyone willing to provide a comparable solution would need to invest 10-20 years, at least.

Nowadays, we all seem to agree on searching/proposing alternatives for LaTeX. It feels archaic, obscure, and verbose. Still, most developers seem to believe that starting their own project from scratch using their favourite language is better than trying to build on others, or (better) with others. Well, this is not completely fair. Many of the document generators which produce HTML and/or PDF did first start as domain specific packages and then evolved. In practice, most of the developers don't really have time for either learning another language or gaining knowledge about typesetting.

Anyway, we have many incompatible markup (markdown) extensions and multiple incompatible templating engines. So, e.g. IEEE would need to provide up to 6 additional templates for each type of document, in order to satisfy the following list: AsciiDoc (Ruby and Tilt), MkDocs (Python and Jinja), RStudio (R and ???), ReLaXed (JS and Pug), Sphinx (Python and Jinja), Hugo (Go and Go templates), etc. Still, most of them would only support a reduced subset of what the LaTeX ecosystem provides. Furthermore, an article written with one of those tools will likely produce a different result than others. Conversely, an article/book written in LaTeX 20 years ago will likely produce exactly the same output.

At the same time, some of the developers of LaTeX3, who have been working on it since the 90s, recently quit their jobs to devote most of their time to LaTeX. Hence, I would not be so sure that HTML will catch up before LaTeX allows interactive content and/or built-in HTML output. This is very relevant: if LaTeX is able to produce HTML, the argument of your article falls apart. This is because, as said, LaTeX is NOT (only) a format, it is a typesetting system.

As far as I am aware, none of the development teams behind asciidoctor, mkdocs, rstudio, relaxed, sphinx, or hugo is (significantly) larger than the one working on LaTeX. Nor do I think that any of them is significantly more or less capable. Hence, unless some standardization effort is done which allows multiple groups to advance together, the ecosystem is likely to remain the same for very long. Naturally, it can be argued that developers using HTML, CSS and JavaScript will outnumber the developers with advanced knowledge of LaTeX. My point would be that the language is negligible compared to the complexity of the logic. That is, I find it very difficult for someone with very advanced typesetting and mathematical knowledge to devote 10 years to the task, regardless of the language.

A very illustrative comparison: tikz-timing (8y of development) and Wavedrom (developed in 1y, active for 6y). Wavedrom is JavaScript, it can be embedded in Sphinx/Rstudio HTML outputs, it can also generate SVG, and presentations can be done with impress.js. It's so cool! But, there are sequences which you cannot describe in Wavedrom, or which are significantly more verbose. Conversely, I did not find something which I could not describe with tikz-timing. Note that the limitations of Wavedrom are by design, in order to reduce the development and maintenance burden. Moreover, it should have been straightforward to add interactive rulers and markers, since it's a web site and waveforms are generated in JavaScript. Still, since it was not considered at first, probably it would need to be rewritten from scratch.

Focusing on Markdown, it seems quite evident that some extension is required for handling bibliographies, cross-references between documents and importing/including other markdown files. There is CommonMark, but that is not defined there. The de facto standard for bibliographies is pandoc's format, which supports CSL. In fact, CSL is a very nice initiative. Still, different tools and/or templates handle bibliographies in different and incompatible ways. Actually, the templates in RStudio, which is a subset of the ecosystem, do have consistency issues with this very basic feature. Precisely, distill requires bibliography: refs.bib and cannot use biblatex format, while bookdown supports bibliography: ['refs.bib'] and can combine biblatex, biber and CSL. This is annoying not only because the frontmatter of rmarkdown files needs to be different, but because the same *.bib file cannot be used for both templates. That defeats the whole purpose of using a standard format for defining the content of the references. Still, this is just an example that illustrates the many issues that arise when trying to use "alternatives to LaTeX".
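A minimal sketch of pandoc's citation format, assuming pandoc 2.11 or later (where the `--citeproc` option is available) and hypothetical file names `refs.bib` and `cited.md`. It uses the plain-string form of the `bibliography` field and a classic BibTeX entry; it does not show how distill or bookdown process the field.

```shell
# A classic BibTeX entry (not biblatex-specific) in a shared references file.
cat > refs.bib <<'EOF'
@book{knuth1984,
  author    = {Donald E. Knuth},
  title     = {The {\TeX}book},
  publisher = {Addison-Wesley},
  year      = {1984}
}
EOF

# A Markdown source that points at the file and cites the entry by key.
cat > cited.md <<'EOF'
---
title: "Citation test"
bibliography: refs.bib
---

TeX is described in [@knuth1984].
EOF

# Resolve the citation with pandoc's built-in citeproc, if pandoc is installed.
# Assumption: pandoc >= 2.11; older releases used a separate citeproc filter.
if command -v pandoc >/dev/null 2>&1; then
  pandoc --citeproc cited.md -o cited.html \
    || echo "pandoc failed; the installed version may predate --citeproc"
else
  echo "pandoc not found; skipping conversion"
fi
```

A CSL style can be selected through a `csl` field in the same front matter; without one, pandoc falls back to its default style.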

From a wider perspective, a standardized intermediate representation or abstract tree for documents should exist, mimicking LLVM. That would allow different frontends and backends to be decoupled. In some sense, this is what pandoc provides. However, it seems not to be clearly exposed, documented or supported, because AFAIAA all the projects pre-generate (markdown) sources and then execute pandoc as an external tool.
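The frontend/backend split can be tried from the command line. This is a minimal sketch, assuming pandoc is installed and using a hypothetical `sample.md`: the Markdown reader produces pandoc's abstract syntax tree, which is dumped in native and JSON form, and the LaTeX writer is then run on the JSON alone, without access to the Markdown source.

```shell
# A tiny Markdown document to feed the reader (frontend).
printf '# Title\n\nHello *world*.\n' > sample.md

if command -v pandoc >/dev/null 2>&1; then
  # Frontend only: print the abstract syntax tree in pandoc's native notation.
  pandoc -f markdown -t native sample.md

  # Serialize the same tree as JSON, the form exchanged with external filters.
  pandoc -f markdown -t json sample.md -o sample.json

  # Backend only: produce LaTeX from the JSON tree.
  pandoc -f json -t latex sample.json
else
  echo "pandoc not found; skipping AST demo"
fi
```

External filters operate on this JSON tree between the two steps, which is the closest thing in the pandoc toolchain to the LLVM-style intermediate representation described above.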

Of course, this is not to bluntly criticize your work. As said, I can only congratulate both of you for the work you did during these last 5 years and what you are still doing. Although specific to R and non-standard, I believe that rmarkdown is one of the most complete approaches based on markdown only; I'm using both distill and bookdown in several of my repos. Moreover, it is not your duty to solve such an enormous integration/standardization effort. Yet, I believe you might want to address the fragmentation in your own ecosystem. Details such as inconsistencies in the bibliographies, rstudio/bookdown#918, asides being differently defined/used in distill and tufte, etc. do add layers of complexity to environments that are quite confusing already.

Once again, thanks a lot for your clarifications and references.

@jjallaire
Member

@umarcor Thank you for the incredibly detailed and lucid discussion on "where we stand" with sophisticated document generation from markdown. Thanks also for https://dbhi.github.io/mdpaper/. I agree that we aren't even close to where we need to be, and I also agree (for the reasons you cite) that it will be hard to evade using LaTeX for print output for many decades to come.

We are going to try to unify these various piecemeal solutions, as well as try to invest more heavily in solutions that work across languages (i.e. don't have a hard R dependency).

At a more tactical level, I've updated Distill to use Pandoc for bibliography generation (rather than the Distill JavaScript framework) so things should work more as you expect there: e8585bc

@umarcor
Author

umarcor commented Aug 25, 2020

@jjallaire, thank you very much for your feedback. I was afraid you might take it wrong, because I pointed out several very specific caveats of your solution (as explained, 'mdpaper' is not "fair" in this regard). Hence, I'm really glad to see that you accepted it as a lucid (sic) discussion, which is the very essential purpose.

Also, please let me thank you for reacting so fast with regard to Distill and Pandoc 🎉. However, do not let the discussion alter your agenda. As said, you are already doing very nice work, and you have some very interesting tasks ahead of you. I believe this discussion is just for the sake of knowledge 🤔, so that others approaching this field can have a rough but broad introduction. That is, I would and will recommend rmarkdown to users/colleagues with some technical knowledge/requirements, but who are not developers. OTOH, mdpaper is for developers who foresee that, whichever tool they pick, they will need to write custom extensions/features.

@jjallaire
Member

We are definitely aware of the deficits here, and your writing puts a much finer point on how/why each of them matters. Will be working hard over the next few years to remedy them as best we can 😄

@fkohrt

fkohrt commented Dec 17, 2021

For others reading this: https://dbhi.github.io/mdpaper/ → https://dbhi.github.io/docascode/
