Plain TeX writer #1541

Open
jgm opened this issue Aug 16, 2014 · 27 comments

Comments

@jgm
Owner

jgm commented Aug 16, 2014

I want to explore the possibility of adding a plain tex writer---one whose output can be processed by plain tex (or perhaps eplain), without latex or context macros.

What I have in mind is to have pandoc emit macros that are fairly closely customized to pandoc's own structural elements, and include definitions of these macros in the preamble of the default template. Then users could modify these macros to get the appearance they want.

@mszep
Contributor

mszep commented Aug 31, 2014

I had thought about exactly this concept a little while ago, but I couldn't come up with a use case for this which isn't already served well by ConTeXt and LaTeX.

Since both of the above systems provide advanced functionality for specifying layout and appearance, I think tweaking a hypothetical TeX template file to get the desired appearance would take the user much longer, if they don't already know raw TeX.

Do you have a specific application in mind where the extra low-level control would be worth it?

@jgm
Owner Author

jgm commented Jan 2, 2015

One advantage, besides customizability, is that only a very minimal tex install would be needed to produce a PDF.

@mszep
Contributor

mszep commented Jan 3, 2015

I've been thinking a bit more about this lately, and getting more interested.

I think my earlier question missed the point, since any document can be typeset by any of the systems (plain TeX, ConTeXt or LaTeX).

The reduced dependency is nice when installing, but also when authoring, since it would bypass the clunky LaTeX and ConTeXt systems. Plain TeX might turn out to be a better match for pandoc's document model than the two higher-level macro packages.

@mb21
Collaborator

mb21 commented Oct 26, 2015

Pandoc now has PDF generation using ConTeXt built in as well. ConTeXt Standalone states "ConTeXt macro files are small (less than 10MB), but the suite comes with various free fonts which considerably increase the size of the distribution to around 200MB)." There's also BasicTeX, which comes in at around 100MB (which, however, only contains two fonts and no bidi package etc). How does plain TeX handle Unicode and bidirectional text btw?

@mikeshulman

I have a use case for a pandoc plain tex writer: conversion of LaTeX snippets to Plain TeX snippets for sharing bits of code (in my case, math homework problems) between people who use different dialects of TeX. Is it ever likely to happen?

@shreevatsa

Curious about this: if I understand correctly, what it would take to write a plain TeX writer is to emulate src/Text/Pandoc/Writers/LaTeX.hs and src/Text/Pandoc/Writers/ConTeXt.hs (currently 1501 lines and 546 lines long respectively).

  1. Is there some guide on how to do this (e.g. a complete description of Pandoc's internal representation, and what are the kinds of data that need to be “translated”)?

  2. Would the writer have to be written in Haskell, or is it just the preferred convention and (like filters) is there a possibility of being able to use some other language (say Python or Lua)?

@mb21
Collaborator

mb21 commented Jan 18, 2018

@shreevatsa yes, we'd have to add a Writers/PlainTeX.hs in Haskell, since all the writers (like the rest of pandoc) are written in Haskell.

> Is there some guide

Have a look at http://pandoc.org/CONTRIBUTING.html

> Pandoc's internal representation

and https://github.com/jgm/pandoc-types/blob/master/Text/Pandoc/Definition.hs

However, before all this, we'd have to agree on what TeX exactly this plain writer would produce. That's what this issue is for... suggestions welcome :) At least it should be able to handle test/writer.native

@shreevatsa

Thanks. I think @jgm already outlined a sensible approach in the issue description:

> What I have in mind is to have pandoc emit macros that are fairly closely customized to pandoc's own structural elements, and include definitions of these macros in the preamble of the default template. Then users could modify these macros to get the appearance they want.

So for example the metadata ("date",MetaInlines [Str "July",Space,Str "17,",Space,Str "2006"]) (from writer.native) may turn into \date{July 17, 2006} and then there would be a definition of \date (or how to use the token list / boxes set by \date) in the preamble etc. We'd be re-implementing small bits of LaTeX/ConTeXt/eplain/opmac etc., which work similarly.
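
The metadata example above can be sketched in a few lines of Python working on pandoc's JSON form of the AST. This is only an illustration, not code from pandoc: the function names `render_inlines` and `render_meta` are hypothetical, the macro name is simply the metadata key (so `date` becomes `\date`), and only `Str` and `Space` inlines plus a handful of backslash-escapable characters are handled.

```python
import json

# Characters that plain TeX lets us escape with a backslash (\%, \&, \#, \$, \_).
# Backslash, braces, ^ and ~ need real handling and are out of scope in this sketch.
SIMPLE_ESCAPES = {c: "\\" + c for c in "%&#$_"}


def render_inlines(inlines):
    """Render a list of pandoc inline nodes (JSON form) as TeX text."""
    out = []
    for node in inlines:
        kind = node["t"]
        if kind == "Str":
            out.append("".join(SIMPLE_ESCAPES.get(ch, ch) for ch in node["c"]))
        elif kind == "Space":
            out.append(" ")
        else:
            raise NotImplementedError(f"inline type {kind!r} not handled in this sketch")
    return "".join(out)


def render_meta(meta):
    """Emit one macro call per MetaInlines field; the key becomes the macro name."""
    lines = []
    for key, value in meta.items():
        if value["t"] == "MetaInlines":
            lines.append("\\" + key + "{" + render_inlines(value["c"]) + "}")
    return "\n".join(lines)


# JSON equivalent of ("date", MetaInlines [Str "July", Space, Str "17,", Space, Str "2006"])
meta = json.loads('''{"date": {"t": "MetaInlines", "c": [
    {"t": "Str", "c": "July"}, {"t": "Space"}, {"t": "Str", "c": "17,"},
    {"t": "Space"}, {"t": "Str", "c": "2006"}]}}''')

print(render_meta(meta))  # \date{July 17, 2006}
```

What `\date` then does with its argument (store it in a token list, typeset it in a title block, ignore it) would be decided entirely by the definition in the template preamble.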

@jgm
Owner Author

jgm commented Jan 18, 2018 via email

@Witiko

Witiko commented Feb 3, 2018

As discussed in #4341, the witiko/markdown TeX package is an (unintentional) implementation of this idea (see an article introducing the package in TUGboat vol.39, no.2). The first step is to decide and document the TeX macros that will correspond to the individual elements of the AST; see section 2.2.3 of the witiko/markdown documentation to see the choices made by the package. Most importantly, the macros need to be prefixed (e.g. \pandocMetadata rather than \metadata) if Pandoc wants to co-exist with other TeX packages.

@Witiko

Witiko commented Feb 4, 2018

Below are some of my assorted thoughts on this:

  • The resulting format should be reasonably easy to read and edit by hand even though it will be
    a serialization of the internal AST representation. The produced files will be passed around,
    hand-edited, and debugged by users; often, the original source code will not be available.

    The format produced by witiko/markdown was not designed for readability and it makes
    debugging more difficult than it should be.

  • To expand on the above point, the resulting format should have an associated reader, i.e. the
    format should specify a machine-readable subset of TeX that is unambiguously convertible
    back to an internal AST representation.

    Different TeX formats use different category codes for characters. To give an example, the pipe
    character (|) is active in ConTeXt and the left angle bracket (<) replaces backslash (\) as the
    command prefix in XMLTeX. The most sensible way in my mind is to assume that the plain TeX
    category codes are used and expect a corresponding TeX package to switch category
    codes just before the document produced by Pandoc is included.

  • The format should provide a common interface. Macro packages for different TeX formats
    (e.g. plain TeX, LaTeX, ConTeXt, OPmac, XMLTeX, etc.) should be able to specify sensible default
    definitions for the individual macros. Tools other than Pandoc should be able to produce output
    conforming to the format.
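
The second point, a machine-readable subset of TeX that converts back to a tree, can be illustrated with a small Python round trip. This is a sketch under stated assumptions rather than a proposed specification: the subset here is just plain text plus `\name{arg}{arg}…` with nested arguments, plain TeX category codes are assumed, comments and escapes such as `\%` are rejected, and the macro names in the example are hypothetical.

```python
import re

COMMAND = re.compile(r"\\([A-Za-z]+)")


def parse(src, pos=0, depth=0):
    """Parse src[pos:] into a list of nodes and return (nodes, next_pos).

    A node is either a str (plain text) or a tuple (name, [arg, ...]),
    where each arg is itself a list of nodes.
    """
    nodes = []
    while pos < len(src):
        ch = src[pos]
        if ch == "}":
            if depth == 0:
                raise ValueError(f"unmatched '}}' at offset {pos}")
            return nodes, pos  # the caller consumes the closing brace
        if ch == "{":
            raise ValueError(f"bare group at offset {pos} is outside the subset")
        if ch == "\\":
            m = COMMAND.match(src, pos)
            if not m:
                raise ValueError(f"unsupported control sequence at offset {pos}")
            pos = m.end()
            args = []
            while pos < len(src) and src[pos] == "{":
                arg, pos = parse(src, pos + 1, depth + 1)
                pos += 1  # skip the closing brace found by the recursive call
                args.append(arg)
            nodes.append((m.group(1), args))
        else:
            end = pos
            while end < len(src) and src[end] not in "\\{}":
                end += 1
            nodes.append(src[pos:end])
            pos = end
    if depth:
        raise ValueError("unterminated argument")
    return nodes, pos


def serialize(nodes):
    """Inverse of parse for trees in which every command has braced arguments."""
    out = []
    for node in nodes:
        if isinstance(node, str):
            out.append(node)
        else:
            name, args = node
            out.append("\\" + name + "".join("{" + serialize(a) + "}" for a in args))
    return "".join(out)


src = r"\pandocHeader{1}{Hello \pandocEmph{world}}"
tree, end = parse(src)
print(tree)
print(serialize(tree) == src)
```

A command with no arguments followed directly by letters would not survive the round trip in this sketch, which is one reason a real format would need to fix such conventions explicitly.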

@brainchild0

brainchild0 commented Apr 10, 2020

I would not wish to discourage anyone from attempting this work who might be enthusiastic about it, but for those attempting optimally to delegate development resources, I share some thoughts on the subject.

  • It's true that plain TeX distributions are smaller than LaTeX, but the benefit might be doubtful on laptops, desktops, and servers, as they run software no more complex than it was 25 years ago.

  • I have difficulty understanding how plain TeX is more suitable for the semantic document model. LaTeX presents itself as a semantic language, and furthermore boasts that its distinction over its predecessor is the handling of very technical typographic considerations through the document classes that are provided to the end user, removing the need for such functionality to be included in the document itself and for a user who understands how to create it.

  • For cases in which the desired result is merely completely plain, printable output, without valuation of the heavy layout concerns championed by a full typographic engine, then a plain TeX writer might have value, but even so, a better choice might be ConTeXt, which seeks to remedy some of the distractions of TeX without further adding the complexity of LaTeX.

@jgm
Owner Author

jgm commented Apr 10, 2020

The original intent of this issue was not to reduce dependencies but to enhance customizability. The emitted TeX would match as closely as possible pandoc's own document model, and all formatting would be done by macro definitions.

@brainchild0

brainchild0 commented Apr 10, 2020

> The original intent of this issue was not to reduce dependencies but to enhance customizability.

I see. The intention of #5879 and #5880 was to make the LaTeX output customizable. This approach preserves the benefits of LaTeX's classes and packages without loss of customization options.

@jgm
Owner Author

jgm commented Apr 10, 2020

If we generate tex that matches the pandoc AST, one could always use LaTeX to define the macros and process the result with pdflatex. In a sense it would be generic tex -- you supply the macro definitions, which could be in plain tex or latex.

@brainchild0

Would you not lose then the layout features and the macros provided by a document class?

@jgm
Owner Author

jgm commented Apr 11, 2020

> Would you not lose then the layout features and the macros provided by a document class?

I don't see why. The document class is specified in material that goes in the template; it's not generated by the latex writer currently. You could still use a template of your choice. The template would have to provide macro defs for all the pandoc commands.
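
As a rough illustration of this point, the Python sketch below wraps one and the same generic body in two different templates, one defining the macros in plain TeX and one in LaTeX on top of a document class. Everything named here is an assumption for the example: the macro names `\pandocPara`, `\pandocEmph` and `\pandocStrong` are hypothetical, and `fill` is a stand-in for pandoc's template engine that only substitutes the `$body$` variable.

```python
# Hypothetical macro names; no output vocabulary has been agreed on in this thread.
BODY = r"\pandocPara{Generic \pandocEmph{TeX}, \pandocStrong{two} back ends.}"

# Plain TeX template: the definitions use only plain TeX primitives and macros.
PLAIN_TEMPLATE = "\n".join([
    r"\def\pandocEmph#1{{\it #1\/}}",
    r"\def\pandocStrong#1{{\bf #1}}",
    r"\long\def\pandocPara#1{#1\par}",
    "$body$",
    r"\bye",
])

# LaTeX template: the document class is chosen here, not in the generated body.
LATEX_TEMPLATE = "\n".join([
    r"\documentclass{article}",
    r"\newcommand{\pandocEmph}[1]{\emph{#1}}",
    r"\newcommand{\pandocStrong}[1]{\textbf{#1}}",
    r"\newcommand{\pandocPara}[1]{#1\par}",
    r"\begin{document}",
    "$body$",
    r"\end{document}",
])


def fill(template, body):
    """Stand-in for pandoc's template engine: substitute the $body$ variable."""
    return template.replace("$body$", body)


plain_doc = fill(PLAIN_TEMPLATE, BODY)
latex_doc = fill(LATEX_TEMPLATE, BODY)
print(plain_doc)
print(latex_doc)
```

The body is byte-for-byte identical in both outputs; only the template decides whether the result is meant for `tex` or `pdflatex`, and which class and packages are in play.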

@mb21 changed the title from "Plain tex writer" to "Plain TeX writer" on Jun 7, 2020

@Witiko

Witiko commented Aug 8, 2021

@jgm As discussed in my TUG 2021 talk yesterday, there is an effort by @drehak (see drehak/lunamark) underway to produce a writer that would convert Pandoc's AST to the TeX AST input (see the spec) of the witiko/markdown package. We would distribute the writer with witiko/markdown, as discussed in #4341, and then use the writer from the \pandocInput TeX command to typeset any document format that Pandoc can read and keep full control over the formatting. However, there are two caveats to using Lua:

  1. The Lua writer would be located in the TeX directory structure, where it's difficult to find by Pandoc. We can get around this by spawning the Lua writer in the current working directory when needed. This is feasible but convoluted.

  2. For plumbing, it would be useful to have a TeX AST reader as well. However, there is no concept of a Lua reader in Pandoc. If we'd like to use Pandoc's Lua interpreter, then we'd likely have to abuse RawBlock to perform a no-op conversion from the TeX AST to Pandoc's AST and perform the parsing in a Lua filter. This is feasible but convoluted.

This leads me to the conclusion that the best way forward in the long run would be to add a Haskell reader and writer for the TeX AST format of witiko/markdown to Pandoc. Would you merge such a contribution?

@jgm
Owner Author

jgm commented Aug 8, 2021

Where is the TeX AST format of witiko/markdown documented, exactly? I'd like to take a look.

@Witiko

Witiko commented Aug 8, 2021

There is a specification in the Token Renderers section of the user manual (HTML) and the technical documentation (PDF).

@jgm
Owner Author

jgm commented Aug 8, 2021

Not crazy about the markdownRendererImage style names.
After all, pandoc isn't limited to converting from markdown. These are generic elements that are supported in many formats.
I'd be more likely to go for something generic like Image.

@Witiko

Witiko commented Aug 8, 2021

> Not crazy about the markdownRendererImage style names.
> After all, pandoc isn't limited to converting from markdown. These are generic elements that are supported in many formats.

The \markdownRenderer… prefix determines the provenance (the Markdown TeX package) rather than the language.

> I'd be more likely to go for something generic like Image.

Having shorter macros such as \Image will interfere with commands defined by TeX formats, packages, and users. Therefore, some namespacing will be required to be good neighbors with the preexisting TeX ecosystem.

The namespacing does not need to be the \markdownRenderer… prefix: We can set up arbitrary TeX commands with e.g. the \pandoc… prefix and with ~1:1 correspondence to Pandoc's AST. I can then independently map them M:N to my \markdownRenderer… commands in witiko/markdown.

@silby

silby commented Aug 9, 2021

> What I have in mind is to have pandoc emit macros that are fairly closely customized to pandoc's own structural elements, and include definitions of these macros in the preamble of the default template. Then users could modify these macros to get the appearance they want.

I have always assumed this would mean emitting a bunch of macros named like \pandocFoo or even \pdcFoo if you want it to be a little shorter. Doesn’t seem like it would be too annoying to deal with the namespace, especially if you assume part of the point of emitting plain TeX macros from Pandoc is that you’re not going to be hand-editing the TeX all that much.

@brainchild0

brainchild0 commented Sep 7, 2021

I would be curious to understand how fully these proposed efforts may offer a foundation for expanding the LaTeX writer toward greater support for document abstractions (discussed earlier). Certainly, it would be valuable that any improvements would open opportunities of such kind.

@Witiko

Witiko commented Jan 18, 2022

@jgm To give you an update, @drehak and I have since written a white paper (in Slovak, here is a machine translation to English) that discusses how the elements of Pandoc's AST can be mapped to the elements of the Markdown package for TeX. We have also produced a proof of concept that uses a Pandoc Lua Writer to convert any document understood by Pandoc to generic TeX, which can then be typeset using the Markdown package for TeX.

We plan to fully implement the Lua writer and the accompanying package for TeX and describe them both in detail in a TUGboat article that would appear in March. We will share the preprint with you when ready. Since the Markdown package supports plain TeX, LaTeX, and ConTeXt, Pandoc could then reduce some of its maintenance costs and receive support for plain TeX by replacing its writers for ConTeXt and LaTeX with a single writer that would produce generic TeX. This would be to our mutual benefit, because we could in turn stop shipping and maintaining our Lua writer for generic TeX.

@jgm
Owner Author

jgm commented Jan 18, 2022

@Witiko excellent, I look forward to hearing more about this in a couple months!

@Witiko

Witiko commented Mar 31, 2022

@jgm @drehak In Section 2.3 of our TUGboat 43:1 article preprint, we give an example of how our proof of concept can be used to directly typeset and style any document format understood by Pandoc in TeX:

2.3 Integration with Pandoc

Pandoc is a tool for converting between dozens of document formats. In our proof of concept, we integrate Pandoc with the Markdown package, so that we can typeset and style any document format understood by Pandoc directly from TeX.

To give an example, we have prepared a manual page wolf.1 in the roff language:

.TH WOLF "1" "2022-04-01" "wolf 1.0.0" "User Commands"
.SH NAME
wolf \- tool for befriending and scaring grandmas
.SH SYNOPSIS
.B wolf
[\fB-b\fR|\fB--befriend\fR]
[\fB-s\fR|\fB--scare\fR]
<\fIgrandma\fR>

Here is how we would typeset our manual page:

\documentclass{article}
\usepackage{pandoc-to-markdown, emoji}
\markdownSetup{
    renderers = {
        headingOne = {%
            \section*{\emoji{wolf} #1}%
        },
    },
}
\begin{document}
\pandocInput[format=man]{wolf.1}
\end{document}

Output:

🐺 NAME

wolf - tool for befriending and scaring grandmas

🐺 SYNOPSIS

wolf [-b|--befriend] [-s|--scare] <grandma>

Our proof of concept consists of a Lua writer that produces TeX commands corresponding to the abstract syntax tree of Pandoc and a TeX package that maps these commands to the renderers of the Markdown package. A rewrite of our Lua writer in Haskell will be offered as a basis of the upcoming plain TeX writer for Pandoc.
