Explicit Figure element in Block #3177

Open
jgm opened this issue Oct 23, 2016 · 50 comments

Owner

@jgm jgm commented Oct 23, 2016

Currently we represent figures in the AST using this hack: a figure is an Image whose title attribute starts with fig: and which is by itself in a Para.

Short of a full-featured figure environment in the AST, it would make sense to move to a less hacky representation: a Div with class figure containing the image (which need not have a title starting with fig:). This would involve changes to readers and writers.

Indeed, if we did this, we could support figures containing multiple images, via explicit Divs.
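To make the current convention concrete, here is a minimal Python sketch of how a filter might recognize an implicit figure in pandoc's JSON AST. It assumes the JSON encoding in which a Para carries a list of inlines and an Image carries `[attr, alt inlines, [url, title]]`; the names `is_implicit_figure` and `image` are made up for illustration and are not part of pandoc.

```python
def is_implicit_figure(block):
    """Return True if block is a Para holding exactly one Image whose
    title starts with "fig:" (the convention described above)."""
    if block.get("t") != "Para":
        return False
    inlines = block.get("c", [])
    if len(inlines) != 1 or inlines[0].get("t") != "Image":
        return False
    _attr, _alt, target = inlines[0]["c"]
    _url, title = target
    return title.startswith("fig:")


def image(url, title):
    # Hypothetical helper: an Image inline with empty attributes and alt text.
    return {"t": "Image", "c": [["", [], []], [], [url, title]]}


figure_para = {"t": "Para", "c": [image("img.jpg", "fig:")]}
plain_para = {"t": "Para", "c": [image("img.jpg", "")]}
two_images = {"t": "Para", "c": [image("a.jpg", "fig:"), image("b.jpg", "fig:")]}
```

Under these assumptions only `figure_para` counts as a figure. A paragraph with two images does not, which is the limitation an explicit container would lift.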

Owner Author

@jgm jgm commented Oct 23, 2016

Another advantage is that attributes could be added explicitly to the Div.

<div class="figure floatRight">
![my image](img.jpg){.imageclass}
</div>

See #3094.

@hrehfeld hrehfeld commented Oct 24, 2016

Seconded, I just spent an hour figuring out why multiple images in a paragraph don't create a figure. Is there any way to create a figure with multiple images right now? (I guess creating Rawblocks will work?)

Relevant code is here? https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Writers/HTML.hs#L450 Why is the match only for one image and not multiple ones in the first place?

Owner Author

@jgm jgm commented Oct 25, 2016

If multiple images were allowed, which one would form the figure's caption? (There is only one caption.) What would determine how the images are arrayed in the figure? In retrospect, some kind of more explicit syntax for figures would have been desirable, and maybe that's the direction we should move in.

@hrehfeld hrehfeld commented Oct 25, 2016

Layout: Hm, I only use html and latex backends, but both of those handle multiple images in a figure in a reasonable way without extra specification. However, in latex IIRC it makes a difference if there is a SoftBreak between images (break vs. no break). IMHO it would be up to the writer/backend to break up figures if multiple images are not supported.

Caption: None unless explicitly stated? That's legal in both html and latex iirc. However, I believe Paras do not support extra data like attrs, so a div or figure node would be easier.

Owner Author

@jgm jgm commented Oct 25, 2016

+++ Hauke Rehfeld [Oct 24 16 16:42 ]:

Seconded, I just spent an hour figuring out why multiple images in a
paragraph don't create a figure. Is there any way to create a figure
right now (I guess creating Rawblocks will work?)

You can paste the images together into one image, I suppose,
or use a filter.

Relevant code is here?
https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Writers/HTML.hs#L450
Why is the match only for one image and not multiple ones in
the first place?

Good question. I suppose that since I'm in a field where
pictorial figures aren't used much, I didn't realize how
common figures with multiple images and a single caption
are.

But how would it work to allow a paragraph with multiple
images (and nothing else) to form a figure? From which
image would the figure's caption be taken? What determines
how the images are arranged in the figure (side by side,
in a square, etc.)?

Really we need a more explicit syntax for figures if this
kind of thing is going to be allowed.

Owner Author

@jgm jgm commented Feb 26, 2017

I think we need an explicit Block element for Figure.

@jgm jgm added the AST change label Feb 26, 2017

Collaborator

@mb21 mb21 commented Feb 26, 2017

The question is whether this figure element should only contain images, or if it should be a general floating-container more analogous to the LaTeX \begin{figure} and HTML5 figure elements (emphasis added):

Usually a <figure> is an image, illustration, diagram, code snippet, etc., that is referenced in the main flow of a document, but that can be moved to another part of the document or to an appendix without affecting the main flow.

If so, the figure element should contain a caption (multiple paragraphs allowed) and arbitrary block content:

Figure Attr [Block] [Block]

Owner Author

@jgm jgm commented Feb 26, 2017

@jgm jgm added this to the pandoc 2.0 milestone Aug 15, 2017

Owner Author

@jgm jgm commented Aug 20, 2017

Development on figures branch for pandoc, pandoc-types.

Owner Author

@jgm jgm commented Aug 20, 2017

Thinking about explicit Markdown syntaxes for figures.

If we had a native syntax for divs, we could treat any div with a following caption as a figure:

;--- {#foo .right}
![my image](img.jpg){.imageclass}
![second image](img2.jpg)
;---
^^  [This is the optional short caption.]
    This is the long caption. It can span multiple blocks. 
    Syntax like footnotes.  

    Subsequent paragraphs indented.  We automatically treat a
    div as a figure if it is followed by a caption.  Or is it
    too confusing if the caption comes outside the div?

Another possibility would be to have a special kind of marking for figures, like:

!--- {#foo .right}
![my image](img.jpg){.imageclass}
![second image](img2.jpg)

[This is the optional short caption.]
This is the long caption. It can span multiple blocks. 
Syntax like footnotes.  

In this version, all the Para elements at the end of the
structure are treated as the caption,
so we don't need an explicit syntax to mark the caption.
!---

I'm trying to avoid syntaxes that require you to use the word figure.

I like the ^^ syntax for attaching captions; if we use it for this, we might want to use it for tables (and code blocks) as well.

@jgm jgm changed the title from "Better handling of implicit figures" to "Explicit Figure element in Block" Aug 20, 2017

Owner Author

@jgm jgm commented Aug 20, 2017

TODO on figures branch on pandoc and pandoc-citeproc

  • Finish updating writers to handle Figure
  • Update readers
    • RST
    • Markdown
    • LaTeX
    • HTML
    • Org/Blocks
    • MediaWiki
  • Get everything to compile
  • Update tests
  • Update ToJSON, FromJSON instance in pandoc-types to include Figure, Caption
  • Update Arbitrary in pandoc-types (it lacks Div, Figure, Span, probably others)
  • Markdown syntax, new extension?
  • Figure numbering and internal refs?

Owner Author

@jgm jgm commented Aug 20, 2017

Having second thoughts about the type now. I suspect that allowing ANY kind of block content inside a figure is not going to work well in many output formats. E.g. in docbook, only certain elements are allowed inside a figure element. http://tdg.docbook.org/tdg/4.5/figure.html

Perhaps instead the contents should be limited to a list of images (or perhaps a list of lists of images, so they can be organized on lines? -- though it may be better to let the layout happen automatically, given width information).

Perhaps listings could go in figures as well?

Owner Author

@jgm jgm commented Aug 20, 2017

I think I'm going to remove this from the 2.0 milestone as it still needs more thought.

@jgm jgm removed this from the pandoc 2.0 milestone Aug 20, 2017

Collaborator

@mb21 mb21 commented Aug 21, 2017

Having second thoughts about the type now. I suspect that allowing ANY kind of block content inside a figure is not going to work well in many output formats.

I still think taking the most general approach in the AST makes sense. There are always going to be some formats that don't support certain things, but that should be handled by the respective writers and the AST design shouldn't be held up by those. It would be great to have a general block figure element to output to HTML/ePUB/LaTeX...

Collaborator

@mb21 mb21 commented Aug 21, 2017

Concerning the caption syntax, I kind of prefer the second one, since it is clearly placed inside the figure/div element. A third variant:

;--- {#foo .right}
![my image](img.jpg){.imageclass}
![second image](img2.jpg)
;--
This is the long caption. It can span multiple blocks. 

Syntax like footnotes.  
;--
This is the optional short caption.
Since it's optional, it needs to go at the end in this syntax.
;---

Owner Author

@jgm jgm commented Aug 22, 2017

What kinds of things do people really put in figures, besides images?

Collaborator

@mb21 mb21 commented Aug 23, 2017

Maybe the element I have in mind is more of a Float than a Figure.

Again the MDN extract posted above:

Usually a <figure> is an image, illustration, diagram, code snippet, etc., that is referenced in the main flow of a document, but that can be moved to another part of the document or to an appendix without affecting the main flow.

And from Wikibooks LaTeX/Floats:

Floats are containers for things in a document that cannot be broken over a page. LaTeX by default recognizes "table" and "figure" floats, but you can define new ones of your own (see Custom floats below). Floats are there to deal with the problem of the object that won't fit on the present page, and to help when you really don't want the object here just now.

Floats are not part of the normal stream of text, but separate entities, positioned in a part of the page to themselves (top, middle, bottom, left, right, or wherever the designer specifies). They always have a caption describing them and they are always numbered so they can be referred to from elsewhere in the text.

Usually it's tables and images that are floated, but it could also be source code, a poem, some sort of aside box etc. Even Docbook has a sidebar element.

Maybe the table AST element shouldn't have a caption, only the Figure element should have a caption. Current markdown table syntax with captions would be converted to Figure attr caption [Table a]. With the attr specifying whether the figure should float or be at that fixed position in the text, plus whether to list it in the list of figures/list of tables etc.

Summarizing, a Figure is an element that:

  • usually has a caption/title
  • can be listed in "List of figures" or similar TOC-like entity
  • can be referenced from other parts of the text (see #813), and
  • may or may not float (which is actually a layout decision).
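The summary above can be sketched as a small data structure. This is only an illustration in Python, not the pandoc-types definition: the `Figure` and `Table` classes, the `wrap_table` helper, and the `nofloat` class name are all assumptions introduced here to show how a caption-less table could be wrapped in a captioned container.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (identifier, classes, key-value pairs), mirroring pandoc's Attr triple
Attr = Tuple[str, List[str], List[Tuple[str, str]]]


@dataclass
class Table:
    rows: List[List[str]]  # stand-in for a table that carries no caption itself


@dataclass
class Figure:
    attr: Attr
    caption: List[str]     # caption paragraphs; may be empty
    content: List[object]  # arbitrary block content (images, tables, code, ...)

    def can_be_referenced(self) -> bool:
        # A cross-reference needs an identifier to point at.
        return self.attr[0] != ""

    def floats(self) -> bool:
        # Hypothetical convention: a "nofloat" class pins the figure in place.
        return "nofloat" not in self.attr[1]


def wrap_table(table: Table, caption: List[str], ident: str = "") -> Figure:
    # The caption moves from the table to the surrounding figure.
    return Figure((ident, [], []), caption, [table])


captioned = wrap_table(Table([["a", "b"]]), ["A small table"], "tbl1")
anonymous = wrap_table(Table([["c"]]), [])
```

Keeping the floating decision in the attributes, as suggested above, leaves the layout choice to each writer.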

I think it would be great to have the figure type in the AST for pandoc 2.0. Writing the code for the writers and reference generators etc. can be done later.

Owner Author

@jgm jgm commented Aug 23, 2017

Collaborator

@mb21 mb21 commented Aug 26, 2017

I'm leaning currently towards the second option (general float/caption container). Use cases include floating more than just images (e.g. float two tables that share a caption), or having one figure with a caption, that contains subfigures (or images) with each having a caption, e.g:

It's probably true that it gets a bit trickier to consider all cases in all writers, but it is a more flexible option.

Owner Author

@jgm jgm commented Sep 11, 2018

Btw, I'm not wild about autogenerating ids for these. I'd rather require people to specify ids manually. And this would be consistent with current Div syntax, which does require attributes.

@hrehfeld hrehfeld commented Sep 12, 2018

I think you have most of the points down. The only thing I can add at the moment is:

  • I'm not sure what pandoc's philosophy is here usually, but only supporting the common denominator between all formats makes pandoc a much less attractive product to convert between formats that you specifically choose for their feature set. If I'm choosing latex and html as the formats, I would definitely expect pandoc to be able to convert figure elements between the two.
  • I'm frequently using pandoc's AST. An implicit figure is much harder to work with than an explicit one. It's the difference between node.classname is Figure and a chain of checks: node has child, child has caption, caption begins with "figure:", etc.

@hrehfeld hrehfeld commented Sep 12, 2018

One more thought: maybe it would be nice to have some formal way of generating documentation on what actually gets lost in conversion between formats. Not sure if this is possible with code reflection and without a lot of manual work, but if you could run e.g. pandoc conversion-loss -t latex -o odt and it displayed a list of what is lost in the conversion from latex to odt, it would make things much more transparent.

Collaborator

@mb21 mb21 commented Sep 12, 2018

It's fairly rare to have a captioned equation. But numbered equations are common.

Agreed, but if you google "equation caption", you'll see a lot of people asking for it, including a SO question with 40 upvotes (and google image search for some examples). Admittedly, if you google something you'll inevitably find it, so it may still be a fringe use case, but it does exist.

I'm not wild about autogenerating ids for these. I'd rather require people to specify ids manually.

I agree that's usually for the best, but eventually people will want to generate a toc-like list of figures in HTML, with links to the figures. So eventually someone will probably write a filter to autogenerate the ids of those figures that aren't referenced in the text. Just something to keep in mind...
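As a rough idea of what such a filter might do, here is a Python sketch that fills in identifiers only where the author left them empty. The function name `assign_missing_ids` and the `figure-N` naming scheme are assumptions made for this example; nothing like this exists in pandoc itself.

```python
def assign_missing_ids(figure_attrs, prefix="figure-"):
    """Give an identifier to every figure attr that lacks one.

    figure_attrs is a list of [id, classes, keyvals] lists, one per figure,
    in document order. Identifiers written by hand are left untouched.
    """
    taken = {attr[0] for attr in figure_attrs if attr[0]}
    counter = 0
    for attr in figure_attrs:
        if attr[0]:
            continue
        counter += 1
        # Skip over any generated name the author already used by hand.
        while f"{prefix}{counter}" in taken:
            counter += 1
        attr[0] = f"{prefix}{counter}"
        taken.add(attr[0])
    return figure_attrs


attrs = [["", [], []], ["figure-1", ["right"], []], ["", [], []]]
assign_missing_ids(attrs)
ids = [attr[0] for attr in attrs]
```

Generated identifiers depend on document order, so they shift when figures are added or moved. That is one argument for still writing by hand any id that the text refers to.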

if people use fifteen different labels, we'd need fifteen numbering sequences, and we'd need to override a lot of the automatic LaTeX figure machinery, leading to unidiomatic LaTeX.

My takeaway from the discussion in #813 was, that generating unidiomatic LaTeX is unfortunately not a good choice, because people need to submit it to journals etc. So I think we're stuck with having to do both: generate idiomatic LaTeX, and reimplement an equivalent logic for HTML etc. However, the fig: prefix in LaTeX is just a convention as well, so maybe the people that need idiomatic LaTeX should just stick to those conventions, even in their markdown.


Anyway, the big question: Should we have figures that act as generic floats for all sorts of content, or should we restrict readers to fall back to divs with class figure, or even strip figures that don't contain 'canonical' content? While I like the idea of keeping things as simple as possible (e.g. in terms of permutations of possible inputs to consider), restricting the markdown input doesn't really solve the problem, since we still have to consider other inputs. For example, what should this output?

pandoc -f html -t markdown
^D
<figure>
  <figcaption>This is a figure of figures.</figcaption>

  <figure>
    <p><span class="math display">\[a^2 + b^2 = c^2\]</span></p>
    <figcaption>Pythagoras' theorem</figcaption>
  </figure>

  <figure>
    <pre><code>
    def hypotenuse_length(a, b):
      return math.sqrt(a*a + b*b)
    </code></pre>
    <figcaption>Python implementation</figcaption>
  </figure>
</figure>

Finally, about the AST question, I guess you're best qualified to decide, since you already started implementing this on the figures branch. I would imagine that it's easier to pattern-match on Figure attr capt blk instead of Div attr ((Para (Image i)):capt). But yes, it would break backwards-compatibility. But maybe we could work towards pandoc 2.5, along with PageBreak and the new Table elements? I'm happy to contribute code once the design is decided.

Collaborator

@mb21 mb21 commented Sep 12, 2018

@hrehfeld Thanks for your input!

Owner Author

@jgm jgm commented Sep 12, 2018

Collaborator

@mb21 mb21 commented Sep 14, 2018

Yet another syntax variant:

::: {}
![](gull.jpg)

> A gull
:::
  • Using the contents of the last blockquote in a div as the caption.
  • Can include arbitrary block content in both content and caption.
  • Obvious syntax and fallback for other markdown parsers, even though a blockquote is of course not semantically comparable to a caption.

For the short caption, the quote should end in a span (or maybe don't require the span-attribute and maybe require it to be in its own paragraph?):

::: {}
![](gull.jpg)

> A gull with a very long caption
>
> [A gull]
:::

Contributor

@lierdakil lierdakil commented Sep 14, 2018

A div would be treated as a figure if it starts with a paragraph containing one or more images (and nothing else).

For my 2 cents, I'm not at all a fan of this idea, tbh. In my experience, explicit is almost always better than implicit, and it raises a question of "how do I make a proper div with an image that's not interpreted as a figure". Some sort of extension (kinda like implicit_figures) might work, e.g. implicit_figure_divs, so that "figure divs" would require an explicit figure class without it enabled, but that sounds very similar to what we're trying to avoid here, no? My vote here is for non-ambiguous syntax.

As for subfigures, I kinda like my approach taken in pandoc-crossref. Subfigures are a list of Para of Image (with optional SoftBreak or Space in between). So, f.ex.,

:::{#fig:subfigure}
![a](image-in-row-1-1.png)
![b](image-in-row-1-2.png)
![c](image-in-row-1-3.png)

![d](image-in-row-2-1.png)
![e](image-in-row-2-2.png)
![f](image-in-row-2-3.png)

Figure caption
:::

will be rendered in 3x2 grid kinda like this:

a b c
d e f

Figure caption

Line breaks and spaces are optional, this would be equivalent:

:::{#fig:subfigure}
![a](image-in-row-1-1.png) ![b](image-in-row-1-2.png)![c](image-in-row-1-3.png)

![d](image-in-row-2-1.png)![e](image-in-row-2-2.png)
![f](image-in-row-2-3.png)

Figure caption
:::
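The row logic described above can be sketched in a few lines of Python over pandoc-style JSON blocks. This is not pandoc-crossref's actual implementation. The function name `subfigure_grid`, the helper `img`, and the shortened file names are assumptions for the example, and it simply treats every image-only Para as a row and everything else as caption.

```python
def subfigure_grid(blocks):
    """Split the blocks of a figure div into image rows and caption blocks.

    A Para made up solely of Image inlines (optionally separated by Space
    or SoftBreak) becomes one row; any other block is part of the caption.
    """
    rows, caption = [], []
    for block in blocks:
        inlines = block["c"] if block["t"] == "Para" else []
        images = [i for i in inlines if i["t"] == "Image"]
        only_images = bool(images) and all(
            i["t"] in ("Image", "Space", "SoftBreak") for i in inlines
        )
        if only_images:
            # Image content is [attr, alt inlines, [url, title]]; keep the url.
            rows.append([image["c"][2][0] for image in images])
        else:
            caption.append(block)
    return rows, caption


def img(url):
    # Hypothetical helper: an Image inline with empty attributes and alt text.
    return {"t": "Image", "c": [["", [], []], [], [url, ""]]}


soft = {"t": "SoftBreak"}
blocks = [
    {"t": "Para", "c": [img("1-1.png"), soft, img("1-2.png"), soft, img("1-3.png")]},
    {"t": "Para", "c": [img("2-1.png"), img("2-2.png"), soft, img("2-3.png")]},
    {"t": "Para", "c": [{"t": "Str", "c": "Figure"}, {"t": "Space"},
                        {"t": "Str", "c": "caption"}]},
]
rows, caption = subfigure_grid(blocks)
```

Because breaks and spaces are ignored, both spellings of the example give the same two rows of three images.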

Note, however, that this syntax kinda only makes sense if figures are explicit in Markdown syntax, with none of that "if a Div contains X, then it's a figure" business. Otherwise, too much overlap for my taste.

If you go implicit figure divs route, I'd like to at least have an explicit marker for figure captions, that is, I'm not really a fan of using the first/last Para that's not an Image. Perhaps taking a page from current table caption syntax might work, requiring captions to have <word>: in the first paragraph (where <word> is any sequence of characters, including no characters, without spaces/breaks)? I.e.

:::
![](some-image.png)

: Caption
:::

Otherwise, it'd be very tricky to do something like

:::{.warning}
![](screenshot-snippet.png)

If you see something like this on your screen, your computer is about to explode, RUN!
:::

Owner Author

@jgm jgm commented Sep 14, 2018

For my 2 cents, I'm not at all a fan of this idea, tbh. In my experience, explicit is almost always better than implicit, and it raises a question of "how do I make a proper div with an image that's not interpreted as a figure". Some sort of extension (kinda like implicit_figures) might work, e.g. implicit_figure_divs, so that "figure divs" would require an explicit figure class without it enabled, but that sounds very similar to what we're trying to avoid here, no? My vote here is for non-ambiguous syntax.

In pandoc we've tried to avoid using English language words to mark things, which tells against using an explicit figure class. (A document in Chinese should not be strewn with Latin characters.) My guess was that divs that start with a paragraph containing just images but aren't meant to be figures are rare enough that the implicit approach would work (together with a way to defeat it in rare cases, e.g. a nofigure class). This would be much like the current implicit_figures extension, which hasn't been a big problem from that point of view (paragraphs just containing an image being pretty rare).

:::{.warning}
![](screenshot-snippet.png)

If you see something like this on your screen, your computer is about to explode, RUN!
:::

I'll concede that this is a good counterexample to my claim!

As for subfigures, I kinda like my approach taken in pandoc-crossref. Subfigures are a list of Para of Image (with optional SoftBreak or Space in between). So, f.ex.,

:::{#fig:subfigure}
![a](image-in-row-1-1.png)
![b](image-in-row-1-2.png)
![c](image-in-row-1-3.png)

![d](image-in-row-2-1.png)
![e](image-in-row-2-2.png)
![f](image-in-row-2-3.png)

Figure caption
:::

will be rendered in 3x2 grid kinda like this:

I like the grid idea and propose we adopt it in whatever we end up with. (Do you actually use a tabular to render this, or just let the images adjoin each other on the line?)

I'm not really a fan of using the first/last Para that's not an Image.

That's not exactly the proposal. Everything after the images part would be the caption. It can contain arbitrary block-level content, on this proposal. That may actually be a bit too liberal. For example, do we want to allow other captioned elements, like tables, in a caption? Probably not. But I can imagine lots of people who will want to have (non-captioned) tabular content inside a caption. One of the motivations for this issue is expanding what can go in a figure caption.

As I see it, our options are:

  1. Go with the full implicit div approach, with some way of manually disabling it (nofigure class).

  2. Require a more explicit marking of the div, using some English (or other) language terms. This could be fairly unobtrusive, as with your #fig:, but this would still jump out in a Chinese text.

  3. Try to be cleverer about the implicit div approach, putting more restrictions on what the container looks like. For example, as @mb21 suggests, we could require that it start with images and end with a blockquote, which would be the caption. Or as I mentioned earlier, we could require an hrule to separate the image part and the caption.
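To see how option 1 behaves on the warning example above, here is a small Python sketch over pandoc-style JSON. It assumes a Div is encoded as `[attr, blocks]` with `attr = [id, classes, keyvals]`. The function names are invented for the example, and `nofigure` is the opt-out class floated in this thread, not an existing pandoc feature.

```python
def starts_with_images(blocks):
    """True if the first block is a Para containing images and nothing else."""
    if not blocks or blocks[0]["t"] != "Para":
        return False
    inlines = blocks[0]["c"]
    has_image = any(i["t"] == "Image" for i in inlines)
    return has_image and all(
        i["t"] in ("Image", "Space", "SoftBreak") for i in inlines
    )


def is_figure_div(div):
    """Option 1: implicit figure divs, with a manual opt-out class."""
    (_ident, classes, _keyvals), blocks = div["c"]
    if "nofigure" in classes:
        return False
    return starts_with_images(blocks)


def warning_div(classes):
    # The ".warning" counterexample: a screenshot followed by plain text.
    screenshot = {"t": "Image",
                  "c": [["", [], []], [], ["screenshot-snippet.png", ""]]}
    return {"t": "Div", "c": [["", classes, []], [
        {"t": "Para", "c": [screenshot]},
        {"t": "Para", "c": [{"t": "Str", "c": "RUN!"}]},
    ]]}


misread = is_figure_div(warning_div(["warning"]))
opted_out = is_figure_div(warning_div(["warning", "nofigure"]))
```

Without the opt-out the warning box is read as a figure and its text as a caption, which is the ambiguity options 2 and 3 try to remove.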

Collaborator

@mb21 mb21 commented Sep 14, 2018

If we use the dash in # my unnumbered title {-} as a precedent for "here comes an English word that shall not be spoken", we could do:

::: -
something
:::

Now, the question is: should - stand for .nofigure or .figure :S

Contributor

@lierdakil lierdakil commented Sep 14, 2018

Do you actually use a tabular to render this, or just let the images adjoin each other on the line?

Both, actually. There are switches (metadata variables) that control whether it's rendered as a table, or just adjoined images with line breaks. And it's a whole other story with LaTeX (long story short: subfloat from subfig package, and a mix of RawBlocks and RawInlines; it might be a good idea to use subcaption package instead)

It [caption] can contain arbitrary block-level content, on this proposal.

This... doesn't really make much sense to me, tbh. I struggle coming up with a reasonable example of a figure/float caption that would even consist of more than one paragraph, let alone contain tables or other block-level elements. But okay, maybe I'm just hung up on LaTeX limitations.

As I see it, our options are ...

Out of these three, I'm kinda leaning towards the last option, "blockquoted" caption in particular, since it looks the least ambiguous. That said, actual use might be somewhat cumbersome -- there would be lots of extraneous > which don't actually have any semantic meaning and are just there as a quirk of the syntax (which sounds rather bad if I put it this way I guess). It wouldn't be that bad for reasonably short captions though (1-2 paragraphs), and as I said, I can't imagine a reasonable use-case for something much longer than that.

If we use the dash in # my unnumbered title {-} as a precedent for "here comes an English word that shall not be spoken", we could do

Only I believe that should be

::: {-}
something
:::

for the sake of consistency?

Now, the question is: should - stand for .nofigure or .figure :S

My intuition would suggest that {-} disables something, so I would expect - standing for .nofigure. Maybe that's just me though.

Owner Author

@jgm jgm commented Sep 14, 2018

It [caption] can contain arbitrary block-level content, on this proposal.
This... doesn't really make much sense to me, tbh. I struggle coming up with a reasonable example of a figure/float caption that would even consist of more than one paragraph, let alone contain tables or other block-level elements. But okay, maybe I'm just hung up on LaTeX limitations.

See #4229 for one request for multiparagraph captions. As far as I can see, there's no LaTeX limitation preventing this (except that if you have multiple paragraphs, you must specify the optional short caption argument to \caption). Certainly it might be sensible to have a list in a caption, or a block quote. See also #1024 for a proposal for block-level content in table captions. If this posed a problem in multiple output formats, that could be a reason for disallowing it. But it seems possible in LaTeX, HTML, ...

It's definitely worth making sure there's a strong reason for multiparagraph captions before we take that step.

there would be lots of extraneous > which don't actually have any semantic meaning and are just there as a quirk of the syntax (which sounds rather bad if I put it this way I guess).

Yes, I have the same reservation. The hrule proposal might be nicer, actually:

::: {#myfig}
![the image](img.jpg)

____
caption goes here
:::
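A reader implementing the hrule idea would mostly need to split the div's blocks at the first rule. The Python sketch below assumes pandoc's JSON encoding, where a horizontal rule is a block with tag `HorizontalRule`. The function name is made up, and returning an empty caption when no rule is present is a choice made for this example rather than part of the proposal.

```python
def split_at_rule(blocks):
    """Split a div's blocks into (content, caption) at the first hrule."""
    for index, block in enumerate(blocks):
        if block["t"] == "HorizontalRule":
            return blocks[:index], blocks[index + 1:]
    # No rule found: everything is content and there is no caption.
    return blocks, []


image_para = {"t": "Para", "c": [
    {"t": "Image", "c": [["", [], []], [{"t": "Str", "c": "the image"}],
                         ["img.jpg", ""]]},
]}
caption_para = {"t": "Para", "c": [{"t": "Str", "c": "caption"}]}

content, caption = split_at_rule(
    [image_para, {"t": "HorizontalRule"}, caption_para]
)
```

Everything after the rule lands in the caption, so block-level captions (lists, block quotes, several paragraphs) need no extra syntax.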

I'm not wild about the {-} idea. {-} is already used for unnumbered headers. And it's pretty cryptic what it's supposed to mean.

Collaborator

@mb21 mb21 commented Sep 15, 2018

[block captions are] possible in LaTeX, HTML, ...

That's enough of a reason to make the AST so, I think.

About the hrule vs >: I really think both are fine. I agree that semantically, the hrule makes more sense. But for the most common use-case of one-paragraph captions, the > is less to type and takes up one line less.

But if we keep the current implicit_figures extension around for the most common use-case, then we can err on the side of making things more explicit for more complex figures: then I would be totally fine with requiring the attribute with id, and using one more line for the hrule.

Contributor

@lierdakil lierdakil commented Sep 15, 2018

But if we keep the current implicit_figures extension

Please do, because backwards compatibility. Rewriting all those documents using a new figure syntax would be a huge pain.


A side question: since as far as I understand the intention is to introduce a new AST element for figures specifically, why are we hung up on re-using existing syntax elements to define a new one? I mean, can't we invent a syntax for figures (or, more generally, "floats" in LaTeX terms) specifically?

Collaborator

@mb21 mb21 commented Sep 15, 2018

can't we invent a syntax for figures (or, more generally, "floats" in LaTeX terms) specifically?

We certainly could, but considering how long it took to agree on a native div syntax, the reasoning was to use something that resembles that, so that people don't have to learn yet another completely new syntax.

Owner Author

@jgm jgm commented Sep 15, 2018

Contributor

@lierdakil lierdakil commented Sep 16, 2018

It's not decided to introduce a new AST element for figures.

Okay then, the AST change label on this issue confused me.

The syntax issue is orthogonal to this.

Well, certainly. Obviously, any syntax could be within reason parsed as any AST element, that's pretty much the meaning of A in AST. There's a catch to that though, in my experience, it's good to have at least some correspondence between AST and the syntax, to keep parser complexity reasonably low and yourself reasonably sane ^_^

While pondering this, I came up with a bit of a middle ground between reusing div syntax and coming up with a new one, that might be agreeable. So here's a quick proposal:

We'd, generally, like to avoid ambiguity between figure divs and regular divs. Besides heuristics, classes seem like an obvious (and explicit) choice, but using English is discouraged for i18n reasons. But we don't really have to use alphanumerics for classes now, do we? So how about "figure divs" requiring ! class? (! because Markdown image syntax uses ! to differentiate from links -- seems like an obvious choice).

Case 1:
:::{#regulardiv}
![](picture.png)
---
This is a regular div, despite starting with an image and containing <hr>
:::

Case 2:
:::!
![](picture.png)
---
This is a figure div without an id (or possibly with automatic id?)
:::

Case 3:
:::{.! #figureId}
![](picture.png)
---
This is a figure div with an id
:::

Case 4:
:::! {#figureId}
![](picture.png)
---
This syntax doesn't really work as of Pandoc 2.2.1, but seems
like an obvious extension of case 2
:::

IIRC, CSS doesn't really understand non-alphanumeric classes anyway, so ! class by itself wouldn't be too meaningful in most output formats, consequently it doesn't seem to shadow anything in terms of functionality.

This syntax is also agnostic wrt AST representation -- if down the line AST is changed to include explicit "float" elements, it's distinct enough as to not be considered "stealing" syntax from divs.

Owner Author

@jgm jgm commented Sep 16, 2018

Contributor

@lierdakil lierdakil commented Sep 17, 2018

Actually, Pandoc allows ! class. Well, kind of:

$ pandoc --version
pandoc 2.2.1
Compiled with pandoc-types 1.17.5.1, texmath 0.11.0.1, skylighting 0.7.2
$ echo -e ':::!\ntest\n:::\n' | pandoc -t native
[Div ("",["!"],[])
 [Para [Str "test"]]]

It doesn't parse in the attribute list though (that is, inside {})
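For filter authors, the native output above corresponds to a Div whose class list contains the single string "!". A minimal Python check over the JSON form might look like the sketch below; the function name and the `marker` parameter are assumptions for this example.

```python
def has_figure_marker(div, marker="!"):
    """True if the Div carries the proposed "!" class."""
    _ident, classes, _keyvals = div["c"][0]
    return marker in classes


# JSON counterpart of: Div ("",["!"],[]) [Para [Str "test"]]
marked = {"t": "Div", "c": [["", ["!"], []],
                            [{"t": "Para", "c": [{"t": "Str", "c": "test"}]}]]}
regular = {"t": "Div", "c": [["regulardiv", [], []], []]}
```

Since the marker is an ordinary class in the AST, a filter could already experiment with this convention without any parser change.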
