New Feature: internal links to tables and figures and headers #813

Open
GeraldLoeffler opened this issue Apr 4, 2013 · 149 comments · May be fixed by jgm/pandoc-types#81

Comments


GeraldLoeffler commented Apr 4, 2013

It's currently possible to include internal links to sections. I'd like to propose a similar feature for links to figures/images and tables.

It may make sense to provide this feature only if the figure/image or table that is being linked to has a caption. In that case Pandoc can today automatically generate a number for the figure or table and include it in the caption, e.g. "Figure 15".

At the most basic, the text of the link would be provided by the user, as is currently the case for links to sections.

Of course it would be very convenient if the automatically generated number for the figure or table would also be used for the text of the link, e.g. "as can be seen in Figure 15, blah", where "Figure 15" would be the internal link whose text is auto-generated from the figure it points to.


kovla commented May 3, 2013

That would be lovely indeed. In academic writing it is quite often necessary, and while automatic numbering of figures and tables is nice, it really should be linked to what is in the text.

Contributor

nichtich commented May 21, 2013

One could use the figure caption as link target, similar to links to captions:

![la lune](lalune.jpg "Voyage to the moon")

...is shown in figure [la lune]...

And/or without automatic generation of link text:

...is shown in [the figure](#la-lune)...

See also issue #615 on automatic numbering of figures and tables in HTML output.


liob commented Jul 23, 2013

I concur. However, @nichtich's suggestion breaks the current syntax. Maybe a less intrusive approach would be a syntax like:

![Voyage to the moon](lalune.jpg){la lune}

It would be great to be able to reference figures. As @nichtich said: it is nearly a requirement in academic writing.

Owner

jgm commented Jul 23, 2013

A more consistent format would be

![Voyage to the moon](lalune.jpg){#lalune}

See the current attribute format for headers.


liob commented Jul 27, 2013

Indeed, that is a more consistent format.

About the implementation:
I see 2 major ways to implement this feature:

  1. Emulate something like the LaTeX figure environment and output the figure as an image with plain text underneath. Very much like figures are handled now in docx format, except that you put "Figure 1:" at the beginning. This would be the most portable way and should be fairly easy to implement in all format writers. However, pandoc then has to keep track of the references itself for cross-referencing.
  2. Implement it the "proper" way in the corresponding format writer. Sticking with the docx example: Adding a caption to the image and then cross reference it in the text.

Can anybody (@jgm ?) make an educated guess on how much work either of the solutions will be?


aaren commented Sep 10, 2013

I agree - this is essential for academic writing. I wish I knew Haskell!

The current way around this, in the mailing list discussion, is functional but clumsy.

Would this mean using \autoref in the latex? Then from markdown input:

...is shown in [the figure](#la-lune)...

you would get the latex output:

...is shown in \autoref{la-lune}...
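
A minimal sketch of that mapping, assuming the link text is simply discarded and only the anchor is kept. The function name `links_to_autoref` and the regular expression are illustrative assumptions, not part of pandoc; a real implementation would work on the parsed document rather than on raw Markdown text.

```python
import re

# Matches an inline Markdown link to an internal anchor: [text](#id).
# The lookbehind skips image syntax such as ![alt](#id).
INTERNAL_LINK = re.compile(r"(?<!!)\[([^\]]*)\]\(#([^)\s]+)\)")


def links_to_autoref(markdown: str) -> str:
    """Replace each internal link with \\autoref{id}, dropping the link text."""
    return INTERNAL_LINK.sub(lambda m: "\\autoref{" + m.group(2) + "}", markdown)


print(links_to_autoref("...is shown in [the figure](#la-lune)..."))
```

Links to external URLs are left untouched, since only targets starting with `#` match.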


AvverbioPronome commented Sep 19, 2013

![Voyage to the moon](lalune.jpg){#lalune}

I just tried to write something like

Some text

![Bla blah](pic.png)   {#something}

Some other text

I was surprised that did not work. It showed the image without caption, and a raw "{#something}" afterwards.

I assumed curly braces were for assigning attributes to anything... :D


CFCF commented Nov 16, 2013

A workaround with numbered example lists is added to #904

For my purposes, this method works well with docx.


Utsira commented Mar 29, 2014

I agree that being able to reference figures is essential to academic writing. The workarounds linked to above aren't really satisfactory, in my opinion.

![Voyage to the moon](lalune.jpg){#lalune} would be perfect


srhb commented Apr 24, 2014

Similar syntaxes would be very interesting for equations, too. In fact, why not adopt a completely general syntax? It would be especially nice if it could carry over to LaTeX bits, once you have to bail out and use say \begin{align} and friends.

Contributor

frederik-elwert commented Apr 25, 2014

I have sympathy for the numbered example list approach, mainly for two reasons: Firstly, what we want are not really links but references, and secondly, the use case for numbered example lists is already close to, e.g., numbered equations. The example from the docs is close to a typical use case for figure references:

(@good)  This is a good example.

As (@good) illustrates, ...

This mechanism can already be used for figure references, as CFCF pointed out:

![Figure (@primitive_hut): The primitive hut](Illustrations\primitive_hut.png)

As can be seen in Figure (@primitive_hut), huts may be primitive.

# Index of Figures

(@primitive_hut) *Primitive hut* from the frontispiece of Marc-Antoine Laugier’s 1755 second edition of *Essay on Architecture*, illustration by Charles-Dominique-Joseph Eisen.

However, there are a few drawbacks:

  • You currently need an index of figures, since example lists require the (@id) to be at the beginning of a line at least once.
  • You have to add the Figure (@id): bit to the caption manually.
  • This breaks LaTeX/PDF output, since LaTeX adds a “Figure” prefix itself.

Thus, a proper referencing scheme would need a bit of additional thinking. Especially, PDF and HTML output should work alike, probably by pandoc adding the Figure: bit to HTML output, while leaving it to LaTeX in the PDF case. Additionally, this should also work for referencing numbered sections, like in see chapter (@mychapter).
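
To illustrate what such a scheme could do without a separate index of figures, here is a small sketch that numbers each `(@label)` by order of first appearance and reuses that number for later mentions. This is a simplification and not how pandoc's example lists behave (they require the label at the start of a list item); `number_labels` is a hypothetical helper.

```python
import re

# A label reference in the example-list style: (@some_label)
LABEL = re.compile(r"\(@([A-Za-z0-9_-]+)\)")


def number_labels(text: str) -> str:
    """Replace every (@label) with a number assigned at its first appearance."""
    numbers = {}

    def replace(match):
        label = match.group(1)
        if label not in numbers:
            numbers[label] = len(numbers) + 1  # next free number
        return str(numbers[label])

    return LABEL.sub(replace, text)


source = (
    "![Figure (@primitive_hut): The primitive hut](primitive_hut.png)\n\n"
    "As can be seen in Figure (@primitive_hut), huts may be primitive."
)
print(number_labels(source))
```

The sketch still leaves the "Figure" prefix to the author, so the LaTeX double-prefix problem described above is not addressed by it.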


btel commented Apr 25, 2014

Your workaround works as suggested, but I had to remove the parentheses when referencing the label, otherwise they were rendered in the output. After this modification my example looks like this:

Figure @figure is about being in time

![Figure @figure: Cubes](cubes.png)

(@figure) Figure 1

To remove the automatic numbering in LaTeX (Figure 1:, etc.) you can add to the template:

\usepackage[labelformat=empty]{caption}

After rendering to pdf this produces the following output:

[Image: rendered PDF output]


bitsgalore commented May 8, 2014

Just came across this issue as well and ended up here. I'm also really in favor of support for the syntax suggested by @jgm above:

![Voyage to the moon](lalune.jpg){#lalune}

Especially since this is the standard way of dealing with this in PHP Markdown Extra:

http://michelf.ca/projects/php-markdown/extra/#spe-attr


mangecoeur commented Jul 10, 2014

Have there been any developments on this? It also seems to me that @jgm's suggestion

![Voyage to the moon](lalune.jpg){#lalune}

is the most consistent internally and with other tools. What would need to happen for this to be implemented?


edwardabraham commented Jul 23, 2014

I wanted to add my support for this addition to the syntax. When trying to replicate papers using markdown for the scholmd project, this is the feature that stands out as most needed in Pandoc. In short this can be addressed through the general use of {#lalune} for labelling elements, and of @lalune for referencing the number of the corresponding element. The syntax (@) may be used to number elements that are otherwise unnumbered.

A general syntax for labels, {#lalune}, associated with the preceding element would allow any element to be labelled (paragraphs, equations, tables, etc.). By associating the label with an element in the abstract syntax tree, the properties of the element would be available when the reference was made, and so they can be numbered appropriately. This syntax is already used in one context in Pandoc (section heading labels), and is used by PHP Markdown Extra. For elements that don't have numbers, such as equations, the syntax (@) may be used (from the example_lists extension). So an equation would be numbered and labelled as $$ F = G{m_1 m_2 \over r^2}$$ (@) {#gravity}. (An alternative could be to use the example_lists extension style and number and label it in one go as $$ F = G{m_1 m_2 \over r^2}$$ (@gravity). There are clearly some details and edge cases to be thought through here.)

When the document is rendered, Pandoc would associate a number with each labelled element, based on its type, and its position in the document. This logic would need to be carried out in Pandoc, so that it was available to the range of back-end writers (including HTML). The philosophy would be similar to pandoc-citeproc, which carries out its own formatting of citations, rather than delegating to writers that support this approach (such as bibtex for LaTeX). An option would be to have this behaviour depend on the backend (so that in LaTeX it inserts \label and \ref commands), but elsewhere it may insert calculated numbers, if referencing is not supported by the backend. This has the advantage that it will work easily in contexts where only a fragment of the document is rendered. If pandoc is calculating the numbering, a syntax would be needed for specifying the start numbers in a fragment that wasn't being compiled in standalone mode.

Labelled elements may be linked to, with the @ symbol being used to indicate the reference. So a trip to [the moon](@lalune) would be an anchor link to the element labelled {#lalune}. In this case the text is rendered as a trip to the moon.

The syntax The moon is illustrated in Figure @lalune may be used to insert the number of the referenced element, as well as a link to that element, with the text rendered as The moon is illustrated in Figure 1. This follows the syntax used for referencing numbered lists with the example_lists extension.

A further syntax could be to use square brackets [@lalune] to insert the type and number of the element that is referenced, similar to the behaviour of latex's \autoref command. So, the moon is illustrated in [@lalune] would be rendered as the moon is illustrated in Figure 1 (including a link to the anchor). To implement this feature would require some localisation or customisation capability, so that the word used to describe the element could be specified. In its simplest, this customisation could be put in the YAML header, with for example figure_label: Fig. if the style required a shortened label. The syntax for the reference, [@lalune], is the same as is used by the pandoc-citeproc library, so it would be overloading that usage to implement a self-citation. Pandoc would have the information on the context that is needed to either format it as a citation, or as a reference, assuming that there was no collision between the labels and the citation keys.
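
The numbering and the two reference forms described above can be sketched as follows. This is only an illustration under stated assumptions: the element list is given by hand as (kind, label) pairs instead of coming from pandoc's syntax tree, and the names `assign_numbers`, `resolve` and `PREFIXES` are hypothetical. Unknown keys are left untouched so that they can still be treated as bibliographic citations.

```python
import re


def assign_numbers(elements):
    """Map each label to (kind, number), counting separately per element kind.

    `elements` is a list of (kind, label) pairs in document order.
    """
    counters = {}
    numbered = {}
    for kind, label in elements:
        counters[kind] = counters.get(kind, 0) + 1
        numbered[label] = (kind, counters[kind])
    return numbered


# Words used by the bracketed form; a style could override these, e.g. "Fig.".
PREFIXES = {"figure": "Figure", "table": "Table", "equation": "Equation"}

# Either the bracketed form [@label] or the bare form @label.
REFERENCE = re.compile(r"\[@([\w-]+)\]|@([\w-]+)")


def resolve(text, numbered, prefixes=PREFIXES):
    """Render @label as a number and [@label] as prefix plus number."""

    def replace(match):
        bracketed, bare = match.group(1), match.group(2)
        label = bracketed or bare
        if label not in numbered:
            return match.group(0)  # not a known label, possibly a citation key
        kind, number = numbered[label]
        if bracketed:
            return f"{prefixes.get(kind, kind.capitalize())} {number}"
        return str(number)

    return REFERENCE.sub(replace, text)


numbered = assign_numbers([("figure", "lalune"), ("equation", "gravity")])
print(resolve("The moon is illustrated in Figure @lalune", numbered))
print(resolve("the moon is illustrated in [@lalune]", numbered))
```

Passing `prefixes={"figure": "Fig."}` to `resolve` plays the role of the `figure_label` setting suggested for the YAML header.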


kovla commented Jul 23, 2014

@edwardabraham It must be pointed out that the syntax [@lalune] is already used in pandoc for bibliographical citations.

Contributor

timtylin commented Jul 23, 2014

@kovla @edwardabraham I don't see why #lalune couldn't be used also as a reference to the defined symbol. With this scheme [the moon](#lalune) could be a normal text link to the figure, while [#lalune] could do the numbered reference thing as mentioned. In fact, I have a custom build of Pandoc that does exactly this.


edwardabraham commented Jul 24, 2014

@kovla The idea was to deliberately overload the [@lalune] syntax that is used for citations. The reason being that references to another part of the document are similar to citations (in essence they are self-citations). This has the benefit of avoiding introducing additional syntax. During processing the filter would identify which element the label was attached to, and use that information to appropriately format the text that is inserted into the document.

@evitaerc I prefer using the @ symbol, as it extends functionality that is already used by example_lists. Are you able to structure your pandoc build so that it may be implemented as a filter?

Note that this extension, [@lalune], is a convenience and is not necessary, provided that the numbers are able to be accessed through the @lalune method.

Contributor

timtylin commented Jul 24, 2014

@edwardabraham I tried the @ approach as well, but internal feedback in our lab showed that people get confused by what is a citation and what is an internal reference even when editing. The conclusion was that the mental model of keeping # for internal refs and using @ for external refs is the simplest to grok.

In fact, no one out of ten or so people has used example_lists (we are mostly writing extended abstracts and journal papers in the field of physics/engineering/applied math). When encountering a "list of scenarios" situation, the content was so static that people simply used literal numbers without issue.

Unfortunately the internal reference mechanism required heavy modification of the Markdown reader (additional state must be kept during the parsing process) and a custom AST, so I can't conceive of a filter implementation in the near future.


elcritch commented Jul 24, 2014

Personally, the # symbol and the {#label} syntax would be easier to understand and use. In my mind citations and internal references follow very distinct "mental models". Many academic papers use distinct numbering for figures, tables, and equations but the proposed syntaxes don't appear to have a way to support distinct numberings by type. It would be an important design criterion (I only got to skim the comments, hopefully it's not a redundant suggestion).
@edwardabraham You mentioned the scholmd. Is it currently just a repository of ideas or have they implemented any of the academic markdown features?
@evitaerc Great work! Is it possible for you to propose submitting the changes to the pandoc project or alternately creating a github fork to allow others to experiment?


mangecoeur commented Jul 24, 2014

+1 for use of # symbol for internal references. But it's really important that the references can distinguish between equations, figures, and tables to have distinct numbering sequences.

There are two approaches to my mind

  1. make the "thing referenced" explicit in the tag, for instance using namespaces like #eqn.maxwells, #fig.hockeystick. Pandoc would have to track the objects in each namespace and format the references appropriately
  2. depend on pandoc's parser to know what type of thing is referenced and handle appropriately. So if you tag an image and then use a # reference pandoc automatically treats it as a "fig" reference, if you embed a LaTeX formula it becomes an equation reference etc. This would be cool but I suspect it would be a) complex and b) fragile - you get issues for instance if someone wants to embed an image for a formula.
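
The first approach, explicit namespaces, is easy to sketch: the part of the label before the first dot selects the counter. The helper `number_by_namespace` and the label `fig.spaghetti` are made up for illustration; labels without a dot are assumed to share one unnamed counter.

```python
from collections import defaultdict


def number_by_namespace(labels):
    """Number labels such as 'fig.hockeystick' separately per namespace.

    `labels` must be in document order. Returns {label: number}.
    """
    counters = defaultdict(int)
    numbers = {}
    for label in labels:
        namespace, separator, _name = label.partition(".")
        if not separator:
            namespace = ""  # assumption: un-namespaced labels share one sequence
        counters[namespace] += 1
        numbers[label] = counters[namespace]
    return numbers


print(number_by_namespace(["fig.hockeystick", "eqn.maxwells", "fig.spaghetti"]))
```

The second approach would need the element type from the parser instead of from the label, which is where the fragility mentioned above comes in.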


tomduck commented Sep 18, 2019

I am pleased to announce the 2.0.0 release of the pandoc-xnos filter suite:

  • pandoc-fignos, for numbering figures and figure references;
  • pandoc-eqnos, for numbering equations and equation references;
  • pandoc-tablenos, for numbering tables and table references; and
  • pandoc-secnos, for numbering section references (pandoc does the section numbering).

The filters emerged from recommendations made by the community in this thread, and in particular this post by @scaramouche1.

Contributor

despresc commented Sep 5, 2020

For the label/ref problem, labelling itself is pretty simple: they're just opaque identifiers, though some document systems (like some LaTeX packages and the existing reference-providing filters) have prefixed labels like thm:thing. (Incidentally, my preference is for future Markdown syntax not to require any internal structure on labels, beyond, say, what citations already require).

Numbering things and rendering references to them, on the other hand, strongly resembles the process of generating citations and bibliographies, and the ways that can be done vary almost as widely. Typing of numbered things, choosing how to insert numbers in titles, reference prefixes, configuring numbering with counters, modifying counters in the text, and automatically generating identifiers can all be supported and configured.

So it will be hard to choose exactly how Pandoc will number things and render references, and what configuration will be allowed. It could be as complex as LaTeX, but I'm not sure if that complexity is welcome in pandoc itself (maybe it is?). The Markdown syntax for refs will also have to be chosen, though I imagine it will operate somewhat like the citation syntax does currently, judging from the discussion in the thread above.

Ideally, the intermediate representation would be modified so that in principle a filter could perform numbering and reference rendering like pandoc-citeproc does for citations, potentially more complexly than pandoc itself would. This can be done without settling the other issues. In the simplest design, labels (as identifiers) and numbers (if at all) can be stored in the Attr that we have now, requiring no IR change there. References should get their own element, and based on the current Citation type, the following could work:

-- Support for labelling more things can be added by adding Attr to more types.
data Inline
 = ...
 | Ref [Reference] [Inline]
 ...

-- Might want to record whether or not it's a page reference for
-- paginated formats like TeX.
data Reference = Reference
  { referenceId :: Text
  , referencePrefix :: [Inline]
  , referenceSuffix :: [Inline]
  , referenceMode :: ReferenceMode
  , referenceHash :: Int
  }

-- The main modifier of a reference at the reference site itself
-- is how to render a prefix, if at all. 
data ReferenceMode
  = UpperCasePrefix
  | LowerCasePrefix -- may not be needed?
  | SuppressPrefix
  | NormalReference

The intent is to support using Ref like Cite is right now in the readers, to store a sequence of references from a compound reference and the text of what was parsed.

Slightly off-topic, but I have no idea what the citationNoteNum in Citation does. I'm not sure if it's used at all in the core pandoc packages. What is it for?

Contributor

despresc commented Sep 5, 2020

If numbers (meaning the full rendered number, like "2.4.1") were stored in the Attr of the numbered thing (to expose them to other filters), it would be wise to agree on a particular key for them. Having it be number is the easiest, I guess.
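
As a rough illustration of what agreeing on a key would look like to a filter, the sketch below reads and writes a `number` entry in an Attr as it appears in pandoc's JSON output, that is `[identifier, [classes], [[key, value], ...]]`. The helper names `get_number` and `set_number` are hypothetical.

```python
def get_number(attr):
    """Return the value stored under the "number" key of an Attr, or None."""
    _identifier, _classes, key_values = attr
    for key, value in key_values:
        if key == "number":
            return value
    return None


def set_number(attr, number):
    """Return a copy of the Attr with its "number" key set to `number`."""
    identifier, classes, key_values = attr
    others = [[key, value] for key, value in key_values if key != "number"]
    return [identifier, classes, others + [["number", number]]]


attr = ["results", [], []]
attr = set_number(attr, "2.4.1")
print(get_number(attr))
```

Storing the fully rendered string ("2.4.1") rather than a list of counters keeps the value usable by filters that know nothing about the numbering scheme.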

Owner

jgm commented Sep 5, 2020

We already use number in sections (after makeSections), so yes, I agree on that.

With Ref, I guess your idea is that the Ref elements will be postprocessed by a filter or built-in transformation, as Citations are now. The [Inline] part will be replaced by the rendered reference. That makes a lot of sense to me.

citationNoteNum -- I don't think it is used. In pandoc-citeproc the citeNoteNumber is taken from it, but since (as far as I can see) it's always 0 this never makes any difference. This type originates from citeproc-hs and probably needs some adjusting, especially as I go forward with the new citeproc processor. I can see why a field such as this would be needed. Some styles include back references like "Op. cit., n. 13" where you have to know the note number in which a particular citation occurs. In my current citeproc implementation, we get these numbers by assuming one note per citation -- but of course that breaks if you have a document containing both citations and footnotes, and you're using a footnote citation style. In that case we'd need some way for pandoc to tell citeproc, "This citation would be the Nth rendered note." I see no reason why we can't simply use the existing field for this -- it's probably what it was intended for.

Contributor

despresc commented Sep 5, 2020

Yes, internal references are enough like citations that I thought the same sort of representation and handling would be good, since Cite seems to work well in practice.

I think for the writers that didn't support citations (all of them initially), the fallback would be exactly what the fallback for Cite is now: just attempt to render the [Inline] content if possible.
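
A sketch of that fallback on a JSON-style inline list. Note that `Ref` does not exist in pandoc's document model; the shape `{"t": "Ref", "c": [references, inlines]}` is an assumption that mirrors how `Cite` is serialized, and `unwrap_refs` is a hypothetical helper that only looks at the list it is given, not at inlines nested inside other elements.

```python
def unwrap_refs(inlines):
    """Replace each hypothetical Ref element by its fallback inline content."""
    result = []
    for inline in inlines:
        if inline.get("t") == "Ref":
            _references, fallback = inline["c"]
            # The fallback may itself contain Ref elements, so recurse.
            result.extend(unwrap_refs(fallback))
        else:
            result.append(inline)
    return result


paragraph = [
    {"t": "Str", "c": "See"},
    {"t": "Space"},
    {"t": "Ref", "c": [[{"referenceId": "lalune"}], [{"t": "Str", "c": "[#lalune]"}]]},
]
print(unwrap_refs(paragraph))
```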

Contributor

despresc commented Sep 6, 2020

If citationNoteNum is intended for that purpose, then there probably won't be any need for the analogous referenceNoteNum. I'm not sure I've seen an ibid. used with a reference before.

Contributor

tstenner commented Jun 8, 2021

For some internal manuscripts I've put together a Lua filter that handles most cross references.

It currently assigns IDs to tables and equations based on attribute blocks at the end of the caption (i.e. : Caption for this table {#tab:example}) and surrounding spans for equations ([$$a^2+b^2=c^2$$]{#eq:pythagoras}).
In the next step, citations starting with a prefix (fig:, tab: etc.) are replaced with a link to the element or natively counted references (LaTeX + docx).
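
The prefix check in that second step might look roughly like the sketch below. It is written in Python for illustration and is not taken from the Lua filter; the function names and the exact prefix list are assumptions based on the prefixes mentioned above.

```python
# Prefixes that mark a citation key as a cross-reference (assumed list).
CROSSREF_PREFIXES = ("fig:", "tab:", "eq:")


def is_cross_reference(citation_id, prefixes=CROSSREF_PREFIXES):
    """True if the citation key should be treated as an internal reference."""
    return citation_id.startswith(prefixes)


def split_citations(citation_ids):
    """Separate cross-references from ordinary bibliographic citations."""
    cross_refs = [c for c in citation_ids if is_cross_reference(c)]
    bibliographic = [c for c in citation_ids if not is_cross_reference(c)]
    return cross_refs, bibliographic


print(split_citations(["tab:example", "knuth1984", "eq:pythagoras"]))
```

Only the keys in the first list would be turned into links or native references; the rest stay with the citation processor.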

It's not meant as serious competition to the excellent pandoc-xnos, but rather as a testing ground for new features (i.e. table attributes) and a pandoc-xnos-compatible implementation for the most basic needs.


N0rbert commented Nov 24, 2021

I tried to summarize the current out-of-the-box pandoc LaTeX → docx experience in the question at StackOverflow. With a test document and the command pandoc simple.tex --to docx --output simple.docx --table-of-contents --toc-depth 5 --number-sections --citeproc --verbose --csl ieee.csl, I obtained the following docx rendering:

[Image: screenshot of the docx rendering]

I see many strange things:

  • [fig:image] instead of Figure with number;
  • [tab:table] instead of Table with number;
  • [eq:eq] instead of equation number;
  • [exm:code] instead of example number.

I hope you will provide an official out-of-the-box pandoc solution for this, without third-party filters and so on. Is there currently a solution that I have missed?

Owner

jgm commented Nov 24, 2021

@N0rbert one thing you're missing is the native_numbering extension. (The reason this isn't enabled by default is that it interferes with the popular filter pandoc-crossref.) If you do -t docx+native_numbering, then the situation improves a little bit: you get

Figure 1: [fig:image] Image

Table 1: [tab:table]

There's some low-hanging fruit here:

  • We should be able to get rid of the printing of the label ([tab:table]) when native_numbering is specified. And maybe we should get rid of it in general for tables and figures, since we're getting a number for the \ref{} even without native_numbering.
  • We should also be able to resolve references to the example environment; after all we're creating a number for it.
  • It would be nice if native_numbering could be enabled by default; we need some other way to work around the problem described in support -native_numbering for docx #7499. (If pandoc can recognize when an external filter has already added the number, it can avoid doing so in that case. Perhaps a crude way to achieve this would be if pandoc-crossref added something to metadata that we can check. @lierdakil @jjallaire any thoughts?)

Of course, that still leaves us without good references to numbered equations (indeed, without numbered equations).


jjallaire commented Nov 24, 2021

Yes, we could establish a protocol where filters set a specific metadata value to indicate that they have already handled numbering. Maybe for consistency w/ native_numbering we could set filter_numbering or filter-numbering (or filter_numbered, filter-numbered, etc.)
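
A sketch of how a consumer could check such a flag in the document metadata as it appears in pandoc's JSON output, where a boolean is serialized as `{"t": "MetaBool", "c": true}`. The key `filter-numbering` is just one of the candidate names above, and `numbering_already_done` is a hypothetical helper; no such protocol exists yet.

```python
def numbering_already_done(meta, key="filter-numbering"):
    """True if the metadata marks numbering as already handled by a filter."""
    value = meta.get(key)
    if not isinstance(value, dict):
        return False
    return value.get("t") == "MetaBool" and value.get("c") is True


meta = {"filter-numbering": {"t": "MetaBool", "c": True}}
print(numbering_already_done(meta))
```

A filter would set the value once after numbering, and the docx writer would skip its own numbering when the check returns true.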

jgm added a commit that referenced this issue Nov 25, 2021
Previously we included the text of the label in square brackets,
but this is undesirable in many cases.

See discussion in
<#813 (comment)>.
jgm added a commit that referenced this issue Nov 25, 2021
- Resolve references to theorem environments.
- Remove Span caused by "label" in figure, table, and theorem
  environments; this had an id that duplicated the environments' id.

See #813.

N0rbert commented Nov 25, 2021

Thank you for the quick reply. With -t docx+native_numbering the document looks better.
I'll keep an eye on the next releases to check the changes provided by the last two mentioned commits.
Thanks!


BishopWolf commented May 27, 2022

I'm using -t docx+native_numbering but the rendered docx file still does not contain any reference when using the \autoref{whatever-(equations, figures, sections, etc)}
