
Support Github-flavored Markdown anchors/id scheme #2821

Closed
rkitover opened this issue Mar 27, 2016 · 10 comments

@rkitover

If you have a section heading like:

## -c cmd

It generates the link #c-cmd but it SHOULD generate the link #-c-cmd, so the link does not work.

The command is being run with -t markdown_github.

@jgm
Owner

jgm commented Mar 27, 2016

+++ Rafael Kitover [Mar 27 16 08:24 ]:

If you have a section heading like:

## -c cmd

It generates the link #c-cmd but it SHOULD generate the link #-c-cmd,
so the link does not work.

SHOULD, according to what? See the documentation for the auto_identifiers
extension.

@rkitover
Author

GitHub makes the anchor #-c-cmd, so I would expect --toc -t markdown_github to generate a TOC usable on GitHub.

I'm using doctoc for this at the moment, which makes the right TOC links.

@jgm
Owner

jgm commented Mar 27, 2016

Is the algorithm github uses for generating automatic
identifiers for headers documented anywhere?

+++ Rafael Kitover [Mar 27 16 12:38 ]:

github makes the anchor #-c-cmd, so I would expect --toc -t
markdown_github to generate a TOC usable on github?

I'm using doctoc for this at the moment, which makes the right TOC
links.



@rkitover
Author

I have only found this:

http://stackoverflow.com/questions/2822089/how-to-link-to-part-of-the-same-document-in-markdown

and this:

https://gist.github.com/asabaylus/3071099

It seems that the section heading is split into words and lowercased, and the words are joined with dashes.

pandoc does almost the same thing, but for command-line options like ## -foo it removes the leading dash while GitHub doesn't. My normal section links worked just fine when I used --toc -t markdown_github.

@jgm
Owner

jgm commented Dec 31, 2016

If I recall, the motivation for this was that id attributes in HTML 4 had to begin with a letter, and so couldn't begin with a hyphen. This has been relaxed in HTML 5 (which GitHub is targeting).

We use a single automatic identifier generating scheme for all formats. Indeed, the identifier is assigned by the reader, which doesn't even know whether the ultimate output is going to be HTML5, HTML4, or both or neither.

So it wouldn't be a simple change to allow the hyphen at the beginning of an automatically generated identifier.

@mb21 changed the title from "wrong markdown TOC entries for sections with leading dashes" to "Support Github-flavored Markdown anchors/id scheme" on Jan 27, 2017
@jgm
Owner

jgm commented Feb 20, 2017

Related issue with headers starting with a number

A solution perhaps would be to have the readers allow ids that don't start with a letter, and then do some kind of global transformation in the HTML4 writer. (But then we'd also need to determine whether a similar transformation was needed in other formats, e.g. texinfo and docbook and latex...)

@jgm added this to the pandoc 2.0 milestone on Feb 25, 2017
@jgm
Owner

jgm commented Mar 5, 2017

The relevant function (in Text.Pandoc.Shared) is inlineListToIdentifier, which as documented respects this constraint:

HTML identifiers must start with a letter, and may contain only letters, digits, and the characters _-.

inlineListToIdentifier is used in uniqueIdent, which is used in registerHeader from Text.Pandoc.Parsing, which is used in the Markdown reader (and LaTeX, HTML, MediaWiki, Org, RST, Textile, TWiki readers).
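
To make the constraint concrete, here is a rough Python sketch of an identifier scheme that obeys it. It is not pandoc's Haskell code: the function name `pandoc_style_identifier` is hypothetical, and the "section" fallback for headings with no letters is an assumption taken from the auto_identifiers documentation rather than from this thread.

```python
import re


def pandoc_style_identifier(heading: str) -> str:
    """Hypothetical approximation of a letter-first identifier scheme.

    This is an illustration only, not pandoc's inlineListToIdentifier.
    """
    text = heading.lower()
    # Keep letters, digits, whitespace, and the characters _ - .
    text = "".join(
        ch for ch in text if ch.isalnum() or ch.isspace() or ch in "_-."
    )
    # Collapse runs of whitespace into single hyphens.
    text = re.sub(r"\s+", "-", text.strip())
    # Drop everything before the first letter (the HTML 4 constraint).
    first_letter = next(
        (i for i, ch in enumerate(text) if ch.isalpha()), len(text)
    )
    text = text[first_letter:]
    # Assumed fallback when nothing is left.
    return text or "section"


print(pandoc_style_identifier("-c cmd"))    # c-cmd
print(pandoc_style_identifier("1. Title"))  # title
```

The last filtering step is the one that loses the leading hyphen in "-c cmd" and the leading number in "1. Title".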

@jgm
Owner

jgm commented Mar 5, 2017

One option would be a small change to inlineListToIdentifier -- removing the part that drops non-letters from the beginning. Then, in the html4 writer, we could strip non-letters from the beginning of identifiers and internal links.

However, this isn't quite enough. If you had 1a and 2a, for example, this method would result in duplicate identifiers in html4.
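
A small Python sketch of that collision, assuming the writer simply strips leading non-letters from each identifier. The helper names are hypothetical and only illustrate the idea; they are not part of pandoc.

```python
def strip_leading_non_letters(identifier: str) -> str:
    """Drop characters until the first letter, as an HTML 4 writer might."""
    i = 0
    while i < len(identifier) and not identifier[i].isalpha():
        i += 1
    return identifier[i:]


def html4_ids(identifiers):
    """Apply the stripping step to every identifier from the reader."""
    return [strip_leading_non_letters(ident) for ident in identifiers]


# Identifiers the reader would assign if leading non-letters were kept.
reader_ids = ["1a", "2a"]
writer_ids = html4_ids(reader_ids)
has_duplicates = len(set(writer_ids)) < len(writer_ids)

print(writer_ids)      # ['a', 'a']
print(has_duplicates)  # True
```

So the writer would need its own uniqueness pass, and internal links would have to be rewritten with the same mapping.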

@jgm
Owner

jgm commented May 10, 2017

No dots either (#3655) in GitHub ids.

@cesarjorgemartinez

cesarjorgemartinez commented Aug 3, 2017

We have the same problem.
Our GitHub (GFM) documentation uses its own numbering:

# 1. Title
...
[1. Title](#1-title)

When converting to any document format, the link doesn't resolve correctly.

Using markdown_github and converting to HTML produces the following:
Header:

<h1 id="title">1. Title</h1> (wrong; should be id="1-title")
Link:
<a href="#1-title">1. Title</a> (correct)

The --toc option does the same:

<ul>
<li><a href="#title">1. Title</a></li>
</ul>

In a converted ODT file, the link destination (right-click the link, Modify Hyperlink, destination in the document) appears as 1.1. Title.

To resolve links, GFM (on github.com) does the following:

  • It downcases the header string.
  • It removes anything that is not a letter, number, space or hyphen.
  • It changes any space to a hyphen.
  • If the result is not unique, it adds "-1", "-2", "-3",... to make it unique.
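
The four steps above can be sketched in Python as follows. This follows the list exactly as written in this comment, not GitHub's actual implementation (which may differ, for example in how it treats underscores). The function name `gfm_anchor` and the `seen` dictionary are hypothetical.

```python
def gfm_anchor(heading: str, seen: dict) -> str:
    """Build an anchor from a heading using the four steps listed above.

    `seen` maps each base slug to how many times it has been used so far.
    """
    # Step 1: downcase the header string.
    slug = heading.lower()
    # Step 2: keep only letters, numbers, spaces, and hyphens.
    slug = "".join(ch for ch in slug if ch.isalnum() or ch in " -")
    # Step 3: change every space to a hyphen.
    slug = slug.replace(" ", "-")
    # Step 4: append -1, -2, ... for repeated slugs.
    count = seen.get(slug, 0)
    seen[slug] = count + 1
    return slug if count == 0 else f"{slug}-{count}"


seen = {}
print(gfm_anchor("-c cmd", seen))    # -c-cmd
print(gfm_anchor("1. Title", seen))  # 1-title
print(gfm_anchor("1. Title", seen))  # 1-title-1
```

The counter is deliberately simple: it does not check whether a suffixed slug such as "1-title-1" clashes with a heading that already produces that slug on its own.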

Here pandoc removes the leading numbering, if present: ^\s*[0-9]+(\.[0-9]+|\.)*\s+
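
As a quick check of what that pattern matches, here is a Python sketch that applies it to a few headings. The regex is the one given in this comment as a description of the observed behaviour; it is not taken from pandoc's source, and the function name is hypothetical.

```python
import re

# Pattern from the comment above: optional whitespace, a number, any
# number of ".N" or "." groups, then at least one whitespace character.
LEADING_NUMBERING = re.compile(r"^\s*[0-9]+(\.[0-9]+|\.)*\s+")


def strip_leading_numbering(heading: str) -> str:
    """Remove a leading section number such as '1.' or '1.2.3'."""
    return LEADING_NUMBERING.sub("", heading)


print(strip_leading_numbering("1. Title"))     # Title
print(strip_leading_numbering("1.2.3 Intro"))  # Intro
print(strip_leading_numbering("-c cmd"))       # -c cmd
```

Note that the pattern leaves "-c cmd" unchanged, so it describes only the numbered-heading case and not the leading-hyphen case from the original report.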
