Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate Heading IDs #181

Merged
merged 1 commit into from Aug 4, 2013
Merged

Generate Heading IDs #181

merged 1 commit into from Aug 4, 2013

Conversation

jasonkarns
Copy link
Contributor

This PR is a modification of #117 with the following differences:

  1. Don't generate extraneous markup (no anchor tags, just id attributes)
  2. name attribute is obsolete in HTML5. id attribute is recommended
  3. id attribute accepts any non-white-space character per HTML5 spec (and is backwards compatible)

Changes:

  • revert the anchor options (anchors, anchorClass, and anchorContent)
  • revert the anchor element itself
  • simplify the heading-text-to-id logic

@leeola
Copy link

leeola commented Jul 26, 2013

So, as i mentioned in issue #89, it seems this is a rather widely attempted issue. With that said, this is one of the simplest working solutions i have seen yet. Very nice!

@leeola
Copy link

leeola commented Jul 26, 2013

After using this, i feel like an improvement could be made to track previous ID names, to ensure no duplicates. Possible append an index to the end of the id

@jasonkarns
Copy link
Contributor Author

Yeah, I thought about that. However, even though duplicate IDs would be problematic, I think it would be a bad idea to account for it in code.

There are a number of use-cases for these headers having deep-permalinks.

  1. Post-publication deep-linking by users of the page. (This is best handled by the markup generated by Add support for generating heading anchors in HTML output. #117's implementation and used by GitHub itself.)
  2. Post-publication deep-linking by the author, from other external sources.
  3. Author-time, in-page deep linking by the author.

Use cases 1 & 2 could be supported by random, opaque IDs. There is nothing inherent in the value of the ID that would impede these use cases.

Use case # 3, on the other hand, requires a known and predictable value for the ID attribute. Since the value of the ID must be known prior to its generation, the best method is to simply use the textual content of the header itself (even accounting for predictable conversion rules to generate IDs valid for the attribute). Appending an index would hamper the predictability of the generated IDs. (only slightly, to be sure, since the index could be derived simply by placement in the page; but IMO, still harmful enough to use case 3)

Since I don't believe de-duping should be handled in the code, it's obvious it falls on the shoulders of the author to ensure unique headers. This is a good thing, anyhow, as I would consider it a bad idea to have textually-identical headers on the same page.

@leeola
Copy link

leeola commented Jul 27, 2013

You think it would be bad to have textually identical headers? I don't mean to argue with you (really, i don't care haha), it's just that this use:

# Scenario A

## Problem
...

## Solution 
...


# Scenario B

## Problem
..

## Solution
..

etc..

Isn't that style of writing relatively common? By definition, should all headers be unique?

Assuming i understand you, use case 3, of the other linking to headers manually, it's a lose-lose use case, isn't it? If he has two identical headers, he can't link to the second header. If the second header has a different name that he expects (say, if it was appended an index) then he still doesn't know.

So assuming i understand use case 3, it seems like appending the index is the lesser of the two evils. At least he has the ability to link to that second header, no?

@jasonkarns
Copy link
Contributor Author

Isn't that style of writing relatively common? By definition, should all headers be unique?

Perhaps. I'm throwing my anecdotal opinion out there on this one. My gut is saying duplicate text headers are just a bad idea, generally. You've clearly shown a refuting example, though I would personally try to make the sub headers more contextually specific by making them unique. I mean, if they aren't unique, should they really be headers? 🤷 (I'm thinking outlines of a thesis paper or presentation, chapter titles of books, etc. Anything so generic as to be duplicated in two locations isn't really header material. -- again, opinion)

If he has two identical headers, he can't link to the second header.

Unless he embeds the second (duplicate) header directly as HTML with a manually-set ID. (In addition, there are other flavors of markdown that let you specify arbitrary attributes on some types of elements.)

So assuming i understand use case 3, it seems like appending the index is the lesser of the two evils. At least he has the ability to link to that second header, no?

I'm assuming the implementation you have in mind is to append an index only to the duplicate headers and not to every header? (otherwise, you'd lose predictability of every header, ever to gain support for an edge case) So the implementation would have to keep track of generated IDs?

  1. the author can drop in embedded HTML (with IDs) at any moment, therefore it's possible (highly unlikely) that we end up with duplicate IDs even with our implementation
  2. markdown's beauty is in its simplicity. It doesn't cover every use case, only hits the 80/20. For the remaining 20, authors can drop back into raw HTML
  3. Of this header dilemma, duplicate headers lies squarely outside of the 80% sweet spot. And I would relegate the honest-to-goodness, valid, duplicate-header scenario to maybe 5%. Which, IMO is not worth the effort. It's not a technically hard problem. But I don't see enough value in the robust implementation.
this.token.text.toLowerCase().replace(/\s/g, '-')

That gives you 95% of your uses, with a single (terse) line of code. Anything more doesn't meet the value proposition for me. /opinion

@jasonkarns
Copy link
Contributor Author

Personally, I would favor a particular feature that let's the author specify the ID attribute directly in markdown. I've already come across a flavor of markdown that did this, though I can't recall which one it was.

The syntax was fairly nice:

# my header {#my-header-id}

Or something similar. That would allow auto-creation of IDs for the general case; while allowing a clean syntax for overriding the IDs for the edge case. (While not doing anything pseudo-magical or unexpected)

@leeola
Copy link

leeola commented Jul 27, 2013

I'm assuming the implementation you have in mind is to append an index only to the duplicate headers and not to every header? (otherwise, you'd lose predictability of every header, ever to gain support for an edge case) So the implementation would have to keep track of generated IDs?

Yup, exactly. No need to modify existing uniques, make index 0 empty by default

Of this header dilemma, duplicate headers lies squarely outside of the 80% sweet spot. And I would relegate the honest-to-goodness, valid, duplicate-header scenario to maybe 5%. Which, IMO is not worth the effort. It's not a technically hard problem. But I don't see enough value in the robust implementation.

I think i would have to agree with you there. While i disagree about a unique requirement for a header (though, perhaps i am thinking about them more structurally, and less contextually), i would agree that duplicates easily fall in the 20% line.

The only time i would really disagree, is if duplicate IDs are error prone somehow, but i have no idea on this. If, for some reason, something broke hard due to them, then that 20% is fatal, and worthy of a fix.

Nevertheless, i agree with your 20% point, and withdraw my opinion hehe.

With all of this discussed, i wonder how long it will take for all of this to be adopted? I've got a lot of links riding on this specific PR, but who knows what PR is actually going to get accepted. I have the feeling that when this issue is finally addressed, i'll be manually reassigning a hundred links hah.

@jasonkarns
Copy link
Contributor Author

The only time i would really disagree, is if duplicate IDs are error prone somehow, but i have no idea on this.

I could imagine actual errors being caused by this if there was any javascript depending on those IDs. But the workaround would be to embed that particular heading in straight HTML, with a custom ID.

I've got a lot of links riding on this specific PR, but who knows what PR is actually going to get accepted. I have the feeling that when this issue is finally addressed, i'll be manually reassigning a hundred links hah.

I, too, though not in the hundreds. :-) Until this feature is added in some form or another, I've just been writing all my headers in direct HTML. Rather annoying, but at least it works correctly today, without needing to run a fork locally.

@adam-p
Copy link

adam-p commented Jul 27, 2013

I'd like to chime in on name vs. ID...

My project (Markdown Here) uses Marked.js to format email. When email is displayed by Gmail (and other clients, probably), IDs get stripped out of elements, but names are left intact.

So ID-based anchoring won't work for me, but name-based will. (And I do have users who specifically want to have ToC links in email.)

Can both be used? Or there be an option for which is used?

@chjj
Copy link
Member

chjj commented Aug 2, 2013

Looking at the diff, there appears to be a bunch of old unused code leftover here. I might just grab the relevant part and commit it myself.

@jasonkarns
Copy link
Contributor Author

Yeah, I started this pr as a branch off another pr. I'll clean it up and
resubmit later today

@jasonkarns
Copy link
Contributor Author

I'm unclear as to the purpose of the various subdirectories under test. What's the purpose of test/new vs test/tests? Which directory should the new headers-with-ids tests go under?

Uses the heading text itself as the ID for the H*. The only
transformation made is to replace whitespace with '-' (hyphen). This
conforms to the HTML5 spec wherein ID attributes can contain any
non-whitespace character. (The previous HTML4 restrictions on ID
attribute values have been relaxed.)

This feature is limited in scope to strictly generating heading IDs.
A future enhancement would be a simple way of letting the author specify
the ID attribute. Perhaps:
  # Some: Crazy Header  {#custom_id}

  to generate: <h1 id="custom_id">Some: Crazy Header</h1>
@jasonkarns
Copy link
Contributor Author

Previous commits have been squashed down to the minimum necessary. New tests were scrapped since the existing tests (once modified) provided coverage for this feature.

Only thing still worth discussing, I think, is the choice of hyphen vs underscore as the whitespace replacement character.

Otherwise, it's ready to merge.

@adam-p
Copy link

adam-p commented Aug 2, 2013

I have nothing new to add to the case I stated above, but in case saying it again will somehow sway anyone: For use in email, anchors need name attributes. id attributes just don't work.

(Of course, if inline element rendering were customizable...)

@jasonkarns
Copy link
Contributor Author

@adam-p While I respect the corner you're stuck in, I would consider that a rare edge case. The HTML spec now considers name attributes deprecated[1] in favor of id. So, use cases similar to yours, I would never recommend them. Until you can customize element rendering (#129), it seems you're limited to manual inline HTML :-(

  1. see http://www.w3.org/TR/html5-diff/#changed-attributes (name is listed as "allowed but discouraged")

@Mithgol
Copy link
Contributor

Mithgol commented Aug 4, 2013

It is (I guess) possible to use cheerio to generate names based on ids.

@chjj
Copy link
Member

chjj commented Aug 4, 2013

@jasonkarns, I guess I should make this clear in the readme: test/new is for any test that isn't part of the original markdown test suite. test/original is only for the original markdown test suite. test/tests is where they both reside after being combined (and the markdown test suite being slightly altered) via the node test --fix script. It's not a problem. I can fix it.

chjj added a commit that referenced this pull request Aug 4, 2013
@chjj chjj merged commit 3b2d4bd into markedjs:master Aug 4, 2013
@leeola
Copy link

leeola commented Aug 6, 2013

My only question with this, that i believe i forgot to raise, is it possible to make hover-anchor links like Github with this system? This is why i originally opted for the "more html" method, rather than the plain ID option.

Are automatic links basically impossible with this method? Probably need to look into the custom generation branches then?

@jasonkarns
Copy link
Contributor Author

@leeolayvar GitHub-style hover anchors are not possible with this method, at least not automatically. Though they could be added with javascript after the fact.

@leeola
Copy link

leeola commented Aug 6, 2013

k, didn't think so, thanks for the confirmation. I guess i'll wait for a token/parser modification system to be implemented to generate real anchors. This will be a useful stopgap until then.

@jonschlinkert
Copy link

Wow, why was this merged in? The regex on this needs to be improved dramatically before this is actually usable. Expecting someone to remember to add commas and colons in reference links is a recipe for broken links and inconsistent results. Also, adding the Id directly to the heading is not a good solution as it makes it far more difficult to hang additional styles on the anchor (such as an icon on hover) without messing up your typography. It just seems like this PR has a lot of personal preference associated with it, and not enough regard for popular conventions.

@jonschlinkert
Copy link

I now see 75dff71, which is good, but using headings instead of anchors is still just a bad idea.

@jasonkarns
Copy link
Contributor Author

Expecting someone to remember to add commas and colons in reference links is a recipe for broken links and inconsistent results.

@jonschlinkert Personally, I think using the text-as-id is simpler for authors. The use case is for in-document links so the link author is the same person as the header author. Remembering to add commas and such is easier than trying to remember which characters get replaced and which character is used for replacement. The author simply copies the header text and replaces whitespace. But 'simple' is subjective.

Adding any additional markup (like anchors) would only go further into the realm of personal taste. Giving headers an ID that can be targeted covers the core use case. Everything else (styling and clickable permalinks) are just enhancements on that core functionality. Getting the core functionality in place and shipped is more important than waiting for resolution on the numerous PRs that each do something slightly different. ( #89 #92 #129 #134 #207 ) The best future version of marked would indeed do some sort of anchor generation, or at least allow customized renderers. But who knows how long it could take to get those PRs resolved and merged? Much better to at least get marked to support linking to headers at all.

@jonschlinkert
Copy link

The author simply copies the header text and replaces whitespace. But 'simple' is subjective.

Fair enough, I imagine we all take our positions based on either what we're familiar with or what we find most advantageous based on what we tend to focus on. IMHO it sounds like your use-cases are more oriented to basic documents, single documents, and/or a small amount of documentation (I really don't intend that as a slight or any kind of jab, so please don't take it that way. These are examples of my use cases:

I'm also doing the documentation for the new lesscss.org website and I'm managing docs for close to 100 other projects. Again, please don't think I'm trying to make a big deal out of "my projects", because I'm just one dude. But are my use cases, and hopefully it helps explain why I don't think this PR is sufficient.

Giving headers an ID that can be targeted covers the core use case.

I disagree. I'm not sure where you got your ideas about that, or whose core use case you're representing, but IMHO, most people would agree that this should be implemented with proper anchors, not a shortcut to slim down on markdown at the cost of being more correct.

Getting the core functionality in place and shipped is more important than waiting for resolution on the numerous PRs

In some cases, but that's a broad generalization that's difficult to debate. In this case, however, I think it would have been better to wait for something idiomatic.

Much better to at least get marked to support linking to headers at all.

Lol, for you. That, in particular, is what stands out to me about this PR. It has personal taste and preferences written all over it.

At the end of the day, it's not a big enough deal for me to keep debating it. We can just fork it or find another solution.

@leeola
Copy link

leeola commented Aug 24, 2013

Well, i'll be interested if Anchors are added somehow. Like i was mentioning earlier, i would like a way to make easy linkable anchors (like Github) without having to layer JavaScript onto for seemingly no reason.

With that said, i do feel the IDs are a nice simple solution if the other PRs involving modifying how the HTML is generated are pulled.

If we are given a in-program ability to modify the output to our needs, than i feel the default should be just IDs. They're clean, simple, and provide a basic use case. If we are not given the ability to modify HTML output anytime soon, then i think just ID based anchor links are woefully inadequate.

@jasonkarns
Copy link
Contributor Author

Here's how I arrived at my definition of 'core functionality':

There are two features we're discussing.

  1. Ability to link to a header as the author or end user. (accomplished with either an ID attribute or anchor element)
  2. Ability to quickly get such a link as an end user. (accomplished with ID+link or anchor+link or anchor/link)

The ability to do No. 2 is predicated on No. 1 already working. Granted, using anchors solves both at the same time. But as we can see from the number of pull requests that attempt to do both, there is disagreement on how it should be implemented. But fundamentally, the existence of a clickable anchor+link, and the ability to style it (hover, etc) are enhancements to the core functionality of "ability to deep-link to an arbitrary header". Adding IDs to the headers solves the basic use case, with minimal fuss. This way we can start linking to headers now without waiting for the larger implementation using anchors (or custom renderers).

@ChrisWren
Copy link
Contributor

What about exposing an option like highlight for users to insert their own HTML for the header tag called headerHTML? The implementation would look like this:

 case 'heading': {

      return this.options.headerHTML(this.token) || '<h'
        + this.token.depth
        + ' id="'
        + this.token.text.toLowerCase().replace(/[^\w]+/g, '-')
        + '">'
        + this.inline.output(this.token.text)
        + '</h'
        + this.token.depth
        + '>\n';
    }

Basically there is a simple default implementation, but users are also allowed to do cool stuff like GitHub's hover links if they define a options.headerHTML function and return the markup.

Thoughts?

@ghost
Copy link

ghost commented Dec 2, 2013

"I could imagine actual errors being caused by this if there was any javascript depending on those IDs."

It's also a problem for epub file validation or, basically, for anything else that does strict parsing of HTML. The fundamental issue here is that legal Markdown is being transformed into illegal HTML. That should never happen, IMO.

The argument that authors shouldn't use duplicate headings doesn't seem persuasive. For one, that's a matter of style, not standards compliance. For another, the Markdown content may not be under your direct control (e.g., web site that allows users to enter markdown which is then transformed into HTML for another purpose).

@jasonkarns jasonkarns deleted the heading-ids branch February 6, 2014 15:45
@vvo
Copy link

vvo commented Oct 7, 2014

For anyone wondering here's what I did to get full GFM rendering using github style and anchor links:

  var renderer = new marked.Renderer();

  marked.setOptions({
    gfm: true,
    renderer: renderer,
    highlight: function (code) {
      return require('highlight.js').highlightAuto(code).value;
    }
  });

  renderer.heading = function (text, level) {
    var escapedText = text.toLowerCase().replace(/[^\w]+/g, '-');

    return '<h' + level + ' class="header"><a name="' +
                  escapedText +
                   '" class="anchor" href="#' +
                   escapedText +
                   '"><span class="octicon octicon-link"></span></a>' +
                    text + '</h' + level + '>';
  };

Then use this CSS file: https://github.com/sindresorhus/github-markdown-css/blob/gh-pages/github-markdown.css

ghost pushed a commit to zergeborg/marked that referenced this pull request May 13, 2016
ghost pushed a commit to zergeborg/marked that referenced this pull request May 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants