
Heading levels in Markdown table of contents #1778

Closed
Parent5446 opened this Issue Jan 14, 2016 · 35 comments

Parent5446 commented Jan 14, 2016

So I'm not sure if this is at all possible, or what the best way would be to do this, but we're having some issues using the .TableOfContents variable in templates.

There are a couple of points:

  1. Semantically, HTML should only have one <h1> tag per section root.
  2. It would be nice to have the top-level, i.e., <body>-level, <h1> tag generated in the layout template using the .Title attribute of the page. (It works better semantically, rather than having the title in two places.)
  3. When rendering Markdown, the table of contents in .TableOfContents, sensibly, only renders navigation for the headers in the actual content.
  4. Furthermore, the .TableOfContents always treats <h1> as top-level, even if there are no <h1>-level headers.

Because of this, if you comply with (1) and implement (2), and thus only have <h2> or lower headers in your content, the generated table of contents has an empty top level inside the <nav> (an <li> that contains only a nested <ul>) as a result of (3) and (4).

Example table of contents:

<nav id="TableOfContents">
    <ul>
        <li>
            <ul>
                <li><a href="#introduction:e95c9b0e3cf17661856239295171d427">Introduction</a></li>
                <li><a href="#at-a-glance:e95c9b0e3cf17661856239295171d427">At a Glance</a></li>
            </ul>
        </li>
    </ul>
</nav>

This messes with the page semantically since now the navigation has an empty top-level. The way I see it there are two ways to fix this:

  1. Somehow get the header tags in layout templates into the table of contents, so the top-level is not blank.
  2. Get the renderer to remove empty levels, e.g., by treating <h2> as top-level headers if there is no <h1> in the content.
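Solution 2 amounts to shifting every heading level so that the shallowest level actually present in the content becomes the top of the ToC. Below is a minimal JavaScript sketch of that idea. It assumes the heading levels have already been collected in document order. `toRelativeDepths` is a hypothetical helper, not a Hugo or Blackfriday API.

```javascript
// Hypothetical helper (not part of Hugo/Blackfriday): map absolute heading
// levels (1-6) to ToC depths, treating the shallowest level present as depth 1.
function toRelativeDepths(levels) {
  if (levels.length === 0) return [];
  const top = Math.min(...levels);
  return levels.map((level) => level - top + 1);
}

// Content with only <h2>/<h3> headings no longer produces an empty first level.
console.log(toRelativeDepths([2, 3, 3, 2])); // [ 1, 2, 2, 1 ]
```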

I'm not sure if there is currently an undocumented workaround that implements either of these solutions. If there isn't, could Hugo support one of them so that the table of contents is more semantic?

bep (Member) commented Jan 14, 2016

I have not read this in detail, but we get the ToC from https://github.com/russross/blackfriday -- so maybe it is better to discuss it there.

Dr-Terrible commented Jan 20, 2016

Quoting @Parent5446: “This messes with the page semantically since now the navigation has an empty top-level. The way I see it there are two ways to fix this: [...]”

I am affected by this odd behaviour too. The way .ToC is rendered by Hugo/Blackfriday makes the entire concept of a ToC pretty much useless. In the worst-case scenario, my ToC is completely messed up and no longer reflects the original header structure of my Markdown files. Sometimes the ToC's headers are so messed up that the browser doesn't render them correctly.

The issue here is that Blackfriday spits out a bunch of hard-coded HTML tags, and then Hugo wraps them in a way that isn't valid HTML either.

A solution is quite simple: don't give users a preformatted .ToC; just give them an indexed array and let them generate the desired HTML structure by iterating over the array elements.
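Below is a minimal JavaScript sketch of the "iterate over an array" approach. It assumes a hypothetical flat array of `{ level, id, text }` objects in document order. At the time of this thread, `.TableOfContents` only provides rendered HTML, so the array shape and the names `renderToc` and `escapeHtml` are assumptions. The sketch treats the shallowest level present as the top level. It also closes every `<li>` and `<ul>` it opens, so the markup stays well formed even when levels are skipped or appear out of order.

```javascript
// Hypothetical helper: minimal escaping for text placed inside the markup.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical renderer: flat, ordered headings -> nested, well-formed lists.
function renderToc(headings) {
  if (headings.length === 0) return "";
  const top = Math.min(...headings.map((h) => h.level)); // shallowest = depth 1
  let html = "";
  let depth = 0; // number of currently open <ul> elements
  for (const h of headings) {
    // Never nest more than one level deeper than the previous entry, so a
    // skipped level (e.g. h2 followed by h4) cannot create an empty <li>.
    const d = Math.min(h.level - top + 1, depth + 1);
    if (d > depth) {
      html += "<ul>"; // open a sub-list inside the still-open <li>
      depth = d;
    } else {
      html += "</li>"; // close the previous sibling
      while (depth > d) {
        html += "</ul></li>"; // close deeper lists and their parent items
        depth--;
      }
    }
    html += `<li><a href="#${escapeHtml(h.id)}">${escapeHtml(h.text)}</a>`;
  }
  html += "</li>";
  while (depth > 1) {
    html += "</ul></li>";
    depth--;
  }
  return html + "</ul>";
}

console.log(renderToc([
  { level: 2, id: "introduction", text: "Introduction" },
  { level: 3, id: "details", text: "Details" },
  { level: 2, id: "at-a-glance", text: "At a Glance" },
]));
// <ul><li><a href="#introduction">Introduction</a><ul><li><a href="#details">Details</a></li></ul></li><li><a href="#at-a-glance">At a Glance</a></li></ul>
```

With only `<h2>` and lower headings, the first `<ul>` holds real entries rather than an empty wrapper.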

moorereason (Contributor) commented Jan 28, 2016

@Parent5446 and @Dr-Terrible,
It would be great if one of you would create a blackfriday issue for this.

bep (Member) commented Jan 28, 2016

The bottom line of this is:

The ToC should not be HTML, it should be a datastructure that people can do with as they please.
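The "data structure" idea can be made concrete with a minimal JavaScript sketch that turns a flat, ordered list of headings into a tree. The `{ level, id, text, children }` node shape and the name `buildTocTree` are assumptions for illustration, not an existing Hugo API.

```javascript
// Hypothetical helper: build a ToC tree from headings in document order.
function buildTocTree(headings) {
  const root = { level: 0, children: [] }; // sentinel shallower than any heading
  const stack = [root]; // path from the root to the most recent node
  for (const h of headings) {
    const node = { level: h.level, id: h.id, text: h.text, children: [] };
    // Pop until the top of the stack is shallower than this heading.
    while (stack[stack.length - 1].level >= h.level) stack.pop();
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root.children;
}

const tree = buildTocTree([
  { level: 2, id: "introduction", text: "Introduction" },
  { level: 3, id: "details", text: "Details" },
  { level: 2, id: "at-a-glance", text: "At a Glance" },
]);
console.log(tree.map((n) => n.id)); // [ 'introduction', 'at-a-glance' ]
```

Nesting is decided by comparing each heading with the ones before it, not by counting down from `<h1>`. As a result, content that starts at `<h2>` gets `<h2>` entries at the top of the tree, with no empty level above them. A template could then walk the tree however it likes.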

ErjanGavalji commented Jun 23, 2016

Okay, for some reason, even though toc_levels is explicitly set to 1..6 both in the page and in the config file, kramdown does not generate ids.

I'm putting this on hold for the time being, because I'd prefer to test it with a newer Jekyll version (and dependencies), which would take some more time.

I found that the level-4 subsections are not that large, so we could get by without links there. What do you think, guys?

Cheers,
Erjan

helmbold commented Jul 21, 2016

I've written a little tool that removes the unnecessary level of nesting from the table of contents. You can run it, after Hugo has generated the contents in the "public" folder.

DavidCRivera commented Sep 12, 2016

I'm running into an issue that may be related. It seems that when a lower-level heading, i.e. an H6, appears in my content before an H5, a TOC is rendered at the start of my content even though I haven't included a .TableOfContents tag.

What I mean is, if my content looks like:

##### This is an H5 #####

###### This is an H6 ######

Then all is right with the world, and no TOC gets generated. But if it looks like this:

###### This is an H6 ######

##### This is an H5 #####

Then I get the garbled TOC HTML appearing arbitrarily at the start of my content. It doesn't even have the "TableOfContents" ID; it's just a naked NAV tag. Seems like a bug...

bep (Member) commented Feb 28, 2017

This issue has been automatically marked as stale because it has not been commented on for at least four months.

The resources of the Hugo team are limited, and so we are asking for your help.

If this is a bug and you can still reproduce this error on the master branch, please reply with all of the information you have about it in order to keep the issue open.

If this is a feature request, and you feel that it is still relevant and valuable, please tell us why.

This issue will automatically be closed in four months if no further activity occurs. Thank you for all your contributions.

bep added the Stale label Feb 28, 2017

bep (Member) commented Mar 1, 2017

Note/Update: This issue is marked as stale, and I may have said something earlier about "opening a thread on the discussion forum". Please don't.

If this is a bug and you can still reproduce this error on the latest release or the master branch, please reply with all of the information you have about it in order to keep the issue open.

If this is a feature request, and you feel that it is still relevant and valuable, please tell us why.

bep added Keep and removed Stale labels Mar 10, 2017

helmbold commented Mar 27, 2017

This issue is still unresolved.

thewebmastercom commented May 31, 2017

Any news on this? It appears to be still unresolved.

mikeblum commented Jul 9, 2017

I found this open issue when trying to figure out why Hugo's {{ .TableOfContents }} didn't seem to work properly and wasn't styleable. I created a partial that generates TOC trees based on header tags. The general philosophy is that TOCs need to be customisable, so I figured a partial would work best. This is more a way to render the headers within a .Content block than a formal data structure.

snippet from partials/table-of-contents.html:

<!-- ignore empty links with + -->
{{ $headers := findRE "<h[1-6].*?>(.|\n)+?</h[1-6]>" .Content }}
<!-- at least one header to link to -->
{{ $has_headers := ge (len $headers) 1 }}
<!-- a post must opt in to the Table of Contents with toc: true -->
{{ $show_toc := (eq $.Params.toc true) }}
{{ if and $has_headers $show_toc }}
<div class="table-of-contents toc bd-callout">
    <!-- TOC header -->
    <h4 class="text-muted">Table of Contents</h4>
    {{ range $headers }}
        {{ $header := . }}
        {{ range first 1 (findRE "<h[1-6]" $header 1) }}
            {{ range findRE "[1-6]" . 1 }}
                {{ $next_heading := (int .) }}
                <!-- generate li array of the proper depth -->
                {{ range seq $next_heading }}
                    <ul class="toc-h{{ . }}">
                {{end}}
                {{ $base := ($.Page.File.LogicalName) }}
                {{ $anchorId := ($header | plainify | htmlEscape | urlize) }}
                {{ $href := delimit (slice $base $anchorId) "#" | string }}
                <a href="{{ relref $.Page $href }}">
                    <li>{{ $header | plainify | htmlEscape }}</li>
                </a>
                <!-- close list -->
                {{ range seq $next_heading }}
                    </ul>
                {{end}}
            {{end}}
        {{end}}
    {{ end }}
</div>
{{ end }}

Here is how I have it working to render TOCs for Posts:

<div class="content">
  {{ partial "banner" . }}
  {{ partial "table-of-contents" . }}
  <!-- supports emoji -->
  {{ .Content | emojify }}
</div>

jirfag commented Oct 8, 2017

@mikeblum thank you for this snippet! I used it to make a Bootstrap-styled table of contents; my snippet is here

vassudanagunta (Contributor) commented Oct 9, 2017

Once we have #1778, we can more easily provide the TOC as a data structure, using access to the syntax tree, or perhaps writing a new renderer to build during parsing.

vassudanagunta (Contributor) commented Feb 1, 2018

Once we have #1778

I meant to say, once we have #3949 (Upgrade to Blackfriday v2)...

lb13 commented Feb 27, 2018

@mikeblum thank you!

alexislg2 commented May 12, 2018

@mikeblum thanks a lot!
I still have an issue with the anchor link. This ($header | plainify | htmlEscape | urlize) does not work in several cases. Examples:

Bonjour, ca va ? should return bonjour-ca-va but returns bonjour-ca-va- (note the hyphen at the end)

It also does not work with apostrophes: neither the href nor the title is correct. For example, let's go gives letamprsquos-go

I am not a go expert so I cannot help.

branw commented May 27, 2018

@alexislg2 The last two functions are incorrect. plainify returns a string that is already escaped, so we have to htmlUnescape it. Furthermore, the anchor ids follow Blackfriday's rules, which the anchorize function reproduces; urlize does not.

Here are the relevant changes to get the partial @mikeblum posted working correctly:

{{ $anchorId := ($header | plainify | htmlUnescape | anchorize) }}
{{ $href := delimit (slice $base $anchorId) "#" | string }}
<li><a href="{{ relref $.Page $href }}">
    {{ $header | plainify | htmlUnescape }}
</a></li>
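The order of the filters matters, and a minimal JavaScript sketch shows why. `decodeBasicEntities` and `anchorSlug` are hypothetical helpers, not Hugo's actual code. The slug rule keeps letters and digits (lowercased), collapses every other run of characters into a single hyphen, and never emits a leading or trailing hyphen. That rule is modeled on how Blackfriday's anchor names appear to behave. The entity table covers only a few entities, for illustration. The input string is an assumption about what plainify might return for a smart-quoted apostrophe.

```javascript
// Hypothetical helper: decode only the handful of entities this example needs.
function decodeBasicEntities(s) {
  const table = { "&amp;": "&", "&rsquo;": "\u2019", "&#39;": "'", "&quot;": '"' };
  return s.replace(/&(?:amp|rsquo|#39|quot);/g, (m) => table[m]);
}

// Hypothetical slug rule in the spirit of Blackfriday's anchor names:
// letters and digits are kept (lowercased); any other run becomes one "-",
// and no "-" is emitted at the start or the end.
function anchorSlug(text) {
  let out = "";
  let pendingDash = false;
  for (const ch of text) {
    if (/[\p{L}\p{N}]/u.test(ch)) {
      if (pendingDash && out.length > 0) out += "-";
      pendingDash = false;
      out += ch.toLowerCase();
    } else {
      pendingDash = true;
    }
  }
  return out;
}

const plain = "Let&rsquo;s go"; // assumed plainify output with a smart quote
console.log(anchorSlug(plain)); // let-rsquo-s-go (the entity leaks into the id)
console.log(anchorSlug(decodeBasicEntities(plain))); // let-s-go
console.log(anchorSlug("Bonjour, ca va ?")); // bonjour-ca-va (no trailing hyphen)
```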

skyzyx commented Sep 3, 2018

I implemented the toc as a partial using code from above, but the logic of the code produced markup that was invalid and not semantically sound. So I rewrote it like so:

https://gist.github.com/skyzyx/a796d66f6a124f057f3374eff0b3f99a

This version intentionally only looks for h2 through h4, because the page title is the h1 and everything else is h2 or below. I also chose to stop at h4 because, in my experience, the value to the reader beyond that is negligible.

Feel free to re-adjust the regexes if you want a broader spectrum of headers.

yihui added a commit to rbind/yihui that referenced this issue Sep 10, 2018

fix the TOC when headers in a post don't start from the first level (i.e. h1) but lower levels (h2, h3, ...)

it seems Hugo/Blackfriday don't want to fix it: gohugoio/hugo#1778

yihui (Contributor) commented Sep 10, 2018

In case anyone is interested, I just wrote a short JS script to remove the empty h1 level from the TOC so that it can start from h2 instead: https://github.com/rbind/yihui/blob/master/static/js/fix-toc.js One advantage of this solution is that it does not assume whether your TOC starts from h1 or h2.

You can include the script via something like <script src="/js/fix-toc.js"></script> after you put it under the static/js/ directory of your site.

Here is an example. If your eyes are quick enough, you can actually see the first <ul> in TOC quickly removed :)

VincentTam commented Sep 17, 2018

@skyzyx Thanks for sharing. I'm implementing this in my Hugo blog on GitLab under /pages/*/index.md, but it generates an unordered list of links pointing to /post/*/index.md. I'll probably end up with errors similar to those in my recent failed job.

@yihui Thanks for your JavaScript, even though it works only if there's more than one section. I've adapted your script to Beautiful Hugo and published it as a GitLab snippet.

// Copyright (c) 2017 Yihui Xie & 2018 Vincent Tam under MIT

(function() {
  var toc = document.getElementById('TableOfContents');
  if (!toc) return;
  do {
    var li, ul = toc.querySelector('ul');
    // stop when there is no list left or the outer <ul> has more than one item
    if (!ul || ul.childElementCount !== 1) break;
    li = ul.firstElementChild;
    if (li.tagName !== 'LI') break;
    // replace <ul><li>...</li></ul> with the <li>'s contents when the <ul> contains only one <li>
    ul.outerHTML = li.innerHTML;
  } while (toc.childElementCount >= 1);
})();

ryanwhocodes commented Sep 20, 2018

This works a treat @skyzyx 😃

Beej126 commented Sep 23, 2018

Here's another twist on collapsing empty heading levels, using Hugo's static generation instead of recurring JavaScript overhead... the gist is to use string replace to target the empty levels... there's no conditional looping in Hugo templates yet, so I just applied the basic approach three times, which covers all my markup scenarios.

Note that the newline pattern around the closing tags is slightly different from the one around the opening tags.

{{ $toc := .TableOfContents }}
{{ $toc := (replace $toc "<ul>\n<li>\n<ul>" "<ul>") }}
{{ $toc := (replace $toc "<ul>\n<li>\n<ul>" "<ul>") }}
{{ $toc := (replace $toc "<ul>\n<li>\n<ul>" "<ul>") }}
{{ $toc := (replace $toc "</ul></li>\n</ul>" "</ul>") }}
{{ $toc := (replace $toc "</ul></li>\n</ul>" "</ul>") }}
{{ $toc := (replace $toc "</ul></li>\n</ul>" "</ul>") }}
<!-- count the number of remaining li tags -->
<!-- and only display the ToC if there is more than one, otherwise why bother -->
{{ if gt (len (split $toc "<li>")) 2 }}
  {{ safeHTML $toc }}
{{ end }}

VincentTam commented Sep 24, 2018

@Beej126 Thanks for your code. 😄 That's much better than the JavaScript approach. I've tested this for my blog and it works perfectly.

ryanwhocodes commented Sep 24, 2018

@Beej126 @VincentTam Looks like there might be potential for a loop there given that the same commands are run three times each...

VincentTam commented Sep 24, 2018

@ryanwhocodes That's more elegant, but it won't save you any lines, because the beginning and the end of the loop take two lines.

Beej126 commented Sep 25, 2018

@ryanwhocodes - it seems like the ideal loop would be a conditional check on whether the last replace had any hits... but so far we only get "range"-style looping in Hugo, i.e. iteration over a finite list... which suggests a "split" approach to generate an array... but I couldn't think of a good pattern to split on that would be reliable with nested ul/li nodes... if you can see a good strategy, please suggest it.
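In a language with ordinary loops, "repeat until the last replace had no hits" is a fixed-point loop. Below is a minimal JavaScript sketch. It assumes the exact whitespace used in the replace patterns above, and `collapseEmptyLevels` is a hypothetical name. Unlike the unconditional replaces, it only unwraps a level when the first list really starts with a link-less `<li>`. It pairs that opening with the last occurrence of the closing pattern, which is assumed to belong to the same outer wrapper.

```javascript
// Whitespace patterns assumed from the template snippet above.
const OPEN = "<ul>\n<li>\n<ul>";
const CLOSE = "</ul></li>\n</ul>";

// Hypothetical helper: unwrap the outermost empty level repeatedly until a
// pass changes nothing (a fixed point).
function collapseEmptyLevels(toc) {
  let prev;
  do {
    prev = toc;
    const start = toc.indexOf("<ul>");
    // Only act when the first list starts with an <li> that has no link.
    if (start !== -1 && toc.startsWith(OPEN, start)) {
      const end = toc.lastIndexOf(CLOSE); // assumed to close the same wrapper
      if (end > start) {
        toc =
          toc.slice(0, start) +
          "<ul>" +
          toc.slice(start + OPEN.length, end) +
          "</ul>" +
          toc.slice(end + CLOSE.length);
      }
    }
  } while (toc !== prev);
  return toc;
}

const before =
  '<nav id="TableOfContents">\n<ul>\n<li>\n<ul>\n' +
  '<li><a href="#a">A</a></li>\n<li><a href="#b">B</a></li>\n' +
  "</ul></li>\n</ul>\n</nav>";
console.log(collapseEmptyLevels(before));
```

The pairing matters because the closing pattern `</ul></li>\n</ul>` could also appear at the end of a legitimate nested list (for example, when the last top-level entry has children). Replacing it on its own could then leave unbalanced tags when there is no empty level to remove.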

mithuns commented Oct 21, 2018

So, is this supported out of the box now, without modifying any partials? Can someone please post which property turns this on for a post's .md file?

skyzyx commented Oct 24, 2018

@helmbold, please no passive-aggressive comments. Nobody owes you (or anybody else) anything.

This is open-source software. If you want this feature so badly, why not offer to sponsor development with cash? Or contribute, yourself?

As a maintainer of a very popular piece of OSS software, I can speak first-hand about the difficulty of trying to handle development and support of OSS, on top of a daytime job + family time + time for myself.

Please, straighten out your perspective.

helmbold commented Oct 25, 2018

@skyzyx Yes, you're right! I've deleted my comment since I would not like to read such comments in my own project.

pyrrho commented Oct 29, 2018

The code that @mikeblum and @skyzyx provided got me pretty far, but I was bitten by a pathological case; one of my pages had numerous headings that contained the same text (Don't ask. It's... it's this whole big thing), and so generated the same id when passed through | plainify | htmlUnescape. The generated ToC only anchored to the first of these redundant headings.

Blackfriday has a workaround for this case already. It appends a counter to the end of the id for each identically-named heading, so rather than having five my-annoying-heading ids, it will generate my-annoying-heading, my-annoying-heading-1, my-annoying-heading-2, etc.

Long story short, I re-tooled the code already shared in this thread to extract the id from the headings rather than re-generate it from the contained text, and to be a bit more verbose about the sub-<ul> elements.

Hope it helps.
https://gist.github.com/pyrrho/1d77cdb98ba58c7547f2cdb3fb325c62

Edit [20 Nov]: @mikeblum's question about explicit heading IDs made me realize my code was deficient when the headings contained anything but plain text. I've expanded the code linked above with a substantially more complex test set and the ability to correctly translate Markdown syntax (e.g. **strong**, _em_, [links]()), HTML (e.g. <span style="color: red;">explicit blocks</span>), emoji, and the like into the generated <ul>.
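Below is a minimal JavaScript sketch of the same idea (not pyrrho's code). It reads the `id` that Blackfriday already wrote into each heading instead of re-deriving it from the heading text. `extractHeadings` is a hypothetical helper. A regular expression is only adequate here because the heading markup is machine-generated and predictable; this is not a general HTML parser.

```javascript
// Hypothetical helper: collect { level, id, text } for every <hN id="...">
// in generated HTML. The id is taken verbatim, so suffixes that were added
// for duplicate headings are preserved.
function extractHeadings(html) {
  const re = /<h([1-6])\b[^>]*\bid="([^"]*)"[^>]*>([\s\S]*?)<\/h\1>/g;
  const headings = [];
  let m;
  while ((m = re.exec(html)) !== null) {
    // Crude tag stripping for the label; entities such as &amp; are left as-is.
    const text = m[3].replace(/<[^>]*>/g, "").trim();
    headings.push({ level: Number(m[1]), id: m[2], text });
  }
  return headings;
}

const html =
  '<h2 id="headers">🚫 headers</h2>\n' +
  '<h2 id="headers-1">🚫 headers</h2>\n' +
  '<h3 id="details"><em>Some</em> details</h3>';
console.log(extractHeadings(html).map((h) => h.id)); // [ 'headers', 'headers-1', 'details' ]
```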

xenophenes commented Nov 5, 2018

@pyrrho Your solution has come the closest to working for me, but I run into the errors:

error calling partial: template: theme/partials/toc.html:28:53: executing "theme/partials/toc.html" at <after 1>: error calling after: no items left

I'm reading through & trying to understand the code and why "after 1" would be failing - any ideas?

pyrrho commented Nov 5, 2018

To answer your question immediately, @xenophenes: no, I have no idea what that message is suggesting. I'd love to dig into it and try to make this snippet more robust, though. I'd ask that we move that discussion to the gist, so we don't conflate the discussion in this issue with debugging back-and-forth, and so I have a record there of what broke and (hopefully) how it was fixed.

VincentTam added a commit to VincentTam/beautifulhugo that referenced this issue Nov 16, 2018

mikeblum commented Nov 18, 2018

After taking into account @branw's changes (thanks, by the way!) I've found some issues with how Hugo auto-generates the header ids in Blackfriday:

TOC:

<a href="/post/table-of-contents/#no-entry-sign-headers">
    🚫 headers
</a>

target header:

<h1 id="not-supported">🚫 headers</h1>

I tweaked @branw's fix to at least cosmetically support emoji:

{{ $base := ($.Page.File.LogicalName) }}
{{ $anchorId := ($header | plainify | htmlUnescape | anchorize) }}
{{ $href := delimit (slice $base $anchorId) "#" | string }}
<li>
  <a href="{{ relref $.Page $href }}">
    {{ $header | plainify | htmlUnescape | emojify }}
  </a>
</li>

and tried adding this to my config.toml:

[blackfriday]
  angledQuotes = true
  extensions = ["hardLineBreak"]
  fractions = false
  plainIDAnchors = true

but still no dice on supporting complex headers with UTF-8 nonsense in them. Is there a hook in the processing pipeline to create manual header ids? Ideally I think having the id be generated with

{{ $anchorId := ($header | plainify | htmlUnescape | anchorize) }}

would work nicely but I'm sure there are edge cases that that doesn't take into account.

pyrrho commented Nov 20, 2018

@mikeblum it looks to me like Blackfriday strips emoji (and other UTF-8 characters that aren't letters or digits) from the generated IDs, the same way it strips special ASCII characters (&, %, $, etc.):

Input Markdown

## 🚫 headers &&
## 🚫 headers &&
## 🚫 headers &&

Output HTML

<h2 id="headers">🚫 headers &amp;&amp;</h2>
<h2 id="headers-1">🚫 headers &amp;&amp;</h2>
<h2 id="headers-2">🚫 headers &amp;&amp;</h2>
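A ToC generator that re-derives ids from heading text has to reproduce this numbering, or it will link to the wrong heading. Below is a simplified JavaScript sketch that mimics the suffixes in the output above. The first occurrence keeps its id, and later ones get -1, -2, and so on. `uniqueIds` is a hypothetical helper, not Blackfriday's code. It does not handle a heading whose own slug already looks like "headers-1".

```javascript
// Hypothetical helper: suffix repeated ids the way the output above shows.
function uniqueIds(ids) {
  const seen = new Map(); // id -> number of times it has been suffixed
  return ids.map((id) => {
    if (!seen.has(id)) {
      seen.set(id, 0);
      return id;
    }
    const n = seen.get(id) + 1;
    seen.set(id, n);
    return `${id}-${n}`;
  });
}

console.log(uniqueIds(["headers", "headers", "headers"]));
// [ 'headers', 'headers-1', 'headers-2' ]
```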

By the way, there is extended Markdown syntax for explicitly setting a heading's id:

Input Markdown

## 🚫 headers {#customized-no-entry-sign-header}

Output HTML

<h2 id="customized-no-entry-sign-header">🚫 headers</h2>

pad92 added a commit to pad92/beautifulhugo that referenced this issue Nov 22, 2018
