New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable descriptive header IDs. #591

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
3 participants
@halostatue
Copy link
Contributor

halostatue commented Oct 29, 2014

Enable blackfriday.EXTENSION_AUTO_HEADER_IDS to generate the name of the header ID from the text in the header. Works for prefix and underline headers.

WARNING: This change is not backwards-compatible, so it should probably be put under a configuration option. That is, the existing behaviour of Hugo is to use #toc__n_ for header IDs. Now, a header will match its contents.

  # A Header

Under the old model, that would generate:

  <h1 id="#toc_1">A Header</h1>

Under the new model, that will generate:

  <h1 id="#a-header">A Header</h1>

The automatically generated TOC will use the correct values regardless, but it may be advisable to explicitly specify the old #toc__n_ model as part of the header:

  # A Header {#toc_1}

Or, we make a larger change that checks for this particular flag being enabled through a configuration option, defaulting to on. (Older blogs will need to make one change to their config.* file to maintain the current behaviour, and it could also potentially be on a per-page basis. Maybe.)

@spf13

This comment has been minimized.

Copy link
Contributor

spf13 commented Oct 29, 2014

This is a great change. Thanks!

It's failing the ci because there's a few tests that check to make sure the output matches and are looking for "toc". Can you look into that before I merge this?

@halostatue

This comment has been minimized.

Copy link
Contributor

halostatue commented Oct 30, 2014

Certainly. Should I try to put this behaviour behind a flag, or do we accept that this is a breaking change, otherwise? (It’ll affect the tests—and probably a number of bits in the documentation, it looks like.)

@spf13

This comment has been minimized.

Copy link
Contributor

spf13 commented Oct 30, 2014

I'm fine with the breaking change.  I can't imagine anyone ever would prefer the prior behavior. 

@halostatue

This comment has been minimized.

Copy link
Contributor

halostatue commented Oct 30, 2014

Then, I’ve also fixed the TOC extraction as well. I’m going to merge all of these fixes into a single commit and adjust the description.

@halostatue

This comment has been minimized.

Copy link
Contributor

halostatue commented Oct 30, 2014

As a side note, this does introduce a small problem on list pages, because all markdown headers that are rendered on the list page will have IDs, and if there are headers that have the same text, multiple headers will result in the same ID.

@halostatue halostatue force-pushed the halostatue:auto-header-ids branch Oct 30, 2014

@bep

This comment has been minimized.

Copy link
Member

bep commented Oct 30, 2014

Duplicate ids on a page is no-go, in my eyes (and many will shout louder). But there must be a way around it?

BTW, I really like the Blackfriday patches you are pulling out of your sleeve.

@halostatue

This comment has been minimized.

Copy link
Contributor

halostatue commented Oct 30, 2014

The duplicate IDs problem is going to be hard to solve, because Blackfriday doesn’t really keep any tabs on separate renderings. That is, I could produce a further patch to Blackfriday that would do the right thing given:

# My Heading
# My Heading

Such that it would be #my-heading and #my-heading-1 (and maybe even have an option to warn about that). What will be harder is the case where Hugo has two files with # My Heading. They are always going to be parsed separately. A list view that renders the full content will show both <h1 id="my-heading>My Heading</h1> values…and unless we introduce (to Blackfriday) a prefix mechanism like I created in Hugo for the footnotes {{ .UniqueId }}, that won’t be possible to fix. Worse, if we do add that functionality to Blackfriday and use it in Hugo, we now can't just reference [link](#my-heading), we have to do [link](#{{ .UniqueId }}my-heading).

In other words, it’s doable, but it has just gone from nice to ugly. We could go a bit further with Hugo and add a functions to help: [link]({{ .InnerLink "my-heading" }}), but that’s not as memorable, and it still means that the fragment reference now looks something like #0123456789abcdef0123my-heading instead of #my-heading (or maybe it looks like #my-heading-0123456789abcdef0123, which doesn’t look as bad, and would be a suffix feature for Blackfriday).

I can see the value of .InnerLink, especially if we also had crossLink function, used something like [link](crosslink about.md). I have a shortcode that does this for me, so I know it’s feasible (I call it x). What follows is formatted for readability (in use, it has to be a single line to look anything remotely good):

{{ $ref := .Get "ref" }}
{{ $title := .Get "title" }}
{{ $inner := .Inner }}
{{ range first 1 (where .Page.Node.Site.Pages "Name" $ref) }}
  {{ if len $title }}
    <a href="{{ .Permalink }}">{{ $title }}</a>
  {{ else }}
    {{ if len $inner }}
      <a href="{{ .Permalink }}">{{ $inner }}</a>
    {{ else }}
      <a href="{{ .Permalink }}">{{ .Title }}</a>
    {{ end }}
  {{ end }}
{{ end }}

I can see about working up a patch to blackfriday that covers the duplicated header in a single file issue—that shouldn’t be too hard (it means maintaining a bit of extra state, but nothing fancy). I’m a bit less sanguine about adding a prefix or suffix for header generation, although that wouldn’t be too hard, either. My problem with that is what it ends up doing to the Hugo interface.

@bep

This comment has been minimized.

Copy link
Member

bep commented Oct 30, 2014

I see your points and troubles; I guess this is a problem best solved with a good night sleep :-)

@halostatue

This comment has been minimized.

Copy link
Contributor

halostatue commented Nov 1, 2014

@bjornerik Any thoughts on how we should resolve this? I’m going to work on a blackfriday patch to fix the in-document duplicate problem, but the cross-document duplicate is 100% a Hugo problem that will require assistance from blackfriday to solve.

I don’t think that the cross-document issue a big deal, to be honest, because:

  1. It only affects list views, and
  2. Most themes don’t show a TOC for list pages.
@bep

This comment has been minimized.

Copy link
Member

bep commented Nov 1, 2014

@halostatue One thought I had was the possibility to remove the ids on the list pages?

The problem isn't really the TOC on the list pages, but the duplicate HTML identities. The browers are fine about it (I guess), but validation purists will be annoyed when "half" of their pages go red in the validator.

@halostatue

This comment has been minimized.

Copy link
Contributor

halostatue commented Nov 1, 2014

I haven’t gotten deep enough into Hugo’s rendering code to know how well it knows the difference between list and single rendering to know whether it could do this. @spf13, do you know whether this is the case? (We could probably get away with it for Summary in any case, stripping any id in a <h{1-6}> view.)

Browsers are fine about it—people write bad HTML all the time, and they still mostly work. The behaviour in a browser, as far as I can tell, is to ignore the existence of all duplicate IDs after the first.

@bep

This comment has been minimized.

Copy link
Member

bep commented Nov 1, 2014

From http://programmers.stackexchange.com/questions/127178/two-html-elements-with-same-id-attribute-how-bad-is-it-really:

HTML 4.01 specification says ID must be document-wide unique.

HTML 5 specification says the same thing but in other words. It says that ID must be unique in its home subtree which is basically the document if we read the definition of it.

Again, I believe most validators will complain.

@halostatue

This comment has been minimized.

Copy link
Contributor

halostatue commented Nov 1, 2014

I’m not disagreeing with you; I’m just saying that browsers have to deal with that all the time because of hand-written HTML and bad tutorials. You’re more likely to run into problems with jQuery & the like because of this.

I’m certain that I can strip out header IDs if I know that it’s a list context.

@halostatue

This comment has been minimized.

Copy link
Contributor

halostatue commented Nov 2, 2014

With respect to the first issue, I have opened russross/blackfriday#126 with one implementation and a suggestion for another implementation that I consider problematic (I’ve explained in the commit and pull-request).

I’m leaning toward your suggestion of stripping any IDs in <hn> flags rather than implementing a prefix or suffix mechanism. (I looked at doing that in blackfriday, but I wasn’t sure if such a mechanism should apply only to automatically generated IDs or to all IDs. If it doesn’t apply to all IDs, then explicit IDs have the same sort of collision issue across documents—and that just feels too “magic” to me.

@spf13

This comment has been minimized.

Copy link
Contributor

spf13 commented Nov 2, 2014

I really love this functionality, but also agree with the points made here on both sides. I'm going to wait to merge this until we've solved the duplicate ID issue. That can be in this PR or in another, but I think we should wait for that fix before adding this.

@spf13

This comment has been minimized.

Copy link
Contributor

spf13 commented Nov 2, 2014

@halostatue Hugo isn't setup to handle things very differently between the list and single renderings. I think this is best implemented with a post rendering filter (but I could be wrong).

@halostatue

This comment has been minimized.

Copy link
Contributor

halostatue commented Nov 2, 2014

So… something like {{ .Content | stripHeaderIds }}, @spf13? If that’s the way, I’ll figure out how to do that sometime tomorrow.

@spf13

This comment has been minimized.

Copy link
Contributor

spf13 commented Nov 2, 2014

I was thinking more like a transformer... see https://github.com/spf13/hugo/tree/master/transform

An example of one is https://github.com/spf13/hugo/blob/master/transform/livereloadinject.go

That said, I haven't thought through it very much, but I think it would probably be the best place for it. The transform subsystem was designed for post processing actions like this.

One issue there though is that they operate on all posts, not just a set of them. I was thinking of it being an automatic thing where it checks to see for duplicate Ids and appends something to the dupes.... perhaps the post title for instance.

While the stripHeaderIds as a template filter has some merit, I typically avoid any option which requires user action without decision. In this case 100% of users want to avoid duplicate ids. There isn't a decision to make so we shouldn't require them to do something. I also think it's an obscure enough thing that most won't take the time to learn about it.

Another idea is that for all listings we append the post title to the ids... not sure where we would implement that though.

@halostatue

This comment has been minimized.

Copy link
Contributor

halostatue commented Nov 2, 2014

A list’s top-level . is a &hugo.Node, whereas a post’s top-level . is a &hugo.Page. If we could execute the sort of transform you’re talking about only on non-Page nodes, then I think we can do something smart.

As I said in russross/blackfriday#126, I think that the naive per-page approach in blackfriday is probably sufficient (most markdown documents are fairly short and unique). If we spat out a warning if we recognized a duplicate header (perhaps by reading the TOC) on a page, we might be able to even not need the naive per-page approach added in the blackfriday PR.

In Hugo, we only need to really worry about multiple-document single-rendering duplication (e.g., a Hugo list context), and that’s maybe the point of the transform you’ve suggested, where we either strip the IDs (which I think is more useful in a list context) or we modify the IDs to include the Page.UniqueId on Node-driven pages (maybe harder?).

(As an aside, I have an idea for putting the TOC into a map[string]interface{} structure in any case, because in my use of the documentation I always start with h2-type headers because the Page.Title is rendered as an h1 header—and I want to be able to strip out that empty first layer that blackfriday generates.)

@halostatue halostatue force-pushed the halostatue:auto-header-ids branch 2 times, most recently Nov 5, 2014

@spf13

This comment has been minimized.

Copy link
Contributor

spf13 commented Nov 13, 2014

@halostatue Where does this PR stand?

@halostatue

This comment has been minimized.

Copy link
Contributor

halostatue commented Nov 13, 2014

Pretty much where it stood before. I want to try to push forward, but I don't know how to fix the duplicate ID problem (and I haven't moved on the blackfriday PR, either).

I’ll probably have time to look at it next week during RubyConf.

@bep

This comment has been minimized.

Copy link
Member

bep commented Nov 14, 2014

@halostatue you will have plenty of time waiting for the gems to download :-)

@halostatue

This comment has been minimized.

Copy link
Contributor

halostatue commented Nov 16, 2014

I’ve got a writeup on my blog.

Basically:

  1. I’m going to revisit the blackfriday implementation to have the smarter collision detection I described, except that I’m going to apply it to all header IDs, not just the automatically generated ones. (I may also shift the uniqueness detection to the HTML renderer rather than the parser, as it’s the HTML result that cares about the uniqueness more than other renderers.)
  2. I’m going to also add a blackfriday request to support a header ID suffix. This will be applied to all header IDs, generated or otherwise (including the automatically created #toc_* headers from the table of contents support).
  3. I’m going to revisit this when that is done to append our file ID as the header ID suffix, just as I did for footnote uniqueness.
  4. I’m going to move forward with the {{ .Ref }} and {{ .RelRef }} functions that I suggested on discuss, because no one can add fragment links without it.

It’s imperfect, but I don’t really see a better way to do it.

@halostatue halostatue force-pushed the halostatue:auto-header-ids branch Nov 23, 2014

Enable descriptive header IDs.
Enable blackfriday.EXTENSION_AUTO_HEADER_IDS to generate the name of the
header ID from the text in the header. Works for prefix and underline
headers.

- TOC extraction had to be modified to look for `<li><a href="#`>
  instead of `#toc_` because of this change.
- Fixed a number of tests that depended on the presence of `toc_` with
  as an `id` or as a `href` value.
- Renames the earlier parameter `footnoteref` to `documentId` as it more
  accurately represents the nature of the parameter. The `documentId` is
  appended to all generated headers through the new HTML renderer
  parameter `HeaderIDSuffix`.

@halostatue halostatue force-pushed the halostatue:auto-header-ids branch to af8b1c5 Nov 24, 2014

@halostatue

This comment has been minimized.

Copy link
Contributor

halostatue commented Nov 24, 2014

  • I have just updated russross/blackfriday#126 with the more robust internal document collision detection. I expect to add a PR for header ID prefixes and suffixes tomorrow. When all of the checkbox items in my earlier comment are complete, this should be ready for merge.
  • I have added russross/blackfriday#129 (which incorporates the new version of russross/blackfriday#126). I’ll be making modifications to this PR to add the file ID as a suffix. To test this yourself, make sure your locally cached version of russross/blackfriday has my repo (halostatue/blackfriday) as a remote and use the header-prefixes-and-suffixes branch before doing go build or go install for Hugo.
  • I have refreshed this code to use these new features, on the assumption that they will be merged.
@halostatue

This comment has been minimized.

Copy link
Contributor

halostatue commented Nov 24, 2014

Note that this will now fail builds until russross/blackfriday is updated with my two pending PRs.

@halostatue

This comment has been minimized.

Copy link
Contributor

halostatue commented Nov 24, 2014

These were merged. @spf13, can you kick build 765 on Travis to rebuild?

@spf13

This comment has been minimized.

Copy link
Contributor

spf13 commented Nov 24, 2014

Merged as 8f9cea7

@spf13 spf13 closed this Nov 24, 2014

@halostatue

This comment has been minimized.

Copy link
Contributor

halostatue commented Nov 25, 2014

Awesome, thanks. I am finishing the last part, which is .Rel and .RelRef with equivalent shortcodes.

@bjornerik, this requires some tweaking of your wrapper code because of where the ref shortcode is used.

@bep

This comment has been minimized.

Copy link
Member

bep commented Nov 25, 2014

@halostatue no problem - looks like I'll have to wait for the REAL Black Friday before I'll get to do stuff with that one.

@halostatue halostatue deleted the halostatue:auto-header-ids branch Nov 25, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment