Fix anchor mess #2057

Closed
bep opened this Issue Apr 7, 2016 · 16 comments

Comments

Projects
None yet
4 participants
@bep
Member

bep commented Apr 7, 2016

See #2056

This will be a breaking change (well, not as in glass breaking), but I think this will make more sense to most users. Duplicate ids are the special case here, not the other way around.

This is a fairly common support question, too -- so If noone protest, I'm just doing it.

/cc @spf13 @halostatue

@bep bep added the Enhancement label Apr 7, 2016

@bep bep self-assigned this Apr 7, 2016

@bep bep added this to the v0.16 milestone Apr 7, 2016

@halostatue

This comment has been minimized.

Show comment
Hide comment
@halostatue

halostatue Apr 7, 2016

Contributor

If this change is made, we probably need to make it so that plain ID anchors are not used by default on index pages (rather than actual pages). It’s when you might have the body of multiple posts in place that you should have unique anchors.

Contributor

halostatue commented Apr 7, 2016

If this change is made, we probably need to make it so that plain ID anchors are not used by default on index pages (rather than actual pages). It’s when you might have the body of multiple posts in place that you should have unique anchors.

@halostatue

This comment has been minimized.

Show comment
Hide comment
@halostatue

halostatue Apr 7, 2016

Contributor

Also, this should be shouted really loudly in the release notes—someone who expects the current behaviour needs to know to change the default back.

Contributor

halostatue commented Apr 7, 2016

Also, this should be shouted really loudly in the release notes—someone who expects the current behaviour needs to know to change the default back.

@bep

This comment has been minimized.

Show comment
Hide comment
@bep

bep Apr 7, 2016

Member

If this change is made, we probably need to make it so that plain ID anchors are not used by default on index pages (rather than actual page)

Yes, that would be nice, but that would mean double-rendering of the content page, and I'm reluctant to go that path. Lets think a little on this; I agree about the big bold notice in release notes, I think the sites that needs this are pretty rare.

Member

bep commented Apr 7, 2016

If this change is made, we probably need to make it so that plain ID anchors are not used by default on index pages (rather than actual page)

Yes, that would be nice, but that would mean double-rendering of the content page, and I'm reluctant to go that path. Lets think a little on this; I agree about the big bold notice in release notes, I think the sites that needs this are pretty rare.

@octref

This comment has been minimized.

Show comment
Hide comment
@octref

octref Apr 10, 2016

Contributor

I don't think anyone would use footnotes in non-post pages.
Not having crazy URL for footnote is my preference but I feel it's a more sensible default.

Contributor

octref commented Apr 10, 2016

I don't think anyone would use footnotes in non-post pages.
Not having crazy URL for footnote is my preference but I feel it's a more sensible default.

@halostatue

This comment has been minimized.

Show comment
Hide comment
@halostatue

halostatue Apr 10, 2016

Contributor

@octref You would be wrong. The reason we implemented the hashed anchors for footnotes & other anchors is specifically because people (18 months ago) were saying that their homepage wasn’t validating because they had multiple anchors and footnote anchors showing up on their homepage.

If you use numeric footnotes, and your main page renders multiple posts in full (which was, and probably still is, the default of the default Hugo template, although my site template does not do that), then you will have conflicting anchors and you probably want to have these uniquely generated anchors. The point is that aside from controlling how many posts render in full (or even render in part if there is a footnote before the break)…this may happen even for sites you don’t think it will happen for.

I agree with @bep that this is probably the right change, and that it will confuse people who don’t have this problem, but this code was introduced to fix a very specific problem that was pointed out. (And, BTW, it happens even without footnotes if you use headings so that the anchors to headings are repeated across posts.)

I don’t see a really clean way to fix this. If we’re willing to go “unclean”, we can consider always generating with the unique footnotes, but doing a post-processing step on the render of those anchors and links before rendering them to a Page node (e.g., the equivalent of output.gsub(/(#fnref:)[0-9a-f]{32}:(\d+)/, '\1\2'), but it would be more complex than that, of course).

Contributor

halostatue commented Apr 10, 2016

@octref You would be wrong. The reason we implemented the hashed anchors for footnotes & other anchors is specifically because people (18 months ago) were saying that their homepage wasn’t validating because they had multiple anchors and footnote anchors showing up on their homepage.

If you use numeric footnotes, and your main page renders multiple posts in full (which was, and probably still is, the default of the default Hugo template, although my site template does not do that), then you will have conflicting anchors and you probably want to have these uniquely generated anchors. The point is that aside from controlling how many posts render in full (or even render in part if there is a footnote before the break)…this may happen even for sites you don’t think it will happen for.

I agree with @bep that this is probably the right change, and that it will confuse people who don’t have this problem, but this code was introduced to fix a very specific problem that was pointed out. (And, BTW, it happens even without footnotes if you use headings so that the anchors to headings are repeated across posts.)

I don’t see a really clean way to fix this. If we’re willing to go “unclean”, we can consider always generating with the unique footnotes, but doing a post-processing step on the render of those anchors and links before rendering them to a Page node (e.g., the equivalent of output.gsub(/(#fnref:)[0-9a-f]{32}:(\d+)/, '\1\2'), but it would be more complex than that, of course).

@octref

This comment has been minimized.

Show comment
Hide comment
@octref

octref Apr 10, 2016

Contributor

@halostatue
I didn't even find a page in the showcase which renders posts in full in homepage.

As you said, that is a specific need, so it would probably be better to let those with specific needs (< 5%) set the option to get the specific behavior they want.

Contributor

octref commented Apr 10, 2016

@halostatue
I didn't even find a page in the showcase which renders posts in full in homepage.

As you said, that is a specific need, so it would probably be better to let those with specific needs (< 5%) set the option to get the specific behavior they want.

@halostatue

This comment has been minimized.

Show comment
Hide comment
@halostatue

halostatue Apr 10, 2016

Contributor

When this was made (~ 0.12 => 0.13), the default Hugo theme rendered every post in full on the homepage. I have no clue if it does anymore, because I don’t use the default Hugo theme for my site and haven’t had to worry about it for a while.

Also, although my site isn’t in the showcase (lots of reasons), I render the first post in full on the homepage, then the next 9 posts in part, and then the rest of the posts as title-only. So…even my site could result in the problem described if I had two posts that I used numeric footnotes on before the break.

Contributor

halostatue commented Apr 10, 2016

When this was made (~ 0.12 => 0.13), the default Hugo theme rendered every post in full on the homepage. I have no clue if it does anymore, because I don’t use the default Hugo theme for my site and haven’t had to worry about it for a while.

Also, although my site isn’t in the showcase (lots of reasons), I render the first post in full on the homepage, then the next 9 posts in part, and then the rest of the posts as title-only. So…even my site could result in the problem described if I had two posts that I used numeric footnotes on before the break.

@bep

This comment has been minimized.

Show comment
Hide comment
@bep

bep Apr 10, 2016

Member

The alternative would be to add a page ID that is "less ugly" (a counter) ... That is maybe better.

Member

bep commented Apr 10, 2016

The alternative would be to add a page ID that is "less ugly" (a counter) ... That is maybe better.

@halostatue

This comment has been minimized.

Show comment
Hide comment
@halostatue

halostatue Apr 11, 2016

Contributor

A counter incremented for each page rendered? Could be interesting.

Contributor

halostatue commented Apr 11, 2016

A counter incremented for each page rendered? Could be interesting.

@bep

This comment has been minimized.

Show comment
Hide comment
@bep

bep Apr 11, 2016

Member

A counter incremented for each page rendered? Could be interesting.

I thought about adding it to Node (so every page, home page etc. have it) -- in normal cases it should be a pretty low integer. It would be useful in other situations too, I guess.

Member

bep commented Apr 11, 2016

A counter incremented for each page rendered? Could be interesting.

I thought about adding it to Node (so every page, home page etc. have it) -- in normal cases it should be a pretty low integer. It would be useful in other situations too, I guess.

@bep bep referenced this issue Apr 11, 2016

Closed

Add Node.ID #2071

@bep bep changed the title from Make plainIDAnchors default true to Use Node.ID for anchor ID Apr 11, 2016

@bep bep closed this in cd55895 Apr 11, 2016

@halostatue

This comment has been minimized.

Show comment
Hide comment
@halostatue

halostatue Apr 11, 2016

Contributor

One problem: this is a breaking change that causes potentially stored URLs to no longer be valid. It’s not a big deal because it’s just related to anchors within a page, but it’s worth noting.

Contributor

halostatue commented Apr 11, 2016

One problem: this is a breaking change that causes potentially stored URLs to no longer be valid. It’s not a big deal because it’s just related to anchors within a page, but it’s worth noting.

@bep

This comment has been minimized.

Show comment
Hide comment
@bep

bep Apr 11, 2016

Member

The "not longer valid" doesn't concern me too much; but it would nice if the anchor links were bookmarkable -- I have broken that, so reopen this for some rethinking. Look for an id that is stable AND looks good.

Member

bep commented Apr 11, 2016

The "not longer valid" doesn't concern me too much; but it would nice if the anchor links were bookmarkable -- I have broken that, so reopen this for some rethinking. Look for an id that is stable AND looks good.

@bep bep reopened this Apr 11, 2016

@halostatue

This comment has been minimized.

Show comment
Hide comment
@halostatue

halostatue Apr 12, 2016

Contributor

The nice thing about the md5 is that it is based on the path to the file; it is 100% stable (maybe—I don’t recall off the top of my head whether it uses the partial path or the full path). I would suggest a couple of changes (conceptually; I don’t have time to work on the code right now, sorry):

  1. Go back to a string representation of the DocumentID.
  2. As requested, make plainIdAnchors = true by default and announce it loudly because of the backwards incompatibility involved.
  3. Add another configuration option, uniqueIdFormatter. This would default to md5(the current representation). It could also be md5-base64 (this would shrink the representation from 32 to 24 characters). There may be other options as well. We could probably get away with a CRC32 representation of the path, but that is still 10 characters.
Contributor

halostatue commented Apr 12, 2016

The nice thing about the md5 is that it is based on the path to the file; it is 100% stable (maybe—I don’t recall off the top of my head whether it uses the partial path or the full path). I would suggest a couple of changes (conceptually; I don’t have time to work on the code right now, sorry):

  1. Go back to a string representation of the DocumentID.
  2. As requested, make plainIdAnchors = true by default and announce it loudly because of the backwards incompatibility involved.
  3. Add another configuration option, uniqueIdFormatter. This would default to md5(the current representation). It could also be md5-base64 (this would shrink the representation from 32 to 24 characters). There may be other options as well. We could probably get away with a CRC32 representation of the path, but that is still 10 characters.
@bep

This comment has been minimized.

Show comment
Hide comment
@bep

bep Apr 12, 2016

Member

This whole relref anchor stuff was a mistake to pull in in the first place, esp. since the author "doesn't have time" to follow up on it. But I guess it is to late to pull all of it out.

Member

bep commented Apr 12, 2016

This whole relref anchor stuff was a mistake to pull in in the first place, esp. since the author "doesn't have time" to follow up on it. But I guess it is to late to pull all of it out.

@bep bep changed the title from Use Node.ID for anchor ID to Fix anchor mess Apr 12, 2016

@bep bep closed this in 06772ee Apr 12, 2016

@halostatue

This comment has been minimized.

Show comment
Hide comment
@halostatue

halostatue Apr 12, 2016

Contributor

I’ll be happy to look at the changes that I outlined in the future, but I am completely swamped with the day job right now. I do miss working on Hugo.

Contributor

halostatue commented Apr 12, 2016

I’ll be happy to look at the changes that I outlined in the future, but I am completely swamped with the day job right now. I do miss working on Hugo.

@anthonyfok

This comment has been minimized.

Show comment
Hide comment
@anthonyfok

anthonyfok Apr 30, 2016

Member

The alternative would be to add a page ID that is "less ugly" (a counter) ... That is maybe better.

A counter incremented for each page rendered? Could be interesting.

I thought about adding it to Node (so every page, home page etc. have it) -- in normal cases it should be a pretty low integer. It would be useful in other situations too, I guess.

While working on #1970, I just discovered that mmark (originally forked from Blackfriday) has implemented EXTENSION_UNIQUE_HEADER_IDS with the comment:

// When detecting identical anchors add a sequence number -1, -2 etc."

So, probably another way to deal with this (now closed) issue is to port this new mmark feature back to Blackfriday.

Member

anthonyfok commented Apr 30, 2016

The alternative would be to add a page ID that is "less ugly" (a counter) ... That is maybe better.

A counter incremented for each page rendered? Could be interesting.

I thought about adding it to Node (so every page, home page etc. have it) -- in normal cases it should be a pretty low integer. It would be useful in other situations too, I guess.

While working on #1970, I just discovered that mmark (originally forked from Blackfriday) has implemented EXTENSION_UNIQUE_HEADER_IDS with the comment:

// When detecting identical anchors add a sequence number -1, -2 etc."

So, probably another way to deal with this (now closed) issue is to port this new mmark feature back to Blackfriday.

tychoish added a commit to tychoish/hugo that referenced this issue Aug 13, 2017

tychoish added a commit to tychoish/hugo that referenced this issue Aug 13, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment