Incomplete auto heading ID for extended Latin and CJK characters #6616
What version of Hugo are you using (
Hey, and thanks for investigating this issue. I saw some ID diffs when testing this, but I kind of closed my eyes as there was so much to do...
I also forgot that we have an
What would be great is if you could:
It would be good if you also stated in your README that this "id format" is Blackfriday compatible, so to speak.
@bep : If possible, I'd like to keep the code intact. I think the implementation is functionally equivalent to the algorithm used in Blackfriday (and in the other library mentioned in its package documentation; I'm not sure which one is the original).
Could you provide counterexample strings or reasons (performance, etc.) why I should replace the code with Blackfriday's?
As you are probably aware, the logic in both Blackfriday and the extension code is very basic and cannot satisfy more serious needs (e.g. both turn c++ into c). My hope is that somebody adds real extended features, such as the slugify I mentioned in the forum.
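To illustrate the kind of basic logic being discussed, here is a minimal Go sketch. It is an assumption for illustration only, not the code of Blackfriday, goldmark, or Hugo, and the name `basicAnchor` is hypothetical. It keeps letters and digits (lowercased) and collapses every other run of characters into a single hyphen, which is why `c++` ends up as `c`.

```go
package main

import (
	"fmt"
	"unicode"
)

// basicAnchor is a hypothetical helper sketching a very basic heading-ID scheme:
// letters and digits are kept (lowercased); any run of other characters becomes
// one hyphen, and leading/trailing runs are dropped entirely.
func basicAnchor(text string) string {
	var out []rune
	pendingDash := false // set when a non-alphanumeric run has been seen
	for _, r := range text {
		if unicode.IsLetter(r) || unicode.IsNumber(r) {
			// Emit the separator only between two kept characters.
			if pendingDash && len(out) > 0 {
				out = append(out, '-')
			}
			pendingDash = false
			out = append(out, unicode.ToLower(r))
		} else {
			pendingDash = true
		}
	}
	return string(out)
}

func main() {
	for _, s := range []string{"c++", "Hello, World!", "Don't stop."} {
		fmt.Printf("%q -> %q\n", s, basicAnchor(s))
	}
}
```

Because `unicode.IsLetter` also accepts extended Latin and CJK characters, a scheme like this would keep them, but it discards all punctuation, including periods and apostrophes.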
My point: for any regressions (that are being reported to goldmark, such as incomplete SmartyPants processing), should we borrow code directly from Blackfriday?
And for your second request,
The thing is, that would take time for me, so in reality, we wouldn't know until someone complains. I understand if you don't want to pull someone else's code into your project, it was just a suggestion.
@bep : I understand your concern. If the auto heading ID and the anchorize function must produce the same output, the two should rely on the same logic.
Judging from the commit, the "other library" is the original and Blackfriday used its code.
So, how would we go about this? Shall we import the original library or copy its code to make an exported function?
I'll follow your suggestion in general, as you're the expert in Golang and Hugo.
@bep , many thanks for implementing the ID generation code to resolve this issue. I've tested with the latest
However, it appears that there are some loose ends regarding the other issue below:
In previous versions of Hugo, periods and apostrophes were preserved. You can test with the following examples:
There seems to be no set rule for how to handle punctuation, but it would be ideal if the new code can behave on a par with Blackfriday's, so that all the existing outgoing links to anchors in Hugo sites can remain accessible.
P.S. It's regrettable that I've made the subject of this issue too narrow to highlight the punctuation issue.
This issue is closed. If you still think there are things to do here, raise another issue.
One quick note: Hugo's main strategy in this department is to create GitHub compatible anchor links. I didn't find their spec/implementation (I did not look too hard), but I'm pretty sure that the implementation is correct.
So with that in mind, if you find some cases where the anchors differ from GitHub, then I'll look at it.
This situation isn't ideal. I can add a note that I did ask for the Blackfriday implementation (and tests) in a PR related to this, but didn't get much response, so I implemented my own. I may consider adding "Blackfriday anchors" as an option, but that needs some real convincing for me to spend time on.
If you really want to help this situation, go get some movement in https://talk.commonmark.org/t/anchors-in-markdown/247 -- where people have been washing their hands for the last six years. Which is why we have this situation with many different auto ID implementations, some that support the
Hugo has now taken a stand in this: GitHub is a big player and has set the standard in other Markdown related issues (tables etc.), so it is natural to also follow their lead in anchor IDs.