Strip all HTML when getting the title from the first H1 tag #3564

oprypin · 2024-02-04T21:26:55Z

Not stripping it was a bug, and also inconsistent with how ToC titles are extracted.

per latest status #3357 (comment)
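To make the intended behavior concrete, here is a minimal sketch (not MkDocs's actual code) of extracting a plain-text page title from the first `<h1>` using only Python's standard `html.parser`. Every tag inside the heading, such as `<code>` or `<em>`, is dropped and only its text survives. The helper names `_FirstH1Text` and `title_from_first_h1` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _FirstH1Text(HTMLParser):
    """Collects the text inside the first <h1> element, ignoring every tag."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self.parts: List[str] = []
        self.found = False    # an <h1> was seen at all
        self._inside = False  # currently inside the first <h1>
        self._done = False    # the first <h1> has already been closed

    def handle_starttag(self, tag: str, attrs) -> None:
        if tag == "h1" and not self._done and not self._inside:
            self._inside = True
            self.found = True

    def handle_endtag(self, tag: str) -> None:
        if tag == "h1" and self._inside:
            self._inside = False
            self._done = True

    def handle_data(self, data: str) -> None:
        # Only text nodes are kept; nested tags themselves never reach this list.
        if self._inside:
            self.parts.append(data)


def title_from_first_h1(html: str) -> Optional[str]:
    """Hypothetical helper: plain-text title of the first <h1>, or None if there is none."""
    parser = _FirstH1Text()
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    # Collapse whitespace left behind where tags were removed.
    return " ".join("".join(parser.parts).split())


print(title_from_first_h1('<h1 id="x">Using <code>mkdocs.yml</code> &amp; <em>themes</em></h1><p>Body</p>'))
# prints: Using mkdocs.yml & themes
```

Stripping everything, rather than an allow-list of "safe" tags, keeps the page title consistent with plain-text ToC entries.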


pawamoy · 2024-02-05T13:27:10Z

Tested locally, no issue detected (mkdocstrings and Material for MkDocs).

ofek · 2024-02-08T18:19:05Z

Nice!

waylan · 2024-02-09T17:04:59Z

I have been working on Python-Markdown/markdown#1441 in my spare time over the last few days and have not been following any updates on GitHub. After creating my PR, I saw this for the first time. It appears that my changes there would directly affect this here. Sorry if I've thrown a wrench in the works.

For example, I removed the markdown.extensions.toc.stashedHTML2text function, which would break the changes here. Instead, I am running all postprocessors (which was removed in this change). I felt this was necessary to correctly handle some third-party markup (such as the output of the emoji extension). I still need to flesh out some new tests for my changes, but I have verified that the existing tests (for the Python-Markdown lib) all pass.

Of note here, I added a new attribute to the toc_tokens that contains the rich-text HTML content of each heading along with the associated TOC data. My thinking was that MkDocs could use this new attribute without needing to parse or step through the entire document. As Markdown now ensures the content is fully rendered, MkDocs doesn't need to dip into Markdown's internals to do that itself.
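As an illustration of that idea, the sketch below finds the title by reading only the TOC tokens instead of walking the rendered document. It assumes tokens shaped like Python-Markdown's existing toc_tokens (`level`, `id`, `name`, `children`) plus the proposed rich-HTML attribute; that attribute's key is an assumption here (`"html"`), and the helper names are hypothetical. It is not the API either project shipped.

```python
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional


class _TextOnly(HTMLParser):
    """Keeps text nodes and drops every tag."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(fragment: str) -> str:
    """Plain text of an HTML fragment, with whitespace collapsed."""
    parser = _TextOnly()
    parser.feed(fragment)
    parser.close()
    return " ".join("".join(parser.parts).split())


def title_from_toc_tokens(
    toc_tokens: List[Dict[str, Any]], html_key: str = "html"
) -> Optional[str]:
    """Hypothetical helper: plain text of the first level-1 heading in document order.

    `html_key` is an assumed name for the proposed rich-HTML attribute; if a
    token lacks it, the existing "name" key is used instead.
    """
    stack = list(reversed(toc_tokens))
    while stack:
        token = stack.pop()
        if token.get("level") == 1:
            return strip_html(token.get(html_key, token.get("name", "")))
        # Push children reversed so they are visited in document order.
        stack.extend(reversed(token.get("children", [])))
    return None


tokens = [
    {
        "level": 1,
        "id": "using-mkdocsyml",
        "name": "Using mkdocs.yml",
        "html": "Using <code>mkdocs.yml</code>",  # assumed rich-HTML attribute
        "children": [
            {"level": 2, "id": "themes", "name": "Themes", "html": "Themes", "children": []},
        ],
    },
]
print(title_from_toc_tokens(tokens))
# prints: Using mkdocs.yml
```

The design point is that the consumer only strips tags from an already-rendered fragment; it never has to resolve stash placeholders or run Markdown's postprocessors itself.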

The changes in the linked PR are in a draft, so subject to change at this point. Therefore, any feedback is welcome to ensure we don't break things and that we provide a usable solution. Of course, before MkDocs can make use of our changes we will need to finalize them and make a release.

oprypin · 2024-02-09T17:07:39Z

OK, I can revert the usage of markdown.extensions.toc.stashedHTML2text, and we can find a common solution.

oprypin force-pushed the titl2 branch from 3512eed to 55f0514, February 4, 2024 21:27

Commit 66a6d8c: Strip all HTML when getting the title from the first H1 tag

Not stripping it was a bug, and also inconsistent with how ToC titles are extracted.

oprypin force-pushed the titl2 branch from 55f0514 to 66a6d8c, February 4, 2024 21:38

oprypin merged commit e755aae into master Feb 8, 2024
34 checks passed

oprypin deleted the titl2 branch February 8, 2024 18:17

vedranmiletic mentioned this pull request Feb 11, 2024

Header and navigation sidebars display incorrectly arbitrary markup #3357 (closed)

oprypin mentioned this pull request Feb 24, 2024

Full-featured text extraction from the first H1 heading #3578 (merged)
