Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip all HTML when getting the title from the first H1 tag #3564

Merged
merged 1 commit into from Feb 8, 2024
Merged

Conversation

oprypin
Copy link
Contributor

@oprypin oprypin commented Feb 4, 2024

Not stripping it was a bug, and also inconsistent with how ToC titles are extracted.

per latest status #3357 (comment)

Not stripping it was a bug, and also inconsistent with how ToC titles are extracted.
@pawamoy
Copy link
Sponsor Contributor

pawamoy commented Feb 5, 2024

Tested locally, no issue detected (mkdocstrings and Material for MkDocs).

@oprypin oprypin merged commit e755aae into master Feb 8, 2024
34 checks passed
@oprypin oprypin deleted the titl2 branch February 8, 2024 18:17
@ofek
Copy link
Contributor

ofek commented Feb 8, 2024

Nice!

@waylan
Copy link
Member

waylan commented Feb 9, 2024

I have been working on Python-Markdown/markdown#1441 in my spare time over the last few days and have not been following any updates on GitHub. After creating my PR, I saw this for the first time. It appears that my changes there would directly affect this here. Sorry if I've thrown a wrench in the works.

For example, I removed the markdown.extensions.toc.stashedHTML2text function, which would break the changes here. Instead, I am running all postprocessors (which was removed in this change). I felt like this was necessary to resolve some third-party markup (like the emoji extension). I still need to flesh out some new tests for my changes, but I have verified that the existing tests (for the Python-Markdown lib) are all passing.

Of note here is that I added a new attribute to the toc_tokens which contains the rich text HTML content of each heading along with the associated TOC data. My thinking was that MkDocs could use this new attribute without needing to parse/step through the entire document. As now Markdown already ensures it is fully rendered, MkDocs doesn't need to be dipping into Markdown's internals to do that itself.

The changes in the linked PR are in a draft, so subject to change at this point. Therefore, any feedback is welcome to ensure we don't break things and/or we provide a useable solution. Of course, before MkDocs can make use of our changes we will need to finalize them and make a release.

@oprypin
Copy link
Contributor Author

oprypin commented Feb 9, 2024

OK I can revert the usage of markdown.extensions.toc.stashedHTML2text and we can find a common solution

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Header and navigation sidebars display incorrectly arbitrary markup
4 participants