Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(v1): strip html from TOC #1762

Merged
merged 1 commit into from
Sep 13, 2019
Merged

feat(v1): strip html from TOC #1762

merged 1 commit into from
Sep 13, 2019

Conversation

ThisIsMissEm
Copy link
Contributor

Motivation

As a developer using kafkajs, I was annoyed by their TOC links constantly being broken or formatted totally unreadably, so I decided to write a fix.

Have you read the Contributing Guidelines on pull requests?

yes, CLA signed.

Test Plan

Added test cases to verify the output is correct; haven't yet run Docusaurus on itself, but the test cases cover what's changed.

$ jest packages/docusaurus-1.x/lib/core/__tests__/toc.test.js

What changed

The approach here is to first strip the HTML from the heading's content, then rendered it with markdown to get the HTML content for the TOC entry, then to strip the HTML from the rendered content again, as to get the text for the TOC entry's link.

⚠️Adds an additional dependency of striptags (MIT licensed)

closes issue #1703, tulios/kafkajs#450

Example Output

Given the heading of:

## <a name="foo"></a> Foo

This change will now mean the TOC Entry for that heading will be:

{
  hashLink: 'foo',
  rawContent: '<a name="foo"></a> _Foo_',
  content: '<em>Foo</em>',
  children: []
}

Previously it would have been:

{
  hashLink: 'a-name-foo-a-_foo_',
  rawContent: '<a name="foo"></a> _Foo_',
  content: '&lt;a name=&quot;foo&quot;&gt;&lt;/a&gt; <em>Foo</em>',
  children: []
}

Screenshots

Before After
Screenshot 2019-08-18 at 04 35 36 Screenshot 2019-08-18 at 04 35 26

Related PRs

#1740 — I looked at that PR first, but felt like it was removing functionality by just using rawContent, as this actually would result in, for instance, the <a name="foo"></a> from the above example to be outputted into the document twice, which isn't probably the intention.

The approach here is to first strip the HTML from the heading's content, then rendered it with markdown to get the HTML content for the TOC entry, then to strip the HTML from the rendered content again, as to get the text for the TOC entry's link.

Adds an additional dependency of striptags (MIT licensed)

Example TOC Entry, given the heading of:

```markdown
```

```javascript
{
  hashLink: 'foo',
  rawContent: '<a name="foo"></a> _Foo_',
  content: '<em>Foo</em>',
  children: []
}
```

Previously this TOC entry would be:

```javascript
{
  hashLink: 'a-name-foo-a-_foo_',
  rawContent: '<a name="foo"></a> _Foo_',
  content: '&lt;a name=&quot;foo&quot;&gt;&lt;/a&gt; <em>Foo</em>',
  children: []
}
```

closes issue #1703
@facebook-github-bot facebook-github-bot added the CLA Signed Signed Facebook CLA label Aug 18, 2019
@docusaurus-bot
Copy link
Contributor

Deploy preview for docusaurus-2 ready!

Built with commit fe3ceee

https://deploy-preview-1762--docusaurus-2.netlify.com

@ThisIsMissEm
Copy link
Contributor Author

It's also perhaps good to note that if the heading has markdown in the middle of it, e.g.,

## Using this _awesome_ tool

Then the hash link will be using-this-awesome-tool, i.e., markdown doesn't result in additional dashes in the URL

@docusaurus-bot
Copy link
Contributor

Deploy preview for docusaurus-preview ready!

Built with commit fe3ceee

https://deploy-preview-1762--docusaurus-preview.netlify.com

@ThisIsMissEm
Copy link
Contributor Author

Also, should this have been a feat or a breaking change? It does mean permalinks to content before this change and after this change won't match.

@Nevon
Copy link

Nevon commented Aug 18, 2019

Fixes #1703

Some context on why this is important for us, that was missing from the discussion in #1740: we manually create anchor tags in our headings because the software we ship contains links back to the documentation within error logs, so we need the links to stay the same even if we end up changing the wording of the heading. This broke with some update of Docusaurus.

The fix looks good to me, but I'm not overly familiar with the internals of Docusaurus.

Copy link
Contributor

@yangshun yangshun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ThisIsMissEm This looks great, thank you!

We will bump the minor version as it's somewhat breaking for certain users.

P.S. I read your articles before!

@yangshun yangshun merged commit 250a818 into facebook:master Sep 13, 2019
@ThisIsMissEm ThisIsMissEm deleted the fix/1703-html-in-toc branch September 15, 2019 18:53
@ThisIsMissEm
Copy link
Contributor Author

@yangshun hey, can you let me know when the minor version bump is out, that way I can do an PR for the kafkajs project to upgrade us.

@yangshun
Copy link
Contributor

@ThisIsMissEm We just published v1.13.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed Signed Facebook CLA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants