Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(v2): broken link checker should not report false positives when using encoded chars #4407

Merged
merged 1 commit into from
Mar 12, 2021

Conversation

Harvtronix
Copy link
Contributor

@Harvtronix Harvtronix commented Mar 12, 2021

Motivation

I have a project in which I did not want the burden of updating the sidebar.js file to fall upon each contributor, nor did I want them to have to add frontmatter to their files. My solution to that was to export a JSON structure which is the result of walking the docs dir and using all located files and directories as the doc and category identifiers. This all works totally fine and users can happily add markdown files with any name and they will show up 1-to-1 on the generated site.

For example, they may want to have a folder/page called Environment Setup/VS Code, so they would create a file: docs/Environment Setup/VS Code.md.

The problem: When users want to link between pages, they create a link in the usual way:

[FAQ](./FAQ)

But if the destination page has spaces in its path, they'd have to instead write

[VS Code Setup](./Environment%20Setup/VS%20Code)

(since writing this without encoding the spaces is not valid markdown).

These encoded URIs do not resolve to the files on the filesystem during a production Docusaurus build. You end up getting the "Docusaurus found broken links!" message.

It looks like wrapping the URI in angle brackets is another option ([VS Code Setup](<./Environment Setup/VS Code)>), however it looks like support for this type of markdown is not universal.

This fix allows the links to be resolved during the build by calling decodeURI() on the markdown link provided to brokenLinks.ts#isBrokenLink, resulting in the unescaping of characters such as spaces, while not also unescaping things like '/', which could potentially be a bit more dangerous (from my limited perspective).

Have you read the Contributing Guidelines on pull requests?

Yes.

Test Plan

I wrote Jest tests for the following cases:

  • Validate that URIs with encoded spaces are properly resolved to files in the routes list.
  • Validate that URIs with encoded spaces that don't point to valid files are still properly placed in the resulting list of broken links.
  • Validate that characters outside of the scope of decodeURI/encodeURI are not altered.
  • Validate that a file containing a URI-encoded character sequence can still be matched.

Regression Considerations

Since basically any character is allowable in a unix filename, if a user had a file in the docs folder named hello%20world.md, then blindly using decodeURI against all links would cause isBrokenLink to no longer find a route matching hello%20world.md. It would instead look for a file named hello world.md, which won't actually match any file on the filesystem. To combat this, I allow the candidate routes to either be ones obtained by using decodeURI or by using the traditional method of no decoding.

Related PRs

(If this PR adds or changes functionality, please take some time to update the docs at https://github.com/facebook/docusaurus, and link to your PR here.)

In order to create markdown links to URIs containing spaces, an
encoded space (%20) must be used. These types of links were not
properly resolved to doc files. This fix allows them to be
resolved by calling `decodeURI` on the links found in markdown files to
unescape characters such as spaces.
@facebook-github-bot facebook-github-bot added the CLA Signed Signed Facebook CLA label Mar 12, 2021
@netlify
Copy link

netlify bot commented Mar 12, 2021

@netlify
Copy link

netlify bot commented Mar 12, 2021

Deploy preview for docusaurus-2 ready!

Built without sensitive environment variables with commit 51e4e46

https://deploy-preview-4407--docusaurus-2.netlify.app

@github-actions
Copy link

⚡️ Lighthouse report for the changes in this PR:

Category Score
🟠 Performance 58
🟢 Accessibility 96
🟢 Best practices 100
🟢 SEO 100
🟢 PWA 95

Lighthouse ran on https://deploy-preview-4407--docusaurus-2.netlify.app/classic/

@slorber
Copy link
Collaborator

slorber commented Mar 12, 2021

Thanks, make sense!

I have a project in which I did not want the burden of updating the sidebar.js file to fall upon each contributor, nor did I want them to have to add frontmatter to their files. My solution to that was to export a JSON structure which is the result of walking the docs dir and using all located files and directories as the doc and category identifiers. This all works totally fine and users can happily add markdown files with any name and they will show up 1-to-1 on the generated site.

That looks like a feature we'd like to work on soon: #3464

@slorber slorber changed the title fix: resolve markdown links wtih encoded spaces fix(v2): broken link checker should not report false positives when using encoded chars Mar 12, 2021
@slorber slorber added the pr: bug fix This PR fixes a bug in a past release. label Mar 12, 2021
@slorber slorber merged commit 735b3b3 into facebook:master Mar 12, 2021
@lex111 lex111 added this to the v2.0.0-alpha.72 milestone Mar 15, 2021
@Harvtronix Harvtronix deleted the fix-encoded-page-links branch March 15, 2021 23:28
This was referenced Mar 17, 2021
This was referenced Mar 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed Signed Facebook CLA pr: bug fix This PR fixes a bug in a past release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants