Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(v2): markdown reference to file should not be page not found #2064

Merged
merged 1 commit into from
Nov 29, 2019

Conversation

endiliey
Copy link
Contributor

Motivation

See https://v2.docusaurus.io/feedback/p/ability-to-download-static-assets-such-as-pdfs

In v1 it was possible to link to pdfs and other non-img static files in markdown using the syntax [file name] (/path/to/asset.pdf). When the user would click this link, the browser would trigger a download. In v2, docusaurus routing does not allow this and results in a 404, even though hitting my-site.com/path/to/asset.pdf directly will download the file just fine.

It's because we replace all a in MDX with Link. But we should've checked if its a file first. 99.99% webpage url that links to file will have .xxx notation in last part of URL

There might be edge cases like where some bad user has a url like /docs/docusaurus.png in which its a valid internal URL (not a file), its still OK to use as it only causes page refresh.

Have you read the Contributing Guidelines on pull requests?

yes

Test Plan

Before
before - static assets

After

after

@endiliey endiliey added the pr: bug fix This PR fixes a bug in a past release. label Nov 28, 2019
@facebook-github-bot facebook-github-bot added the CLA Signed Signed Facebook CLA label Nov 28, 2019
@docusaurus-bot
Copy link
Contributor

Deploy preview for docusaurus-2 ready!

Built with commit 457e867

https://deploy-preview-2064--docusaurus-2.netlify.com

@docusaurus-bot
Copy link
Contributor

Deploy preview for docusaurus-preview ready!

Built with commit 457e867

https://deploy-preview-2064--docusaurus-preview.netlify.com

Copy link
Contributor

@lex111 lex111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will be the disadvantages if this check move to the Link component?

return !targetLink || !isInternal ? (
// eslint-disable-next-line jsx-a11y/anchor-has-content
<a {...props} href={targetLink} />

Copy link
Contributor

@yangshun yangshun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be edge cases like where some bad user has a url like /docs/docusaurus.png in which its a valid internal URL (not a file), its still OK to use as it only causes page refresh.

How about we have a convention that and any path starting with "public" gets whitelisted from client-side navigations? Users can then put in their PDF files within website/static/public.

Do we have all the possible routes on the client side? If we do we can know which paths are not part of the app and then fallback to full page navigation.

On a side note, the TOC for the Client API page is broken... I think it's treating the contents as a component to be rendered.

@wgao19
Copy link
Contributor

wgao19 commented Nov 29, 2019

i feel parsing can be more evil than maintaining a known type of links that are supposed to be anchors

@endiliey
Copy link
Contributor Author

endiliey commented Nov 29, 2019

What will be the disadvantages if this check move to the Link component?

More fine grain control. So note that this PR only changes the behavior of changing syntax in markdown file.

So that [some link](/some/url]) became

<a href="/some/url">Some link</a>

or

<Link to="/some/url">Some link</Link>

So if we move it to Link, if some bad people have a valid /docs/docusaurus.png (not a file), we are going to make them have a page redirect refresh for this. When they especially already use <Link /> example in src/pages/index.js.

How about we have a convention that and any path starting with "public" gets whitelisted from client-side navigations? Users can then put in their PDF files within website/static/public.

I feel that's gonna be more hard (to teach) and might cause lot of parsing problems. Like what if their baseUrl itself is /public or the docs routebasepath is /public. Alright that might be okay, but teaching this kind of thing (which is not a standard) can be hard.
Actually we can get away with better convention like using https: in the starting path so that it won't use client side nav

Do we have all the possible routes on the client side? If we do we can know which paths are not part of the app and then fallback to full page navigation.

We do, but I see lot of problems for these. Like if we have 2000 routes, we gonna have to try and match if there's any matching URL, for every single Link.
And everyone's route can be dynamic. For example: /docs/some-typo-url is in fact will match nested routes /docs/:route. What if my static file is located in website/docs/superman.png. It will be back to this problem Page Not Found.

Another dynamic example is https://v2.docusaurus.io/feedback/p/support-for-markdown-to-do-list-feature

/feedback/p/xxxx is not a valid route in Docusaurus. It will match the wildcard "*" page not found component, but I put a hack in

function NotFound({location}) {
if (/^\/feedback/.test(location.pathname)) {
return <Feedback />;
}

such that if its /feedback/xxx, i render Feedback component.
I can point out a lot more problems such as all the Link component will depends on routes, causing hot reload performance issues and so on.

On a side note, the TOC for the Client API page is broken... I think it's treating the contents as a component to be rendered.

Oh i guess its separate from this PR. Current v2 site also has it.

Here's my recommendation:

  • Use this PR and call it a day, it should solve most of the described problem, the only downside of this PR is that it might causes page refresh (using a instead of Link if the markdown syntax used is [some link](/some/url.xxx). However, a good website should not have a .xxx url in their last part of url as valid route :x (not a file)
  • Or, tell user to use https:// convention, since that forces not using client side.

Any other recommendation is appreciated

Edit: We did it for v1 routing, but blacklisting .html and since its an express.js, routing we start with ^\/ regex (internal only)

function dotfiles() {
return /(?!.*html$)^\/.*\.[^\n/]+$/;
}

Copy link
Contributor

@yangshun yangshun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok then 😄

@yangshun yangshun merged commit 3430fbf into master Nov 29, 2019
@endiliey
Copy link
Contributor Author

Oh we are one such bad user https://v2.docusaurus.io/docs/docusaurus.config.js is not a file 😭😭😭😭

@yangshun yangshun deleted the endi/markdown-linking branch November 29, 2019 04:18
@yangshun
Copy link
Contributor

Yes, see the edit history of my comment 🤣. I wrote it without realizing you already considered it, then I removed it.

We cannot assume that our internal links don't have a period in them. For example, we have a page docusaurus.config.js and our client API page links to it.

The consequence is mainly a full-page refresh, which still works but not sure if there's a better way around it.

@endiliey
Copy link
Contributor Author

The good thing is that the page refresh is on markdown linking. using Link still works perfectly 🤣

@yangshun
Copy link
Contributor

Yeah! At first I was trying out the docs bottom navigation and was wondering why no full-page refresh lol

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed Signed Facebook CLA pr: bug fix This PR fixes a bug in a past release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants