Fix 301 redirect to trailing slash when not specified in createPage.path #19567

adrienharnay
Contributor

@adrienharnay adrienharnay commented Nov 16, 2019

Description

When using gatsby serve, for pages created with createPage, for a given path /hello there is a 301 redirect to /hello/ even when path does not end with a slash.

The issue can be reproduced on this repository.

The fix is to keep generating public/hello/index.html if createPage.path === '/hello/', but to generate public/hello.html instead if createPage.path === '/hello'.
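
To illustrate the two output schemes described above, here is a minimal sketch. The names `toOutputFile` and `preserveNoSlash` are hypothetical and only serve the example; this is not Gatsby's actual implementation.

```javascript
// Hypothetical helper: maps a createPage path to a file relative to public/.
const toOutputFile = (pagePath, { preserveNoSlash }) => {
  // Drop the leading slash so the result is relative to the output folder.
  const relative = pagePath.replace(/^\//, "")

  // Paths that already name an .html/.htm file are used as-is.
  if (/\.html?$/i.test(relative)) return relative

  // Proposed scheme: a path without a trailing slash becomes <path>.html.
  if (preserveNoSlash && relative !== "" && !relative.endsWith("/")) {
    return `${relative}.html`
  }

  // Existing scheme: every page becomes <path>/index.html.
  const dir = relative.replace(/\/$/, "")
  return dir === "" ? "index.html" : `${dir}/index.html`
}

console.log(toOutputFile("/hello/", { preserveNoSlash: true })) // hello/index.html
console.log(toOutputFile("/hello", { preserveNoSlash: true })) // hello.html
console.log(toOutputFile("/hello", { preserveNoSlash: false })) // hello/index.html
```

With the second scheme, a host has to map /hello to hello.html on its own, which not every static host does.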

Related Issues

Fixes #19543

@adrienharnay adrienharnay requested a review from a team as a code owner November 16, 2019 23:52
@adrienharnay
Contributor Author

Tests pass locally, maybe I missed something?

@blainekasten blainekasten self-assigned this Nov 18, 2019
@@ -7,7 +7,8 @@ const generatePathToOutput = outputPath => {
   let outputFileName = outputPath.replace(/^(\/|\\)/, ``) // Remove leading slashes for webpack-dev-server

   if (!/\.(html?)$/i.test(outputFileName)) {
-    outputFileName = path.join(outputFileName, `index.html`)
+    outputFileName =
+      outputFileName + `${outputFileName.endsWith(`/`) ? `index` : ``}.html`
Contributor

@pieh pieh Nov 18, 2019

I think this is dangerous: it relies on the host mapping /path/something to /path/something.html, and I'm not sure how common that is (probably less common than matching index.html).

Contributor

Do you have any ideas on how we could test this? Are you just thinking this is a difference in OS's potentially?

Contributor Author

Ah, I see your point. Do you have any other idea to avoid the 301 redirect? For example, loading this page triggers it: https://reactjs.org/languages

Contributor

> Do you have any ideas on how we could test this?

Other than trying it out on the top X hosting services/software (or reading their docs), I see no other way :(

> Ah, I see your point. Do you have any other idea to avoid the 301 redirect?

Ideally we would only change gatsby serve here and not the output that Gatsby produces right now (that change has breaking potential). Ideally we could configure express.static to not do redirects, but last time I checked this wasn't possible. We might need to:

  1. make a feature request to serve-static to make the 301 behaviour configurable (so it uses a regular 200 response instead of a 301 for those paths)
  2. vendor serve-static and apply the patch we need ourselves?
  3. or look for something that will not do a 301 instead of serve-static?

(serve-static is the package used by express.static)

Contributor

Also, the fact that we would need to change the express.static config to support the new format is what worries me about this change.

Contributor Author

I understand the implications... And I see why my changes are too dangerous.

What about... looping through the public folder instead of using express.static()? Pseudo-code:

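const fs = require('fs');
const path = require('path');

const serve = (dir = './public') => {
  const entries = fs.readdirSync(dir, { withFileTypes: true });
  const folders = entries.filter(entry => entry.isDirectory());
  const files = entries.filter(entry => entry.isFile());

  // Recurse into every sub-folder of the build output
  for (const folder of folders) {
    serve(path.join(dir, folder.name));
  }

  // [custom logic to serve the HTML files in `files` without trailing slash]
}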

Contributor

A few additional alternatives:

  1. We can add a "200 redirect" ourselves in an Express middleware before we register express.static:
  router.use((req, res, next) => {
    if (!/.+\..+$/g.test(req.url) && req.url.slice(-1) !== '/') {
      req.url = `${req.url}/`;
    }
    next();
  })

This would "200 redirect" non-slash, non-extension requests.

The problem is that those might be actual files :/

  2. We can use redirect: false (and maybe fallthrough: true) in the options for express.static and add a new route handler after express.static that tries to send the file on our own?
  router.use((req, res, next) => {
    if (!/.+\..+$/g.test(req.url) && req.url.slice(-1) !== '/') {
      // let's try adding trailing `/index.html`, see if file exist and send it if it does
    } else {
      next();
    }
  })
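
As a rough sketch of how the placeholder in the second alternative could be filled in: the handler below checks on disk whether `<root>/<url>/index.html` exists and only then sends it. `findIndexFile` and `tryIndexHtml` are hypothetical names, the synchronous `fs.statSync` call is an assumption made to keep the example short, and `res.sendFile` is the regular Express response method. The handler is written as a plain `(req, res, next)` function so it could be registered with `router.use` after `express.static`.

```javascript
const fs = require("fs")
const path = require("path")

// Hypothetical helper: absolute path of <root>/<url>/index.html, or null if it is not a file.
const findIndexFile = (root, url) => {
  const base = path.resolve(root)
  const pathname = url.split("?")[0] // ignore the query string
  const candidate = path.join(base, pathname, "index.html")

  // Refuse anything that escapes the root, e.g. "/../secret".
  if (!candidate.startsWith(base + path.sep)) return null

  try {
    return fs.statSync(candidate).isFile() ? candidate : null
  } catch (err) {
    return null // file does not exist or is not readable
  }
}

// Express-style middleware: serve the index file with a 200, otherwise fall through.
const tryIndexHtml = root => (req, res, next) => {
  if (req.url.slice(-1) === "/") return next()
  const file = findIndexFile(root, req.url)
  if (file === null) return next() // let the regular 404 logic run
  res.sendFile(file)
}
```

Checking for the file first keeps legitimate 404s on the existing code path, at the cost of one extra file system lookup per slash-less request.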

Contributor Author

This is a very good idea. I just updated the PR and the test repository, and it works well! Just re-install node_modules and run patch-package to test it.

const matchPaths = await readMatchPaths(program)
router.use(matchPathRouter(matchPaths, { root }))
router.use((req, res, next) => {
if (!/.+\..+$/g.test(req.url) && req.url.slice(-1) !== `/`) {
Contributor

for clarity, could we store this regex in a variable that has a descriptive name? mentally grokking this regex is a bit confounding 😆

Contributor

or at least add a comment above explaining what this is doing, what it's solving, etc. I'm just worried about the long term maintenance of this fix getting lost in 6-12m

Contributor

This is copied from my examples, and in retrospect it's probably not a good test to have (it infers that a resource should be a file if it has a . in it, but that's not true at all), so we should remove at least the first part of the condition, leaving only the test for the trailing slash.

I wonder what we will get now for legitimate 404s tho, now that serve-static won't immediately return 404
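
To make the concern about the condition concrete, here is a small sketch that gives the expression from the diff descriptive names and runs it against a few URLs. The names `looksLikeFileRequest` and `needsIndexLookup` and the sample URLs are made up for illustration; the `g` flag is dropped because it has no effect on a single `test` call.

```javascript
// True when the URL contains a "." with at least one character on each side.
const looksLikeFileRequest = url => /.+\..+$/.test(url)

// The condition from the diff: no "file-like" dot and no trailing slash.
const needsIndexLookup = url =>
  !looksLikeFileRequest(url) && url.slice(-1) !== "/"

const samples = ["/hello", "/hello/", "/app.js", "/docs/v1.2"]
for (const url of samples) {
  console.log(url, needsIndexLookup(url))
}
// /hello true
// /hello/ false
// /app.js false
// /docs/v1.2 false
```

The last case is the problematic one: /docs/v1.2 could be a page created with createPage, yet the dot makes the check treat it as a file.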

@blainekasten
Contributor

@adrienharnay any chance for an update?

@adrienharnay
Contributor Author

Hey, sorry for being silent so long! I've updated the PR (and the example repository), and tested that the 404s behave the same before and after the patch (they do). Please let me know if I can do anything else :)
cc. @blainekasten @pieh

Contributor

@blainekasten blainekasten left a comment

This looks great. I'll get this merged and released soon! Thanks for your hard work and going back and forth with us @adrienharnay !

@blainekasten
Contributor

blainekasten commented Dec 3, 2019

@adrienharnay unfortunately, it looks like this fix is breaking some other things. Our e2e tests for image path prefixes are breaking. Here's a screenshot from the issue.

image

I suppose we probably should check that the file exists before sending it?

@adrienharnay
Contributor Author

adrienharnay commented Dec 7, 2019

Maybe the tests are testing an implementation detail of the previous 404 page, and now that it has changed the test fails? Can you give me pointers on what exactly is tested here, please? I'm not sure I understand the error message.

@wardpeet
Contributor

wardpeet commented Dec 9, 2019

I'm not 100% sure this is the correct path for gatsby serve. When running gatsby build and you upload to a production server, by default, it will need /hi/ as a URL. It looks like gatsby develop isn't mimicking gatsby serve's/production behavior. So, I would suggest adding a 301 to develop instead.

I think this approach also fixes your issue.

The above comment is just my opinion, so I'm happy to hear your views.

if (req.url.slice(-1) !== `/`) {
res.sendFile(req.url + `/index.html`, {
root,
})

What if the file doesn't exist? A callback argument can be added here to deal with this case and allow for the regular 404 logic to occur:

res.sendFile(...., (error) => {
  if (error) next()
  // if there's no error, the response has already been sent and there's nothing left to do.
})
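
Combined with the handler from the diff, the callback suggestion might look like the sketch below. The name `sendIndexWithoutRedirect` is hypothetical, and `root` is assumed to be the build output directory; `res.sendFile(path, options, callback)` is the Express signature relied on here.

```javascript
// Express-style middleware factory; `root` is assumed to be the build output folder.
const sendIndexWithoutRedirect = root => (req, res, next) => {
  // Requests that already end with a slash are left to the other handlers.
  if (req.url.slice(-1) === `/`) return next()

  res.sendFile(`${req.url}/index.html`, { root }, error => {
    // Missing file (or any other send error): fall through to the regular 404 logic.
    if (error) next()
    // Without an error the response has already been sent.
  })
}
```

Compared with checking the file system up front, this leaves the existence check to `sendFile` itself, so there is no gap between checking for the file and sending it.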

@pvdz
Contributor

pvdz commented Apr 20, 2020

@adrienharnay @wardpeet the diff in this PR is rather small. Can we come to a conclusion on what to do here? Merge or close the PR?

@adrienharnay
Contributor Author

Well, I'm still convinced the choice to force trailing slashes or not should belong to the user, and this PR solves this. But I won't lie, the lag on this PR got me pretty demotivated, and I now depend on a fork. If the maintainers want me to finalize the PR so that it can get merged, I'll gladly do it though.

@wardpeet
Contributor

I've been playing with this PR a bit and looking at what a "default" webserver does. I'm wondering why gatsby serve needs to mimic the same solution as a host like Netlify. Is it for e2e tests?

My demo of a default nginx/apache server running on Plesk gives me the same behaviour as serve gives me.
https://gatsby-blog-no-slash.wardpeet.dev/new-beginnings

@fabiosantoscode

fabiosantoscode commented Apr 27, 2020

@wardpeet I had a look at what's happening at the HTTP level. Gatsby and the server disagree on whether a trailing slash is needed.

fabio@fabio-thinkpad ♥  curl -I https://gatsby-blog-no-slash.wardpeet.dev/new-beginnings/
HTTP/1.1 200 OK
[...]

fabio@fabio-thinkpad ♥  curl -I https://gatsby-blog-no-slash.wardpeet.dev/new-beginnings
HTTP/1.1 301 Moved Permanently
[...]
Location: https://gatsby-blog-no-slash.wardpeet.dev/new-beginnings/

So really gatsby is hiding the redirects the server is doing by doing its own clientside redirects. Our team had a similar issue with gatsby before, and the google SEO tools told us about it.

Eventually we moved to serving our things from AWS S3 so we stopped needing this, but I'm very partial to slash-less URLs. They're just pretty :)
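
The manual curl check above could be scripted along these lines. `isTrailingSlashRedirect` is a hypothetical helper that only classifies a status code and Location header that have already been fetched; it assumes the global `URL` class available in current Node.js versions.

```javascript
// Hypothetical helper: true when a response merely redirects to the same URL plus "/".
const isTrailingSlashRedirect = (requestUrl, status, location) => {
  if (![301, 302, 307, 308].includes(status) || !location) return false

  const from = new URL(requestUrl)
  const to = new URL(location, requestUrl) // Location may be relative

  return (
    to.origin === from.origin &&
    to.pathname === `${from.pathname}/` &&
    to.search === from.search
  )
}

console.log(
  isTrailingSlashRedirect(
    "https://gatsby-blog-no-slash.wardpeet.dev/new-beginnings",
    301,
    "https://gatsby-blog-no-slash.wardpeet.dev/new-beginnings/"
  )
) // true
```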

@wardpeet
Contributor

wardpeet commented Apr 27, 2020

> So really gatsby is hiding the redirects the server is doing by doing its own clientside redirects. Our team had a similar issue with gatsby before, and the google SEO tools told us about it.

I'm aware that Gatsby is hiding this implementation detail and that a redirect is happening. The problem is that when we build, we always build as if you have trailing slashes, and it's the server implementation that needs to handle pretty URLs, not us.

gatsby serve is not a production server; it's merely there to run e2e tests and do a final check that everything looks alright.

So I'm still unsure what the use case is of not doing a 301 in serve.

@wardpeet
Contributor

Hey @adrienharnay! First and foremost, I want to thank you for opening this PR and taking the time to improve Gatsby! It has been 14 days without a response. gatsby serve is doing exactly what a basic webserver would do. Serve is a small shell around Express. You can create your own if you need to.

@wardpeet wardpeet closed this May 12, 2020
@avinson

avinson commented May 12, 2020

> Eventually we moved to serving our things from AWS S3 so we stopped needing this, but I'm very partial to slash-less URLs. They're just pretty :)

@fabiosantoscode If I can ask, how did you fix this in S3? I still see this redirect behavior using standard gatsby build and s3 plugin.
