Very long path names cause ENAMETOOLONG errors when we try to write out their page data #20699

Closed
KyleAMathews opened this issue Jan 18, 2020 · 10 comments
Labels
type: bug An issue or pull request relating to a bug in Gatsby

Comments

@KyleAMathews
Contributor

Most file systems / OSs restrict the length of file names cause computers used to be weak sauce.

https://serverfault.com/questions/9546/filename-length-limits-on-linux/9548#9548

Generally to 255 bytes.

An obscure error someone can run into is they'll create paths longer than this and when we try to write out the page's page data, Node will crash with a mysterious ENAMETOOLONG.

We should just detect when a path is too long & shorten it within the limits.

We can use a simple algorithm that trims back the path name & hashes the trimmed off parts to preserve uniqueness.
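The trim-and-hash idea can be sketched roughly as follows. This is a minimal illustration, not the algorithm from the linked comment and not Gatsby's implementation: `shortenSegment`, `trimToBytes`, the 255-byte default, and the 10-character hash suffix are all assumptions made for the example.

```javascript
const crypto = require("crypto");

const MAX_SEGMENT_BYTES = 255; // common per-name limit; assumed, not read from the OS
const HASH_LENGTH = 10; // hex characters of the hash kept as a suffix (arbitrary choice)

// Keep as many whole characters of `segment` as fit in `maxBytes` UTF-8 bytes.
function trimToBytes(segment, maxBytes) {
  let kept = "";
  let used = 0;
  for (const ch of segment) {
    const size = Buffer.byteLength(ch, "utf8");
    if (used + size > maxBytes) break;
    kept += ch;
    used += size;
  }
  return kept;
}

// Hypothetical helper: shorten one path segment, replacing the trimmed-off
// tail with a short hash of that tail so different long names stay distinct.
function shortenSegment(segment, maxBytes = MAX_SEGMENT_BYTES) {
  if (Buffer.byteLength(segment, "utf8") <= maxBytes) return segment;
  const kept = trimToBytes(segment, maxBytes - HASH_LENGTH - 1);
  const tail = segment.slice(kept.length);
  const hash = crypto
    .createHash("sha256")
    .update(tail)
    .digest("hex")
    .slice(0, HASH_LENGTH);
  return `${kept}-${hash}`;
}
```

Lengths are measured in UTF-8 bytes rather than characters, since the limit is on bytes. A truncated hash makes collisions unlikely, not impossible.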

To see a real-world example of this happening & an example algorithm go to #20338 (comment)

@KyleAMathews KyleAMathews added not stale type: bug An issue or pull request relating to a bug in Gatsby labels Jan 18, 2020
@tsriram
Contributor

tsriram commented Jan 19, 2020

Ah, I was running into this error and it was hard to figure out what was wrong. I thought it'd be failing because I might have some special character in the path. Thanks for reporting this :)

So trimming the file path will actually affect the site URL right? I'm not sure if I like the idea of trimming URLs automatically -- it might have adverse effect in quite a lot of scenarios. In my case (a website which shows IFS Code of a bank), I wanted the URL to reflect the bank name, the state & city where it is in and the branch name. This URL structure is intuitive as well as more SEO friendly. Of course, if there are system limits, I can't have a very big URL. In that case, I'd probably expect a clearer error and I would have my own logic to generate a valid URL / path.
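For the "clearer error" option, a check along these lines could run before anything is written to disk. It is only a sketch: `findLongSegments` and `assertPathIsWritable` are hypothetical names, and the 255-byte limit is assumed rather than queried from the file system.

```javascript
const MAX_SEGMENT_BYTES = 255; // assumed per-name limit

// Return every segment of a page path whose UTF-8 size exceeds the limit.
function findLongSegments(pagePath, maxBytes = MAX_SEGMENT_BYTES) {
  return pagePath
    .split("/")
    .filter((segment) => Buffer.byteLength(segment, "utf8") > maxBytes);
}

// Throw a descriptive error instead of letting the write fail with ENAMETOOLONG.
function assertPathIsWritable(pagePath) {
  const tooLong = findLongSegments(pagePath);
  if (tooLong.length === 0) return;
  const details = tooLong
    .map((s) => `"${s.slice(0, 40)}..." (${Buffer.byteLength(s, "utf8")} bytes)`)
    .join(", ");
  throw new Error(
    `Page path has segment(s) longer than ${MAX_SEGMENT_BYTES} bytes: ${details}. ` +
      `Shorten these segments before creating the page.`
  );
}
```

A site author could call `assertPathIsWritable(path)` on each generated path and apply their own shortening logic when it throws.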

@muescha
Contributor

muescha commented Jan 19, 2020

Just note:

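The 255 limit applies to a single filename (one path component), not to the whole path. Most file systems define no limit on the full path length, though the OS usually does (e.g. PATH_MAX is 4096 bytes on Linux).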

see https://en.wikipedia.org/wiki/Comparison_of_file_systems

@KyleAMathews
Contributor Author

Thanks for the clarification @muescha

@tsriram this would only be for the file name of the URL path. URL paths can be much longer than file names so they're generally not a problem.

@tsriram
Contributor

tsriram commented Jan 20, 2020

@KyleAMathews I'm not sure if I understand this correctly. In the code change you posted on #20338 (comment), you've trimmed the slug, and it's being used as the path in the createPage call. Won't this affect where the final HTML file is saved, and hence the URL of that page?

Or you probably suggest that Gatsby will save the page data to a valid path and map it internally with the right page? 🤔

@KyleAMathews
Contributor Author

> Or you probably suggest that Gatsby will save the page data to a valid path and map it internally with the right page?

yes, that :-)

@tsriram
Contributor

tsriram commented Feb 7, 2020

Cool, that makes sense 👍

@jlkiri
Contributor

jlkiri commented Feb 13, 2020

I'd like to try to fix it (not sure yet about how much needs to be changed)

@jlkiri
Contributor

jlkiri commented Feb 14, 2020

@KyleAMathews In your example, each of the slugs can potentially exceed 255 bytes.

${bankSlug}/${stateSlug}/${citySlug}/${branchSlug}-branch

Since each of these maps to a subdirectory name during build, the very long ones cause ENAMETOOLONG. To avoid crashes we would need to truncate each of the long slugs.
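To make the per-segment point concrete, here is a rough sketch that builds the path from the four slugs and shortens any segment that is too long for a directory name. `shortenSlug` and `buildBranchPath` are hypothetical helpers, and the 255-byte limit and 8-character hash suffix are assumptions.

```javascript
const crypto = require("crypto");

const MAX_SEGMENT_BYTES = 255; // assumed per-directory-name limit

// Shorten one slug to fit the limit, appending a short hash of the full slug.
function shortenSlug(slug, maxBytes = MAX_SEGMENT_BYTES) {
  if (Buffer.byteLength(slug, "utf8") <= maxBytes) return slug;
  const hash = crypto.createHash("sha256").update(slug).digest("hex").slice(0, 8);
  const chars = Array.from(slug); // code points, so multi-byte characters are never split
  while (Buffer.byteLength(chars.join(""), "utf8") + hash.length + 1 > maxBytes) {
    chars.pop();
  }
  return `${chars.join("")}-${hash}`;
}

// Each slug becomes its own directory name, so each one is checked separately.
function buildBranchPath({ bankSlug, stateSlug, citySlug, branchSlug }) {
  return [bankSlug, stateSlug, citySlug, `${branchSlug}-branch`]
    .map((segment) => shortenSlug(segment))
    .join("/");
}
```

The last segment is checked after `-branch` is appended, because the limit applies to the final directory name and not to the slug alone.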

I think I still do not understand how static URLs can be left unaffected though. To directly access the HTML for some (untruncated) URL, the directory structure needs to mirror the URL. But if we truncate directory names, then the direct-access URL must also be truncated.

@jlkiri
Contributor

jlkiri commented Feb 14, 2020

One solution I came up with involves having a map between the original slug and its truncated version. When a URL is accessed, express is unable to serve the HTML, and the middleware uses the URL to look up the real disk path and serve HTML from there. However, the serving part is not strictly controlled by Gatsby, am I right? In other words, if public is served with something other than gatsby serve then there will be no matching middleware.
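The lookup part of that idea might look something like the sketch below. It leaves out express entirely and only shows the mapping: `registerPage`, `resolveHtmlFile`, and the `public/<path>/index.html` layout are assumptions for illustration, and this is not necessarily what #21518 ended up doing.

```javascript
const path = require("path");

// Hypothetical lookup table: original URL path -> shortened on-disk path.
const diskPathByUrl = new Map();

function registerPage(urlPath, diskPath) {
  diskPathByUrl.set(urlPath, diskPath);
}

// Resolve the HTML file for a URL. URLs without an entry were never
// shortened, so they map to a directory with the same name.
function resolveHtmlFile(urlPath, publicDir = "public") {
  const diskPath = diskPathByUrl.has(urlPath) ? diskPathByUrl.get(urlPath) : urlPath;
  return path.posix.join(publicDir, diskPath, "index.html");
}
```

A middleware would call `resolveHtmlFile(req.path)` when the static file lookup fails. As noted above, this only helps when the server doing the serving knows about the map.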

@jlkiri
Contributor

jlkiri commented Jun 25, 2020

Should be fixed by #21518. We probably should close this?
