Allow for removal of hashes in output filenames (for Netlify caching) #3130

Closed
james-camilleri opened this issue Dec 29, 2021 · 4 comments

Comments

@james-camilleri

Describe the problem

SvelteKit automatically adds hashes to any output filenames, presumably using the rollup defaults. While this is the best practice in most scenarios, Netlify has indicated that this actively interferes with its caching mechanisms:
https://answers.netlify.com/t/support-guide-making-the-most-of-netlifys-cdn-cache/127

(There also seems to be a Gatsby plugin specifically to handle this scenario for Netlify, interestingly: https://www.gatsbyjs.com/plugins/gatsby-plugin-remove-fingerprints/. There's also a thread where someone from Netlify explains the issues that hashes in filenames cause: gatsbyjs/gatsby#11961)

There is currently no way to override this behaviour - attempting to override it results in the following kind of error:
The value for kit.vite.build.rollupOptions.output.entryFileNames specified in svelte.config.js has been ignored. This option is controlled by SvelteKit.

As a result, all Netlify deploys suffer from suboptimal caching.
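
For illustration, the kind of override described above might look like the sketch below. The option paths are the ones named in this issue; the filename patterns are only assumed examples built from Rollup's `[name]` and `[extname]` placeholders. The issue quotes the "ignored" message for entryFileNames only, so the behaviour of the other two options is not confirmed here.

```javascript
// svelte.config.js (sketch of the attempted override; per this issue,
// SvelteKit ignores entryFileNames and reports that it controls the option)

/** @type {import('@sveltejs/kit').Config} */
const config = {
  kit: {
    vite: {
      build: {
        rollupOptions: {
          output: {
            // Patterns without [hash]; the exact values are illustrative.
            entryFileNames: '[name].js',
            chunkFileNames: 'chunks/[name].js',
            assetFileNames: 'assets/[name][extname]'
          }
        }
      }
    }
  }
};

export default config;
```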

Describe the proposed solution

This can be solved in one of two ways, as I see it:

  • allow the related rollup configs to be overridden from the svelte config (i.e. kit.vite.build.rollupOptions.output.entryFileNames, kit.vite.build.rollupOptions.output.chunkFileNames, and kit.vite.build.rollupOptions.output.assetFileNames)
  • have the Netlify adapter handle this automatically

The latter would probably be the most user-friendly and the best sensible default, although I haven't been able to figure out if this is something that the existing Adapter API supports.

Alternatives considered

A hackier but possible solution would be a post-build script that does a full find and replace to remove all the hashes from the filenames and the file contents.
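
A rough sketch of what such a find-and-replace could look like, written against an in-memory map of files rather than the real build directory. It assumes hashed names follow a `name-1a2b3c4d.ext` pattern (eight hex characters) and that all files are text; neither assumption comes from SvelteKit's documentation, and `stripHash` and `removeHashes` are hypothetical helper names.

```javascript
// Assumption: hashed names look like "name-1a2b3c4d.ext" (8 hex characters).
const HASHED_NAME = /^(.+)-[0-9a-f]{8}(\.[a-z0-9]+)$/;

// Returns the file name without its hash, or the name unchanged if it has none.
function stripHash(fileName) {
  const match = HASHED_NAME.exec(fileName);
  return match ? match[1] + match[2] : fileName;
}

// files: Map of relative path -> text contents.
// Returns a new Map with hashes removed from both the paths and the contents.
function removeHashes(files) {
  // First pass: collect every hashed base name and its replacement.
  const renames = new Map();
  for (const filePath of files.keys()) {
    const base = filePath.slice(filePath.lastIndexOf("/") + 1);
    const stripped = stripHash(base);
    if (stripped !== base) renames.set(base, stripped);
  }

  // Second pass: rename each file and rewrite references inside it.
  const output = new Map();
  for (const [filePath, contents] of files) {
    const slash = filePath.lastIndexOf("/");
    const dir = filePath.slice(0, slash + 1);
    const base = filePath.slice(slash + 1);
    let rewritten = contents;
    for (const [from, to] of renames) {
      rewritten = rewritten.split(from).join(to);
    }
    output.set(dir + stripHash(base), rewritten);
  }
  return output;
}

const result = removeHashes(new Map([
  ["index.html", '<link rel="stylesheet" href="/_app/start-abc12345.css">'],
  ["_app/start-abc12345.css", "body { margin: 0 }"],
]));
console.log([...result.keys()]);
```

Two chunks that differ only by hash would collide after renaming, which is one reason this approach is hacky.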

Importance

would make my life easier

Additional Information

No response

@Rich-Harris
Member

Thanks for the links to previous discussions about this — as far as I've been able to understand them, this approach is essentially trading end user experience for developer experience. It seems that the issue is that since hashed filenames result in all prerendered pages changing (i.e. <link rel="stylesheet" href="/_app/abc123.css"> changes to def456.css), they have to go through a time-consuming build processing step.

But while having stable URLs means the build processing can be skipped (which is beneficial in cases where you have many prerendered pages of which few have changed, but basically no other situation), it means that browsers have to request JS and CSS assets every time they visit your app. True, they'll get a 304 rather than a 200, but you're still making a request and there's still latency involved.

With hashed filenames, we can treat assets as immutable (which will happen automatically in the next version of adapter-netlify — #3222), which means assets can be requested directly from the browser cache. From the perspective of end users, this is strictly better.
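
As a minimal sketch of that split, a server or adapter could choose a Cache-Control value per path. This assumes hashed build output lives under `/_app/` (as in the `/_app/abc123.css` example above); the one-year max-age is an illustrative value, and `cacheControlFor` is a hypothetical helper, not the adapter's actual code.

```javascript
// Assumption: everything under /_app/ has a content hash in its filename.
function cacheControlFor(pathname) {
  if (pathname.startsWith("/_app/")) {
    // A changed file gets a new URL, so the browser never needs to revalidate.
    return "public, max-age=31536000, immutable";
  }
  // Pages keep stable URLs, so the browser must check back on every visit.
  return "public, max-age=0, must-revalidate";
}

console.log(cacheControlFor("/_app/abc123.css"));
console.log(cacheControlFor("/about"));
```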

All of which is to say that I'm not convinced this is a good move, unless I'm missing something?

@james-camilleri
Author

I don't think that's quite how I understood it. From what I gathered on Netlify's documentation, the cache-headers contain instructions to ignore the browser cache and call Netlify's CDN for the assets anyway, then use the browser version if Netlify returns the 304 (see https://www.netlify.com/blog/2017/02/23/better-living-through-caching/). So from the browser cache point of view, with Netlify (unless they've changed their system since then) another request is going to be made anyway. (I have yet to test this out personally, but I will do so over the weekend.)

I've pulled out the relevant bits of the article to save time, this is the gist of it:

With these headers, we are saying specifically:

“max-age=0, must-revalidate, public” = “please cache this content, and then do not trust your cache”. This seems a bit counterintuitive, but there’s a good reason. This favors you as a content creator — you can change any of your content in an instant. Let a broken page out in a deploy? Roll back instantly. Want to make sure that your new marketing site all goes out — text, code, and assets — at the same instant so your visitors don’t experience the dreaded new/old mixture that old, file-at-a-time deploy methods left you with? We’re ready!

"etag" = a version hash, which we send with all content, that the browser provides BACK to our servers when it tries to re-fetch one of these resources (as you saw above — we’ll always set assets to require re-fetching). A browser tries a conditional get in this situation, providing the etag with the initial request to let our server confirm that this version is still current if the etag we would serve with the new content matches, our server returns an HTTP 304 which allows the browser to use its cached content instead of the client having to re-download the content.

From a browser point of view, this sounds terrible — the browser has to check in with our servers for every file it wants to load? Even if it is literally just a page reload in an identical browser session 1 second after the last load? Yes! But it isn’t so terrible due to 2 pieces of magic:

Our CDN
Use of HTTP/2

The CDN makes the check-in from the browser fast — it talks with the node closest to it and that node is ready with an instant answer as to whether the content is usable as-is (Etag matches, no deploys or rollbacks have happened). Using HTTP/2, browsers multiplexes these connections so they can all happen within a single connection and you don’t have to do things like negotiate the HTTPS handshake over and over again.
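
To make the quoted mechanism concrete, here is a small simulation of the conditional request: the browser sends back the etag it holds, and the server answers 304 if it still matches. The `respond` function, the resource objects, and the etag values are all hypothetical; this is not Netlify's implementation.

```javascript
// Hypothetical in-memory resource; the etag value is illustrative.
const stylesheet = { etag: '"v1"', body: "body { margin: 0 }" };

// ifNoneMatch is the etag the browser holds, or undefined on a first visit.
function respond(resource, ifNoneMatch) {
  const headers = {
    etag: resource.etag,
    "cache-control": "public, max-age=0, must-revalidate",
  };
  if (ifNoneMatch === resource.etag) {
    // The cached copy is still current: no body is sent.
    return { status: 304, headers, body: null };
  }
  return { status: 200, headers, body: resource.body };
}

const first = respond(stylesheet, undefined);
const repeat = respond(stylesheet, first.headers.etag);
console.log(first.status, repeat.status); // 200 304

// After a deploy changes the file, the old etag no longer matches.
const redeployed = { etag: '"v2"', body: "body { margin: 1px }" };
const afterDeploy = respond(redeployed, first.headers.etag);
```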

My personal gripe is not with the added build time - honestly that's why we're pre-rendering things anyway - but it seems to actually affect how Netlify caches the files, according to what I read in the discussions anyhow. If you look at the response here (gatsbyjs/gatsby#11961 (comment)) it seems that the filename changes effectively hinder Netlify's caching - the files have to be re-uploaded and thus purged from the CDN cache.

Realistically, I'm not sure how much of a performance benefit this will be to sites hosted on Netlify. I brought this up originally because Lighthouse complained about slow response times from the server, which I thought was strange given the scope of Netlify's CDN. Theoretically, the current system of hashing the filenames causes the file to be purged from Netlify's CDN at every build (unless I've grossly misunderstood something), even though the file contents are the same, which I know can't be good for performance. How much of an actual speed improvement this will translate to, however, I don't know, as I'm only working with very small-scale projects that don't have huge audiences anyway and might not be benefiting from the CDN at all. I know there are probably much larger projects being built with SvelteKit though, so this may come into play then.

I don't know enough about Netlify's architecture to know if adding the immutable headers will change this; it may negate the entire discussion. I can do a before/after test once this is live. I'm also not sure if this would affect the caching of the initial HTML file, which is where I'd assume rapid CDN responses are critical.

@Rich-Harris
Member

the cache-headers contain instructions to ignore the browser cache and call Netlify's CDN for the assets anyway

Only if the assets don't already have cache-control headers. If you look at https://kit-demo.netlify.app, which is using the most recent Netlify adapter, you'll see that repeat visits result in cache hits:

image

So we're choosing between these two defaults:

  1. Every request results in a 304 at best, a network 200 at worst (Netlify's recommendation)
  2. Page requests result in a 304 at best, a network 200 at worst; this worst case scenario happens more often because pages are invalidated when assets change rather than only when content changes. But asset requests frequently result in a cache 200 rather than a network 200 (SvelteKit's current behaviour)

Imagine a daily visitor to a page where the content changes every 15 days on average but assets change every 5 days on average. The number of assets isn't that important because SvelteKit injects modulepreload links etc to prevent a waterfall. For the sake of argument, let's say it takes 100ms for a 304 and 110ms for a network 200 request (i.e. 100ms TTFB, 10ms download — the download time is typically dwarfed by TTFB, in my experience). We'll ignore time spent in the HTML parser etc.

Under scenario 1, the user will wait 200ms on most days (100ms for the page, 100ms for the assets referenced by that page). Every 5th day, it'll be 210ms (100ms for the unchanged page, 110ms for the assets). Every 15th day, it'll be 220ms (110ms + 110ms). In the course of 15 days they will spend 12 * 200ms + 2 * 210ms + 220ms = 3,040ms.

Under scenario 2, the user will wait 100ms on most days (100ms for the page, 0ms for the cached assets). Every 5th day, including the 15th day, they will wait 220ms (110ms for both page and assets, since both have changed). In the course of 15 days they will spend 12 * 100ms + 3 * 220ms = 1,860ms.

If the app was deployed multiple times a day but the content never changed, scenario 1 provides a modest advantage — 15 * 210ms = 3,150ms rather than 15 * 220ms = 3,300ms — but if both app and content changed very rarely, our daily visitor would wait 3,000ms under scenario 1 but 1,500ms under scenario 2!
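
The arithmetic above can be reproduced with a short simulation. It uses the timings assumed in this comment (100ms for a 304, 110ms for a network 200, 0ms for a browser-cache hit); the function and strategy names are made up for this sketch, and a period of `null` stands for "never changes".

```javascript
const REVALIDATED_MS = 100; // 304 response
const NETWORK_MS = 110;     // network 200 response
const CACHE_HIT_MS = 0;     // served from the browser cache

// True if something with the given change period changes on this day.
function changesOn(day, period) {
  return period !== null && day % period === 0;
}

// strategy: "stable-urls" (scenario 1) or "hashed-urls" (scenario 2)
function totalWaitMs(strategy, days, contentPeriod, assetPeriod) {
  let total = 0;
  for (let day = 1; day <= days; day++) {
    const contentChanged = changesOn(day, contentPeriod);
    const assetsChanged = changesOn(day, assetPeriod);
    if (strategy === "stable-urls") {
      // Page and assets are both revalidated on every visit.
      total += contentChanged ? NETWORK_MS : REVALIDATED_MS;
      total += assetsChanged ? NETWORK_MS : REVALIDATED_MS;
    } else {
      // New asset hashes change the page too; unchanged assets cost nothing.
      const pageChanged = contentChanged || assetsChanged;
      total += pageChanged ? NETWORK_MS : REVALIDATED_MS;
      total += assetsChanged ? NETWORK_MS : CACHE_HIT_MS;
    }
  }
  return total;
}

console.log(totalWaitMs("stable-urls", 15, 15, 5)); // 3040
console.log(totalWaitMs("hashed-urls", 15, 15, 5)); // 1860
```

Passing `null` for the content period and `1` for the asset period gives the "deployed multiple times a day" case (3,150ms against 3,300ms), and `null` for both gives the "changed very rarely" case (3,000ms against 1,500ms).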

Clearly these numbers would look very different if a 200 response took significantly longer than a 304 response, but that hasn't been my observation. All of which is to say that I'm personally very sceptical of relying on 304s everywhere. If the wind is behind you it provides a very modest advantage, but in many typical cases it's much slower than leveraging the browser cache. To my mind, 304s are very helpful for larger assets like images and videos, but shouldn't be relied upon for things like .js and .css files.

It's very possible that someone with more experience and data will tell me I am entirely wrong about all this!

I brought this up originally because Lighthouse complained about slow response times from the server, which I thought was strange given the scope of Netlify's CDN.

It would be interesting to know if serving a 304 would have been significantly quicker. My understanding was that assets were proactively pushed out to edge servers when you deployed an app, in which case I would expect the 200/304 response times to be somewhat similar.

@james-camilleri
Author

james-camilleri commented Jan 31, 2022

Sorry for taking so long to get back to you, I got massively distracted and wanted to find time to parse your response thoroughly.

When reading up on how Netlify handles caching I made the assumption that this always applies to all assets – as you mentioned, this only applies for the initial page in reality; all other static assets are still cached in the browser. I didn't realise this and should have tested a bit more thoroughly myself after all the reading 😁

Regarding whether 304s are quicker for the initial page request, I don't think this would make a difference. I was worried that perhaps the landing page wasn't being cached as efficiently as it could because its content was changing, as explained above. After doing some more reading, however, I suspect that Netlify might not be caching pages for low-traffic sites on the free tier at edge nodes anyway, which renders all my attempts at optimisation useless.

Going to close this, as the existing implementation is clearly the right way to go about it. Thanks for taking the time to explain, I appreciate it! And kudos to the whole team for the excellent work with SvelteKit, I'm really loving it. ❤
