
RFC: Default ServiceWorker to avoid cascading invalidation #3661

Closed
devongovett opened this issue Oct 20, 2019 · 20 comments
Labels
✨ Parcel 2 💬 RFC Request For Comments

Comments

@devongovett
Member

Background

Currently in v2, we include a content hash in filenames by default (except for entries). Since #3414, that hash has also covered all child bundles. This is necessary because the URLs of child bundles are embedded in their parents in order to load them, but it means that if a child bundle changes, all of its ancestors must be invalidated, negating any caching benefit. We discussed some possible solutions to this problem, and @philipwalton's recent article also brought it back to mind.

Service Workers

The best solution seems to be to use a Service Worker to handle the caching instead of the browser's HTTP cache, and avoid content hashes in filenames entirely. This means that the filenames will not change between builds, so invalidating a child bundle does not invalidate its parents. In order to handle caching, a Service Worker with a manifest will be generated including hashes for all of the bundles. The Service Worker will request assets from the network, and cache them using the hash from the manifest as a cache key. Whenever the app is deployed, the manifest will be updated, and only the bundles that changed will need to be downloaded since they aren't in the Service Worker's cache.

Service workers have one major problem though: They break the browser reload button. If you serve cache first, the service worker will not be updated until AFTER the page has been reloaded and the user is already seeing stale content. By default, the new service worker won't even be activated until the user closes and reopens all of their tabs for that website either. This can be mitigated by self.skipWaiting() but even then you still need 2 reloads.

The solution I came up with for this is to put the manifest in a separate tiny JSON file parcel-manifest.json that would be deployed alongside the app. The service worker would load this file as part of its installation step, and on top-level page navigation events. This way, the service worker would not change between builds, only the manifest would. This avoids the double reload problem because the service worker would get the refreshed manifest prior to reloading the page.

It would look something like this:

parcel-manifest.json

{
  "index.html": "48f7a7cb",
  "index.js": "6fbe6e5b"
}

service-worker.js (pseudocode)

let manifest;

self.addEventListener('install', event => {
  // fetch and cache manifest
  event.waitUntil(updateManifest());
});

self.addEventListener('fetch', event => {
  // respondWith must be called synchronously within the fetch handler
  event.respondWith(handleFetch(event.request));
});

async function handleFetch(request) {
  if (!manifest || request.mode === 'navigate') {
    // update manifest
    manifest = await updateManifest();
  }

  let cache = await caches.open('parcel');
  let cacheKey = getCacheKey(manifest, request);
  let cached = await cache.match(cacheKey);
  if (cached) {
    return cached;
  }

  let res = await fetch(request);
  // clone so the body can be both cached and returned
  cache.put(cacheKey, res.clone());
  return res;
}

The downside here is one extra network call for the manifest on each navigation. However, I believe this is still overall better than the current situation with no service worker at all, and it works according to user expectations. Most of the time, you only need to pay the cost of one tiny network call to get the manifest (your entry HTML is now cacheable!), and if anything is invalidated, only the bundles that changed are downloaded instead of all of its ancestors as well. In addition, the app would work offline by default since we'd return cached content using a cached manifest, and we could do the same after a short timeout for lie-fi situations as well.

Fallback

For fallback on old browsers that don't support service workers, we'd connect this feature to ES module output by default since they have very similar support matrices. If you use <script type="module"> you'd get service worker caching with no content-hashed filenames, while normal scripts would continue to be content-hashed as they are today.

Feedback

Please comment with your feedback! It would be greatly appreciated to hear lots of perspectives on this.

@jeffposnick

(I'm moving over some of the latest thoughts from the Twitter thread, as the latest thinking there doesn't match the initial proposal.)

About updates: if, inside your fetch handler, you find resources in the fresh manifest with new hashes, what do you do then? If you delay responding to the navigation until after the cache is updated, then you're now waiting even longer before the browser can render any HTML.

cache on demand. manifest itself is updated and cached, but individual requests aren't made until they are requested (presumably a user is already loading the page at that point).

So you'd purge changed assets in the fetch handler, but repopulate on the fly? Gotcha. If so, I'd be worried about version mismatches when repopulating on the fly if assets are lazy-loaded. (C.f. https://pawll.glitch.me/)

Good point. Another option would be to embed the manifest into the service worker in addition to the external file. So, on install you'd have the manifest already, on activate you'd delete old cache entries, and on navigate you'd still get the updated manifest.

But that doesn't solve version mismatch for assets that lack hashes in their URLs. If your SW manifest says that /vendor.js should be version Y, you can definitely delete old version X, but if you cache-on-demand it might store version Z. Lazy-loading makes this more likely.

Ah, makes sense. Maybe a hash would still need to be included in the filenames so you didn't overwrite old versions on deploy, but the service worker would add the hash to the name at load time instead of at build time. This way you'd always get subresources from the same build.

This would probably be important for CDN caches too which might default to long term caching.

Summarizing this proposal so as to continue the conversation here:

  • the manifest would be embedded in the service worker, not stored in a separate file
  • asset filenames would still include hashes
  • the service worker's fetch handler would be responsible for translating "stale" hashes into "fresh" hashes, and caching the fresh responses on-the-fly.

Do I have that right, @devongovett?
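That "stale hash → fresh hash" translation step could be sketched roughly like this (the manifest shape, hash format, and the getFreshUrl helper are all illustrative assumptions, not part of the proposal):

```javascript
// Hypothetical manifest shape: logical file name -> current content hash.
const manifest = {
  'index.js': '6fbe6e5b',
  'vendor.js': '48f7a7cb',
};

// Strip a possibly-stale 8-hex-digit hash from a URL like
// "/index.deadbeef.js" to recover the logical name, then re-apply
// the current hash from the manifest. The fresh URL doubles as the
// cache key, so on-demand cached responses always match the current
// deployment.
function getFreshUrl(manifest, url) {
  const match = url.match(/^\/(.+)\.([0-9a-f]{8})\.([a-z]+)$/);
  if (!match) return url; // not a hashed asset
  const name = `${match[1]}.${match[3]}`;
  const hash = manifest[name];
  return hash ? `/${match[1]}.${hash}.${match[3]}` : url;
}
```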

@jakearchibald

Service workers have one major problem though: They break the browser reload button.

This seems like a mischaracterisation. A service worker will do nothing by default. If you aren't telling it to serve from the cache API, it won't. An empty service worker will behave exactly like no service worker.

Also, if you're only serving immutable content from the service worker, the reload button will work exactly the same.

I like Phil's article, but I worry it might be a sledgehammer to crack a nut. A simpler short-term solution would be a JS system that let you use a local map. Something like AMD, or SystemJS, with a custom loader, would let you map module names to hashed urls.

A medium-term solution could depend on top-level await, where:

import { bar } from './foo-abc123.js';

Is rewritten to:

const fooPromise = import(someKindOfMap['foo.js']);
const { bar } = await fooPromise;

Again, you're using someKindOfMap which is defined somewhere.

The longer term solution is import maps.

The key is to ensure that your map has a similar lifecycle to the rest of your app. Putting the map in the service worker (like in Phil's article) works great if the rest of your site/app is already using the service worker lifecycle. I think this is where you're running into the "reload" problem. The service worker isn't the problem, the problem is tying the map to its lifecycle.

You'd have the same problem if you put the map in another file and gave it a max-age.

The map needs to have the same lifecycle as the thing ultimately executing the script. This is why they're traditionally inlined in the HTML, as it ensures this synchronisation. Putting it in the service worker kinda breaks this model, as the version of the current service worker may not represent the version of the page (unless the page was also served from a cache controlled by the service worker, but then you're dictating things about the lifecycle of the app/site that the developer may not expect).

This isn't just because the service worker updates lazily, it's also because there's one of it for multiple pages. One page may have been open for days, another may have loaded 10 seconds ago. Developers rarely prepare for running multiple versions of their site at the same time, but it's possible. In this case, the map in the service worker may be correct for one of the tabs, but not the other. Again, this is why the map is often inlined with the page.

If I was to try and emulate this with a service worker, I'd do something like this in the fetch event:

  1. If the request isn't a subresource request, return.
  2. Does the URL's query string contain 'attempt-mapping'? If not, return. This allows a quick exit for things you don't care about.
  3. respondWith the following:
     • postMessage the client asking for its map, and wait for a response.
     • Map from one URL to the other, fetch it, and return it.

You could cache the map against the client ID within the service worker as an optimisation.

This means you need to set up a message listener in the page before sending any requests that contain attempt-mapping, but that's kinda how import maps will work anyway.
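The page-side half of that handshake might look something like this sketch, under the assumption of a hypothetical GET_URL_MAP message shape and an inlined urlMap (none of these names are a real API):

```javascript
// Step 2 of the flow above: quick opt-in check on the query string.
function wantsMapping(url) {
  return new URL(url, 'https://example.com').searchParams.has('attempt-mapping');
}

// Page-side listener: reply with this page's URL map when the service
// worker asks for it. The map is inlined with the page, so it shares
// the page's lifecycle.
const urlMap = { 'foo.js': '/foo-abc123.js' };

if (typeof navigator !== 'undefined' && 'serviceWorker' in navigator) {
  navigator.serviceWorker.addEventListener('message', event => {
    if (event.data && event.data.type === 'GET_URL_MAP') {
      // Reply on the MessagePort the service worker included.
      event.ports[0].postMessage(urlMap);
    }
  });
}
```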

@jeffposnick

If I were to summarize some concerns here, there are at least three competing interests to keep in mind when designing whatever you go with:

Preventing inconsistency

Loading most of the page with one deployment's assets, and then pulling in additional assets later on (e.g. lazy-loading) that are associated with a different deployment is a worst case scenario. This is even more of a problem if runtime caching ends up saving versions of assets that don't match what you think should be saved based on the manifest, since that asset mismatch could persist on subsequent loads/indefinitely.

Preventing performance regressions

Adding in a service worker is not, by default, performance neutral. A cold SW startup takes somewhere between tens and hundreds of milliseconds, depending on device + browser combination. (C.f. https://web.dev/google-search-sw#problem:-service-worker-overhead and https://www.youtube.com/watch?v=IBpQlNeq5-o)

If you plan on handling navigations cache-first then the benefit of shipping a SW normally outweighs that cost, but if you're not caching your HTML, then using navigation preload (currently only supported in Chrome) is necessary to mitigate the SW overhead.

There are some performance benefits to serving your assets cache-first, but you can get those same performance benefits with hashes in filenames and proper Cache-Control headers (which, I know you want to avoid due to cascading invalidation).

So I think the performance question to ask is whether the cost of SW startup on non-Chrome browsers is likely to be outweighed by the benefits of additional cache hits due to non-cascading cache invalidations.

Avoiding the need for client-side code

This was one of the stated desires from the Twitter thread—it sounded like Parcel does not want to be in the business of injecting client-side JavaScript into the web apps it bundles.

There are a number of problems that go away if you are willing to ask developers who use Parcel's service worker to include some additional client-side code. For example, the pseudo-import-map that Jake just described is a possibility, and so is using some of the common techniques for showing a "Reload for new content" toast message along with aggressive precaching.

@jeffposnick

My personal opinion is that "preventing inconsistency" has to be your top priority, and I think it's worthwhile to make "preventing performance regressions" your second priority.

So I think you should consider whether relaxing your "no client-side code" would be a possibility, for folks who opt-in to using a Parcel-generated service worker.

@jeffposnick

FWIW, an example of a service worker that shares conceptual similarities with what Jake describes can be found as part of this AppCache Polyfill library.

One difference is that in that project, IndexedDB is used to share manifest info between client pages and the service worker, keyed on client ID, rather than postMessage().

Notably, this approach only works if you add code to both the window client and service worker.

(It's fair to say that this implementation also makes compromises on the "preventing performance regressions" side of things, but fidelity with the AppCache specification was prioritized over performance.)

@devongovett
Member Author

@jeffposnick @jakearchibald thanks for all of your feedback! Let me start by saying this is a research project so far - nothing has been decided on our side. I'm open to any and all proposals. 😄

@jeffposnick your summary is pretty close - I was kinda spitballing on twitter, so some of them may have blended together. I think the last idea was to keep the manifest in a separate file, and replace the hashes in filenames with the updated versions from the manifest in a service worker.

It seems like there is at least one concern with that, which is if you have multiple tabs open and you only reload/navigate in one. Possibly keying the manifest by client id would work to solve that? I'm not sure how or if it is possible to delete those manifests when tabs are closed though...

@jakearchibald

I like Phil's article, but I worry it might be a sledgehammer to crack a nut. A simpler short-term solution would be a JS system that let you use a local map. Something like AMD, or SystemJS, with a custom loader, would let you map module names to hashed urls.

It might be overkill, but I didn't see another option that would cover as many cases as service worker. A JS system with a local map would only work for resources loaded from JS. That excludes resources referenced from CSS or HTML, which seem pretty important. If I had to guess, I'd say there are probably more resources referenced that way than via JS in most apps. You'd have a few code split points for your JS, but each page might have a bunch of images etc.

Import maps aren't here yet, but that's a possible solution when they arrive - assuming they can handle more than just JS too.

The service worker isn't the problem, the problem is tying the map to its lifecycle.

Yeah that's true, and it's why I proposed making the map a separate file to be loaded by the service worker.

This isn't just because the service worker updates lazily, it's also because there's one of it for multiple pages.

This seems like a problem. Service worker might need to have a map per tab somehow.

The benefit of loading the map in the service worker though is that the HTML can be cached for offline support. If the service worker is reliant on the page to send it the map, then either you have the double reload problem or you can't cache the HTML.

@devongovett
Member Author

This was one of the stated desires from the Twitter thread—it sounded like Parcel does not want to be in the business of injecting client-side JavaScript into the web apps it bundles.

There are a number of problems that go away if you are willing to ask developers who use Parcel's service worker to include some additional client-side code. For example, the pseudo-import-map that Jake just described is a possibility, and so is using some of the common techniques for showing a "Reload for new content" toast message along with aggressive precaching.

I'd definitely like to allow developers who want to write their own service worker or use something like workbox to have full control if they desire. This proposal was only for a default to replace deep content hashed filenames. Ideally, the solution would work incrementally. By default, you get a service worker that caches on demand, and supports reloading as expected. You should be able to opt into precaching, be able to implement a custom reload button for updates, etc.

@jakearchibald

@devongovett

A JS system with a local map would only work for resources loaded from JS. That excludes resources referenced from CSS or HTML, which seem pretty important.

Yeah, good point. I guess that's why the import: scheme of import maps is so important. Sorry, I should have realised that.

The service worker isn't the problem, the problem is tying the map to its lifecycle.

Yeah that's true, and it's why I proposed making the map a separate file to be loaded by the service worker.

That has the versioning issue you mentioned with multiple tabs. Also, how does the service worker know where to get this manifest from? If it's in a consistent location, it needs to be no-cache, which means you've got a network-dependent resource in front of all your well-cached assets. Or it needs to be versioned, meaning you need some way for the page to tell the service worker which manifest it should use.

The benefit of loading the map in the service worker though is that the HTML can be cached for offline support.

Are you sure you're in a position to decide to do that? Eg, what if the page contains sensitive logged in data?

@devongovett
Member Author

To summarize, here's the current idea:

  • keep content hashing filenames, but don’t include child bundles in the hash
  • initial load doesn’t use service worker, gets content hashed bundles as normal
  • service worker installs and downloads manifest, and associates it with current tab
  • on deploy, you deploy a new manifest and new content hashed bundles. don’t delete old content hashed bundles though - old tabs may still request them.
  • on navigate, service worker updates manifest for current tab
  • service worker replaces hashes in filenames with current hashes from the manifest for the current tab for each request, and caches assets on demand.
  • we'd have some sort of cleanup logic to delete associations between tabs and manifests, and the cached assets.
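The per-tab manifest bookkeeping above might be sketched as follows (in-memory only for brevity; a real service worker would persist this to Cache Storage or IndexedDB, since workers can be killed between events, and all helper names here are hypothetical):

```javascript
// clientId -> the manifest for the deployment that tab loaded.
const manifestsByClient = new Map();

function setManifest(clientId, manifest) {
  manifestsByClient.set(clientId, manifest);
}

// Resolve a requested URL against the manifest for the requesting tab,
// falling back to the URL as-is if we have no manifest for that client.
function resolveForClient(clientId, url) {
  const manifest = manifestsByClient.get(clientId);
  if (!manifest) return url;
  return manifest[url] || url;
}

// Cleanup: drop manifests for tabs that no longer exist. In a real
// worker, liveClientIds would come from self.clients.matchAll().
function cleanup(liveClientIds) {
  for (const id of manifestsByClient.keys()) {
    if (!liveClientIds.includes(id)) manifestsByClient.delete(id);
  }
}
```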

@devongovett
Member Author

Also, how does the service worker know where to get this manifest from? If it's in a consistent location, it needs to be no-cache, which means you've got a network-dependent resource in front of all your well-cached assets. Or it needs to be versioned, meaning you need some way for the page to tell the service worker which manifest it should use.

Yeah. But that's already the case. You currently have to no-cache your HTML file today.

Are you sure you're in a position to decide to do that? Eg, what if the page contains sensitive logged in data?

Good point... though entry HTML would only be cached if Parcel generates it and it's part of the manifest, which means it's static (not server generated).

@jakearchibald

Yeah. But that's already the case. You currently have to no-cache your HTML file today.

But aren't we now talking about doing this with HTML plus some kind of manifest file?

@devongovett
Member Author

But aren't we now talking about doing this with HTML plus some kind of manifest file?

Theoretically, HTML could be long term cached with this approach since it would be invalidated by the manifest. (e.g. SW could load https://mysite.com/index.html?v=1234 to cache bust). Possibly not that useful though. Not sure.

@devongovett
Member Author

The alternative as you say is to embed the manifest in the HTML file and postmessage it into service workers. Would require a bit more logic to be embedded though.

@philipwalton

philipwalton commented Oct 22, 2019

Hey @devongovett, thanks for starting this conversation!

Service workers have one major problem though: They break the browser reload button. If you serve cache first, the service worker will not be updated until AFTER the page has been reloaded and the user is already seeing stale content.

This is true, but cache-first navigations are definitely the fastest way to load a page, and when combined with techniques like the app shell pattern or composing cached and network content via streams (which I discussed at CDS last year), you can often get the best of both worlds (though perhaps this isn't an option for a tool like Parcel).

And while it's true that users will always have to wait until the next load to get an update, I've personally found that it's rare that an update is so critical I'll actually bother to inform the user about it (I only inform the user of updates when it's actually critical that they get it, and in my last 25 releases that's only happened once).

By default, the new service worker won't even be activated until the user closes and reopens all of their tabs for that website either. This can be mitigated by self.skipWaiting() but even then you still need 2 reloads.

I think this is a problem that affects developers in development far more than it affects real users in production. And I'm working on a devtools feature to at least help address the former.

For the latter, I find that skipWaiting() works pretty well. The only situations in which I wouldn't use skipWaiting() are situations in which I'm building an app where I specifically want to control its versioning across all tabs, and in those cases I want to use cache-first and always have the 1-version-behind behavior.

If you're considering bypassing the SW lifecycle entirely, then it doesn't sound like you need to be concerned about using skipWaiting().

The solution I came up with for this is to put the manifest in a separate tiny JSON file parcel-manifest.json that would be deployed alongside the app. The service worker would load this file as part of its installation step, and on top-level page navigation events.

I'd caution against any solution that blocks subresource loading on a network request. I suspect that'll have a much bigger performance impact than you're envisioning. Also note that you'll have to pay the cost of this network request every time the service worker starts up, not just on navigation requests.

Have you considered actually attempting to polyfill some basic Import Maps behavior? E.g. specifying the resource map using <script type="importmap"> syntax, and then running some code on the window to postMessage() the mapping to the service worker:

if (navigator.serviceWorker && navigator.serviceWorker.controller) {
  navigator.serviceWorker.controller.postMessage({
    type: 'IMPORT_MAP',
    payload: JSON.parse(document.querySelector('script[type=importmap]').textContent),
  });
}

After receiving the message event, the service worker could reconcile the hashes in the map with the hashes of the resources being requested, and it could update any caches accordingly (after responding to all subresource requests).

This means the only thing you'd be blocking on is a postMessage() from the window, and again only on navigation requests.

Another benefit of this approach is as soon as Import Maps ship natively in Chrome, you'll be able to use them immediately. The service worker can be informed as to whether the browser supports Import Maps by passing that as a URL param when registering the service worker (e.g. .register('/sw.js?maps=' + supportsMaps)). If the browser does support Import Maps, it won't wait for the message event to respond to requests—or perhaps it won't register the service worker at all.

The only thing this loses over your suggestion of storing the mapping in a separate file is versioning the HTML file itself, but honestly if you're trying to get the benefits of caching the initial navigation request then I'd just go all in on a cache-first strategy.
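The service-worker side of that postMessage handshake might look roughly like this sketch (the message shape mirrors the window snippet above; the resolve helper and map structure are illustrative):

```javascript
let importMap = null;

// Hypothetical SW-side counterpart to the window snippet above:
// store the map sent by the page.
function applyImportMap(data) {
  if (data && data.type === 'IMPORT_MAP') {
    importMap = data.payload;
  }
}

// Resolve a requested path through the map's "imports" section,
// leaving unmapped paths untouched.
function resolve(path) {
  if (importMap && importMap.imports && importMap.imports[path]) {
    return importMap.imports[path];
  }
  return path;
}

// In a real service worker, this wiring would be unconditional;
// the guard just keeps the sketch loadable outside a worker.
if (typeof self !== 'undefined' && typeof self.addEventListener === 'function') {
  self.addEventListener('message', event => applyImportMap(event.data));
}
```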

@devongovett
Member Author

I'd caution against any solution that blocks subresource loading on a network request. I suspect that'll have a much bigger performance impact that you're envisioning. Also note that you'll have to pay the cost of this network request every time the service worker starts up, not just on navigation requests.

We'd cache the manifest in cache storage as well as in memory for service worker restarts, and only block on navigation requests, not subresources. But yes, you're right that in the worst case, where there is updated HTML, we'd have to wait for the manifest to load first. In the best case, however, where it's cached (likely most of the time), it could be faster, since the manifest is probably much smaller than the HTML. 🤷‍♂

The postMessage idea is an interesting one too, and perhaps the right default. With the right API, we can hopefully expose the manifest to service worker code so that users can implement precaching etc. themselves as well.

@jamiebuilds
Member

jamiebuilds commented Oct 22, 2019

I'm just going to quickly throw out there that in Parcel 2 I believe this can all be worked out in plugin land. One of the main motivations of making so many things in Parcel exposed as plugins was to enable the community to experiment with ideas like this.

So I would propose this RFC be implemented as "experimental" plugins that we could build out and have audited by everyone interested. I'd be happy to put an app in production with a fairly-stable-but-still-experimental plugin and give feedback.

@devongovett
Member Author

Yeah, hopefully it's all possible. It may take coordination between multiple plugins though. E.g. the namer needs to not content-hash, or to change the way it adds hashes; a runtime plugin would be needed to generate the service worker; and possibly a reporter to generate the manifest. All of these need to be coordinated to work together in a config. Also, unfortunately, especially in the namer case, it's hard to extend the default behavior: you'd have to rewrite the whole namer.

@jamiebuilds
Member

jamiebuilds commented Oct 23, 2019

Also unfortunately, especially in the namer case, it's hard to extend the default behavior - you'd have to rewrite the whole namer.

Could we expose parts of it as a library? (Edit: Let's move this to Slack)

@kevincox
Contributor

I've created #4200 to discuss the manifest generation part of this.

@janat08

janat08 commented Jun 5, 2024

I'm unclear on how to use said manifest, so I would appreciate a default service worker.
