
consider allowing multiple worker thread instances for a single registration #756

Open
wanderview opened this issue Sep 30, 2015 · 107 comments

Comments

@wanderview
Member

Currently Gecko has a bug where multiple service worker threads can be spun up for the same registration. This can happen when windows in separate child processes open documents controlled by the same registration. Per the spec, the FetchEvents should be dispatched to the same worker instance. In practice, we spin up a different worker thread for each child process and dispatch there.

While discussing how to fix this, we started to wonder if it's really a problem. Service workers already discourage shared global state since the thread can be killed at any time. If the service worker script is solely using Cache, IDB, etc. to store state, then running multiple thread instances is not a problem.

The main case where this could cause problems seems to be code that relies on global variables while holding the worker alive using waitUntil() or respondWith(). This is theoretically observable by script, but seems to have limited useful applications.

How would people feel about loosening the current spec to allow a browser to spin up more than one worker thread for a single registration? When to execute multiple threads and how many would be up to the browser implementation.

@flaki

flaki commented Oct 4, 2015

Service workers already discourage shared global state since the thread can be killed at any time. If the service worker script is solely using Cache, IDB, etc to store state, then running multiple thread instances is not a problem.
I really like this.

Are there any apparent "low-hanging fruit" gains expected from allowing this, beyond the obvious gain in reduced complexity from not having to ensure there is only one running thread of the SW script?
I can see this becoming a wanted feature in the (maybe-not-so-distant) future where mainstream devices have a very high number of (potentially low- or medium-speed) cores and sites with complex SWs become a thing (a sort of client-side server, if you will). In those cases I can see this becoming an actually useful feature, much like load balancing between stateless (or sticky-state) high-demand servers works now.

@mkruisselbrink
Collaborator

My initial reaction was that allowing this would never work. There are too many places in the spec that assume there is only one active ServiceWorker, and all kinds of things would break in confusing ways if the ServiceWorker you can postMessage to via controller.postMessage is not necessarily the same worker as the one handling fetch events (or there might even be multiple workers handling fetch events for a single client). But after thinking about it a bit more, I don't think any of the problems that would occur by allowing multiple copies of the same service worker to run at the same time are truly insurmountable.

But I'm also not sure if there really is much value in allowing this. I imagine this could be useful in situations where a serviceworker wants to do some relatively heavy CPU bound task in the process of handling some fetch (or other) event, and we don't want to block other fetch requests on that task?

I think what we should be supporting for such situations is letting the SW spin up a dedicated/shared worker whose lifetime is linked to the lifetime of the SW (or maybe to the event it is currently handling, but just the SW would work fine; after all, it won't get killed until the fetch event it is processing has finished), just like a regular website would have to spin up a separate worker to do heavy CPU-bound tasks in response to an event.

And if we want to allow long-running CPU-bound tasks in response to an event, which last past the point where a response has been sent for a fetch event, I'm not sure those should be allowed without some kind of UI presence (for example, allowing a new kind of Notification with a progress indicator to serve as the UI that keeps a shared worker alive).

@wanderview
Member Author

To be clear, I'm only suggesting we state the browser may spin up multiple SW instances, but not make it a major feature. We would still spec explicit solutions for heavy CPU tasks, etc.

I just question if maintaining one-and-only-one instance is worth the code complexity and perf hit in a multi-process browser architecture.

Can Chrome produce multiple service worker instances for different content processes today? Or do you already have a solution to guarantee a single SW instance across the entire browser?

@mkruisselbrink
Collaborator

Chrome already guarantees a single SW instance across the entire browser. Spinning up multiple SW instances does seem like something worth exploring though. I do think we'll have to clarify how various things that return a ServiceWorker instance deal with this, and what happens if you postMessage to them.

In particular this seems like it could be helpful in cases of misbehaving service workers (where we could then fire up a new SW and start sending fetch events there, without having to wait for some timeout while all other fetches are blocked), or just in general in situations where there is a performance concern (foreign fetch comes to mind as well; having potentially every website you're browsing try to fetch something from the same SW could result in noticeable slowness).

@jakearchibald
Contributor

We've discussed this before a few times. I've always been pretty keen on it, but if memory serves @slightlyoff was less keen.

I'm thinking of HTTP/2-optimised pages made up of 100s of requests: is there a performance benefit to multiple workers?

@delapuente

IMHO, this is an implementation detail low-level enough not to live at the spec level. In the spec we should specify some guarantees, for example saying that logically there is only one service worker per scope, even though it could be implemented as several.

@wanderview
Member Author

It has to be in the spec because it's observable with the right combination of waitUntil(), postMessage(), and worker global variables.

@delapuente

By sending global state via postMessage() or respondWith()? If so, well, I suppose it could be a recommendation.

@wanderview
Member Author

You set some global state in the SW, then use an evt.waitUntil() to hold the SW alive, and then send a postMessage() or use some other event-triggering mechanism that checks the global state.
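
A minimal sketch of that observability (assuming the page is already controlled; whether a second instance ever starts, and when, is entirely up to the browser):

// sw.js: global state a second instance would not share
let counter = 0;
self.addEventListener('message', (event) => {
    if (event.data === 'increment') {
        counter += 1;
    } else if (event.data === 'read') {
        // With a single instance this reports the earlier increment;
        // if a different instance handles this message it reports 0.
        event.source.postMessage(counter);
    }
});

// page
navigator.serviceWorker.onmessage = (e) => console.log(e.data);
navigator.serviceWorker.controller.postMessage('increment');
navigator.serviceWorker.controller.postMessage('read');

An in-flight waitUntil() or respondWith() is what keeps the first instance (and its counter) alive between the two messages.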

@jakearchibald
Contributor

Some thoughts on our options here:

  1. Let the browser create concurrent instances of the service worker whenever it wants
  2. Allow particular events to opt into or out of concurrency, so fetch events could be split across many instances but message events may not
  3. Make concurrency an opt in/out feature by the developer

I'd prefer the first, but the others are ways out if it breaks the web in ways we can't work around.

@mkruisselbrink
Collaborator

I agree that 1 would be the preferred outcome.

Another variant of 2 could be to make concurrency a per-client thing. So all message/fetch events related to the same client would always end up in the same worker, but other events can end up anywhere. That might have the lowest risk of breaking things, would probably address the browser-vendor implementation concerns, but obviously wouldn't allow some of the optimisations that full concurrency could, as it might very well be beneficial to process multiple fetch events from the same client in multiple threads.

@wanderview
Member Author

wanderview commented Aug 2, 2016

From an implementor's point of view (1) is very attractive. It gives us the most flexibility to optimize with the smallest amount of complexity. Options (2) and (3) restrict us more and add more complexity to support unique worker instances across processes.

@jakearchibald
Contributor

all message/fetch events related to the same client would always end up in the same worker, but other events can end up anywhere … wouldn't allow some of the optimisations that full concurrency could

Agreed, that's what led me to 2. That way you would have concurrency with fetch, but not with message/push. We'll see what users think. We don't need to consider 2/3 unless 1 is unattainable.

@wanderview
Member Author

If we did multiple SW instances I think we could maybe still support one-to-one messaging with the instances. For example, when a SW receives a fetch event it could construct a new MessageChannel. It could then postMessage() the channel back to the client. The client could then use the channel to communicate with that single instance directly.

I think that would work, but maybe I don't fully understand MessageChannel.
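
A rough sketch of the service worker side (clients.get() and event.clientId already exist; how this works for a navigation request, where the client doesn't exist yet, comes up below):

// sw.js: hand the client a port that reaches this particular instance
self.addEventListener('fetch', (event) => {
    const { port1, port2 } = new MessageChannel();
    port1.onmessage = () => {
        port1.postMessage('handled by the instance that served ' + event.request.url);
    };
    event.waitUntil(self.clients.get(event.clientId).then((client) => {
        if (client) client.postMessage({ port: port2 }, [port2]);
    }));
    event.respondWith(fetch(event.request));
});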

@wanderview
Member Author

I also have some local patches that spin up N thread instances for each service worker. Let me know if there is a particular site that I should test.

@jakearchibald
Contributor

https://www.flipkart.com/ might be an interesting one to try.

Yeah your MessageChannel idea would work.

@mkruisselbrink
Collaborator

Having the SW send the MessageChannel to the client in its onfetch event works, provided that:

  1. We implement some way to expose the to-be-created client in the fetch event for a navigation (I'm not sure if we were planning to just have a client id for this new client, or if there would be an actual Client the SW can look up via Clients.get for this?), and
  2. The same queueing of messages posted to clients returned by openWindow is also applied to messages posted to these clients. I think this should just work as well (at least as specced).

@jakearchibald
Contributor

Yeah, that's the plan. clients.get(fetchEvent.reservedClientId) would get you a client and posted messages would be buffered.
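
On the page side this might look like the sketch below; for a navigation the SW would post to the client looked up from the reserved client id instead, and the message would simply sit in the buffer until the document is ready:

// page: receive the per-instance port sent from the fetch handler
navigator.serviceWorker.addEventListener('message', (event) => {
    if (event.data && event.data.port) {
        const port = event.data.port;
        port.onmessage = (e) => console.log(e.data);
        port.postMessage('hello from the client');
    }
});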

@jakearchibald
Contributor

Some more thoughts:

Postmessaging to a SW will land in one instance, whereas SharedWorker and BroadcastChannel will expose multiple running instances. Probably not an issue, but wanted to write it down while I remembered.

@wanderview
Member Author

In theory, ServiceWorker.postMessage() could spawn a new instance each time. Right?

@jakearchibald
Contributor

Yeah, whereas BroadcastChannel could land in multiple instances at once.

@aliams
Contributor

aliams commented Aug 3, 2016

Just wanted to chime in that I support option 1 as well.

@arcturus

arcturus commented Aug 5, 2016

Just leaving some feedback related to:

https://jakearchibald.com/2016/service-worker-meeting-notes/

Are you maintaining global state within the service worker?
Would this change break your stuff?
If so, could you use a more persistent storage like shared workers / IndexedDB instead?

So far we have already been using IndexedDB (with localForage for simple key/value); the fact that the SW can be killed was a strong enough reason not to keep state in memory and to persist it instead.

@wheresrhys

wheresrhys commented Aug 5, 2016

Re: the PR above. We were using persistent state stored in a variable, but have been able to switch to IndexedDB. In time it'd be nice to have a higher-level API for syncing shared state between SW instances, as polling an IndexedDB store is a bit hacky and accessing IndexedDB on every fetch event wouldn't be particularly performant. Similarly, I'd love to see a cookies API land in the near future; in the meantime we're considering implementing something for cookies similar to what we've done for feature flags.

@bkimmel

bkimmel commented Aug 5, 2016

I use Clients.claim a lot just to simplify dealing with the SW life cycle (which for me has always been the most complicated aspect of Service Workers). How would that work with a litany of SWs claiming the client in rapid succession?

@tapananand

tapananand commented Dec 9, 2016

Sorry If I am out of context and late, but just leaving some feedback related to:

https://jakearchibald.com/2016/service-worker-meeting-notes/

Are you maintaining global state within the service worker?
Would this change break your stuff?
If so, could you use a more persistent storage like shared workers / IndexedDB instead?

We have an app where we show some user-generated HTML content (including JS/CSS, etc.) in an iframe, and the resources referenced by that HTML are not present on any server but are available via a web worker running on a different origin.
So we were using a Service Worker to intercept the resource requests and fulfil them by fetching the resource content from the worker. Our implementation first does a handshake in which the service worker is given one end of a MessageChannel and the worker the other, and they communicate via the MessagePorts.
Now, for this we have to keep the service worker's end of the port in the ServiceWorkerGlobalScope, so if that is unreliable this change gives us a problem: we would lose the communication channel with the worker. It does not seem possible to store this port in IndexedDB either.
One interesting possibility would be to allow shared workers inside a service worker, and then have the shared worker communicate with the other web worker and pass the content back to the service worker. That may help.
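
Roughly, the interception looks like this (a simplified sketch; the message shapes are made up, and workerPort living in the global scope is exactly the fragile part):

// sw.js
let workerPort = null;  // received in the handshake, held only in this instance's global scope
self.addEventListener('message', (event) => {
    if (event.data && event.data.type === 'handshake') workerPort = event.data.port;
});
self.addEventListener('fetch', (event) => {
    if (!workerPort) return;  // a fresh instance never saw the handshake
    event.respondWith(new Promise((resolve) => {
        const { port1, port2 } = new MessageChannel();
        port1.onmessage = (e) => resolve(new Response(e.data.body, { headers: e.data.headers }));
        workerPort.postMessage({ url: event.request.url, replyTo: port2 }, [port2]);
    }));
});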

@isonmad
Contributor

isonmad commented Dec 30, 2016

For what it's worth, one of the more prominent guides to the Push API depends on global state being maintained in a service worker (i.e. a MessagePort).

But that's probably bad already even without concurrency, right, since user agents can already shut a SW down at any time it doesn't have events to keep it alive?

@jakearchibald
Contributor

F2F: Interested to hear if Apple's take on this has changed at all.

@code-tree

@wanderview I ended up making my app rely on global scope, so I'm hoping if multi-threading happens it'll be enabled via an option somehow.

I would recommend doing this in the window and not in the service worker. You can download the new language pack and save it in cache all from the window.

I could, but my app also auto-updates, and I want to avoid the situation where it does a big update twice, simply because two windows are open. Having SW manage all updates seems to work better in this case.

You need to use the IDB transaction itself as the "lock". If the transaction does not complete it rolls back any changes.

I tried doing this but I couldn't get it to work, and it doesn't seem like IDB locks stay locked for very long. "Transactions are expected to be short-lived, so the browser can terminate a transaction that takes too long, in order to free up storage resources that the long-running transaction has locked." (Mozilla)

I ended up using promises to queue tasks, which requires global scope. Each new call to do_blocking_task() adds the passed task to the end of a queue of promises.

let CURRENT_TASK = Promise.resolve(null)
function do_blocking_task(task){
    // Run tasks one at a time by chaining each onto the end of the queue.
    const result = CURRENT_TASK.then(task)
    // Swallow failures so one rejected task doesn't stall everything queued after it.
    CURRENT_TASK = result.catch(() => null)
    return result
}
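
For example, an update can be queued so two windows can't trigger it concurrently within one instance (do_big_update is a placeholder for the actual update routine):

self.addEventListener('message', (event) => {
    if (event.data === 'update') {
        // do_blocking_task returns the chained promise, so waitUntil can hold the SW open.
        event.waitUntil(do_blocking_task(() => do_big_update()));
    }
});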

@isonmad
Contributor

isonmad commented Apr 1, 2017

I would recommend doing this in the window and not in the service worker. You can download the new language pack and save it in cache all from the window.

I could, but my app also auto-updates, and I want to avoid the situation where it does a big update twice, simply because two windows are open. Having SW manage all updates seems to work better in this case.

Wouldn't a SharedWorker be ideal for "only one instance can ever be doing this" use cases?

@jakearchibald
Contributor

Copying this from another thread. F2F notes about multi-service worker requirements:

F2F: background SW (push, notification, sync, bg fetch) vs foreground SW (fetch, postMessage) - MS & Apple want this so service workers can exist when the browser is closed. Should we spec this? Fetches for notification icons go through the background SW in Edge, since there isn't a foreground SW.

What happens to the clients API? How do we represent multiple instances?

Facebook global state case:

  • Notification click
  • Post to client "navigate"
  • Client navigates (may be real nav, or pushstate)
  • Client posts back to SW, please focus me

How does the client message the correct service worker?

Edge can work around this by still ensuring there's only one SW at a time. Either using the bg SW, or fg SW.

We need to think more about use cases that require speaking to a particular SW, or otherwise depend on global state.

@jakearchibald
Contributor

Random thought: if we made multiple instances an opt-in (at registration time), browsers that require it for particular features like push could reject push subscriptions if multiple instances weren't enabled.

That doesn't solve Facebook's case though.

@jakearchibald
Contributor

Types of multi-instance use:

  • For performance: spin up n instances and distribute fetch events amongst them. Minor exploration doesn't show this to be a benefit.
  • For isolation: When the browser is closed, allow another process to run the service worker. But when the browser does open, run the service worker in there too. This seems to be what Edge/Safari are wanting it for.

Option 1: No multiple instances.

No change to the spec. Safari/Edge would have to find a way to work around this by switching from one worker to another, potentially serialising state to cater for use-cases like Facebook's.

Option 2: Edge and Safari go ahead and use multiple instances, and we see what breaks.

If only a few sites break, we could reach out to them. Facebook's case for instance could be fixed by passing message ports or using broadcast channel (#1185 (comment)).

If everything turns out fine, we only have to make a minor spec change to recognise there may be a pool of instances that can receive top-level events.

If lots of sites break, but postMessage is the main cause, Edge/Safari could use tricks so messageEvent.source.postMessage tries to go back to the instance that sent the original message. That would certainly fix the Facebook case. If this works we could look at speccing it, but I'm not sure how.
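
For the Facebook flow specifically, the message-port version might look like this sketch (message shapes and URL are made up): the instance that handled the notification click hands the page a port, so the "please focus me" reply comes back to the right place.

// sw.js
self.addEventListener('notificationclick', (event) => {
    event.notification.close();
    event.waitUntil(self.clients.matchAll({ type: 'window' }).then((wins) => {
        const client = wins[0];
        if (!client) return self.clients.openWindow('/messages');
        const { port1, port2 } = new MessageChannel();
        port1.onmessage = (e) => {
            if (e.data === 'please-focus-me') client.focus();
        };
        client.postMessage({ type: 'navigate', url: '/messages', replyTo: port2 }, [port2]);
    }));
});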

Option 3: Multiple instances becomes an opt-in feature, but Edge/Safari reject push subscriptions if it isn't enabled.

navigator.serviceWorker.register(url, { multiInstance: true });

This is kinda messy, as we'd have to think about what happens if the registration is updated to opt out after push subscriptions have been created. Also, it'll be tough for developers if they have to enable it for Edge/Safari, but disable it for Chrome/Firefox (because they don't support it).

@wanderview
Member Author

Also, it'll be tough for developers if they have to enable it for Edge/Safari, but disable it for Chrome/Firefox (because they don't support it).

I guess I had assumed something like this would be saying "I'm ok with multi-instances, but I understand there is no guarantee on how many instances I will see."

@jakearchibald
Contributor

As long as they're actively testing in a browser that does the multi instance thing in a big way. If a new browser comes along and does multi instance in a different way, we could be back to square one.

I pondered earlier whether opt-in-multi-instance-mode could do something to enforce multi-instance-like behaviour. Eg create a new realm for every top level event. That sounds super expensive, but maybe UAs have tricks here.

@NekR

NekR commented Aug 16, 2017 via email

@delapuente

Is there a 4th option in which multiple instances could be an implementation detail, avoiding exposing it to the developer, by making global state shared somehow?

@wanderview
Member Author

Is there a 4th option in which multiple instances could be an implementation detail, avoiding exposing it to the developer, by making global state shared somehow?

I don't see a way to implement this given SW instances would likely live in different processes and global state provides synchronous access.

@jakearchibald
Contributor

For others reading along, here are the notes from the call #1173 (comment).

tl;dr: MS are going to implement multiple instances and see what breaks. If all is well, we can spec the idea of a pool of service workers. If developers want to talk to a particular instance, MessageChannel already provides that, but we may in future expose multiple service worker instances via the clients API.

@slightlyoff
Contributor

Just to be very clear, MSFT is trying a very specific version of multi-instance: one instance for receiving push messages and one instance for handling fetches. This isn't parallelism for fetch handling.

@NekR

NekR commented May 25, 2018

So, has Edge shipped this model with separate SW for push notifications?

@aliams
Contributor

aliams commented May 25, 2018

Yes, the April 2018 Update runs the push event handler in the background. We're investigating forwarding that event handling to the foreground if Microsoft Edge is open.

@mfalken
Member

mfalken commented Oct 26, 2018

F2F:

  • Edge changed their model so now the same service worker gets push and fetch events (assuming Edge is open).
  • But allowing browsers to run multiple SW instances in general has potential advantages for performance, reliability, and simplicity. We are all interested, and experimentation might happen.

@aliams
Contributor

aliams commented Oct 26, 2018

As @mattto mentioned, the latest version (Windows 10 October 2018 Update) forwards the push event from the background to Microsoft Edge in the foreground if it is open. This means that the push event handler in such a case would be run in the same service worker execution context as the fetch event handler.

@wanderview
Member Author

FWIW, I think I have run across an actual case where multiple service worker threads could offer a performance boost:

https://bugs.chromium.org/p/chromium/issues/detail?id=1293176#c11

Here someone is loading a dev environment with unbundled modules. It's hitting the SW with 1000+ requests on load. From tracing, it appears the SW thread is being maxed out and becoming a bottleneck.

Just offering this as a data point as to when it might make sense to do this from a performance angle.
