Improve startup performance of workers #3153

Closed
jfirebaugh opened this issue Sep 6, 2016 · 11 comments

@jfirebaugh (Contributor) commented Sep 6, 2016

One of the main determinants of time-to-first-render (TTFR) is how fast the workers are able to boot and begin processing tile data. On my laptop, the earliest I see tile requests getting made by workers is at about the 2000ms mark. We should try to improve this.

Ideas:

- Reduce the amount of code included in the worker blob
- Create only one blob, not one per worker
- Reduce the overhead of transferring layers to the worker

@jfirebaugh added the performance ⚡ label on Sep 6, 2016
@mourner self-assigned this on Sep 7, 2016
@mourner (Member) commented Sep 7, 2016

> Reduce the amount of code included in the worker blob

I found a way to reduce it to only the things it needs — see PR browserify/webworkify#30. It reduces the size from 1.12MB to 467KB, although I'm not sure whether it actually affects time-to-first-render that much — can you check @jfirebaugh?

> Create only one blob, not one per worker

This seems too cheap to bother with — the whole webworkify process up to blob URL creation takes just a few milliseconds in my tests.
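
For reference, the idea would look roughly like this (a sketch only; makeWorkerSource() and workerCount are placeholders, not actual gl-js/webworkify APIs):

```js
// Illustrative sketch: build the worker source blob once and reuse its URL for every
// worker instead of creating one blob per worker.
var blobUrl = URL.createObjectURL(new Blob([makeWorkerSource()], {type: 'text/javascript'}));

var workers = [];
for (var i = 0; i < workerCount; i++) {
    workers.push(new Worker(blobUrl)); // same URL, no per-worker blob creation
}
```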

@jfirebaugh (Contributor, Author)

Before: [profiler screenshot]

After: [profiler screenshot]

It definitely helps blob creation and worker boot time, although the overall effect on TTFR is only a few hundred milliseconds. I think much of the benefit is being lost due to poor parallelization, and we can recoup it with improved main thread scheduling (boot workers as early as possible, start style XHR as early as possible, reduce validation overhead).

Also, it seems that the first message sent to the worker incurs a significant penalty (seen as the orange "Function Call" bar in the DedicatedWorker timeline of the "After" results). I wonder if Chrome does lazy evaluation of worker source. Maybe we should try posting a no-op message right after creating the worker.
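
A minimal sketch of that warm-up idea (workerUrl and the '<noop>' message type are placeholders, not the actual gl-js message protocol):

```js
// Sketch: post a throwaway message immediately after constructing the worker, hoping the
// browser evaluates the worker source right away rather than on the first real message.
var worker = new Worker(workerUrl);
worker.postMessage({type: '<noop>'});
```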

@anandthakker (Contributor)

> I wonder if Chrome does lazy evaluation of worker source

@jfirebaugh I've been looking into TTFR as well this morning -- I'm seeing a "compile script" block before the first function call:

[screenshot from 2016-09-07: worker profile showing a "Compile Script" block before the first function call]

@jfirebaugh (Contributor, Author)

Yeah, I see that too... just wondering why processing the first message takes so much time, but it's not attributed to any specific function in gl-js.

@jfirebaugh (Contributor, Author)

Looking at this further, my hunch is the unattributed time is actually deserialization of the message data, so this goes back to "Reduce the overhead of transferring layers to the worker".

@mourner (Member) commented Nov 11, 2016

Here's what contributes to TTFR if you set up explicit console.log checkpoints across the code (timings are in milliseconds since the previous checkpoint):

| thread | event                    | time since prev |
|--------|--------------------------|-----------------|
| main   | loaded GL JS             | 269ms           |
| main   | created map              | 64ms            |
| main   | style loaded             | 191ms           |
| main   | style created            | 47ms            |
| worker | worker initialized       | 497ms           |
| worker | got style layers         | 14ms            |
| worker | started parsing tile     | 247ms           |
| worker | parsed non-symbol layers | 85ms            |
| worker | got symbol deps          | 55ms            |
| worker | symbols placed           | 90ms            |
| main   | got tile buffers         | 20ms            |

You can see here that sending style layers isn't the bottleneck. Here's where most bottlenecks are instead:

- Getting the worker to run (by far the biggest contributor, for some reason)
- Loading assets ("time to first byte" when requesting things like the style, TileJSON, sprites, tiles & glyphs)

We need to focus on investigating and fixing the first of these if possible.
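
The checkpoint logging referenced above could look something like this (an illustrative sketch; the actual checkpoint names and their placement in gl-js are not shown here):

```js
// Log the elapsed time since the previous checkpoint on the current thread.
var lastCheckpoint = performance.now();
function checkpoint(name) {
    var now = performance.now();
    console.log(name + ': ' + Math.round(now - lastCheckpoint) + 'ms since previous checkpoint');
    lastCheckpoint = now;
}

checkpoint('loaded GL JS');
// ... later, e.g. once the Map constructor has returned:
checkpoint('created map');
```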

@mourner (Member) commented Nov 11, 2016

If you use a minified GL JS build, worker initialization happens 290ms after the style is created, down from ~500ms. So startup time looks roughly linear in the size of the worker bundle.

#3034 should help with this a bit because parts of the worker bundle will be loaded lazily, e.g. geojson-vt & supercluster won't be loaded until you add a GeoJSON source.

Another thing that might help is rewiring some dependencies so that unnecessary code is not bundled on the worker side. One example is validation code, which takes 7% of the bundle — it's required by some StyleLayer methods but none of those get called on the worker side.
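
As an illustration only (this is not how #3034 is implemented), lazily pulling the GeoJSON dependencies into the worker could look like this; the script URL and global names are assumptions:

```js
// Sketch: keep geojson-vt and supercluster out of the initial worker bundle and load them
// only when a GeoJSON source is first added to the map.
var geojsonDepsLoaded = false;

function ensureGeoJSONDeps() {
    if (!geojsonDepsLoaded) {
        importScripts('/geojson-worker-deps.js'); // assumed to define self.geojsonvt and self.supercluster
        geojsonDepsLoaded = true;
    }
}
```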

@mourner (Member) commented Nov 11, 2016

> Also, it seems that the first message sent to the worker incurs a significant penalty (seen as the orange "Function Call" bar in the DedicatedWorker timeline of the "After" results). I wonder if Chrome does lazy evaluation of worker source. Maybe we should try posting a no-op message right after creating the worker.

It doesn't look like it's a first-message penalty (I tried it, and it doesn't make any difference). According to my checkpoint research, it simply takes a while for a worker to start its thread, load the blob and evaluate the JS bundle.

Around 120ms is spent evaluating the bundle (measured by inserting console.log checkpoints at the beginning and end of the generated bundle in webworkify), which is yet another reason to reduce the worker bundle size and/or break it down into parts.
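
The measurement itself is just a pair of checkpoints wrapped around the generated output, roughly like this (the exact insertion points inside the webworkify output are an implementation detail):

```js
// First statement of the generated worker bundle:
console.log('bundle eval start: ' + performance.now());

/* ... the entire generated worker bundle ... */

// Last statement of the generated worker bundle:
console.log('bundle eval end: ' + performance.now());
```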

@mourner (Member) commented Nov 11, 2016

Here's a minimal snippet of code that proves that it takes a long time for a Worker to parse its code:

```js
var src = 'console.log("worker: " + performance.now());' + Array(100000).join('(function(){})();');
new Worker(URL.createObjectURL(new Blob([src], {type: 'text/javascript'})));
console.log('main: ' + performance.now());
```

It takes about the same time if you create a barebones worker and then call importScripts with an expensive script from it.
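
That variant looks roughly like this (the script URL is a placeholder for any sufficiently large file):

```js
// Barebones worker that synchronously loads a heavy script via importScripts.
var bootSrc =
    'importScripts("https://example.com/heavy-bundle.js");' +
    'console.log("worker ready: " + performance.now());';
new Worker(URL.createObjectURL(new Blob([bootSrc], {type: 'text/javascript'})));
console.log('main: ' + performance.now());
```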

@jfirebaugh (Contributor, Author) commented Nov 11, 2016

Worker startup is paying the cost of both parsing/executing the bundle for the first time and then, when actually doing work, running very slowly at first before the optimizer kicks in. And all of that is on a per-worker basis -- AFAICT there's no sharing of compiler/optimizer data between workers. This is why creating only a single worker at startup time is better for TTFR, even though multiple workers are better once we reach a steady state.
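
A sketch of that staggered-startup idea (createWorker() and the event used here are placeholders, not the actual gl-js scheduling code):

```js
// Pay the per-worker parse/compile cost for a single worker up front so TTFR isn't blocked,
// then bring up the remaining workers after the first render.
var workers = [createWorker()];

map.once('render', function () {
    while (workers.length < (navigator.hardwareConcurrency || 4)) {
        workers.push(createWorker());
    }
});
```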

@mourner (Member) commented Apr 9, 2019

Just stumbled upon this tweet and it sounds promising for TTFR — we should definitely test it out.

> In Chrome, any JavaScript files in a service worker cache are bytecode-cached automatically.
> This means there is 0 parse + compile cost for them on repeat visits. 🤯
> https://v8.dev/blog/code-caching-for-devs#use-service-worker-caches
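
A minimal sketch of that approach (the cache name and file paths are placeholders): a service worker that precaches the GL JS bundles so Chrome can bytecode-cache them for repeat visits.

```js
// service-worker.js: precache the JS bundles during install and serve them from the cache.
self.addEventListener('install', function (event) {
    event.waitUntil(
        caches.open('gl-js-v1').then(function (cache) {
            return cache.addAll(['/mapbox-gl.js', '/mapbox-gl-worker.js']);
        })
    );
});

self.addEventListener('fetch', function (event) {
    event.respondWith(
        caches.match(event.request).then(function (cached) {
            return cached || fetch(event.request);
        })
    );
});
```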

@karimnaaji mentioned this issue Dec 8, 2020