
AsyncLocalStorage kills 97% of performance in an async environment #34493

Closed
danielgindi opened this issue Jul 23, 2020 · 12 comments
Labels
async_hooks: Issues and PRs related to the async hooks subsystem.
performance: Issues and PRs related to the performance of Node.js.

Comments

@danielgindi

danielgindi commented Jul 23, 2020

  • Version: 12.18.3
  • Platform: macOS
  • Subsystem: async_hooks

What steps will reproduce the bug?

const { AsyncLocalStorage } = require('async_hooks');
const asyncLocalStorage = new AsyncLocalStorage();

let fn = async () => /test/.test('test');

let runWithExpiry = async (expiry, fn) => {
    let iterations = 0;
    while (Date.now() < expiry) {
        await fn();
        iterations++;
    }
    return iterations;
};

(async () => {

    console.log(`Performed ${await runWithExpiry(Date.now() + 100, fn)} iterations to warmup`);

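    // Calling run() once installs the underlying promise hooks; the instance stays enabled until disable() is called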
    asyncLocalStorage.run({}, () => {});
    let withAls = await runWithExpiry(Date.now() + 1000, fn);
    console.log(`Performed ${withAls} iterations (with ALS enabled)`);

    asyncLocalStorage.disable();
    let withoutAls = await runWithExpiry(Date.now() + 1000, fn);
    console.log(`Performed ${withoutAls} iterations (with ALS disabled)`);

    console.log('ALS penalty: ' + Math.round((1 - (withAls / withoutAls)) * 10000) / 100 + '%');
})();

Output:

Performed 180407 iterations to warmup
Performed 205741 iterations (with ALS enabled)
Performed 6446728 iterations (with ALS disabled)
ALS penalty: 96.8%

How often does it reproduce? Is there a required condition?

In any case where await is used.

What is the expected behavior?

Should be around 10% penalty.

What do you see instead?

I see 97% reduction in performance.

Additional information

I've played in the past with a Zone polyfill of my own (zone-polyfill). At first I tried using the stack traces generated by recent V8 versions, which keep their context across await calls.
Combining that with the basic technique for tracking context, I was able to track the zone, but because the stack traces are strings I had to map them in memory, and no WeakMap was available for that. Using a NAN-based solution I got to about a 15% penalty.
Now, assuming these capabilities are available on the C++ side, much more natively, with the option to leverage the engine directly instead of generating stack traces to query for info, I'd expect at most a 10% penalty from a native implementation.

@HarshithaKP added the async_hooks label Jul 24, 2020
@puzpuzpuz added the performance label Jul 26, 2020
@puzpuzpuz
Member

I've tried to run this test on the latest master and got the following result:

Performed 769807 iterations to warmup
Performed 747431 iterations (with ALS enabled)
Performed 8319900 iterations (with ALS disabled)

So, it's slightly better than the reported result, yet ALS (or async_hooks, to be more precise) has significant overhead in such simple synthetic benchmarks, where promises are used heavily and there is no significant computation or I/O going on.

I've also tried to run benchmark/async_hooks/promises.js with modifications from #34523 with the following no-op hook:

static void FastPromiseHook(PromiseHookType type, Local<Promise> promise,
                            Local<Value> parent) {
  Local<Context> context = promise->CreationContext();
  Environment* env = Environment::GetCurrent(context);
  if (env == nullptr) return;

  env->new_async_id();
}

and got the following result:

$ ./node benchmark/async_hooks/promises.js
async_hooks/promises.js asyncHooks="enabled" n=1000000: 885,253.5326989443
async_hooks/promises.js asyncHooks="enabledWithDestroy" n=1000000: 251,938.54719793968
async_hooks/promises.js asyncHooks="enabledWithInitOnly" n=1000000: 787,156.9010261587
async_hooks/promises.js asyncHooks="disabled" n=1000000: 1,583,621.8483830877

As you can see, in this benchmark we get around 50% of the disabled performance when the no-op hook is installed. That means roughly 50% overhead is the best we can achieve in such benchmarks with the current V8 PromiseHook integration. In reality the overhead will be higher, as we need to do many more operations than just env->new_async_id().

BTW I've also experimented with internal fields (a couple of them) instead of symbol properties and got no significant performance improvement. That's mainly because, with that approach, the hook has to do more operations on average than the current implementation.
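For context, here is a rough sketch of what the asyncHooks configurations in the benchmark output above likely correspond to on the JS side (an assumption based on the configuration names, not the actual benchmark source):

const { createHook } = require('async_hooks');

// "enabledWithInitOnly": only an init hook is registered
const initOnly = createHook({ init() {} });

// "enabledWithDestroy": a destroy hook is registered as well
const withDestroy = createHook({ init() {}, destroy() {} });

// "enabled": the full set of lifecycle hooks, roughly what complete tracking needs
const enabled = createHook({ init() {}, before() {}, after() {}, promiseResolve() {} });

Each registered hook is one more JS callback invoked per promise lifecycle event; the destroy hook in particular requires Node to track promise lifetimes, which likely explains why the "enabledWithDestroy" numbers above are the slowest.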

@danielgindi
Author

danielgindi commented Jul 28, 2020

I've tried to run this test on the latest master and got the following result:

Performed 769807 iterations to warmup
Performed 747431 iterations (with ALS enabled)
Performed 8319900 iterations (with ALS disabled)

Well, only slightly better. With ALS enabled we get 8.9% of the ALS-disabled throughput, which means 91% is lost rather than 97% (the consistent number I got on all machines).

If there were a way to install a hook only for the case of an await returning to its parent context, that would let us have a JS-land implementation with much less overhead.

@puzpuzpuz
Member

Well, only slightly better.

Yes and that aligns with the benchmark results I got for #34523. There is still a lot to wish for in terms of the PromiseHook overhead. However, in real-world applications the overhead will be much smaller.

If there was a way to only get a hook installed for a case of an await returning to parent context - that would help us have a js-land implementation with much less overhead.

That's what PromiseHook used by async_hooks allows you to do. Yet, it assumes native code.
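As a point of reference, here is a minimal JS-land sketch of that kind of context propagation, built on the public async_hooks API (an illustration under simplified assumptions, not how AsyncLocalStorage is actually implemented):

const { createHook, executionAsyncId } = require('async_hooks');

const stores = new Map();

createHook({
  init(asyncId, type, triggerAsyncId) {
    // propagate the parent's store to every newly created async resource, promises included
    if (stores.has(triggerAsyncId)) stores.set(asyncId, stores.get(triggerAsyncId));
  },
  destroy(asyncId) {
    stores.delete(asyncId);
  }
}).enable();

function run(store, fn) {
  stores.set(executionAsyncId(), store);
  return fn();
}

function getStore() {
  return stores.get(executionAsyncId());
}

Every hook invocation crosses the JS/C++ barrier once per promise lifecycle event, which is where most of the overhead in the micro-benchmark above comes from.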

@Qard
Member

Qard commented Jul 31, 2020

I'd consider this a somewhat unrealistic micro-benchmark, as it's awaiting a non-promise in a loop, which naturally involves a lot more machinery in the barrier marking and tracking than in the non-async function call it hands to the await.

I did a few runs locally, replacing the function with some more realistic variations:

let fn = () => /test/.test('test'); - the original code, for baseline comparison:

Performed 744791 iterations to warmup
Performed 790158 iterations (with ALS enabled)
Performed 7946914 iterations (with ALS disabled)

let fn = () => Promise.resolve('test'); - at least creates an actual promise, though no actual asynchronous behaviour beyond deferring to the next microtask queue tick:

Performed 873666 iterations to warmup
Performed 962120 iterations (with ALS enabled)
Performed 8148736 iterations (with ALS disabled)

let fn = () => new Promise(setImmediate); - a somewhat more realistic representation of a promise you'd typically encounter in practice:

Performed 5477 iterations to warmup
Performed 56758 iterations (with ALS enabled)
Performed 62029 iterations (with ALS disabled)

let fn = () => new Promise(process.nextTick); - reasonably realistic representation of higher-priority promises you might see in reality:

Performed 158118 iterations to warmup
Performed 582238 iterations (with ALS enabled)
Performed 2683626 iterations (with ALS disabled)

Most promises would be I/O related and would sit somewhere between the last two tests.

It's fair to want the performance to be better, but the path your code tested is, per the spec, quite different from what a typical promise-based await looks like. I don't think it's too helpful to rely on micro-benchmarks, especially with a system this complex. I do think there may be optimization potential for AsyncLocalStorage to use the underlying async_hooks machinery directly in a way that avoids the numerous barrier crossings of the hook callback functions, though that would be a fairly complex change. 🤔

@danielgindi
Author

I agree that micro-benchmarks do not present the whole picture.
But it does show the performance cost of this specific feature, and how it will affect your code if you have a lot of awaits/promises. And if you do have a loop over little-work async code somewhere in your app, which happens quite often, you will be surprised to see a major hit in performance.

So if the hit in the micro benchmark was 20%, that would be acceptable. But that's not the case.

@Qard
Member

Qard commented Jul 31, 2020

As I said, what you were testing was not async. The performance difference would almost never look like that in reality because an actual async promise has a bunch more overhead than the non-async test function you awaited in your test code.

@danielgindi
Author

So you're saying the numbers I've posted are not bad enough?
That was intentional: I wanted to measure the impact of the await itself.

@puzpuzpuz
Member

@Qard thanks for these experiments. I also agree that ALS is meant to be used in real-world applications and it makes sense to benchmark it in a different way. Micro-benchmarks are certainly valuable for various optimizations; however, web applications do more than the original snippet does, and the ALS overhead will be much lower for them.

@danielgindi That said, this doesn't mean the async_hooks + promises integration doesn't need any performance improvements. But, as already discussed, all low-hanging-fruit optimizations are already implemented, so it may take some time and significant effort to come up with more advanced ones.

@mcollina
Member

So you're saying the numbers I've posted are not bad enough?

They are bad. However, it is very rare that await is the bottleneck of an application. For await to be the bottleneck, the thing being awaited would have to be synchronous or fake-async.

By using a setTimeout of 2ms:

const { AsyncLocalStorage } = require('async_hooks');
const asyncLocalStorage = new AsyncLocalStorage();
const { promisify } = require('util');

let fn = promisify(setTimeout).bind(null, 2);

let runWithExpiry = async (expiry, fn) => {
    let iterations = 0;
    while (Date.now() < expiry) {
        await fn();
        iterations++;
    }
    return iterations;
};

(async () => {

    console.log(`Performed ${await runWithExpiry(Date.now() + 100, fn)} iterations to warmup`);

    asyncLocalStorage.run({}, () => {});
    let withAls = await runWithExpiry(Date.now() + 10000, fn);
    console.log(`Performed ${withAls} iterations (with ALS enabled)`);

    asyncLocalStorage.disable();
    let withoutAls = await runWithExpiry(Date.now() + 10000, fn);
    console.log(`Performed ${withoutAls} iterations (with ALS disabled)`);

    console.log('ALS penalty: ' + Math.round((1 - (withAls / withoutAls)) * 10000) / 100 + '%');
})();

We get:

$ node a
Performed 44 iterations to warmup
Performed 4214 iterations (with ALS enabled)
Performed 4400 iterations (with ALS disabled)
ALS penalty: 4.23%

This is totally acceptable and well within range.

@danielgindi
Author

@mcollina I get that, although some async operations may take less than 2ms while still being genuinely async and taking advantage of the libuv thread pool.
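For instance (a hypothetical variation of the same benchmark, not something tested in this thread), a libuv-thread-pool-backed operation that typically completes in well under 2ms:

const fs = require('fs').promises;

// fs.stat() goes through the libuv thread pool, yet usually returns in far less than 2ms
let fn = () => fs.stat(__filename);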

I'm going to do a real-world test today on a large-scale application that covers all imaginable scenarios and should serve hundreds of millions of users. I'll let you guys know how it goes :-)

@puzpuzpuz
Member

Since #36394 has landed and shipped in v16.2.0, I tried running the original script with that release. Here is the result:

Performed 655186 iterations to warmup
Performed 3084236 iterations (with ALS enabled)
Performed 6097186 iterations (with ALS disabled)
ALS penalty: 49.42%

From 96.8% down to 49.42% here.

The script from #34493 (comment) (the 2ms setTimeout() one) gives the following result:

Performed 45 iterations to warmup
Performed 4202 iterations (with ALS enabled)
Performed 4277 iterations (with ALS disabled)
ALS penalty: 1.75%

From 4.23% down to 1.75% here.

The improvement is quite dramatic (thanks to @Qard), especially considering that the original script does almost nothing on the hot path except create promises. I think this issue can now be closed.

@devsnek closed this as completed May 20, 2021
@danielgindi
Author

This is great! A really dramatic improvement. 😀
