This repository has been archived by the owner. It is now read-only.

AsyncWrap public API proposal #18

Merged
merged 1 commit into from Jan 27, 2017

Conversation

@trevnorris
Contributor

trevnorris commented Apr 27, 2016

After much investigation and communication, this is the API that has
surfaced. It is meant to be minimal, to impose no performance penalty on
core when not in use, and to have minimal impact when it is in use. It
should serve the public needs that have been expressed over the last two
years.

@nodejs/ctc I'd like the initial review to come explicitly from the CTC before this is opened up for too much external debate, because experience has shown me that there will be suggestions/changes from those who want specific features and/or additions to suit their particular use case, usually without taking the time to realize that this API is enough: they just need to write the additional code for the hooks.

@indutny
Member

indutny commented May 2, 2016

LGTM, except the mentioned nits.

@mhdawson
Member

mhdawson commented May 3, 2016

@trevnorris any chance you have a branch somewhere with this API implemented that I could check out and experiment with?

@trevnorris
Contributor

trevnorris commented May 3, 2016

@mhdawson Not completely. While writing this a few tweaks were added for API consistency. Most of it is implemented in process.binding('async_wrap'), but scope() and support for multiple listeners (i.e. it doesn't return a new instance when called) aren't there. The easiest way to see it in action is to look at test-async-wrap*.

@mhdawson
Member

mhdawson commented May 4, 2016

My biggest question after reading this and doing some tests with process.binding('async_wrap') is:

don't we have to define what you can/cannot assume in terms of the object passed as 'this'?

I'm thinking there needs to be a mapping between the providers, their callbacks and what object you can expect 'this' to be in each case. If that's true then I start to wonder about how much of the internals we'll be exposing and how that might constrain what we change in the future. Key would be to document what will or won't change across releases in terms of what you get for 'this' and the shape of those objects.

As a concrete example for:

crypto.randomBytes(): this is -> InternalFieldObject {
  ondone:
   { [Function]
     [length]: 0,
     [name]: '',
     [arguments]: null,
     [caller]: null,
     [prototype]: { [constructor]: [Circular] } } }
crypto.pbkdf2() this is -> InternalFieldObject { ondone: { [Function] [length]: 2, [name]: '' } }

and it's not clear, from what's being passed to the hooks, how to figure out which specific call triggered the callback.

Maybe more than is currently encoded in the private API is going to be encoded into type in the init call; currently it just looks like the provider. If the type will help to identify the specific callback per provider, then defining what will be in type, and its values, would help.


@mike-kaufman

mike-kaufman commented May 4, 2016

Per @mhdawson's comments, are there specific reasons why the actual handle object is being passed to the hooks? I am also concerned about the level of internal details being exposed here. Would be nice to understand the use cases for this.

@trevnorris
Contributor

trevnorris commented May 4, 2016

I'm thinking there needs to be a mapping between the providers, their callbacks and what object you can expect 'this' to be in each case.

I'm fine with the idea that each constructor receives its own unique provider id. Not sure what you mean by "their callbacks", and the expected this would be an instance of the specified provider.

If that's true then I start to wonder about how much of the internals we'll be exposing and how that might constrain what we change in the future.

All of this is already exposed via ._handle on pretty much everything. For most (maybe all) handles you can access the user's constructed instance via this.owner. The reason I'm not passing that by default is that it doesn't always exist, and always passing the handle attached to the C++ class instance is more consistent.

As for what we can change: there's no guarantee what fields will be available. Part of the initial concept of this API was users who wanted to know what node was doing, not just another abstracted API; that could be done easily enough another way. I'm sure there'll be disagreement about what we should be able to rely on once a branch has reached stable, but that was never part of the initial design plan. It's basically "accessing this is equivalent to accessing _handle, and as such there's no guarantee as to what fields are available". The one possible exception is that .owner is always made available, so if it exists then users can get access to the JS object instance.
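A minimal sketch of how a hook consumer might apply the .owner convention described above (the helper name and the plain-object "handles" are hypothetical; only the .owner convention itself comes from this thread):

```javascript
// Resolve the user-facing object for a handle: prefer `.owner` when the
// handle carries one, otherwise fall back to the handle itself.
function userObjectFor(handle) {
  return handle.owner !== undefined ? handle.owner : handle;
}

// Plain objects stand in for real C++-backed handle instances.
const withOwner = { owner: { name: 'server' } };
const without = {};

console.log(userObjectFor(withOwner).name);      // 'server'
console.log(userObjectFor(without) === without); // true
```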

Re: InternalFieldObject: that can easily be changed so every constructor has its own provider id, as explained above.

are there specific reasons why the actual handle object is being passed to the hooks? I am also concerned about the level of internal details being exposed here. Would be nice to understand the use cases for this.

Some users want to store information directly on the handle. Despite the id each has, it's the easiest way to propagate information and allow the GC to clean it up automatically. Ideally in the future there could be a basic set of calls that could be standardized (e.g. .providerType()), and this is information that is already available now. e.g.

'use strict';
const async_wrap = process.binding('async_wrap');
const print = process._rawDebug;
var handle;
async_wrap.setupHooks({ init() { handle = this } });
async_wrap.enable();
var server = require('net').createServer().listen(8080);
print(server._handle === handle);
server.close();
// output: true

I use it for debugging as well. With the understanding that things change, that's part of its utility. By addition of the id, explicitly passing the provider, etc. we're not forcing use of the handle on anyone. Simply making it available in a way that makes sense for the context of the call, and in a way that users like APMs will find very useful.
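As a hedged sketch of the APM timing use case: store a timestamp on the handle in init() and read it back in post(). The hook invocation is simulated here so the sketch runs anywhere; in the real proposal Node.js would invoke these callbacks with the handle as this, and the _t0 field name is purely hypothetical:

```javascript
// Simulated hook invocation; in the real proposal Node.js calls these
// with the handle bound as `this`. `_t0` is a hypothetical field name.
const hooks = {
  init() { this._t0 = process.hrtime.bigint(); },
  pre() { /* the user's callback is about to run */ },
  post() {
    const elapsed = process.hrtime.bigint() - this._t0;
    console.log('async span took', elapsed, 'ns');
  },
};

// A plain object stands in for the C++-backed handle instance.
const fakeHandle = {};
hooks.init.call(fakeHandle); // Node would call this on allocation
hooks.pre.call(fakeHandle);  // ...immediately before the callback
hooks.post.call(fakeHandle); // ...and immediately after
```

Because the state lives on the handle itself, it is reclaimed whenever the handle is, with no extra bookkeeping.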


@mike-kaufman

mike-kaufman commented May 5, 2016

Some users want to store information directly on the handle. Despite the id each has, it's the easiest way to propagate information and allow the GC to clean it up automatically.

Providing storage for the async context is different than exposing the handle though.

I use it for debugging as well.

IMO, I think providing a context object which has a consistent shape & properties like provider type and handle is a cleaner API than passing the handle directly. It still meets the criteria of providing arbitrary storage associated with the "handle", it provides a place to define a common interface across handles, and it can evolve independently of the underlying handle.

Simply making it available in a way that makes sense for the context of the call, and in a way that users like APMs will find very useful.

I'm still not following how APMs will utilize the handle. Is there specific data on the handle that is useful? If so, what is it?


@trevnorris
Contributor

trevnorris commented May 5, 2016

Providing storage for the async context is different than exposing the handle though.

Creating and tracking a new async context for every handle is expensive. By attaching properties directly to the handle instance, GC will take care of it all automatically, and at the least expense.
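For contrast, the side-table alternative can be sketched with a WeakMap, which gets the same automatic-GC behavior at the cost of an extra lookup per hook invocation (all names here are hypothetical, and the handles are plain objects standing in for the real C++-backed instances):

```javascript
// Option 1: attach state directly to the handle (what the proposal allows).
function initDirect(handle) {
  handle._myApmState = { startTime: Date.now() };
}

// Option 2: keep a side table keyed by handle. A WeakMap also lets GC
// reclaim the entry once the handle is collected, but every pre()/post()
// now pays an extra map lookup.
const stateByHandle = new WeakMap();
function initSideTable(handle) {
  stateByHandle.set(handle, { startTime: Date.now() });
}

// Fake "handle" objects stand in for the real instances.
const h = {};
initDirect(h);
initSideTable(h);
console.log(typeof h._myApmState, stateByHandle.has(h)); // object true
```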

I think providing a context object which has a consistent shape & properties like provider type and handle is a cleaner API than passing the handle directly.

This can be, or at least should be, constructable by the user. Creating all these new objects filled with properties is expensive, and you're missing that printing the actual contents of the handle is useful. I also don't share the concern about possibly needing to standardize properties on the handle and making it difficult for node to move forward. I've been aiming for a more standardized lower-level API, and "hiding" properties on an object in a significant way has become easier with ES6. But this is a separate topic.

I'm still not following how APMs will utilize the handle. Is there specfiic data on the handle that is useful? If so, what is this?

Here's a really basic example script that should explain how useful it is to be able to see the handles themselves while debugging:

'use strict';
const async_wrap = process.binding('async_wrap');
const print = process._rawDebug;
const ctx_array = [];

async_wrap.setupHooks({
  init() { /*print(this)*/ },
  pre() {
    if (ctx_array.indexOf(this) === -1) {
      ctx_array.push(this);
      print(this);
    }
  },
});
async_wrap.enable();

process.on('exit', () => print(ctx_array.length));

require('net').createServer(function(c) {
  require('fs').readFile(__filename, () => {
    c.end(new Buffer(1024 * 1024 * 100).fill('a'));
    this.close();
  });
}).listen(8080);

require('net').connect(8080, function() { this.resume() });

In there you'll see a WriteWrap, which encapsulates the writing of the data from server to client and gives access to the buffer being written. Useful for inspecting all TCP packets going through the server. Also the GetAddrInfoReqWrap, which indicates there was a dns lookup for a host, available under hostname. Or the TCPConnectWrap, which gives you information about the remote server attempting to connect. Or the ShutdownWrap, which alerts us that the connection is closing. The FSReqWrap is useful in that we can see the contents of the file that's been read in, and even the position of the file that was read.

I hope this demonstrates the utility for being able to analyse each handle. All of the things mentioned in the previous paragraph cannot be obtained any other way. Removing the ability to see the handle would be a blow to the API, and basically be one step towards moving it to nothing more than a continuation-local-storage API.


@mhdawson
Member

mhdawson commented May 6, 2016

@trevnorris what I was referring to with respect to "their callbacks" was that for the CRYPTO provider there are multiple cases where callbacks are wrapped such that the pre, post methods are invoked (as in my example). I think your comment about making each of these have their own provider id addresses that question.

In terms of the discussion about visibility of the handles, from what you describe we should document in this EP both what it's ok/not ok to use the handles for and what the expectations are. For example:

  • it is ok to store data in the handle by adding fields, but it is your responsibility to ensure that the namespace is unique enough that the names will not collide with any additions made in future Node.js versions
  • you may choose to inspect the contents of the handle, however, these are not part of the public API and will change between releases.
  • The list of providers may change from release to release, it is up to your code to handle any additions/deletions in a graceful manner.

If we believe that documenting a list like this is enough protection from being boxed in later, when users of the API are broken by later Node.js releases and complain, then passing the handles would be fine. If we were concerned that, despite the warnings, we'd still end up trying to avoid breakage, then passing some other field from the wrapper could make sense.
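One way a consumer could satisfy the namespace requirement in the first bullet above is a Symbol-keyed property, which can never collide with string-named fields Node.js might add to the handle in a future release (a hypothetical sketch; the key and field names are made up):

```javascript
// A Symbol key is unique by construction, so it cannot collide with any
// string-named field added to the handle later.
const kApmContext = Symbol('my-apm-context');

function init(handle) {
  handle[kApmContext] = { traceId: 42 };
}

// A plain object stands in for the real handle instance.
const fakeHandle = {};
init(fakeHandle);

console.log(fakeHandle[kApmContext].traceId); // 42: our data is reachable
console.log(Object.keys(fakeHandle).length);  // 0: invisible to string-key walks
```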


@jasnell
Member

jasnell commented May 6, 2016

@trevnorris a couple more clarifying questions ...

Let's say I create a hook and some dependency module I'm using creates a hook... when those are called, are they passed the same id and handle, different ids same handle, same ids different handle, or different ids and different handles? (and by handle here I mean the js object that wraps the actual handle). The main reason I ask is that if I'm attaching additional context to the handle, it would be helpful to also know that other hooks could be attaching their own context to the same handle.

I'm still wondering about the potential cost of creating too many of these which is why I think describing the specific lifecycle from when a hook is created to when it is destroyed would be very helpful. While I understand that you've designed and implemented this to be as low impact on performance as possible, there is a non-zero cost to calling these hooks. Have you had the opportunity yet to benchmark an upper limit to the number of hooks that can be created without having a serious impact on performance? My key concern with this is that an app developer may not have any idea that dependency modules they may be using could be going out and creating hooks. Depending on how many such dependencies they have, they could end up seeing degraded performance without any clear indication as to why since installing the hook appears to be a completely transparent operation (that is, there's no indication that a new hook was created).
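The fan-out cost described above can be sketched in miniature: every registered hook set is invoked for every async event, so per-event overhead grows linearly with the number of consumers (the registerHooks name is just a stand-in, not the proposal's API):

```javascript
// Simulated registry: each consumer's hook set fires for every event.
const hookSets = [];
function registerHooks(hooks) { hookSets.push(hooks); }

let initCalls = 0;
for (let i = 0; i < 5; i++) {
  // e.g. five dependency modules each quietly install a hook
  registerHooks({ init() { initCalls++; } });
}

// A single async "event" fans out to every registered hook set:
for (const hooks of hookSets) hooks.init();
console.log(initCalls); // 5
```

The app developer sees one async event but pays for five hook invocations, which is exactly the transparency concern raised here.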


@trevnorris
Contributor

trevnorris commented Sep 19, 2016

@Qard I've outlined my stance on working with Promises in the Chromium bug tracker, as MicrotaskQueue lacks introspection APIs. There are some clarifications in my next response.

TL;DR: As far as the asynchronous execution stack is concerned, calls to .then() aren't important. What's important is when the callback passed to .then() is called.

The same can be said for the event emitter. We don't care about every call to .on(). We care when the callback passed to the event is called. Example using event emitter similar to how Promises should work:

const e = new (require('events'));
e.on('foo', () => a.push(require('async_hooks').currentId()));
const a = [];

setTimeout(() => e.emit('foo'), 100);
setTimeout(() => e.emit('foo'), 200);
setTimeout(() => console.log(a[0] === a[1]), 300);

// output: false

Taking this a step further, and to something closer to a Promise, we'll add two listeners with different current id's:

const e = new (require('events'));
const currentId = require('async_hooks').currentId;
e.on('foo', () => a1 = currentId());
let a1, a2;

setImmediate(() => e.on('foo', () => a2 = currentId()));
setTimeout(() => e.emit('foo'), 100);
setTimeout(() => console.log(a1 === a2), 200);

// output: true

As we can see, the currentId() of both listeners are the same. Because they were both executed in the same space.

Now let's juxtapose the above with an example using Promises:

const currentId = require('async_hooks').currentId;
const p = new Promise(res => setTimeout(res, 100));
p.then(() => console.log(currentId()));

setTimeout(() => p.then(() => console.log(currentId())), 100);

How this would work is that the call to res will create a new asynchronous resource. Thus calling init(). Any callbacks passed to that Promise's .then() will be wrapped in that resource's before()/after(). So all callbacks in all .then() calls will all have the same current id. Regardless of which callstack the .then() was called in.

Further, the return value of a .then() is a resolved Promise, which is the same asynchronous resource type used for res. Callbacks of any chained calls could be traced back to the root origin (i.e. all stacks should be traceable back to id 1). There are a few nuances that need to be hammered out, but those can be discussed later.
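The "traceable back to id 1" idea can be sketched as a parent-id chain; the ids here are simulated, since in the real API init() would supply them:

```javascript
// Each init() records (id -> parentId); walking the map recovers the
// full asynchronous ancestry of any resource.
const parentOf = new Map();
function init(id, parentId) { parentOf.set(id, parentId); }

init(2, 1); // e.g. the Promise resolved by `res`
init(3, 2); // a .then() callback's resource
init(4, 3); // a chained .then()

function ancestry(id) {
  const chain = [id];
  while (parentOf.has(id)) chain.push(id = parentOf.get(id));
  return chain;
}

console.log(ancestry(4).join(' -> ')); // 4 -> 3 -> 2 -> 1
```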

Though looking back at your example I believe we agree when events should be fired for Promises.


Concerning splitting init to create and queue, I don't see the merit in it for the following reasons:

  • A resource (e.g. TCPWrap) is never allocated until it's needed. We don't preemptively create handles. If, in the future, any were to be created early then they would be made to not fire init() until it needs to be used.
  • Whenever a handle is taken from the pool to be reused it's given a new unique id. Which means the init() callback couldn't receive the id of the handle. And in some cases the resource may only exist in C++, removing the ability to pass the handle of the object to the init() callback.

The end of life for a handle is determined by when that resource could be freed. Whether it's placed in a pool instead of being freed is of no concern; at that point it's no more than a void*, which can be easily tracked using snapshots (one of my earlier patches was to make all internal classes that inherit from AsyncWrap show up in a snapshot).

  • Reuse of handles is the exception. Calling queue would almost always happen immediately after create, causing nothing but additional overhead most of the time. I'm even mulling over whether it'd be appropriate to create something like handleAfter and requestAfter, since the latter implies a destroy(), thus removing the need to call one more callback for cases like nextTick().

Here is possibly near the most basic HTTP server:

require('http').createServer((req, res) => res.end('bye')).listen(8080);

Here's the list of handles w/ id's created for doing a simple curl 'http://localhost:8080/':

TCPWRAP(4)
TickObject(5)
TCPWRAP(6)
Timeout(7)
TIMERWRAP(8)
PARSER(9)
TickObject(10)
Timeout(11)
TIMERWRAP(12)
TickObject(13)
Timeout(14)
TickObject(15)
TickObject(16)
Timeout(17)
TickObject(18)
TickObject(19)
TickObject(20)
TickObject(21)
TickObject(22)
Timeout(23)
TickObject(24)
TickObject(25)

There are 22 in all, 12 of those from nextTick(). Doing 20k req/sec with no hooks in place means at least 440k new id's/sec. While I've gone to abnormal lengths to make sure just the checks don't affect performance, they aren't free and I'm still struggling today to regain performance. Currently if all 4 hooks were added we'd increase the number of calls by about 1.7 million/sec. One more hook would add another half million to that count.
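The arithmetic above can be checked directly (figures taken from the example run):

```javascript
const handlesPerRequest = 22;  // from the handle list above
const reqPerSec = 20000;       // the stated load

const idsPerSec = handlesPerRequest * reqPerSec;
console.log(idsPerSec);        // 440000 new ids/sec

// With all 4 hooks firing once per id:
const hooks = 4;
console.log(idsPerSec * hooks); // 1760000, i.e. ~1.7 million calls/sec

// Each additional hook adds one more call per id:
console.log(idsPerSec);        // ~half a million more calls/sec
```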

  • The only argument I can see for needing queue is to play nice with Promises. Promise executors run synchronously. Which makes the executor just another function call in the synchronous call stack and is of no concern to async hooks. Also if async hooks concerns itself with calls to .then() it must then also concern itself with calls to .on() for consistency. Both of these are outside the purview of async hooks.

Aside: this exercise has been a demonstration that nothing is free, and if Zones were to require we wrap every .on()/.emit() we're going to suffer a greater performance loss. There are a total of 30 .on()/.emit() calls made for a single HTTP request.

The creation of async hooks was to give insight into the life cycle of the event loop. I can appreciate that people want different things out of this, but, especially for the initial implementation, I plan on keeping it down to the bare minimum possible. In fact I'm considering removing the ability to call currentId() w/o any hooks enabled. Since the user can create an empty hook, it's easy enough to signal you want that functionality w/o incurring the overhead of calling empty functions.

Excuse the long post. I personally dislike it when others do this to me, but couldn't find a better way to communicate everything.

Contributor

trevnorris commented Sep 19, 2016

@Qard I've outlined my stance on working with Promises in the Chromium bug tracker (the MicrotaskQueue lacks introspection APIs). There are some clarifications in my next response.

TL;DR: As far as the asynchronous execution stack is concerned, calls to .then() aren't important. What's important is when the callback passed to .then() is called.

The same can be said for the event emitter. We don't care about every call to .on(). We care when the callback passed to the event is called. Example using event emitter similar to how Promises should work:

const e = new (require('events'));
e.on('foo', () => a.push(require('async_hooks').currentId()));
const a = [];

setTimeout(() => e.emit('foo'), 100);
setTimeout(() => e.emit('foo'), 200);
setTimeout(() => console.log(a[0] === a[1]), 300);

// output: false

Taking this a step further, and to something closer to a Promise, we'll add two listeners with different current id's:

const e = new (require('events'));
const currentId = require('async_hooks').currentId;
e.on('foo', () => a1 = currentId());
let a1, a2;

setImmediate(() => e.on('foo', () => a2 = currentId()));
setTimeout(() => e.emit('foo'), 100);
setTimeout(() => console.log(a1 === a2), 200);

// output: true

As we can see, the currentId() of both listeners are the same. Because they were both executed in the same space.

Now let's juxtapose the above with an example using Promises:

const currentId = require('async_hooks').currentId;
const p = new Promise(res => setTimeout(res, 100));
p.then(() => console.log(currentId()));

setTimeout(() => p.then(() => console.log(currentId())), 100);

How this would work is that the call to res will create a new asynchronous resource. Thus calling init(). Any callbacks passed to that Promise's .then() will be wrapped in that resource's before()/after(). So all callbacks in all .then() calls will all have the same current id. Regardless of which callstack the .then() was called in.

Further, the return value of a .then() is a resolved Promise. Which is the same asynchronous resource type used for res. Callbacks of any chained calls could be traced back to the root origin (i.e. all stacks should be traceable back to id 1). There are a few nuances that need to be hammered out, but those can be discussed later.

Though looking back at your example I believe we agree on when events should be fired for Promises.


Concerning splitting init to create and queue, I don't see the merit in it for the following reasons:

  • A resource (e.g. TCPWrap) is never allocated until it's needed. We don't preemptively create handles. If, in the future, any were to be created early then they would be made to not fire init() until it needs to be used.
  • Whenever a handle is taken from the pool to be reused it's given a new unique id. Which means the init() callback couldn't receive the id of the handle. And in some cases the resource may only exist in C++, removing the ability to pass the handle of the object to the init() callback.

The end of life for a handle is determined by when that resource could be free'd. Whether it's placed in a pool instead of being free'd is of no concern. At that point it's no more than a void*. Which can be easily tracked using snapshots (one of my earlier patches was to make all internal classes that inherit from AsyncWrap show up in a snapshot).

  • Reuse of handles is the exception. Calling queue would almost always happen immediately after create. Causing nothing but additional overhead most of the time. I'm even mulling over whether it'd be appropriate to create something like handleAfter and requestAfter. Since the latter implies a destroy(). Thus removing the need to call one more callback for cases like nextTick().

Here is possibly near the most basic HTTP server:

require('http').createServer((req, res) => res.end('bye')).listen(8080);

Here's the list of handles w/ id's created for doing a simple curl 'http://localhost:8080/':

TCPWRAP(4)
TickObject(5)
TCPWRAP(6)
Timeout(7)
TIMERWRAP(8)
PARSER(9)
TickObject(10)
Timeout(11)
TIMERWRAP(12)
TickObject(13)
Timeout(14)
TickObject(15)
TickObject(16)
Timeout(17)
TickObject(18)
TickObject(19)
TickObject(20)
TickObject(21)
TickObject(22)
Timeout(23)
TickObject(24)
TickObject(25)

There are 22 in all. 12 of those from nextTick(). Doing 20k req/sec with no hooks in place means at least 440k new id's/sec. While I've gone to abnormal lengths to make sure just the checks don't affect performance, they aren't free and I'm still struggling today to regain performance. Currently if all 4 hooks were added we'd increase the number of calls by about ~~8.8 million~~ 1.7 million/sec. One more hook would add another half million to that count.

  • The only argument I can see for needing queue is to play nice with Promises. Promise executors run synchronously. Which makes the executor just another function call in the synchronous call stack and is of no concern to async hooks. Also if async hooks concerns itself with calls to .then() it must then also concern itself with calls to .on() for consistency. Both of these are outside the purview of async hooks.

Aside: this exercise has been a demonstration that nothing is free, and if Zones were to require we wrap every .on()/.emit() we're going to suffer a greater performance loss. There are a total of 30 .on()/.emit() calls made for a single HTTP request.

The creation of async hooks was to give insight into the life cycle of the event loop. I can appreciate people want different things out of this but, especially for the initial implementation, I plan on keeping it as minimal as possible. In fact I'm considering removing the ability to call currentId() w/o any hooks enabled. Since the user can create an empty hook, it's easy enough to signal you want that functionality w/o incurring the overhead of calling empty functions.

Excuse the long post. I personally dislike it when others do this to me, but couldn't find a better way to communicate everything.


Member

addaleax commented Sep 20, 2016

(Sorry, I probably just don’t have the time to give this a full review, so I’m going to have to abstain from voting for now.)


Member

Trott commented Sep 20, 2016

@addaleax No shame in abstaining. People should do it more often (including me). There's a lot to follow/know on the project and it's impossible to keep up with it all.

@Trott Trott removed the tsc-agenda label Sep 21, 2016


Member

Trott commented Sep 21, 2016

This has been ratified by the CTC. Status can be changed from DRAFT to ACCEPTED and this can be merged. I've removed the ctc-agenda label.


Member

Fishrock123 commented Sep 21, 2016

@Trott wait who voted?


Member

Trott commented Sep 21, 2016

@Fishrock123

9 votes in favor: @indutny @Trott @trevnorris @rvagg @evanlucas @ChALkeR @Fishrock123 @mscdex @cjihrig

0 votes against:

3 abstentions: @misterdjules @addaleax @thealphanerd

6 members who did not cast a vote and did not indicate they were abstaining: @chrisdickinson @shigeki @bnoordhuis @mhdawson @ofrobots @jasnell


Member

Qard commented Sep 21, 2016

@trevnorris Thanks for the deep explanation.

My hope with the create/queue event split was that CLS could link then() and on() callbacks to where they were attached. Without support for that, CLS will need to continue to monkey-patch a bunch of things, which probably has quite a bit more performance impact than if async_wrap was able to provide the appropriate hooks.

In its current state, async_wrap only solves part of the problem of async transaction tracing. You can get the code path the internals took, but you can't really get the more contextually useful path of how it got through the users code.


Jeff-Lewis commented Sep 22, 2016

@trevnorris By the way, thank you for all your work in Node and async_wrap.

As far as the asynchronous execution stack is concerned, calls to .then() aren't important. What's important is when the callback passed to .then() is called.

Is this how the current unofficial async_wrap works? I'm sorry I'm having trouble following the changes in behavior b/w this EPS and the current released async_wrap.

The real hope for myself and I think many others using CLS, is if this EPS version eliminates or help reduces the monkey-patching needed in order to have reliable CLS in node?

I might be off-base, but it sounds like it won't b/c of its impact on performance? I'd give up a 💩 ton of performance to have reliable CLS. Can we make it opt-in? For the non-embedded folks, hardware and scaling is only getting cheaper.


Contributor

trevnorris commented Sep 22, 2016

@Jeff-Lewis

Is this how the current unofficial async_wrap works?

The current implementation is a hybrid. There were a lot of requests to allow multiple hook instances. Instead of having a single global set. This required propagating the state of hooks for each execution stack.

The real hope for myself and I think many others using CLS, is if this EPS version eliminate or help reduce the monkey-patching needed in order to have reliable CLS in node?

When it comes to patching .then(), with the impending async/await syntax no amount of monkey patching will help. We'll have to depend on V8 API.

I might be off-base, but it sounds like it won't b/c of its impact on performance? I'd give up a 💩 ton of performance to have reliable CLS. Can we make it opt-in?

I put in many hours in an attempt to make this functionality opt-in, but the problem is a certain amount of state still needs to be maintained and propagated because a hook may be enabled at any time. In my PR nodejs/node#8531 I purposely left the commits intact so anyone could view the effort to keep the old functionality in.


Member

Fishrock123 commented Sep 26, 2016

but you can't really get the more contextually useful path of how it got through the users code.

@Qard I think @trevnorris is going to be making parentId in init be the conceptual parent, and getCurrentId within init be the technical parent. Does that solve this issue?


Member

Qard commented Sep 26, 2016

Maybe. I'll have to dig into it at some point to see how it works. I don't work in APM anymore, so I haven't been paying super close attention to this stuff lately. 😕

@AndreasMadsen AndreasMadsen referenced this pull request Sep 27, 2016

Closed

[async_hooks] tracking issue #29

006: AsyncHooks public API proposal
After much investigation and communication this is the API for
AsyncHooks that has evolved. Meant to be minimal, not impose any
performance penalty to core when not being used, and minimal impact when
it is used, this should serve public needs that have been expressed over
the last two years.

PR-URL: #18
Reviewed-By: Fedor Indutny <fedor@indutny.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>

@trevnorris trevnorris merged commit 523e9aa into master Jan 27, 2017

@trevnorris trevnorris deleted the async-wrap-ep branch Jan 27, 2017

@AndreasMadsen AndreasMadsen referenced this pull request May 12, 2017

Closed

async_wrap,src: promise hook integration #13000
