Skip to content
This repository has been archived by the owner. It is now read-only.

AsyncWrap public API proposal #18

Merged
merged 1 commit into from Jan 27, 2017
Merged

AsyncWrap public API proposal #18

merged 1 commit into from Jan 27, 2017

Conversation

@trevnorris
Copy link
Contributor

@trevnorris trevnorris commented Apr 27, 2016

After much investigation and communication this is the API that has
surfaced. Meant to be minimal, not impose any performance penalty to
core when not being used, and minimal impact when it is used, this
should serve public needs that have been expressed over the last two
years.

@nodejs/ctc I'd like the initial review explicitly from the CTC before this is opened for too much external debate. Because experience has shown me that there will be suggestions/changes for those who want specific features and/or additions to suit their specific use case. Usually not taking the time to realize that this API is enough. They just need to write the additional code for the hooks.

* `parentId` {Number}

Called when a class is constructed that has the possibility to trigger an
asynchronous event. This does mean the instance _will_ trigger a

This comment has been minimized.

@Fishrock123

Fishrock123 Apr 27, 2016
Member

does not mean

longer needs it.

Because the callback is called during deconstruction of the class instance it
is not safe to access the JS handle during `destroy()` execution. Reason this

This comment has been minimized.

@Fishrock123

Fishrock123 Apr 27, 2016
Member

Does that mean that this is not that handle, i.e. undefined?

This comment has been minimized.

@trevnorris

trevnorris Apr 28, 2016
Author Contributor

Yes. To reiterate a comment I just posted as to why this is the case:

Now, destroy() doesn't have a this because it's essentially a static method. It's unsafe to access a C++ class instance while deconstructing. So only the id of the handle is passed to JS so any storage resources can be released.

So you could essentially see undefined as the handle instance at that point in time. Since it has been (or soon will be) free'd.

This comment has been minimized.

@Fishrock123

Fishrock123 Apr 28, 2016
Member

But is the handle instance (i.e. if you access this) actually undefined here?

This comment has been minimized.

@trevnorris

trevnorris Apr 28, 2016
Author Contributor

Yes. Logging this will show undefined.

This comment has been minimized.

@jasnell

jasnell Apr 29, 2016
Member

The inconsistency here between destroy and the other callbacks is another argument in favor of avoiding the use of this and passing the context as an argument. The destroy method simply would not have that argument set. As @chrisdickinson mentions it's more of a stylistic choice but I'd certainly prefer it

This comment has been minimized.

@jasnell

jasnell Apr 29, 2016
Member

Nit: The reason this...

This comment has been minimized.

@trevnorris

trevnorris Apr 29, 2016
Author Contributor

@jasnell I disagree that there's fundamental inconsistency. When destroy() is called the handle should no longer available for access. It reflects the fact that by accessing the handle you have undefined behavior. Remember that it's possible to store the object prior to calling destroy(), and while the handle may exist in JS accessing it leads to undefined behavior because the backing C++ class no longer exists. Thus making this === undefined is because this is literally undefined at that point.

This comment has been minimized.

@trevnorris

trevnorris Apr 29, 2016
Author Contributor

screw it. i'm going through and making it safe to access this regardless of where it's accessed. can't having the application abort, even if they screwed up. so this in destroy() will be set to the handle.

everything can be safely accessed. So it is not recommended to blindly
iterate over the object's properties. Despite this, it is important to give
access of the handle to the user so they can utilize tracking methods of their
choosing. For example:

This comment has been minimized.

@chrisdickinson

chrisdickinson Apr 27, 2016

It'd be nice to avoid meaningful this in new APIs, but I understand if it's impossible due to perf concerns.

This comment has been minimized.

@trevnorris

trevnorris Apr 28, 2016
Author Contributor

I'd be lying if I said it was for performance. It's done this way because it makes syntactic sense. The callback is running as an extension of the handle, thus the handle becomes the calling context for the function. Why do you have an aversion to using this? Other than disallowing the use of arrow functions, which I don't accept as a valid argument over having a syntactically correct API, I don't see any disadvantage. While the advantage is that it more closely mimics the C++ class counterpart. Which is important here because they're so closely bound.

Now, destroy() doesn't have a this because it's essentially a static method. It's unsafe to access a C++ class instance while deconstructing. So only the id of the handle is passed to JS so any storage resources can be released.

This comment has been minimized.

@chrisdickinson

chrisdickinson Apr 29, 2016

The argument against this is primarily one of convention — anecdotally, many JS APIs have been moving away from "load-bearing this" over the last few years. When I was poking at the asyncwrap API (pre-docs), I was surprised that this carried pertinent info — it was not obvious to me that the hook was running as an extension of the handle.

It's admittedly a nit — but if it comes at zero cost & makes the API more obvious, it might be worth addressing.

This comment has been minimized.

@jasnell

jasnell Apr 29, 2016
Member

I agree with @chrisdickinson on this point.


Node currently doesn't have sufficient API to notify calls to Promise
callbacks. In order to do so node would have to override the native
implementation.

This comment has been minimized.

@chrisdickinson

chrisdickinson Apr 27, 2016

I think we have to solve this before the API can be made public.

This comment has been minimized.

@Fishrock123

Fishrock123 Apr 27, 2016
Member

This is a V8 API problem, right?

This comment has been minimized.

@chrisdickinson

chrisdickinson Apr 27, 2016

Yep — it was on my plate last but I ran out of time to work on it. It was looking like the solution would be pre/post hooks for executing microtaskqueue items — my original approach of making the queue itself pluggable wouldn't work because there are more than just JS functions in that queue.

This comment has been minimized.

@trevnorris

trevnorris Apr 28, 2016
Author Contributor

@chrisdickinson If we absolutely have to I'd prefer shimming native promises over not shipping this. We don't currently offer any type of support for Promises in this way. e.g. domains loosing context:

const domain = require('domain');
const print = process._rawDebug;
domain.create().run(() => Promise.resolve(1).then((val) => print(process.domain)));
// output: undefined

And TBH refusing this type of insight into node just because v8 doesn't provide us with more than a black box is a serious pain.

If, in fact, the CTC more unanimously decides that it's important that we wait for v8 to provide the appropriate APIs to accomplish this then I'd at least like to get approval for the EP so I can finish the implementation exposed via process.binding(). Then at least it's exposed in a way that can still be useful to many and will be ready once v8 provides what we need.

This comment has been minimized.

@chrisdickinson

chrisdickinson Apr 29, 2016

Right — this is a problem with domains as well, and I empathize with your desire to avoid making AsyncWrap dependent on V8 changes. However, without promise support, we can't make the sort of assertions we'd like to make — namely, being able to reliably back-associate any event with the event that queued it. This is somewhat less of an issue for domains, since domains and promises ostensibly both provide solutions for asynchronous error handling, and (leaving aside the domains-as-CLS use case) generally don't have much overlap.

On the upside, this is something we only have to solve once, and after that point any further microtask use from V8 is taken care of.

This comment has been minimized.

@jasnell

jasnell Apr 29, 2016
Member

Unless I overlooked it, something should be included about whether hooks are blocking or not. What happens, for instance, if I drop an endless loop into pre() for instance? Is there any risk to passing this to setImmediate() or process.nextTick() in the hook callback? That sort of thing.

This comment has been minimized.

@trevnorris

trevnorris Apr 29, 2016
Author Contributor

@jasnell as in whether async wrap callbacks run on the same thread? didn't think specifying that would be necessary, but can if you think so. as far as keeping a reference to this indefinitely, i'm in the middle of a PR to fix it so you can.

This comment has been minimized.

@jasnell

jasnell Apr 29, 2016
Member

I think the documentation should at least give a recommendation one way or the other as to whether the callbacks should be sync or async and what the possible ramifications are.

This comment has been minimized.

@mike-kaufman

mike-kaufman May 3, 2016

However, without promise support, we can't make the sort of assertions we'd like to make — namely, being able to reliably back-associate any event with the event that queued it.

+1.

This comment has been minimized.

@trevnorris

trevnorris May 5, 2016
Author Contributor

However, without promise support, we can't make the sort of assertions we'd like to make — namely, being able to reliably back-associate any event with the event that queued it.

This isn't completely accurate. It requires that we always drain the microtask queue in a specific way, and you will loose call stack information, but passing the correct parent id along is possible.

// trace filtering of events.
const providers = async_wrap.Providers;
// init() is called during object construction. The object handle being

This comment has been minimized.

@jasnell

jasnell Apr 29, 2016
Member

Can you add a bit more up front about the over all design. Such as, what it a handle and what role does it play here? I know this isn't the API doc but it would help level set for anyone coming in and doing a review

This comment has been minimized.

@mike-kaufman

mike-kaufman May 3, 2016

It would be useful to describe the APIs in terms of the intended user model. For example, doesn't init() really mean enqueueAnAsyncCallback(). In the case of nextTick callbacks & promises, it doesn't correspond to an object being constructed, right?

This comment has been minimized.

@Fishrock123

Fishrock123 May 4, 2016
Member

enqueueAnAsyncCallback()

Pardon? This isn't actually where the callback gets called?

This comment has been minimized.

@mike-kaufman

mike-kaufman May 4, 2016

My understanding of the user model:

  • init(id, type, parentId) - invoked when an async callback as identified by id is eligible to be invoked. parentId is the async ID of the currently executing parent at the time init() is called.
  • pre(id) - invoked immediately before an async callback as identified by id is invoked
  • post(id, hasThrown) - invoked immediately after an async callback as identified by id has been invoked.
  • destroy(id) - invoked when an async callback as identified by id is no longer elligible to be called.

Above is brief wrt the "user model". It can be expanded on to provide more succinct definitions. E.g., an async callback is defined as a javascript function which is enqueued for execution at some later time. It is identified by a unique ID refered to as the async ID. It has the following lifecycle events: init() - ...

This comment has been minimized.

@trevnorris

trevnorris May 5, 2016
Author Contributor

For example, doesn't init() really mean enqueueAnAsyncCallback(). In the case of nextTick callbacks & promises, it doesn't correspond to an object being constructed, right?

Yes it does. It always does. It's always constructed at the time of call. You're confusing init() with something like when a callback is passed to the instances event handler, e.g. on('data' (which is the JS abstraction of the handle and has nothing to do with how the handle operates). The callback that will be called by MakeCallback is always defined at construction time. That will then trigger any callbacks from the user that have been abstracted via the JS API.

This comment has been minimized.

@Fishrock123

Fishrock123 May 5, 2016
Member

Does .created() make more sense?

This comment has been minimized.

@trevnorris

trevnorris May 5, 2016
Author Contributor

I used init as short for "initialized", which I believe well explains what's mechanically happening.

This comment has been minimized.

@jasnell

jasnell May 6, 2016
Member

init() should be fine as a name but as @mike-kaufman and I have indicated a few times, this would be much clearer if it included a bit more detail about the lifecycle of a hook. That's not a criticism, just a suggestion.

// post() is called just after the handle's callback has finished, and will be
// called regardless whether the handle's callback threw. If the handle's
// callback did throw then hasThrown will be true.
function post(id, hasThrown) { }

This comment has been minimized.

@jasnell

jasnell Apr 29, 2016
Member

If hasThrown is true, is there any mechanism here for determining what the error was? Is there any intent for this to be used in any kind of recovery scenario? If not, that should be made explicit

This comment has been minimized.

@Fishrock123

Fishrock123 Apr 30, 2016
Member

Some notes that I asked trevor to outline should cover this, but you'd use process.on('unhandledException'), and you can register it safely in pre and cleanup in post without risking other code being caught.

This comment has been minimized.

@trevnorris

trevnorris May 5, 2016
Author Contributor

It's also useful if you want to see that someone is using domains or uncaughtException to handle errors. Rather than allowing the application to crash.

This comment has been minimized.

@jasnell

jasnell May 6, 2016
Member

Ok, but that's not quite what I asked. Inside post, I look and see that hasThrown is true. Then what? As a suggestion, when writing up the docs for this, it needs to be clear how this particular method relates to the actual handling of the error so that there's no confusion. People will see the hasThrown and will wonder.

This comment has been minimized.

@trevnorris

trevnorris Jun 21, 2016
Author Contributor

Ah, good point. Maybe instead we just pass the error object, if it occurred.

This comment has been minimized.

@rvagg

rvagg Jul 6, 2016
Member

is there resolution to this @trevnorris? seems to be a dangling artifact that needs to be fixed

This comment has been minimized.

@trevnorris

trevnorris Aug 9, 2016
Author Contributor

Going to remove it for now. Not a breaking change and can be added later in a minor version.

// Allow callbacks of this AsyncHook instance to fire. This is not an implicit
// action after running createHook(), and must be explicitly run to being
// executing callbacks.
asyncHook.enable();

This comment has been minimized.

@jasnell

jasnell Apr 29, 2016
Member

Does this return a value? Perhaps a ref to the hook so calls can be chained?

This comment has been minimized.

@mhdawson

mhdawson May 4, 2016
Member

Did you consider if we'd want to enable selectively. ie be able to enable only for a subset of the supported providers ?

This comment has been minimized.

@Fishrock123

Fishrock123 May 5, 2016
Member

Is there a benefit do doing that here rather than in your hook handlers?

This comment has been minimized.

@trevnorris

trevnorris May 5, 2016
Author Contributor

@mhdawson That is coming in the future, but not spec'd for initial API. It would roughly look something like (mind you I'm changing API names on the fly, but you'll get what I mean):

const AsyncHook = require('async_wrap');
const pflags = AsyncHook.ProviderFlags;  // Providers by flag enum
const ah = new AsyncHook({ init, pre, post });
ah.filter(pflags.TCPWRAP | pflags.UDPWRAP | pflags.CRYPTO);
ah.enable();

The above will only trigger the callbacks for the three flags provided. There are a couple things ATM that need to be addressed:

  1. Currently stored in an enum, not by or-able flag. Easy fix. Though worried we may run out of room.
  2. Is there some type of propagation that should happen automatically under the hood? Otherwise the user is left with no stack traces, resource tracing, etc. Because of this I'm considering how useful this feature would be.

This comment has been minimized.

@mhdawson

mhdawson May 5, 2016
Member

Ok good to hear you have considered it and you have a good start at what it might look like.

This comment has been minimized.

@jasnell

jasnell May 6, 2016
Member

Granted that you said that filtering would come in the future so just take this with a grain of salt for now but please just put the filter flags on the constructor ;-)

const ah = new AsyncHook({init,pre,post}, flags.TCPWRAP | flags.UDPWRAP);
lifetime of `AsyncWrap`. These callbacks will also be called to emulate the
lifetime of handles and requests that do not fit this model. For example,
`HTTPPARSER` instances are recycled to improve performance. So the `destroy()`
callback would be called manually after a connection is done using it, just

This comment has been minimized.

@jasnell

jasnell Apr 29, 2016
Member

Is there any reason to indicate that the handle is not actually being destroyed in cases such as HTTPPARSER? I can't think of any but want it to be clear.

This comment has been minimized.

@Fishrock123

Fishrock123 Apr 30, 2016
Member

Probably not, actual actions being done are a lot more interesting & useful than handles that are idley waiting from exterior events.

This comment has been minimized.

@Fishrock123

Fishrock123 Apr 30, 2016
Member

We could maybe add notes in the docs that some Providers actually use shared handles and these are then representative of individual events from those handles.

This comment has been minimized.

@jasnell

jasnell Apr 30, 2016
Member

I'd recommend it. My main concern would be someone either accidentally or unknowingly keeping one of the reusable handles around after destroy. Since the handle is not actually destroyed, you could end up with some odd, hard to debug edge cases. I'm not saying there's anything we should do other than simply document the fact.

This comment has been minimized.

@mhdawson

mhdawson May 2, 2016
Member

Would it be possible to make it so that even though the underlying resource is re-used, the same handle is not re-used ? That was avoid the problem James mentions above.

This comment has been minimized.

@trevnorris

trevnorris May 5, 2016
Author Contributor

Is there any reason to indicate that the handle is not actually being destroyed in cases such as HTTPPARSER? I can't think of any but want it to be clear.

There's no reason to. The resource will be assigned a new id and treated like a newly constructed object. We're just saving the hit of allocation and instantiation.

I'd recommend it. My main concern would be someone either accidentally or unknowingly keeping one of the reusable handles around after destroy.

You're thinking of the JS handle. We don't care about that. Internally we remove the pointer to the class from the JS object so it can no longer be referenced.

Would it be possible to make it so that even though the underlying resource is re-used, the same handle is not re-used ?

resource == handle.

This comment has been minimized.

@mhdawson

mhdawson May 5, 2016
Member

Ok, I think what you said is what I had in mind at least to some extent. There will be a new id even if the underlying resource is reused.

This comment has been minimized.

@jasnell

jasnell May 6, 2016
Member

Awesome, ok. Thank you.

This comment has been minimized.

@AndreasMadsen

AndreasMadsen May 10, 2016
Member

@trevnorris to be clear, the init hook will also be called manually, when the handle is reused?

This comment has been minimized.

@trevnorris

trevnorris Jun 21, 2016
Author Contributor

@AndreasMadsen Sorry, didn't see this. Yes, it will be manually called. When a resource is brought out of the pool it will be treated as if it were a new resource. All we're basically omitting is the memory allocation and class instantiation cost.

asynchronous event. This does mean the instance _will_ trigger a
`pre()`/`post()` event. Only that the possibility exists.

The `this` of each call is that of the object being constructed. While core

This comment has been minimized.

@jasnell

jasnell Apr 29, 2016
Member

Recommend repeating here the exception for destroy

that asynchronous branch.

#### `asyncHook.scope()`

This comment has been minimized.

@jasnell

jasnell Apr 29, 2016
Member

An example illustrating this would be helpful

// Pretend init, pre, post are all defined
const asyncHook = async_wrap.createHook({ init, pre, post });
asyncHook.enable();

This comment has been minimized.

@jasnell

jasnell Apr 29, 2016
Member

If enable() returned the hook these could be chained

List of all providers that may trigger the `init` callback.

### Constructor: `AsyncHook`

This comment has been minimized.

@jasnell

jasnell Apr 29, 2016
Member

API style nit: perhaps export either the Constructor or createHook function as the default export of the async_wrap module the way we do with events?

E.g. const createHook = require('async_wrap'), but based on whichever is expected to be the most used construction method.

This comment has been minimized.

@trevnorris

trevnorris May 5, 2016
Author Contributor

Maybe renaming the require'able to async_hook and have AsyncHook = require('async_hook') then can do new AsyncHook(...).

This comment has been minimized.


#### `async_wrap.Providers`

List of all providers that may trigger the `init` callback.

This comment has been minimized.

@jasnell

jasnell Apr 29, 2016
Member

What data type? Array of strings? Array of objects?

const net = require('net');
// Pretend init, pre, post are all defined
const asyncHook = async_wrap.createHook({ init, pre, post });

This comment has been minimized.

@jasnell

jasnell Apr 29, 2016
Member

Is there a practical limit to how many of these should be created per process? What is the performance impact of creating too many of these?

This comment has been minimized.

@Fishrock123

Fishrock123 Apr 30, 2016
Member

As far as I know, almost entirely dependant on the work done from the hook than the hook itself.

@indutny
Copy link
Member

@indutny indutny commented May 2, 2016

LGTM, except the mentioned nits.

@mhdawson
Copy link
Member

@mhdawson mhdawson commented May 3, 2016

@trevnorris any chance to you have a branch somewhere with this API implemented that I could checkout and experiment with ?

@trevnorris
Copy link
Contributor Author

@trevnorris trevnorris commented May 3, 2016

@mhdawson Not completely. While writing this a few tweaks were added for API consistency. Most of it is implemented in process.binding('async_wrap'), but scope() and support for multiple listeners (i.e. it doesn't return a new instance when called) aren't there. Easiest way to see it in action is look at test-async-wrap*.


Since its initial introduction with the `AsyncListener` API, `AsyncWrap` has
slowly evolved to ensure a generalized API that would serve as a solid base for
module authors who wished to add events to the life cycle of the event loop.

This comment has been minimized.

@mike-kaufman

mike-kaufman May 3, 2016

I think it would be worthwhile to describe these life-cycle events at a high level. That is, describe the user model for this API. A lot of what I read below is tied up with implementation details around the native handles. I think most users are concerned about when a callback was enqueued, when that callback started execution, and when that callback completed execution.

This comment has been minimized.

@trevnorris

trevnorris May 4, 2016
Author Contributor

the high level of node's event loop is already covered in detail in the-event-loop-timers-and-nexttick.md. i can reference that in the EP. also remember this isn't the actual docs entry. it's meant to be a high level overview of the change.

A lot of what I read below is tied up with implementation details around the native handles.

async wrap has always been partially tied to implementation details. that's the reason for its existence, and attempting to abstract that would remove the utility from the user. for example I was doing testing between 4 and 6 and realized that the order of events fires differently for certain operations. this is exactly the type of thing I wanted to know. it helps me debug my code. if we wanted all this abstracted away then we could simply wrap all the js calls. that's not the purpose for async wrap.

I think most users are concerned about when a callback was enqueued

It doesn't work like this. An oncomplete or similar is assigned to the newly created request upon construction. Hence why I chose to notify on construction of the instance

I think what you want is when the I/O request itself was created, which I believe the actual request instance to perform I/O is always created when requested by the user. So init() will handle that. the implementation is also simpler than injecting at all I/O calls. e.g. uv_write(), and since they're run at approximately the same time any counter measurements should be sufficiently correct.

Then pre() will be called when the async operation is complete, and post() will be called when the callback has completed execution. Covering your final points, and both of these are documented in the Hook Callbacks section.

As far as measuring fire-to-completion, it's impossible to do reliably in node. Best we can give is when node knows the request completed, but since all completed requests are queued by the kernel until node asks for them it's impossible to get perfect measurements. For example if 100 requests complete at the same time and each callback runs for 10 ms, we wouldn't know the last request completed until 1 second after it actually did. This is fundamentally part of node and not something we can do much about. Users will be able to track these details, and referring back to exposing implementation details of node this is exactly why it's important and not something that can be abstracted in a way to be useful.

This comment has been minimized.

@mike-kaufman

mike-kaufman May 5, 2016

Thanks Trevor for the link to the doc on the event loop - that's really helpful for me. If the goal of the API is to "add events to the life cycle of the event loop", then why isn't the event loop's phase exposed through API calls? Now, I'm not saying that it should be part of the API, but given stated goal, it isn't clear to why this is omitted.

Which is one of the things I'm driving at: Clarification/simplification around the goals API's goals will help get everyone on the same page around the utility of the API & how it is to be used. Ideally, this would include some definitions, goals/non-goals and some canonical use cases/examples.

async wrap has always been partially tied to implementation details. that's the reason for its existence, and attempting to abstract that would remove the utility from the user.

Can you expand on your example here? I'm not following how the description of the API impacts the ability to understand event ordering.

if we wanted all this abstracted away then we could simply wrap all the js calls. that's not the purpose for async wrap.

Again, precisely what I'm driving for - what is the purpose of the API? My understanding is the goal is to expose lifecycle events around async code execution. If not, then that's fine. If the goal is to provide lifecycle events around handles, then great, but it begs the question of "what is a handle and why does the user care?" (per @jasnell's comment).

It doesn't work like this. An oncomplete or similar is assigned to the newly created request upon construction. Hence why I chose to notify on construction of the instance

Sorry, I'm not following this.

I think what you want is...

I'm not following. We may be talking past each other with terminology. What I need is a notification model around async code execution in node. That is, I want to know precisely the following:

  • when is a callback c "made available" for invocation, and what is the parent invocation "making it available"?
  • when does a callback c begin invocation?
  • when does a callback c end invocation?
  • when is a callback c no longer available for invocation (arguably optional, but let's stick with it for now as it cleanly maps to proposed destroy() call).

As far as measuring fire-to-completion, it's impossible to do reliably in node.

I think this is OK, but again, it will help to explicitly list the goals & non-goals of the API. At a minimum it will get everyone reading this on the same page.

I'm happy to take a stab at writing up what I think is a user model for the API, and something that fits cleanly into current async wrap implementation. Likely not 100% correct, but at a minimum useful to highlight differences in understanding. Let me know if people think this is a useful effort.

This comment has been minimized.

@Fishrock123

Fishrock123 May 5, 2016
Member

but it begs the question of "what is a handle and why does the user care?" (per @jasnell's comment).

I think this has already been answered but if you don't know it may be more helpful to dig though node core while we're working out a potential API for it. We shouldn't expect 100% of things to be documented for end users in an EP. (see the other EPs)

This comment has been minimized.

@mike-kaufman

mike-kaufman May 5, 2016

I've been through the code. A clear definition of a "handle" and how it relates to the goals of the API aren't in the EP. Again, I'm happy to write this up and iterate on it.

This comment has been minimized.

@Fishrock123

Fishrock123 May 6, 2016
Member

I think that is probably out of scope. This exposes existing mechanics in a more usable way.

This comment has been minimized.

@jasnell

jasnell May 6, 2016
Member

I don't think telling someone to just go dig through node core while we're working out a potential API for it is really all that helpful (or friendly). @mike-kaufman appears to be making the point that while this new API appears useful it's not clear given the description in this eps exactly how it would be used and what benefit it brings. Several examples and a description of the lifecycle of a hook would make that much clearer. No one is asking for 100% of things to be documented for end users. What is being asked for is a bit more clarity... and I think a bit more patience would likely be a good thing also.

This comment has been minimized.

@Fishrock123

Fishrock123 May 6, 2016
Member

I still think defining exactly what a handle is, is out of scope here.

// pre() is called just before the handle's callback is called. It can be
// called 0-N times for handles (e.g. TCPWRAP), and should be called exactly 1
// time for requests (e.g. FSREQWRAP).
function pre(id) { }

This comment has been minimized.

@mike-kaufman

mike-kaufman May 3, 2016

for debug purposes, I suggest passing along the type to all callbacks. There were some cases with current APIs where I wasn't observing init() getting called for some IDs, but was seeing pre()/post(). Having the type in pre()/post)() would help to debug situations like this.

This comment has been minimized.

@jasnell

jasnell May 3, 2016
Member

Agree. I can imagine that in some cases (such as logging) I may not want to have to maintain state to know what type of thing id references.

This comment has been minimized.

@Fishrock123

Fishrock123 May 4, 2016
Member

Might be easier to attach type to the handle?

This comment has been minimized.

@mike-kaufman

mike-kaufman May 4, 2016

Might be easier to attach type to the handle?

Not clear how that will work for the destroy() case where there is no handle. It may be worthwhile defining an async context type that can be passed to these events. Such a type can evolve to include appropriate information such as the type, handle-specific state, etc...

This comment has been minimized.

@mike-kaufman

mike-kaufman May 4, 2016

The other useful bit of information is the invocation count. i.e., in cases where pre()/post() can be called 0..n times, what will uniquely identify an invocation? I agree that this can be tracked by individually by the hooks, but it is part of the identity of an invocation of a callback. (i.e., an execution of a callback is uniquely identifed by the tuple [asyncId, invocationId])

This comment has been minimized.

@mike-kaufman

mike-kaufman May 5, 2016

Sure, let's use "handle". How would you define "handle"? And what term would you use for an invocation of the callback associated with the "handle" and how would you define this?

I'm not claiming the invocation_id is necessary to keep internally. I am claiming that the invocation_id uniquely identifies a specific invocation of a method associated with a "handle".

This comment has been minimized.

@Fishrock123

Fishrock123 May 5, 2016
Member

@mike-kaufman Just make a Stacktrace?

How would you define "handle"?

C++ I/O handler

This comment has been minimized.

@mike-kaufman

mike-kaufman May 5, 2016

Just make a Stacktrace?

Sorry, I'm not following your suggestion.

C++ I/O handler

That's not the case for promises or nextTick() timers. Are there other outliers not represented by an c++ IO handler? Moreover, if the goal of the API is to provide "lifecycle events around async code execution", defining a "handle" as a C++ I/O handler doesn't relate to the goals of the API.

This comment has been minimized.

@Fishrock123

Fishrock123 May 6, 2016
Member

Sorry, I'm not following your suggestion.

Literally just check a strack trace, Besides, that is the only way to know what you are asking for.

That's not the case for promises or nextTick() timers.

Fine. "I/O Handler". Next.

This comment has been minimized.

@jasnell

jasnell May 6, 2016
Member

Fine. "I/O Handler". Next.

Again, this isn't helpful. @Fishrock123 you have the advantage of being far more familiar with the Node.js internals and therefore make certain assumptions about what should just be "known". The questions that @mike-kaufman is asking are not unreasonable and are the kinds of questions that anyone trying to use this API would be asking. Being flippant doesn't address the concern. Can I ask you to please be more patient.

function init(id, type, parentId) { }
// pre() is called just before the handle's callback is called. It can be
// called 0-N times for handles (e.g. TCPWRAP), and should be called exactly 1

This comment has been minimized.

@mike-kaufman

mike-kaufman May 3, 2016

Also, I don't understand the distinction between "handles" and "requests" and why this impacts the expected number of calls. This is an example of where the implementation details are bleeding through the API.

This comment has been minimized.

@Fishrock123

Fishrock123 May 4, 2016
Member

Which will happen regardless due to how close this API is to tracking what actually happens... which is the point.

Some clarification could perhaps be used though.

This comment has been minimized.

@mike-kaufman

mike-kaufman May 4, 2016

Which will happen regardless due to how close this API is to tracking what actually happens... which is the point.

Is this what actually happens when you consider promises & next tick callbacks? In these cases, there is no "handle", right? There is no "object" being constructed during init().

My point is, if the goal of the API is to provide life cycle events around async code execution, then the API should be described in terms of life cycle events of async code execution. I believe that getting to a crisp definition of these events will help ensure API consistency across different types of handles, and help drive correct semantics for next tick & promise callbacks.

This comment has been minimized.

@chrisdickinson

chrisdickinson May 4, 2016

With promises there is a handle (or maybe a "request" is a better term), in the form of the microtask queue entry — which does not correspond to the promise, but to a call to resolve/reject() (if there are downstream promises dependent on an upstream promise-yet-to-be-settled) or .then (if the upstream promise is already resolved, but a new dependent promise is introduced.)

This comment has been minimized.

@trevnorris

trevnorris May 4, 2016
Author Contributor

It's not an implementation detail. It's a concrete detail in how node works. handles do work via requests. e.g. when a TCP server handle receives a new connection, it receives a new connection request. which is then converted into a handle that can receive requests like new data packet. this has an important distinction when wanting to track metrics and resources.

and just because an object isn't an instance of ReqWrap doesn't mean it doesn't fit in this model. hypothetically all async activity could be made a subclass of HandleWrap or ReqWrap, but we don't for performance reasons.

There is no "object" being constructed during init().

TickObject() is constructed for every call to nextTick(). Promise has an internal construction mechanism. Promises construct a new object. The ECMA spec shows that each new promise has several internal fields set for each newly constructed instance.

But technically they would each be a request. They only trigger once (Promises will fire on multiple provided resolve/reject callbacks, but only once) and they themselves will never hold references to another request.

This comment has been minimized.

@chrisdickinson

chrisdickinson May 6, 2016

But technically they would each be a request. They only trigger once (Promises will fire on multiple provided resolve/reject callbacks, but only once) and they themselves will never hold references to another request.

Just to be sure we're all on the same page about promises-as-an-async-resource: the asynchronous component of a promise is not tied to Promise object creation, but rather through the addition of a handler (for a resolved settled promise) or the resolution of a value (for a pending promise with existing handlers.) The asynchronous resource is the microtask — not the promise. In node handle parlance, a promise is a factory for creating requests, not a request in and of itself.

// createHook() returns an instance of the AsyncHook constructor that can
// control how those hooks are used. For example, enabling or disabling their
// execution.
asyncHook = async_wrap.createHook({ init, pre, post, destroy });

This comment has been minimized.

@mike-kaufman

mike-kaufman May 3, 2016

I would expect a corresponding unhook() call to deregister a set of callbacks.

This comment has been minimized.

@jasnell

jasnell May 3, 2016
Member

Yes, I was wondering about this also. It appears that with this API there is no way to explicitly remove a hook. Clear documentation about the lifecycle of a hook would be fantastic.

This comment has been minimized.

@Fishrock123

Fishrock123 May 4, 2016
Member

Maybe better as new Hook()? As far as I understand it doesn't actually attach until you enable() it, so the counter is disable().

This comment has been minimized.

@mike-kaufman

mike-kaufman May 4, 2016

My understanding is adding a "hook" instance will put it in the list of async hooks. On each callback, this list needs to be iterated, and for each hook, test if enabled & if enabled, then invoke the hook. So, a remove() call is different from disable(); it explicitly removes the hook from the set of async hooks. Let me know if my understanding is incorrect.

This comment has been minimized.

@trevnorris

trevnorris May 4, 2016
Author Contributor

disable() is a flag that toggles which set of callbacks need to be called for that instance. so if there's only one instance and it's disabled then the normal checks won't anything needs to be called.

since even after a hook has been disabled, it will continue to receive notifications on async paths already taken (100% intentional design). if you want an api that removes even those calls then that comes with the caveat that resource cleanup is then on the user I can think of a way to make that possible.

Maybe better as new Hook()?

I could live with that.

This comment has been minimized.

@mike-kaufman

mike-kaufman May 5, 2016

If I understand correctly, desired behavior is that if init() fires, then all callbacks fire for that handle instance. An explicit remove() will break this assertion.

This comment has been minimized.

@trevnorris

trevnorris May 5, 2016
Author Contributor

so basically running hooks.remove() means that those callbacks will never fire again. even if they've propagated though other asynchronous there are a couple technical difficulties there that'll have to be worked out, but probably possible to implement efficiently.

// Enable hooks for the current synchronous execution scope. This will ensure
// the hooks are not in effect in case of multiple returns, or if an exception
// is thrown.
asyncHook.scope();

This comment has been minimized.

@mike-kaufman

mike-kaufman May 3, 2016

I'm not following the use of the scope() function. Can you elaborate on this?

This comment has been minimized.

@trevnorris

trevnorris May 4, 2016
Author Contributor

Running the following:

hook.enable();
net.createServer(fn).listen(8080);
hook.disable();

will allow hook's callback to continue firing on all callbacks that are triggered by the new Server instance. Though this can be error prone if there are multiple return locations. e.g.

setTimeout(() => {
  hooks.enable();
  if (checkState())
    return;
  fs.open(path, r, fn);
  hooks.disable();
}, 100);

Because of the early return and not first running hooks.disable() that set of hooks could remain enabled indefinitely. So hooks.scope() would guarantee that they are disabled when the current stack unwinded. e.g.

setTimeout(() => {
  hooks.scope();
  if (checkState())
    return;
  fs.open(path, r, fn);
}, 100);

I'll probably end up needing to place in massive bold letters that it lasts until the current stack has unwound, or else users will think it's function scope specific. Like const or let.

This comment has been minimized.

@mike-kaufman

mike-kaufman May 5, 2016

Someone could achieve analogous functionality with a try/finally block, right?

This comment has been minimized.

@trevnorris

trevnorris May 5, 2016
Author Contributor

@mike-kaufman I don't see how those two address the same point.

This comment has been minimized.

@mike-kaufman

mike-kaufman May 5, 2016

If I understand scope() properly, this

() => {
  hooks.scope();
  if (checkState())
    return;
  fs.open(path, r, fn);
}

is equivalent to this:

() => {
  try {
    hooks.enable();
    if (checkState())
      return;
    fs.open(path, r, fn);
  }
  finally {
    hooks.disable();
  }
}

This comment has been minimized.

@jasnell

jasnell May 6, 2016
Member

Based on the description that would be my understanding as well @mike-kaufman ... only that try/finally bring along it's own additional cost.

At the risk of repainting the bike shed, my key concern with scope() is with the name -- it's not clear at all and @trevnorris has already indicated that there could be some confusion around what scope it's actually referring to. Not sure if I have a better suggestion tho. I know it's ugly (and don't have a better suggestion) but something like hook.forThisStack() would be clearer but admittedly is a horrible name.

This comment has been minimized.

@trevnorris

trevnorris Jun 21, 2016
Author Contributor

@jasnell What about .syncScope() or .stackScoped()?

This comment has been minimized.

@rvagg

rvagg Jul 6, 2016
Member

I like .syncScope() for this, it adds a lot more clarity to what it's trying to do


Execution of the `nextTickQueue` and `MicrotaskQueue` are slightly special
asynchronous cases. Because they execute in the same `MakeCallback()` as the
asynchronous callback they will ultimately end up with the same parent as the

This comment has been minimized.

@mike-kaufman

mike-kaufman May 3, 2016

Because they execute in the same MakeCallback() as the asynchronous callback they will ultimately end up with the same parent as the originating call

IMO this needs to be addressed. Specifically, nextTick() & promises need to maintain correct semantics wrt their async parent and wrt the callbacks that are being fired.

@mhdawson
Copy link
Member

@mhdawson mhdawson commented May 4, 2016

My biggest question after reading this and doing some tests with process.binding('async_wrap'); is:

don't we have to define what you can/cannot assume in terms of the object passed as 'this' ?

I'm thinking there needs to be a mapping between the providers, their callbacks and what object you can expect 'this' to be in each case. If that's true then I start to wonder about how much of the internals we'll be exposing and how that might constrain what we change in the future. Key would be to document what will or won't change across releases in terms of what you get for 'this' and the shape of those objects.

As a concrete example for:

crypto.randomBytes(): this is -> InternalFieldObject {
  ondone:
   { [Function]
     [length]: 0,
     [name]: '',
     [arguments]: null,
     [caller]: null,
     [prototype]: { [constructor]: [Circular] } } }
crypto.pbkdf2() this is -> InternalFieldObject { ondone: { [Function] [length]: 2, [name]: '' } }

and its not clear how I figure out from what's being passed to the hooks how you figure out which specific hook triggered the callback.

Maybe more than is currently encoded in the private api is going to be encoded into type in the init call, currently it just looks like the provider. If the type will help to identify the specific callback per provider then defining what will be in type and the values would help.

@mike-kaufman
Copy link

@mike-kaufman mike-kaufman commented May 4, 2016

Per @mhdawson's comments, are there specific reasons why the actual handle object is being passed to the hooks? I am also concerned about the level of internal details being exposed here. Would be nice to understand the use cases for this.

@trevnorris
Copy link
Contributor Author

@trevnorris trevnorris commented May 4, 2016

I'm thinking there needs to be a mapping between the providers, their callbacks and what object you can expect 'this' to be in each case.

I'm fine with the idea that each constructor receives its own unique provider id. Not sure what you mean by "their callbacks", and the expected this would be an instance of the specified provider.

If that's true then I start to wonder about how much of the internals we'll be exposing and how that might constrain what we change in the future.

All of this is already exposed via ._handle on pretty much everything. For most (maybe all) handles you can access the user's constructed instance via this.owner. Reason I'm not passing that by default is because it doesn't always exist, and always passing the handle attached to the C++ class instance is more consistent.

As for what we can change, there's no guarantee what fields will be available. Part of the initial concept of this API was users who wanted to know what node was doing. Not have just another abstracted API, that could be done easily enough in another way. I'm sure there'll be disagreement about what we should be able to rely on once a branch has reached stable, but that was never part of the initial design plan. It's basically "accessing this is equivalent to accessing _handle, and as such there's no guarantee to what fields are available". The one possible exception is that .owner is always made available so if it exists then users can get access to the JS object instance.

Re: InternalFieldObject That can be easily enough changed so every constructor has their own provider id. As explained above.

are there specific reasons why the actual handle object is being passed to the hooks? I am also concerned about the level of internal details being exposed here. Would be nice to understand the use cases for this.

Some users want to store information directly on the handle. Despite the id each has, it's the easiest way to propagate information and allow the GC to clean it up automatically. Ideally in the future there could be a basic set of calls that could be standardized (e.g. .providerType()), and this isn't information that isn't now available. e.g.

'use strict';
const async_wrap = process.binding('async_wrap');
const print = process._rawDebug;
var handle;
async_wrap.setupHooks({ init() { handle = this } });
async_wrap.enable();
var server = require('net').createServer().listen(8080);
print(server._handle === handle);
server.close();
// output: true

I use it for debugging as well. With the understanding that things change, that's part of its utility. By addition of the id, explicitly passing the provider, etc. we're not forcing use of the handle on anyone. Simply making it available in a way that makes sense for the context of the call, and in a way that users like APMs will find very useful.

@mike-kaufman
Copy link

@mike-kaufman mike-kaufman commented May 5, 2016

Some users want to store information directly on the handle. Despite the id each has, it's the easiest way to propagate information and allow the GC to clean it up automatically.

Providing storage for the async context is different than exposing the handle though.

I use it for debugging as well.

IMO, I think providing a context object which has a consistent shape & properties like provider type and handle is a cleaner API than passing the handle directly. It still meets the criteria of providing arbitrary storage associated with the "handle", it provides a place to define a common interface across handles, and it can evolve independently of the underlying handle.

Simply making it available in a way that makes sense for the context of the call, and in a way that users like APMs will find very useful.

I'm still not following how APMs will utilize the handle. Is there specfiic data on the handle that is useful? If so, what is this?

@trevnorris
Copy link
Contributor Author

@trevnorris trevnorris commented May 5, 2016

Providing storage for the async context is different than exposing the handle though.

Creating and tracking a new async context for every handle, and tracking it, is expensive. By attaching properties directly to the handle instance GC will take care of it all automatically, and at the least expense.

I think providing a context object which has a consistent shape & properties like provider type and handle is a cleaner API than passing the handle directly.

This can be, or at least should be, construct-able by the user. Creating all these new objects filled with properties is expensive, and you're missing that printing the actual contents of the handle is useful. And I don't share the concern about possibly needing to standardize properties in the handle and making it difficult for node to move forward. I've been aiming for a more standardized lower-level API, and "hiding" properties on an object in a significant way has become easier with ES6. But this is a separate topic.

I'm still not following how APMs will utilize the handle. Is there specfiic data on the handle that is useful? If so, what is this?

Here's a really basic example script that should explain how useful it is to be able to see the handles themselves while debugging:

'use strict';
const async_wrap = process.binding('async_wrap');
const print = process._rawDebug;
const ctx_array = [];

async_wrap.setupHooks({
  init() { /*print(this)*/ },
  pre() {
    if (ctx_array.indexOf(this) === -1) {
      ctx_array.push(this);
      print(this);
    }
  },
});
async_wrap.enable();

process.on('exit', () => print(ctx_array.length));

require('net').createServer(function(c) {
  require('fs').readFile(__filename, () => {
    c.end(new Buffer(1024 * 1024 * 100).fill('a'));
    this.close();
  });
}).listen(8080);

require('net').connect(8080, function() { this.resume() });

In there you'll see a WriteWrap which encapsulates the writing of the data from server to client, and gives access to the buffer being written. Useful for inspecting all TCP packets going through the server. Also the GetAddrInfoReqWrap which indicates there was a dns lookup for a host. Which is available under hostname. Or the TCPConnectWrap which gives you information about the remote server attempting to connect. Or the ShutdownWrap that alerts us that the connection is closing. The FSReqWrap is useful that we can see the contents of the file that's been read in, and even the position of the file that was read.

I hope this demonstrates the utility for being able to analyse each handle. All of the things mentioned in the previous paragraph cannot be obtained any other way. Removing the ability to see the handle would be a blow to the API, and basically be one step towards moving it to nothing more than a continuation-local-storage API.

@mhdawson
Copy link
Member

@mhdawson mhdawson commented May 6, 2016

@trevnorris what I was referring to in respect to "their callbacks" was that for the CRYPTO provider there are multiple cases were callbacks are wrapped such that they pre, post methods are invoked (as in my example). I think your comment about making each of these have their own provider id addresses that question.

I terms of the discussion about visibility of the handles, from what you describe we should document both in this eps what it's ok/not ok to use the handlers for and what expectations are. For example:

  • it is ok to store data in the handle by adding fields, but it is your responsibility to ensure that the namespace is unique enough that the names will not collide with any additions made in future Node.js versions
  • you may choose to inspect the contents of the handle, however, these are not part of the public API and will change between releases.
  • The list of providers may change from release to release, it is up to your code to handle any additions/deletions in a graceful manner.

If we believe that documenting a list like this is enough protection from being boxed in later when users of the API are broken by later Node.js releases and complain, then passing the handles would be fine. If we were concerned that despite the warnings we'd still be trying to avoid breakage passing some other field from wrapper could make sense.

@jasnell
Copy link
Member

@jasnell jasnell commented May 6, 2016

@trevnorris a couple more clarifying questions ...

Let's say I create a hook and some dependency module I'm using creates a hook... when those are called, are they passed the same id and handle, different id's same handle, same id's different handle or different ids and different handles? (and by handle here I mean the js object that wraps the actual handle). The main reason I ask is that if I'm attaching additional context to the handle, it would be helpful to also know that other hooks could be attaching their own context to the same handle.

I'm still wondering about the potential cost of creating too many of these which is why I think describing the specific lifecycle from when a hook is created to when it is destroyed would be very helpful. While I understand that you've designed and implemented this to be as low impact on performance as possible, there is a non-zero cost to calling these hooks. Have you had the opportunity yet to benchmark an upper limit to the number of hooks that can be created without having a serious impact on performance? My key concern with this is that an app developer may not have any idea that dependency modules they may be using could be going out and creating hooks. Depending on how many such dependencies they have, they could end up seeing degraded performance without any clear indication as to why since installing the hook appears to be a completely transparent operation (that is, there's no indication that a new hook was created).

@Trott
Copy link
Member

@Trott Trott commented Sep 21, 2016

@Fishrock123

9 votes in favor: @indutny @Trott @trevnorris @rvagg @evanlucas @ChALkeR @Fishrock123 @mscdex @cjihrig

0 votes against:

3 abstentions: @misterdjules @addaleax @thealphanerd

6 members who did not cast a vote and did not indicate they were abstaining: @chrisdickinson @shigeki @bnoordhuis @mhdawson @ofrobots @jasnell

@Qard
Copy link
Member

@Qard Qard commented Sep 21, 2016

@trevnorris Thanks for the deep explanation.

My hope with the create/queue event split was that CLS could link then() and on() callbacks to where they were attached. Without support for that, CLS will need to continue to monkey-patch a bunch of things, which probably has quite a bit more performance impact than if async_wrap was able to provide the appropriate hooks.

In its current state, async_wrap only solves part of the problem of async transaction tracing. You can get the code path the internals took, but you can't really get the more contextually useful path of how it got through the users code.

@Jeff-Lewis
Copy link

@Jeff-Lewis Jeff-Lewis commented Sep 22, 2016

@trevnorris By the way, thank you for all your work in Node and async_wrap.

As far as the asynchronous execution stack is concerned, calls to .then() aren't important. What's important is when the callback passed to .then() is called.

Is this how the current unofficial async_wrap works? I'm sorry I'm having trouble following the changes in behavior b/w this EPS and the current released async_wrap.

The real hope for myself and I think many others using CLS, is if this EPS version eliminates or help reduces the monkey-patching needed in order to have reliable CLS in node?

I might be off-base, but it sounds like it won't b/c of its impact on performance? I'd give up a 💩 ton of performance to have reliable CLS. Can we make it opt-in? For the non-embedded folks, hardware and scaling is only getting cheaper.

@trevnorris
Copy link
Contributor Author

@trevnorris trevnorris commented Sep 22, 2016

@Jeff-Lewis

Is this how the current unofficial async_wrap works?

The current implementation is a hybrid. There were a lot of requests to allow multiple hook instances. Instead of having a single global set. This required propagating the state of hooks for each execution stack.

The real hope for myself and I think many others using CLS, is if this EPS version eliminate or help reduce the monkey-patching needed in order to have reliable CLS in node?

When it comes to patching .then(), with the impending async/await syntax no amount of monkey patching will help. We'll have to depend on V8 API.

I might be of-base, but it sounds like it won't b/c of its impact on performance? I'd give up a 💩 ton of performance to have reliable CLS. Can we make it opt-in?

I put in many hours in an attempt to make this functionality opt-in, but the problem is a certain amount of state still needs to be maintained and propagated because a hook may be enabled at any time. In my PR nodejs/node#8531 I purposely left the commits intact so anyone could view the effort to keep the old functionality in.

@Fishrock123
Copy link
Member

@Fishrock123 Fishrock123 commented Sep 26, 2016

but you can't really get the more contextually useful path of how it got through the users code.

@Qard I think @trevnorris is going to be making parentId in init be the conceptual parent, and getCurrentId within init be the technical parent. Does that solve this issue?

@Qard
Copy link
Member

@Qard Qard commented Sep 26, 2016

Maybe. I'll have to dig into it at some point to see how it works. I don't work in APM anymore, so I haven't been paying super close attention to this stuff lately. 😕

@@ -286,9 +308,9 @@ const async_hooks = require('async_hooks');
const net = require('net');
asyn_hooks.createHook({

This comment has been minimized.