Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulletproofing the async requirement #139

Closed
domenic opened this issue Aug 3, 2013 · 17 comments · Fixed by #140
Closed

Bulletproofing the async requirement #139

domenic opened this issue Aug 3, 2013 · 17 comments · Fixed by #140
Milestone

Comments

@domenic
Copy link
Member

domenic commented Aug 3, 2013

Previous discussions in #84 (comment), #70, and #100.

Requirements

All of these must call a, then b, then c.

// (Basic)
var p = new Promise(resolve => resolve())
a();
p.then(c);
b();
// (Resolve-Before-Then)
var resolveP;
var p = new Promise(resolve => resolveP = resolve);

a();
resolveP();
p.then(c);
b();
// (Resolve-After-Then)
var resolveP;
var p = new Promise(resolve => resolveP = resolve);

a();
p.then(c);
resolveP();
b();
// (Then-Inside-Then)
fulfilledP.then(function a() {
    fulfilledP.then(c);
    b();
});

This should be allowed to call a then b then c:

// (Resolve-Inside-Async)
var p = new Promise(resolve => setImmediate(() => {
  a();
  resolve();
  c();
}));

p.then(b);

Naive implementations would probably do a then c then b, but because we are inside a setImmediate, it is at least one turn after p.then(b) occured, so it should be OK for resolve() to call b synchronously, in between a and c.

The Situation

Currently, the spec's wording is:

then must return before onFulfilled or onRejected is called

This is correct for (Basic), (Resolve-Before-Then), (Then-Inside-Then), and (Resolve-Inside-Async), but incorrect for (Resolve-After-Then).

The Goal

Can anyone figure out phrasing that will be correct for all cases?

@erights
Copy link

erights commented Aug 3, 2013

First, my apologies for letting my attention to this issue slide. As it became a long thread, I kept thinking I needed to catch up and then contribute. I never found the time to do that. So I'll take this opportunity to respond to this reboot even though I lack some of the previous context.

My argument of course is against "This should be allowed to call a then b then c:" Under normal event loop assumptions, the following is a perfectly sensible bit of defensive code, where state is encapsulated and api is exposed to untrusted clients:

var state = ....;
function api(p, callback) {
    // change state in a way that mildly suspends invariants
    try {
        callback(); // may recursively enter api
    } finally {
        // restore state's mildly suspended invariants
    }
    Q(p).then(_ => {
        // do something with state, relying on all invariants holding
    });
}

What I mean here by "mildly" is that api on entry assumes that state's invariants may already be mildly suspended, i.e., the suspension of invariants is carefully designed to tolerate recursive entry at the callback() point. Otherwise it would not be safe to call callback synchronously. This might seem a bit contrived, but I know I've done exactly this and can probably dig up some examples.

The second part of our api method does a different defensive pattern. It doesn't know what p is, but it knows that Q(p) is a genuine promise, and so it knows that Q(p).then will call its argument function in a fresh turn, where all of state's invariants have been restored. Under the looser requirements proposed above, this assumption can be violated as follows:

var r;
var p = new Promise(resolve => r = resolve);
api(p, () => setImmediate(() => api(null, () => r())));

This program does not even try to attack by recursive entry, since api was designed to defend against that. The first call to api suspends and restores state's invariants no problem. It also schedules something1 to happen later, after p is allegedly settled (i.e., after Q(p) is settled), where that something1 relies on state's invariants being intact.

The second call to api happens during a later turn. This suspends state's invariants again and then calls the callback. This callback fulfills p, causing something1 to potentially run now, while state's invariants are suspended, violating something1's assumptions. Then the callback returns, restoring the invariants and scheduling something2 to happen in a later turn. In that later turn, something2 runs fine. The problem was exactly that something1 happened while api was on the stack with suspended state invariants.

@bergus
Copy link

bergus commented Aug 3, 2013

I'd like to repeat my point from #104 that the behavior we're argumenting about relies heavily on the resolver spec. I'll extend my proposal to specify the behavior of resolve, though I guess those lines should be part of the resolvers-spec only.

  • The phrasing for then callbacks changes but the meaning is the same as the current " not before then returns ":

    onFulfilled or onRejected must not be called in a descendant execution context of the call to then

  • For resolve (and implicitly the same applies to reject respectively) we can choose from (or specify all of) three levels of "asynchronity strictness":

    (0) The onFulfilled callbacks installed via then must not be called in a descendant execution context of the call to resolve
    (1) The onFulfilled callbacks installed via then must not be called in a descendant execution context of the call to resolve unless they were installed in a previous turn of the event loop
    (2) The onFulfilled callbacks installed via then may be called synchronously from the execution context of the call to resolve

A library may follow the spec to a certain level of strictness, or it might provide options or different resolver functions for less strict behaviour than (0).

@erights
Copy link

erights commented Aug 3, 2013

Wouldn't all three of these options still be vulnerable to this attack?

@bergus
Copy link

bergus commented Aug 3, 2013

@erights: Only if Q(p) returns a non-level-0 promise so that the callback timing could be controlled by the attacker. If it was level 0, then the attacker could resolve (the old) p during the callback (when the state is polluted) but the then-callback would need to be invoked outside resolve - after the try-finally-statement has finished.

OK, maybe my wording for levels 0 and 1 needs improvement to forbid something like

deferred.resolve(); // lvl 0 compliant, not calling the callbacks from `resolve`
deferred.andDoItNow(); // but some other library invocation in the same user context calls them

@erights
Copy link

erights commented Aug 3, 2013

How about

(-1) The onFulfilled callbacks installed via then must not be called in a descendant execution context.

?

@domenic
Copy link
Member Author

domenic commented Aug 4, 2013

@bergus: I'd like to clear this once and for all. We cannot leave this to a hypothetical resolvers spec. It is imperative that, when someone receives a Promises/A+ promise, they have predictable behavior in case (Resolve-After-Then) and similar. Promises/A+ specifies many things about the behavior of resolution, even if it doesn't specify what exact syntax is used for performing resolution. In particular, it specifies all things about the behavior of the then method and the onFulfilled/onRejected handlers relative to resolution. So we are not going to confine this to a companion spec.

@erights: thanks for the clear example. This may sink (Resolve-Inside-Async), and thus make our job a lot easier in terms of what to specify. But it would really be a shame, because it means promise libraries will always be behind callback libraries in performance. For example, code such as

function readFile(fileName) {
    return new Promise((resolve, reject) => {
        fs.readFile(fileName, (err, result) => {
            if (err) {
                reject(err);
            } else {
                resolve(result);
            }
        });
    });
}

readFile("src.txt").then(...)

will always be penalized by an extra tick in comparison to simply calling fs.readFile(fileName, (err, result) => ...), since we must wait until the end of the turn to invoke the registered callbacks instead of invoking them immediately upon calling resolve(result) or reject(err).

Can you think of any way around this? Or do we just have to swallow our pride and have promises always be slower than callbacks, in exchange for avoiding that attack you mention?

@erights
Copy link

erights commented Aug 4, 2013

@domenic Looking at your example makes even clearer to me that we need to swallow. When

// ... stuff
readFile("src.txt").then(...)
// ... stuff

is called from deep within a turn, it is important for the programmer to know that the .then callbacks will only be called later, in their own turn. If the programmer actually wants to waive this temporal isolation, it is good that they'd have to write

fs.readFile(fileName, (err, result) => ...)

explicitly, so that this absence of temporal isolation is clear.

@domenic
Copy link
Member Author

domenic commented Aug 4, 2013

Hmm, I think I may have miscommunicated. Because the underlying fs.readFile operation always involves disk IO, there is absolutely no chance that when doing fs.readFile(fileName, cb), cb will be called in this turn. Thus when doing readFile("src.txt").then(...), we have a guarantee that the associated promise's resolve or reject handlers will never be called in the same turn, and thus the .then callbacks will be in their own turn as well.

It's the additional turn, beyond the ones already passed by going out to the hard disk, that I would prefer to avoid if we could.

@erights
Copy link

erights commented Aug 4, 2013

@domenic As for the performance cost, keep in mind that this will all quickly turn into mechanism directly supported by the browsers, and soon after, by JS engines. This won't make it efficient in the short term but will in the longer term.

@erights
Copy link

erights commented Aug 4, 2013

@domenic I see. Actually, in context your example is clear and I jumped to the wrong conclusion. Thanks for the clarification.

Nevertheless, I don't think this extra tick is avoidable. I still think we need to swallow. (Once promises move into the JS engine, I have some ideas about how such things might be optimized, but I'll leave those speculations for another day.)

@domenic
Copy link
Member Author

domenic commented Aug 5, 2013

If others are agreed, i.e. that (Resolve-Inside-Async) should always be a-c-b, then I think we can take two approaches:

Take 1

onFulfilled or onRejected must not be called until the execution context stack contains only platform code.

Footnote: Here "platform code" means engine, environment, and promise implementation code. In practice, this requirement ensures that onFulfilled and onRejected execute asynchronously, after the event loop turn in which then is called, and with a fresh stack. This can be implemented with either a "macro-turn" mechanism such as setTimeout or setImmediate, or with a "micro-turn" mechanism such as Object.observe or process.nextTick. Since the promise implementation is considered platform code, it may itself contain a task-scheduling queue or "trampoline" in which the handlers are called.

Take 2

onFulfilled or onRejected must not be called before the end phase of the event loop turn in which then is called.

Footnote: In practical terms, an implementation must use a mechanism such as setTimeout, setImmediate, or process.nextTick to ensure that onFulfilled and onRejected are not invoked in the same turn of the event loop as the call to then to which they are passed. Importantly, both "micro-turn" scheduling (e.g. using Object.observe or process.nextTick) and "macro-turn" scheduling (e.g. using setImmediate) fulfill the requirement.

@erights
Copy link

erights commented Aug 5, 2013

What operational difference might there be between these two? Would take 2 allow a "promise implementation ... may itself contain a task-scheduling queue or "trampoline" in which the handlers are called."? If not, then I am prefer take 1.

@domenic
Copy link
Member Author

domenic commented Aug 5, 2013

When I wrote this 11 hours ago, I thought there was no operational difference between the two, and it was just a matter of which concepts we find more relevant or less confusing.

But now I think take 2 is indeed a failure; it breaks trampolining, i.e. (Then-Inside-Then), and it allows a-b-c for (Resolve-Inside-Async), which we just spent a couple days establishing as bad. Oops! Not sure what I was thinking.

In that case I think, modulo anyone else chiming in, we should settle on take 1.

@novemberborn
Copy link
Contributor

I can live with Take 1, but it should provide an example of how Resolve-Inside-Async is not allowed. The text is fairly opaque. Incidentally I think this answers #77.

@domenic
Copy link
Member Author

domenic commented Aug 12, 2013

@briancavalier would love your thoughts.

@briancavalier
Copy link
Member

Sorry guys, things have been very hectic on my end for the past two weeks.

It does seem like Take 1 is a good way to go. There might be some gray areas around "platform code", even though you've specified what that means. I think we can probably just deal with such gray areas as they arise.

@alexbyk
Copy link
Contributor

alexbyk commented Nov 29, 2016

even if this issue is old and closed, could you please edit top comment for Resolve-Inside-Async case?
Because if someone encounters this, he'll think that it should be a,b,c (instead of a,c,b)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants