Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AsapScheduler race condition w/ nested schedules (Cannot read properties of undefined (reading 'execute')) #6672

Closed
matthewjh opened this issue Nov 11, 2021 · 9 comments
Labels
bug

Comments

@matthewjh
Copy link

@matthewjh matthewjh commented Nov 11, 2021

Bug Report

Current Behavior
On our non-trivial, non-public UI codebase that heavily uses rx.js, we sporadically see the following error arising from code that uses the AsapScheduler:

TypeError: Cannot read properties of undefined (reading 'execute')
at AsapScheduler.flush (node_modules/rxjs/src/internal/scheduler/AsapScheduler.ts:19:8

This is thrown from

if ((error = action.execute(action.state, action.delay))) {

I have debugged this in-situ and have arrived at what is, I think, a working understanding how the exception arises. I believe it to be a problem with nested AsapTask schedules.

Note there have been reports of similar issues in #4690 and #2697, which were closed without resolution. I think it likely all these share this cause.

  1. Let's say an AsapAction, A, is scheduled. The scheduler's actions are [A]. When A is executed, it will schedule another AsapAction, B.
  2. When the scheduler asynchronously flushes, it first sets this._scheduled = undefined;
    https://github.com/ReactiveX/rxjs/blob/master/src/internal/scheduler/AsapScheduler.ts#L5
  3. Then it iterates through the actions and executes them. As A is the only action, it will call action.execute() on A. It shifts A off so the actions queue is now [].
    (Here, index is initially -1 and count is the length of the queue on entering flush, which is 1.)
 do {
      if ((error = action.execute(action.state, action.delay))) {
        break;
      }
    } while (++index < count && (action = actions.shift()));
  1. In the execution of A, A schedules AsapAction B (as mentioned). The action queue is now [B]. As asapScheduler._scheduled was set to undefined on entering AsapScheduler#flush, the following code will create a new async task that will flush the scheduler:

https://github.com/ReactiveX/rxjs/blob/master/src/internal/scheduler/AsapAction.ts#L21

  1. On returning from action.execute on A, the while expression will be evaluated. Bear in mind that we mutated actions in (4). The first part is true: (++index < count) === (++(-1) < 1) === (0 < 1) === true
    The second part, action = actions.shift() will pop B from the actions queue and set the current action to B.
  2. The next turn of the loop will call action.execute() on B.
  3. Later on, the JS VM will come to execute the async task that was created when B was originally scheduled, in (4). The purpose of this task is to flush any actions created between B and now.
  4. However, assuming no AsapActions were scheduled after B, this flush task is now redundant as B has already been executed and popped from the actions queue.
  5. The async task enters flush, and because actions is [], action = action || actions.shift()!; is undefined.
  6. The exception is then thrown on action.execute(action.state, action.delay).
TypeError: Cannot read properties of undefined (reading 'execute')
            at AsapScheduler.flush (node_modules/rxjs/src/internal/scheduler/AsapScheduler.ts:19:8

One interesting footnote is that in our codebase, it appears to be quite common for an AsapAction to schedule another AsapAction. But - this exception is rare. The question is why. I suspect the following code serves to mask the issue in most instances, because it happens to remove the async task if the action queue is empty:

https://github.com/ReactiveX/rxjs/blob/master/src/internal/scheduler/AsapAction.ts#L23

// If the scheduler queue is empty, cancel the requested microtask and
    // set the scheduled flag to undefined so the next AsapAction will schedule
    // its own.
    if (scheduler.actions.length === 0) {
      immediateProvider.clearImmediate(id);
      scheduler._scheduled = undefined;
    }

However, as far as I can tell (not familiar with this code), if this happens to clean up dangling async tasks created by this scenario, it would be coincidence - they are logically distinct code paths.

Expected behavior
No exception

Reproduction

** Please provide repro code does not involves framework, or other setups to address this bug is specific to rxjs core **

  • REPL or Repo link:
    I don't have one at the moment but will try to make a stackblitz or failing test case on an rxjs fork.

Environment

  • Runtime: Chrome 91
  • RxJS version: 6.5.5
  • (If bug is related) Loader, build configuration: [e.g webpack, angular-cli version, config]

Possible Solution
The simplest fix is to fix the flawed assumption in AsapScheduler flush that there will always be actions in the queue. As the reports of this exception demonstrate, this is not always true. However, it would be even better in this case to not schedule the flush at all.

I'm assuming nested AsapAction schedules should indeed be flushed in the same turn, similar to microtasks. In that case, I think there is an argument that asapScheduler._scheduled should not be set to undefined until after all the actions are executed in the flush loop. asapScheduler._scheduled is what drives the "do I need to schedule another flush?" condition (https://github.com/ReactiveX/rxjs/blob/master/src/internal/scheduler/AsapAction.ts#L21). If asapScheduler._scheduled? answers the question "is there already a time in the future in which my new action will be executed?", then the answer needs to be "yes" inside the action execution loop. It is logically inconsistent to set this to undefined at the start of flush, answering "no" to the question, when that is not quite true: actions scheduled inside the action execution loop will be executed in the current flush.

That solution would look like this:

export class AsapScheduler extends AsyncScheduler {
  public flush(action?: AsyncAction<any>): void {
    this._active = true;
   // remove    
   // this._scheduled = undefined;

    const { actions } = this;
    let error: any;
    let index = -1;
    action = action || actions.shift()!;
    const count = actions.length;

    do {
      if ((error = action.execute(action.state, action.delay))) {
        break;
      }
    } while (++index < count && (action = actions.shift()));

    // insert
    this._scheduled = undefined;
    this._active = false;

    if (error) {
      while (++index < count && (action = actions.shift())) {
        action.unsubscribe();
      }
      throw error;
    }
  }

However, I'm not sure whether there are reasons NOT to do this.

From looking at one of the linked issues, as well as the code, it looks like other schedulers may suffer from the same loophole (e.g. the RAF scheduler).

Additional context/Screenshots
Add any other context about the problem here. If applicable, add screenshots to help explain.

@cartant @notnarb @decafdennis

@cartant
Copy link
Collaborator

@cartant cartant commented Nov 11, 2021

I've had a quick read through this and have one comment ATM. Regarding this - in point 3:

(Here, index is initially -1 and count is the length of the queue on entering flush, which is 1.)

The count is obtained after the action has been shifted off the array, so it looks to me like count will be 0 and not 1, so the expression mentioned in point 5 should be false rather than true.

Also, my understanding is that actions scheduled within other actions should not be synchronously executed with their parent action - that is, they should be executed via a separate flush/setImmediate, etc. That seems to be the purpose of the count - to not keep flushing when nested/recursive scheduled actions are encountered.

I'll have a closer look at this later. It's unfortunate that the implementation is so complicated.

@cartant
Copy link
Collaborator

@cartant cartant commented Nov 11, 2021

Also, I have a question: IIRC, the others who have raised this issue have seen the problem in Angular applications. Is your application using Angular?

@matthewjh
Copy link
Author

@matthewjh matthewjh commented Nov 11, 2021

Thanks for pointing out the count assignment, @cartant; I missed that. As I have a real-world case internally I will do some fresh debugging on it tomorrow in light of this.

Yes, our application uses Angular.

@matthewjh
Copy link
Author

@matthewjh matthewjh commented Nov 12, 2021

@cartant, I haven't tested this, but just a thought, maybe something like this would work:

  1. Schedule Action A and B. When A executes it will unsubscribe B and schedule action C. queue is [A, B]
  2. flush: pop A, queue is [B], count is 1, index is -1.
  3. execute A (i): unsubscribe B. queue is [],
  4. execute A (ii): schedule C. this creates a new async task and pushes C onto the queue. queue is [C]
  5. loop while statement runs, (++index < count) === (0 < 1), causing C to be shifted from the queue and executed. queue is []
  6. later on, the async task created in (4) runs and blows up because the queue is empty

In summary, we create a new async task for C, but C slipping into B's position in the queue "tricks" the guard logic in the while clause into executing C.

@cartant
Copy link
Collaborator

@cartant cartant commented Nov 12, 2021

@matthewjh What you've mentioned in the above comment sounds reasonable. I can imagine that unsubscriptions from within executed actions could be the culprit. I'll have a closer look/think about this later. I'm curious though, is the action unsubscription scenario that you've mentioned something that could occur in your app?

@cartant
Copy link
Collaborator

@cartant cartant commented Nov 12, 2021

I've had a bit of a look at this. If you run the following:

import { asapScheduler, Subscription } from "rxjs";

let a: Subscription;
let b: Subscription;
let c: Subscription;

const originalFlush: any = asapScheduler.flush;
asapScheduler.flush = function (...args: any[]) {
  console.log(`flush; actions.length = ${asapScheduler.actions.length}`);
  return originalFlush.apply(asapScheduler, args);
};

a = asapScheduler.schedule(() => {
  c = asapScheduler.schedule(() => {
    console.log("executed c");
  });
  console.log("scheduled c");
  b.unsubscribe();
  console.log("unsubscribed b");
  console.log("executed a");
});
console.log("scheduled a");
b = asapScheduler.schedule(() => {
  console.log("executed b");
});
console.log("scheduled b");

You'll see that c is executed in the same flush that executes a (and that's a problem; it really should not do this):

scheduled a
scheduled b
flush; actions.length = 2
scheduled c
unsubscribed b
executed a
executed c

And if the unsubscribe of b is removed, c does execute on a subsequent flush:

scheduled a
scheduled b
flush; actions.length = 2
scheduled c
executed a
executed b
flush; actions.length = 1
executed c

However, the error is not effected because when c executes, it calls recycleAsyncId:

this.id = this.recycleAsyncId(this.scheduler, this.id, null);

And that effects a clearImmediate call:

if (scheduler.actions.length === 0) {
immediateProvider.clearImmediate(id);
scheduler._scheduled = undefined;
}

So ... 🤔

@matthewjh
Copy link
Author

@matthewjh matthewjh commented Nov 13, 2021

@cartant how about this? I've added logging to our code, and this appears to be what is happening.

  1. Schedule A (creates async task for flush 1)
  2. Schedule B (via flush 1)
  3. Flush 1
  4. Execute A (queue = [B])
    1. schedule C (queue = [B, C])
      1. creates async flush task for flush 2
    2. unsubscribe C (queue = [B])
      1. recycles flush 2, but doesn't clear the task as the queue is non-empty ([B])
    3. unsubscribe A
      1. recycles flush 1
  5. Execute B (queue = [])
    1. unsubscribe B
      1. recycles flush 1
  6. Flush 2 (throws)

If so, the problem is that each task keeps track of its own "flush task id" and can clear the task in recycleAsyncId, but the check required to do so reads a data structure (scheduler.actions) that is shared across both the current flush (when we are in a flush) and the future flush. Therefore, it is possible to retain a flush task that should be cleared because it is no longer needed, just because the shared queue contains items that will certainly be executed in the current flush (and belong to the current flush rather than to the newly created flush). This comes back to the point about where nested schedules run and setting scheduler._scheduled = undefined at the top of the flush method.

@cartant
Copy link
Collaborator

@cartant cartant commented Nov 13, 2021

Yeah, I've verified that this reproduces the problem:

a = asapScheduler.schedule(() => {
  c = asapScheduler.schedule(() => {
    console.log("executed c");
  });
  console.log("scheduled c");
  c.unsubscribe();
  console.log("unsubscribed c");
  a.unsubscribe();
  console.log("unsubscribed a");
  console.log("executed a");
});
console.log("scheduled a");
b = asapScheduler.schedule(() => {
  console.log("executed b");
});
console.log("scheduled b");

But I think the solution should involve ensuring that actions scheduled within the flush are not executed within the current flush, as that appears to be the intent of the original implementation. It's only the unsubscriptions that allow said actions to be executed within the current flush. I have some ideas about how we might be able to implement this in an efficient manner without relying upon the unsound count. I'll give it some more thought when I'm less tired.

Thanks so much for your help on this.

@cartant cartant added the bug label Nov 13, 2021
@matthewjh
Copy link
Author

@matthewjh matthewjh commented Nov 13, 2021

Fantastic, thank you @cartant. Interested to see your fix when it's ready.

cartant added a commit to cartant/rxjs that referenced this issue Nov 13, 2021
benlesh pushed a commit to benlesh/rxjs that referenced this issue Nov 29, 2021
…bing during flushes

* chore: use sinon sandbox consistently

* test: add failing flush tests

* fix: don't execute actions scheduled within flush

* test: add failing tests

* fix: avoid calling flush with empty actions queue

Closes ReactiveX#6672

* chore: remove accidental auto-import

* test: call all the dones
benlesh pushed a commit that referenced this issue Nov 29, 2021
…bing during flushes

* chore: use sinon sandbox consistently

* test: add failing flush tests

* fix: don't execute actions scheduled within flush

* test: add failing tests

* fix: avoid calling flush with empty actions queue

Closes #6672

* chore: remove accidental auto-import

* test: call all the dones
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug
Projects
None yet
Development

No branches or pull requests

2 participants