New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fiber Principles: Contributing To Fiber #7942

Open
sebmarkbage opened this Issue Oct 11, 2016 · 6 comments

Comments

Projects
None yet
3 participants
@sebmarkbage
Member

sebmarkbage commented Oct 11, 2016

I just wanted to document a few unique design patterns that apply to Fiber, but not necessarily anything else. I'll start here.

  • You may mutate the fiber that you're working on during beginWork and completeWork phases but you may not have any other global side-effects. If you need a global side-effect, that have to be moved to the commitWork phase.
  • Fiber is a fixed data structure. It shares the same hidden class. Never add fields outside of construction in ReactFiber.
  • Nothing in the reconciler uses dynamic dispatch. I.e. we don't call a first class function, except for user code such as ref callbacks, functional components, render methods, etc. The rest is a static function available in a closure. I.e. use myHelper(obj) instead of obj.myHelper(). Any time we need to branch logic we use a switch statement over a tag which is a number that indicates which type of object we're dealing with and which branch to take (see pattern matching).
  • Many modules are instantiated with a HostConfig object. It is a single constructor that gets called on initialization time. This should be inlinable by a compiler.
  • Nothing in Fiber uses the normal JS stack. Meaning it does use the stack but it can be compiled into a flat function if needed. Calling other functions is fine - the only limitation is that they can't be recursive.
  • If I can't use recursion, how do I traverse through the tree? Learn to use the singly linked list tree traversal algorithm. E.g. parent first, depth first:
let root = fiber;
let node = fiber;
while (true) {
  // Do something with node
  if (node.child) {
    node = node.child;
    continue;
  }
  if (node === root) {
    return;
  }
  while (!node.sibling) {
    if (!node.return || node.return === root) {
      return;
    }
    node = node.return;
  }
  node = node.sibling;
}

Why does it need to be this complicated?

  • We can use the normal JS stack for this but any time we yield in a requestIdleCallback we would have to rebuild the stack when we continue. Since this only lasts for about 50ms when idle, we would spend some time unwinding and rebuilding the stack each time. It is not too bad. However, everything along the stack would have to be aware of how to "unwind" when we abort in the middle of the work flow.
  • It is plausible we could do this at the level of OCaml algebraic effects but we don't currently have all the features we need and we don't get the performance tradeoffs we want out of the box atm. This is a plausible future way forward though.
  • Most code lives outside of this recursion so it doesn't matter much for most cases.
  • Most of what React does is in the space of what the normal stack does. E.g. memoization, error handling, etc. Using the normal stack too, just makes it more difficult to get those to interact.
  • Everything we put on the stack we generally have to put on the heap too because we memoize it. Maintaining the stack and the heap with the same data is theoretically less efficient.
  • That said, all of these optimizations might be moot because JS stacks are much more efficient than JS heaps.
  • One thing that I wanted to try was to compile React components to do work directly on these data structures, just like normal programming languages compile to make mutations etc. to the stack. I think that's where the ideal implementation of React is.

Let's just try it and see how it goes. :D

cc @spicyj @gaearon @acdlite

@sebmarkbage

This comment has been minimized.

Show comment
Hide comment
@sebmarkbage

sebmarkbage Oct 20, 2016

Member

Aren't you just working around the lack of threads in the language?

Yes and no.

It is true that we don't have a good option in JavaScript to run threads - and that is a huge problem. We tried exploring various options of running in Web Workers, parallel JS, tried to propose shared immutable persistent data structures to the language, experimented with custom VM tweaks etc. JavaScript the language isn't very suitable for this because of mutable shared runtimes like prototypes. The ecosystem isn't ready for it because you have to duplicate code loading and module initialization across workers. Garbage collectors are not as efficient as they currently are if they have to be thread safe, and VM implementors seem to be unwilling to bare the implementation cost of persistent data structures. Shared mutable typed arrays seems to be moving along but requiring all data to go through this layer seems to be unfeasible in the ecosystem today. Artificial boundaries between different parts of the code base also doesn't work very well and introduces unnecessary friction. Even then you have lots of JS code like utility libraries that have to be duplicated across workers. Which leads to slower start up time and memory overhead. So, yes, threads are likely out of the question until we can target something like Web Assembly.

However, an interesting realization is that there are other benefits of the Fiber architecture that are applicable whether you have threads or not.

ComponentKit, that runs on native, does work using threads for example. It is able to start doing higher priority work in one thread while a lower priority thread is still happening. Leading to a much simpler implementation. However, there are some limitations.

You can't safely abort the background thread. Aborting and restarting a thread is not very cheap. In many languages it is also not safe because you could be in the middle of some lazy initialization work. Even though it is effectively interrupted, you have to continue spending CPU cycles on it. One solution is to do the samething that Fiber does - make an API that has yield points so that you can unwind the stack safely. Then check a flag periodically if you should abort. That is effectively what Fiber does too.

Another limitation is that since you can't immediately abort a thread, you can't be sure if two threads are processing the same component at the same time. This leads to some limitations such as not being able to support stateful class instances (like React.Component). Although that might be a good thing for other reasons.

Another thing that threads don't automatically buy you is the ability to resume part of the work. You can't just memoize part of the work that you've done in one thread and reuse it in another. You can certainly implement that with threads but you'd end up with similar complexity as Fiber - on top of the threads.

Threads have a slight benefit that you can start the next work slightly earlier because you can interject instructions while the background thread is still powering down. However, because our yield points are frequent enough by default we don't think that will matter for Fiber.

Therefore, yes, we are writing a more complex implementation because we're lacking threads in JavaScript, but because we're forced to deal with that we'll end up with better features than if we simply relied on threads for scheduling.

What about parallelism?

It is true that threads can get you parallelism. However, this has nothing to do with scheduling for responsiveness. You generally don't want to spend one CPU processing work that was already aborted because higher priority work arrived. That doesn't buy you anything at all.

Instead, what you really want is to calculate independent work in different threads. For example, two React siblings can be calculated in parallel threads since they're disconnected. If we had access to threads, that's what we would do. However, everything about Fiber is still valid for scheduling purposes in that architecture even if we use threads for parallelism of subtrees.

So, in conclusion, no we're not just working around limitations in the language.

NOTE: This is not to say that you should choose cooperative scheduling over threads in your project/use case. It's just that for our particular use case it made sense. Threads still make sense in many other cases.

Member

sebmarkbage commented Oct 20, 2016

Aren't you just working around the lack of threads in the language?

Yes and no.

It is true that we don't have a good option in JavaScript to run threads - and that is a huge problem. We tried exploring various options of running in Web Workers, parallel JS, tried to propose shared immutable persistent data structures to the language, experimented with custom VM tweaks etc. JavaScript the language isn't very suitable for this because of mutable shared runtimes like prototypes. The ecosystem isn't ready for it because you have to duplicate code loading and module initialization across workers. Garbage collectors are not as efficient as they currently are if they have to be thread safe, and VM implementors seem to be unwilling to bare the implementation cost of persistent data structures. Shared mutable typed arrays seems to be moving along but requiring all data to go through this layer seems to be unfeasible in the ecosystem today. Artificial boundaries between different parts of the code base also doesn't work very well and introduces unnecessary friction. Even then you have lots of JS code like utility libraries that have to be duplicated across workers. Which leads to slower start up time and memory overhead. So, yes, threads are likely out of the question until we can target something like Web Assembly.

However, an interesting realization is that there are other benefits of the Fiber architecture that are applicable whether you have threads or not.

ComponentKit, that runs on native, does work using threads for example. It is able to start doing higher priority work in one thread while a lower priority thread is still happening. Leading to a much simpler implementation. However, there are some limitations.

You can't safely abort the background thread. Aborting and restarting a thread is not very cheap. In many languages it is also not safe because you could be in the middle of some lazy initialization work. Even though it is effectively interrupted, you have to continue spending CPU cycles on it. One solution is to do the samething that Fiber does - make an API that has yield points so that you can unwind the stack safely. Then check a flag periodically if you should abort. That is effectively what Fiber does too.

Another limitation is that since you can't immediately abort a thread, you can't be sure if two threads are processing the same component at the same time. This leads to some limitations such as not being able to support stateful class instances (like React.Component). Although that might be a good thing for other reasons.

Another thing that threads don't automatically buy you is the ability to resume part of the work. You can't just memoize part of the work that you've done in one thread and reuse it in another. You can certainly implement that with threads but you'd end up with similar complexity as Fiber - on top of the threads.

Threads have a slight benefit that you can start the next work slightly earlier because you can interject instructions while the background thread is still powering down. However, because our yield points are frequent enough by default we don't think that will matter for Fiber.

Therefore, yes, we are writing a more complex implementation because we're lacking threads in JavaScript, but because we're forced to deal with that we'll end up with better features than if we simply relied on threads for scheduling.

What about parallelism?

It is true that threads can get you parallelism. However, this has nothing to do with scheduling for responsiveness. You generally don't want to spend one CPU processing work that was already aborted because higher priority work arrived. That doesn't buy you anything at all.

Instead, what you really want is to calculate independent work in different threads. For example, two React siblings can be calculated in parallel threads since they're disconnected. If we had access to threads, that's what we would do. However, everything about Fiber is still valid for scheduling purposes in that architecture even if we use threads for parallelism of subtrees.

So, in conclusion, no we're not just working around limitations in the language.

NOTE: This is not to say that you should choose cooperative scheduling over threads in your project/use case. It's just that for our particular use case it made sense. Threads still make sense in many other cases.

@sebmarkbage

This comment has been minimized.

Show comment
Hide comment
@sebmarkbage

sebmarkbage Oct 20, 2016

Member

Ok, so cooperative scheduling might have some benefits over preemptive threads but...

Couldn't you just use generator functions like other scheduling frameworks have done?

No.

There's two reasons for this.

  1. Generators doesn't just let you yield in the middle of a stack. You have to wrap every single function in a generator. This not only adds a lot of syntactic overhead but also runtime overhead in any existing implementation. It's fair that the syntax might be more helpful than not, but the perf issue still stands.

  2. The biggest reason, however, is that generators are stateful. You can't resume in the middle of it.

function* doWork(a, b, c) {
  var x = doExpensiveWorkA(a);
  yield;
  var y = x + doExpensiveWorkB(b);
  yield;
  var z = y + doExpensiveWorkC(c);
  return z;
}

If I want to execute this across multiple time slices I can just step through this. However, if I get an update to B when I have already completed doExpensiveWorkA(a) and doExpensiveWorkB(b) but not doExpensiveWorkC(c) there is no way for me to reuse the value x. I.e. to skip ahead to doExpensiveWorkB with a different value for b but still reuse the result of doExpensiveWorkA(a).

This is important to React since we do a lot of memoization.

It is plausible that you can add that as a layer around, but then you're really not gaining much from the use of generators.

There are also languages that have generators that are designed for a more functional use case that has this capability. JS is not one of them.

Member

sebmarkbage commented Oct 20, 2016

Ok, so cooperative scheduling might have some benefits over preemptive threads but...

Couldn't you just use generator functions like other scheduling frameworks have done?

No.

There's two reasons for this.

  1. Generators doesn't just let you yield in the middle of a stack. You have to wrap every single function in a generator. This not only adds a lot of syntactic overhead but also runtime overhead in any existing implementation. It's fair that the syntax might be more helpful than not, but the perf issue still stands.

  2. The biggest reason, however, is that generators are stateful. You can't resume in the middle of it.

function* doWork(a, b, c) {
  var x = doExpensiveWorkA(a);
  yield;
  var y = x + doExpensiveWorkB(b);
  yield;
  var z = y + doExpensiveWorkC(c);
  return z;
}

If I want to execute this across multiple time slices I can just step through this. However, if I get an update to B when I have already completed doExpensiveWorkA(a) and doExpensiveWorkB(b) but not doExpensiveWorkC(c) there is no way for me to reuse the value x. I.e. to skip ahead to doExpensiveWorkB with a different value for b but still reuse the result of doExpensiveWorkA(a).

This is important to React since we do a lot of memoization.

It is plausible that you can add that as a layer around, but then you're really not gaining much from the use of generators.

There are also languages that have generators that are designed for a more functional use case that has this capability. JS is not one of them.

@sebmarkbage

This comment has been minimized.

Show comment
Hide comment
@sebmarkbage

sebmarkbage Oct 20, 2016

Member

Ok, but what about OCaml's algebraic effects then? BuckleScript can compile them.

There is some non-trivial overhead in the JS version but assuming that would be ok.

To use algebraic effects in OCaml in this way, I think we would need to clone each fiber so that we can resume it. It is possible that a compiler could reuse an immutable stack frame / fiber without copying but that's not how it works now. Without that, we only get limited benefits.

We would also end up with lots of nested handlers which would add to the overhead and also add linear search time to find the top handler when we do yield.

Besides, to support the existing React API we need a set of features that overlap with the implementation of fibers, such as "return pointers" so by combining them we get some benefits.

I do have high hopes that one day we can just use effects for our use cases internally, but we could also use it to make an even simpler React API itself. However, for now, I think that it is probably easier to have lower level control even though the complexity is higher.

I do look forward to seeing other implementations that prioritize implementation simplicity or using complex compilers to solve the same problems.

Member

sebmarkbage commented Oct 20, 2016

Ok, but what about OCaml's algebraic effects then? BuckleScript can compile them.

There is some non-trivial overhead in the JS version but assuming that would be ok.

To use algebraic effects in OCaml in this way, I think we would need to clone each fiber so that we can resume it. It is possible that a compiler could reuse an immutable stack frame / fiber without copying but that's not how it works now. Without that, we only get limited benefits.

We would also end up with lots of nested handlers which would add to the overhead and also add linear search time to find the top handler when we do yield.

Besides, to support the existing React API we need a set of features that overlap with the implementation of fibers, such as "return pointers" so by combining them we get some benefits.

I do have high hopes that one day we can just use effects for our use cases internally, but we could also use it to make an even simpler React API itself. However, for now, I think that it is probably easier to have lower level control even though the complexity is higher.

I do look forward to seeing other implementations that prioritize implementation simplicity or using complex compilers to solve the same problems.

@bvaughn

This comment has been minimized.

Show comment
Hide comment
@bvaughn

bvaughn Nov 21, 2016

Contributor

Hi! Thanks for writing up these thoughts! They're great!

I realize it's been a few weeks since you wrote this, but there's one bit I didn't quite understand. I may be misreading or missing something, but under the "generator functions" section, you say:

  1. The biggest reason, however, is that generators are stateful. You can't resume in the middle of it.
function* doWork(a, b, c) {
  var x = doExpensiveWorkA(a);
  yield;
  var y = x + doExpensiveWorkB(b);
  yield;
  var z = y + doExpensiveWorkC(c);
  return z;
}

If I want to execute this across multiple time slices I can just step through this. However, if I get an update to B when I have already completed doExpensiveWorkA(a) and doExpensiveWorkB(b) but not doExpensiveWorkC(c) there is no way for me to reuse the value x. I.e. to skip ahead to doExpensiveWorkB with a different value for b but still reuse the result of doExpensiveWorkA(a).
This is important to React since we do a lot of memoization.

Wouldn't memoization actually make this a non-issue? Assuming doExpensiveWorkA is a memoized function then calling it again with the same value would be fast, effectively allowing you to "skip" the heavy work and continue to doExpensiveWorkB.

I don't know what these doExpensiveWork* functions represent so maybe it's not possible to memoize them in this way.

Contributor

bvaughn commented Nov 21, 2016

Hi! Thanks for writing up these thoughts! They're great!

I realize it's been a few weeks since you wrote this, but there's one bit I didn't quite understand. I may be misreading or missing something, but under the "generator functions" section, you say:

  1. The biggest reason, however, is that generators are stateful. You can't resume in the middle of it.
function* doWork(a, b, c) {
  var x = doExpensiveWorkA(a);
  yield;
  var y = x + doExpensiveWorkB(b);
  yield;
  var z = y + doExpensiveWorkC(c);
  return z;
}

If I want to execute this across multiple time slices I can just step through this. However, if I get an update to B when I have already completed doExpensiveWorkA(a) and doExpensiveWorkB(b) but not doExpensiveWorkC(c) there is no way for me to reuse the value x. I.e. to skip ahead to doExpensiveWorkB with a different value for b but still reuse the result of doExpensiveWorkA(a).
This is important to React since we do a lot of memoization.

Wouldn't memoization actually make this a non-issue? Assuming doExpensiveWorkA is a memoized function then calling it again with the same value would be fast, effectively allowing you to "skip" the heavy work and continue to doExpensiveWorkB.

I don't know what these doExpensiveWork* functions represent so maybe it's not possible to memoize them in this way.

@sebmarkbage

This comment has been minimized.

Show comment
Hide comment
@sebmarkbage

sebmarkbage Nov 21, 2016

Member

It really more of a placeholder for other work within this generator that may be expensive.

The problem with memoization of doExpensiveWorkA is that React doesn't have an opportunity to inject memoization there. Including its local memoization strategy (using parent context and keys). You could use a global memoization for that method but then you end up with cache invalidation problems instead.

Member

sebmarkbage commented Nov 21, 2016

It really more of a placeholder for other work within this generator that may be expensive.

The problem with memoization of doExpensiveWorkA is that React doesn't have an opportunity to inject memoization there. Including its local memoization strategy (using parent context and keys). You could use a global memoization for that method but then you end up with cache invalidation problems instead.

@bvaughn

This comment has been minimized.

Show comment
Hide comment
@bvaughn

bvaughn Nov 21, 2016

Contributor

That makes sense. Thanks for clarifying!

Contributor

bvaughn commented Nov 21, 2016

That makes sense. Thanks for clarifying!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment