High CPU usage during a RAF loop #9844

Closed
paulrouget opened this issue Mar 2, 2016 · 30 comments

@paulrouget
Contributor

@paulrouget paulrouget commented Mar 2, 2016

Something as simple as this:

<script>
  function foo() {
    requestAnimationFrame(foo);
  }
</script>

<button onclick="foo()">Start rAF</button>

… shows 30% to 50% CPU usage on my MacBook.

@paulrouget
Contributor Author

@paulrouget paulrouget commented Mar 3, 2016

This is a lot better since #9858

@paulrouget
Contributor Author

@paulrouget paulrouget commented Mar 17, 2016

Still around ~15% here. Compared to Gecko: ~3%.

@pcwalton
Contributor

@pcwalton pcwalton commented Mar 21, 2016

Yeah, so this is another case of Servo's design currently making it impossible to catch up to other browsers here. Safari, Gecko, and Chrome all use CVDisplayLink instead of actually synchronizing requestAnimationFrame() to CGLFlushDrawable() calls. We use CGLFlushDrawable() with a swap interval of 1, which means we have to actually redraw the picture every frame on the OS main thread and communicate that info to the background threads. All of that adds overhead. The good news is that this is significantly amortized if you're actually drawing something in your requestAnimationFrame loop, since the optimization of not drawing the picture only applies if you aren't actually animating anything. :)

@paulrouget
Contributor Author

@paulrouget paulrouget commented Mar 21, 2016

This test shows that there's an actual overhead even when drawing: https://gist.github.com/paulrouget/99442585974f6661b7c5

25% CPU without rAF, 55% CPU with rAF.

@pcwalton
Contributor

@pcwalton pcwalton commented Mar 22, 2016

OK, that is suspicious. Event handling is expected to be somewhat expensive, but not that expensive.

@pcwalton
Contributor

@pcwalton pcwalton commented Mar 23, 2016

It is true that we drop a few frames with setTimeout(), but not enough that it explains the difference. The profile doesn't look that different either. Strange…

@paulrouget
Contributor Author

@paulrouget paulrouget commented Apr 5, 2016

@pcwalton any idea what's going on here?

@pcwalton
Contributor

@pcwalton pcwalton commented Apr 7, 2016

Here's something interesting I found (feel free to confirm, @paulrouget): the CPU usage of rAF and non-rAF seems nearly identical (about 21%) for me if the window is in the background. But if the window is in the foreground, the CPU usage of rAF jumps up.

@paulrouget
Contributor Author

@paulrouget paulrouget commented Apr 7, 2016

I can't reproduce. Do you use the test from the first comment? Is anything being drawn in your loop?

@pcwalton
Contributor

@pcwalton pcwalton commented Apr 7, 2016

Yes, I'm using the test from the first comment. By not being able to repro, do you mean that the rAF usage is always high and the setTimeout is low, or that both are low?

@paulrouget
Contributor Author

@paulrouget paulrouget commented Apr 7, 2016

I'm using the test from the first comment. With rAF, I get ~20% with the window in the background and in the foreground. With setTimeout instead, I get ~5% with the window in the background and in the foreground.

@pcwalton
Contributor

@pcwalton pcwalton commented Apr 7, 2016

Interesting. I get 20% for both rAF and setTimeout with the window in the background.

@paulrouget
Contributor Author

@paulrouget paulrouget commented Apr 7, 2016

This is the code I use for setTimeout:

<script>
  var x = 0;
  window.requestAnimationFrame = c => setTimeout(c, 16);
  setInterval(() => {
    console.log(x);
    x = 0;
  }, 1000);
  function foo() {
    x++;
    requestAnimationFrame(foo);
  }
</script>

<button onclick="foo()">Start rAF</button>
@pcwalton
Contributor

@pcwalton pcwalton commented Apr 7, 2016

So this is at least part of it: When using setTimeout(), we get 53 FPS. When using rAF, we get 60 FPS. More FPS == more frames painted == more CPU.
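
As a rough check on how much of the gap the frame rate alone could explain, here is a small sketch. It assumes CPU cost scales linearly with the number of frames painted, which is a simplification, and `scaleCpuByFps` is a hypothetical helper. The ~5% baseline is the setTimeout figure reported earlier in this thread.

```javascript
// Frame rates reported in the comment above.
const rafFps = 60;
const timeoutFps = 53;

// Hypothetical helper: predicted CPU usage at targetFps, ASSUMING the cost
// grows linearly with the number of frames painted per second.
function scaleCpuByFps(baselineCpuPercent, baselineFps, targetFps) {
  return baselineCpuPercent * (targetFps / baselineFps);
}

// rAF paints about 13% more frames than setTimeout(16).
const ratio = rafFps / timeoutFps;

// Starting from the ~5% setTimeout figure reported earlier in the thread.
const predictedRafCpu = scaleCpuByFps(5, timeoutFps, rafFps);

console.log(ratio.toFixed(2));           // prints "1.13"
console.log(predictedRafCpu.toFixed(1)); // prints "5.7"
```

Under that assumption the extra frames would account for well under one percentage point of CPU, so the frame rate can only be part of the ~20% reported with rAF.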

@tschneidereit
Contributor

@tschneidereit tschneidereit commented May 9, 2016

I'm seeing 27% with setTimeout and 31% with rAF with @paulrouget's current Homebrew servo distribution. That seems perfectly in line with the difference in FPS that @pcwalton reported (and that I'm seeing, too).

@paulrouget, your 5% CPU usage when using setTimeout seems suspiciously low. What kinds of results do you see when you pass -w -Z wr-stats?

@tschneidereit
Contributor

@tschneidereit tschneidereit commented May 9, 2016

Tested on my 2015 MacBook, too, with different results: using setTimeout I get the same 27% CPU usage as on my 2012 MacBook Pro (which is somewhat surprising, given that the latter's CPU should be quite a bit more powerful). Using rAF, though, I get 54% CPU usage, almost exactly twice as much. And that's with roughly the same FPS.

@paulrouget
Contributor Author

@paulrouget paulrouget commented May 9, 2016

Describing my protocol:

  • this code: #9844 (comment)
  • latest master with -w -b without -Z wr-stats (does that actually take CPU to draw?)
  • keep window in foreground
  • activity monitor in background

Results with my macbook 2015:

  • with rAF, ~11% CPU, 60 FPS
  • without, ~3% CPU, ~54 FPS
@tschneidereit
Contributor

@tschneidereit tschneidereit commented May 10, 2016

> Results with my macbook 2015:
>
>   • with rAF, ~11% CPU, 60 FPS
>   • without, ~3% CPU, ~54 FPS

Ok, when actually testing the right code (I was testing with this), I see roughly the same result. Only difference is that with rAF I have 14%-16% CPU usage.

pcwalton added a commit to pcwalton/glutin that referenced this issue May 13, 2016
repeatedly creating a new one when waking up the event loop from another
thread.

This also avoids sending this event to the `NSApplication`, since that's
needless overhead.

Reduces CPU usage of a simple `requestAnimationFrame()` loop in Servo by
23%.

Partially addresses servo/servo#9844.
bors-servo added a commit to servo/glutin that referenced this issue May 14, 2016
cocoa: Reuse a single thread-local `NSEvent` instance instead of repeatedly creating a new one when waking up the event loop from another thread.

This also avoids sending this event to the `NSApplication`, since that's
needless overhead.

Reduces CPU usage of a simple `requestAnimationFrame()` loop in Servo by
23%.

Partially addresses servo/servo#9844.

r? @paulrouget

pcwalton added a commit to pcwalton/servo that referenced this issue May 16, 2016
typical `requestAnimationFrame()` animations.

This skips useless message traffic when `requestAnimationFrame()` is
called during an animation frame callback. It reduces CPU usage of the
following snippet by 49%:

    <script>
        function foo() {
            requestAnimationFrame(foo);
        }
    </script>
    <button onclick="foo()">Start rAF</button>

Partially addresses servo#9844.
pcwalton added a commit to pcwalton/servo that referenced this issue May 17, 2016
bors-servo added a commit that referenced this issue May 17, 2016
…s, r=jdm

script: Avoid needless `ChangeRunningAnimationsState` messages during typical `requestAnimationFrame()` animations.

This skips useless message traffic when `requestAnimationFrame()` is
called during an animation frame callback. It reduces CPU usage of the
following snippet by 49%:

    <script>
        function foo() {
            requestAnimationFrame(foo);
        }
    </script>
    <button onclick="foo()">Start rAF</button>

Partially addresses #9844.
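
The idea of skipping redundant state-change messages can be illustrated with a toy model. This is a hypothetical sketch, not Servo's implementation: `FrameScheduler`, its fields, and the assumption that the unoptimized path sends one "no callbacks" message when the queue is drained and one "callbacks present" message when the callback re-registers are all invented for illustration.

```javascript
// Hypothetical model of a script thread reporting whether animation
// callbacks are pending. It is NOT Servo's code; names are illustrative.
class FrameScheduler {
  constructor(dedupe) {
    this.dedupe = dedupe;        // true: only report net state changes
    this.callbacks = [];         // callbacks queued for the next frame
    this.reportedActive = false; // last state sent to the imaginary compositor
    this.inFrame = false;        // true while frame callbacks are running
    this.messagesSent = 0;       // number of state-change messages sent
  }

  notify(active) {
    // Optimized path: stay quiet during a frame and when nothing changed.
    if (this.dedupe && (this.inFrame || active === this.reportedActive)) return;
    this.reportedActive = active;
    this.messagesSent++;
  }

  requestAnimationFrame(callback) {
    this.callbacks.push(callback);
    this.notify(true);
  }

  runFrame() {
    const pending = this.callbacks;
    this.callbacks = [];
    if (this.dedupe) {
      this.inFrame = true;
      for (const callback of pending) callback();
      this.inFrame = false;
      // Report once, and only if the state differs from the last report.
      this.notify(this.callbacks.length > 0);
    } else {
      // Assumed unoptimized behaviour: report "idle", then let callbacks run.
      this.notify(false);
      for (const callback of pending) callback();
    }
  }
}

// Runs the rAF loop from the snippet above for a number of frames.
function countMessages(dedupe, frames) {
  const scheduler = new FrameScheduler(dedupe);
  function foo() {
    scheduler.requestAnimationFrame(foo);
  }
  foo();
  for (let i = 0; i < frames; i++) scheduler.runFrame();
  return scheduler.messagesSent;
}

console.log(countMessages(false, 60)); // prints 121
console.log(countMessages(true, 60));  // prints 1
```

In this model the optimized scheduler sends a single message for the whole loop, while the assumed unoptimized one sends two per frame.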

bors-servo added a commit that referenced this issue May 18, 2016
@nox
Member

@nox nox commented May 18, 2016

@paulrouget That doesn't seem to be true anymore since #11205.

@paulrouget
Contributor Author

@paulrouget paulrouget commented May 19, 2016

@nox
Member

@nox nox commented May 19, 2016

Just want to make sure, you are using latest master right?

@paulrouget
Contributor Author

@paulrouget paulrouget commented May 19, 2016

> Just want to make sure, you are using latest master right?

Yes.

@paulrouget
Contributor Author

@paulrouget paulrouget commented May 23, 2016

So now that #11260 appears to not be a problem anymore, here are the latest results on my macbook:

  • with rAF, ~8.5% CPU, 60 FPS
  • without (setTimeout(16)), ~2% CPU, ~54 FPS
@nox
Member

@nox nox commented May 23, 2016

@paulrouget Should we consider it fixed?

@paulrouget
Contributor Author

@paulrouget paulrouget commented May 23, 2016

> @paulrouget Should we consider it fixed?

Good enough to run browserhtml. The initial overhead has been divided by 2.

I still don't understand why it takes more CPU than setTimeout(16).

@paulrouget paulrouget closed this May 23, 2016
@nox
Member

@nox nox commented May 23, 2016

@paulrouget @pcwalton says that's the necessary cost to vsync.

@pcwalton
Contributor

@pcwalton pcwalton commented May 23, 2016

@paulrouget I'll explain the issue. setTimeout(16) does not have to sync to the vertical blanking interval (VBLANK) of the display. However, requestAnimationFrame() does—or, at least, it should, as that is its intended use.

The ways to do that are limited. On the Mac and iOS, Apple recommends a class called CVDisplayLink. This is what Safari and Firefox use. When I used it, however, I got tearing if the swap interval was not set to 1, which is unexpected. Investigating further, I found that CVDisplayLink simply queries the refresh rate of the screen and sets up a timer for that interval—effectively, it just does a glorified setTimeout(16). This is not adequate: the exact moment of VBLANK is what we care about, not simply the interval.

The other, cross-platform method is through glXSwapBuffers on X11 and CGLFlushDrawable on the Mac, with the swap interval set to 1. The swap interval can be configured with glXSwapIntervalEXT on X11 and the kCGLCPSwapInterval context parameter on the Mac. This causes glXSwapBuffers/CGLFlushDrawable to finish rendering, block until the next VBLANK, swap, and then return. This is the most reliable way to actually get the correct VBLANK timing: it has to be, or else tearing would occur. So this is the method we use.

Now note that glXSwapBuffers/CGLFlushDrawable do two things: they swap the front and back buffer, and they block until the next VBLANK. In the case in which there is no rendering to be done, this is a waste: we don't want to swap buffers, but we do want to block until the nearest VBLANK. However, the API fuses both operations. So we actually have to paint the frame and perform the buffer swap. This uses a significant amount of CPU and GPU time, both in preparing the rendering commands and in the driver.

As mentioned before, Safari and Firefox dodge this by using CVDisplayLink on the Mac, which only performs VBLANK notifications, which is why you see them use less CPU in this test case. But those VBLANK notifications are not as accurate as the ones we get from swapping buffers, at least according to my investigation. So there's a tradeoff: more CPU usage (though note that we use little, if any, more CPU in the case of actual rendering) versus more accurate timing. I chose more accurate timing.
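
The difference between matching the refresh interval and matching the VBLANK moment can be sketched numerically. The following is an illustration only, assuming a 60 Hz display and perfectly regular timers; `phaseAfterVblank` is a hypothetical helper, not a browser or OS API.

```javascript
// Assumption: a 60 Hz display whose first VBLANK happens at t = 0 ms.
const VBLANK_PERIOD_MS = 1000 / 60;

// Hypothetical helper: how many milliseconds after the most recent VBLANK
// the given tick of a perfectly regular timer fires.
function phaseAfterVblank(tick, timerPeriodMs, startOffsetMs = 0) {
  const fireTimeMs = startOffsetMs + tick * timerPeriodMs;
  return fireTimeMs % VBLANK_PERIOD_MS;
}

// A 16 ms timer (setTimeout(16)) fires slightly faster than the display
// refreshes, so its position within the refresh cycle keeps changing.
for (const tick of [0, 1, 5, 10, 20]) {
  console.log("16 ms timer, tick", tick, phaseAfterVblank(tick, 16).toFixed(2));
}

// A timer with exactly the refresh interval never drifts, but it keeps
// whatever offset it started with (5 ms here) and never lands on VBLANK.
for (const tick of [0, 1, 5, 10, 20]) {
  console.log(
    "matched timer, tick",
    tick,
    phaseAfterVblank(tick, VBLANK_PERIOD_MS, 5).toFixed(2)
  );
}
```

In the first loop the offset from VBLANK changes from tick to tick. In the second it stays near the arbitrary starting offset, which is the "interval versus exact moment" distinction made above.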

zakorgy added a commit to zakorgy/servo that referenced this issue May 26, 2016
emilio added a commit to emilio/glutin that referenced this issue Jul 6, 2016