Skip to content

[testing only] Mutter frame timing rebase #533

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 23 commits into from

Conversation

itzexor
Copy link
Contributor

@itzexor itzexor commented Aug 21, 2019

This is just for anyone who wants to try the recent changes in mutter to frame timing. It also includes the proposed nvidia-specific threaded-swap removal which should improve performance.

This reverts a ton of local changes, and as a result will break cinnamon-settings while applied since some gsettings are removed, and I haven't created a complementary branch for cinnamon.

The gist of the changes here are:

  1. Revert all local changes that affected frame timing code paths so we are basically at the code as it existed when mutter's fork of clutter/cogl were merged into muffin.
  2. Apply the following mutter merge requests:
    1. Remove ability to externally set sync-to-vblank
    2. clutter/stage-cogl: Don't skip over the next frame
    3. clutter/stage-cogl: Reschedule update on present
    4. Consolidate all frame throttling into clutter-stage-cogl
    5. cogl: Remove GLX "threaded swap wait" used on Nvidia
    6. Remove pending_swaps counter

itzexor and others added 23 commits August 1, 2019 15:27
Externally setting the sync-to-vblank setting was a feature
added as a workaround to old Intel and ATI graphic cards, and
is not needed anymore. Furthermore, it doesn't make sense to
change it on a compositor whatsoever.

This commit removes all the ways to externally change this
setting, as well as the now unused API.

https://gitlab.gnome.org/GNOME/mutter/merge_requests/191
Since now we don't set the swap throttled value based
on sync-to-vblank, we can effectively remove it from
Cogl. Throttling swap buffers in Cogl is as much a
historical artifact as sync-to-vblank. Furthermore,
it doesn't make sense to disable it on a compositor,
which is the case with the embedded Cogl.

In addition to that, the winsys vfunc for updating
whenever swap throttling changes could also be removed,
since swap throttling is always enabled now.

Removing it means less code, less branches when running,
and one less config option to deal with.

This also removes the micro-perf test, since it doesn't
make sense for the case where Cogl is embedded into the
compositor.

https://gitlab.gnome.org/GNOME/mutter/merge_requests/191
Just better wording for swap throttled.
The `last_presentation_time` is usually a little in the past (although
sometimes in the future depending on the driver). When it's over 2ms
(`sync_delay`) in the past that would trigger the while loop to count up so
that the next `update_time` is in the future.

The problem with that is for common values of `last_presentation_time`
which are only a few milliseconds ago, incrementing `update_time` by
`refresh_interval` also means counting past the next physical frame that
we haven't rendered yet. And so mutter would skip that frame.

**Example**

Given:
```
  last_presentation_time = now - 3ms
  sync_delay = 2ms
  refresh_interval = 16ms
  next_presentation_time = last_presentation_time + refresh_interval
                         = now + 13ms

          -3ms now        +13ms           +29ms           +45ms
        ----|--+------------|---------------|---------------|----
            :               :
  last_presentation_time  next_presentation_time
```

Old algorithm:
```
  update_time = last_presentation_time + sync_delay
              = now - 1ms
  while (update_time < now)
        (now - 1ms   < now)
    update_time = now - 1ms + 16ms
  update_time = now + 15ms
  next_presentation_time = now + 13ms
  available_render_time = next_presentation_time - max(now, update_time)
                        = (now + 13ms) - (now + 15ms)
                        = -2ms  so the next frame will be skipped.

          -3ms now        +13ms           +29ms           +45ms
        ----|--+------------|-+-------------|---------------|----
            :               : :
            :               : update_time (too late)
            :               :
  last_presentation_time  next_presentation_time (a missed frame)

```

New algorithm:
```
  min_render_time_allowed = refresh_interval / 2
                          = 8ms
  max_render_time_allowed = refresh_interval - sync_delay
                          = 14ms
  target_presentation_time = last_presentation_time + refresh_interval
                           = now - 3ms + 16ms
                           = now + 13ms
  while (target_presentation_time - min_render_time_allowed < now)
        (now + 13ms - 8ms < now)
        (5ms < 0ms)
    # loop is never entered
  update_time = target_presentation_time - max_render_time_allowed
              = now + 13ms - 14ms
              = now - 1ms
  next_presentation_time = now + 13ms
  available_render_time = next_presentation_time - max(now, update_time)
                        = (now + 13ms) - now
                        = 13ms  which is plenty of render time.

          -3ms now        +13ms           +29ms           +45ms
        ----|-++------------|---------------|---------------|----
            : :             :
            : update_time   :
            :               :
  last_presentation_time  next_presentation_time
```

The reason nobody noticed these missed frames very often was because
mutter has some accidental workarounds built-in:

 * Prior to 3.32, the offending code was only reachable in Xorg sessions.
   It was never reached in Wayland sessions because it hadn't been
   implemented yet (till e9e4b2b72).

 * Even though Wayland support is now implemented the native backend
   provides a `last_presentation_time` much faster than Xorg sessions
   (being in the same process) and so is less likely to spuriously enter
   the while loop to miss a frame.

 * For Xorg sessions we are accidentally triple buffering (linuxmint#334). This
   is a good way to avoid the missed frames, but is also an accident.

 * `sync_delay` is presently just high enough (2ms by coincidence is very
   close to common values of `now - last_presentation_time`) to push the
   `update_time` into the future in some cases, which avoids entering the
   while loop. This is why the same missed frames problem was also noticed
   when experimenting with `sync_delay = 0`.

v2: adjust variable names and code style.

Fixes: https://bugzilla.gnome.org/show_bug.cgi?id=789186
       and most of https://gitlab.gnome.org/GNOME/mutter/issues/571

https://gitlab.gnome.org/GNOME/mutter/merge_requests/520
If an update (new frame) had been scheduled already before
`_clutter_stage_cogl_presented` was called then that means it was
scheduled for the wrong time. Because the `last_presentation_time` has
changed since then. And using an `update_time` based on an outdated
presentation time results in scheduling frames too early, filling the
buffer queue (triple buffering or worse) and high visual latency.

So if we do receive a presentation event when an update is already
scheduled, remember to reschedule the update based on the newer
`last_presentation_time`. This way we avoid overfilling the buffer queue
and limit ourselves to double buffering for less visible lag.

Closes: https://gitlab.gnome.org/GNOME/mutter/issues/334

Prerequisite: https://gitlab.gnome.org/GNOME/mutter/merge_requests/520

https://gitlab.gnome.org/GNOME/mutter/merge_requests/281
One million is the number of microseconds in one second, which is also
defined by `G_USEC_PER_SEC`.

https://gitlab.gnome.org/GNOME/mutter/merge_requests/363
Instead of crazy refresh rates >1MHz falling back to 60Hz, just honour
them by rendering unthrottled (same as `sync_delay < 0`). Although I
wouldn't actually expect that path to ever be needed in reality, it just
ensures an infinite `while` loop never happens.

https://gitlab.gnome.org/GNOME/mutter/merge_requests/363
Instead of 0Hz falling back to 60Hz, use `CLUTTER_DEFAULT_FPS` which is
also 60Hz by default.

https://gitlab.gnome.org/GNOME/mutter/merge_requests/363
That could happen if the backend did not provide presentation timestamps,
or if the screen was not changing other than the hardware cursor:

  if (stage_cogl->last_presentation_time == 0||
      stage_cogl->last_presentation_time < now - 150000)
    {
      stage_cogl->update_time = now;
      return;
    }

By setting `update_time` to `now`, master_clock_get_swap_wait_time()
returns 0:

  gint64 now = g_source_get_time (master_clock->source);
  if (min_update_time < now)
    {
      return 0;
    }
  else
    {
      gint64 delay_us = min_update_time - now;
      return (delay_us + 999) / 1000;
    }

However, zero is a value unsupported by the default master clock
due to:

  if (swap_delay != 0)
    return swap_delay;

All cases are now handled by extrapolating when the next presentation
time would be and calculating an appropriate update time to meet that.

We also need to add a check for `update_time == last_update_time`, which
is a situation that just became possible since we support old (or zero)
values of `last_presentation_time`. This avoids getting more than one
stage update per frame interval when input events arrive without
triggering a stage redraw (e.g. moving the hardware cursor).

https://gitlab.gnome.org/GNOME/mutter/merge_requests/363
If `last_presentation_time` is zero (unsupported) or just very old
(system was idle) then we would like to avoid that triggering a large
number of loop interations. This will get us closer to the right answer
without iterating.

https://gitlab.gnome.org/GNOME/mutter/merge_requests/363
The presentation timing logic (via `master_clock_get_swap_wait_time`) now
works unconditionally. By "works" we mean that a result of zero from
`master_clock_get_swap_wait_time` actually means zero now. Previously
zero could mean either a successful result of zero milliseconds or that
the backend couldn't get an answer. And a non-zero result is the same as
before.

This works even if the screen is "idle" and even if the backend doesn't
provide presentation timestamps. So now our two fallback throttling
mechanisms of relying on `CLUTTER_FEATURE_SWAP_THROTTLE` and decimating
to `clutter_get_default_frame_rate` can be deleted.

Closes: https://gitlab.gnome.org/GNOME/mutter/issues/406 and
        https://bugzilla.gnome.org/show_bug.cgi?id=781835

https://gitlab.gnome.org/GNOME/mutter/merge_requests/363
After 4faeb12731b8, the maximum time allowed for an update to happen
is calculated as:

  max_render_time_allowed = refresh_interval - 1000 * sync_delay;

However, extremely small refresh intervals -- that come as consequence
to extremely high refresh rates -- may fall into an odd numerical range
when refresh_interval < 1000 * sync_delay. That would give us a negative
time.

To be extra cautious about it, add another sanity check for this case.

Change suggested by Jasper St. Pierre.

https://gitlab.gnome.org/GNOME/mutter/merge_requests/363
The single purpose of "threaded swap wait" was to provide the value:
`u.presentation_time = get_monotonic_time_ns ();` for use by
`clutter-stage-cogl`.

Until recently (before !363), all backends were required to provide
a nonzero value for `presentation_time` or else suffer falling back
to poor-performing throttling methods in `master_clock_next_frame_delay`.
So we needed "threaded swap wait" to support the Nvidia driver.

This is no longer true. The fallbacks don't exist any more and
`clutter_stage_cogl_schedule_update` now always succeeds even in the
absence of a `presentation_time` (since !363).

The drawbacks to keeping "threaded swap wait" are:

  * `u.presentation_time = get_monotonic_time_ns ();` is a guess and not
    an accurate hardware presentation time.
  * It required blocking the main loop on every frame in
    `_cogl_winsys_wait_for_gpu` due to `glFinish`. Any OpenGL programmer
    will tell you calling `glFinish` is a bad idea because it kills CPU-GPU
    parallelism. In my case, it was blocking the main loop for 1-3ms on
    every mutter frame. It's easy to imagine slower (or higher resolution)
    Nvidia systems would lose an even larger chunk of their frame interval
    blocked in that function. This significantly crippled frame rates on
    Nvidia systems.

The benefit to keeping "threaded swap wait" is:

  * Its guess of `presentation_time` is likely a better guess by a few
    milliseconds than the guess that `clutter_stage_cogl_schedule_update`
    will make in its place.

So "threaded swap wait" provided better sub-frame phase accuracy, but at
the expense of frame rates. And as soon as it starts causing frame drops,
that one and only benefit is lost. There is no reason to keep it.

And in case you are wondering, the documentation for "threaded swap wait"
is now wrong (since !363):

  > The advantage of enabling this is that it will allow your main loop
  > to do other work while waiting for the system to be ready to draw
  > the next frame, instead of blocking in glXSwapBuffers()."

At the time (before !363) it was true that "threaded swap wait" avoided
swap interval throttling that would occur as a result of
`master_clock_next_frame_delay` blindly returning zero and over-queuing
frames. That code no longer exists. And ironically the implementation of
"threaded swap wait" necessitates the same kind of blocking (to a lesser
extent) that it was designed to avoid. We can eliminate all blocking
however by deleting "threaded swap wait", which is now safe since !363.

https://gitlab.gnome.org/GNOME/mutter/merge_requests/602
As a protection against duplicate/early update times, it has already
been replaced by commit 35aa2781 and commit 4faeb127.

Removing it also prevents a common cause of frame skips:

  clutter_stage_cogl_schedule_update()
  clutter_stage_cogl_get_update_time() == -1

Closes: https://gitlab.gnome.org/GNOME/gnome-shell/issues/1411

https://gitlab.gnome.org/GNOME/mutter/merge_requests/719
@itzexor itzexor closed this Sep 17, 2019
@linuxmint linuxmint deleted a comment Sep 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants