Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite the blur shader to remove the variable-length loop and to use the texture filtering hardware more effectively. #3028

Closed
wants to merge 4 commits into from

Conversation

@pcwalton
Copy link
Collaborator

pcwalton commented Sep 5, 2018

This new shader performs Gaussian resampling instead of regular convolution.
It samples in between texels to reduce the number of taps.

The speed is about the same as the existing technique. It is nevertheless the
fastest blur method that I could come up with. In particular, it exceeds the
performance of the Kawase and dual Kawase blur techniques. The speed comes from
working at lower resolution and incurring fewer downsampling and upsampling
passes. As is often the case, ALU performance does not really seem to be the
limiting factor; it's mostly memory bandwidth, which is why downsampling is so
important. Further improvements should come from not doing the downsampling at
all and instead rendering the blurred content at low resolution to begin with.

Closes #2821.

Some reftests are currently failing; do not merge yet. I just wanted to get this here to discuss. cc @gw3583


This change is Reviewable

@pcwalton pcwalton changed the title (DO NOT MERGE YET) Rewrite the blur shader to remove the variable-length loop and to use the texture filtering hardware more effectively. (DO NOT MERGE YET) Rewrite the blur shader to remove the variable-length loop and to use the texture filtering hardware more effectively. Sep 5, 2018
@pcwalton pcwalton force-pushed the pcwalton:blur branch from bd0244d to 3b20924 Sep 6, 2018
@pcwalton
Copy link
Collaborator Author

pcwalton commented Sep 6, 2018

Just one reftest is failing and needs investigation. (It may just be a harmless minor blur algorithm difference, but I want to be sure.)

@kvark
Copy link
Member

kvark commented Sep 6, 2018

Is there anything special going on during upsampling in order to preserve the missing detail?

@pcwalton
Copy link
Collaborator Author

pcwalton commented Sep 6, 2018

There's no (well, negligible) missing detail because Gaussian blur is a low-pass filter. (The Fourier transform of a Gaussian convolution is itself a Gaussian with the characteristic bell curve shape.) That's why downsampling works in the first place. Because the output of a Gaussian blur is effectively bandlimited, the Nyquist-Shannon sampling theorem says that we can preserve fewer samples in order to perfectly reconstruct the signal.

pcwalton added 4 commits Sep 4, 2018
This benchmark tests the performance of large blur radii combined with large
images.
… the

texture filtering hardware more effectively.

This new shader performs Gaussian *resampling* instead of regular convolution.
It samples in between texels to reduce the number of taps.

The speed is about the same as the existing technique. It is nevertheless the
fastest blur method that I could come up with. In particular, it exceeds the
performance of the Kawase and dual Kawase blur techniques. The speed comes from
working at lower resolution and incurring fewer downsampling and upsampling
passes. As is often the case, ALU performance does not really seem to be the
limiting factor; it's mostly memory bandwidth, which is why downsampling is so
important. Further improvements should come from not doing the downsampling at
all and instead rendering the blurred content at low resolution to begin with.

Closes #2821.
@pcwalton pcwalton force-pushed the pcwalton:blur branch from 3b20924 to 4684f99 Sep 6, 2018
@pcwalton pcwalton changed the title (DO NOT MERGE YET) Rewrite the blur shader to remove the variable-length loop and to use the texture filtering hardware more effectively. Rewrite the blur shader to remove the variable-length loop and to use the texture filtering hardware more effectively. Sep 6, 2018
@pcwalton
Copy link
Collaborator Author

pcwalton commented Sep 6, 2018

OK, this should be ready for review, though we should do a Gecko try run as well. r? @gw3583

@pcwalton pcwalton requested a review from gw3583 Sep 6, 2018
@kvark kvark self-requested a review Sep 7, 2018
@kvark
Copy link
Member

kvark commented Sep 7, 2018

There's no (well, negligible) missing detail because Gaussian blur is a low-pass filter. (The Fourier transform of a Gaussian convolution is itself a Gaussian with the characteristic bell curve shape.) That's why downsampling works in the first place. Because the output of a Gaussian blur is effectively bandlimited, the Nyquist-Shannon sampling theorem says that we can preserve fewer samples in order to perfectly reconstruct the signal.

We aren't perfectly reconstructing the signal, if I understand correctly, it's still an approximation.
Some light reading with pictures for those interested - http://chemaguerra.com/downsampling-and-gaussian-filters/

@kvark
Copy link
Member

kvark commented Sep 7, 2018

Weirdly Github doesn't show the CI icon for your latest commit. Could happen if you changed the date and/or committed from a different time zone?

Copy link
Member

kvark left a comment

Looks sensible! A few concerns below:

};

BlurTask fetch_blur_task(int address) {
BlurTask fetchBlurTask(int address) {

This comment has been minimized.

@kvark

kvark Sep 7, 2018

Member

please stick to the existing naming convention in the shaders

This comment has been minimized.

@pcwalton

pcwalton Sep 7, 2018

Author Collaborator

I made that change to improve consistency. Do function names have snake_case and variable names have CamelCase? I never was able to figure it out.

This comment has been minimized.

@kvark

kvark Sep 7, 2018

Member

Look at prim_shared.glsl for examples. It's pretty much the Rust naming convention:

  • variables and functions are snake_case
  • type names are CamelCase
#endif

#ifdef WR_FRAGMENT_SHADER

#if defined WR_FEATURE_COLOR_TARGET
#define SUPPORT 4

This comment has been minimized.

@kvark

kvark Sep 7, 2018

Member

it's not clear what SUPPORT here means

gauss_coefficient.xy *= gauss_coefficient.yz;
bool vertical = vVertical != 0;
vec2 axisCoord = vertical ? vUv.yx : vUv.xy;
float start = floor(axisCoord.x - float(SUPPORT)) + 0.5;

This comment has been minimized.

@kvark

kvark Sep 7, 2018

Member

if the flooring changes our start position, wouldn't the computed Gaussian factors no longer match?

This comment has been minimized.

@pcwalton

pcwalton Sep 7, 2018

Author Collaborator

The computed Gaussian factors are based on offset, which is relative to the un-floored axisCoord.x.

@@ -274,7 +274,7 @@ impl ClipNode {
// Quote from https://drafts.csswg.org/css-backgrounds-3/#shadow-blur
// "the image that would be generated by applying to the shadow a
// Gaussian blur with a standard deviation equal to half the blur radius."
let blur_radius_dp = (info.blur_radius * 0.5 * device_pixel_scale.0).round();
let blur_radius_dp = (info.blur_radius * device_pixel_scale.0).round();

This comment has been minimized.

@kvark

kvark Sep 7, 2018

Member

comment above needs to be addressed as well

let uv_rect_kind = render_tasks[src_task_id].uv_rect_kind();

// Determine the number of downsampling passes that we'll use.
let downsampling_passes = (30 as u32).saturating_sub((radius as u32).leading_zeros());

This comment has been minimized.

@kvark

kvark Sep 7, 2018

Member

number of downsample passes = 30 - (leading 0 bits of radius)?
where does 30 come from?

This comment has been minimized.

@pcwalton

pcwalton Sep 7, 2018

Author Collaborator

It's log₂(r) - 1 == log₂(r/2). (Note the leading_zeros call.) I'll add this to a comment :)

let _timer = self.gpu_profile.start_timer(GPU_TAG_BLUR);

self.set_blend(false, framebuffer_kind);
self.shaders.cs_blur_rgba8
.bind(&mut self.device, projection, &mut self.renderer_errors);

if !target.vertical_blurs.is_empty() {
if !target.blurs.is_empty() {

This comment has been minimized.

@kvark

kvark Sep 7, 2018

Member

we already check for this condition in an enclosing block

@pcwalton
Copy link
Collaborator Author

pcwalton commented Sep 7, 2018

I should mention that we shouldn't be downsampling any more than the old code did.

I noticed a bunch of test failures on try (thanks for kicking off the try run!)—I'll investigate those.

@pcwalton
Copy link
Collaborator Author

pcwalton commented Sep 13, 2018

Given all the test failures on try, I'm going to try separating the new shader from the downscaling changes to try to isolate the source of the problems.

@bors-servo
Copy link
Contributor

bors-servo commented Sep 25, 2018

The latest upstream changes (presumably #3098) made this pull request unmergeable. Please resolve the merge conflicts.

pcwalton added a commit to pcwalton/webrender that referenced this pull request Sep 26, 2018
I'm pretty confident this is correct--it's either just as accurate or more
accurate than the current implementation--so it'll let me sort out which test
failures, if any, from servo#3028 and servo#3122 are true regressions.
pcwalton added a commit to pcwalton/webrender that referenced this pull request Sep 26, 2018
I'm pretty confident this is correct--it's either just as accurate or more
accurate than the current implementation--so it'll let me sort out which test
failures, if any, from servo#3028 and servo#3122 are true regressions.
bors-servo added a commit that referenced this pull request Sep 26, 2018
Round blur support up to the nearest even number of pixels.

I'm pretty confident this is correct--it's either just as accurate or more
accurate than the current implementation--so it'll let me sort out which test failures, if any, from #3028 and #3122 are true regressions.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/3126)
<!-- Reviewable:end -->
@gw3583
Copy link
Collaborator

gw3583 commented Oct 5, 2018

@pcwalton Do we still want this kept open?

@pcwalton
Copy link
Collaborator Author

pcwalton commented Oct 5, 2018

@gw3583 Once #3122 lands I want to update this to be just about removing the variable-length loop (as much as possible anyway).

@gw3583
Copy link
Collaborator

gw3583 commented Dec 18, 2018

@pcwalton #3122 has landed now. Should we keep this open and rebase on top of those changes, or would it be best to close this and open a new PR when ready?

@pcwalton pcwalton closed this Dec 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.