Rewrite the blur shader to remove the variable-length loop and to use the texture filtering hardware more effectively. #3028
Conversation
Just one reftest is failing and needs investigation. (It may just be a harmless minor blur algorithm difference, but I want to be sure.)
Is there anything special going on during upsampling in order to preserve the missing detail?
There's no (well, negligible) missing detail, because Gaussian blur is a low-pass filter. (The Fourier transform of a Gaussian is itself a Gaussian, with the characteristic bell curve shape.) That's why downsampling works in the first place. Because the output of a Gaussian blur is effectively bandlimited, the Nyquist-Shannon sampling theorem says that we can keep far fewer samples and still reconstruct the signal.
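A quick numeric sanity check of that argument (a sketch, not WebRender code; the sigma and downsample factor are made-up values): the Fourier transform of a Gaussian with standard deviation sigma is exp(-2 * pi^2 * sigma^2 * f^2), so after a wide blur almost no energy survives above the Nyquist rate of a coarser grid.

```rust
// Sketch illustrating the bandlimiting argument above. The sigma and
// downsample factor here are illustrative numbers, not WebRender's.
fn gaussian_response(sigma: f64, f: f64) -> f64 {
    // Frequency response of a unit-area Gaussian blur:
    // exp(-2 * pi^2 * sigma^2 * f^2)
    let pi = std::f64::consts::PI;
    (-2.0 * pi * pi * sigma * sigma * f * f).exp()
}

fn main() {
    let sigma = 10.0; // blur std dev, in source pixels
    // Nyquist frequency of a 4x-downsampled grid, in cycles per source pixel:
    let nyquist = 1.0 / (2.0 * 4.0);
    let response = gaussian_response(sigma, nyquist);
    // Essentially no energy survives above the coarse grid's Nyquist rate.
    assert!(response < 1e-12);
}
```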
This benchmark tests the performance of large blur radii combined with large images.
… the texture filtering hardware more effectively. This new shader performs Gaussian *resampling* instead of regular convolution. It samples in between texels to reduce the number of taps. The speed is about the same as the existing technique. It is nevertheless the fastest blur method that I could come up with. In particular, it exceeds the performance of the Kawase and dual Kawase blur techniques. The speed comes from working at lower resolution and incurring fewer downsampling and upsampling passes. As is often the case, ALU performance does not really seem to be the limiting factor; it's mostly memory bandwidth, which is why downsampling is so important. Further improvements should come from not doing the downsampling at all and instead rendering the blurred content at low resolution to begin with. Closes #2821.
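The "samples in between texels" trick can be modeled outside the shader. This is a hypothetical Rust sketch, not the actual GLSL: two adjacent Gaussian tap weights collapse into one hardware-filtered fetch at a fractional coordinate, because the bilinear lerp reproduces the weighted pair exactly, halving the tap count.

```rust
// Hypothetical model of the bilinear-tap reduction (not the shader code).
// Two adjacent Gaussian weights w0, w1 become a single filtered fetch.
fn merge_taps(w0: f32, w1: f32) -> (f32, f32) {
    let w = w0 + w1;
    (w, w1 / w) // combined weight, and fractional offset toward texel 1
}

// Software stand-in for the GPU's bilinear filter between texels t0 and t1.
fn bilinear(t0: f32, t1: f32, frac: f32) -> f32 {
    t0 * (1.0 - frac) + t1 * frac
}

fn main() {
    let (w0, w1) = (0.6, 0.2);
    let (t0, t1) = (3.0, 7.0);
    let (w, frac) = merge_taps(w0, w1);
    let one_tap = w * bilinear(t0, t1, frac);
    let two_taps = w0 * t0 + w1 * t1;
    // One filtered fetch reproduces the two-tap result exactly.
    assert!((one_tap - two_taps).abs() < 1e-6);
}
```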
OK, this should be ready for review, though we should do a Gecko try run as well. r? @gw3583
We aren't perfectly reconstructing the signal; if I understand correctly, it's still an approximation.
Weirdly, GitHub doesn't show the CI icon for your latest commit. Could that happen if you changed the date and/or committed from a different time zone?
Looks sensible! A few concerns below:
};

-BlurTask fetch_blur_task(int address) {
+BlurTask fetchBlurTask(int address) {
pcwalton (Author, Collaborator), Sep 7, 2018
I made that change to improve consistency. Do function names use snake_case and variable names use CamelCase? I was never able to figure it out.
kvark (Member), Sep 7, 2018
Look at prim_shared.glsl for examples. It's pretty much the Rust naming convention:
- variables and functions are snake_case
- type names are CamelCase
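As a tiny illustration of that convention (hypothetical names, purely for illustration, not from the codebase):

```rust
// Hypothetical names illustrating the convention above.
struct BlurTask {            // type names: CamelCase
    blur_radius: f32,        // fields and variables: snake_case
}

fn fetch_blur_task(radius: f32) -> BlurTask {  // functions: snake_case
    BlurTask { blur_radius: radius }
}

fn main() {
    let task = fetch_blur_task(2.0);
    assert!(task.blur_radius == 2.0);
}
```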
#endif

#ifdef WR_FRAGMENT_SHADER

#if defined WR_FEATURE_COLOR_TARGET
#define SUPPORT 4
gauss_coefficient.xy *= gauss_coefficient.yz;
bool vertical = vVertical != 0;
vec2 axisCoord = vertical ? vUv.yx : vUv.xy;
float start = floor(axisCoord.x - float(SUPPORT)) + 0.5;
kvark (Member), Sep 7, 2018
If the flooring changes our start position, wouldn't the computed Gaussian factors no longer match?
pcwalton (Author, Collaborator), Sep 7, 2018
The computed Gaussian factors are based on offset, which is relative to the un-floored axisCoord.x.
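That relationship can be checked with a small model (a Rust sketch with made-up SIGMA/SUPPORT values, mirroring the snippet above, not the shader itself): tap positions come from the floored start, but each weight uses the offset from the un-floored center, so the total weight is stable regardless of the fractional part.

```rust
// Sketch of the start/offset relationship discussed above.
// SIGMA and SUPPORT are made-up values, not the shader's.
const SIGMA: f32 = 2.0;
const SUPPORT: i32 = 8;

fn gauss(x: f32) -> f32 {
    (-x * x / (2.0 * SIGMA * SIGMA)).exp()
}

// Sum of tap weights for a given (un-floored) center coordinate.
fn weight_sum(center: f32) -> f32 {
    // Tap positions snap to texel centers...
    let start = (center - SUPPORT as f32).floor() + 0.5;
    (0..2 * SUPPORT)
        .map(|i| {
            let pos = start + i as f32;
            // ...but each weight uses the offset from the un-floored center.
            gauss(pos - center)
        })
        .sum()
}

fn main() {
    // Different fractional positions yield (nearly) the same total weight,
    // so the flooring doesn't break the Gaussian factors.
    let a = weight_sum(10.3);
    let b = weight_sum(10.7);
    assert!((a - b).abs() < 0.01);
}
```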
@@ -274,7 +274,7 @@ impl ClipNode {
 // Quote from https://drafts.csswg.org/css-backgrounds-3/#shadow-blur
 // "the image that would be generated by applying to the shadow a
 // Gaussian blur with a standard deviation equal to half the blur radius."
-let blur_radius_dp = (info.blur_radius * 0.5 * device_pixel_scale.0).round();
+let blur_radius_dp = (info.blur_radius * device_pixel_scale.0).round();
let uv_rect_kind = render_tasks[src_task_id].uv_rect_kind();

// Determine the number of downsampling passes that we'll use.
let downsampling_passes = (30 as u32).saturating_sub((radius as u32).leading_zeros());
kvark (Member), Sep 7, 2018
number of downsample passes = 30 - (leading 0 bits of radius)?
where does 30 come from?
pcwalton (Author, Collaborator), Sep 7, 2018 (edited)
It's log₂(r) - 1 == log₂(r/2). (Note the leading_zeros call.) I'll add this to a comment :)
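The formula is easy to check in plain Rust (a sketch of the expression from the diff above). The 30 comes from the 32-bit width: a u32 radius r >= 1 has 31 - floor(log2(r)) leading zeros, so 30 minus that count is floor(log2(r)) - 1, clamped at zero by saturating_sub.

```rust
// The expression from the diff, for checking: for radius >= 2 this is
// floor(log2(radius)) - 1, i.e. floor(log2(radius / 2)), clamped at zero.
fn downsampling_passes(radius: u32) -> u32 {
    30u32.saturating_sub(radius.leading_zeros())
}

fn main() {
    assert_eq!(downsampling_passes(4), 1);   // log2(4) - 1
    assert_eq!(downsampling_passes(8), 2);   // log2(8) - 1
    assert_eq!(downsampling_passes(64), 5);  // log2(64) - 1
    assert_eq!(downsampling_passes(1), 0);   // saturates at zero
    assert_eq!(downsampling_passes(0), 0);   // leading_zeros(0) == 32
}
```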
let _timer = self.gpu_profile.start_timer(GPU_TAG_BLUR);

self.set_blend(false, framebuffer_kind);
self.shaders.cs_blur_rgba8
    .bind(&mut self.device, projection, &mut self.renderer_errors);

-if !target.vertical_blurs.is_empty() {
+if !target.blurs.is_empty() {
I should mention that we shouldn't be downsampling any more than the old code did. I noticed a bunch of test failures on try (thanks for kicking off the try run!); I'll investigate those.
Given all the test failures on try, I'm going to separate the new shader from the downscaling changes, to isolate the source of the problems.
Round blur support up to the nearest even number of pixels. I'm pretty confident this is correct: it's either just as accurate or more accurate than the current implementation, so it'll let me sort out which test failures, if any, from #3028 and #3122 are true regressions.
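As a sketch of what "round up to the nearest even number" means in code (a hypothetical helper, not the actual patch):

```rust
// Hypothetical helper illustrating the rounding described above
// (not the actual patch). Clearing the low bit of n + 1 rounds an
// integer up to the nearest even value.
fn round_up_to_even(support: u32) -> u32 {
    (support + 1) & !1
}

fn main() {
    assert_eq!(round_up_to_even(3), 4);
    assert_eq!(round_up_to_even(4), 4); // already even: unchanged
    assert_eq!(round_up_to_even(5), 6);
}
```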
@pcwalton Do we still want this kept open?
pcwalton commented Sep 5, 2018 (edited by larsbergstrom)
Some reftests are currently failing; do not merge yet. I just wanted to get this here to discuss. cc @gw3583