Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix an issue on some nvidia cards with texelFetch. #3073

Merged
merged 1 commit into from Sep 19, 2018

Conversation

@gw3583
Copy link
Collaborator

gw3583 commented Sep 18, 2018

In the do_clip() function, texelFetch is sometimes returning
invalid results on some nVidia GPUs.

To work around that, use UVs as normalized float coordinates
and fetch via texture(). This fixes the problems on nVidia
cards.

This exposed a bug in how we handle vertex snapping with clips.
Previously, we just cast the float UVs to an int before passing
to texelFetch. Now, pass in the snap offset and apply that to
the UVs for the clip mask.


This change is Reviewable

@gw3583
Copy link
Collaborator Author

gw3583 commented Sep 18, 2018

r? @kvark

This fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1491911

Surprisingly enough, this doesn't seem to have broken anything on a try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e1b06cde8bb6406aae65ddd0d29776ebfd69ab35

It's still pending but so far no changes on Linux and a new PASS in OSX.

@gw3583
Copy link
Collaborator Author

gw3583 commented Sep 18, 2018

Yep, try run looks good.

@nical
Copy link
Collaborator

nical commented Sep 18, 2018

I added an entry in the driver issues page. The entry is a bit hand-wavey and it'd be sad if we have to give up on texelFetch. Don't hesitate to add more information.

@kvark
Copy link
Member

kvark commented Sep 18, 2018

@nical that entry link is broken, FYI

void write_clip(vec4 world_pos, ClipArea area) {
vec2 uv = world_pos.xy * uDevicePixelRatio +
void write_clip(vec4 world_pos, vec2 snap_offset, ClipArea area) {
vec2 uv = snap_offset + world_pos.xy * uDevicePixelRatio +

This comment has been minimized.

@kvark

kvark Sep 18, 2018

Member

snap_offset needs to be multiplied by world_pos.w to be homogeneous

This comment has been minimized.

@gw3583

gw3583 Sep 18, 2018

Author Collaborator

Fixed

ivec3 tc = ivec3(mask_uv, vClipMaskUv.z);
return texelFetch(sCacheA8, tc, 0).r;

// TODO(gw): texelFetch here fails on some nVidia hardware in

This comment has been minimized.

@kvark

kvark Sep 18, 2018

Member

this is really really sad 😿

This comment has been minimized.

@gw3583

gw3583 Sep 18, 2018

Author Collaborator

Yes 😢

In the do_clip() function, texelFetch is sometimes returning
invalid results on some nVidia GPUs.

To work around that, use UVs as normalized float coordinates
and fetch via texture(). This fixes the problems on nVidia
cards.

This exposed a bug in how we handle vertex snapping with clips.
Previously, we just cast the float UVs to an int before passing
to texelFetch. Now, pass in the snap offset and apply that to
the UVs for the clip mask.
@gw3583 gw3583 force-pushed the gw3583:nvidia-clip branch from f902cd6 to 171be5b Sep 18, 2018
@gw3583
Copy link
Collaborator Author

gw3583 commented Sep 18, 2018

@kvark Thanks for the review, this is ready to go now, I think.

@kvark
kvark approved these changes Sep 19, 2018
Copy link
Member

kvark left a comment

:lgtm: , one suggestion left on the table

Reviewed 8 of 9 files at r1, 1 of 1 files at r2.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @gw3583)


webrender/res/prim_shared.glsl, line 268 at r2 (raw file):

    //           unconditionally.
    mask_uv = clamp(mask_uv, vClipMaskUvSampleBounds.xy, vClipMaskUvSampleBounds.zw);
    return texture(sCacheA8, vec3(mask_uv, vClipMaskUv.z)).r;

we can at least use the LOD-selecting variant

Copy link
Collaborator Author

gw3583 left a comment

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @gw3583)


webrender/res/prim_shared.glsl, line 268 at r2 (raw file):

Previously, kvark (Dzmitry Malyshau) wrote…

we can at least use the LOD-selecting variant

Does that make any difference, when we know the sCacheA8 texture never has mips? Maybe it's a theoretical performance hint to the driver?

@gw3583
Copy link
Collaborator Author

gw3583 commented Sep 19, 2018

@bors-servo r=kvark

@bors-servo
Copy link
Contributor

bors-servo commented Sep 19, 2018

📌 Commit 171be5b has been approved by kvark

@bors-servo
Copy link
Contributor

bors-servo commented Sep 19, 2018

Testing commit 171be5b with merge 01ec3e7...

bors-servo added a commit that referenced this pull request Sep 19, 2018
Fix an issue on some nvidia cards with texelFetch.

In the do_clip() function, texelFetch is sometimes returning
invalid results on some nVidia GPUs.

To work around that, use UVs as normalized float coordinates
and fetch via texture(). This fixes the problems on nVidia
cards.

This exposed a bug in how we handle vertex snapping with clips.
Previously, we just cast the float UVs to an int before passing
to texelFetch. Now, pass in the snap offset and apply that to
the UVs for the clip mask.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/3073)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

bors-servo commented Sep 19, 2018

☀️ Test successful - status-appveyor, status-taskcluster
Approved by: kvark
Pushing 01ec3e7 to master...

@bors-servo bors-servo merged commit 171be5b into servo:master Sep 19, 2018
3 of 4 checks passed
3 of 4 checks passed
code-review/reviewable 1 discussion left (kvark)
Details
Taskcluster (pull_request) TaskGroup: success
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
homu Test successful
Details
@kvark
Copy link
Member

kvark commented Sep 19, 2018

@jrmuizel found out that this PR fixes a few reftests on Gecko side that aren't related to NVidia, and are reproducible on both OSX and Windows. It means that we weren't doing the texelFetch correctly in the first place, and it's not (or at least, not only) the driver bug in play.

Re-reading the patch, I noticed a few things. First, the new code doesn't sample the same texels: since we are using the nearest sampling now, we are rounding up the sampling coordinate (e.g. 0.7 -> 1.0).

Secondly, the old code wasn't properly clamping the upper sampling bounds:

     bvec4 inside = lessThanEqual(
         vec4(vClipMaskUvBounds.xy, mask_uv),
vec4(mask_uv, vClipMaskUvBounds.zw));

Notice the second part of comparison - it would pass even if mask_uv == vClipMaskUvBounds.zw. If that is an integer number, we'd fetch a texel outside of the permitted area.

With this in mind, there is a hope that we aren't really seeing a driver bug here. I'll prepare a patch to test things out.

@kvark
Copy link
Member

kvark commented Sep 19, 2018

@gw3583 I missed this one:

Does that make any difference, when we know the sCacheA8 texture never has mips? Maybe it's a theoretical performance hint to the driver?

The texture doesn't have mips, right, so the texture unit knows it doesn't need to load them at run-time. But the shader compiler doesn't know that, and can't possibly (since the texture can be changing at run-time). In modern GPUs, the LOD selection appears to be computed with shader instructions, according to Humus:

Since gradient are needed by the texture units
anyway, it made sense in the past to let them handle it; however, now that GCN has a
generic lane swizzle, the ALUs has all the tools to do the work itself, so now it’s done in the
ALUs again.

It would be great to confirm with something like a shader analyzer, but the reasoning is sound to me, and I see no downside to use textureLod whenever we can anyway.

Also, interesting (distantly related) tip from a similar document:

GCN Performance Tip 23: GetDimensions() is a TEX instruction; prefer storing
texture dimensions in a Constant Buffer if TEX-bound.
Notes: The use of a TEX instruction can lead to lower shader performance if the shader
is TEX instructions - limited. It m ay also be subject to additional latency. It is therefore preferred to store the desired texture dimensions into a constant buffe

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Sep 19, 2018
xeonchen pushed a commit to xeonchen/gecko-cinnabar that referenced this pull request Sep 20, 2018
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Sep 20, 2018
--HG--
extra : rebase_source : 7d090510f12caceaff8c2329e47871dbb5f4de55
jankeromnes pushed a commit to jankeromnes/gecko that referenced this pull request Sep 21, 2018
bors-servo added a commit that referenced this pull request Sep 25, 2018
Bring back texelFetch from clip surface

This PR reverts #3073, leaves the snapping offsets around, and fixes the concern spotted in #3073 (comment)

Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2fce2addd97b66f044ecc83a46c7f45b53dc5381&selectedJob=200247696

WIP because: needs @gw3583 to check if it fixes their test on Nvidia GPU.
cc @jrmuizel

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/3086)
<!-- Reviewable:end -->
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this pull request Oct 3, 2019
UltraBlame original commit: 16995931c819acec7bbf665fe635da7430602c85
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this pull request Oct 3, 2019
UltraBlame original commit: 25b839b8e6a98334398c16a89aa03009481a640e
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this pull request Oct 3, 2019
UltraBlame original commit: 16995931c819acec7bbf665fe635da7430602c85
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this pull request Oct 3, 2019
UltraBlame original commit: 25b839b8e6a98334398c16a89aa03009481a640e
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Oct 3, 2019
UltraBlame original commit: 16995931c819acec7bbf665fe635da7430602c85
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Oct 3, 2019
UltraBlame original commit: 25b839b8e6a98334398c16a89aa03009481a640e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.