AtlasEngine: Reduce shader power draw with explicit branching #12552
How are you testing power draw? I might be able to find an AMD GPU to test.
149f236 to 69ae3d2
desc.AlphaMode = DXGI_ALPHA_MODE_IGNORE;
This one is redundant with a ternary expression a few lines above, where we already set the correct AlphaMode
if an HWND is present.
@@ -79,7 +79,7 @@ float4 DWrite_GrayscaleBlend(float4 gammaRatios, float grayscaleEnhancedContrast
     float3 foregroundStraight = DWrite_UnpremultiplyColor(foregroundColor);
     float contrastBoost = isThinFont ? 0.5f : 0.0f;
     float blendEnhancedContrast = contrastBoost + DWrite_ApplyLightOnDarkContrastAdjustment(grayscaleEnhancedContrast, foregroundStraight);
-    float intensity = DWrite_CalcColorIntensity(foregroundColor.rgb);
+    float intensity = DWrite_CalcColorIntensity(foregroundStraight);
This was a minor bug in my DirectWrite implementation.
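To illustrate why the argument matters, here is a minimal C++ sketch, not the shader code itself. It assumes that DWrite_UnpremultiplyColor divides RGB by alpha when alpha is non-zero and that DWrite_CalcColorIntensity is a weighted sum with weights (0.25, 0.5, 0.25); both are assumptions, and the struct and function names below are hypothetical.

```cpp
// Hypothetical stand-ins for HLSL float3/float4.
struct Float3 { float r, g, b; };
struct Float4 { float r, g, b, a; };

// Assumed behaviour of DWrite_UnpremultiplyColor: divide RGB by alpha,
// leaving the color untouched when alpha is zero.
constexpr Float3 unpremultiply(Float4 c)
{
    return c.a != 0.0f ? Float3{ c.r / c.a, c.g / c.a, c.b / c.a }
                       : Float3{ c.r, c.g, c.b };
}

// Assumed behaviour of DWrite_CalcColorIntensity: a weighted sum of the
// channels. The weights are an assumption made for this illustration.
constexpr float color_intensity(Float3 c)
{
    return c.r * 0.25f + c.g * 0.5f + c.b * 0.25f;
}

// White at 50% alpha, stored premultiplied.
constexpr Float4 kHalfTransparentWhite{ 0.5f, 0.5f, 0.5f, 0.5f };

// Old call: intensity of the premultiplied RGB, which shrinks with alpha.
constexpr float kIntensityPremultiplied =
    color_intensity(Float3{ kHalfTransparentWhite.r, kHalfTransparentWhite.g, kHalfTransparentWhite.b });

// Fixed call: intensity of the straight (unpremultiplied) color.
constexpr float kIntensityStraight = color_intensity(unpremultiply(kHalfTransparentWhite));
```

Under these assumptions the premultiplied input understates the intensity of a translucent foreground, while the straight color yields the intensity of the actual text color regardless of alpha.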
float3 foregroundStraight = DWrite_UnpremultiplyColor(fg);
float blendEnhancedContrast = DWrite_ApplyLightOnDarkContrastAdjustment(enhancedContrast, foregroundStraight);

[branch] if (useClearType)
{
    // See DWrite_ClearTypeBlend
    float3 contrasted = DWrite_EnhanceContrast3(glyph.rgb, blendEnhancedContrast);
    float3 alphaCorrected = DWrite_ApplyAlphaCorrection3(contrasted, foregroundStraight, gammaRatios);
    color = float4(lerp(color.rgb, foregroundStraight, alphaCorrected * fg.a), 1.0f);
}
else
{
    // See DWrite_GrayscaleBlend
    float intensity = DWrite_CalcColorIntensity(foregroundStraight);
    float contrasted = DWrite_EnhanceContrast(glyph.a, blendEnhancedContrast);
    color = fg * DWrite_ApplyAlphaCorrection(contrasted, intensity, gammaRatios);
}
I inlined the two DWrite functions to offset the shader's binary size cost a bit. Because of all the [branch]
annotations, the compiler can't inline as much, which increases the shader's binary size by about 50%.
Sorry Leonard, it turns out I didn't have AMD hardware at home anymore.
It'll probably be fine... 🙂
sure
Many articles I read while writing this engine claimed that GPUs can't
do branches like CPUs can. One common approach to branching on GPUs is
apparently to "mask" out results, a technique called branch predication.
The GPU simply executes all instructions in the shader linearly,
but if a branch isn't taken, it ignores the computation results.
This is unfortunate for our shader, since most of our branches are
only rarely taken. The cursor, for instance, is only drawn
on a single cell, and underlines are rarely used.
But apparently modern GPUs (2010s and later?) are actually entirely
capable of branching, if all lanes ("pixels") processed by a
wave ("GPU core") take the same branch.
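As a rough illustration of the difference, here is a toy cost model in C++. It is a sketch under stated assumptions, not a description of any real GPU: the 32-lane wave size, the function names, and the cost units are all hypothetical.

```cpp
#include <cstdint>

// Hypothetical cost model, not a real GPU API. A wave runs 32 lanes in
// lockstep; bit i of the mask is set when lane i needs the rare path
// (for example, the pixel lies under the cursor).
using LaneMask = std::uint32_t;

// Predication: the wave always executes both paths. Lanes that did not
// need the rare path simply discard its result.
constexpr int predicated_cost(LaneMask /*needs_rare*/, int common_cost, int rare_cost)
{
    return common_cost + rare_cost;
}

// Real branch: the wave skips the rare path when no lane needs it.
// If even a single lane needs it, the whole wave still pays for it.
constexpr int branched_cost(LaneMask needs_rare, int common_cost, int rare_cost)
{
    return common_cost + (needs_rare != 0 ? rare_cost : 0);
}

// Cost of a frame made of `waves` waves where exactly one wave contains
// a lane that needs the rare path.
constexpr int frame_cost_predicated(int waves, int common_cost, int rare_cost)
{
    return waves * predicated_cost(0, common_cost, rare_cost);
}

constexpr int frame_cost_branched(int waves, int common_cost, int rare_cost)
{
    return (waves - 1) * branched_cost(0, common_cost, rare_cost)
         + branched_cost(LaneMask{ 1 } << 5, common_cost, rare_cost);
}
```

With 100 waves, a common path of 10 units and a rare path of 40 units, this model charges 5000 units with predication and 1040 with real branches. The numbers are made up, but they show why rarely taken branches benefit so much when whole waves can skip them.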
On both my Nvidia GPU (RTX 3080) and Intel iGPU (Intel HD Graphics 530)
this change reduces power draw. The effect is most noticeable on the
latter, where power draw drops from 900 mW to 600 mW at 60 FPS.
PR Checklist
Validation Steps Performed
It seems to work fine on Intel and Nvidia GPUs.
Unfortunately, I don't have an AMD GPU to test this on, but I suspect it can't be worse.