Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dxvk] Optimize for the d3d9 Strict float emulation path #346

Closed
Blisto91 opened this issue Jan 9, 2024 · 18 comments
Closed

[dxvk] Optimize for the d3d9 Strict float emulation path #346

Blisto91 opened this issue Jan 9, 2024 · 18 comments
Assignees
Labels
assigned The issue is assigned to engineer reproduced The issue is reproduced by CQE

Comments

@Blisto91
Copy link

Blisto91 commented Jan 9, 2024

The d3d9 Strict float emulation path in dxvk (see links below for technical description) is not enabled by default for all drivers, even though it is more correct, because it has a performance penalty compared to the default True.
Radv and now also nvk have code to optimize for this and so will both use Strict out of the box without any performance penalty and with more games functioning out of the box without visual issues.

Amdvlk currently doesn't do this and so will either have a performance penalty for any games where dxvk sets Strict by default or risk of visual issues in any games where such builtin configs doesn't exist yet. A couple of examples for illustrating the performance dip can be seen below.
Note that these games are just randomly chosen and are not meant to be worst case scenarios. Also note that my test setup is pretty high end to begin with (RX6800 and 7950x) and so does not represent a typical one.

Risen

d3d9.floatEmulation = True
emulation-true

d3d9.floatEmulation = Strict
emulation-strict

Dragons Dogma

d3d9.floatEmulation = True
dragon-emulation-true

d3d9.floatEmulation = Strict
dragon-emulation-strict

See original dxvk PR doitsujin/dxvk#2294
See also radv MR https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436

@Blisto91 Blisto91 changed the title [dxvk] Optimize for the d3d9 Strict float emulation path in dxvk [dxvk] Optimize for the d3d9 Strict float emulation path Jan 9, 2024
@jinjianrong jinjianrong added the assigned The issue is assigned to engineer label Jan 11, 2024
@ruiminzhao
Copy link

ruiminzhao commented Feb 19, 2024

@Blisto91 Thanks for your comments. Now I'm investigating this issue on Amdvlk.
A question here:
Do you have any SPIRV generated when "d3d9.floatEmulation = True" or "d3d9.floatEmulation = Strict" ? I want to know if this setting will be reflected in the shader. Then I can do the optimization according to the related flag in SPIRV.
Thanks.

@Blisto91
Copy link
Author

Blisto91 commented Feb 23, 2024

Hi there and thank you for the response. I am not personally skilled in this area but I have asked the dxvk devs for assistance when they have a bit of time.

@DadSchoorse
Copy link

I see GPUOpen-Drivers/llpc@e91a935 added an optimization for ((b==0.0 ? 0.0 : a) * (a==0.0 ? 0.0 : b)). But dxvk also emits fma((b==0.0 ? 0.0 : a), (a==0.0 ? 0.0 : b), c) . So unless llpc lowers fma to mul+add, you should also add a pattern that optimizes the fma version to v_fma_legacy_f32/v_mad_legacy_f32. And depending on if you run constant folding before the optimizations, you also want to handle the case where the comparison+select was optimized away for one mul operand, (a * (a==0.0 ? 0.0 : b))/fma(a, (a==0.0 ? 0.0 : b), c), if b is not constant zero.

@ruiminzhao
Copy link

ruiminzhao commented Mar 24, 2024

@DadSchoorse Thanks for your comment. Now I have added more patterns as you refer, now the patterns supported is listed below:

  1. ((b==0.0 ? 0.0 : a) * (a==0.0 ? 0.0 : b)) ==>fmul_legacy(a,b)
  2. a * (a==0.0?0.0:b) or (b==0.0?0.0:a) * b ==>fmul_legacy(a.b)
  3. fma((b==0.0 ? 0.0 : a), (a==0.0 ? 0.0 : b), c) ==>fma_legacy(a,b,c)
  4. fma(a, (a==0.0 ? 0.0 : b), c) or fma(b==0.0?0.0:a, b, c) ==>fma_legacy(a,b,c)

For 2.3, one more condition is the single operand(a or b) should not be constant zero here.

Please check any missing here. Now my fix is under CI, looking forward to merge and deliver it ASAP.
Thanks.

@DadSchoorse
Copy link

DadSchoorse commented Mar 24, 2024

For 2.3, one more condition is the single operand(a or b) should not be constant zero here.

What I've said before may have been a bit ambiguous, so just to make sure: For a * (a==0.0?0.0:b) it's important that b is not zero. So if (b.isConstant() && b.constantValue() != 0.0) { apply_opt(); }, not if (!b.isConstant() || b.constantValue() != 0.0).

Otherwise, your list matches what radv optimizes.

@DadSchoorse
Copy link

Oh, another thing I just thought of, I don't see a bit size check in GPUOpen-Drivers/llpc@e91a935 . v_mul_legacy_f32/v_fma_legacy_f32 are 32bit only.

@Blisto91
Copy link
Author

Was this work supposed to be enabled in the 2024.Q2.1 release?
I tried a quick test with my iGPU in Risen 1 and still get a big performance drop when setting d3d9.floatEmulation = Strict

AMDVLK 2024.Q2.1

d3d9.floatEmulation = True
Q2 1-true

d3d9.floatEmulation = Strict
Q2 1-strict

RADV for comparison

d3d9.floatEmulation = True
radv-true

d3d9.floatEmulation = Strict
radv-strict

@ruiminzhao
Copy link

@Blisto91 Thanks for your feedback. For the cause of the issue, I have two suspects here:

  1. My fix hasn't fit the pattern of IR pattern generate by the game.
  2. Other fix which is related with fastmath flag has broken the pattern on which I have the optimized.

To confirm which one cause this issue, would you please(or ask dxvk devs for assistance) to dump the pipeline then I can check whether my optimization has been effective.

Thanks.

@K0bin
Copy link

K0bin commented May 20, 2024

You can dump the shaders by setting the environment variable DXVK_SHADER_DUMP_PATH=/your/path and then running the game with DXVK.

That will export the generated SPIR-V among other things.

Any D3D9 game will work, you just need to also set the environment variable DXVK_CONFIG=d3d9.floatEmulation = Strict; to enable the accurate float behavior.

@jinjianrong jinjianrong added the reproducing Reproducing the issue label May 21, 2024
@jinjianrong jinjianrong assigned ruiminzhao and unassigned amdrexu May 22, 2024
@Blisto91
Copy link
Author

Linked is a dxvk shader dump from Risen 1 running on my 7950x iGPU with amdvlk 2024.Q2.1 and d3d9.floatEmulation = strict

https://drive.proton.me/urls/SF8RPVZ6CG#Rk7KIIG4d480

@ruiminzhao
Copy link

@Blisto91 Thanks. But unfortunatelly I can't access this link.... Maybe you can add the related files in this page attached?

@Blisto91
Copy link
Author

@ruiminzhao Hi there. I hope a 7zip wrapped in a zip is fine as most formats Github allows doesn't compress enough on their own.
Risen-amdvlk-strict-float.zip

@ruiminzhao
Copy link

@ruiminzhao Hi there. I hope a 7zip wrapped in a zip is fine as most formats Github allows doesn't compress enough on their own. Risen-amdvlk-strict-float.zip

Thanks. I can get the log now and will have a look later.

@ruiminzhao
Copy link

The root cause has been found, the pattern used widely is like:
"

%2272 = select reassoc nnan nsz arcp contract afn i1 %2270, float 0.000000e+00, float %2271
%2273 = insertelement <3 x float> poison, float %2272, i64 0

%2276 = select reassoc nnan nsz arcp contract afn i1 %2274, float 0.000000e+00, float %2275
%2277 = insertelement <3 x float> %2273, float %2276, i64 1

%2280 = select reassoc nnan nsz arcp contract afn i1 %2278, float 0.000000e+00, float %2279
%2281 = insertelement <3 x float> %2277, float %2280, i64 2
..

%2293 = select reassoc nnan nsz arcp contract afn i1 %2291, float 0.000000e+00, float %2292
%2294 = insertelement <3 x float> poison, float %2293, i64 0

%2297 = select reassoc nnan nsz arcp contract afn i1 %2295, float 0.000000e+00, float %2296
%2298 = insertelement <3 x float> %2294, float %2297, i64 1

%2301 = select reassoc nnan nsz arcp contract afn i1 %2299, float 0.000000e+00, float %2300
%2302 = insertelement <3 x float> %2298, float %2301, i64 2

%2303 = fmul reassoc nnan nsz arcp contract afn <3 x float> %2281, %2302

"
It hasn't been caught, it needs to reorder the process like this:
"
Many other transforms
Scalarizer pass
fmul_legacy / fma_legacy matching
"

@jinjianrong jinjianrong added reproduced The issue is reproduced by CQE and removed reproducing Reproducing the issue labels Jun 4, 2024
@ruiminzhao
Copy link

I think the further fix for this issue has been delivered, maybe you can have a validation in the latest release. Thanks.

@Blisto91
Copy link
Author

Blisto91 commented Aug 7, 2024

@ruiminzhao Hi there.
I am not seeing any change in 2024.Q2.3 compared to prior test with 2024.Q2.1. Tested on my 7950x iGPU

@ruiminzhao
Copy link

Oh, it seems this issue hasn't been delivered on 2024.Q2.3. Once it's delivered, I would ask you for further validation. Thanks.

@Blisto91
Copy link
Author

Thank you for the work everyone, i can confirm this looks good now with 2024.Q3.1.
doitsujin/dxvk#4203

Risen screenshots on 7950x iGPU

d3d9.floatEmulation = True
image

d3d9.floatEmulation = Strict
image

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
assigned The issue is assigned to engineer reproduced The issue is reproduced by CQE
Projects
None yet
Development

No branches or pull requests

6 participants