
softgpu: Optimize (bi-)linear texture filtering #17609

Merged (1 commit) Jun 22, 2023


fp64 (Contributor) commented Jun 21, 2023

Seeing as SampleLinearLevel is near the top in the profiler, optimize the actual bilinear filtering using SSE2. Solid win in the synthetic benchmark (https://godbolt.org/z/fqh3xvbGx, which also doubles as a correctness check), no visible difference in actual PPSSPP. Note: the profiler suggests that the hot part of SampleLinearLevel is elsewhere.
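For illustration, here is a minimal sketch of the kind of change described: bilinear filtering of four packed RGBA8888 texels with integer weights. It is written once per channel in scalar code, and once with SSE2 intrinsics that widen every channel of all four texels into 16-bit lanes. This is not the PR's actual code. The function names (`BilinearScalar`, `BilinearSSE2`, `Bilinear`), the `HYPOTHETICAL_HAVE_SSE2` macro, the texel order and the 4-bit sub-texel fractions `fu`/`fv` are all assumptions. As noted later in the thread, a C++ path like this mainly matters on 32-bit x86, since x86_64 builds sample through the JIT.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HYPOTHETICAL_HAVE_SSE2 1
#endif

// Assumed layout: c[0]=top-left, c[1]=top-right, c[2]=bottom-left,
// c[3]=bottom-right; channel k of a texel lives in bits 8k..8k+7.
// fu/fv are assumed 4-bit sub-texel fractions in [0, 16).
uint32_t BilinearScalar(const uint32_t c[4], int fu, int fv) {
    assert(fu >= 0 && fu < 16 && fv >= 0 && fv < 16);
    // The four weights always sum to 16 * 16 = 256.
    const int w[4] = {(16 - fu) * (16 - fv), fu * (16 - fv),
                      (16 - fu) * fv, fu * fv};
    uint32_t out = 0;
    for (int ch = 0; ch < 4; ++ch) {
        const int shift = ch * 8;
        int sum = 0;
        for (int i = 0; i < 4; ++i)
            sum += static_cast<int>((c[i] >> shift) & 0xFF) * w[i];
        out |= static_cast<uint32_t>(sum >> 8) << shift;  // divide by 256
    }
    return out;
}

#ifdef HYPOTHETICAL_HAVE_SSE2
uint32_t BilinearSSE2(const uint32_t c[4], int fu, int fv) {
    assert(fu >= 0 && fu < 16 && fv >= 0 && fv < 16);
    const short w0 = static_cast<short>((16 - fu) * (16 - fv));
    const short w1 = static_cast<short>(fu * (16 - fv));
    const short w2 = static_cast<short>((16 - fu) * fv);
    const short w3 = static_cast<short>(fu * fv);
    const __m128i zero = _mm_setzero_si128();
    const __m128i texels = _mm_loadu_si128(reinterpret_cast<const __m128i *>(c));
    // Zero-extend bytes to 16-bit lanes: lanes 0-3 = c[0], lanes 4-7 = c[1].
    const __m128i top = _mm_unpacklo_epi8(texels, zero);
    // Lanes 0-3 = c[2], lanes 4-7 = c[3].
    const __m128i bottom = _mm_unpackhi_epi8(texels, zero);
    // _mm_set_epi16 lists lanes from highest (7) down to lowest (0).
    const __m128i wTop = _mm_set_epi16(w1, w1, w1, w1, w0, w0, w0, w0);
    const __m128i wBottom = _mm_set_epi16(w3, w3, w3, w3, w2, w2, w2, w2);
    // Every product and partial sum is <= 255 * 256, so 16 bits suffice.
    __m128i sum = _mm_add_epi16(_mm_mullo_epi16(top, wTop),
                                _mm_mullo_epi16(bottom, wBottom));
    // Fold the right-hand texels (lanes 4-7) onto the left-hand ones (lanes 0-3).
    sum = _mm_add_epi16(sum, _mm_srli_si128(sum, 8));
    sum = _mm_srli_epi16(sum, 8);  // divide by 256 (logical shift)
    // Narrow back to bytes; the low 32 bits now hold the filtered texel.
    return static_cast<uint32_t>(_mm_cvtsi128_si32(_mm_packus_epi16(sum, sum)));
}
#endif

// Uses the SSE2 path when the compiler targets SSE2, else the scalar one.
uint32_t Bilinear(const uint32_t c[4], int fu, int fv) {
#ifdef HYPOTHETICAL_HAVE_SSE2
    return BilinearSSE2(c, fu, fv);
#else
    return BilinearScalar(c, fu, fv);
#endif
}
```

Because the weights sum to 256, every product and partial sum fits in an unsigned 16-bit lane. That is what lets the SSE2 version handle all 16 channel values in two 8-lane vectors without widening to 32 bits. Comparing it against the scalar version for every `fu`/`fv` pair gives a cheap correctness check, similar in spirit to the linked benchmark.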

hrydgard added this to the v1.16.0 milestone Jun 21, 2023
hrydgard (Owner) commented

I keep making various optimizations myself that look like great wins locally but seem to have barely any measurable effect overall... though it's hard to measure. Machines clock up and down according to load, etc.

This one has to be a win in some dimension, maybe power consumption :P I'm all for merging it, though I'll let @unknownbrackets click merge.

fp64 (Contributor, Author) commented Jun 21, 2023

Well, in my case an "observable difference" would mean going from 7 FPS on average to 8: a 12.5% reduction in frame time, which is pretty significant for a single-function change. The frame rate oscillating between 6 and 9 FPS does not make it any easier to measure.
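For reference, the 7 to 8 FPS figures worked out. The 12.5% corresponds to the reduction in time per frame; measured as frame rate, the same change is an increase of roughly 14%.

```latex
t_7 = \frac{1000\ \text{ms}}{7} \approx 142.9\ \text{ms}, \qquad
t_8 = \frac{1000\ \text{ms}}{8} = 125\ \text{ms}

\frac{t_7 - t_8}{t_7} = 1 - \frac{7}{8} = 12.5\%, \qquad
\frac{8 - 7}{7} \approx 14.3\%
```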

Off-topic, but while eyeing softgpu for more optimization opportunities, I've come up with several questions, and I'm not sure where to ask them. The Discord would seem a logical choice... if the damn thing would actually work for me. Maybe I'll just create a "softgpu optimization opportunities" issue or something.

unknownbrackets (Collaborator) commented

This would only apply to 32-bit Intel; you're not going to end up in this function on x86_64. So it probably won't actually make any difference for most users. I'd tried to avoid over-optimizing this code for SSE, given that we're already using a JIT for it that is much faster (especially with AVX2).

-[Unknown]

hrydgard (Owner) commented Jun 21, 2023

Oh right, forgot about that, hah.

Do feel free to create a discussion issue if you want.

fp64 (Contributor, Author) commented Jun 21, 2023

Oh, looks like I'm blind. I somehow thought that DrawPixelX86.cpp was the only special JIT path, but SamplerX86.cpp is a thing too.
While I care about 32-bit perf on x86 (and am mildly hopeful about bringing softgpu up to more palatable levels there), I realize that most people don't.

unknownbrackets (Collaborator) left a comment

Well, this seems reasonable, so I'll merge.

-[Unknown]

unknownbrackets merged commit 76990ae into hrydgard:master Jun 22, 2023
fp64 mentioned this pull request Jun 22, 2023
fp64 deleted the optimize-softgpu-tex-linear branch June 30, 2023 16:13