Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize image conversion for half and float formats. #92291

Merged
merged 1 commit into from
Aug 16, 2024

Conversation

BlueCube3310
Copy link
Contributor

@BlueCube3310 BlueCube3310 commented May 23, 2024

Optimizes conversion between RGBA variants of half and float image formats.
On average, this makes the conversion process ~7 times faster, depending on the formats. This is especially important for RGB images, which need to be converted to RGBA by most GPUs.

Build - Windows 64-bit, production
CPU: Ryzen 9 5900X 12-Core
RAM: 64 GB DDR4 3000 MHz

4096x4096 image:

Conversion Unopt Opt
RH - RGH 133ms 14ms
RH - RGBH 157ms 28ms
RH - RGBAH 176ms 31ms
RGBH - RGBAH 226ms 32ms
RGBAH - RH 163ms 10ms
RGBAH - RGBH 214ms 22ms

MRP: image-half-convert.zip

@BlueCube3310 BlueCube3310 force-pushed the image-convert-optimized branch from 7ad67d6 to d260a2b Compare May 24, 2024 09:01
@AThousandShips AThousandShips added this to the 4.x milestone May 24, 2024
@BlueCube3310 BlueCube3310 marked this pull request as ready for review May 24, 2024 09:44
@BlueCube3310 BlueCube3310 requested a review from a team as a code owner May 24, 2024 09:44
Copy link
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@clayjohn clayjohn modified the milestones: 4.x, 4.4 Jun 13, 2024
Copy link
Member

@Calinou Calinou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally, it works as expected.

Benchmark

Using an optimized editor binary (optimize=speed lto=full).

Testing project: image-half-convert-benchmark.zip
(only source format is taken into account in the benchmark)

PC specifications
  • CPU: Intel Core i9-13900K
  • GPU: NVIDIA GeForce RTX 4090
  • RAM: 64 GB (2×32 GB DDR5-5800 C30)
  • SSD: Solidigm P44 Pro 2 TB
  • OS: Linux (Fedora 39)

The time taken to create an image and convert its format 1,000 times (the image is recreated on every iteration seems identical before and after this PR. Here it is for reference:

Format After (this PR)
RH 77.5 ms
RGH 99.5 ms
RGBH 119.5 ms
RGBAH 140.8 ms
RF 65.5 ms
RGF 77.1 ms
RGBF 75.2 ms
RGBAF 79.9 ms

I'm a bit surprised I don't see any performance improvement in this scenario, but at least it's not slower than before.

Note that the stripped binary size also grows by 8 KB with this PR, likely due to the addition of a template function. It's not a huge issue but I thought it'd be worth mentioning nonetheless.

@BlueCube3310
Copy link
Contributor Author

BlueCube3310 commented Jun 13, 2024

Tested locally, it works as expected.

Thank you for the review!

I'm a bit surprised I don't see any performance improvement in this scenario, but at least it's not slower than before.

The first conversion happens between the source 8-bit uint image and the specified half/float format. In order to get the correct results, only the second conversion/match statement has to be measured.
This PR only affects the conversion times between halfX-halfY and floatX-floatY conversions, since these ones have the same data type and can be handled with a simple memcpy, instead of the slower fallback function. I apologise if I hadn't been clear about that. The source format also needs to differ from the destination.

Here's the benchmark with the different measurements: image-half-convert-benchmark.zip

@akien-mga akien-mga merged commit eb684cc into godotengine:master Aug 16, 2024
16 checks passed
@akien-mga
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants