Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize .hdr loading and RGB9E5 conversion #95291

Merged
merged 1 commit into from
Aug 16, 2024

Conversation

BlueCube3310
Copy link
Contributor

@BlueCube3310 BlueCube3310 commented Aug 8, 2024

Improves .hdr file loading/importing times by doing the following:

Results:
Optimized x64 build
OS: Windows 10 64-bit
CPU: AMD Ryzen 9 5900X 12-Core
RAM: 64 GB DDR4

Loading only:

master PR Image
7354 ms 329 ms https://polyhaven.com/a/symmetrical_garden_02 - 8K
6901 ms 283 ms https://polyhaven.com/a/zwartkops_straight_sunset - 8K
1865 ms 78 ms https://polyhaven.com/a/cobblestone_street_night - 4K
114 ms 4 ms https://polyhaven.com/a/st_peters_square_night - 1K

On average, this PR loads data ~25x faster.

Whole import:

master w mips PR w mips master no mips PR no mips Image
12318 ms 3227 ms 9268 ms 2227 ms https://polyhaven.com/a/symmetrical_garden_02 - 8K
11845 ms 3240 ms 8884 ms 2212 ms https://polyhaven.com/a/zwartkops_straight_sunset - 8K
3190 ms 820 ms 2380 ms 602 ms https://polyhaven.com/a/cobblestone_street_night - 4K
204 ms 69 ms 142 ms 32 ms https://polyhaven.com/a/st_peters_square_night - 1K

On average, the import time for images with mipmaps is ~3.5x faster. Without mipmaps, it is ~4x faster.
The engine also generates mipmaps, saves the data to the import directory and reloads it again, which is bottlenecked by the disk's read/write speeds (The project was on an HDD).

TODO:

  • verify whether the converted values are accurate,
  • add more accurate performance benchmarks.

@fire
Copy link
Member

fire commented Aug 8, 2024

Can you post your samples as png / openexr converted from the your test cases to compare?

@BlueCube3310
Copy link
Contributor Author

Here are the samples in .exr format: hdrsamples.zip

I'll upload png comparisons later.

@BlueCube3310
Copy link
Contributor Author

master PR diff
falls_unopt falls_opt falls_diff
square_unpot square_opt square_diff
table-unopt table-opti table-diff

Image sources:
https://polyhaven.com/a/crystal_falls - Greg Zaal - CC0
https://polyhaven.com/a/st_peters_square_night - Andreas Mischok - CC0
https://polyhaven.com/a/table_mountain_1 - Greg Zaal - CC0

Copy link
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks great and the performance improvements are very exciting. I assume most of the performance benefit comes from not reading 8 bits at a time?

core/math/color.h Outdated Show resolved Hide resolved
modules/hdr/image_loader_hdr.cpp Outdated Show resolved Hide resolved
@BlueCube3310
Copy link
Contributor Author

The code looks great and the performance improvements are very exciting. I assume most of the performance benefit comes from not reading 8 bits at a time?

Interestingly, the RGBE conversion makes the biggest difference. The fast low-range optimization had the biggest impact on load speeds.

@clayjohn
Copy link
Member

clayjohn commented Aug 9, 2024

The code looks great and the performance improvements are very exciting. I assume most of the performance benefit comes from not reading 8 bits at a time?

Interestingly, the RGBE conversion makes the biggest difference. The fast low-range optimization had the biggest impact on load speeds.

Wow, that is really surprising! But I guess it is more than just a few extra instructions, the old path moves data around a lot more too.

@BlueCube3310 BlueCube3310 marked this pull request as ready for review August 10, 2024 20:52
@BlueCube3310 BlueCube3310 requested review from a team as code owners August 10, 2024 20:52
@clayjohn clayjohn modified the milestones: 4.x, 4.4 Aug 12, 2024
Copy link
Member

@Calinou Calinou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally, it works as expected. The imported HDR panoramas look pretty much identical to before (I can't tell a difference with ACES tonemapping and whitepoint set to 6 in Forward+).

Benchmark

PC specifications
  • CPU: Intel Core i9-13900K
  • GPU: NVIDIA GeForce RTX 4090
  • RAM: 64 GB (2×32 GB DDR5-5800 C30)
  • SSD: Solidigm P44 Pro 2 TB
  • OS: Linux (Fedora 40)

Using an optimized editor build (optimize=speed lto=full). .godot/ folder is removed between each iteration.

Here's the time taken to open the editor, import the 4 HDRIs linked in OP with lossless compression and exit the editor. The times shown here include editor startup and shutdown, so the import process is actually more than 1.45× faster compared to before (probably at least 3× faster in practice).

Benchmark 1: bin/godot.linuxbsd.editor.x86_64.master ~/Documents/Godot/test_hdri_import/project.godot --import
  Time (mean ± σ):      7.856 s ±  0.402 s    [User: 11.515 s, System: 1.181 s]
  Range (min … max):    7.326 s …  8.333 s    10 runs
 
Benchmark 2: bin/godot.linuxbsd.editor.x86_64 ~/Documents/Godot/test_hdri_import/project.godot --import
  Time (mean ± σ):      5.399 s ±  0.203 s    [User: 5.504 s, System: 1.076 s]
  Range (min … max):    5.121 s …  5.698 s    10 runs
 
Summary
  bin/godot.linuxbsd.editor.x86_64 ~/Documents/Godot/test_hdri_import/project.godot --import ran
    1.45 ± 0.09 times faster than bin/godot.linuxbsd.editor.x86_64.master ~/Documents/Godot/test_hdri_import/project.godot --import

Stripped binary size is 8 KB smaller with this PR compared to master.

@akien-mga akien-mga merged commit 886d586 into godotengine:master Aug 16, 2024
18 checks passed
@akien-mga
Copy link
Member

Thanks!

@BlueCube3310 BlueCube3310 deleted the hdr-optimizations branch October 13, 2024 09:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants