
Conversation

@mrxz (Collaborator) commented Jul 15, 2025

Although the images for the SOGS format were decoded asynchronously, the loading process awaited each result before starting the next decode. This prevented the browser from decoding the images in parallel.

This PR changes the loading function so that the processing for an attribute (e.g. scales, colors, shN) begins as soon as the decoding of its relevant images is done. This ensures that the decoding of all images can start as soon as possible, improving loading times.
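As a rough illustration of the change (a minimal sketch; `processScales`, `processColors`, and `processShN` are hypothetical names, not the actual Spark API):

```ts
// Hypothetical stand-ins for Spark's per-attribute unpacking code.
declare function processScales(img: ImageBitmap): void;
declare function processColors(img: ImageBitmap): void;
declare function processShN(img: ImageBitmap): void;

// Decode a WebP blob using the browser's built-in image decoder.
const decodeImage = (blob: Blob) => createImageBitmap(blob);

// Before: each attribute awaited its image decode before the next decode
// started, so the browser handled the images one after another.
async function unpackSequential(files: Record<string, Blob>) {
  const scalesImg = await decodeImage(files.scales);
  processScales(scalesImg);
  const colorsImg = await decodeImage(files.colors);
  processColors(colorsImg);
  const shNImg = await decodeImage(files.shN);
  processShN(shNImg);
}

// After: all decodes are started up front, and each attribute is processed
// as soon as its own images are ready, so the decodes can run in parallel.
async function unpackParallel(files: Record<string, Blob>) {
  await Promise.all([
    decodeImage(files.scales).then(processScales),
    decodeImage(files.colors).then(processColors),
    decodeImage(files.shN).then(processShN),
  ]);
}
```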

Here's a profiler capture showing the process before this PR (using the SOGS example):
[profiler screenshot: decoding before this PR]
And with this PR:
[profiler screenshot: decoding with this PR]

Note that the above captures don't use the exact same time scale. Measuring the full span of unpackPcSogs, the duration drops from ~1900ms to ~1600ms (measured while profiling in Chrome DevTools).

@dmarcos (Contributor) commented Jul 15, 2025

This is awesome, thank you!

@asundqui (Contributor) left a comment


This is a nice improvement!

One slight concern: in theory this would increase the peak GPU memory utilization during decoding... Someone yesterday sent me a post saying that iOS Safari only allows approx. 380 MB per tab to be allocated on the GPU? I haven't verified this myself, but if people try to load 10M-splat scenes, for example, could this cause more problems?

There are likely many other places in the code that also become problematic for large splat counts, so I don't think we should over-index on this one case, but it's something we should perhaps keep in mind for the future when we start to face more "my huge splat scene won't load in Spark!" reports :)

@mrxz (Collaborator, Author) commented Jul 16, 2025

> One slight concern: in theory this would increase the peak GPU memory utilization during decoding... Someone yesterday sent me a post saying that iOS Safari only allows approx. 380 MB per tab to be allocated on the GPU? I haven't verified this myself, but if people try to load 10M-splat scenes, for example, could this cause more problems?

Safari on iOS is indeed known to limit memory usage per tab. While I don't know the details, I thought the limit wasn't specific to GPU memory, given the unified memory architecture. Either way, peak memory utilization can indeed be higher with this change, so it's a valid concern.

Let's keep this in mind when we do indeed face "my huge splat scene won't load in Spark!" issues. Like you said, odds are there are many more places in the code that might be problematic in that case.

@dmarcos merged commit c803096 into sparkjsdev:main on Jul 16, 2025 (2 checks passed).
@vv4dvv commented Aug 2, 2025

> This PR changes the loading function so that the processing for an attribute (e.g. scales, colors, shN) begins as soon as the decoding of its relevant images is done. This ensures that the decoding of all images can start as soon as possible, improving loading times.
@mrxz The improvement is excellent, especially when testing on a local server: the parallel loading is noticeably faster.
However, once it's deployed to a cloud server, the loading still feels like it's happening serially; it seems that only one WebP file is downloaded at a time.

Previously, you mentioned that PlayCanvas uses XMLHttpRequest for loading, and their loading appears to be truly parallel and much faster. Could you compare and benchmark the loading speeds of the two methods (fetch vs. XHR)?

Also, an AI explanation suggested that although HTTP/2's multiplexing should theoretically be faster, in practice (due to CDN or proxy misconfigurations, or browser-level bottlenecks) multiple XHR requests over HTTP/1.1 can sometimes outperform a single large fetch over HTTP/2. What's your opinion on this explanation?
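For what it's worth, a minimal sketch of how the two download paths asked about above could be compared (the WebP file names are placeholders, not Spark's actual asset names; both variants issue all requests up front, so any remaining serialization would come from the server, CDN, or browser connection limits):

```ts
// Rough benchmark sketch: download the same set of WebP files via fetch and
// via XMLHttpRequest, firing all requests in parallel in both cases.
// The file names below are placeholders for the SOGS images.
const urls = ['means.webp', 'scales.webp', 'quats.webp', 'sh0.webp'];

async function timeFetch(): Promise<number> {
  const start = performance.now();
  await Promise.all(urls.map(async (url) => {
    const response = await fetch(url);
    return response.arrayBuffer();
  }));
  return performance.now() - start;
}

async function timeXhr(): Promise<number> {
  const start = performance.now();
  await Promise.all(urls.map((url) => new Promise<ArrayBuffer>((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.open('GET', url);
    xhr.responseType = 'arraybuffer';
    xhr.onload = () => resolve(xhr.response as ArrayBuffer);
    xhr.onerror = () => reject(new Error(`XHR failed for ${url}`));
    xhr.send();
  })));
  return performance.now() - start;
}

// Run the two variants one after the other so they don't compete for bandwidth.
timeFetch()
  .then((fetchMs) => {
    console.log(`fetch: ${fetchMs.toFixed(0)}ms`);
    return timeXhr();
  })
  .then((xhrMs) => console.log(`XHR: ${xhrMs.toFixed(0)}ms`));
```

Since fetch and XHR go through the same browser network stack, any difference in overlap is more likely down to the HTTP version negotiated and per-host connection limits than to the API used, but measuring against the actual deployment is the only way to be sure.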

