
Add non-blocking large textures uploading to GPU - maybe via tiled uploading copyTextureToTexture #28101

Closed
jo-chemla opened this issue Apr 9, 2024 · 12 comments

@jo-chemla (Contributor) commented Apr 9, 2024

Description

We are developing a threejs application that will be deployed on very controlled hardware, which allows us to use high-resolution meshes and textures. The final scene has around 1M triangles (not that large) and a few 16k textures (which do take a few seconds to load). A small proxy mesh is loaded first as LOD0, which should keep the controls responsive while the real heavy mesh geometry and textures are loading.

However, as described in this non-blocking assets loaders thread, the primary thing blocking the main thread is uploading textures to the GPU - on the following screenshot, 8 s to load the seven 16k and 8k textures. During these uploadTexture calls, the scene and controls are unresponsive, since the GPU cannot process frame updates and renders while textures are being uploaded.

[Screenshot: profiler trace showing ~8 s spent in uploadTexture for the 16k/8k textures]

Solution

The above thread led to the TaskManager discussion yielding the WorkerPool.js implementation mid-2022.

Seeing the recent copyTextureToTexture WebGPURenderer support PR and this small Stack Overflow thread, I was wondering whether the worker handling texture loading and GPU upload could be parametrized to work in chunks/tiles, so that the uploadTexture step becomes non-blocking. As imagined in the SO post, we could use texSubImage2D/copyTextureToTexture to upload small chunks that fit in a 16 ms time window (if aiming for 60 fps). That way, the update/render loops would stay responsive, and the texture would load incrementally until it is fully uploaded to the GPU.

This could be parametrized, for example, by tile count or tile resolution (NxM tiles or XxY pixels).
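To make the idea concrete, here is a minimal sketch of such a tiled upload (hypothetical helper names, not an existing three.js API; it assumes WebGL2, where the unpack parameters can select a sub-rectangle of the source image):

```javascript
// Pure helper: split a width x height texture into tile rectangles.
function makeTiles(width, height, tileSize) {
  const tiles = [];
  for (let y = 0; y < height; y += tileSize) {
    for (let x = 0; x < width; x += tileSize) {
      tiles.push({
        x, y,
        w: Math.min(tileSize, width - x),
        h: Math.min(tileSize, height - y),
      });
    }
  }
  return tiles;
}

// Hypothetical per-frame loop: spend at most `budgetMs` uploading tiles,
// then yield so the render loop can produce a frame.
function uploadTiled(gl, texture, image, tiles, budgetMs = 8) {
  function step() {
    const start = performance.now();
    gl.bindTexture(gl.TEXTURE_2D, texture);
    gl.pixelStorei(gl.UNPACK_ROW_LENGTH, image.width);
    while (tiles.length && performance.now() - start < budgetMs) {
      const t = tiles.shift();
      // Select the tile's sub-rect of the source via unpack parameters:
      gl.pixelStorei(gl.UNPACK_SKIP_PIXELS, t.x);
      gl.pixelStorei(gl.UNPACK_SKIP_ROWS, t.y);
      gl.texSubImage2D(gl.TEXTURE_2D, 0, t.x, t.y, t.w, t.h,
                       gl.RGBA, gl.UNSIGNED_BYTE, image);
    }
    gl.pixelStorei(gl.UNPACK_ROW_LENGTH, 0);
    if (tiles.length) requestAnimationFrame(step);
  }
  requestAnimationFrame(step);
}
```

Each individual texSubImage2D call still blocks for its own duration, so the tile size is what bounds the worst-case stall per frame.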

Alternatives

At the moment we are using glb/gltf meshes, with Draco mesh compression and AVIF texture compression (lower bandwidth impact). Switching to KTX2 texture compression yields textures with ~10x lower gpuSize according to gltf-transform inspect (although ~2-3x larger on disk), which should proportionally reduce the uploadTexture duration.

Additional context

No response

@Mugen87 (Collaborator) commented Apr 10, 2024

Couldn't you represent your textures as image bitmaps instead? That would solve the blocking issue, since the texture decode would no longer block the main thread.

Another solution would be to transcode your textures into a compressed format. However, AVIF, like JPEG or PNG, is a compression format that isn't retained on the GPU, so you end up with decode overhead. Is the use of KTX2 not satisfying for you? KTX2 produces a GPU format on the client side, which should remove the decode overhead since no decode happens on the CPU side.

I would like to understand why you can't use one of these solutions before thinking about alternatives.

@jo-chemla (Contributor, author) commented Apr 10, 2024

As shown in the profiler, the blocking operation seems to be the uploadTexture itself - the upload to the GPU by the main thread - rather than image decode. I had thought the blocker was the GPU being unable to render a frame during upload, rather than the CPU main thread being occupied, but that was probably a misunderstanding.

AVIF is indeed akin to JPEG, PNG or WebP and requires CPU decode. We just tried GPU-compressed KTX2 textures, and the uploadTexture duration is indeed smaller - reduced by 2x, as shown in this screenshot. So this is already a decent perf boost, thanks for the pointer - although not the anticipated factor of 10x, given that the gpuSize of our KTX2 is ~10x smaller than our AVIF.

Given that it is the uploadTexture calls that block the main thread, would there be another way to keep the controls responsive during texture upload? Rather than engineering a complex workaround like the one described above (tiling the uploadTexture of large textures into chunks via copyTextureToTexture), would it be possible to delegate uploadTexture to a worker thread other than the main thread?

[Screenshot: profiler trace showing uploadTexture durations roughly halved with KTX2]

Note on another topic, interesting read regarding virtual textures here

@donmccurdy (Collaborator) commented Apr 10, 2024

I believe uploadTexture includes both the decode time (for JPG/PNG/WebP/AVIF...) and the upload time. So if you aren't using ImageBitmap yet, it should indeed decrease the time for image formats that need to be decompressed. But since a 16k texture with mipmaps will occupy about 1.5 GB of VRAM, uploading it to the GPU is going to drop frames with or without ImageBitmap. Even with KTX2 (where I'd normally expect 4-8x compression in VRAM) it's still a lot.
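That VRAM figure checks out with back-of-envelope arithmetic (a sketch, assuming uncompressed RGBA8 at 4 bytes per texel):

```javascript
// VRAM cost of a 16k RGBA8 texture: the full mip chain adds roughly a
// third on top of the largest mip level.
const bytesPerTexel = 4;
const level0 = 16384 * 16384 * bytesPerTexel;  // largest mip level alone
const fullChain = level0 * (4 / 3);            // geometric sum of all mips
console.log((level0 / 1e9).toFixed(2) + ' GB');    // 1.07 GB
console.log((fullChain / 1e9).toFixed(2) + ' GB'); // 1.43 GB, i.e. ~1.5 GB
```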

Unfortunately, there's no way to upload textures from another thread. I'm unsure whether that's a WebGL limitation, or a lower-level limitation.

@jo-chemla (Contributor, author) commented Apr 10, 2024

Thanks for these clarifications - that uploadTexture timings include both decoding and uploading, and that textures cannot be uploaded from another thread. Since KTX2 does not require decoding before sending the texture to the GPU, using ImageBitmap should yield no better timings - is that correct?

If that is true, then we either have to wait for all the textures to be uploaded to the GPU, or find a way to stream them to the GPU bit by bit while the main thread and render loop still reach 30-60 fps. Or use other mechanisms for loading high-resolution textures, like texture streaming/progressive loading as presented in this threejs discussion or this Babylon discussion, for example. Thanks again for your help!

@donmccurdy (Collaborator):

> Since ktx2 does not require decoding before sending the texture to the GPU, then using ImageBitmap should yield no better timings, is that correct?

Yes – KTX2 requires a transcoding step to a GPU-compatible compressed format, but transcoding does happen off the main thread (in WASM) before upload.

ImageBitmap should improve results compared to using PNG/JPG/WebP/AVIF without ImageBitmap, but the amount of data uploaded to the GPU is still 4-8x larger than with KTX2, and I'd expect upload time to be about 4-8x worse than KTX2 accordingly. If the upload time for the 16K KTX2 texture is already unacceptable for your application, ImageBitmap will be worse.

I'm less familiar with streaming texture upload, or what's required to support it... but I suspect that's the only way to upload uncompressed formats with less total blocking time than ImageBitmap offers, or to reduce the (already lower) blocking time with KTX2.

@donmccurdy (Collaborator):

The babylonjs thread discusses progressive upload of entire mipmap levels. This would get a blurry version of the texture ready to render very quickly. But the largest mipmap (level 0) still represents about 1.1 GB of data for an uncompressed 16k texture, so there is still a large chunk of blocking time before that full-res mipmap becomes available.

@Spiri0 (Contributor) commented Apr 10, 2024

@donmccurdy That's exactly what I do 😊.
But it doesn't have to look blurry - that depends heavily on the available texture quality.

@jo-chemla
I made the video with the spaceship at the bottom right using the WebGPURenderer.

https://the-mars-project.com/

This runs absolutely smoothly. My camera controls are just a bit rough; I've smoothed that out now. I load the many tiles in workers (multithreading). I now also do this with the normal map, and I can use it to display huge textures. I'm always happy about performance improvements, but loading textures dynamically without blocking the main thread works very well for me. If larger textures had existed for the spaceship, I would have used them. My site doesn't look particularly impressive, I apologize for that.

Threejs can only offer what the WebGPU/WGSL standards enable, but it does so very efficiently by using the W3C APIs directly. The new node-based system is very impressive. I use a tiled upload technique, but it is technically quite involved.

Here is the background on implementing copyTextureToTexture - it's what I wished for in r163:
#27859 (comment)

@donmccurdy (Collaborator):

> I load the many tiles in workers (multithreading).

What loading do you mean? The texture upload to the GPU cannot be done from a worker. I'm hoping to understand how your solution is implemented.

@Spiri0 (Contributor) commented Apr 10, 2024

@donmccurdy
Yes, you're right, I'll be more precise. I load all the texture data in workers. Then I transfer all the texture data to the main thread, which only needs to send it to the GPU with copyTextureToTexture. I have moved all of the load-intensive work off the main thread. The main thread then only needs to pass the loaded data to the GPU, and that works exactly as it should, very well.
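The pattern described here can be sketched roughly as follows (hypothetical file and variable names; the worker hands the raw bytes to the main thread as a Transferable so nothing is copied, and only the GPU upload itself remains on the main thread):

```javascript
// --- worker.js (sketch) ---
// self.onmessage = async (e) => {
//   const buf = await (await fetch(e.data.url)).arrayBuffer();
//   // Transfer ownership (zero-copy) instead of structured-cloning:
//   self.postMessage({ tile: e.data.tile, pixels: buf }, [buf]);
// };

// --- main thread (sketch) ---
// const worker = new Worker('worker.js');
// worker.onmessage = ({ data }) => {
//   // Only the small per-tile upload happens here, e.g. via the
//   // renderer's copyTextureToTexture (arguments vary by three.js version).
// };

// Pure helper that runs anywhere: split a pixel buffer into per-row-band
// views, so each band can be posted and uploaded independently.
function splitRows(buffer, rowBytes, rowsPerChunk) {
  const chunks = [];
  const chunkBytes = rowBytes * rowsPerChunk;
  for (let off = 0; off < buffer.byteLength; off += chunkBytes) {
    const len = Math.min(chunkBytes, buffer.byteLength - off);
    chunks.push(new Uint8Array(buffer, off, len));
  }
  return chunks;
}
```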

@Spiri0 (Contributor) commented Apr 10, 2024

@jo-chemla
Even if you send data to the GPU from a worker, it doesn't improve performance. I once implemented several threejs renderers across several workers because I thought I could simply move render-intensive work to workers and return the data to the main thread at the end. With a little thinking I could have saved myself the trouble 🤦‍♂️
There may be several CPU cores I can use with workers, but each renderer only accesses the one existing GPU, and I cannot dedicate individual GPU cores to rendering. With the WebGPURenderer everything works much better thanks to compute shaders - this makes things possible that I could only dream of in WebGL. In my ocean repository I compute nearly 100 (256x256) textures in each frame, and it runs very well.

@Mugen87 (Collaborator) commented Apr 11, 2024

It seems the project already exposes the tools required for implementing a more fine-grained texture upload. For now, let's leave it to the application to use these tools according to its specific requirements.

@Mugen87 Mugen87 closed this as completed Apr 11, 2024
@jo-chemla (Contributor, author):

Thanks all for your feedback.

@Spiri0 Thanks for the link to your demo app, that's interesting. I'm not looking for improved performance as such, since the data has to be uploaded to the GPU anyway - and if that has to happen on the main thread, then the only way I can think of to avoid freezes is to upload in chunks/tiles, a bit at each frame update/render. In terms of data volume, 100 256² textures amount to roughly a 2.5k texture upload, i.e. about 40x less data than a single 16k texture.
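For reference, the data-volume comparison can be checked with texel counts (same bytes per texel throughout):

```javascript
// Texel counts: 100 tiles of 256x256 vs a 2k and a 16k texture.
const tiles100 = 100 * 256 * 256;  // 6,553,600 texels
const tex2k = 2048 * 2048;         // 4,194,304 texels
const tex16k = 16384 * 16384;      // 268,435,456 texels
console.log((tiles100 / tex2k).toFixed(2));  // 1.56: a bit more than a 2k texture
console.log((tex16k / tiles100).toFixed(1)); // 41.0: a 16k texture is ~41x the data
```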

And indeed @Mugen87, this can be left to the application to implement, although making that uploadTextureByChunk via copyTextureToTexture generic enough might be useful for anyone who wants to upload large textures to the GPU without hurting perceived performance during the upload. If we do end up implementing that feature, I'll be happy to contribute it as a PR - although using KTX2 already helps greatly, reducing load times from 8 s to ~2 s.

@Mugen87 Mugen87 added this to the r164 milestone Apr 17, 2024