-
-
Notifications
You must be signed in to change notification settings - Fork 21.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add transfer queue support to RenderingDevice and enable multithreaded resource loading #87590
Conversation
On the editor or a game? The benefits mostly apply to using threaded resource loading (e.g. loading with an animated loading screen or during gameplay), it shouldn't really matter that much on situations where it's loading but stuck without drawing anything. Also you can get fairly accurate measurements by just measuring the ticks using GDScript in a project and displaying them somewhere, no need to use a stopwatch. |
f39d302
to
72e7831
Compare
5f781dd
to
8b01c94
Compare
I can confirm during testing that #86333 brings performance back up to par with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested locally (rebased on top of master
10e1114 + #86333), it works as expected on Linux + NVIDIA.
On https://github.com/godotengine/tps-demo, it loads a full second faster than on master
with neither PR applied.
8b01c94
to
c4240a4
Compare
Now that context rewrite is merged I guess we can start discussing this PR. Right now I'm fairly happy with its current state, but some questions remain to be answered if we're fine with the current design in regards to the extra memory consumption it implies. As threaded workers are created, that means each worker that can upload stuff in parallel can have a variable cost up to the size of the biggest resource that's been uploaded. We don't want to increase memory consumption endlessly, so the current approach I took was to take the current processor count as the max amount of workers possible. In theory this means that potentially the memory consumption can increase as much as N CPU cores multiplied by the memory consumption biggest resource (if said resources were all uploaded in parallel). On the plus side, the staging buffer could actually be much smaller now as it's no longer used for uploading large resources but for uploading per-frame data now. I'm not sure if this is actually the approach we prefer or we should make this configurable somehow, but it does make sense that a system with more cores is likely to have more memory available (1 to 2 GB per core is usually the sweet spot or you run into paging issues very often on multi-core systems). I don't think exposing a fixed limit would make much sense either as it means it can't take advantage of the extra system resources effectively, but it might be warranted depending on the use case. The current version also implies a CPU performance regression unless #86333 is merged first due to having to enable thread safety on the RID Owners. In either case I'm taking this out of drafts so we can start looking into it, but I'm not sure if it should make it in 4.3 or not just yet as the testing to find the issues with it might need to be very extensive (and it also exposes existing multi-threading issues like we found out with #87721). |
// TODO: How should this max size be determined? | ||
transfer_worker_pool_max_size = OS::get_singleton()->get_processor_count(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't checked this very deeply yet. So far, I got the impression that this code is not spawning threads but having parallel bookkeeping for thr threads already existing that may be loading resources.
If that's true, a beginning would be WorkerThreadPool::get_thread_count()
. Even if currently resource loading is only allowed on low priority threads, that can mean the whole thread pool being used because there's the potential of more low-prio tasks in flight than that number and also because the are no pre-assigned low-prio threads, so any thread from the pool can be running a low-prio task at a given moment.
That said, the main thead as well as arbitrary threads can be loading resources at a given point. So, I'd at least add one to that count. If user creates extra threads while a WTP-backed load is in progress, well, it's only fair that they have to wait for a transfer worker to be available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct, it doesn't spawn threads on its own nor does it pre-create them, it just allows a maximum number of concurrent workers. Using WTP as a reference does seem like a good start.
…ial data on RenderingDevice.
c4240a4
to
b741d3f
Compare
if (!src_buffer) { | ||
ERR_FAIL_V_MSG(ERR_INVALID_PARAMETER, "Source buffer argument is not a valid buffer of any type."); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!src_buffer) { | |
ERR_FAIL_V_MSG(ERR_INVALID_PARAMETER, "Source buffer argument is not a valid buffer of any type."); | |
} | |
ERR_FAIL_NULL_V_MSG(src_buffer, ERR_INVALID_PARAMETER, "Source buffer argument is not a valid buffer of any type."); |
if (!dst_buffer) { | ||
ERR_FAIL_V_MSG(ERR_INVALID_PARAMETER, "Destination buffer argument is not a valid buffer of any type."); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!dst_buffer) { | |
ERR_FAIL_V_MSG(ERR_INVALID_PARAMETER, "Destination buffer argument is not a valid buffer of any type."); | |
} | |
ERR_FAIL_NULL_V_MSG(dst_buffer, ERR_INVALID_PARAMETER, "Destination buffer argument is not a valid buffer of any type."); |
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
…ice. Add ubershaders and rework pipeline caches for Forward+ and Mobile. - Implements asynchronous transfer queues from PR godotengine#87590. - Adds ubershaders that can run with specialization constants specified as push constants. - Pipelines with specialization constants can compile in the background. - Added monitoring for pipeline compilations. - Materials and shaders can now be created asynchronously on background threads. - Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
A few major bottlenecks were identified in Godot that this PR addresses.
The PR solves all these bottlenecks and also simplifies the separate setup command buffer logic that was in place in RenderingDevice. RenderingDevice is now able to create "Transfer Workers" on the fly that can use their own staging buffer and command buffer to record the upload of all resources queued by the thread. Once it reaches a certain saturation point, it'll submit all the work to the dedicated transfer queue. The main loop will wait if necessary for these transfer workers if it's been determined that one of the resources that was uploaded with a transfer worker was used in the frame. This design allows to effectively upload resources in the background without stalling the main thread at all until the resource needs to be used.
The PR also changes the flexibility of the thread safety of RenderingDevice: whatever is allowed to be called from different threads is now allowed with no mutex involved, and whatever is not allowed now has a thread guard in place instead. A thread guard is a much cheaper option than a mutex and it also verifies that RenderingDevice is being used with its intended behavior: drawing from the main loop and not from other threads that could potentially cause issues. This was mostly done following @reduz's advice as noted here.
So far we lack a public project that allows to showcase loading a large amount of resources, but one of W4's internal projects that uses a large amount of textures and meshes had some good results:
master
with Vsync enabled (60 HZ Monitor): ~25 secondsmaster
with Vsync disabled: ~2.5 seconds (10x faster!)transfer_queues
with both Vsync enabled and disabled: ~2.1 secondsWe can see one of the major benefits of this approach is that it makes the behavior much more consistent regardless of the user's vsync settings and refresh rate.
Projects with lots of resources that rely on background resource loading during loading screens or even during gameplay are the ideal candidates for testing out this PR.
TODO:
RID_Owner
lock-free for fetching. #86333).