Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use multiple threads to import resources. #47343

Merged
merged 1 commit into from
Apr 19, 2021

Conversation

reduz
Copy link
Member

@reduz reduz commented Mar 24, 2021

  • For now everything imports multithreaded by default (should work I guess, let's test).
  • Controllable per importer

Early test benchmark. 64 large textures (importing as lossless, not as vram) on a mobile i7, 12 threads:
Importing goes down from 46 to 7 seconds.

For VRAM I will change the logic to use a compressing thread in a subsequent PR, as well as implementing Betsy.

@reduz reduz requested a review from a team as a code owner March 24, 2021 23:48
@reduz reduz changed the title Use multiple threads to import. Use multiple threads to import resources. Mar 24, 2021
@Calinou Calinou added this to the 4.0 milestone Mar 24, 2021
@akien-mga
Copy link
Member

I gave this PR a test rebased on top of #47370, and I'm having some issues when trying to import the current tps-demo:

  • On the first import attempt (after clearing .godot/ folder), it seems to stall on importing menu_intro_music.ogg - or possibly right after, as I have .godot/imported/menu_intro_music.ogg-790ed16894e4b5a52d74fb71b5f5de88.oggstr. After a couple minutes inactivity, I kill the process.

  • New attempt without clearing .godot: It imports PNG assets blazingly fast (like 10-15 s) and eventually reaches bullet_explode.wav and stalls there. Again, the .sample is generated so I suspect it's stalling on the next asset. After a couple minutes inactivity, I kill the process.

  • New try, I get a ton of errors and a segfault. Reproducible each time I retry. No backtrace yet as I was testing with an optimized release_debug build without debug symbols.

@akien-mga
Copy link
Member

Here I got a stracktrace with the above steps to reproduce. It crashes importing a glTF file:

Thread 45 "godot.linuxbsd." received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff27fff640 (LWP 942881)]
Node::_generate_serial_child_name (this=<optimized out>, p_child=0x7ffefc005380, name=...) at ./core/string/string_name.h:97
97                      return _data == p_name._data;
(gdb) bt
#0  Node::_generate_serial_child_name (this=<optimized out>, p_child=0x7ffefc005380, name=...) at ./core/string/string_name.h:97
#1  0x0000000002196087 in Node::_validate_child_name (this=0x894e5a0, p_child=0x7ffefc005380, p_force_human_readable=<optimized out>) at scene/main/node.cpp:1070
#2  0x0000000002196f75 in Node::add_child (this=0x894e5a0, p_child=0x7ffefc005380, p_legible_unique_name=<optimized out>) at scene/main/node.cpp:1248
#3  0x000000000191fc9f in ProgressDialog::add_task (this=0x9a7e460, p_task=..., p_label=..., p_steps=<optimized out>, p_can_cancel=<optimized out>) at editor/progress_dialog.cpp:171
#4  0x0000000001758a2e in EditorNode::progress_add_task (p_task=..., p_label=..., p_steps=<optimized out>, p_can_cancel=<optimized out>) at editor/editor_node.cpp:3975
#5  0x0000000001a789a5 in EditorProgress::EditorProgress (p_can_cancel=false, p_amount=104, p_label=..., p_task=..., this=0x7fff27ffe860) at ./editor/editor_node.h:890
#6  ResourceImporterScene::import (this=0x8b83c40, p_source_file=..., p_save_path=..., p_options=..., r_platform_variants=<optimized out>, r_gen_files=<optimized out>, r_metadata=0x7fff27ffeb10)
    at editor/import/resource_importer_scene.cpp:1326
#7  0x00000000016f522a in EditorFileSystem::_reimport_file (this=0x8556d20, p_file=..., p_custom_options=<optimized out>, p_custom_importer=...) at ./core/object/reference.h:100
#8  0x00000000016f7849 in EditorFileSystem::_reimport_thread (this=<optimized out>, p_index=<optimized out>, p_import_data=<optimized out>) at editor/editor_file_system.cpp:1926
#9  0x00000000016fe8ae in ThreadWorkPool::Work<EditorFileSystem, void (EditorFileSystem::*)(unsigned int, EditorFileSystem::ImportThreadData*), EditorFileSystem::ImportThreadData*>::work (this=0x210094e0)
    at ./core/templates/thread_work_pool.h:61
#10 0x0000000003664306 in ThreadWorkPool::_thread_function (p_user=0x8b97210) at core/templates/thread_work_pool.cpp:42
#11 0x0000000003290cb1 in Thread::callback (p_self=<optimized out>, p_settings=..., p_callback=0x3664270 <ThreadWorkPool::_thread_function(void*)>, p_userdata=0x8b97210) at core/os/thread.cpp:72
#12 0x00000000039429b0 in std::execute_native_thread_routine (__p=0x96c4dd0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#13 0x00007ffff7d77cba in start_thread () from /lib64/libpthread.so.0
#14 0x00007ffff7b6000f in clone () from /lib64/libc.so.6
(gdb) frame 7
#7  0x00000000016f522a in EditorFileSystem::_reimport_file (this=0x8556d20, p_file=..., p_custom_options=<optimized out>, p_custom_importer=...) at ./core/object/reference.h:100
100             _FORCE_INLINE_ T *operator->() {
(gdb) print p_file
$1 = (const String &) @0x160a3838: {_cowdata = {_ptr = 0x2267c160 U"res://enemies/red_robot/parts/sparks_effect/SparkParticle.glb"}, static _null = 0 U'\000', static invalid_node_name_characters = {_cowdata = {
      _ptr = 0x6a56eb0 U". : @ / \""}, static _null = 0 U'\000', static invalid_node_name_characters = <same as static member of an already seen type>}}

@fire
Copy link
Member

fire commented Apr 12, 2021

EditorProgress is suspect too.

@lyuma
Copy link
Contributor

lyuma commented Apr 14, 2021

Re: EditorProgress

See issues #47866 and #46893

@akien-mga
Copy link
Member

Extracted some files from the TPS demo to make a MRP for the deadlock issue:
thread_import_deadlock.zip

@reduz reduz force-pushed the editor-import-multithreaded branch from 2e42f43 to aea1890 Compare April 19, 2021 13:41
@reduz reduz requested a review from a team as a code owner April 19, 2021 13:41
@reduz reduz force-pushed the editor-import-multithreaded branch from aea1890 to c04a379 Compare April 19, 2021 13:45
editor/editor_file_system.h Outdated Show resolved Hide resolved
@akien-mga
Copy link
Member

Seems to work pretty well now!

Testing on TPS demo:

  • Editor base startup time: 10 s
  • Full import time (editor start + import from scratch of all assets): 35 s (so 25 s when removing editor startup time)
  • .stex import time (after full import, removing .godot/imported/*.stex): 17 s (so 7 s to import all textures)

@reduz reduz force-pushed the editor-import-multithreaded branch from c04a379 to 1fecccf Compare April 19, 2021 16:14
@reduz reduz requested a review from a team as a code owner April 19, 2021 16:14
- For now everything imports multithreaded by default (should work I guess, let's test).
- Controllable per importer

Early test benchmark. 64 large textures (importing as lossless, _not_ as vram) on a mobile i7, 12 threads:
Importing goes down from 46 to 7 seconds.

For VRAM I will change the logic to use a compressing thread in a subsequent PR, as well as implementing Betsy.
@reduz reduz force-pushed the editor-import-multithreaded branch from 1fecccf to 2b730ca Compare April 19, 2021 17:12
Copy link
Member

@fire fire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I downloaded the artifacts windows editor for Godot Engine from the cicd github actions and used tps-demo and nothing hung.

The load times roughly matched Akien's numbers, but I have a faster computer.

I did an integration test and not style review or code review.

@aaronfranke
Copy link
Member

Without PR, importing the TPS demo on my system:

  • Editor base startup time: 14 seconds
  • Full import time (editor start + import from scratch of all assets): 3 minutes 1 second (so 2 minutes 46 seconds when removing editor startup time)
  • .stex import time (after full import, removing .godot/imported/*.stex): 2 minutes 32 seconds (so 2 minutes 18 seconds to import all textures)

With PR, importing the TPS demo on my system:

  • Editor base startup time: 14 seconds
  • Full import time (editor start + import from scratch of all assets): 1 minute 23 seconds (so 1 minute 9 seconds when removing editor startup time)
  • .stex import time (after full import, removing .godot/imported/*.stex): 56 seconds (so 42 seconds to import all textures)

@reduz
Copy link
Member Author

reduz commented Apr 19, 2021

@aaronfranke Not so much of a win in your case, what hardware specs are you running on?

@aaronfranke
Copy link
Member

My specs are an i5-6600k CPU (4c/4t), 48 GB RAM, GTX 1070 GPU, and not many open programs while running this test.

@akien-mga
Copy link
Member

akien-mga commented Apr 19, 2021

That's still a pretty big gain if you compare before/after, it's a 69.5% import time reduction (for .stex).
My own test times for importing all .stex before this PR (but after etcpak was merged) were around 50 s, so it's a comparable gain (more than twice faster).

The base metrics do seem high for @aaronfranke, but comparatively I have an i7-8705G(*) and since this is all CPU and I/O bound, the difference is likely expected. Another factor might be the OS, I'm running Linux and if @aaronfranke is on Windows, the infamously slow Windows I/O might be a big factor.

All in all, good results I think :)


(*) Full specs.
$ inxi -CSG
System:    Host: cauldron Kernel: 5.11.15-desktop-1.mga9 x86_64 bits: 64 Desktop: KDE Plasma 5.21.4 Distro: Mageia 9 mga9 
CPU:       Info: Quad Core model: Intel Core i7-8705G bits: 64 type: MT MCP cache: L2: 8 MiB 
           Speed: 900 MHz min/max: 800/3100 MHz Core speeds (MHz): 1: 900 2: 900 3: 900 4: 900 5: 900 6: 900 7: 900 8: 900 
Graphics:  Device-1: Intel HD Graphics 630 driver: i915 v: kernel 
           Device-2: Advanced Micro Devices [AMD/ATI] Polaris 22 XL [Radeon RX Vega M GL] driver: amdgpu v: kernel 
           Device-3: Cheng Uei Precision Industry (Foxlink) HP Wide Vision FHD Camera type: USB driver: uvcvideo 
           Display: x11 server: Mageia X.org 1.20.11 driver: loaded: intel,v4l resolution: 1920x1080~60Hz 
           OpenGL: renderer: Mesa Intel HD Graphics 630 (KBL GT2) v: 4.6 Mesa 21.0.2

@reduz
Copy link
Member Author

reduz commented Apr 19, 2021

@aaronfranke oh, guess it kinda makes sense then since you are limited by number of cores and this is CPU based compression (you got 3.3x improvement, which is pretty good, considering the time it takes to load/save on disk). With a 8 core 16 threads machine the TPS demo imports in less than 10 seconds.

@aaronfranke
Copy link
Member

aaronfranke commented Apr 19, 2021

Sorry, should've specified the OS. This is on Linux, Ubuntu 20.04 64-bit to be specific.

Full specs

Screenshot from 2021-03-20 20-05-39

@akien-mga akien-mga merged commit f817e7f into godotengine:master Apr 19, 2021
@akien-mga
Copy link
Member

Thanks!

@qarmin
Copy link
Contributor

qarmin commented Jun 17, 2021

_reimport_file looks that use late_added_files variable for all threads but doesn't use any mutex

Set<String> late_added_files; //keep track of files that were added, these will be re-scanned

late_added_files.insert(p_file); //imported files do not call update_file(), but just in case..

and this broke importing when importing a lot of files
#49324 (comment)

@lyuma
Copy link
Contributor

lyuma commented Jun 17, 2021

@qarmin I asked reduz about late_added_files a few hours ago so I'll see if I get a reply.

If you have time, would it be possible for you to I submitted a PR that removes all uses of late_added_files, since it seems to do nothing other than cause crashes? It should be a relatively small change.

Also, when playing around with importing assets, here are two other changes that I recommend if you're still seeing hangs or crashes: the first seems to fix some corner case that can cause hangs; the second disables the progress bar, since it leads to a nested main loop which wreaks havoc on multithreaded code.

diff --git a/core/templates/thread_work_pool.h b/core/templates/thread_work_pool.h
index 9f7a692cc5..b242648bc8 100644
--- a/core/templates/thread_work_pool.h
+++ b/core/templates/thread_work_pool.h
@@ -105,7 +105,7 @@ public:
 	}
 
 	bool is_done_dispatching() const {
-		ERR_FAIL_COND_V(current_work == nullptr, false);
+		ERR_FAIL_COND_V(current_work == nullptr, true);
 		return index.load(std::memory_order_acquire) >= current_work->max_elements;
 	}
 
diff --git a/editor/progress_dialog.cpp b/editor/progress_dialog.cpp
index 0b6a3798b3..1a89ff4749 100644
--- a/editor/progress_dialog.cpp
+++ b/editor/progress_dialog.cpp
@@ -207,7 +207,7 @@ bool ProgressDialog::task_step(const String &p_task, const String &p_state, int
 		DisplayServer::get_singleton()->process_events();
 	}
 
-	Main::iteration(); // this will not work on a lot of platforms, so it's only meant for the editor
+	// Main::iteration(); // this will not work on a lot of platforms, so it's only meant for the editor
 	return cancelled;
 }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants