Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-threading debugger errors when using large loops #32441

Open
rainlizard opened this issue Sep 30, 2019 · 9 comments
Open

Multi-threading debugger errors when using large loops #32441

rainlizard opened this issue Sep 30, 2019 · 9 comments

Comments

@rainlizard
Copy link
Contributor

Godot version: 3.11
OS/device including version:
Intel Core i7-4790K CPU
16GB ram
Windows 8.1 64bit

Steps to reproduce:
Main scene:

extends Spatial
var threadScene = load("res://Thread.tscn")

func _ready():
	print("ESC to cancel test. Check debugger for errors.")

func _input(event):
	if Input.is_action_pressed('exit'):
		get_tree().quit()

func _process(delta):
	var newThread = threadScene.instance()
	add_child(newThread)
	newThread.begin()

Thread scene:

extends Node
var thread = Thread.new()

func begin():
	thread.start(self, "_thread_function")

func _thread_function(userdata):
	var threadData = []
	for i in 100000:
		threadData.append(0)
	call_deferred("waitForCompletion")
	return threadData

func waitForCompletion():
	var data = thread.wait_to_finish()
	queue_free()

Minimal reproduction project: Isolate Bug.zip

Issue description:
If you let this run for a while (can take anywhere from 0-30 seconds) the debugger will fill with errors and lock up the game.

The time it takes to generate the errors is inconsistent and so is the amount of errors.
With this code I couldn't get the errors to appear at 50000 loops but it does appear at 100000 loops. To produce these errors I was originally using Vector3s in dictionaries (for terrain gen) but it looks like arrays have the same issue. In my game I get the errors using as little as 15000 loops but there's a lot more other stuff going on so I think it's about the intensity of the process.

This is the usual error:

E 0:00:02:0793   Condition ' len < 4 ' is true. returned: ERR_INVALID_DATA
  <C Source>     core/io/marshalls.cpp:108 @ decode_variant()

E 0:00:02:0793   Condition ' err != OK ' is true. Continuing..:
  <C Source>     core/script_debugger_remote.cpp:692 @ _poll_events()

But I've also seen this on occasion:

E 0:00:12:0153   Condition ' (type & 0xFF) >= Variant::VARIANT_MAX ' is true. returned: ERR_INVALID_DATA
  <C Source>     core/io/marshalls.cpp:113 @ decode_variant()

E 0:00:12:0154   Condition ' err ' is true. returned: err
  <C Source>     core/io/marshalls.cpp:487 @ decode_variant()

E 0:00:12:0155   Condition ' err ' is true. returned: err
  <C Source>     core/io/marshalls.cpp:531 @ decode_variant()

E 0:00:12:0155   Condition ' err != OK ' is true. Continuing..:
  <C Source>     core/script_debugger_remote.cpp:692 @ _poll_events()

E 0:00:12:0307   Condition ' var.get_type() != Variant::ARRAY ' is true. Continuing..:
  <C Source>     core/script_debugger_remote.cpp:693 @ _poll_events()

I don't know what any of this means. I am unable to use multi-threading as it currently is (the errors are freezing my game). Your help is appreciated.

@qarmin
Copy link
Contributor

qarmin commented Sep 30, 2019

I tested 1 min but nothing shows, but maybe this is related:

With compiled Godot with sanitizers support

scons p=x11 -j6 use_llvm use_lsan=yes use_asan=yes use_ubsan=yes 

when I run project and I pressed exit key on keyboard then Godot used freed memory:

==19760==ERROR: AddressSanitizer: heap-use-after-free on address 0x60d0004cfb98 at pc 0x00000e80b497 bp 0x7f1dae23e550 sp 0x7f1dae23e540
READ of size 4 at 0x60d0004cfb98 thread T129
    #0 0xe80b496 in Variant::operator=(Variant const&) core/variant.cpp:2531
    #1 0x14a8a76 in CowData<Variant>::set(int, Variant const&) core/cowdata.h:139
    #2 0x149d4e0 in Vector<Variant>::set(int, Variant const&) core/vector.h:82
    #3 0x148e63f in Vector<Variant>::push_back(Variant const&) core/vector.h:154
    #4 0xe2d3fd9 in Array::push_back(Variant const&) core/array.cpp:117
    #5 0x177d431 in Array::append(Variant const&) core/array.h:67
    #6 0xe93a70d in _VariantCall::_call_Array_append(Variant&, Variant&, Variant const**) core/variant_call.cpp:520
    #7 0xe90b8cd in _VariantCall::FuncData::call(Variant&, Variant&, Variant const**, int, Variant::CallError&) core/variant_call.cpp:121
    #8 0xe85b12a in Variant::call_ptr(StringName const&, Variant const**, int, Variant*, Variant::CallError&) core/variant_call.cpp:1098
    #9 0x1a0f94a in GDScriptFunction::call(GDScriptInstance*, Variant const**, int, Variant::CallError&, GDScriptFunction::CallState*) modules/gdscript/gdscript_function.cpp:1082
    #10 0x1846f71 in GDScriptInstance::call(StringName const&, Variant const**, int, Variant::CallError&) modules/gdscript/gdscript.cpp:1164
    #11 0xe56bf1b in Object::call(StringName const&, Variant const**, int, Variant::CallError&) core/object.cpp:900
    #12 0xf11ca59 in _Thread::_start_func(void*) core/bind/core_bind.cpp:2619
    #13 0x4fa2315 in ThreadPosix::thread_callback(void*) drivers/unix/thread_posix.cpp:74
    #14 0x7f1de9912181 in start_thread /build/glibc-KRRWSm/glibc-2.29/nptl/pthread_create.c:486
    #15 0x7f1de8b81b1e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x11db1e)

0x60d0004cfb98 is located 40 bytes inside of 144-byte region [0x60d0004cfb70,0x60d0004cfc00)
freed by thread T0 here:
    #0 0x7f1dea38604f in __interceptor_free (/lib/x86_64-linux-gnu/libasan.so.5+0x10c04f)
    #1 0xeb0922f in Memory::free_static(void*, bool) core/os/memory.cpp:181
    #2 0x13f9d06 in CowData<Variant>::_unref(void*) core/cowdata.h:212
    #3 0x13f9838 in CowData<Variant>::~CowData() core/cowdata.h:369
    #4 0x13f92ec in Vector<Variant>::~Vector() core/vector.h:126
    #5 0x1a29df7 in GDScriptFunction::~GDScriptFunction() modules/gdscript/gdscript_function.cpp:1773
    #6 0x188ff1b in void memdelete<GDScriptFunction>(GDScriptFunction*) core/os/memory.h:116
    #7 0x183f625 in GDScript::~GDScript() modules/gdscript/gdscript.cpp:920
    #8 0x14a21fd in void memdelete<Reference>(Reference*) core/os/memory.h:116
    #9 0x149442f in Ref<Reference>::unref() core/reference.h:279
    #10 0xe63652e in RefPtr::unref() core/ref_ptr.cpp:82
    #11 0xe7ee7a8 in Variant::clear() core/variant.cpp:1118
    #12 0x13f90d1 in Variant::~Variant() core/variant.h:418
    #13 0x13f9cec in CowData<Variant>::_unref(void*) core/cowdata.h:207
    #14 0x13f9838 in CowData<Variant>::~CowData() core/cowdata.h:369
    #15 0x13f92ec in Vector<Variant>::~Vector() core/vector.h:126
    #16 0xc03df30 in SceneState::~SceneState() scene/resources/packed_scene.h:37
    #17 0x58df302 in void memdelete<SceneState>(SceneState*) core/os/memory.h:116
    #18 0x58ce1f9 in Ref<SceneState>::unref() core/reference.h:279
    #19 0x58c5d82 in Ref<SceneState>::~Ref() core/reference.h:295
    #20 0xc03e2a8 in PackedScene::~PackedScene() scene/resources/packed_scene.h:197
    #21 0x14a21fd in void memdelete<Reference>(Reference*) core/os/memory.h:116
    #22 0x149442f in Ref<Reference>::unref() core/reference.h:279
    #23 0xe63652e in RefPtr::unref() core/ref_ptr.cpp:82
    #24 0xe7ee7a8 in Variant::clear() core/variant.cpp:1118
    #25 0x13f90d1 in Variant::~Variant() core/variant.h:418
    #26 0x13f9cec in CowData<Variant>::_unref(void*) core/cowdata.h:207
    #27 0x13f9838 in CowData<Variant>::~CowData() core/cowdata.h:369
    #28 0x13f92ec in Vector<Variant>::~Vector() core/vector.h:126
    #29 0x184bf81 in GDScriptInstance::~GDScriptInstance() modules/gdscript/gdscript.cpp:1328

previously allocated by thread T0 here:
    #0 0x7f1dea386448 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10c448)
    #1 0xeb081da in Memory::alloc_static(unsigned long, bool) core/os/memory.cpp:85
    #2 0x14931ee in CowData<Variant>::resize(int) core/cowdata.h:274
    #3 0x14888ba in Vector<Variant>::resize(int) core/vector.h:84
    #4 0x1920de9 in GDScriptCompiler::_parse_function(GDScript*, GDScriptParser::ClassNode const*, GDScriptParser::FunctionNode const*, bool) modules/gdscript/gdscript_compiler.cpp:1699
    #5 0x192d047 in GDScriptCompiler::_parse_class_blocks(GDScript*, GDScriptParser::ClassNode const*, bool) modules/gdscript/gdscript_compiler.cpp:2017
    #6 0x1930cd3 in GDScriptCompiler::compile(GDScriptParser const*, GDScript*, bool) modules/gdscript/gdscript_compiler.cpp:2159
    #7 0x18323d4 in GDScript::reload(bool) modules/gdscript/gdscript.cpp:582
    #8 0x186c43c in ResourceFormatLoaderGDScript::load(String const&, String const&, Error*) modules/gdscript/gdscript.cpp:2189
    #9 0xefaf510 in ResourceLoader::_load(String const&, String const&, String const&, bool, Error*) core/io/resource_loader.cpp:270
    #10 0xefb22b6 in ResourceLoader::load(String const&, String const&, bool, Error*) core/io/resource_loader.cpp:396
    #11 0xc1b6989 in ResourceInteractiveLoaderText::poll() scene/resources/resource_format_text.cpp:433
    #12 0xefa95c8 in ResourceFormatLoader::load(String const&, String const&, Error*) core/io/resource_loader.cpp:197
    #13 0xefaf510 in ResourceLoader::_load(String const&, String const&, String const&, bool, Error*) core/io/resource_loader.cpp:270
    #14 0xefb22b6 in ResourceLoader::load(String const&, String const&, bool, Error*) core/io/resource_loader.cpp:396
    #15 0x1a8cdab in GDScriptFunctions::call(GDScriptFunctions::Function, Variant const**, int, Variant&, Variant::CallError&) modules/gdscript/gdscript_functions.cpp:1069
    #16 0x1a1162c in GDScriptFunction::call(GDScriptInstance*, Variant const**, int, Variant::CallError&, GDScriptFunction::CallState*) modules/gdscript/gdscript_function.cpp:1146
    #17 0x1822621 in GDScript::_create_instance(Variant const**, int, Object*, bool, Variant::CallError&) modules/gdscript/gdscript.cpp:114
    #18 0x18297b1 in GDScript::instance_create(Object*) modules/gdscript/gdscript.cpp:330
    #19 0xe56f59e in Object::set_script(RefPtr const&) core/object.cpp:998
    #20 0xe55c173 in Object::set(StringName const&, Variant const&, bool*) core/object.cpp:432
    #21 0xbfef454 in SceneState::instance(SceneState::GenEditState) const scene/resources/packed_scene.cpp:209
    #22 0xc021526 in PackedScene::instance(PackedScene::GenEditState) const scene/resources/packed_scene.cpp:1692
    #23 0x15764f9 in Main::start() main/main.cpp:1761
    #24 0x13f8ad3 in main platform/x11/godot_x11.cpp:55
    #25 0x7f1de8a8ab6a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x26b6a)

Thread T129 created by T0 here:
    #0 0x7f1dea2b4311 in pthread_create (/lib/x86_64-linux-gnu/libasan.so.5+0x3a311)
    #1 0x4fa280b in ThreadPosix::create_func_posix(void (*)(void*), void*, Thread::Settings const&) drivers/unix/thread_posix.cpp:90
    #2 0xeb1dded in Thread::create(void (*)(void*), void*, Thread::Settings const&) core/os/thread.cpp:51
    #3 0xf11e76c in _Thread::start(Object*, StringName const&, Variant const&, _Thread::Priority) core/bind/core_bind.cpp:2665
    #4 0xf1f709a in MethodBind4R<Error, Object*, StringName const&, Variant const&, _Thread::Priority>::call(Object*, Variant const**, int, Variant::CallError&) core/method_bind.gen.inc:3325
    #5 0xe56c3aa in Object::call(StringName const&, Variant const**, int, Variant::CallError&) core/object.cpp:921
    #6 0xe85adde in Variant::call_ptr(StringName const&, Variant const**, int, Variant*, Variant::CallError&) core/variant_call.cpp:1082
    #7 0x1a0f94a in GDScriptFunction::call(GDScriptInstance*, Variant const**, int, Variant::CallError&, GDScriptFunction::CallState*) modules/gdscript/gdscript_function.cpp:1082
    #8 0x1846f71 in GDScriptInstance::call(StringName const&, Variant const**, int, Variant::CallError&) modules/gdscript/gdscript.cpp:1164
    #9 0xe56bf1b in Object::call(StringName const&, Variant const**, int, Variant::CallError&) core/object.cpp:900
    #10 0xe85adde in Variant::call_ptr(StringName const&, Variant const**, int, Variant*, Variant::CallError&) core/variant_call.cpp:1082
    #11 0x1a0f94a in GDScriptFunction::call(GDScriptInstance*, Variant const**, int, Variant::CallError&, GDScriptFunction::CallState*) modules/gdscript/gdscript_function.cpp:1082
    #12 0x1847486 in GDScriptInstance::call_multilevel(StringName const&, Variant const**, int) modules/gdscript/gdscript.cpp:1180
    #13 0x9307f4e in Node::_notification(int) scene/main/node.cpp:60
    #14 0x1700d07 in Node::_notificationv(int, bool) scene/main/node.h:46
    #15 0x1730b0d in Spatial::_notificationv(int, bool) scene/3d/spatial.h:54
    #16 0xe56c915 in Object::notification(int, bool) core/object.cpp:931
    #17 0x9438025 in SceneTree::_notify_group_pause(StringName const&, int) scene/main/scene_tree.cpp:962
    #18 0x9428ce0 in SceneTree::idle(float) scene/main/scene_tree.cpp:516
    #19 0x157d614 in Main::iteration() main/main.cpp:1929
    #20 0x147c720 in OS_X11::run() platform/x11/os_x11.cpp:3191
    #21 0x13f8ba3 in main platform/x11/godot_x11.cpp:56
    #22 0x7f1de8a8ab6a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x26b6a)
SUMMARY: AddressSanitizer: heap-use-after-free core/variant.cpp:2531 in Variant::operator=(Variant const&)


@rainlizard
Copy link
Contributor Author

I tested 1 min but nothing shows

Try adding one or two zeroes to the number of loops, like 10000000 instead of 100000, or remove a zero. It might be dependent on your hardware.
errors
I can replicate this error bombardment fairly regularly.

@ghost
Copy link

ghost commented Nov 24, 2019

I am experiencing a similar issue. I am not certain it is the same issue but it sounds very similar.
It doesn't always crash the project, but a high percentage of tests will crash.

My environment differs slightly from rainlizard.
Godot version: master - d3a852f
OS/device including version:
Intel i7 8750h
Intel HD630 / Nvidia GTX1050ti Max-Q
32GB ram
Archlinux 5.3.7

This has occured for me in the past with various 3.x versions.

My project consists of multiple threaded processes which perform procedural generation tasks that are queued up and started at scene launch. These are not ongoing tasks but rather some initial generation that prepare the game world (they take around 15-30 seconds, depending on the complexity level I have chosen).

Shortly after starting the scene from the editor the output log starts printing the following:

ERROR: get_packet: Condition ' remaining < 4 ' is true. returned: ERR_UNAVAILABLE
   At: core/io/packet_peer.cpp:221.
ERROR: _poll_events: Condition ' err != OK ' is true. Continuing..:
   At: scene/debugger/script_debugger_remote.cpp:752.

After some time I receive the following:

ERROR: decode_variant: Condition ' len < 4 ' is true. returned: ERR_INVALID_DATA
   At: core/io/marshalls.cpp:108.
ERROR: _poll_events: Condition ' err != OK ' is true. Continuing..:
   At: scene/debugger/script_debugger_remote.cpp:752.

And the last logs before everything freezes(mixed in with the above logs):

ERROR: _poll_events: Condition ' var.get_type() != Variant::ARRAY ' is true. Continuing..:
ERROR: _poll_events: Condition ' cmd.size() == 0 ' is true. Continuing..:
   At: scene/debugger/script_debugger_remote.cpp:757.
   At: scene/debugger/script_debugger_remote.cpp:753.

The most important part however is that if I launch the editor from the terminal I am greeted with the following when the scene freezes:

*** stack smashing detected ***: <unknown> terminated

My assumption was that I am doing something wrong which was causing the kernel to kill the process, such as improper care when operating on shared data structures.

However, after seeing this issue I thought I would have a deeper look and found the following.

If I launch the project directly (executing the godot binary from the project directory) I have NOT experienced any issues with the processes completing.

As I mentioned this doesn't always fail but a high percentage of runs do fail.
To confirm that it was not just a fluke I have performed 10 runs each by launching the project either from the editor or directly from the project directory.

A sample test of 10 runs of each launch method resulted in the following:

start from editor: 9 fail, 1 success
start directly: 10 success, 0 fail

If I can provide any further information or assist at all in debugging please let me know.

@rainlizard
Copy link
Contributor Author

rainlizard commented Nov 25, 2019

For my game I have since switched to using only one Thread (in addition to main Thread) and as a result the errors have become very rare, only occurring once in a blue moon. The more threads you run the higher chance the errors will occur.

When I posted this issue I was on 3.11, now on 3.2 it seems "Max Errors Per Frame" has now been changed to "Max Errors Per Second", previously it was generating errors endlessly, up to thousands and would freeze the entire time. Now it still freezes for a moment while it's spitting out the errors but the freezes are much shorter (a second or two) because it only spits out 100 errors at a time, as that's the default in project settings. So this issue has had a small band-aid placed on it I guess. Unfortunately setting "Max Errors Per Second" to 0 isn't reducing the freezes, the freezes still seem to be around the same length as when it spits out 100 errors.

But to be clear, only brief freezes are accompanying these errors for me, not crashes.

It seems I can generate the errors using only this:

var threadArray = []
func _ready():
	threadArray.resize(10) #change this value to change number of concurrent threads
	for i in threadArray.size():
		threadArray[i] = Thread.new()
		threadArray[i].start(self, "_thread_function")

func _thread_function(userdata):
	while true:
		pass

I often have to restart the test a few times, waiting 10 seconds or so to see if the errors appear. It's still very inconsistent.
But because I can get the errors using a while loop like this I think that means it's not about the amount of memory/data - but rather it's simply about whether multiple threads are in the process of currently running. Godot will actually crash if you close the game while threads are in the middle of running so I think this area might be bit neglected. #33054

@BalaDeSilver
Copy link

Is this still a thing? Threads are a core necessity for my project and I kind of can't afford to have threads be so inconsistent.

@rainlizard
Copy link
Contributor Author

Yes, 3.2-beta5

Here's a test project of the last code snippet I posted. (if you get no errors in the debugger after 10 seconds, restart it, or change number of concurrent threads)

threaderrors.zip

@The5-1
Copy link

The5-1 commented May 6, 2020

Having the same issue with 3.2.1 stable release binaries.
Generating procedural geometry on a thread pool.
In my current case 6 threads start a function 256 times each running a a nested for loop of 64 x 64 iterations.
I get various different errors like:
get_packet: Condition "remaining < 4" is true. Returned: ERR_UNAVAILABLE
decode_variant: Condition "len < 4" is true. Returned: ERR_INVALID_DATA
_poll_events: Condition "err != OK" is true. Continuing.
decode_variant: Condition "(type & 0xFF) >= Variant::VARIANT_MAX" is true. Returned: ERR_INVALID_DATA
Occasional it runs without errors. Sometimes it produces 20-30 errors. And more rarely it produces hundreds of errors that slow the application down, likely due to the console prints.
The more threads you start the more likely it occurs.
But even with sensible counts like 4-6 Threads it occurs fairly often.

@cullumi
Copy link

cullumi commented Sep 9, 2022

I've been having this same issue. I should mention that it comes up faster whenever I open up the Remote Scene Tree. The Remote Scene Tree can no longer update in my project and is therefore unusable alongside my fairly straightforward Thread Pool script. If I understand correctly, accessing values in Dictionaries and Arrays and the like is thread-safe, but perhaps that has something to do with the Remote Scene Tree piece for me?

Pretty sure just having the threads exist is enough to be a problem, though.

@sdsaati
Copy link

sdsaati commented Jul 14, 2023

At the moment in Godot 4.1.1.rc1 I'm using multi-threading and having the same issue when click on the remote my game freezes and gave me these errors:

E 0:00:02:0799   get_message: Condition "!has_message()" is true. Returning: Array()
  <C++ Source>   core/debugger/remote_debugger_peer.cpp:47 @ get_message()
 E 0:00:02:0799   poll_events: Condition "arr.size() != 2" is true. Continuing.
  <C++ Source>   core/debugger/remote_debugger.cpp:503 @ poll_events()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants