
Websocket server crashes when client disconnects (50% of the time). #33279

Closed
Tracked by #39196
realkotob opened this issue Nov 3, 2019 · 33 comments

Comments

@realkotob
Contributor

realkotob commented Nov 3, 2019

Godot version:
Latest 3.2alpha3 and master

OS/device including version:
Happens on Win10 and Ubuntu 18.x

Issue description:

I get the following error log. I suspect it could be because the server is trying to send an RPC call to a client that just left.

ERROR: _get_socket_error: Socket error: 10054
   At: drivers/unix/net_socket_posix.cpp:202
ERROR: put_packet: Condition ' !is_connected_to_host() ' is true. returned: FAILED
   At: modules/websocket/wsl_peer.cpp:238
Client 1252280632 was unregistered
ERROR: put_packet: Condition ' !is_connected_to_host() ' is true. returned: FAILED
   At: modules/websocket/wsl_peer.cpp:238

Steps to reproduce:
Occasionally (around 50% of the time), the server crashes when the client disconnects. I'm using the same code as the WebSocket demo projects in the demos repository, so there is nothing custom going on in that area.

Minimal reproduction project:
The websockets projects from the demos repository.

@realkotob
Contributor Author

realkotob commented Nov 3, 2019

I'd love to work on this myself if the regular networking contributors do not have the bandwidth, because the server crashing on disconnect is several orders of magnitude worse than the client crashing. 😅

realkotob changed the title from "Websocket server crashes when client disconnects." to "Websocket server crashes when client disconnects (50% of the time)." on Nov 3, 2019
@realkotob
Contributor Author

Here is the error log when 2 clients disconnected:

ERROR: _get_socket_error: Socket error: 10054
   At: drivers/unix/net_socket_posix.cpp:202
ERROR: put_packet: Condition ' !is_connected_to_host() ' is true. returned: FAILED
   At: modules/websocket/wsl_peer.cpp:238
Client 836541384 was unregistered
ERROR: put_packet: Condition ' !is_connected_to_host() ' is true. returned: FAILED
   At: modules/websocket/wsl_peer.cpp:238
Client 836541384 disconnected
ERROR: Attempt to remote call unexisting ID: 836541384.
   At: core/io/multiplayer_api.cpp:482
ERROR: Attempt to remote call unexisting ID: 836541384.
   At: core/io/multiplayer_api.cpp:482
ERROR: Attempt to remote call unexisting ID: 836541384.
   At: core/io/multiplayer_api.cpp:482
ERROR: _get_socket_error: Socket error: 10054
   At: drivers/unix/net_socket_posix.cpp:202
ERROR: put_packet: Condition ' !is_connected_to_host() ' is true. returned: FAILED
   At: modules/websocket/wsl_peer.cpp:238
Client 995400908 was unregistered
ERROR: put_packet: Condition ' !is_connected_to_host() ' is true. returned: FAILED
   At: modules/websocket/wsl_peer.cpp:238
Client 995400908 disconnected
ERROR: Attempt to remote call unexisting ID: 995400908.
   At: core/io/multiplayer_api.cpp:482
ERROR: Attempt to remote call unexisting ID: 995400908.
   At: core/io/multiplayer_api.cpp:482
ERROR: Attempt to remote call unexisting ID: 995400908.
   At: core/io/multiplayer_api.cpp:482

@Faless
Collaborator

Faless commented Nov 3, 2019

I'm unable to reproduce the crash using the websocket multiplayer demo, can you provide the stack trace?

@realkotob
Contributor Author

realkotob commented Nov 3, 2019

@Faless Excuse my ignorance, but where can I find the stack trace? It's not showing in any of the bottom consoles (I can only copy the error) and there's nothing more verbose on the command line either.

I tried having the Remote Debugger on and off, it didn't show anything else when the error happened.

@realkotob
Contributor Author

realkotob commented Nov 3, 2019

I enabled verbose stdout in the editor settings, and it showed extra lines that weren't there before (see the lines marked with **).

ERROR: get_global_transform: Condition ' !is_inside_tree() ' is true. returned: Transform()
   At: scene/3d/spatial.cpp:267
ERROR: _get_socket_error: Socket error: 10054
   At: drivers/unix/net_socket_posix.cpp:202
**Websocket get data error: 1, read (should be 0!): 0
Websocket (wslay) poll error: -400**
ERROR: put_packet: Condition ' !is_connected_to_host() ' is true. returned: FAILED
   At: modules/websocket/wsl_peer.cpp:238
Client 1107338214 was unregistered
ERROR: put_packet: Condition ' !is_connected_to_host() ' is true. returned: FAILED
   At: modules/websocket/wsl_peer.cpp:238
Client 1107338214 disconnected
ERROR: _get_socket_error: Socket error: 10054
   At: drivers/unix/net_socket_posix.cpp:202

@realkotob
Contributor Author

realkotob commented Nov 3, 2019

I am able to reproduce with this project (on alpha 3 and master).

I forgot to reduce the display size for the client before zipping; you might want to reset it to the default instead of 1920x1080.

Server.zip
Client.zip

@realkotob
Contributor Author

realkotob commented Nov 3, 2019

When trying to mass-reproduce it I discovered it is more likely to happen if the client has been connected for a few seconds, mashing Run -> Stop -> Run -> Stop is less likely to win.

I suspect I might be getting superstitious though, it's pretty random 😄

@Faless
Collaborator

Faless commented Nov 3, 2019

@Faless Excuse my ignorance, but where can I find the stack trace? It's not showing in any of the bottom consoles (I can only copy the error) and there's nothing more verbose in the commandline either.

Well, you said the server is crashing.
Crashing means the server closes unexpectedly (i.e., not just showing errors in console).
When it crashes it should print (in the console) a "stack trace" like this:

[1] /lib/x86_64-linux-gnu/libc.so.6(+0x43f60) [0x7fd107294f60] (??:0)
[2] Class::function() (??:0)
[3] Class::function_2() (??:0)

Can you confirm the server crashes (i.e. exits) when those errors happen?

@realkotob
Contributor Author

realkotob commented Nov 3, 2019

@Faless I'm running in the editor; that's where it crashes, as in it stops and goes back to the editor, and there is no stack trace. The process just stops/exits.

I can confirm it is crashing/exiting with every fiber in my bones 😄

@realkotob
Contributor Author

realkotob commented Nov 3, 2019

@Faless If it makes a difference it would only take me a few minutes to run it on AWS as an alpha3 server build.

Also thank you for replying I really appreciate it.

@Faless
Collaborator

Faless commented Nov 3, 2019

I'm still unable to reproduce this :(. I tried connecting/disconnecting multiple times, with more than one client connected, waiting a few seconds, and still no crashes...
Can you try to run it from the console instead of from the editor and see if you get the stack trace?
/path/to/godot.x11.tools.64 --path /path/to/Server

@realkotob
Contributor Author

@Faless Could it be a platform issue? I'm developing on Win10 and rarely test on Linux, so I'll take a few minutes to upload the .pck to AWS and get back to you with the results.

@Faless
Collaborator

Faless commented Nov 3, 2019

@Faless Could it be a platform issue? I'm developing on Win10 and rarely test on Linux, so I'll take a few minutes to upload the .pck to AWS and get back to you with the results.

I'm not sure, but you mentioned that it also happened on Ubuntu 18.x, so I thought you already had a way to test it on Linux and had reproduced the error there.

@Calinou
Member

Calinou commented Nov 3, 2019

@asheraryam The crash stack trace isn't printed back to the editor, so you should start Godot directly in a terminal while in the project directory (the project will run automatically):

cd /path/to/project/folder
/path/to/godot/binary

# Or:
/path/to/godot/binary --path /path/to/project/folder

@realkotob
Contributor Author

realkotob commented Nov 3, 2019

Well, the good news is that I got it to crash on Ubuntu; the bad news is that the extra text was

Segmentation fault (core dumped)

and then back to the OS.

I used the .pck; should I use the project instead? Note that my Linux testing environment is a server, so I can't open the project in the editor on Linux.

@Faless
Collaborator

Faless commented Nov 3, 2019

Well the good news is that I got it to crash on Ubuntu, the bad news is that the extra text was

Segmentation fault (core dumped)

Weird, and that is with which version of Godot? 3.2-alpha3? Are you using the release export template? Can you try using the debug export template instead?

I used the pck should I use the project instead?

No, it shouldn't matter...

@Faless
Collaborator

Faless commented Nov 3, 2019

Alternatively, can you try running the server with gdb? I know this is suboptimal, but the integrated stack tracing function does not seem to work :( .

gdb --args /path/to/godot/binary --path /path/to/project.pck

Then, after gdb loads, type run and press Enter.

When the server segfaults, it should bring you to the gdb console.
There, type bt and press Enter.

That should give you the stack trace...

EDIT: I've also tested with the server platform, still unable to reproduce :'(

@realkotob
Contributor Author

realkotob commented Nov 3, 2019

Yes, this is with 3.2alpha, using a debug build.

@Faless I'll run gdb on the linux server and get back to you in a few minutes.

Edit: Also to be fair, it's not as easy to reproduce as I initially thought, it usually happens after longer testing sessions, so when I spam run/close it's much less likely to happen.

@realkotob
Contributor Author

realkotob commented Nov 3, 2019

Finally something!

0x00000000008c6fb0 in Variant::call_ptr(StringName const&, Variant const**, int, Variant*, Variant::CallError&) ()

bt:

#0  0x00000000008c6fb0 in Variant::call_ptr(StringName const&, Variant const**, int, Variant*, Variant::CallError&) ()
#1  0x0000000001809af7 in GDScriptFunction::call(GDScriptInstance*, Variant const**, int, Variant::CallError&, GDScriptFunction::CallState*) ()
#2  0x000000000180c5c8 in GDScriptInstance::call(StringName const&, Variant const**, int, Variant::CallError&) ()
#3  0x0000000002364750 in ?? ()
#4  0x00007fffffffddc0 in ?? ()
#5  0x00007fffffffddc0 in ?? ()
#6  0x00007fffffffdaa0 in ?? ()
#7  0x00000000009151cf in Object::call(StringName const&, Variant const**, int, Variant::CallError&) ()
#8  0x00007fffffffdc60 in ?? ()
#9  0x0000000004e28210 in ?? ()
#10 0x0000000000000000 in ?? ()

Edit: It doesn't look strictly related to networking, but I never had it with the client or anything else, so it must be a WebSocket server thing... right? 😅

@qarmin
Contributor

qarmin commented Nov 3, 2019

It may be related to this issue: #33290
A "Segmentation fault (core dumped)" message usually appears when memory gets badly corrupted, and the cause can usually be tracked down easily when Godot is compiled with sanitizer support, e.g.:

scons p=x11 -j6 use_asan=yes use_ubsan=yes

@realkotob
Contributor Author

@qarmin I'll try to do that, but it might take longer than a few minutes since the AWS instance doesn't have the tooling/repo cloned yet.

@qarmin
Contributor

qarmin commented Nov 3, 2019

Steps to reproduce:

  1. Download the Server and Client projects from this comment: Websocket server crashes when client disconnects (50% of the time). #33279 (comment)
  2. Run the server and client projects in the editor (to import data) and close them after importing.
  3. Run the server in a console.
  4. Run the client in another console.
  5. Click on some players.
  6. Close the client and go back to step 4.

I have got this backtrace:

[1] /lib/x86_64-linux-gnu/libc.so.6(+0x46470) [0x7ffacfb9c470] (??:0)
[2] Variant::call_ptr(StringName const&, Variant const**, int, Variant*, Variant::CallError&) (/mnt/KubuntuWolne/godot/core/variant_call.cpp:1103)
[3] GDScriptFunction::call(GDScriptInstance*, Variant const**, int, Variant::CallError&, GDScriptFunction::CallState*) (/mnt/KubuntuWolne/godot/modules/gdscript/gdscript_function.cpp:1079)
[4] GDScriptInstance::call(StringName const&, Variant const**, int, Variant::CallError&) (/mnt/KubuntuWolne/godot/modules/gdscript/gdscript.cpp:1164)
[5] Object::call(StringName const&, Variant const**, int, Variant::CallError&) (/mnt/KubuntuWolne/godot/core/object.cpp:900 (discriminator 1))
[6] Variant::call_ptr(StringName const&, Variant const**, int, Variant*, Variant::CallError&) (/mnt/KubuntuWolne/godot/core/variant_call.cpp:1103 (discriminator 1))
[7] GDScriptFunction::call(GDScriptInstance*, Variant const**, int, Variant::CallError&, GDScriptFunction::CallState*) (/mnt/KubuntuWolne/godot/modules/gdscript/gdscript_function.cpp:1079)
[8] GDScriptInstance::call(StringName const&, Variant const**, int, Variant::CallError&) (/mnt/KubuntuWolne/godot/modules/gdscript/gdscript.cpp:1164)
[9] Object::call(StringName const&, Variant const**, int, Variant::CallError&) (/mnt/KubuntuWolne/godot/core/object.cpp:900 (discriminator 1))
[10] MultiplayerAPI::_process_rpc(Node*, StringName const&, int, unsigned char const*, int, int) (/mnt/KubuntuWolne/godot/core/io/multiplayer_api.cpp:321 (discriminator 2))
[11] MultiplayerAPI::_process_packet(int, unsigned char const*, int) (/mnt/KubuntuWolne/godot/core/io/multiplayer_api.cpp:218)
[12] MultiplayerAPI::poll() (/mnt/KubuntuWolne/godot/core/io/multiplayer_api.cpp:118)
[13] SceneTree::idle(float) (/mnt/KubuntuWolne/godot/scene/main/scene_tree.cpp:519)
[14] Main::iteration() (/mnt/KubuntuWolne/godot/main/main.cpp:1976)
[15] OS_X11::run() (/mnt/KubuntuWolne/godot/platform/x11/os_x11.cpp:3259)
[16] godot(main+0x12c) [0x1401a52] (/mnt/KubuntuWolne/godot/platform/x11/godot_x11.cpp:57)
[17] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7ffacfb7d1e3] (??:0)
[18] godot(_start+0x2e) [0x140186e] (??:?)

@realkotob
Contributor Author

realkotob commented Nov 3, 2019

@qarmin Thank you for your help!

I'm maybe halfway done with the Godot build, so I might as well report back anyway in case I get different results.

Edit: After around an hour of compiling, the custom build on AWS refused to run the server; it could be that the extra debug flags put too much load on the processor.
I might try again, but I think we have what we need :D

@JoshLee0915
Contributor

I think we are hitting a similar situation. At a glance, I think it might be a race condition where a user disconnects in the middle of an RPC call. Upping the packet and buffer size seemed to help somewhat, but we still get it quite frequently if we have a fair number of players coming in and out.

Here is our log and stack dump:

Current player count: 5
Game Resumed
User: 1591479155 disconnected
Current player count: 4
Game Resumed
ERROR: get_peer: Condition ' !has_peer(p_id) ' is true. returned: __null
   At: modules/websocket/lws_server.cpp:178.

=================================================================
        Native Crash Reporting
=================================================================
Got a SIGSEGV while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries
used by your application.
=================================================================
/proc/self/maps:
402a9000-402b9000 rwxp 00000000 00:00 0
414bb000-414cb000 rwxp 00000000 00:00 0
41d21000-41d51000 rwxp 00000000 00:00 0
55c51d9fd000-55c52033c000 r-xp 00000000 fc:01 1553231                    /root/vsm-godot-server/bin/TestHeadlessServer/vsm_test_server
55c52053b000-55c5205c8000 r--p 0293e000 fc:01 1553231                    /root/vsm-godot-server/bin/TestHeadlessServer/vsm_test_server
55c5205c8000-55c5205d2000 rw-p 029cb000 fc:01 1553231                    /root/vsm-godot-server/bin/TestHeadlessServer/vsm_test_server
55c5205d2000-55c520600000 rw-p 00000000 00:00 0
55c520bf1000-55c52364c000 rw-p 00000000 00:00 0                          [heap]
7f5c60000000-7f5c60021000 rw-p 00000000 00:00 0
7f5c60021000-7f5c64000000 ---p 00000000 00:00 0
7f5c667c0000-7f5c66840000 rw-p 00000000 00:00 0
7f5c66844000-7f5c668c4000 rw-p 00000000 00:00 0
7f5c668c8000-7f5c66948000 rw-p 00000000 00:00 0
7f5c6694c000-7f5c669cc000 rw-p 00000000 00:00 0
7f5c669d0000-7f5c66a50000 rw-p 00000000 00:00 0
7f5c66a54000-7f5c66ad4000 rw-p 00000000 00:00 0
7f5c66ad8000-7f5c66b58000 rw-p 00000000 00:00 0
7f5c66b5c000-7f5c66bdc000 rw-p 00000000 00:00 0
7f5c66be0000-7f5c66c60000 rw-p 00000000 00:00 0
7f5c66c64000-7f5c66ce4000 rw-p 00000000 00:00 0
7f5c66ce8000-7f5c66d68000 rw-p 00000000 00:00 0
7f5c66d6c000-7f5c66dec000 rw-p 00000000 00:00 0
7f5c66df0000-7f5c66e70000 rw-p 00000000 00:00 0
7f5c66e74000-7f5c66ef4000 rw-p 00000000 00:00 0
7f5c66ef8000-7f5c66f78000 rw-p 00000000 00:00 0

=================================================================
        Native stacktrace:
=================================================================
        0x55c51e128be5 - ./vsm_test_server : (null)
        0x55c51e128f81 - ./vsm_test_server : (null)
        0x55c51e11b4d1 - ./vsm_test_server : (null)
        0x55c51e099ea1 - ./vsm_test_server : (null)
        0x55c51e4ea302 - ./vsm_test_server : _ZN24WebSocketMultiplayerPeer13_server_relayEiiPKhj
        0x7ffce2eb2120 - Unknown

=================================================================
        Telemetry Dumper:
=================================================================
Pkilling 0x7f5c7575f700 from 0x7f5c77150780
Entering thread summarizer pause from 0x7f5c77150780
Finished thread summarizer pause from 0x7f5c77150780.

Waiting for dumping threads to resume

=================================================================
        External Debugger Dump:
=================================================================
mono_gdb_render_native_backtraces not supported on this platform, unable to find gdb or lldb

=================================================================
        Basic Fault Address Reporting
=================================================================
Memory around native instruction pointer (0x55c51e4ea302):0x55c51e4ea2f2  ff 90 38 01 00 00 48 8b 3c 24 44 89 ea 4c 89 e6  ..8...H.<$D..L..
0x55c51e4ea302  48 8b 07 ff 90 b8 00 00 00 48 8b 3c 24 89 c3 48  H........H.<$..H
0x55c51e4ea312  85 ff 74 5c e8 45 14 3d 01 84 c0 74 53 48 8b 2c  ..t\.E.=...tSH.,
0x55c51e4ea322  24 48 89 ef e8 e5 6f 3a 01 84 c0 74 43 48 8b 45  $H....o:...tCH.E

=================================================================
        Managed Stacktrace:
=================================================================
=================================================================
Aborted (core dumped)

@JoshLee0915
Contributor

So maybe I found something that may help. Digging a bit into the websocket code (mainly lws_server in 3.1 and wsl_server in 3.2), I found what I think might be causing my error, in the disconnect_peer function around line 286:

void WSLServer::disconnect_peer(int p_peer_id, int p_code, String p_reason) {
    ERR_FAIL_COND(!has_peer(p_peer_id));
    get_peer(p_peer_id)->close(p_code, p_reason);
}

From this code you can see that it checks whether the passed peer ID is in _peer_map. The same check is repeated in get_peer, though:

Ref<WebSocketPeer> WSLServer::get_peer(int p_id) const {
    ERR_FAIL_COND_V(!has_peer(p_id), NULL);
    return _peer_map[p_id];
}

If this second check fails, though, NULL is returned. This means that if anything modifies the map between the first and second check and removes that peer, close() will be called on a null reference, causing the segfault. While this could be intentional, I assume this method should actually just log the error and continue on its way.

Because of this, I think the fix is similar to the one @Faless applied to websocket_multiplayer_peer in PR #31482: check whether the returned peer is null instead of just checking whether the peer ID exists.

I do not know if this would fix the issue that @asheraryam is encountering but there seems to be a fair number of instances of get_peer being used directly without null checks within websocket_multiplayer_peer so maybe one of those is failing under similar circumstances.

This is all just a theory right now though so I could be wrong. I have the changes implemented for 3.1 and plan on testing it tomorrow so maybe I might be able to get more useful information then.
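The null-check idea proposed above can be sketched in isolation. This is a simplified, standalone C++ model, not Godot's WSLServer code: `Peer`, `Server`, the `std::map`/`std::shared_ptr` bookkeeping and the `bool` return value of `disconnect_peer` are all hypothetical. It only shows the peer being looked up once and the result being tested, instead of trusting a second lookup after `has_peer()`.

```cpp
#include <iostream>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a WebSocket peer (not Godot's WebSocketPeer).
struct Peer {
    bool open = true;
    int close_code = 0;
    void close(int code, const std::string &reason) {
        open = false;
        close_code = code;
        (void)reason; // a real peer would send this in the close frame
    }
};

// Hypothetical stand-in for the server's peer bookkeeping.
class Server {
public:
    std::map<int, std::shared_ptr<Peer>> peer_map;

    // Like the quoted get_peer(): yields null when the ID is unknown.
    std::shared_ptr<Peer> get_peer(int id) const {
        auto it = peer_map.find(id);
        if (it == peer_map.end()) {
            return nullptr;
        }
        return it->second;
    }

    // Look the peer up once and test the result, instead of calling
    // has_peer() and then get_peer() and trusting the second lookup.
    bool disconnect_peer(int id, int code, const std::string &reason) {
        std::shared_ptr<Peer> peer = get_peer(id);
        if (!peer) {
            std::cerr << "disconnect_peer: unknown peer " << id << "\n";
            return false; // log and keep running instead of dereferencing null
        }
        peer->close(code, reason);
        return true;
    }
};
```

After a disconnect handler erases the entry, calling `disconnect_peer` again logs and returns `false` rather than dereferencing a null pointer. Whether this addresses the actual crash depends on whether the map really can change between the two checks, which the comment above leaves as a theory.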

@ETdoFresh

ETdoFresh commented Aug 20, 2020

Hi, I'm having this exact problem on a simple asteroids game I'm working with websockets. I have two server builds...

  • In Windows being directly run from Godot Editor 3.2.2
  • In Docker Container (Ubuntu) being run via Godot-Server 3.2.2 [Godot_v3.2.2-stable_linux_server.64.zip]

In Windows, I have a GUI, and I tested the server by connecting and disconnecting 1-6 clients on and off for about 10 minutes. No crashes or errors to report.

In docker container, I run the same server scene and connect and disconnect 1-6 clients on and off for about 10 minutes. It randomly crashes on me with Segmentation fault (core dumped) somewhere in that time.

I've done as suggested above, and run gdb --args /usr/local/bin/godot-server --path /app and get the following logs:

(gdb) run
Godot Engine v3.2.2.stable.official - https://godotengine.org
[New Thread 0x7ffff7779700 (LWP 2481)]
[New Thread 0x7ffff7738700 (LWP 2482)]

Kasteroids Server v0.0.3
Starting Server...
Awaiting new connection...
A Client has connected!
... "various client connections/disconnections"
A Client has connected!

Thread 1 "godot-server" received signal SIGSEGV, Segmentation fault.
0x00000000008e36e0 in Variant::call_ptr(StringName const&, Variant const**, int, Variant*, Variant::CallError&) ()
(gdb) bt
#0  0x00000000008e36e0 in Variant::call_ptr(StringName const&, Variant const**, int, Variant*, Variant::CallError&) ()
#1  0x000000000184e5d7 in GDScriptFunction::call(GDScriptInstance*, Variant const**, int, Variant::CallError&, GDScriptFunction::CallState*) ()
#2  0x0000000001851112 in GDScriptInstance::call_multilevel(StringName const&, Variant const**, int) ()
#3  0x0000000000000000 in ?? ()

I am fairly sure it's not my code... but who knows? right??? If it is helpful, I'll try to boil it down to a more simple project than the one I'm working on, only if that's helpful to someone troubleshooting the problem. Please let me know! Thanks and much appreciation to those that know more than me and help fix these problems! Cheers... Your dev buddy... - ET

PS - If there is a more "pro" way to debug this, please let me know, and I'll dump that data too! Thanks again!

@Faless
Collaborator

Faless commented Aug 21, 2020

I am fairly sure it's not my code... but who knows? right??? If it is helpful, I'll try to boil it down to a more simple project than the one I'm working on, only if that's helpful to someone troubleshooting the problem.

Yes, that would be really helpful.
I think this issue is related to #33290 / #35423, and it must be addressed by the user if they get the editor error "Object was freed or unreferenced while signal 'connection_failed' is being emitted from it. Try connecting to the signal using 'CONNECT_DEFERRED' flag, or use queue_free() to free the object (if this object is a Node) to avoid this error and potential crashes." Are you getting it when running the server from the editor?

@ETdoFresh
Copy link

ETdoFresh commented Aug 22, 2020

Demo Project - Platform Buddies!

The sole purpose of this demo is to replicate the problem I'm having above.

https://github.com/ETdoFresh/PlatformBuddies

However, no luck just yet... I can't get the server to crash! Here are the main differences at this point in the demo...

  • Using Linode Docker Server [Debian 9, Nanode 1GB: 1 CPU, 25GB Storage, 1GB RAM] instead of my local NAS Docker Server
    UPDATE 8/22/2020: I put Kasteroids on the linode server, and I get the same error. So I can rule out the different server.
  • No Client Side Prediction [potentially saying something is wrong with my code in my asteroids project]
  • VSync Enabled in Platform Buddies! and Disabled in Kasteroids
  • My demos thus far have been using ws://.... whereas Kasteroids uses secure wss:// instead

A quick(ish) breakdown of the demo project...
Platform Buddies - Debugging Godot WebSocket Server Issue #33279

@ETdoFresh

  • My demos thus far have been using ws://.... whereas Kasteroids uses secure wss:// instead

OK, I've made a little more progress. I decided to grab a letsencrypt certificate for PlatformBuddies!. Now I'm starting to see a warning/error on disconnect.

  • When I disconnect using web browser/html5 I get no errors.
  • When I disconnect using the Windows .exe build (from editor/debug or release), I get the following error every time I disconnect
    mbedtls error: returned -0x6c00

Source: https://github.com/ETdoFresh/PlatformBuddies
Windows: https://github.com/ETdoFresh/PlatformBuddies/releases/download/v0.0.1/PlatformBuddiesWindowsv0.0.1.zip
HTML5: http://www.etdofresh.com/PlatformBuddies/v0.0.1/

@Faless
Collaborator

Faless commented Aug 23, 2020

When I disconnect using Windows .exe build (from editor/debug or release) I get the following error everytime I disconnect
mbedtls error: returned -0x6c00

This doesn't seem related to the issue at hand.

@ETdoFresh

This doesn't seem related to the issue at hand.

Agreed. This was just an observation on my journey.... Well... FYI, I'm really close. I decided to work my way backwards from my Asteroids project towards my PlatformBuddies project, and I fixed the websocket server issue. I'm in the process of trying to inject the code that causes the issue in my Asteroids project into Platform Buddies and I'll paste that up here when I can crash the server! :P Hopefully soon, but I am leaving for the next couple hours. Be back later!

@ETdoFresh

ETdoFresh commented Aug 23, 2020

Woot! So good news! I replicated the problem! The culprit (in my case) was a script I downloaded to show "stats". There is a call to "weakref" and "call" during the connection process which I think causes the websocket server to crash [more info below].

TLDR: Looks like a questionable script that spits out stats caused errors during the WebSocket connection process.

Here's the server log:

C:\Users\etgarcia\Desktop\PlatformBuddies>docker run -itp 11003:11003 etdofresh/platform_buddies
GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/bin/godot-server...
(No debugging symbols found in /usr/local/bin/godot-server)
Starting program: /usr/local/bin/godot-server --path /app
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7f14ca360700 (LWP 12)]
Godot Engine v3.2.2.stable.official - https://godotengine.org
[New Thread 0x7f14ca105700 (LWP 13)]
[New Thread 0x7f14ca0c4700 (LWP 14)]

Platform Buddies! v0.0.5
0
0
0
A Client has connected! 172.17.0.1:53360
A Client has disconnected! 172.17.0.1:53360
A Client has connected! 172.17.0.1:53366
A Client has connected! 172.17.0.1:53372
A Client has connected! 172.17.0.1:53378
...various more like this...
A Client has connected! 172.17.0.1:53450 <<< CRASH!!!
--Type <RET> for more, q to quit, c to continue without paging--

Thread 1 "godot-server" received signal SIGSEGV, Segmentation fault.
0x00000000008e36e0 in Variant::call_ptr(StringName const&, Variant const**, int, Variant*, Variant::CallError&) ()
(gdb) bt
#0  0x00000000008e36e0 in Variant::call_ptr(StringName const&, Variant const**, int, Variant*, Variant::CallError&) ()
#1  0x000000000184e5d7 in GDScriptFunction::call(GDScriptInstance*, Variant const**, int, Variant::CallError&, GDScriptFunction::CallState*) ()
#2  0x0000000001851112 in GDScriptInstance::call_multilevel(StringName const&, Variant const**, int) ()
#3  0x0000000000000000 in ?? ()

I have committed the bad code into this repository (v0.0.5):
https://github.com/ETdoFresh/PlatformBuddies/releases/tag/v0.0.5

Here is the pseudo "trace" of the problem:

var server = WebSocketServer.new()
server.connect("client_connected", self, "create_player")
func create_player(id, _protocol):
    var client = server.get_peer(id)
    var input = NETWORK_INPUT.instance()
    add_child(input)
    var character = spawn_random_character(input)
    ## ERROR LINE BELOW: ------
    $CanvasLayer/ServerStats.add_stat("X", input, "x", false)

# ServerStats script:
extends Panel

var stats = []

func add_stat(stat_name, object, stat_reference, is_method):
    stats.append([stat_name, object, stat_reference, is_method])

func _process(_delta):
    var label_text = ""
    for stat in stats:
        var value = null
        if stat[1] and weakref(stat[1]).get_ref(): # MY GUESS IS THIS
            if stat[3]:
                value = stat[1].call(stat[2]) # OR THIS THAT IS CAUSING THE ISSUE
            else:
                value = stat[1].get(stat[2])
        label_text += str(stat[0], ": ", value)
        label_text += "\n"
    $VBoxContainer/Value.text = label_text

@Faless
Collaborator

Faless commented Aug 23, 2020

TLDR: Looks like a questionable script that spits out stats caused errors during the WebSocket connection process.

Yeah, the culprit doesn't seem related to the websocket implementation at all. Even from the stack trace, it looks like a GDScript call to a freed object.
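The stats script above stores the object itself and only calls `weakref()` when reading it, by which point the object may already have been freed. Taking the weak reference when the stat is registered avoids holding a possibly dangling reference. The C++ sketch below models that idea with `std::weak_ptr`; it is an analogy, not Godot code, and `NetworkInput`, `StatsPanel` and the getter callback are hypothetical. (The GDScript equivalent would presumably store `weakref(object)` in `add_stat` and call `get_ref()` in `_process`; that variant is untested here.)

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-player input node shown in the stats panel.
struct NetworkInput {
    float x = 0.0f;
};

// Models the stats panel. Each stat holds a weak reference taken at
// registration time, so a freed target is detected instead of dereferenced.
class StatsPanel {
public:
    void add_stat(std::string name, const std::shared_ptr<NetworkInput> &object,
                  std::function<float(const NetworkInput &)> getter) {
        stats_.push_back({std::move(name), object, std::move(getter)});
    }

    // Builds the label text; targets that no longer exist are shown as "null".
    std::string render() const {
        std::string text;
        for (const Stat &stat : stats_) {
            text += stat.name + ": ";
            if (std::shared_ptr<NetworkInput> target = stat.target.lock()) {
                text += std::to_string(stat.getter(*target));
            } else {
                text += "null";
            }
            text += "\n";
        }
        return text;
    }

private:
    struct Stat {
        std::string name;
        std::weak_ptr<NetworkInput> target; // does not keep the object alive
        std::function<float(const NetworkInput &)> getter;
    };

    std::vector<Stat> stats_;
};
```

Once the last strong reference is gone (say, the player's input node is freed when its client disconnects), `lock()` returns an empty pointer and the panel prints `null` instead of reading freed memory.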

@Faless Faless closed this as completed Apr 6, 2021