Expose ENet peer timeout settings #20056

Open
supagu opened this Issue Jul 9, 2018 · 10 comments

5 participants
@supagu
Contributor

supagu commented Jul 9, 2018

Bugsquad edit: This issue is about exposing http://enet.bespin.org/group__peer.html#gac48f35cdd39a89318a7b4fc19920b21b


I'm using the latest Godot master branch and working on a multiplayer game. Sometimes, about half an hour into the game, there is a lag spike and then a disconnect.

It seems as if some packet exceeds its timeout, which makes the server disconnect the client. The client then locks up in PacketPeerUDPPosix::_poll while it waits on recvfrom (which is meant to be non-blocking, so how it can block there is beyond me). After about 30 seconds the client continues and detects the disconnect. During this time the game is completely locked up; I can't even get a signal emitted to indicate that the client is waiting for the server. This should be fixed.

More importantly, there should be a way to change the timeout settings for ENet. It seems a packet has about 1.6 seconds to be acknowledged as received by the client, which means it has at most about 0.8 seconds to reach the client and 0.8 seconds to get back. Any longer and the server will drop them.

Exposing a timeout would allow me to better handle low frame rates when playing over the internet. It would also make it easier to debug the network code: I could set a breakpoint, examine the code and resume a multiplayer game without fearing that hitting a breakpoint will cause a sudden disconnect.
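For reference, the ENet function linked in the Bugsquad edit, enet_peer_timeout(peer, timeoutLimit, timeoutMinimum, timeoutMaximum), takes three values; passing 0 for any of them keeps ENet's defaults ENET_PEER_TIMEOUT_LIMIT (32), ENET_PEER_TIMEOUT_MINIMUM (5000 ms) and ENET_PEER_TIMEOUT_MAXIMUM (30000 ms). Per the ENet documentation, the retransmission timeout of an unacknowledged reliable packet doubles until it reaches a limit; once at that limit, the peer is dropped if the packet has gone unacknowledged for at least the minimum period, and it is dropped regardless once the maximum period passes. The Python sketch below is a simplified model of that rule, not ENet's code. It assumes a single reliable packet that is never acknowledged, takes the initial retransmission timeout (which ENet derives from the measured round-trip time) as an input, and assumes the backoff ceiling is timeoutLimit times that initial value. The function name disconnect_time_ms is hypothetical.

```python
# Defaults of ENET_PEER_TIMEOUT_LIMIT, _MINIMUM and _MAXIMUM in ENet's enet.h.
DEFAULT_TIMEOUT_LIMIT = 32
DEFAULT_TIMEOUT_MIN_MS = 5000
DEFAULT_TIMEOUT_MAX_MS = 30000


def disconnect_time_ms(initial_rto_ms,
                       timeout_limit=DEFAULT_TIMEOUT_LIMIT,
                       timeout_min_ms=DEFAULT_TIMEOUT_MIN_MS,
                       timeout_max_ms=DEFAULT_TIMEOUT_MAX_MS):
    """Milliseconds after the first send at which a never-acknowledged reliable
    packet makes the peer time out, in a simplified model of ENet's rule."""
    if initial_rto_ms <= 0:
        raise ValueError("initial_rto_ms must be positive")
    rto_ceiling = timeout_limit * initial_rto_ms  # assumed backoff ceiling
    rto = initial_rto_ms
    elapsed = 0
    while True:
        elapsed += rto  # the retransmission timer for this attempt expires
        if elapsed >= timeout_max_ms:
            return elapsed  # unacknowledged too long: drop regardless of backoff
        if rto >= rto_ceiling and elapsed >= timeout_min_ms:
            return elapsed  # backoff exhausted and minimum period has passed
        rto *= 2  # exponential backoff, then resend


print(disconnect_time_ms(500))  # 31500
print(disconnect_time_ms(100))  # 6300
print(disconnect_time_ms(100, timeout_min_ms=20000, timeout_max_ms=60000))  # 25500
```

In this model, an assumed 500 ms initial retransmission timeout drops the peer after 31.5 s (the maximum applies first), while 100 ms drops it after 6.3 s (ceiling reached and minimum passed). Raising the minimum and maximum, which is what exposing these settings would allow, pushes the 100 ms case out to 25.5 s.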

@alex9099


alex9099 commented Jul 10, 2018

Would you mind creating a test project where this happens?

@supagu

Contributor Author

supagu commented Jul 11, 2018

NetworkTest.zip

You will need to edit the IP you're connecting to in the script on the CanvasLayer.
I emulate a low frame rate, combined with a method taking a different amount of time on the client vs the server - which really seems to be the killer.

Once the client is disconnected, the whole app becomes unresponsive.

@alex9099


alex9099 commented Jul 11, 2018

I can reproduce the disconnect, but for me it does not freeze after disconnecting.

I'm still trying to figure it out (I can't fully understand the implementation), but @Faless might be able to help :)

@Faless

Contributor

Faless commented Jul 12, 2018

@supagu
In the example provided:

  • There is delay in _process.
  • Then there is delay in do_it_everywhere.
  • In do_it_everywhere one of the two peers (the server in this case) has less delay than the client.
  • Which means it generates more calls to do_it_everywhere overall for the client.
  • Which in turn generates even more delay in the client.
  • Up to the point where there are, say, 40 calls executed in a single _process in the client.
  • Which generates > 30 secs delay in a single _process (we are talking 0.025 FPS).

And THEN the client disconnects.
This is normal behaviour: your client shouldn't freeze for such a long time.
And in any case, the example generates a strictly increasing amount of delay, so no matter what you do, given enough time, it will time out.
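The "strictly increasing" delay in the bullet list above can be illustrated with a toy model of the feedback loop. Assume (hypothetically; these are not figures measured from NetworkTest.zip) that each client frame costs a fixed base time plus a fixed cost per remote call, and that the server queues calls at a steady rate. Each frame then has to process the calls that piled up during the previous one. With r = calls_per_s * cost_per_call_s, r >= 1 makes frame time grow without bound, so any finite timeout is eventually hit, while r < 1 lets it settle at base_frame_s / (1 - r). The function first_frame_over and all parameter values are illustrative only.

```python
def first_frame_over(timeout_s, base_frame_s, calls_per_s, cost_per_call_s,
                     max_frames=1000):
    """Index of the first client frame longer than timeout_s, or None.

    Toy model (hypothetical): a client frame costs base_frame_s plus
    cost_per_call_s for each remote call that queued up while the previous
    frame ran, and the server queues calls_per_s calls per second.
    """
    frame_s = base_frame_s
    for n in range(max_frames):
        if frame_s > timeout_s:
            return n
        queued_calls = calls_per_s * frame_s  # backlog built up during this frame
        frame_s = base_frame_s + cost_per_call_s * queued_calls
    return None


# r = 10 * 0.12 = 1.2 >= 1: frames keep growing until one exceeds 30 s.
print(first_frame_over(30.0, 0.1, 10.0, 0.12))  # 22
# r = 10 * 0.05 = 0.5 < 1: frames settle near 0.1 / (1 - 0.5) = 0.2 s.
print(first_frame_over(30.0, 0.1, 10.0, 0.05))  # None
```

Under these assumptions, no timeout setting can save the r >= 1 case; a longer timeout only delays the frame that exceeds it.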

> The client then locks up in PacketPeerUDPPosix::_poll while it waits for recvfrom

It shouldn't lock there. How do you know that is actually the function where it locks? Did you get a stack trace?
Do you have any OS.delay_usec or OS.delay_msec calls in your project? At least in the example, that's where it "locks" (as explained above, and tested with a debugger).

@supagu

Contributor Author

supagu commented Jul 12, 2018

Well, I'm using Linux; the lock-up seems to be coming from the recvfrom call, which I think is specific to Linux. It is meant to be a non-blocking call, but somehow it ends up blocking for at least 30 seconds and the game becomes totally unresponsive.

Yeah, ideally my app will be running at a higher frame rate, but it isn't at the moment, so some way to let my game continue despite such poor frame rates would be ideal, and would also let me debug network code.

@Faless

Contributor

Faless commented Jul 12, 2018

@supagu

Contributor Author

supagu commented Jul 12, 2018

Every time it is locked up and I break in with my IDE (Qt), that is where the call stack points me.

@OvermindDL1


OvermindDL1 commented Jul 12, 2018

> Every time it is locked up, when i break with my IDE (qt) that is where the call stack points me.

Are you sure it isn't just grabbing the wrong thread or something? What are the call stacks on the other threads?

@supagu

Contributor Author

supagu commented Jul 12, 2018

I'm pretty sure it is the primary thread. I think recvfrom is only called by the primary thread.

Faless self-assigned this Aug 31, 2018

@Faless

Contributor

Faless commented Sep 2, 2018

@supagu you are right. After further investigation I found that when the client disconnected, it would try to reopen the connection without setting the non-blocking option on the socket, causing the lock-up. This is fixed in #21692.
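The bug described here comes down to the difference between a blocking and a non-blocking UDP socket. The following is a minimal Python sketch, not Godot's PacketPeerUDPPosix code. Once the socket is non-blocking, recvfrom returns immediately when no datagram is queued (Python raises BlockingIOError; the C call fails with EAGAIN or EWOULDBLOCK), so a once-per-frame poll never stalls. On a socket left in blocking mode, the same call waits until a datagram arrives. The helper poll_once is hypothetical.

```python
import socket


def poll_once(sock, bufsize=2048):
    """Return one queued datagram as (data, addr), or None if none is waiting."""
    try:
        return sock.recvfrom(bufsize)
    except BlockingIOError:  # EAGAIN/EWOULDBLOCK: nothing queued right now
        return None


sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 0))  # any free loopback port
sock.setblocking(False)      # like fcntl(fd, F_SETFL, flags | O_NONBLOCK) in C
print(poll_once(sock))       # None, immediately: nothing has been sent yet
```

Forgetting the setblocking(False) step after reopening a socket is exactly the kind of omission that turns a per-frame poll into a call that can hang until the peer sends something.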

I suggest we move the discussion on the ENet Timeout setting to #21121 where they propose to set the timeout to 0. Keep in mind that the MultiplayerAPI is polled once each frame and flushes all pending packets.

EDIT: I only now understand you are referring to the peer timeout setting.
I'm leaving this open, but will remove the assignment, since the locking bug is fixed.

Faless removed their assignment Sep 13, 2018

akien-mga changed the title from "Expose ENET Multiplayer Timeout Settings" to "Expose ENet peer timeout settings" Sep 15, 2018

akien-mga removed this from the 3.1 milestone Sep 15, 2018

Faless added this to the 3.2 milestone Dec 26, 2018

Faless self-assigned this Feb 28, 2019
