Teleports sometimes cause a disconnect when attempted right before Event Queue's long-polling timeout #16

Closed
SaladDais opened this issue Jun 21, 2021 · 2 comments
Labels
bug Something isn't working

Comments

SaladDais (Owner) commented Jun 21, 2021

The sim assumes that once it has sent the TeleportFinish event over the event queue (EQ), the viewer has been handed off to the new region and it can kill the event queue cap. If the viewer times out the EQ connection just as the TeleportFinish is sent, the proxy will have read the TeleportFinish response, but the viewer won't have. This should be covered by the EventQueue's explicit acking mechanism, but it doesn't seem to work properly. The server appears to consider an event acked as soon as the response bytes have been sent off, and discards it immediately. It should only discard events whose id is not greater than the ack value POSTed by the viewer, but it discards them unconditionally. I'm not sure if this is intentional or if it's always been like this.
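To make the expected acking behavior concrete, here is a minimal sketch of a queue that retains sent events until the viewer acks them. This is a toy model, not the sim's code: the class name `AckingEventQueue`, its methods, and the `id` / `events` response keys are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional


class AckingEventQueue:
    """Toy model (hypothetical) of a queue that keeps events until acked."""

    def __init__(self) -> None:
        self._next_id = 1
        # Batches that were sent but not yet acked, keyed by batch id.
        self._unacked: Dict[int, List[Any]] = {}
        # Events that have not been sent in any batch yet.
        self._pending: List[Any] = []

    def push(self, event: Any) -> None:
        self._pending.append(event)

    def poll(self, ack: Optional[int]) -> Optional[dict]:
        # Discard only batches whose id is not greater than the ack value.
        if ack is not None:
            for batch_id in [i for i in self._unacked if i <= ack]:
                del self._unacked[batch_id]
        # Re-serve the oldest unacked batch before sending anything new.
        if self._unacked:
            batch_id = min(self._unacked)
            return {"id": batch_id, "events": self._unacked[batch_id]}
        if not self._pending:
            return None  # stands in for the long-poll timing out
        batch_id = self._next_id
        self._next_id += 1
        self._unacked[batch_id] = self._pending
        self._pending = []
        return {"id": batch_id, "events": self._unacked[batch_id]}


queue = AckingEventQueue()
queue.push("TeleportFinish")
first = queue.poll(ack=None)   # response is sent, but the viewer times out
second = queue.poll(ack=None)  # viewer polls again without having acked
third = queue.poll(ack=1)      # viewer finally acks batch 1
```

In this model `second` carries the same batch as `first`, so a viewer that missed the first response still receives the TeleportFinish. The behavior described above corresponds to dropping the batch as soon as it is returned from `poll`, regardless of `ack`.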

Since the viewer won't know it was sent the TeleportFinish, it will keep trying to read the event queue cap, which will never re-serve the TeleportFinish. CrossedRegion probably has the same problem, but I haven't tested it.

This seems to be a general problem with SL that's made worse when using an HTTP proxy, since the proxy may leave its connection to the server open and consume the event after the client has timed out its connection. We can hack around that by always storing the last EQ response for a sim if it contained events, along with the client's ack value from the request.
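A rough sketch of that workaround, assuming the proxy can see both the client's request and the server's response. The names `EQReplayCache`, `record`, and `replay_for` are hypothetical, and the replay condition (the client polling again with an unchanged ack) is my assumption about how to detect a response the client never received, not necessarily what the proxy actually does.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StoredResponse:
    request_ack: Optional[int]  # ack the client sent in the originating request
    body: dict                  # EQ response the server returned for it


class EQReplayCache:
    """Hypothetical per-sim cache of the last EQ response that had events."""

    def __init__(self) -> None:
        self._last: Dict[str, StoredResponse] = {}

    def record(self, sim: str, request_ack: Optional[int], response: dict) -> None:
        # Only responses that actually carried events are worth replaying.
        if response.get("events"):
            self._last[sim] = StoredResponse(request_ack, response)

    def replay_for(self, sim: str, request_ack: Optional[int]) -> Optional[dict]:
        stored = self._last.get(sim)
        if stored is None:
            return None
        if stored.request_ack == request_ack:
            # The ack did not advance, so the client never saw this response.
            return stored.body
        # The ack moved on, so the stored response is no longer needed.
        del self._last[sim]
        return None


cache = EQReplayCache()
finish = {"id": 1, "events": [{"message": "TeleportFinish"}]}
cache.record("sim-a", None, finish)
replayed = cache.replay_for("sim-a", None)  # client re-polls with the same ack
after_ack = cache.replay_for("sim-a", 1)    # client acked id 1, nothing to replay
```

Keying the cache by sim keeps a stale response from one region from being replayed into another region's event queue.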

For this to be fully fixed, the sim's EQ implementation will need to be changed to actually make use of the ack value that gets POSTed, and to discard only events that have been acked.

SaladDais added the bug label Jun 21, 2021
SaladDais changed the title from "Teleports sometimes fail when they happen right before Event Queue's long-polling timeout" to "Teleports sometimes cause a disconnect when attempted right before Event Queue's long-polling timeout" Jun 21, 2021
SaladDais (Owner, Author) commented

Turns out this also affects viewers without proxies. The window for it happening is ~150ms every 30s (0.5% likelihood), rather than 1s every 30s (3.33% likelihood) when proxied. Teleports fail every time if you attempt one 0.3 seconds before the EQ poll's HTTP timeout would trigger. This might be the cause of random TP crashes.
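The likelihoods above are just the vulnerable window divided by the poll period. A quick check, assuming the TeleportFinish is equally likely to land at any point in the 30s poll cycle (the function name is made up for illustration):

```python
def race_likelihood(window_s: float, poll_period_s: float) -> float:
    """Fraction of the poll cycle during which the race can occur."""
    return window_s / poll_period_s


unproxied = race_likelihood(0.150, 30.0)
proxied = race_likelihood(1.0, 30.0)

print(f"unproxied: {unproxied:.2%}")  # unproxied: 0.50%
print(f"proxied:   {proxied:.2%}")    # proxied:   3.33%
```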

SaladDais (Owner, Author) commented

It's mostly worked around in the proxy now, so closing this out.
