websocket disconnects after pairing #11989
Comments
Indeed, I tried briefly looking at this; hopefully other developers looking at issues can help put it into perspective since at the moment it seems to be over my head. From that forum thread:
... and finally:
|
possible ws connection status & lila onlineUserSet out of sync issue |
I am relaying comments from https://lichess.org/forum/lichess-feedback/lag-spike-at-game-start-unable-to-move-and-game-is-aborted-after-time-limit-is-reached?page=3#21 which may or may not apply in this context: Next example is hard to read, not necessarily spam. Those requests cause most of the problems, as they often get stalled: You can also see timings on the message requests themselves.
Looks like those previous messages contain mainly chat presence, public presence in chat or tournament, etc.
There are also some random errors like this:
And a cosmetic bug on every page load:
|
Here I can provide several client-side logs in .har format; you can load them with the import feature of the browser developer tools to see all the saved internals. I tried to provide the more problematic ones. Looking inside the logs, you can consider my ping very stable at 48-49 milliseconds. Everything else (media, fonts, board items, audio, JavaScript) looks to be loading very fast, within microseconds to milliseconds. Some ideas I currently have: I do not think it's the software. Those socket stalls and drops are too random. We need to see what happens on and between the lichess virtual servers at those times. I do not know if anything was changed at the server level yesterday or today, but today most tournaments I was present in ran almost perfectly, without any hiccups. If you have any questions or need more specific data, please ask. |
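For anyone who wants to skim such a .har file without importing it into the browser, here is a minimal sketch in Python. It assumes only the standard HAR 1.2 fields (`log.entries`, `time`, `timings.wait`, `request.url`); the function name `slow_entries` and the 1000 ms threshold are made up for illustration and are not part of any lichess tooling.

```python
import json
from typing import Any


def slow_entries(har: dict[str, Any], threshold_ms: float = 1000.0) -> list[tuple[str, float, float]]:
    """Return (url, total_ms, wait_ms) for entries at or above threshold_ms, slowest first.

    `threshold_ms` is an arbitrary illustrative cutoff, not a lichess value.
    """
    rows = []
    for entry in har["log"]["entries"]:
        total = float(entry.get("time", 0))
        # HAR uses -1 for a timing phase that does not apply to the request.
        wait = float(entry.get("timings", {}).get("wait", -1))
        if total >= threshold_ms:
            rows.append((entry["request"]["url"], total, wait))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def load_har(path: str) -> dict[str, Any]:
    """Read a .har file (plain JSON) from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Usage would look like `slow_entries(load_har("session.har"))`, where `session.har` is a placeholder file name. This only lists slow HTTP entries; browser-specific extensions for individual websocket frames are deliberately not touched here.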
Some thoughts; I do not know if this is the right place to post them, or who may be interested, if anyone at all. To track more complicated issues successfully, especially ones like socket errors that may happen at the server software level, at the database interaction level, and/or through networking stack congestion, overall server load monitoring should also be taken into account, with timestamps if needed. So, if the servers are located in an OVH-hosted environment, does anybody among the server administrators or development teams have access to load monitoring for the virtual machines? If OVH does not allow access to server performance monitoring or graphs, then the admins could add something like Nagios or an equivalent monitoring system on the hosting system. Another way to achieve this is to run a simple tool like atop or similar, paying attention to CPU, memory, and filesystem I/O wait and stats, and create alerts based on the findings. As for the client side, isn't it possible to add a small add-on script with an on/off "knob" to the client interface for those who volunteer for client-side monitoring, which would activate the performance analyzer or network-tools logging in their browsers and automatically send the logs at time intervals? So, just ideas... maybe that wheel is already invented and I just do not know it :) |
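To make the "create alerts based on the findings" idea a bit more concrete, here is a rough sketch in Python. It assumes load samples have already been collected by some tool and parsed into records; the `Sample` type, the field names, and the limits are all hypothetical, and nothing here reflects how lichess servers are actually monitored.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One timestamped load sample (hypothetical shape, for illustration)."""
    timestamp: str
    cpu_pct: float
    mem_pct: float
    iowait_pct: float


# Hypothetical limits; real values would need tuning per server.
LIMITS = {"cpu_pct": 90.0, "mem_pct": 90.0, "iowait_pct": 20.0}


def find_alerts(samples: list[Sample], limits: dict[str, float] = LIMITS) -> list[str]:
    """Return one message per metric that exceeds its limit, in sample order."""
    alerts = []
    for sample in samples:
        for name, limit in limits.items():
            value = getattr(sample, name)
            if value > limit:
                alerts.append(
                    f"{sample.timestamp} {name}={value:.1f} exceeds limit {limit:.1f}"
                )
    return alerts
```

Keeping the timestamp in every alert is the point: it lets a socket stall seen on the client side be lined up against what the server was doing at the same moment.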
That is very much the goal of #11991. We would start small with lichess log statement persistence and collection but once that is working, data from any number of analysis tools can be added if they opt in. Regarding event triggers such as unexpected socket errors, the current plan is to give users the option to review everything they'd be sending at the point of submission. This is mainly to reinforce that we're not harvesting their grocery lists and christmas card labels. Implementations may always be revised, but this is my current plan. |
The last one shouldn't be too big a deal. I just tried, as an experiment, both the Mozilla and Vivaldi browsers: the network tools log by default runs only within the tab it was initially started on, so browsing other pages does not affect the collected log as long as the network analyzer is not started on those tabs too. But, for security reasons or for paranoid people, reviewing the material to be sent may be safer if they submit the logs themselves. That of course limits automation of the sending process. |
Hey all, we've got the client log delivery mechanism in place (finally). Can anyone still reproduce this though? |
I do not see an implementation in the lichess client web interface. So how could one invoke this log delivery? |
https://lichess.org/#debug but we are very sparse with logging there. Log statements need to be tailored to each issue. Currently we have statements in place for general lila-ws troubles but nothing centered around pairing. That's why I wanted to reach out and see if you guys could still reproduce this specific issue. I could add some statements and try to gather us useful data after the next asset deployment. |
Hi there.
For me it was never really a pairing or lag issue.
I will touch on a few ideas here, but it has been a long time since I really paid attention. And for most of the time (the last months), server performance has gotten much better.
As I do not see server workloads in real time or in logs, this is a guess based on my own experience with servers.
It's probably a Java VM problem. I have seen a similar thing on SAP enterprise production services.
In electronics factory production, we never found a final solution for the Java issues; the production server needed to be restarted every 2-3 days, even when the Java VM parameters were fine-tuned.
And when it (the Java VM) reached its limits, it slowed down all server processes by exhausting the server farm's memory. From that point: restart, and everything was fine until the next time.
I suggest you compare performance based on the client's gateway to the server. The mobile platform does much better, for whatever reason.
Seen from here (and yes, by guessing), and measuring performance with the web developer tools, it was mostly websocket timeouts, closes, and stalls inside the server farm.
As an answer: today there were no websocket times longer than 1.3 seconds. No socket drops yet.
The issue, if still there, may depend on the time of day.
We will see in the near future.
Unfortunately, I cannot measure things in the evenings.
|
So you experience poor websockets performance/uptime in general? There is an alternate IP with different routing we can try. PM me on lichess if you're interested. |
https://lichess.org/forum/lichess-feedback/lag-spike-at-game-start-unable-to-move-and-game-is-aborted-after-time-limit-is-reached?
We are trying to determine whether there is a socket disconnection issue and which side it is on. Please, can we get some help?