Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rememberchannelduration option is ignored after restarting murmur #5639

Closed
Hartmnt opened this issue Apr 29, 2022 · 8 comments · Fixed by #5646
Closed

rememberchannelduration option is ignored after restarting murmur #5639

Hartmnt opened this issue Apr 29, 2022 · 8 comments · Fixed by #5646
Labels
bug A bug (error) in the software server

Comments

@Hartmnt
Copy link
Member

Hartmnt commented Apr 29, 2022

Description

We use the rememberchannelduration server option on our servers. Last used channels are supposed to be remembered for 5 minutes for potential connection losses and such, but otherwise clients are expected to connect directly to the root channel. This usually works as expected. However, the murmur server is restarted over night every couple of weeks for maintenance.

Expected Behavior:
The server restart has no effect on the behavior of rememberchannelduration. Clients connect to the root channel, except when they reconnect within the specified grace period.

Actual Behavior:
On the first connection of a client after the server restarted the remembered channel is used, even if the rememberchannelduration has long expired.

Steps to reproduce

  1. Start murmur with rememberchannelduration > 0, rememberchannel = true and multiple permanent channels
  2. Make sure the test user is registered
  3. Make sure the channel remember delay works in regular use
  4. Disconnect the user
  5. Restart murmur
  6. Let the rememberchannelduration expire
  7. Reconnect

Mumble version

1.4.0

Mumble component

Server

OS

Linux

Reproducible?

Yes

Additional information

Offending Code:
Upon examining the code of the feature pull request #4147 we can see two things which are handled weirdly/wrongly:

  1. For a reason I do not fully understand the last_disconnect field is set to NULL when starting murmur. CODE
  2. And the logic in ServerDB.cpp is broken when rememberchannel == true, rememberchannelduration > 0 and last_disconnect is equal to NULL. rememberchannelduration is ignored and the server assigns the remembered channel anyway (as long as the channel still exists). CODE

Proposed Solution:
To fix this murmur could

  1. not set last_disconnect to NULL when starting.
  2. fix the logic in ServerDB.cpp by returning -1 in case last_disconnect is equal to NULL AND rememberchannelduration > 0.

Or possibly both.

Relevant log output

No response

Screenshots

No response

@Hartmnt Hartmnt added bug A bug (error) in the software triage This issue is waiting to be triaged by one of the project members labels Apr 29, 2022
@Krzmbrzl Krzmbrzl added server and removed triage This issue is waiting to be triaged by one of the project members labels May 1, 2022
@Krzmbrzl
Copy link
Member

Krzmbrzl commented May 1, 2022

Looking back at the conversation in #4147 it seems that the reasoning for clearing last_disconnect on server start is this:

The last_disconnect field is normally set whenever a user disconnects from the server. However, when the server shuts down, while the user is still connected, apparently this value does not get set to the time when the server shuts down (I am no longer sure whether this was just an assumption or fact). Thus, the field would contain wrong data and if the user has been connected for a while before the shutdown, the user might get reconnected to Root instead of their old channel, even if the server restarts right after having been shutdown.

Since timing the update of last_disconnected field with the shutdown (to ensure it gets updated while the respective user object is still valid) appeared to be a bit complicated, we decided that it will be easier to just wipe that value from the DB after a server restart.

So I guess the first issue you found is actually code working as intended (though we should probably add a comment to that line explaining why it is there).
Therefore, I think the second way to fix this that you suggested is the way to go here. Do you want to create a PR yourself or do you want me to take care of this?

@Hartmnt
Copy link
Member Author

Hartmnt commented May 2, 2022

Therefore, I think the second way to fix this that you suggested is the way to go here. Do you want to create a PR yourself or do you want me to take care of this?

Well, I can certainly create a pull request for a fix, but I think there is still a problem with the situation here...

The last_disconnect field is normally set whenever a user disconnects from the server. However, when the server shuts down, while the user is still connected, apparently this value does not get set to the time when the server shuts down (I am no longer sure whether this was just an assumption or fact). Thus, the field would contain wrong data and if the user has been connected for a while before the shutdown, the user might get reconnected to Root instead of their old channel, even if the server restarts right after having been shutdown.

Since timing the update of last_disconnected field with the shutdown (to ensure it gets updated while the respective user object is still valid) appeared to be a bit complicated, we decided that it will be easier to just wipe that value from the DB after a server restart.

If we fix 2) without touching 1) that means that all clients will connect to the root channel instead of their last channel, if the server was restarted independent of their actual last_disconnect. Since murmur sets the column to NULL to prevent this in the first place, we would create the exact opposite problem.

Can we reliably assume that last_disconnect is valid, if and only if last_disconnect >= last_active? If the server shuts down, active clients would not update last_disconnect leaving it at NULL or < last_active. Therefore, we could check that condition and decide, if to use last_disconnect or not? Of course we MUST check this condition before the clients last_active field is updated or save the old value in memory for later. (Edit: And we would of course not set last_disconnect to NULL in that case)

If murmur saves its starting datetime, we can use that to compare to rememberchannelduration when the above check tells us last_disconnect is invalid.

I think if we are careful about using the correct values at the correct time, this procedure will not produce errors (?) I would appreciate, if someone would double-check my logical reasoning.

@Krzmbrzl
Copy link
Member

Krzmbrzl commented May 2, 2022

Since murmur sets the column to NULL to prevent this in the first place, we would create the exact opposite problem.

Indeed. However, I think in this case the argument could be made that by restarting the server, you essentially also restart from a clean slate and thus start connecting to Root. It would certainly be not ideal though.

Can we reliably assume that last_disconnect is valid, if and only if last_disconnect >= last_active?

This sounds reasonable to me. I can't come with a scenario in which this assumption would break 🤔

If murmur saves its starting datetime, we can use that to compare to rememberchannelduration when the above check tells us last_disconnect is invalid.

Do you intend to use the server's starting datetime as a replacement for last_disconnect?
This then holds the assumption that every case in which last_disconnect is invalid is due to the user having been connected while the server was shut down... Under normal circumstances this appears as if it should hold 👀

However, I think the first thing that should be tested is whether last_disconnect is really not set properly when the server is shut down. If it isn't then this seems to indicate that the server does not shut down very gracefully and the better fix to this entire issue might be to improve the shutdown procedure to first disconnect all users (properly, using the regular disconnect logic) and only if the server is empty actually shut it down. In this case we should always end up with the properly set last_disconnect and could just always use that, provided it is not NULL 🤔

@Hartmnt
Copy link
Member Author

Hartmnt commented May 2, 2022

Do you intend to use the server's starting datetime as a replacement for last_disconnect?

Yes, in an ideal world this would be used as replacement for maximum accuracy. For the sake of simplicity, we could also just assume that invalid last_disconnect values must mean the client was active when the server shut down. But if we have the server start datetime available, i think that should be used instead.

However, I think the first thing that should be tested is whether last_disconnect is really not set properly when the server is shut down. If it isn't then this seems to indicate that the server does not shut down very gracefully and the better fix to this entire issue might be to improve the shutdown procedure to first disconnect all users (properly, using the regular disconnect logic) and only if the server is empty actually shut it down. In this case we should always end up with the properly set last_disconnect and could just always use that, provided it is not NULL

True, but I think there are reasonable scenarios where we do not expect the last_disconnect to be updated correctly, even if the server close routine is improved. (SIGKILL for instance)

Here is the neat thing: My logic would work whether the server close routine is improved or not. If last_disconnect were to always be updated without race conditions or similar considerations, the logic will see that the value is valid and use it.
If there are unexpected scenarios where last_disconnect is still not updated correctly, the logic would fall back and not use last_disconnect.

I am willing to work on a pull request for my proposed solution, but for improving the server close logic I think I lack the necessary experience in the mumble code base. Maybe a complete test and audit of the disconnect logic would deserve its own Issue.

Do you want me to start working on the above logic?

@Krzmbrzl
Copy link
Member

Krzmbrzl commented May 2, 2022

Do you want me to start working on the above logic?

Yes, please. In so doing, maybe you could also just test whether last_disconnect is currently properly set when shutting down the server. If it is not, I might look into improving the shutdown routine myself.

@Hartmnt
Copy link
Member Author

Hartmnt commented May 2, 2022

@Krzmbrzl I have two questions regarding the implementation:

  1. The previous implementation uses both setTimeSpec(Qt::UTC) and toTimeSpec(Qt::UTC). Is there a reason why now is copied using the latter? I would rather just use the former method on all local objects. CODE
  2. I have searched for possible server starting datetime objects in the existing code and it appears there is none. I would propose adding a public member QDateTime to the Server class in Server.cpp and initializing it to QDateTime::currentDateTime() as this would work per virtual server instance, right? This member could also be used in the future, if there is any need. Any thoughts?

Edit: Nevermind, I found tUptime in the server class which can be used to calculate the server start datetime.

@Hartmnt
Copy link
Member Author

Hartmnt commented May 3, 2022

Yes, please. In so doing, maybe you could also just test whether last_disconnect is currently properly set when shutting down the server. If it is not, I might look into improving the shutdown routine myself.

With the clearLastDisconnect routine removed:

The server does NOT update last_disconnect when receiving SIGTERM, SIGKILL or SIGQUIT.
The server does update last_disconnect when the user manually quits client side or times out.

Do you want me to try more signals or scenarios?

@Krzmbrzl
Copy link
Member

Krzmbrzl commented May 4, 2022

The previous implementation uses both setTimeSpec(Qt::UTC) and toTimeSpec(Qt::UTC). Is there a reason why now is copied using the latter? I would rather just use the former method on all local objects. CODE

No idea, why this was done in this way. Looking at this now, it does indeed appear a bit odd 👀

The server does NOT update last_disconnect when receiving SIGTERM, SIGKILL or SIGQUIT.
The server does update last_disconnect when the user manually quits client side or times out.

Okay, thanks for testing this. In that case I guess we'll have to look into that eventually...

Do you want me to try more signals or scenarios?

Nah, I think this is sufficient, thanks!

Hartmnt added a commit to Hartmnt/mumble that referenced this issue May 5, 2022
The previous implementation of rememberchannelduration (mumble-voip#4147)
acknowledged the unreliability of the last_disconnect database
field. It used a workaround to set the last_disconnect field
to NULL for every user on every start of the mumble server.
However, with a default behavior for NULL, this created the
unintended side-effect that rememberchannel was erroneously
used for every user on their first connection after a server
restart.

This new implementation reverts the workaround of mumble-voip#4147 and
uses the last_active database field to check if last_disconnect
is reliable. If last_disconnect is unreliable, it will now use
the server uptime and compare it to rememberchanneduration.
We expect last_disconnect to be unreliable when the server
was shut down while clients are still connected.

Fixes mumble-voip#5639
Hartmnt added a commit to Hartmnt/mumble that referenced this issue May 5, 2022
The previous implementation of rememberchannelduration (mumble-voip#4147)
acknowledged the unreliability of the last_disconnect database
field. It used a workaround to set the last_disconnect field
to NULL for every user on every start of the mumble server.
However, with a default behavior for NULL, this created the
unintended side-effect that rememberchannel was erroneously
used for every user on their first connection after a server
restart.

This new implementation reverts the workaround of mumble-voip#4147 and
uses the last_active database field to check if last_disconnect
is reliable. If last_disconnect is unreliable, it will now use
the server uptime and compare it to rememberchanneduration.
We expect last_disconnect to be unreliable when the server
was shut down while clients are still connected.

Fixes mumble-voip#5639
Hartmnt added a commit to Hartmnt/mumble that referenced this issue May 5, 2022
The previous implementation of rememberchannelduration (mumble-voip#4147)
acknowledged the unreliability of the last_disconnect database
field. It used a workaround to set the last_disconnect field
to NULL for every user on every start of the mumble server.
However, with a default behavior for NULL, this created the
unintended side-effect that rememberchannel was erroneously
used for every user on their first connection after a server
restart.

This new implementation reverts the workaround of mumble-voip#4147 and
uses the last_active database field to check if last_disconnect
is reliable. If last_disconnect is unreliable, it will now use
the server uptime and compare it to rememberchanneduration.
We expect last_disconnect to be unreliable when the server
was shut down while clients are still connected.

Fixes mumble-voip#5639
Hartmnt added a commit to Hartmnt/mumble that referenced this issue May 5, 2022
The previous implementation of rememberchannelduration (mumble-voip#4147)
acknowledged the unreliability of the last_disconnect database
field. It used a workaround to set the last_disconnect field
to NULL for every user on every start of the mumble server.
However, with a default behavior for NULL, this created the
unintended side-effect that rememberchannel was erroneously
used for every user on their first connection after a server
restart.

This new implementation reverts the workaround of mumble-voip#4147 and
uses the last_active database field to check if last_disconnect
is reliable. If last_disconnect is unreliable, it will now use
the server uptime and compare it to rememberchanneduration.
We expect last_disconnect to be unreliable when the server
was shut down while clients are still connected.

Fixes mumble-voip#5639
Krzmbrzl pushed a commit to Hartmnt/mumble that referenced this issue May 5, 2022
The previous implementation of rememberchannelduration (mumble-voip#4147)
acknowledged the unreliability of the last_disconnect database
field. It used a workaround to set the last_disconnect field
to NULL for every user on every start of the mumble server.
However, with a default behavior for NULL, this created the
unintended side-effect that rememberchannel was erroneously
used for every user on their first connection after a server
restart.

This new implementation reverts the workaround of mumble-voip#4147 and
uses the last_active database field to check if last_disconnect
is reliable. If last_disconnect is unreliable, it will now use
the server uptime and compare it to rememberchanneduration.
We expect last_disconnect to be unreliable when the server
was shut down while clients are still connected.

Fixes mumble-voip#5639
Krzmbrzl added a commit that referenced this issue May 5, 2022
…logic

The previous implementation of rememberchannelduration (#4147)
acknowledged the unreliability of the last_disconnect database
field. It used a workaround to set the last_disconnect field
to NULL for every user on every start of the mumble server.
However, with a default behavior for NULL, this created the
unintended side-effect that rememberchannel was erroneously
used for every user on their first connection after a server
restart.

This new implementation reverts the workaround of #4147 and
uses the last_active database field to check if last_disconnect
is reliable. If last_disconnect is unreliable, it will now use
the server uptime and compare it to rememberchanneduration.
We expect last_disconnect to be unreliable when the server
was shut down while clients are still connected.

Fixes #5639
Krzmbrzl pushed a commit to Krzmbrzl/mumble that referenced this issue May 8, 2022
The previous implementation of rememberchannelduration (mumble-voip#4147)
acknowledged the unreliability of the last_disconnect database
field. It used a workaround to set the last_disconnect field
to NULL for every user on every start of the mumble server.
However, with a default behavior for NULL, this created the
unintended side-effect that rememberchannel was erroneously
used for every user on their first connection after a server
restart.

This new implementation reverts the workaround of mumble-voip#4147 and
uses the last_active database field to check if last_disconnect
is reliable. If last_disconnect is unreliable, it will now use
the server uptime and compare it to rememberchanneduration.
We expect last_disconnect to be unreliable when the server
was shut down while clients are still connected.

Fixes mumble-voip#5639

(cherry picked from commit ccbf407)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A bug (error) in the software server
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants