New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix re-login behavior of remote clients to a server #3486

Merged
merged 11 commits into from Mar 14, 2018

Conversation

Projects
None yet
7 participants
@adcxyz
Copy link
Contributor

adcxyz commented Feb 1, 2018

*** PR Info text completely redone ***
Thanks to questions by @muellmusik and @brianlheim,
the cases covered in this PR became much clearer to me.
The general issue is handling repeated logins in network performance situations,
to recover as gracefully as possible from loss of connection between client and server,
whether by network glicht or client crash.
(server crash is outside the scope of this PR, but should be fully thought through elsewhere.)

Case 1: network connection is lost,
and the same client, same client-side server object logs in again.

  • the server program remembers the client, and sends its previous clientID back
  • then the client knows it is connected again,
  • and it can keep all its known buffers, nodes, synths as they are.
    -> so, just post that connection is back, and all is well.

Case 2: client crashes, and is restarted;
then the new client logs in again from the same network address:

  • the client-side server object is new, but the server program knows the address,
    so the server program sends its previous clientID back to the new client.
  • because the new server object may not know its previous clientID,
    the server program response may trigger new allocators - which is correct
  • there may be buffers and running nodes and synths on the server program
    that the new server object does not know about:
    depending on the specific setup used,
  • it may make sense to get rid of these by hand ( server.defaultGroup.release )
  • and reinit the server by running server.statusWatcher.prFinalizeBoot by hand.
    -> so, post about successful reconnect, and advice about release/reinit code.

(Note for later: maybe develop functionality to retrieve these resources
from the server program in the client.)

Case 3: loss of connection and login from a different network address:
This will look like a new login to the server, so the old defaultGroup
and everything in it keeps going, and the new client gets a new clientID.

In the future, this case could be handled better by releasing the previous notify registration,
and then asking for the same clientID it had earlier, to continue seamlessly from there.
This would need some more effort to work.

The tests provided also cover cases 1 and 2:
TestServer_clientID_booted:test_repeatedLogin tests case 1
TestServer_clientID_booted:test_remoteLogins tests a first login by the Server:remote method,
and a repeated login from a new server object after a simulated crash.

fork {
	Server.killAll;
	0.1.wait;
	TestServer_clientID_booted.new.test_repeatedLogin;
	0.1.wait;
	TestServer_clientID_booted.new.test_remoteLogins;
};

@snappizz snappizz added this to the 3.9.2 milestone Feb 1, 2018

@adcxyz adcxyz requested review from brianlheim and telephon Feb 2, 2018

// -> so the response should go thru prHandleNotifyFailString

remote = Server.remote(\remTest, homeServer.addr, homeServer.options);
// Server.default = remote; // to test IDE server display

This comment has been minimized.

@brianlheim
remote.serverRunning.not and: { timeout > 0 }
} {
timeout = timeout - dt;
dt.wait;

This comment has been minimized.

@brianlheim

brianlheim Feb 2, 2018

Member

use a condition variable here


// make s play dead now, but leave the process running
homeServer.statusWatcher.stopStatusWatcher.stopAliveThread.serverRunning_(false);
// homeServer.serverRunning.postln; // client thinks it is dead

This comment has been minimized.

@brianlheim

this.assert(homeServer.clientID == 3,
"homeServer gets requested clientID back from server process."
);

This comment has been minimized.

@brianlheim

brianlheim Feb 2, 2018

Member

this should be a separate test or removed altogether

This comment has been minimized.

@adcxyz

adcxyz Feb 3, 2018

Contributor

ok, done

);

// make s play dead now, but leave the process running
homeServer.statusWatcher.stopStatusWatcher.stopAliveThread.serverRunning_(false);

This comment has been minimized.

@brianlheim

brianlheim Feb 2, 2018

Member

chaining these calls is somewhat confusing IMO

@@ -544,7 +544,8 @@ Server {
case
{ failString.asString.contains("already registered") } {
"% - already registered with clientID %.\n".postf(this, msg[3]);
statusWatcher.notified = true;
statusWatcher.prRecoverRemoteLogin(msg[3]);

This comment has been minimized.

@brianlheim

brianlheim Feb 2, 2018

Member

could use a comment - without looking I don't know what msg[3] is

This comment has been minimized.

@adcxyz

adcxyz Feb 3, 2018

Contributor

done.

}, AppClock)
}

prRecoverRemoteLogin { |clientID|

This comment has been minimized.

@brianlheim

brianlheim Feb 2, 2018

Member

needs to be added to the list of private methods in the help file

This comment has been minimized.

@brianlheim

brianlheim Feb 2, 2018

Member

what is this name supposed to mean, btw? not immediately apparent to me, sorry

This comment has been minimized.

@adcxyz

adcxyz Feb 3, 2018

Contributor

hm, maybe really not a good name.
would you find prHandleRemoteLoginWhenAlreadyRegistered clearer?

// -> this client netaddr is already registered, and should say so!
// -> so the response should go thru prHandleNotifyFailString

remote = Server.remote(\remTest, homeServer.addr, homeServer.options);

This comment has been minimized.

@brianlheim

brianlheim Feb 2, 2018

Member

flow of logic would make more sense to me if this server were created right after the homeServer; does this constructor call depend on preceding code in a way I'm not seeing?

"homeServer gets requested clientID back from server process."
);

// make s play dead now, but leave the process running

This comment has been minimized.

@brianlheim

brianlheim Feb 2, 2018

Member

s -> homeServer

@muellmusik

This comment has been minimized.

Copy link
Contributor

muellmusik commented Feb 2, 2018

Just two quick questions @adcxyz:

If clientIDs are handed out by the server, then if the server recognises the client address, it can just give the same ID or? Or is this for the backwards compatible manually specified ID case?

Does it make sense that a client necessarily gets the same ID? If you have a lang crash, leaving orphaned nodes, then getting the same ID could lead to resource conflicts. There are various ways of dealing with that of course (e.g. gracefully free orphaned nodes using predictable default group), but you might want different ones for different use cases.

@adcxyz

This comment has been minimized.

Copy link
Contributor

adcxyz commented Feb 2, 2018

@muellmusik :

If clientIDs are handed out by the server, then if the server recognises the client address, it can just give the same ID or?

Yes, exaclty. This is what the server program has been doing all along, and I think is the clearest and most predictable behavior for most cases, as listed below.

Or is this for the backwards compatible manually specified ID case?
No, not specifically - when you manually ask for a clientID, and it is free, you get that;
when it is not free, you get a free one instead.
When your are recovering a lost login, the recovered previous clientID overrides a manually specified ID. Users always get back a clientID that will work, as long as there are free ones.

Does it make sense that a client necessarily gets the same ID? If you have a lang crash, leaving orphaned nodes, then getting the same ID could lead to resource conflicts. There are various ways of dealing with that of course (e.g. gracefully free orphaned nodes using predictable default group), but you might want different ones for different use cases.

I think keeping previously known IDs is the clearest and most predictable behavior for most cases I can think of:
Case 1: remote server login code accidentally runs twice -> same ID, nothing breaks.
(still needs full testing whether node allocator is being reset in this case)
Case 2: remote client loses network connection, and logs in again:
gets the same clientID, so the same defaultGroup, and all your synths etc should still be there.
(still needs full testing whether node allocator is being reset in this case)
Case 3: remote client crashes, and logs in again:
gets the same clientID and can then decide how to handle things gracefully:
release or free their (same as before) defaultGroup, or whatever else is called for.
(allocator reset is desirable here)
Case 4: a remote client gets lost and leaves sounds running:
Because all default groups are known to all clients, other users can intervene and gracefully end or release everything in that client's defaultGroup.
Case 5: multiple losses of contact in one show: if the server program keeps handing out new clientIDs, it would run out of clientIDs quickly, so handing back the previous ones is also safer for this case.

hope that answers your questions?
best a

@adcxyz

This comment has been minimized.

Copy link
Contributor

adcxyz commented Feb 2, 2018

hi @brianlheim, thanks for the detailed comments,
will get back to them ASAP - off to a show with Julian and Marcus Schmickler now!

@muellmusik

This comment has been minimized.

Copy link
Contributor

muellmusik commented Feb 2, 2018

hope that answers your questions?

Indeed! Thanks for the detailed and patient explanation. :-)

@brianlheim

This comment has been minimized.

Copy link
Member

brianlheim commented Feb 2, 2018

off to a show with Julian and Marcus Schmickler now

Jealous! Have a great time!

@adcxyz

This comment has been minimized.

Copy link
Contributor

adcxyz commented Feb 5, 2018

Hi @brianlheim and @muellmusik,
thanks for the questions, and comments they made me rethink and rewrite this one quite a bit!
I think it is much clearer now what this PR aims to improve - please see updated top comment
This is the thread that discussed the problem:
https://www.listarc.bham.ac.uk/lists/sc-users/msg59049.html
best adc

@telephon
Copy link
Member

telephon left a comment

one minor request, should be easy to fix. Looks very good -- it'll be a great relief not having to teach the kids to kill servers on a daily basis.

"// so you may want to release currently running synths by hand:\n".postln;
"%.defaultGroup.release;\n".postf(server.cs);
"// and you may want to redo server boot finalization by hand:\n".postln;
"%.statusWatcher.prFinalizeBoot;\n".postf(server.cs);

This comment has been minimized.

@telephon

telephon Feb 5, 2018

Member

If we want to suggest this line as something to actually do, it shouldn't involve a private message. What about adding:

+ Server {
		finalizeBoot { this.statusWatcher.prFinalizeBoot }
}

then the line above would read:
"%.finalizeBoot;\n".postf(server.cs);

This comment has been minimized.

@telephon

telephon Feb 5, 2018

Member

I'm not sure about the name ...

This comment has been minimized.

@brianlheim

brianlheim Feb 5, 2018

Member

IIUC this solution is a hack, so it's ok to call a private function. Adding access to a private function as a public function seems worse because it has to be walked back with deprecation.

This comment has been minimized.

@telephon

telephon Feb 5, 2018

Member

I see, makes some sense. My only worry is that the public use of private methods might be perceived as the standard way to go.

So let's note it in the comment that this is a temporary measure which will be replaced in future versions?

This comment has been minimized.

@adcxyz

adcxyz Feb 5, 2018

Contributor

like the comment above the method says, the whole thing is about making it possible to recover in an emergency, so yes, it is a hack.

// cleanup
remote1.remove;
remote2.remove;
unixCmd("kill -9" + serverPid);

This comment has been minimized.

@brianlheim

brianlheim Feb 5, 2018

Member

this is not portable at all

This comment has been minimized.

@adcxyz

adcxyz Feb 5, 2018

Contributor

OK. Is there a crossplatform version of kill?
or is it OK to use Server.killAll in the testsuite?
I would have preferred that, but assumed it is cleaner to only kill the exact process and nothing else.

This comment has been minimized.

@brianlheim

brianlheim Feb 5, 2018

Member

Is there a crossplatform version of kill?

Windows uses taskkill.

is it OK to use Server.killAll in the testsuite?

No. The single process approach is much more sound. Since we have Platform.killAll already, it would probably not be such a bad idea to add killProcess that accepts a pid. Just a suggestion. That should be done in a separate PR.

This comment has been minimized.

@adcxyz

adcxyz Feb 5, 2018

Contributor

ok

This comment has been minimized.

@adcxyz

adcxyz Feb 6, 2018

Contributor

I have near zero windows shell skills - would the methods below work?
in UnixPlatform:
kill { |pid|
("kill -9 " ++ pid).unixCmd;
}
and in WindowsPlatform:
kill { |pid|
("taskkill /F /pid " ++ pid).unixCmd;
}

This comment has been minimized.

@brianlheim

brianlheim Feb 6, 2018

Member

I think so.. I can't test at the moment

} {
// same clientID, so leave all server resources in the state they were in!
"// This seems to be a login after a loss of network contact - \n"
"// - reconnected with the same clientID as before, so probably all is well.\n".postln;

This comment has been minimized.

@brianlheim

brianlheim Feb 5, 2018

Member

does this really need to be posted? i think this is reaching a point where it's going to be regarded as noise by users

This comment has been minimized.

@adcxyz

adcxyz Feb 5, 2018

Contributor

well, it is trying to be helpful in a rare emergency scenario - you are in the middle of playing a show, network drops out, you try to reconnect, and likely happy to see anything informative posted. I doubt that anyone would find that noisy in that situation.
But maybe that is just me.

This comment has been minimized.

@brianlheim

brianlheim Feb 5, 2018

Member

gotcha, that makes sense. I think it would be nice if the post message was consistent with other messages coming out of this class, though. No other messages that I've seen start with "//", it seems arbitrary.

This comment has been minimized.

@adcxyz

adcxyz Feb 5, 2018

Contributor

actually since it should be noticeable, how about making it begin with *** for bold posting?

This comment has been minimized.

@adcxyz

adcxyz Feb 5, 2018

Contributor

just tried it - that does make the post-crash hints pop out nicely in the post window!

This comment has been minimized.

@brianlheim

brianlheim Feb 6, 2018

Member

all right!

// simulate a remote server process by starting a server process independently of SC
serverPid = unixCmd(Server.program + options.asOptionsString);

1.wait;

This comment has been minimized.

@brianlheim

brianlheim Feb 5, 2018

Member

what is this wait for?

This comment has been minimized.

@adcxyz

adcxyz Feb 5, 2018

Contributor

leftover from an earlier version of the test, unneeded now - can remove it.


this.assert(remote1.clientID == 3,
"after first login, remote client should have its requested clientID."
);

This comment has been minimized.

@brianlheim

brianlheim Feb 5, 2018

Member

are these asserts necessary? they seem to be covered well by other tests.

This comment has been minimized.

@adcxyz

adcxyz Feb 5, 2018

Contributor

AFAIK the Server:remote method is not tested anywhere else, so testing both its normal and its emergency use seemed efficient and appropriate.

This comment has been minimized.

@brianlheim

brianlheim Feb 5, 2018

Member

then it should be in a separate test...

This comment has been minimized.

@adcxyz

adcxyz Feb 6, 2018

Contributor

so have a test for remoteLogin, which tests the first login only,
and a test for repeatedRemoteLogin, which will needs a first login as setup,
and then tests the repeated login only?

This comment has been minimized.

@brianlheim

This comment has been minimized.

@adcxyz

adcxyz Feb 6, 2018

Contributor

ok, I think all requests done,
and submitted separate PR for kill method

@adcxyz

This comment has been minimized.

Copy link
Contributor

adcxyz commented Feb 9, 2018

OK - I believe I addressed all requests for changes on this one,
so I guess the next steps are:
I wait for #3499 addKillMethod to be merged,
then change this one to use killProcessByID,
then it can be merged.
Did I miss anything?

@brianlheim

This comment has been minimized.

Copy link
Member

brianlheim commented Mar 1, 2018

#3499 has been merged, this can be updated now.

@adcxyz

This comment has been minimized.

Copy link
Contributor

adcxyz commented Mar 1, 2018

ok, updated - thanks @brianlheim !

.notified_(false)
.serverRunning_(false);

// 1.wait;

This comment has been minimized.

@brianlheim

brianlheim Mar 1, 2018

Member

Can you please remove this? Everything else looks good, thanks for the tests!

This comment has been minimized.

@adcxyz

adcxyz Mar 1, 2018

Contributor

sure, done!

@brianlheim

This comment has been minimized.

Copy link
Member

brianlheim commented Mar 1, 2018

Thanks, this is ready for merge IMO. @telephon - want to give it a look since you've got a review already?

@joshpar
Copy link
Member

joshpar left a comment

looks good!

@patrickdupuis

This comment has been minimized.

Copy link
Contributor

patrickdupuis commented Mar 14, 2018

@telephon please approve when you get the chance 👍

@telephon telephon merged commit d835d53 into supercollider:3.9 Mar 14, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@telephon

This comment has been minimized.

Copy link
Member

telephon commented Mar 14, 2018

(sorry for the delay, I had somehow missed the message)

@adcxyz adcxyz deleted the adcxyz:fixRecoverRemoteLogin branch Mar 31, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment