before booting, find zombie servers, stop and advise if needed. #2524

Closed
wants to merge 3 commits into from

adcxyz
Contributor

@adcxyz adcxyz commented Dec 2, 2016

s.boot can end in a complicated hang when there is a zombie server.
This happens e.g. after sclang has crashed, or after a hard interpreter reboot:
s.boot fails with "Exception in World_OpenUDP: bind: Address already in use",
and s will not reboot because it thinks it is already booting.

This PR checks whether there is already a server responding at the desired address and port,
and if so, informs about it. Reproducer:

// test finding zombies first
s.findZombies(0.1, { "found.".postln }, { "none".postln });

// make sure there are no scsynth processes around
Server.killAll;
s.findZombies(0.1, { "found.".postln }, { "none".postln });

// Start a zombie server - reports that server is booted
x = (Server.program + s.options.asOptionsString(s.addr.port)).unixCmd;
s.findZombies(0.1, { "found.".postln }, { "none".postln });

// try booting s - fails ...
s.boot;
// --- ... and informs with:
WARNING: Cannot boot server because of a zombie server at NetAddr.new("127.0.0.1", 57110).
// Kill zombies first, then try booting again:
Server.killAll;
s.boot;
// ---

So one knows what to do.

@adcxyz
Contributor Author

adcxyz commented Dec 2, 2016

@rukano
glad you like it :-)

@nhthn nhthn added the comp: class library SC class library label Dec 2, 2016
@nhthn nhthn added this to the 3.9 milestone Dec 2, 2016
@telephon
Member

telephon commented Dec 3, 2016

does this overlap with the other improvements?

@adcxyz
Contributor Author

adcxyz commented Dec 3, 2016

It sort of does: prPingApp and findZombies essentially do the same thing,
so only one of them is needed. I think checking earlier is better, and
bumping the non-response timeout to 0.5 should be enough (findZombies has 0.1, prPingApp 3).
I'll update the PR.
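
The check that both methods perform can be sketched roughly as follows. This is not the code from the PR: `~pingServer` and its argument names are hypothetical, and the sketch only assumes that scsynth answers a `/status` message with `/status.reply`. Note that a reply only shows that something is serving the port; it does not tell whether that process is a zombie.

```supercollider
(
// Hypothetical helper, for illustration only (not part of the PR).
~pingServer = { |addr, timeout = 0.5, onFound, onNone|
	var replied = false;
	// scsynth answers "/status" with "/status.reply"
	var responder = OSCFunc({
		replied = true;
		onFound.value(addr);
	}, '/status.reply', addr).oneShot;

	addr.sendMsg("/status");

	// if nothing has answered within the timeout, report that no server was found
	SystemClock.sched(timeout, {
		if (replied.not) {
			responder.free;
			onNone.value(addr);
		};
		nil // do not reschedule
	});
};
)

// usage: 0.5 is the timeout suggested above
~pingServer.(NetAddr("127.0.0.1", 57110), 0.5, { "found.".postln }, { "none".postln });
```

A shorter timeout makes the boot check faster but risks missing a server that is slow to respond.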

@adcxyz
Contributor Author

adcxyz commented Dec 3, 2016

in prPingApp, why do the sync command by hand?

@telephon
Member

telephon commented Dec 3, 2016

I think it is better to replace prPingApp with pidRunning. The only drawback is that this won't work over the network. Not sure if that matters at all.

@adcxyz
Contributor Author

adcxyz commented Dec 3, 2016

ok:
on local servers, one can preferentially use pidRunning for status watching,
and use status messages to assess server responsiveness over the network.
on remote servers, it has to be status messages in any case.

More general rethinking:
A pinged server may either be a zombie (assuming that There Can Be Only One),
or a legitimate other scsynth process started by someone else that uses the same default port.
In case 2, quit and reboot (as boot now does) shoot another app in the foot.
How about having a flag in the Server class:

classvar <>allowAlienServers = true;

If allowAlienServers is true,

  • the server politely finds the next free port number and uses that
    (much like sclang already does when booting);

else

  • assume they are zombies, and quit and boot as is.

@adcxyz
Contributor Author

adcxyz commented Dec 3, 2016

// sketch for findFreePort with multiple servers:

(
// make sure there are no scsynth processes around
Server.killAll;
// Start three servers, as if by other programs, or zombies
x = (Server.program + s.options.asOptionsString(57110)).unixCmd;
x = (Server.program + s.options.asOptionsString(57109)).unixCmd;
x = (Server.program + s.options.asOptionsString(57108)).unixCmd;
)

// assume they are intended, and find free port, then boot
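~findFreePort = {
	"trying port % ...\n".postf(s.addr.port);
	// findZombies(timeout, onFound, onNone), as in the examples above
	s.findZombies(0.1, {
		// port is taken: step down to the next port and try again
		s.addr.port = s.addr.port - 1;
		~findFreePort.value;
	}, {
		// port is free: boot here
		s.boot;
	});
};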

~findFreePort.value;

@jamshark70
Contributor

It's starting to sound like improvising workarounds for a fairly large set of cases, instead of analyzing the possible cases and deliberately designing the cleanest solution. My gut feeling is, it's time to back off from code and work up a proper spec. Then the code will be easier to write.

@crucialfelix
Member

crucialfelix commented Dec 3, 2016 via email

@telephon
Member

telephon commented Dec 3, 2016

A question: when I start a server with a unixCmd (as it is done now), the first argument is the action that is called when the server quits. Now I'm not sure – is this action called in your above scenario?

Because now with the new refactored server, we already get a cleaner quit when the server is killed externally.

@adcxyz
Contributor Author

adcxyz commented Dec 3, 2016

@jamshark70

  • yes, we have gone back from pull request to discussion. so the PR was at least useful for that. I will work on a big picture of scenarios to be handled, and bring it up here.

@crucialfelix

  • I had 'hijack' for your 'steal' name in my strategies list ;-)
  • what is the easiest cross-platform way to check within SC whether a UDP port is open locally?

@telephon

  • yes, the unixCmd doneAction should be active in the scenario, and is a very good fast way to inform sclang about scsynth having quit for whatever reasons.

more later, adc
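
A minimal sketch of the unixCmd doneAction discussed here, assuming the server is started the same way as in the reproducer above. `~pid` is just an illustrative variable name; the action function receives the exit code and the pid of the process that has exited.

```supercollider
(
// Start scsynth by hand, as in the reproducer, and register a doneAction.
var cmd = Server.program + s.options.asOptionsString(s.addr.port);
~pid = cmd.unixCmd({ |exitCode, pid|
	// called when the process exits, for whatever reason
	"scsynth (pid %) exited with code %.\n".postf(pid, exitCode);
});
)

// for a local process, the pid can also be polled directly
~pid.pidRunning;
```

This only covers servers started from the current sclang session; after an sclang crash the pid and the doneAction are lost, which is the zombie case.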

@adcxyz
Contributor Author

adcxyz commented Dec 3, 2016

// possible server boot scenarios to support

  1. remote servers
  • server runs on remote machine, with maxLogins > 1
  • clients cannot boot or quit server
  • clients assume server is running (and ping to find out)
  • clients must know exact server options:
    maxLogins, ranges of nodeIDs, buses, bufnums etc
  • clients log in with a unique clientID, which creates private ranges
  • clients access only their private range of nodeIDs, buffer numbers, buses, etc
  • clients can only free their own nodes (except emergency r.hardFreeAll);

  2. local servers
  • server is / servers are booted from and used by various apps,
    such as SuperCollider.app, supercollider.js, SC-based standalones, others

a. Default - single app, single scsynth
A single app (e.g. SC3.8) uses scsynth, and only a single scsynth instance.

  • will work fine for most non-expert users, including newbies
  • should be very simple, stable, and 'just work'.

b. Multiple apps use scsynth/supernova processes (SuperCollider.app, supercollider.js, etc)

  • apps should keep server ports out of each other's way
  • when in doubt, leave any servers untouched

c. single app uses multiple parallel scsynth/supernova processes

  • typically advanced users
  • typically using fixed list of ports
  • typically handled in existing quarks?

  3. Network Music setups - multiple networked clients / servers
  • e.g. Republic, Utopia, others
  • clients manage (boot etc) local server(s)
  • clients publish their local servers to peers as remotes.

Logic of Server.boot:

  • if not local, don't boot and inform user
  • if server booted or booting (pidRunning), don't boot and inform user
  • otherwise (no running server found), check whether addr.port is free:
    if true, boot server with given address
    if false, a server was found at the intended port; what to do?

in single app/scsynth scenario,
-> assume that server is a zombie (e.g. after an sclang crash),
and quit and boot, or (faster but riskier) hijack, optionally with .freeAll?

in multiple apps case
-> assume it belongs to another app,
find a free port, and boot with that port

in parallel scsynths case
-> assume we know the exact address:port, quit and boot or hijack.

in Network Music setups
-> if single local server, reboot or hijack; else pick or add an appropriate strategy.

// here is a quick sketch of it in Server
// could also be in a separate ServerRecover class?
Server { 
	classvar <>defaultBootRecoverStrategy = \failAndAsk;
	classvar <>bootRecoverStrategies;
	...
	var <>bootRecoverStrategy; 
	
	*initClass { 
		bootRecoverStrategies = (
			\failAndAsk: { |serv, onBoot, onFailure| ... }, 
			\hijack: { |serv, onBoot, onFailure| ... }, 
			\reboot: { |serv, onBoot, onFailure| ... }, 
			\switchToFreePort: { |serv, onBoot, onFailure| ... }
		)
	}
	
	... 
	
	boot { ...
		if (isLocal.not) {
			if (this.serverRunning) {
				"remote server % is running.".format(name).postln;
			} {
				"Cannot boot remote server %, please boot it remotely.".format(name).postln;
			};
			^this
		};
		if (this.serverRunning) { inform ...  ^this };
		if (this.booting) { "booting ..".postln; ^this };

		// port is free: boot normally and stop here
		if (this.portIsFree) {
			this.reallyBoot(...);
			^this
		};

		// port is taken: fall back to the selected recovery strategy
		bootRecoverStrategies[bootRecoverStrategy ? defaultBootRecoverStrategy]
			.value(this, onBoot, onFailure);
	}
}

@jamshark70
Contributor

Tangentially related: I just had a sclang crash with supernova booted. After relaunching sclang, I first hit "Kill all servers" in the menu, then booted the server, and it told me the client ID was taken. Huh? I think if I kill all servers, it ought to reset everything, shouldn't it?

Good workflow:

  1. Restart interpreter.
  2. Kill old servers.
  3. Boot server.

Silly workflow:

@jamshark70 jamshark70 closed this Dec 4, 2016
@jamshark70 jamshark70 reopened this Dec 4, 2016
@jamshark70
Contributor

Damn mobile. Anyway.

Silly workflow:

  1. Restart interpreter.
  2. Kill old servers.
  3. Recompile the class library just because killing servers doesn't work.
  4. Boot server.

@adcxyz adcxyz closed this Dec 4, 2016
@adcxyz adcxyz deleted the topic-findZombies branch December 4, 2016 18:17
6 participants