New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix supernova reboot bug, add reboot tests #3848
Conversation
dont set clientID if no clientID from response
so we know when the process really has ended.
this is to avoid supernova hanging with incomplete boot state when rebooting.
supernova tends to gets stuck here for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code reads well and TestServer_boot
succeeds for both scsynth and supernova, tested on macOS.
}, 0.25); | ||
} | ||
} | ||
|
||
waitForPidRelease { |onDone, onFailure, timeout = 1| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usually we call this onComplete
.
} { | ||
0.05.wait | ||
}; | ||
if (pid.isNil) { onDone.value } { onFailure.value }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could write if(pid.isNil, onDone, onFailure)
, but well...
@@ -922,6 +946,7 @@ Server { | |||
} { | |||
this.disconnectSharedMemory; | |||
pid = unixCmd(program ++ options.asOptionsString(addr.port), { |exitCode| | |||
pid = nil; | |||
this.prOnServerProcessExit(exitCode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't the line pid = nil
happen in prOnServerProcessExit
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either is fine by me - had it there first - lets see if @brianlheim has a preference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i also wonder whether theoretically a unixCmd could return a non nil pid even after it has called the exit function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@telephon - do you mean, whether it's possible that unixCmd's callback could run while the process is still running? I looked over the associated code - I'm not an expert on the unixy stuff but it looks like it's a remote possibility? I think that would be an issue in the implementation of unixCmd though and IMO we're OK relying on it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @telephon that the line pid = nil
should go in prOnServerProcessExit
, makes this easier to understand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
s.doWhenBooted { | ||
// first boot OK, starting reboot from here | ||
s.reboot { | ||
s.doWhenBooted({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could leave away the ({
here, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but please change at least the argument name to onComplete
ok all done, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for tackling this! Comments below
}, 0.25); | ||
} | ||
} | ||
|
||
waitForPidRelease { |onComplete, onFailure, timeout = 1| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be a private method (and hidden in documentation), right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I think so, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
onComplete.value; | ||
^this | ||
}; | ||
// FIXME: quick and dirty fix for supernova reboot hang on osx: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is good, thanks! Btw for future reference, the name is macOS now
@@ -922,6 +946,7 @@ Server { | |||
} { | |||
this.disconnectSharedMemory; | |||
pid = unixCmd(program ++ options.asOptionsString(addr.port), { |exitCode| | |||
pid = nil; | |||
this.prOnServerProcessExit(exitCode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@telephon - do you mean, whether it's possible that unixCmd's callback could run while the process is still running? I looked over the associated code - I'm not an expert on the unixy stuff but it looks like it's a remote possibility? I think that would be an issue in the implementation of unixCmd though and IMO we're OK relying on it here.
@@ -922,6 +946,7 @@ Server { | |||
} { | |||
this.disconnectSharedMemory; | |||
pid = unixCmd(program ++ options.asOptionsString(addr.port), { |exitCode| | |||
pid = nil; | |||
this.prOnServerProcessExit(exitCode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @telephon that the line pid = nil
should go in prOnServerProcessExit
, makes this easier to understand
server.notify_(true); | ||
while { server.notified.not } { 0.1.wait }; | ||
while { server.notified.not } { server.notify_(false); 0.1.wait }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the sake of clarity, it would probably be better here to write statusWatcher.sendNotifyRequest(false)
. Then the code immediately makes sense with the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
} { | ||
0.05.wait | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please rewrite this using a Condition
- waiting in a loop is inefficient and we have better idioms for this purpose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure: do we have a callback somewhere that would make this possible? There would have to be a mechanism that unhangs the condition when the pid becomes nil. As long as we don't have that, this loop is the only possible way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A possible way to implement it is that any code that changes the pid can signal
a Condition that is wait
ed on here. The Condition will have to be at class scope. pid == nil
can even be used as the test function for the Condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do this, the condition has to be at object scope, otherwise booting servers will block each other (but this is probably what you meant, anyhow).
Unless there are other uses for a condition at that level, I'd prefer to keep the method self-contained. Polling is not bad practice as such at all, it just depends on the case. But I'm fine either way if it is elegant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think efficiency is an issue here - waiting until something has finished properly so you can continue is very cheap. As Julian points out, the polling solution here stays completely local; also, it is just a single local poll/wait thread, so there is no possible confusion of order of execution of waiting tasks.
IIRC, Condition.hangWithTimeout (which would be needed here) ran into problems that are still not resolved yet, right?
To me, polling seems the right choice here, and conditions would make it more complex than necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would a case for whenTrueWithin, as proposed in #3850.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do whatever you want. This is not just about efficiency, but this code is going to be rewritten soon anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, eg. when I get around to redoing the ServerProcess refactor :-)
which is hopefully soon
as requested in reviews
as requested in review
@@ -280,6 +280,7 @@ Server { | |||
var <window, <>scopeWindow, <emacsbuf; | |||
var <volume, <recorder, <statusWatcher; | |||
var <pid, serverInterface; | |||
var <pidReleaseCondition; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it necessary that this condition is externally accessible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
had it accessible for testing, happy to make it internal only
@@ -339,6 +340,8 @@ Server { | |||
this.name = argName; | |||
all.add(this); | |||
|
|||
pidReleaseCondition = Condition({ this.pid == nil }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this sets the class pidReleaseCondition
, but above it is defined as instance variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like pid
, pidReleaseCondition
is an instance variable, because quitting is individual/independent per server instance. Not sure I understand your question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but you are setting it in *initClass
, here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
… I've just read the original code, false alarm, it is all in init
. This is a github/diff artefact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will go over this anyway in the ServerProcess refactor/cleanup, where every ServerProcess state gets its own condition, so this is just to make all tests with supernova to run now, rather than waiting for the ServerProcess refactor/cleanup to get through.
@telephon - removed condition getter, |
currently supernova servers tends to hang on reboot (macOS 10.13.4):
this is a timing issue: supernova sends back the quit message early, but the supernova process still lives long enough to interfere with the boot messaging traffic with the rebooted instance.
Thus, this PR adds waiting for release of the (previous) process PID when booting, which fixes the hang condition.
This PR includes
Result: All tests invoving booted servers now run with supernova!
They all pass, except test_getn still fails because of a bug in supernova handle_s_getn #3841
I believe supernova will be part of the 3.10 release, yes?
If so, I would be happy if this bug fix went into 3.10, so supernova supported as fully as possible.