New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix linux tests to report correctly, get rid of stream destroyed messages on raw kernel shutdown #12643
Fix linux tests to report correctly, get rid of stream destroyed messages on raw kernel shutdown #12643
Conversation
On the first commit of this got a successful test run with Linux / Mac / Windows results all in: |
One Mac and Linux jupyter server and raw kernel daemon processes were not getting shut down on dispose. The server would be closed, but the proc.kill was not killing the process. They would pile up and start to hold onto more and more ports on the machines which would lead to problems (on linux it was slowing things down so much the machine was stopping reporting, mac tests usually finished, but logs were packed with Port XXXX blocked messages). This reproed on my local mac machine.
|
Also on both mac and windows there were lots of messages like this in the error log both when running the live extension (shows when shutting down the interactive window) and running functional test. Both on windows and mac / linux. This update seems to resolve that as well: |
this.proc.kill(); | ||
} catch { | ||
noop(); | ||
if (!this.disposed) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue one was that the dispose pattern wasn't checking for already being disposed. This daemon for a jupyter connection we being held both by the connection and the main async disposable service and dispose was being called from both. Would be trying to send notifications to an already killed process. Or trying to signal for a second kill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please can we add this comment into the code, I think its very useful.
Personally I like to see this in the code as reminder of how crucial such a simple block of code is (esy github has these comments, however I think in code would be better - IMHO)
if (this.platformService.isWindows) { | ||
this.proc.kill(); | ||
} else { | ||
this.proc.kill('SIGKILL'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change here to directly send the sigkill on non windows actually cleaned up the processes. Full confession I tried to do the research here, but I'm still a small bit iffy on this. Feels like this makes it more direct close then it needs to be versus default (SIGTERM I think?). However for Jupyter they do directly use a SIGKILL so I believe this works here.
Note that we also used to send an 'exit' notification here. I don't believe this is needed. We were sending this as a notification and then directly closing the process. This message then seemed to often cause that "STREAM_DESTORYED" error as it was trying to be send to a process that had been killed. I believe that if we had needed this before then we would have been awaiting on it to actually happen.
Windows didn't have the issue with the process sticking around in my testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, i'd put this comment in the code, so we know xactly why we're using two different ways of killing a process based on OS..
@@ -43,10 +45,6 @@ export class PythonKernelDaemon extends BasePythonDaemon implements IPythonKerne | |||
const request = new RequestType0<void, void, void>('kill_kernel'); | |||
await this.sendRequestWithoutArgs(request); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The disposed was removed from this class. This was also causing issues with stream destroying. And killing the process. The code for the kill_kernel is here:
def m_kill_kernel(self): |
And it looks to do exactly what dispose is doing now. Close on windows or sigkill on linux. This was performing the dispose work before the super dispose was even called, which was also leading to message being send to a process that had already been sigkilled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
Note: Will test with a full flake run before submitting. |
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nightly test results: |
Non-required Windows Py3x Single Workspace looks to be timeout and not legit test failure (happening on many PRs) so merging. |
For #12539
package-lock.json
has been regenerated by runningnpm install
(if dependencies have changed).