Skip to content

fix: meta dev does not exit properly upon SIGTERM#895

Merged
michael-0acf4 merged 5 commits into
mainfrom
meta-dev-sigterm-exit
Nov 5, 2024
Merged

fix: meta dev does not exit properly upon SIGTERM#895
michael-0acf4 merged 5 commits into
mainfrom
meta-dev-sigterm-exit

Conversation

@michael-0acf4

@michael-0acf4 michael-0acf4 commented Oct 29, 2024

Copy link
Copy Markdown
Contributor

Migration notes

None

  • The change comes with new or modified tests
  • Hard-to-understand functions have explanatory comments
  • End-user documentation is updated to reflect the change

@michael-0acf4

michael-0acf4 commented Oct 29, 2024

Copy link
Copy Markdown
Contributor Author

A few remarks so far,

  • This is not an issue with Ctrl+C or SIGINT
  • This happens when we send a SIGTERM, SIGKILL to the parent process meta dev or SIGHUP (controlling terminal death) , meta typegate is still pending: making next cli spawn error out saying that the port is already used. So I imagine the issue is apparent in programmatically spawned processes.

@codecov

codecov Bot commented Oct 29, 2024

Copy link
Copy Markdown

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.49%. Comparing base (18a13d6) to head (07bc751).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #895      +/-   ##
==========================================
+ Coverage   77.46%   77.49%   +0.02%     
==========================================
  Files         149      149              
  Lines       18244    18304      +60     
  Branches     1785     1787       +2     
==========================================
+ Hits        14133    14184      +51     
- Misses       4088     4097       +9     
  Partials       23       23              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Natoandro
Natoandro previously approved these changes Oct 30, 2024

@Natoandro Natoandro left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think you can add some tests?

@michael-0acf4

michael-0acf4 commented Oct 30, 2024

Copy link
Copy Markdown
Contributor Author

Do you think you can add some tests?

So, I digged a bit and as far as I know, there seems to be no way to retrieve all PIDs or get handles of the child processes that are spawned by child processes in Deno.

Spawn, kill with SIGTERM/SIGHUP, spawn again then checking the std output for Addr in use is an alternative but I wonder if it's really worth it as it may came from bad port allocation and not the child process being alive itself...

@luckasRanarison

luckasRanarison commented Oct 30, 2024

Copy link
Copy Markdown
Contributor

So, I digged a bit and as far as I know, there seems to be no way to retrieve all PIDs or get handles of the child processes that are spawned by child processes in Deno.

Can't you just check the processes using shell commands?

Yohe-Am
Yohe-Am previously approved these changes Oct 30, 2024

@Yohe-Am Yohe-Am left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. So, there's this FIXME case in the tests where the meta dev leaks the typegate process. It doesn't seem to show up on my dev machine anymore but it was a regular on the coder machines. It also regularly leaked on the CI machines and...

...yeah, I just checked by running the CI on this PR and it still leaks it seems. The following screenshot is from this CI run observed through a tmate session.
image
The two tests that use the meta-full build have already passed by the time I took this screenshot.
Hopefully, this PR improves the state in developer shells but the bug seems still at large.

Edit: replaced screenshot with one that shows currently running test.

Comment thread src/meta-cli/src/deploy/actors/task_manager/signal_handler.rs Outdated
@michael-0acf4

michael-0acf4 commented Oct 30, 2024

Copy link
Copy Markdown
Contributor Author

Looks good. So, there's this FIXME case in the tests where the meta dev leaks the typegate process. It doesn't seem to show up on my dev machine anymore but it was a regular on the coder machines. It also regularly leaked on the CI machines and...

...yeah, I just checked by running the CI on this PR and it still leaks it seems. The following screenshot is from this CI run observed through a tmate session.
image
The two tests that use the meta-full build have already passed by the time I took this screenshot.
Hopefully, this PR improves the state in developer shells but the bug seems still at large.

Edit: replaced screenshot with one that shows currently running test.

That is surprising, the way I tested it was sending the signals directly through htop, and kill the parent cargo run -p meta-cli -- dev for SIGTERM. I also experimented quitting the terminal that spawned it for the SIGHUP case (+htop) so it's all manual.. not sure if there is something that I might have missed but I didn't have the already used port issue anymore in all cases, ultimately i will try to write a test for it after all.

@michael-0acf4

michael-0acf4 commented Oct 30, 2024

Copy link
Copy Markdown
Contributor Author

there's this FIXME

Ctrlc would be SIGKILL for that matter and that at least should work even without this patch.

@Yohe-Am

Yohe-Am commented Oct 30, 2024

Copy link
Copy Markdown
Contributor

Ctrlc would be SIGKILL for that matter and that at least should work even without this patch.

Ctrl C corresponds to SIGTERM actually. SIGKILL is the harsher cousin that immediately kills the process. You're not able to register handlers for it and do clean up. In that FIXME, I'm using SIGTERM to give it a chance to clean up but no bueno.

@michael-0acf4

Copy link
Copy Markdown
Contributor Author

Ctrlc would be SIGKILL for that matter and that at least should work even without this patch.

Ctrl C corresponds to SIGTERM actually. SIGKILL is the harsher cousin that immediately kills the process. You're not able to register handlers for it and do clean up. In that FIXME, I'm using SIGTERM to give it a chance to clean up but no bueno.

My bad, I meant SIGINT. But you are right, SIGTERM is more appropriate.

Natoandro
Natoandro previously approved these changes Nov 1, 2024
Comment thread src/meta-cli/src/deploy/actors/typegate.rs Outdated
Comment thread tests/e2e/cli/dev_test.ts
@michael-0acf4 michael-0acf4 merged commit 1cbbe57 into main Nov 5, 2024
@michael-0acf4 michael-0acf4 deleted the meta-dev-sigterm-exit branch November 5, 2024 12:23
michael-0acf4 added a commit that referenced this pull request Nov 5, 2024
#### Migration notes

None

- [x] The change comes with new or modified tests
- [ ] Hard-to-understand functions have explanatory comments
- [ ] End-user documentation is updated to reflect the change
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants