Unsandboxed mode leaks the nex-agent process #238

Closed
autodidaddict opened this issue May 22, 2024 · 1 comment
Labels
defect Suspected defect such as a bug or regression

Comments

@autodidaddict
Contributor

Observed behavior

Start a node in just about any configuration, as long as sandbox mode is disabled. Press Ctrl-C to stop the nex node process.

We see a failed attempt to terminate an OS process. The output from the procman shutdown shows that it never attempts to shut down the active workload, only the agent that was in the pending pool:

[INFO] 2024-05-22 10:02:41 - Spawning process manager stopping
[ERROR] 2024-05-22 10:02:41 - Failed to interrupt agent process agent_id=cp6vlhn52omlk6o3m83g pid=153209 err=os: process already finished
[WARN] 2024-05-22 10:02:41 - Failed to stop spawned agent process workload_id=cp6vlhn52omlk6o3m83g error=os: process already finished
[INFO] 2024-05-22 10:02:41 - Publishing node stopped event

In this trace, the running agent belonged to workload cp6vlfv52omlk6o3m830. Here is the log from that agent attempting to terminate the running workload:

cp6vlfv52omlk6o3m830 | [INFO] 2024-05-22 10:02:40 - Couldn't terminate elf binary process workload_id=cp6vlfv52omlk6o3m830 from_agent=true
cp6vlfv52omlk6o3m830 | [INFO] 2024-05-22 10:02:40 - Received signal: interrupt workload_id=cp6vlfv52omlk6o3m830 from_agent=true

Expected behavior

I would expect the agent to be able to kill its child process, and the node's parent/spawning process manager to be able to kill the agent.

After terminating a node with Ctrl-C in unsandboxed mode, we should see no running nex-agent processes.
After terminating a node with Ctrl-C in sandboxed mode, we should see no leftover firecracker processes.

Nex and NATS version

Nex main, NATS 2.10.9

Host environment

Ubuntu Linux

Steps to reproduce

Start a node in unsandboxed mode, start a workload, press Ctrl-C, and observe the leaked agent.

@autodidaddict autodidaddict added the defect Suspected defect such as a bug or regression label May 22, 2024
@autodidaddict
Contributor Author

This issue is getting pretty far behind main. I spent a few hours attempting to reproduce this and I was unable to. Closing for now. If we detect this again, we'll open a new issue to track it.
