Chromium-browse not being cleaned up properly after a screenshot #302
Comments
Successfully locally reproduced. Process tree from my dev agent
PID w/ PPID
natlas-agent.py makes a subprocess call to aquatone, which is what should be producing the chromium processes. These processes get cleaned up no problem in a native deployment with aquatone 1.7. Possible solutions will probably either be:
Cleaning up zombie processes is generally done by PID 1 (the init process). In our case, the Python script is not doing that. Better options are:
Long-term, I prefer option 3 because we then can isolate components into separate docker images/containers, which reduces image sizes and improves security by isolating low-trust zones (targeting unknown hosts) from slightly higher trust zones (the agent). Short-term, option 2 is probably our best approach.
The idea of option 3 is nice, but it becomes a bit harder to ship something that "just works" without something like docker-compose. I'm also unclear on how one container runs another container, but admittedly haven't looked too far into it. My option 2 around getting rid of aquatone will still eventually happen, since it's a mostly-unneeded external dependency that just wraps chrome headless and could be replaced by a selenium chrome container or just a chrome-only container.
How does switching to Chrome Headless fix the zombie process problem? If they're still running in the same PID namespace, then we're still subject to this problem. Thus, we can drop Aquatone, but we still need a zombie-reaping PID 1.
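To make "zombie-reaping PID 1" concrete, here is a minimal sketch of the core loop such a process runs. This is not what natlas or any particular init does; `reap_zombies` is a hypothetical helper name, and a real init would typically call something like it whenever it receives SIGCHLD and would also forward signals to its child.

```python
import os


def reap_zombies():
    """Reap every child that has already exited, without blocking.

    Returns a list of (pid, status) tuples for the children reaped.
    Hypothetical helper for illustration only.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns immediately
            # instead of waiting for a child to exit.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none have exited yet
        reaped.append((pid, status))
    return reaped
```

Orphaned processes get reparented to PID 1 of their PID namespace, so a loop like this only helps with the defunct chromium processes if it runs in whatever process is PID 1 inside the container.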
The aquatone subprocesses are being reaped though, which makes me think that the process being directly interacted with by
natlas/natlas-agent/natlas/screenshots.py Line 40 in 3c39126
The call to communicate causes us to fetch the return code of the child process, which is what we need to reap that process (Reference). I'm not sure why aquatone doesn't clean up the chrome processes, but it should be easy to add https://github.com/Yelp/dumb-init
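For anyone following along, a rough sketch of why `communicate` reaps the direct child but not its descendants. This is not the code in screenshots.py; `run_and_reap` and the timeout value are made up for illustration.

```python
import subprocess
import sys


def run_and_reap(cmd, timeout=30):
    """Run cmd, collect its output, and reap the direct child.

    Hypothetical helper, not the natlas implementation.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        # communicate() waits for the child to exit and collects its
        # exit status, which is what removes its process table entry.
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        # Still need to wait after kill(), or the child stays defunct.
        out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    rc, out, _ = run_and_reap([sys.executable, "-c", "print('done')"])
    print(rc, out)
```

This only covers the process we started ourselves. Anything that child spawned and did not wait on is reparented to PID 1 when the child exits, so it never shows up as something this code can wait for.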
The use of dumb-init handles zombie reaping. Closes #302.
Affecting:
Agent
Version:
0.6.10
Describe the bug
Dockerized version of agent has issues with aquatone not properly cleaning up chromium-browse and ends up with a ton of defunct chromium-browser procs.
www-data 18656 0.0 0.0 0 0 ? Z 03:01 0:00 [chromium-browse] <defunct>
In a 31 hour period, I ended up with 2354 defunct procs (as seen in screenshot).
To Reproduce
Steps to reproduce the behavior:
ps auxww | grep -i "chromium" | grep -i "defunct"
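If you want to track the count over time rather than eyeball it, something like the following sketch may help. The helper names are hypothetical, and it assumes a procps-style `ps` that accepts `-e` and `-o stat= -o comm=` (state and command name, no header).

```python
import subprocess


def count_defunct(ps_output, name="chromium"):
    """Count zombie entries whose command contains `name`.

    Expects lines of the form "<STAT> <COMMAND>", e.g. from
    `ps -e -o stat= -o comm=`. Hypothetical helper.
    """
    count = 0
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue
        stat, comm = parts
        # A process state starting with "Z" marks a zombie.
        if stat.startswith("Z") and name.lower() in comm.lower():
            count += 1
    return count


def defunct_now(name="chromium"):
    """Run ps and count matching zombies on this host or container."""
    out = subprocess.run(
        ["ps", "-e", "-o", "stat=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_defunct(out, name)
```

Run `defunct_now()` inside the agent container before and after a few screenshots; a steadily growing number matches the behavior described above.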
Expected behavior
chromium should be cleaned up properly after a screenshot occurs
Screenshots