Chromium-browse not being cleaned up properly after a screenshot #302

Closed
fdcarl opened this issue Jun 23, 2020 · 6 comments · Fixed by #308
Labels
agent (affecting natlas-agent), bug (Something isn't working)

Comments

@fdcarl

fdcarl commented Jun 23, 2020

Affecting:
Agent

Version:
0.6.10

Describe the bug
In the Dockerized version of the agent, aquatone does not properly clean up its chromium-browser processes, which leaves a large number of defunct (zombie) processes behind. ps shows them under the truncated name chromium-browse:
www-data 18656 0.0 0.0 0 0 ? Z 03:01 0:00 [chromium-browse] <defunct>
In a 31 hour period, I ended up with 2354 defunct procs (as seen in screenshot).

To Reproduce
Steps to reproduce the behavior:

  1. Run dockerized natlas agent for a period of time
  2. Make sure agent is scanning hosts where screenshots should occur
  3. Wait a small period of time
  4. ps auxww | grep -i "chromium" | grep -i "defunct"
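
For anyone who would rather check from Python than from ps, here is a rough sketch that lists zombie processes by reading /proc directly. It assumes a Linux /proc filesystem; `parse_stat_line` and `find_zombies` are hypothetical helper names and are not part of natlas.

```python
import os


def parse_stat_line(stat_line):
    """Return (pid, comm, state, ppid) parsed from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left].strip())
    comm = stat_line[left + 1:right]
    fields = stat_line[right + 1:].split()
    return pid, comm, fields[0], int(fields[1])


def find_zombies(proc_root="/proc"):
    """Return a list of (pid, ppid, comm) for every process in state Z."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no /proc available (for example, not on Linux)
    zombies = []
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state, ppid = parse_stat_line(handle.read())
        except (OSError, ValueError):
            continue  # the process went away while we were scanning
        if state == "Z":
            zombies.append((pid, ppid, comm))
    return zombies


if __name__ == "__main__":
    for pid, ppid, comm in find_zombies():
        print(pid, ppid, comm)
```

Printing the parent PID alongside each zombie makes it easy to see which process is failing to reap them.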

Expected behavior
chromium should be cleaned up properly after a screenshot occurs

Screenshots
Screenshot from 2020-06-23 16-47-21

@fdcarl fdcarl added the bug Something isn't working label Jun 23, 2020
@0xdade 0xdade added the agent affecting natlas-agent label Jun 24, 2020
@0xdade
Member

0xdade commented Jun 24, 2020

Successfully reproduced locally.

Process tree from my dev agent

www-data     1  0.0  0.0 150844 25192 ?        Ssl  Jun23   0:00 python3 /opt/natlas/natlas-agent/natlas-agent.py
www-data    42  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data    43  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data    64  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data    98  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data    99  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   121  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   158  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   159  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   180  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   240  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   241  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   262  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   329  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   330  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   352  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   386  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   387  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   409  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   464  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   465  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   488  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   524  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   525  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   545  0.0  0.0      0     0 ?        Z    Jun23   0:00 [chromium-browse] <defunct>
www-data   608  0.0  0.0      0     0 ?        Z    00:01   0:00 [chromium-browse] <defunct>
www-data   609  0.0  0.0      0     0 ?        Z    00:01   0:00 [chromium-browse] <defunct>
www-data   631  0.0  0.0      0     0 ?        Z    00:01   0:00 [chromium-browse] <defunct>
www-data   663  0.0  0.0      0     0 ?        Z    00:01   0:00 [chromium-browse] <defunct>
www-data   664  0.0  0.0      0     0 ?        Z    00:01   0:00 [chromium-browse] <defunct>
www-data   684  0.0  0.0      0     0 ?        Z    00:01   0:00 [chromium-browse] <defunct>
www-data   746  0.0  0.0      0     0 ?        Z    00:04   0:00 [chromium-browse] <defunct>
www-data   747  0.0  0.0      0     0 ?        Z    00:04   0:00 [chromium-browse] <defunct>
www-data   768  0.0  0.0      0     0 ?        Z    00:04   0:00 [chromium-browse] <defunct>
www-data   806  0.0  0.0      0     0 ?        Z    00:04   0:00 [chromium-browse] <defunct>
www-data   807  0.0  0.0      0     0 ?        Z    00:04   0:00 [chromium-browse] <defunct>
www-data   828  0.0  0.0      0     0 ?        Z    00:04   0:00 [chromium-browse] <defunct>
www-data   886  0.0  0.0      0     0 ?        Z    00:05   0:00 [chromium-browse] <defunct>
www-data   887  0.0  0.0      0     0 ?        Z    00:05   0:00 [chromium-browse] <defunct>
www-data   907  0.0  0.0      0     0 ?        Z    00:05   0:00 [chromium-browse] <defunct>
www-data   968  0.0  0.0      0     0 ?        Z    00:07   0:00 [chromium-browse] <defunct>
www-data   969  0.0  0.0      0     0 ?        Z    00:07   0:00 [chromium-browse] <defunct>
www-data   991  0.0  0.0      0     0 ?        Z    00:07   0:00 [chromium-browse] <defunct>
www-data  1024  0.0  0.0      0     0 ?        Z    00:07   0:00 [chromium-browse] <defunct>
www-data  1025  0.0  0.0      0     0 ?        Z    00:07   0:00 [chromium-browse] <defunct>
www-data  1046  0.0  0.0      0     0 ?        Z    00:07   0:00 [chromium-browse] <defunct>
www-data  1079  0.0  0.0      0     0 ?        Z    00:07   0:00 [chromium-browse] <defunct>
www-data  1080  0.0  0.0      0     0 ?        Z    00:07   0:00 [chromium-browse] <defunct>
www-data  1100  0.0  0.0      0     0 ?        Z    00:07   0:00 [chromium-browse] <defunct>
www-data  1128  0.7  0.1  96108 54328 ?        S    00:07   0:01 nmap --privileged -oA data/natlas.dd938ad6ce7b1ad3b18a4ba2ba48cc30/nmap.dd938ad6ce7b1ad3b18a4ba2ba48cc30 --servi
www-data  1129  0.0  0.0  20256  3612 pts/0    Ss   00:09   0:00 /bin/bash
www-data  1136  0.0  0.0  36152  3228 pts/0    R+   00:09   0:00 ps aux

PID w/ PPID

www-data@616c29e99f4f:/opt/natlas/natlas-agent$ ps xao pid,ppid,comm
  PID  PPID COMMAND
    1     0 python3
   42     1 chromium-browse <defunct>
   43     1 chromium-browse <defunct>
   64     1 chromium-browse <defunct>
   98     1 chromium-browse <defunct>
   99     1 chromium-browse <defunct>
  121     1 chromium-browse <defunct>
  158     1 chromium-browse <defunct>
  159     1 chromium-browse <defunct>
  180     1 chromium-browse <defunct>
  240     1 chromium-browse <defunct>
  241     1 chromium-browse <defunct>
  262     1 chromium-browse <defunct>
  329     1 chromium-browse <defunct>
  330     1 chromium-browse <defunct>
  352     1 chromium-browse <defunct>
  386     1 chromium-browse <defunct>
  387     1 chromium-browse <defunct>
  409     1 chromium-browse <defunct>
  464     1 chromium-browse <defunct>
  465     1 chromium-browse <defunct>
  488     1 chromium-browse <defunct>
  524     1 chromium-browse <defunct>
  525     1 chromium-browse <defunct>
  545     1 chromium-browse <defunct>
  608     1 chromium-browse <defunct>
  609     1 chromium-browse <defunct>
  631     1 chromium-browse <defunct>
  663     1 chromium-browse <defunct>
  664     1 chromium-browse <defunct>
  684     1 chromium-browse <defunct>
  746     1 chromium-browse <defunct>
  747     1 chromium-browse <defunct>
  768     1 chromium-browse <defunct>
  806     1 chromium-browse <defunct>
  807     1 chromium-browse <defunct>
  828     1 chromium-browse <defunct>
  886     1 chromium-browse <defunct>
  887     1 chromium-browse <defunct>
  907     1 chromium-browse <defunct>
  968     1 chromium-browse <defunct>
  969     1 chromium-browse <defunct>
  991     1 chromium-browse <defunct>
 1024     1 chromium-browse <defunct>
 1025     1 chromium-browse <defunct>
 1046     1 chromium-browse <defunct>
 1079     1 chromium-browse <defunct>
 1080     1 chromium-browse <defunct>
 1100     1 chromium-browse <defunct>
 1129     0 bash
 1172     1 chromium-browse <defunct>
 1173     1 chromium-browse <defunct>
 1193     1 chromium-browse <defunct>
 1229     1 chromium-browse <defunct>
 1230     1 chromium-browse <defunct>
 1252     1 chromium-browse <defunct>
 1284     1 chromium-browse <defunct>
 1285     1 chromium-browse <defunct>
 1307     1 chromium-browse <defunct>
 1332     1 nmap
 1335  1129 ps

natlas-agent.py makes a subprocess call to aquatone, which is what should be producing the chromium processes. These processes get cleaned up no problem in a native deployment with aquatone 1.7.

Possible solutions will probably either be:

  1. Running a periodic cleanup thread that just identifies defunct child processes and reaps them
  2. Getting rid of aquatone altogether and just directly using chrome headless to take screenshots. This option is favorable in the long term I think, but will require building our own list of urls to try to screenshot based on scan data instead of passing the nmap.xml to aquatone.
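
A minimal sketch of what option 1 could look like, assuming the agent runs as PID 1 in the container so that orphaned chromium processes end up as its children (which matches the PPID column above). `reap_zombies` and `start_reaper` are hypothetical names, not existing natlas functions. One caveat: `os.waitpid(-1, ...)` collects any exited child, so it can also consume the exit status of a process that a `subprocess.Popen` object is still tracking, after which Popen can no longer see the real return code.

```python
import os
import threading


def reap_zombies():
    """Reap every child that has already exited, without blocking.

    Returns the list of PIDs that were reaped.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none of them has exited yet
        reaped.append(pid)
    return reaped


def start_reaper(interval=30.0):
    """Start a daemon thread that reaps zombies every `interval` seconds.

    Returns a threading.Event; call .set() on it to stop the thread.
    """
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout, True once stop is set.
        while not stop.wait(interval):
            reap_zombies()

    threading.Thread(target=loop, name="zombie-reaper", daemon=True).start()
    return stop
```

The interval is arbitrary here; zombies hold almost no resources besides a process table entry, so a slow sweep would likely be enough.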

@ajacques
Member

ajacques commented Jun 24, 2020

Cleaning up orphaned zombie processes is generally the job of PID 1 (the init process). In our case, PID 1 is the Python script, which is not doing that. Better options are:

  1. Configure Python to respond to SIGCHLD and reap child processes
  2. Use something like supervisord as PID 1 to auto reap zombies
  3. Run each scanner component in a separate Docker container such that Docker cleans it up inherently while destroying the container
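
As a rough illustration of option 1, a SIGCHLD handler in Python could look something like the sketch below. The names `REAPED_PIDS`, `_reap_children`, and `install_sigchld_reaper` are made up for this example. The handler loops because several child exits can be delivered as a single signal, and like any `waitpid(-1, ...)` approach it may collect the exit status of children that `subprocess.Popen` is also waiting on.

```python
import os
import signal

# PIDs collected by the handler, kept only so the behaviour can be observed.
REAPED_PIDS = []


def _reap_children(signum, frame):
    """SIGCHLD handler: collect every child that has already exited."""
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left to wait for
        if pid == 0:
            return  # the remaining children are still running
        REAPED_PIDS.append(pid)


def install_sigchld_reaper():
    """Install the handler. signal.signal must be called from the main thread."""
    signal.signal(signal.SIGCHLD, _reap_children)
```

Compared with a periodic sweep, this reaps children as soon as they exit, at the cost of running handler code in the middle of whatever the main thread is doing.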

Long-term, I prefer option 3 because we then can isolate components into separate docker images/containers which reduces image sizes and improves security by isolating low-trust zones (targeting unknown hosts) from slightly higher trust zones (the agent.)

Short-term, option 2 is probably our best approach.

@0xdade
Member

0xdade commented Jun 24, 2020

The idea of option 3 is nice but it becomes a bit harder to ship something that "just works" without something like docker-compose. I'm also unclear on how one container runs another container, but admittedly haven't looked too far into it.

My option 2 (getting rid of aquatone) will still happen eventually, since it's a mostly unneeded external dependency that just wraps Chrome headless. It could be replaced by a Selenium Chrome container or just a Chrome-only container.

@ajacques
Member

How does switching to Chrome Headless fix the zombie process problem? If they're still running in the same PID namespace, then we're still subject to this problem. Thus, we can drop Aquatone, but we still need a zombie-reaping PID 1.

@0xdade
Member

0xdade commented Jun 24, 2020

The aquatone subprocesses are being reaped though, which makes me think that the process being directly interacted with by subprocess.Popen is being reaped correctly but it's the grand-child that isn't.

@ajacques
Member

> The aquatone subprocesses are being reaped though, which makes me think that the process being directly interacted with by subprocess.Popen is being reaped correctly but it's the grand-child that isn't.

out, err = process.communicate(timeout=proctimeout)

The call to communicate() waits for the child process and fetches its return code, which is what reaps that process.
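
To make that concrete, here is a small sketch of the Popen plus communicate pattern with a timeout. `run_with_timeout` is a hypothetical helper, not the agent's actual code, and the kill-on-timeout handling is an assumption about what a caller would want. Note that this only reaps the direct child; anything that child spawned and left behind is orphaned and becomes the responsibility of PID 1.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout):
    """Run cmd, wait up to `timeout` seconds, and always reap the direct child."""
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        # communicate() waits for exit and collects the return code,
        # which is what removes the child from the process table.
        out, err = process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        # A second communicate() is needed to reap the killed child.
        out, err = process.communicate()
    return process.returncode, out, err


if __name__ == "__main__":
    code, out, _err = run_with_timeout([sys.executable, "-c", "print('ok')"], 30)
    print(code, out)
```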

I'm not sure why aquatone doesn't clean up the chrome processes, but it should be easy to add dumb-init: https://github.com/Yelp/dumb-init

0xdade referenced this issue in 0xdade/natlas Jun 26, 2020
The use of dumb-init handles zombie reaping. Closes #302.
0xdade added a commit that referenced this issue Jun 27, 2020
The use of dumb-init handles zombie reaping. Closes #302.
3 participants