fd leaks #9

Closed
tarekziade opened this issue Jun 4, 2012 · 8 comments

@tarekziade
Contributor

The token server is leaking fds on stage2.

This is probably in powerhose: either in the client Pool or in the restarting of the workers.

I will write a test that counts the number of fds before and after each request to find out where the problem happens.

/cc @fetep @ametaireau
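Something along these lines for the counting test -- a rough sketch only; make_request stands in for an actual tokenserver request:

import os

FD_DIR = "/proc/self/fd"   # Linux-only, same trick as `ls /proc/<pid>/fd | wc -l`

def count_fds():
    return len(os.listdir(FD_DIR))

def assert_no_fd_leak(make_request, iterations=100):
    # the fd count should stay flat across repeated requests if nothing leaks
    before = count_fds()
    for _ in range(iterations):
        make_request()
    after = count_fds()
    assert after <= before, "possible fd leak: %d -> %d fds" % (before, after)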

@ghost assigned tarekziade Jun 4, 2012
@tarekziade
Contributor Author

It's clear that it's coming from the client side of powerhose, since there are no crypto workers running on stage2, and most gunicorn processes eat more than 10k fds:

token2
[root@token2 tarek]# ps aux|grep guni
token      847 19.9  0.5 19398712 393676 ?     Sl   01:24   0:04 /usr/bin/python /usr/bin/gunicorn -k gevent -w 12 -b 127.0.0.1:8000 tokenserver.run:application
token     9619 11.7  0.7 32770976 487452 ?     Sl   01:24   0:05 /usr/bin/python /usr/bin/gunicorn -k gevent -w 12 -b 127.0.0.1:8000 tokenserver.run:application
token    10343 24.8  0.2 6670920 155860 ?      Sl   01:25   0:02 /usr/bin/python /usr/bin/gunicorn -k gevent -w 12 -b 127.0.0.1:8000 tokenserver.run:application
token    10984 13.7  0.7 33275668 498964 ?     Sl   01:24   0:06 /usr/bin/python /usr/bin/gunicorn -k gevent -w 12 -b 127.0.0.1:8000 tokenserver.run:application
token    11796  0.3  0.0 109372 11800 ?        S    Jun01  12:51 /usr/bin/python /usr/bin/gunicorn -k gevent -w 12 -b 127.0.0.1:8000 tokenserver.run:application
token    14189  6.7  0.7 31951172 468020 ?     Sl   01:24   0:04 /usr/bin/python /usr/bin/gunicorn -k gevent -w 12 -b 127.0.0.1:8000 tokenserver.run:application
token    14191  8.8  0.7 39765532 487920 ?     Sl   01:24   0:06 /usr/bin/python /usr/bin/gunicorn -k gevent -w 12 -b 127.0.0.1:8000 tokenserver.run:application
token    14193  7.3  0.7 34042692 465516 ?     Sl   01:24   0:05 /usr/bin/python /usr/bin/gunicorn -k gevent -w 12 -b 127.0.0.1:8000 tokenserver.run:application
token    14201 10.1  0.7 41437216 503880 ?     Sl   01:24   0:07 /usr/bin/python /usr/bin/gunicorn -k gevent -w 12 -b 127.0.0.1:8000 tokenserver.run:application
token    14203 11.0  0.7 42490944 517628 ?     Sl   01:24   0:07 /usr/bin/python /usr/bin/gunicorn -k gevent -w 12 -b 127.0.0.1:8000 tokenserver.run:application
token    23989 14.2  0.7 28161028 464228 ?     Sl   01:24   0:05 /usr/bin/python /usr/bin/gunicorn -k gevent -w 12 -b 127.0.0.1:8000 tokenserver.run:application
token    26006 11.1  0.7 40459876 509476 ?     Sl   01:24   0:07 /usr/bin/python /usr/bin/gunicorn -k gevent -w 12 -b 127.0.0.1:8000 tokenserver.run:application
token    30224 15.8  0.7 25082540 472260 ?     Sl   01:24   0:04 /usr/bin/python /usr/bin/gunicorn -k gevent -w 12 -b 127.0.0.1:8000 tokenserver.run:application

[root@token2 tarek]# ls /proc/14189/fd|wc -l
9674
[root@token2 tarek]# ls /proc/14191/fd|wc -l
14481
[root@token2 tarek]# ls /proc/14193/fd|wc -l
14260
[root@token2 tarek]# ls /proc/14201/fd|wc -l
17261
[root@token2 tarek]# ls /proc/14203/fd|wc -l
18490

Now looking at what's happening with the powerhose pool.

@tarekziade
Contributor Author

I was unable to find any leaks in Powerhose with these tests:

  • count how many FDs are open when a powerhose cluster is started
  • every worker has --max-age = 10 seconds
  • a client is constantly sending jobs

However, running gunicorn in stage2, I found out that the number of FDs in use is very high even when you just start it and do nothing.

The current formula is:

FDs = 12 + NUMWORKERS + for I IN NUMWORKERS (1305 + I)

For 12 workers:
12 + 12 + 1305 + 1306 + 1307 + .... + 1317 = 15715

So just running gunicorn with 12 workers already eats 15715 fds, which is a lot.

We have 60 to 65 SQL connectors per worker, and 50 connectors for the crypto clients pool.

So I don't know where the 1000+ extra fds are going... maybe membase? :s

Continuing the investigation.
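To narrow that down, something like the following helper (assuming the Linux /proc layout) can group a worker's fds by what they point at -- sockets, pipes, eventpoll handles, plain files:

import os
from collections import Counter

def fd_breakdown(pid):
    # group a process's fds by the target of the /proc/<pid>/fd symlinks
    fd_dir = "/proc/%d/fd" % pid
    kinds = Counter()
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue   # fd closed while we were looking
        # targets look like "socket:[12345]", "pipe:[6789]", "anon_inode:[eventpoll]", or a path
        kinds[target.split(":")[0] if ":" in target else "file"] += 1
    return kinds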

@tarekziade
Contributor Author

Each powerhose client in the pool eats 25 fds, so with 50 clients per pool that's why we have 1250+ fds per gunicorn worker.

Now looking at why, and whether this can be reduced.

But so far I have seen no leaks, just large amounts of fds used by pyzmq.

@tarekziade
Contributor Author

We have 6 KQUEUEs and 19 sockets opened in each client.

The number of KQUEUEs can be reduced to 2 by setting the io_threads value from 5 to 1 at https://github.com/mozilla-services/powerhose/blob/master/powerhose/client.py#L42

That should not impact speed, and it would bring us back to 21 fds per client, so down to 1050 per gunicorn worker.

I don't think I can reduce the number of sockets -- still looking.
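The change would amount to something like this sketch (io_threads is the pyzmq constructor argument; the powerhose code may spell it differently):

import zmq

# one I/O thread is plenty for a request/reply client, and each extra
# I/O thread costs its own kqueue/epoll and signaling fds
context = zmq.Context(io_threads=1)   # was 5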

@tarekziade
Contributor Author

These lines (which are the gist of the client) already create 2 KQUEUEs and 10 sockets:

import zmq

c = zmq.Context()               # the context starts I/O threads, each with its own kqueue/epoll and signaling fds
s = c.socket(zmq.REQ)           # every zmq socket carries its own mailbox/signaling fds as well
poller = zmq.Poller()
poller.register(s, zmq.POLLIN)  # watch the socket for incoming replies
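To check that count on your own box (the exact numbers vary by platform and pyzmq version), the same snippet can be wrapped with a /proc fd count:

import os
import zmq

def count_fds():
    return len(os.listdir("/proc/self/fd"))

before = count_fds()
c = zmq.Context()
s = c.socket(zmq.REQ)
poller = zmq.Poller()
poller.register(s, zmq.POLLIN)
print("fds used by one bare client: %d" % (count_fds() - before))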

@tarekziade
Contributor Author

I have found a way to share some FDs between clients of the same pool. Doing the change now and will try it on stage2.
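Roughly the idea, as a sketch rather than the actual patch (PoolClient and endpoint are placeholders): all clients of a pool reuse a single zmq.Context, so the I/O thread's kqueue and signaling fds are paid once per pool instead of once per client.

import zmq

shared_ctx = zmq.Context(io_threads=1)   # one context shared by the whole pool

class PoolClient(object):
    # hypothetical client that reuses the shared context instead of creating its own
    def __init__(self, endpoint, ctx=shared_ctx):
        self.socket = ctx.socket(zmq.REQ)
        self.socket.connect(endpoint)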

@jbonacci

@tarekziade how did the change and retest go? Is this considered fixed?

@rfk
Contributor

rfk commented Jun 11, 2014

Closing this out since we no longer have the powerhose stuff directly in this repo; if it's still a problem it can be moved to the powerhose repo.

@rfk rfk closed this as completed Jun 11, 2014