-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible fix for https://github.com/rockstor/rockstor-core/issues/1656 ("Too many open files" / umlimit issue) #1934
Conversation
pass api calls through nginx to combat ulimit. See my comments in this thread: rockstor#1656 and https://forum.rockstor.com/t/exception-while-running-command-usr-bin-hostnamectl-static-errno-24-too-many-open-files/2272/5
Update api_wrapper.py
Can one of the admins verify this patch? |
1 similar comment
Can one of the admins verify this patch? |
Hi @phillxnet , pitchining in on this: @ericrisler why moving from tcp socket 8000 to tcp socket 443 should solve this?!? (your PR doing only this, maybe you missed something / had wrong PR) ?!? By the way, Rockstor gevent part has been replaced using python.socketio instead of gevent, so we probably not using gevent anymore M. P.S.: @phillxnet & @schakrava , missing you all, getting mad with code refactoring on work projects :( |
@MFlyer - that change doesn't move the service to another port...all I did was change the port that the API calls go through from 8000 (direct to gunicorn) to 443 (via nginx). Reading these threads... benoitc/gunicorn#1428 ....hints that py wsgi can keep file descriptors open and not close them causing a file descriptor leak. I then found this post http://carsonip.me/posts/gevent-pywsgi-http-keep-alive-fd-leak that says if you use a reverse proxy such a nginx it'll take care of the http connection in such a way that pywsgi will properly close the FDs. :) As I said, it probably should be tested more and reviewed by Rockstor developers to make sure it doesn't break something else. Assuming that gunicorn is there to server the webui only I figure this change will be ok. As for gevent not being in use: perhaps there is some other underlying issue that affected the users in those posts and Rockstor that is not specific to gevent? My PR is probably better labelled as a workaround instead of a fix. |
Hi @ericrisler, I thing you're missing some Rockstor structure infos: gunicorn port 8000 has never been directly exposed to frontend gui, all Rockstor tcps already proxy passed by nginx, check this: rockstor-core/conf/nginx.conf.in Lines 65 to 74 in 041655a
That General performace update should be having unix socket instead of tcp connections, but this too won't fix that issue: nginx grants us all pending connections closed, "too many open files" happens on btrfs operations (snapshots, balance, scrubs, etc). M. |
Hey @MFlyer - I'm certainly not arguing with you on your points - I saw all that before I made the change. The api_wrapper.py has a comment "# directly connect to gunicorn, bypassing nginx as we are on the same host" so I assume that this wrapper is used internally someplace to make API calls directly to gunicorn and not through nginx. I'm sure you know this of course - the change I make forces all api calls to nginx, which proxy-passes to gunicorn as you stated above. Perhaps this shouldn't be a pull request but a discussion elsewhere? I just did it this way to draw attention to the point that this fixes the issue locally for me and may for others but you're the pro on this project not me so I'll defer to your expertise. Hopefully this discussion will lead to the real underlying cause of the issue. This change certainly solves the issue on my local rockstor machine. If I change the line back to port 8000, restart the rockstor service and open the webui then watch the open files I get a steady growing count form gunicorn. Listing them shows repeated open file descriptions to "/tmp/afp.conf (deleted)". This grows and grows until the error. Changing the port to 443 and hence forcing the calls through nginx and watch the open files the symptoms are gone. Your point on the fact that nginx has a default timeout may be a factor that causes problems if there are long running gunicorn tasks that don't respond to nginx. |
@ericrisler : I believe you and your tests, so if we have gunicorn directly exposed that's not expected Rockstor bahaviour and you've got a nice catch! I'll find time to catch it, that's a promise! ;) M. |
Cool - it's really a great system, it's easy to setup, configure and manage. Thanks for all the efforts! |
Linking back for context and to make it easier to find this pr from our main issue for what this pr is intended to address: @ericrisler and @MFlyer any more progress on where we are to go with this one? @ericrisler This pr has not been forgotten by the way. I'm just in 2 minds as to it's appropriateness. Hopefully we will get more input / validation and I try and find the time to look into this more myself when I can. Thanks for sharing your fix by the way, much appreciated. |
Closing as a general tidy-up of now old pull requests and the suspected initiator of this issue's root cause has now been removed entirely as part of our removing AFP and all associated code: @ericrisler Apologies for never enacting this but we've not had a similar report since our above code removal. Please take a look at our more recent v4 "Built on openSUSE" offering and see if you still experience this. We also now have a DIY or pre-build installer option. |
Typical problem looks like this:
https://forum.rockstor.com/t/exception-while-running-command-usr-bin-hostnamectl-static-errno-24-too-many-open-files/2272/5
...
fix inspired by this post: http://carsonip.me/posts/gevent-pywsgi-http-keep-alive-fd-leak
NOTE: this fix needs review as I don't know if this will cause issues elsewhere. I've update my local rockstor and left the WebUI open and open files from gunicorn remains stable at < 1000 open files.