Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible fix for https://github.com/rockstor/rockstor-core/issues/1656 ("Too many open files" / umlimit issue) #1934

Closed
wants to merge 2 commits into from

Conversation

ericrisler
Copy link

Typical problem looks like this:
https://forum.rockstor.com/t/exception-while-running-command-usr-bin-hostnamectl-static-errno-24-too-many-open-files/2272/5
...
fix inspired by this post: http://carsonip.me/posts/gevent-pywsgi-http-keep-alive-fd-leak

NOTE: this fix needs review as I don't know if this will cause issues elsewhere. I've update my local rockstor and left the WebUI open and open files from gunicorn remains stable at < 1000 open files.

@schakrava
Copy link
Member

Can one of the admins verify this patch?

1 similar comment
@schakrava
Copy link
Member

Can one of the admins verify this patch?

@MFlyer
Copy link
Member

MFlyer commented Jun 8, 2018

Hi @phillxnet , pitchining in on this: @ericrisler why moving from tcp socket 8000 to tcp socket 443 should solve this?!? (your PR doing only this, maybe you missed something / had wrong PR) ?!?

By the way, Rockstor gevent part has been replaced using python.socketio instead of gevent, so we probably not using gevent anymore

M.

P.S.: @phillxnet & @schakrava , missing you all, getting mad with code refactoring on work projects :(

@ericrisler
Copy link
Author

@MFlyer - that change doesn't move the service to another port...all I did was change the port that the API calls go through from 8000 (direct to gunicorn) to 443 (via nginx). Reading these threads...

benoitc/gunicorn#1428
benoitc/gunicorn#1375
benoitc/gunicorn#1327
https://bugs.python.org/issue10099

....hints that py wsgi can keep file descriptors open and not close them causing a file descriptor leak. I then found this post http://carsonip.me/posts/gevent-pywsgi-http-keep-alive-fd-leak that says if you use a reverse proxy such a nginx it'll take care of the http connection in such a way that pywsgi will properly close the FDs. :)

As I said, it probably should be tested more and reviewed by Rockstor developers to make sure it doesn't break something else. Assuming that gunicorn is there to server the webui only I figure this change will be ok.

As for gevent not being in use: perhaps there is some other underlying issue that affected the users in those posts and Rockstor that is not specific to gevent?

My PR is probably better labelled as a workaround instead of a fix.

@MFlyer
Copy link
Member

MFlyer commented Jun 9, 2018

Hi @ericrisler, I thing you're missing some Rockstor structure infos: gunicorn port 8000 has never been directly exposed to frontend gui, all Rockstor tcps already proxy passed by nginx, check this:

location / {
proxy_pass_header Server;
proxy_set_header Host $http_host;
proxy_set_header X-Forwarded-Proto https;
proxy_redirect off;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Scheme $scheme;
proxy_connect_timeout 75;
proxy_read_timeout 120;
proxy_pass http://${init-gunicorn:bind}:${init-gunicorn:port}/;

That http://${init-gunicorn:bind}:${init-gunicorn:port}/ resolves to http://127.0.0.1:8000, so having port 443 rather than 8000 won't solve this issue ( pls consider nginx has a default connection timeout closing them and sending connection closed to underlying wsgi server ).

General performace update should be having unix socket instead of tcp connections, but this too won't fix that issue: nginx grants us all pending connections closed, "too many open files" happens on btrfs operations (snapshots, balance, scrubs, etc).

M.

@ericrisler
Copy link
Author

Hey @MFlyer - I'm certainly not arguing with you on your points - I saw all that before I made the change. The api_wrapper.py has a comment "# directly connect to gunicorn, bypassing nginx as we are on the same host" so I assume that this wrapper is used internally someplace to make API calls directly to gunicorn and not through nginx. I'm sure you know this of course - the change I make forces all api calls to nginx, which proxy-passes to gunicorn as you stated above.

Perhaps this shouldn't be a pull request but a discussion elsewhere? I just did it this way to draw attention to the point that this fixes the issue locally for me and may for others but you're the pro on this project not me so I'll defer to your expertise. Hopefully this discussion will lead to the real underlying cause of the issue.

This change certainly solves the issue on my local rockstor machine. If I change the line back to port 8000, restart the rockstor service and open the webui then watch the open files I get a steady growing count form gunicorn. Listing them shows repeated open file descriptions to "/tmp/afp.conf (deleted)". This grows and grows until the error. Changing the port to 443 and hence forcing the calls through nginx and watch the open files the symptoms are gone.

Your point on the fact that nginx has a default timeout may be a factor that causes problems if there are long running gunicorn tasks that don't respond to nginx.

@MFlyer
Copy link
Member

MFlyer commented Jun 13, 2018

@ericrisler : I believe you and your tests, so if we have gunicorn directly exposed that's not expected Rockstor bahaviour and you've got a nice catch! I'll find time to catch it, that's a promise! ;)

M.

@ericrisler
Copy link
Author

Cool - it's really a great system, it's easy to setup, configure and manage. Thanks for all the efforts!

@phillxnet
Copy link
Member

Linking back for context and to make it easier to find this pr from our main issue for what this pr is intended to address:
"[BUGS] Default ulimit setting too low" #1656

@ericrisler and @MFlyer any more progress on where we are to go with this one?

@ericrisler This pr has not been forgotten by the way. I'm just in 2 minds as to it's appropriateness. Hopefully we will get more input / validation and I try and find the time to look into this more myself when I can. Thanks for sharing your fix by the way, much appreciated.

@phillxnet
Copy link
Member

Closing as a general tidy-up of now old pull requests and the suspected initiator of this issue's root cause has now been removed entirely as part of our removing AFP and all associated code:
https://github.com/rockstor/rockstor-core/releases/tag/3.9.2-56
released on 1st April 2020.

@ericrisler Apologies for never enacting this but we've not had a similar report since our above code removal.

Please take a look at our more recent v4 "Built on openSUSE" offering and see if you still experience this. We also now have a DIY or pre-build installer option.

@phillxnet phillxnet closed this Dec 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants