
Bridges - Too Many Files Open, Consecutive Sessions Create #1361

Closed
mingtaiha opened this issue Jun 2, 2017 · 6 comments

@mingtaiha
Contributor

I am automating my experiment runs, where each run executes some number of Synapse emulations (CUs). The first few runs complete successfully (with some failed CUs, but that's another ticket). However, subsequent runs give the following errors, which look like variants of the same problem.

Too many open files (src/epoll.cpp:50)
Caught Exception unexpected child message () [5.0]
Caught Exception unexpected child message (error: RuntimeError("initialize: IOError(24, 'Too many open files')",)) [5.0]
Caught Exception unexpected child message (error: RuntimeError("initialize: ZMQError('Too many open files')",)) [5.0]

I could not find any other errors in the log files. The error only appears after RP prints the database it uses to the terminal, namely database : [mongodb:// ....... ]

When I restart my experiments, the same situation occurs; the first few runs execute correctly, but all subsequent runs encounter the same error.

I will add a 5-minute pause between runs to see whether the problem is caused by starting sessions too quickly.
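"Too many open files" is the OS error EMFILE: it is raised once a process holds as many open file descriptors as its RLIMIT_NOFILE soft limit allows. A minimal sketch for watching this between runs, assuming a POSIX system where `/proc/self/fd` (Linux) or `/dev/fd` (macOS) lists the process's descriptors; the helper names are hypothetical:

```python
import os
import resource  # POSIX only (Linux, macOS); not available on Windows


def open_fd_count():
    """Approximate number of open descriptors in this process."""
    for path in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(path):
            # os.listdir() briefly opens the directory itself, which shows up
            # in the listing, so subtract that one entry
            return len(os.listdir(path)) - 1
    raise RuntimeError("no descriptor directory available on this platform")


def fd_headroom():
    """Return (open_count, soft_limit); EMFILE occurs when open_count hits soft_limit."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return open_fd_count(), soft
```

Logging `fd_headroom()` after each run shows whether the count keeps climbing from session to session (a leak) or stays flat (in which case the pause between runs would not matter).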

@andre-merzky
Member

What versions are you running, and on what resource? Is session.close() called? A small script that reproduces this would be great!
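One way to make sure session.close() is called for every run, even when a run fails, is `contextlib.closing()`, which works with any object that has a `close()` method. A sketch under the assumption that `make_session` builds an RP session and `run_one` executes one experiment; both, and `DummySession`, are hypothetical stand-ins for illustration:

```python
import contextlib


class DummySession:
    """Hypothetical stand-in for an RP session; only close() matters here."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


def run_experiments(n_runs, make_session, run_one):
    """Run n_runs experiments, one session each, and return (session, result) pairs."""
    results = []
    for i in range(n_runs):
        # closing() calls session.close() when the block exits,
        # whether run_one() returns normally or raises
        with contextlib.closing(make_session(f"run-{i}")) as session:
            try:
                results.append((session, run_one(session)))
            except Exception as exc:  # record the failure, go on to the next run
                results.append((session, exc))
    return results
```

This only guarantees that close() is reached; whether close() actually releases every descriptor is a separate question.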

Thanks, Andre

@andre-merzky
Member

I can now reproduce this. Tracking it down will unfortunately take a while...

@mingtaiha
Contributor Author

@andre-merzky FYI, I also get this error when submitting jobs to Comet.

@andre-merzky
Member

Yeah, that is somewhat independent of the target resource.

@andre-merzky
Member

So, here is the catch: Python's logging module does not really allow us to reclaim log handles. The documentation says:

logging.shutdown()
    Informs the logging system to perform an orderly shutdown by flushing and closing all handlers.
    This should be called at application exit and no further use of the logging system should be
    made after this call.

The last statement kind of makes this unusable for your specific use case: the logging system will be used again in the next session. So, we keep accumulating log handles along the way. I tried to close handles manually, but that does not seem to work; I assume the module keeps private handles.
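For reference, the standard-library way to release a single logger's file descriptors without calling logging.shutdown() is to detach each handler and close it. A sketch, assuming one `FileHandler` per session under a hypothetical `session.<id>` logger name; it shows the usual pattern and is not a confirmed fix for RP, where manual closing reportedly was not enough:

```python
import logging
import os


def make_session_logger(session_id, log_dir):
    """Attach a per-session FileHandler (the naming scheme is hypothetical)."""
    logger = logging.getLogger(f"session.{session_id}")
    handler = logging.FileHandler(os.path.join(log_dir, f"{session_id}.log"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger


def close_session_logger(logger):
    """Detach and close every handler so its file descriptor is released."""
    for handler in list(logger.handlers):  # iterate over a copy; we mutate the list
        logger.removeHandler(handler)
        handler.close()  # FileHandler.close() flushes and closes its stream
```

Note that the Logger objects returned by logging.getLogger() stay registered in the logging module for the life of the process, so they still accumulate across sessions; they just no longer hold open files once their handlers are closed.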

This is a bloody mess. I'll keep on it, but if that logging problem does not go away, we either have to rethink our generous use of logging handles (ugh), or write our own logging module (UGH), or live with that limitation (gah!).

We were also accumulating socket handles, but that part is fixed now, so you should be able to run more sessions in an application than before. The exact number depends on system-specific settings, so your mileage may vary. You may want to set RU_USE_PYPOLL=1 (export RU_USE_PYPOLL=1) to get a somewhat higher limit, but that won't work on MacOS (select.poll() is not implemented there).
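A likely reason poll() helps: select.select() can only watch descriptors whose numbers are below FD_SETSIZE (commonly 1024), and CPython raises ValueError for larger ones, while select.poll() has no such cap. The function below is a hypothetical illustration of a poll-with-select-fallback, not radical.utils code; it assumes only that RU_USE_PYPOLL selects a poll()-based path:

```python
import os
import select


def wait_readable(fds, timeout_s, use_poll=True):
    """Return the fds from `fds` that become ready within timeout_s seconds."""
    if use_poll and hasattr(select, "poll"):  # poll() is missing on some platforms
        poller = select.poll()
        for fd in fds:
            poller.register(fd, select.POLLIN)
        # poll() takes milliseconds; it reports fds with any event
        # (readable, hang-up, or error)
        return [fd for fd, _event in poller.poll(int(timeout_s * 1000))]
    # select() fallback: fails for descriptor numbers >= FD_SETSIZE
    readable, _, _ = select.select(fds, [], [], timeout_s)
    return readable
```

The `hasattr` check mirrors the MacOS caveat above: where poll() is unavailable, the code silently keeps the lower select() limit.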

Let's discuss on Monday whether this is worth holding up the release; either way, I don't see a quick fix. FWIW, I have not yet merged the partial fix into devel.

@andre-merzky
Member

A partial fix for this has now been merged into devel. The problem is not completely solved, but the resource leakage has been reduced to a level where one can create about 60 sessions per application. I'll open a new ticket as a reminder that this needs more work (see #1387).
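Since the remaining limit is per application (per process), an automated experiment driver can sidestep it by running each experiment in a fresh interpreter: the OS reclaims every descriptor a process holds when it exits. A sketch assuming each run can be started as a separate Python script; the script name in the usage line is hypothetical:

```python
import subprocess
import sys


def run_isolated(script_args, timeout_s=None):
    """Run one experiment in a new Python process and return its exit code.

    Log files and sockets leaked by the child are released when it exits,
    so the driver's own descriptor count does not grow across runs.
    """
    completed = subprocess.run([sys.executable, *script_args], timeout=timeout_s)
    return completed.returncode
```

Usage would look like `for i in range(n_runs): run_isolated(["run_experiment.py", str(i)])`, which trades some interpreter start-up time for a clean descriptor table per run.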
