New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure to iterate over all keys when cancelling keys in a SelectorPool #4471

Merged
merged 1 commit into from Feb 4, 2017

Conversation

Projects
None yet
3 participants
@cheister

cheister commented Feb 3, 2017

When we updated our server from 9.1.2.0 to 9.1.7.0 we started seeing a file descriptor leak which, if left for long enough, would run the machine out of file descriptors.

We tracked down the problem to #3952 which changed how SelectionKeys were closed.

It looks like if exceptions aren't caught while calling cancel on the SelectionKeys then the loop can exit early and leave some keys un-cancelled.

When we went back to the same key cancelling code that was used in 9.1.2.0 the problem went away.

@cheister

This comment has been minimized.

Show comment
Hide comment

cheister commented Feb 3, 2017

@mohamedhafez

This comment has been minimized.

Show comment
Hide comment
@mohamedhafez

mohamedhafez Feb 3, 2017

Contributor

Hmm, what exceptions are being thrown at https://github.com/jruby/jruby/pull/4471/files#diff-f99fc044de4e2aa68b3bb1f212348b07R92 ? If the cancelling fails for some reason and the SelectionKey is still valid and uncancelled, then it will lead to old SelectionKeys being returned in Selectors returned by retrieveFromPool, which is what was causing the problems that lead me to file #3952 in the first place.

If the SelectionKey isn't valid (i.e. calling isValid on it returns false), then it won't cause the problems I observed and I suppose it would be fine to ignore them the way we were doing prior to my PR, but perhaps it would be best to handle them?

Contributor

mohamedhafez commented Feb 3, 2017

Hmm, what exceptions are being thrown at https://github.com/jruby/jruby/pull/4471/files#diff-f99fc044de4e2aa68b3bb1f212348b07R92 ? If the cancelling fails for some reason and the SelectionKey is still valid and uncancelled, then it will lead to old SelectionKeys being returned in Selectors returned by retrieveFromPool, which is what was causing the problems that lead me to file #3952 in the first place.

If the SelectionKey isn't valid (i.e. calling isValid on it returns false), then it won't cause the problems I observed and I suppose it would be fine to ignore them the way we were doing prior to my PR, but perhaps it would be best to handle them?

@cheister

This comment has been minimized.

Show comment
Hide comment
@cheister

cheister Feb 3, 2017

The exceptions are getting swallowed somewhere so I haven't seen what they are. I could try adding logging to capture them if you're interested?

From what you're saying it sounds like I should remove the if (key.isValid()) check from the loop?

cheister commented Feb 3, 2017

The exceptions are getting swallowed somewhere so I haven't seen what they are. I could try adding logging to capture them if you're interested?

From what you're saying it sounds like I should remove the if (key.isValid()) check from the loop?

@mohamedhafez

This comment has been minimized.

Show comment
Hide comment
@mohamedhafez

mohamedhafez Feb 3, 2017

Contributor

@cheister yeah could you put in logging to see what the exceptions are, and also to check the status of key.isValid() inside the catch statement at https://github.com/jruby/jruby/pull/4471/files#diff-f99fc044de4e2aa68b3bb1f212348b07R93

Contributor

mohamedhafez commented Feb 3, 2017

@cheister yeah could you put in logging to see what the exceptions are, and also to check the status of key.isValid() inside the catch statement at https://github.com/jruby/jruby/pull/4471/files#diff-f99fc044de4e2aa68b3bb1f212348b07R93

@cheister

This comment has been minimized.

Show comment
Hide comment
@cheister

cheister Feb 3, 2017

The exception being thrown was an NPE because the key was null. So changing the code to check for null also makes it work.

cheister commented Feb 3, 2017

The exception being thrown was an NPE because the key was null. So changing the code to check for null also makes it work.

@headius headius merged commit 3f97c7a into jruby:master Feb 4, 2017

1 check failed

continuous-integration/travis-ci/pr The Travis CI build failed
Details
@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 4, 2017

Member

Great find, thank you!

Member

headius commented Feb 4, 2017

Great find, thank you!

@headius headius added this to the JRuby 9.1.8.0 milestone Feb 4, 2017

@cheister cheister deleted the cheister:file-descriptor-leak branch Feb 4, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment