JDK-8277795: ldap connection timeout not honoured under contention #6568
What testing is there for this fix?
public PooledConnection createPooledConnection(PoolCallback pcb, long timeout)
        throws NamingException {
    return new LdapClient(host, port, socketFactory,
            (int)timeout, readTimeout, trace, pcb);
A blunt cast from long to int is a bit worrying, as it could lead to positive values becoming negative, unless you have checks in place in the calling code that will ensure that the long value is never > Integer.MAX_VALUE? And it could also result in a large value becoming a small positive value.
I'd suggest to remove the inconsistency one way or the other - or add an explicit check to make it obvious that this case cannot happen.
Good point. I've added a check to the cast. (This actually comes from LdapPoolManager.getLdapClient, which takes an int for the connection timeout parameter, but it makes sense to be careful.)
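For illustration only, here is a minimal sketch of what an explicit check around the narrowing could look like. The class and method names (NarrowingCheck, toIntTimeout) are hypothetical and are not taken from the PR; the sketch assumes that failing loudly is acceptable when the value does not fit, which may not be what the actual fix does.

```java
final class NarrowingCheck {
    /**
     * Hypothetical helper: converts a long timeout to int and fails
     * loudly if the value does not fit, instead of silently wrapping.
     */
    static int toIntTimeout(long timeout) {
        try {
            // Math.toIntExact throws ArithmeticException on overflow.
            return Math.toIntExact(timeout);
        } catch (ArithmeticException e) {
            throw new IllegalArgumentException(
                    "timeout out of int range: " + timeout, e);
        }
    }

    public static void main(String[] args) {
        long tooBig = Integer.MAX_VALUE + 1L;
        // A blunt cast wraps a large positive value to a negative one.
        System.out.println((int) tooBig);
        // 2^32 + 1 turns into a small positive value.
        System.out.println((int) 4294967297L);
        // A value that fits is passed through unchanged.
        System.out.println(toIntTimeout(30_000L));
    }
}
```

Whether to throw, clamp, or assert is a design choice; throwing makes the "cannot happen" assumption visible if it is ever violated.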
remaining = checkRemaining(start, remaining);

if (!conns.grabLock(remaining)) {
    throw new NamingException("Timed out waiting for lock");
Is this the appropriate exception? I see in checkRemaining:
throw new CommunicationException(
"Timeout exceeded while waiting for a connection: " +
timeout + "ms");
I've changed this to a CommunicationException.
I've just added a test modelled on LdapTimeoutTest.java (with some whitespace issues, which I'm about to fix!).
env.put("com.sun.jndi.ldap.read.timeout", String.valueOf(READ_MILLIS));
env.put("com.sun.jndi.ldap.connect.timeout", String.valueOf(CONNECT_MILLIS));
env.put("com.sun.jndi.ldap.connect.pool", "true");
env.put(Context.PROVIDER_URL, "ldap://example.com:1234");
I assume this relies on requests to "example.com:1234" failing with a timeout?
If so wouldn't it be safer to create a ServerSocket that never accepts connections?
Otherwise looks OK to me.
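As a rough sketch of the ServerSocket idea (not the test in this PR), the snippet below binds a loopback server that never calls accept() and shows a client read timing out against it. The class and method names are made up for illustration. One caveat worth stating: the operating system normally completes the TCP handshake from the listen backlog, so in this sketch the connect itself succeeds and it is the read that times out; exercising a connect timeout this way would need further care.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class SilentServerSketch {
    /** Returns true if a read from a never-accepting server times out. */
    static boolean readTimesOut(int connectMillis, int readMillis) {
        // Bind to an ephemeral loopback port; accept() is never called.
        try (ServerSocket server =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            client.connect(new InetSocketAddress(
                    server.getInetAddress(), server.getLocalPort()),
                    connectMillis);
            client.setSoTimeout(readMillis);
            try {
                // Nothing ever answers, so this blocks until the read timeout.
                client.getInputStream().read();
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(1000, 200));
    }
}
```

Compared with pointing the test at an external host name, a local socket keeps the test independent of DNS and network configuration.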
The changes look good to me.
A few minor comments:
The last modification year in copyright headers can be updated.
 * @library /test/lib
 *          lib/
 * @run testng/othervm LdapPoolTimeoutTest
 * @bug JDK-8277795
I believe the JDK- prefix is not needed in the @bug tag value.
The tags can also be reordered to follow the recommendations here.
 * connection from the pool.
 * @param timeout the connection timeout
 */
public abstract PooledConnection createPooledConnection(PoolCallback pcb, long timeout)
Why not use int timeout, to be consistent with the existing code?
You've had to "squash" it into an int in the factory?
IIRC this was a request from an earlier review (long being the standard throughout other new public APIs). I'm happy with either, but int does avoid the trouble of casting.
Well I guess the request was "why not use long everywhere to avoid casting to int" ;-)
But I'm happy with either too, as long as the place where you have a long (e.g. obtained by subtracting two nano times) and call a method that takes an int has the proper guards in place, and either asserts, throws, or floors/ceils if the assumptions are not met, provided that a comment explains why that particular alternative was selected.
Good point. I think it's better to deal with the casts at the edges, since the timeout handling will use long by default.
Yes, redeclaring timeout with type long across the component would also be a consistent approach.
So, just to clarify: we're not taking the approach of homogenising the timeout declarations throughout the component to be of type long?
That would see the LdapClientFactory constructor take a long timeout, and the timeout member variables redefined as long.
Not in this issue. I plan to file a follow up bug to make a slight change to the test. I would like to get this issue fixed ASAP and would appreciate the time to take a good look at the transition to a long timeout. (i.e. I'll handle it in that follow up issue)
@@ -65,6 +65,13 @@ public PooledConnection createPooledConnection(PoolCallback pcb)
                connTimeout, readTimeout, trace, pcb);
    }

    public PooledConnection createPooledConnection(PoolCallback pcb, long timeout)
            throws NamingException {
        return new LdapClient(host, port, socketFactory,
Is there any need to sanity-check the timeout supplied here, and in other parts of the solution, against erroneous negative values?
Hmmm... Good point. I had looked into this yesterday when I reviewed - and AFAIU a value <= 0 would be interpreted as no timeout (that is, infinite timeout) - and that seems consistent throughout. It's non obvious - but I convinced myself that passing a negative value here would not necessarily be an error, and would work as expected. However the narrowing down of a negative long to an int doesn't necessarily preserve the sign.
@robm-openjdk the conversion from long to int probably needs to also take care of values that are < Integer.MIN_VALUE.
jshell> long l = Integer.MIN_VALUE * 2L
l ==> -4294967296
jshell> int x = (int)l
x ==> 0
jshell> long l = Integer.MIN_VALUE * 2L + 1
l ==> -4294967295
jshell> int x = (int)l
x ==> 1
(Though I don't think it can happen - but maybe I'm mistaken)
In any case it's safer to sanitize the input.
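One possible way to sanitize the input, sketched under assumptions: clamp the long into the int range so that the sign is always preserved, which matters if a value <= 0 is meant to be read as "no timeout". The names TimeoutClamp and clampToInt are hypothetical and do not come from the PR.

```java
final class TimeoutClamp {
    /**
     * Hypothetical helper: narrows a long to an int by clamping, so a
     * negative input stays negative and a huge input stays huge.
     */
    static int clampToInt(long timeout) {
        if (timeout > Integer.MAX_VALUE) {
            return Integer.MAX_VALUE;
        }
        if (timeout < Integer.MIN_VALUE) {
            return Integer.MIN_VALUE;
        }
        return (int) timeout;
    }

    public static void main(String[] args) {
        long l = Integer.MIN_VALUE * 2L + 1;
        // The blunt cast flips the sign: a negative long becomes 1.
        System.out.println((int) l);
        // Clamping keeps the value negative.
        System.out.println(clampToInt(l));
    }
}
```

Clamping is only one of the alternatives mentioned in the review (assert, throw, floor or ceil); whichever is chosen, a comment explaining why helps the next reader.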
/integrate
Going to push as commit 3d926dd.
Your commit was automatically rebased without conflicts.
@robm-openjdk Pushed as commit 3d926dd. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
/backport jdk18u
@robm-openjdk Unknown command
This fix attempts to resolve an issue where threads can stack up behind each other while waiting to get a connection from the LDAP pool to an unreachable server. It does this by having each thread start a countdown before it acquires the pool's lock (which has been changed to a ReentrantLock). Once the lock has been grabbed, the timeout is adjusted to take the waiting time into account, and the process of getting a connection from the pool or creating a new one commences.
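The countdown-then-lock idea can be sketched roughly as follows. This is not the code from the PR: the class DeadlineLockSketch and its methods are invented for illustration, the sketch assumes a strictly positive timeout in milliseconds, and it leaves out the "no timeout" case entirely.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import javax.naming.CommunicationException;

final class DeadlineLockSketch {
    private final ReentrantLock lock = new ReentrantLock();

    /**
     * Starts the clock before waiting for the lock and returns the
     * milliseconds still available once the lock is held.
     */
    long acquireWithin(long timeoutMillis)
            throws CommunicationException, InterruptedException {
        long start = System.nanoTime();
        if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new CommunicationException("Timed out waiting for lock");
        }
        long waited = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        long remaining = timeoutMillis - waited;
        if (remaining <= 0) {
            lock.unlock();
            throw new CommunicationException(
                    "Timeout exceeded while waiting for a connection: "
                    + timeoutMillis + "ms");
        }
        return remaining; // budget left for creating the connection
    }

    void release() {
        lock.unlock();
    }

    /** Demo: a second thread gives up while the first holds the lock. */
    static boolean contendedAcquireTimesOut() {
        DeadlineLockSketch pool = new DeadlineLockSketch();
        pool.lock.lock(); // calling thread holds the lock throughout
        boolean[] timedOut = {false};
        Thread waiter = new Thread(() -> {
            try {
                pool.acquireWithin(100);
                pool.release();
            } catch (CommunicationException e) {
                timedOut[0] = true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.lock.unlock();
        return timedOut[0];
    }

    public static void main(String[] args) {
        System.out.println("waiter timed out: " + contendedAcquireTimesOut());
    }
}
```

The point of measuring from before the lock wait is that time spent queued behind other threads counts against the same budget as the connection attempt itself.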
Note: this fix also changes the meaning of the connection pool's initSize somewhat. In a situation where we have a large initSize and a small timeout, the first thread could actually exhaust the timeout before creating all of its initial connections. Instead, this fix simply creates a single connection per pool-connection request. It continues to do so for subsequent requests, regardless of whether an existing unused connection is available in the pool, until initSize is exhausted. As such, it may require a CSR.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6568/head:pull/6568
$ git checkout pull/6568
Update a local copy of the PR:
$ git checkout pull/6568
$ git pull https://git.openjdk.java.net/jdk pull/6568/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 6568
View PR using the GUI difftool:
$ git pr show -t 6568
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6568.diff