
JDK-8277795: ldap connection timeout not honoured under contention #6568

Closed
wants to merge 7 commits into from


robm-openjdk
Member

@robm-openjdk robm-openjdk commented Nov 25, 2021

This fix attempts to resolve an issue where threads can stack up on each other while waiting to get a connection from the LDAP pool to an unreachable server. It does this by having each thread start a countdown before acquiring the pool's lock (which has been changed to a ReentrantLock). Once the lock has been acquired, the timeout is adjusted to account for the time spent waiting, and the process of getting a connection from the pool, or creating a new one, begins.
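The countdown-then-lock approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pool code from this PR: DeadlineLockSketch and withConnection are hypothetical names, the timeout is assumed to be a positive number of milliseconds, and java.util.concurrent.TimeoutException stands in for the CommunicationException the real code throws.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongFunction;

// Hypothetical sketch (not the PR's pool code): one timeout budget covers both
// waiting for the pool lock and creating or fetching a connection.
final class DeadlineLockSketch {
    private final ReentrantLock lock = new ReentrantLock();

    /** Milliseconds of timeoutMillis left since startNanos, never negative. */
    static long remainingMillis(long startNanos, long timeoutMillis) {
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
        return Math.max(0L, timeoutMillis - elapsedMillis);
    }

    /**
     * Assumes timeoutMillis > 0. The countdown starts before contending for
     * the lock; whatever is left after acquiring it is passed to connect.
     */
    <T> T withConnection(long timeoutMillis, LongFunction<T> connect)
            throws InterruptedException, TimeoutException {
        long start = System.nanoTime(); // start the clock before queueing on the lock
        if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("Timed out waiting for lock");
        }
        try {
            long remaining = remainingMillis(start, timeoutMillis);
            if (remaining == 0) {
                throw new TimeoutException("Timeout exceeded while waiting for a connection");
            }
            return connect.apply(remaining); // only the time left is available for connecting
        } finally {
            lock.unlock();
        }
    }
}
```

Starting the clock before tryLock is what makes time spent queued behind other threads count against the caller's connect timeout.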

Note: this fix also changes the meaning of the connection pool's initSize somewhat. With a large initSize and a small timeout, the first thread could exhaust the timeout before creating all of its initial connections. Instead, this fix creates a single connection per pool connection request, and keeps doing so for subsequent requests, even if an unused connection is already available in the pool, until initSize connections have been created. As such it may require a CSR.
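One way to picture the revised initSize behaviour is the toy pool below. It is a hypothetical sketch (InitSizePoolSketch is not a JDK class) that ignores maxSize, prefSize and idle timeouts, and assumes the connection created for a request is handed straight back to the requester.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical sketch (not the PR's pool implementation) of the revised initSize
// meaning. maxSize, prefSize and idle timeouts are deliberately ignored.
final class InitSizePoolSketch<C> {
    private final int initSize;
    private final Supplier<C> factory;
    private final Deque<C> idle = new ArrayDeque<>();
    private int created;

    InitSizePoolSketch(int initSize, Supplier<C> factory) {
        this.initSize = initSize;
        this.factory = factory;
    }

    synchronized C get() {
        // Until initSize connections exist, every request creates exactly one
        // new connection, even if an idle one is already sitting in the pool.
        if (created < initSize || idle.isEmpty()) {
            created++;
            return factory.get();
        }
        return idle.pop(); // afterwards, reuse idle connections first
    }

    synchronized void release(C conn) {
        idle.push(conn);
    }

    synchronized int created() {
        return created;
    }
}
```

Each request therefore pays for at most one connection attempt, so a single caller cannot burn its whole timeout on initSize connects.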


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed
  • Change requires a CSR request to be approved

Issues

  • JDK-8277795: ldap connection timeout not honoured under contention
  • JDK-8280829: ldap connection timeout not honoured under contention (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6568/head:pull/6568
$ git checkout pull/6568

Update a local copy of the PR:
$ git checkout pull/6568
$ git pull https://git.openjdk.java.net/jdk pull/6568/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6568

View PR using the GUI difftool:
$ git pr show -t 6568

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6568.diff

@bridgekeeper

bridgekeeper bot commented Nov 25, 2021

👋 Welcome back robm! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Nov 25, 2021

@robm-openjdk The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Nov 25, 2021
@robm-openjdk robm-openjdk marked this pull request as ready for review November 26, 2021 00:10
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 26, 2021
@mlbridge

mlbridge bot commented Nov 26, 2021

Webrevs

Member

@dfuch dfuch left a comment

What testing is there for this fix?

    public PooledConnection createPooledConnection(PoolCallback pcb, long timeout)
            throws NamingException {
        return new LdapClient(host, port, socketFactory,
                (int)timeout, readTimeout, trace, pcb);

Member

A blunt cast from long to int is a bit worrying: unless the calling code has checks in place ensuring that the long value is never > Integer.MAX_VALUE, it could turn positive values negative, or turn a large value into a small positive one.
I'd suggest removing the inconsistency one way or the other, or adding an explicit check to make it obvious that this case cannot happen.

Member Author

Good point. I've added a check to the cast. (This actually comes from LdapPoolManager.getLdapClient, which takes an int for the connection timeout parameter, but it makes sense to be careful.)

        remaining = checkRemaining(start, remaining);

        if (!conns.grabLock(remaining)) {
            throw new NamingException("Timed out waiting for lock");

Member

Is this the appropriate exception? I see in checkRemaining:

                throw new CommunicationException(
                        "Timeout exceeded while waiting for a connection: " +
                                timeout + "ms");

Member Author

I've changed this to a CommunicationException.

@openjdk openjdk bot removed the rfr Pull request is ready for review label Nov 26, 2021
@robm-openjdk
Member Author

What testing is there for this fix?

I've just added a test modelled on LdapTimeoutTest.java (with some whitespace issues, which I'm about to fix!).

@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 27, 2021
        env.put("com.sun.jndi.ldap.read.timeout", String.valueOf(READ_MILLIS));
        env.put("com.sun.jndi.ldap.connect.timeout", String.valueOf(CONNECT_MILLIS));
        env.put("com.sun.jndi.ldap.connect.pool", "true");
        env.put(Context.PROVIDER_URL, "ldap://example.com:1234");

Member

I assume this relies on requests to "example.com:1234" timing out?
If so, wouldn't it be safer to create a ServerSocket that never accepts connections?

Otherwise looks OK to me.
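A rough sketch of the kind of local server suggested above, assuming a loopback ServerSocket whose accept() is simply never called (SilentServerSketch and readTimesOut are hypothetical names, not part of the test in this PR). Note that on common platforms the OS still completes the TCP handshake for connections queued in the listen backlog, so a server like this tends to exercise the read timeout rather than the connect timeout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical helper: a local server that listens but never accepts or replies,
// so nothing depends on how an external host like example.com behaves.
final class SilentServerSketch {
    /** Returns true if reading from the never-accepting server times out. */
    static boolean readTimesOut(int readTimeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Ephemeral loopback port, backlog of 1; accept() is deliberately never called.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            // The handshake is normally completed by the OS from the backlog.
            client.connect(new InetSocketAddress(loopback, server.getLocalPort()), 1_000);
            client.setSoTimeout(readTimeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read(); // the server never sends anything
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        }
    }
}
```

Binding to an ephemeral loopback port also avoids depending on DNS or on whatever happens to be listening on a remote host.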

@openjdk

openjdk bot commented Jan 11, 2022

@robm-openjdk This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8277795: ldap connection timeout not honoured under contention

Reviewed-by: dfuchs, aefimov

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 82 new commits pushed to the master branch:

  • a1d1e47: 8280823: Remove NULL check in DumpTimeClassInfo::is_excluded
  • 094db1a: 8277948: AArch64: Print the correct native stack if -XX:+PreserveFramePointer when crash
  • 7857405: 8280744: Allow SuppressWarnings to be used in all declaration contexts
  • 40a2ce2: 8270476: Make floating-point test infrastructure more lambda and method reference friendly
  • 6d242e4: 8280835: jdk/javadoc/tool/CheckManPageOptions.java depends on source hierarchy
  • ece89c6: 8280366: (fs) Restore Files.createTempFile javadoc
  • b94ebaa: 8280686: Remove Compile::print_method_impl
  • a3a0dcd: 8280353: -XX:ArchiveClassesAtExit should print warning if base archive failed to load
  • cab5905: 8280583: Always build NMT
  • 7f68759: 8280719: G1: Remove outdated comment in RemoveSelfForwardPtrObjClosure::apply
  • ... and 72 more: https://git.openjdk.java.net/jdk/compare/ab2c8d3c9baf1080f436287785e4e02fd79953a7...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 11, 2022
Member

@AlekseiEfimov AlekseiEfimov left a comment

The changes look good to me.
A few minor comments:
The last modification year in copyright headers can be updated.

 * @library /test/lib
 *          lib/
 * @run testng/othervm LdapPoolTimeoutTest
 * @bug JDK-8277795

Member

I believe the JDK- prefix is not needed in the @bug tag value.
The tags could also be reordered to follow the recommendations here.

     *        connection from the pool.
     * @param timeout the connection timeout
     */
    public abstract PooledConnection createPooledConnection(PoolCallback pcb, long timeout)

Why not use an int timeout, to be consistent with the existing code?
Otherwise you're required to "squash" it into an int in the factory?

Member Author

@robm-openjdk robm-openjdk Jan 19, 2022

IIRC this was requested in an earlier review (long being the standard throughout other new public APIs). I'm happy with either, but int does avoid the trouble of casting.

Member

Well I guess the request was "why not use long everywhere to avoid casting to int" ;-)
But I'm happy with either too, as long as the place where you have a long (e.g. obtained by subtracting two nano times) and call a method that takes an int has the proper guards in place, and either asserts, throws, floors or ceils if the assumptions are not met, provided that a comment explains why that particular alternative was selected.

Member Author

Good point. I think it's better to deal with the casts at the edges, since the timeout handling will use long by default.

Yes, redeclaring timeout as type long across the component would also be a consistent approach.

So just to clarify: we're not taking the approach of homogenising the timeout declarations throughout the component to be of type long?

That would see the LdapClientFactory constructor take a long timeout, and the timeout member variables be redefined as long.

Member Author

@robm-openjdk robm-openjdk Jan 27, 2022

Not in this issue. I plan to file a follow-up bug to make a slight change to the test. I would like to get this issue fixed ASAP, and would appreciate the time to take a good look at the transition to a long timeout (i.e. I'll handle it in that follow-up issue).

@@ -65,6 +65,13 @@ public PooledConnection createPooledConnection(PoolCallback pcb)
                connTimeout, readTimeout, trace, pcb);
    }

    public PooledConnection createPooledConnection(PoolCallback pcb, long timeout)
            throws NamingException {
        return new LdapClient(host, port, socketFactory,

Is there any need to sanity-check the timeout supplied here, and in other parts of the solution, against erroneous negative values?

Member

Hmmm... Good point. I had looked into this yesterday when I reviewed, and AFAIU a value <= 0 would be interpreted as no timeout (that is, an infinite timeout), which seems consistent throughout. It's not obvious, but I convinced myself that passing a negative value here would not necessarily be an error, and would work as expected. However, narrowing a negative long to an int doesn't necessarily preserve the sign.
@robm-openjdk the conversion from long to int probably needs to also take care of values that are < Integer.MIN_VALUE.

jshell> long l = Integer.MIN_VALUE * 2L
l ==> -4294967296

jshell> int x = (int)l
x ==> 0

jshell> long l = Integer.MIN_VALUE * 2L + 1
l ==> -4294967295

jshell> int x = (int)l
x ==> 1

Member

(Though I don't think it can happen - but maybe I'm mistaken)
In any case it's safer to sanitize the input.
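A minimal sketch of one possible sanitization, assuming the "<= 0 means no timeout" convention described above; TimeoutNarrowing and toIntTimeout are hypothetical names, not code from this PR. It clamps rather than throws; Math.toIntExact would be the throwing alternative, since it raises ArithmeticException when the value does not fit in an int.

```java
// Hypothetical helper: narrows a long millisecond timeout to an int without
// letting a plain cast flip its sign or wrap it into a small value.
final class TimeoutNarrowing {
    static int toIntTimeout(long timeoutMillis) {
        if (timeoutMillis <= 0) {
            // Keep the "no timeout" meaning: (int) -4294967295L would become 1.
            return 0;
        }
        // Clamp instead of truncating: a plain cast would wrap large values.
        return (int) Math.min(timeoutMillis, Integer.MAX_VALUE);
    }
}
```

With this helper both jshell inputs above map to 0 (no timeout), whereas the plain cast turns the second one into a 1 ms timeout.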

@openjdk openjdk bot added csr Pull request needs approved CSR before integration and removed ready Pull request is ready to be integrated csr Pull request needs approved CSR before integration labels Jan 27, 2022
@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 28, 2022
@robm-openjdk
Member Author

/integrate

@openjdk

openjdk bot commented Feb 4, 2022

Going to push as commit 3d926dd.
Since your change was applied there have been 163 commits pushed to the master branch:

  • 51b53a8: 8280913: Create a regression test for JRootPane.setDefaultButton() method
  • 46c6c6f: 8281043: Intrinsify recursive ObjectMonitor locking for PPC64
  • c936e70: 8280593: [PPC64, S390] redundant allocation of MacroAssembler in StubGenerator ctor
  • 63e11cf: 8280970: Cleanup dead code in java.security.Provider
  • e44dc63: 8271055: Crash during deoptimization with "assert(bb->is_reachable()) failed: getting result from unreachable basicblock" with -XX:+VerifyStack
  • b6935df: 8251505: Use of types in compiler shared code should be consistent.
  • 130cf46: 4750574: (se spec) Selector spec should clarify calculation of select return value
  • cda9c30: 8278753: Runtime crashes with access violation during JNI_CreateJavaVM call
  • 86c24b3: 8240908: RetransformClass does not know about MethodParameters attribute
  • 1f92660: 8281057: Fix doc references to overriding in JLS
  • ... and 153 more: https://git.openjdk.java.net/jdk/compare/ab2c8d3c9baf1080f436287785e4e02fd79953a7...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Feb 4, 2022
@openjdk openjdk bot closed this Feb 4, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 4, 2022
@openjdk

openjdk bot commented Feb 4, 2022

@robm-openjdk Pushed as commit 3d926dd.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@robm-openjdk
Member Author

/backport jdk18u

@openjdk

openjdk bot commented Feb 7, 2022

@robm-openjdk Unknown command backport - for a list of valid commands use /help.

Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated

4 participants