Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8275535: Retrying a failed authentication on multiple LDAP servers can lead to users blocked #6043

Closed
wants to merge 1 commit into from

Conversation

martinuy
Copy link
Contributor

@martinuy martinuy commented Oct 20, 2021

I'd like to propose a fix for JDK-8275535. This fix reverts the behavior to the state previous to JDK-8160768, where an authentication failure stops from trying other LDAP servers with the same credentials [1]. After JDK-8160768 we have 2 possible loops to stop: the one that iterates over different URLs and the one that iterates over different endpoints (after a DNS query that returns multiple values).

No test regressions observed in jdk/com/sun/jndi/ldap.

--
[1] - https://hg.openjdk.java.net/jdk/jdk/rev/a609d549992a#l2.137


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change requires a CSR request to be approved
  • Change must be properly reviewed (1 review required, with at least 1 reviewer)

Issues

  • JDK-8275535: Retrying a failed authentication on multiple LDAP servers can lead to users blocked
  • JDK-8276959: Retrying a failed authentication on multiple LDAP servers can lead to users blocked (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6043/head:pull/6043
$ git checkout pull/6043

Update a local copy of the PR:
$ git checkout pull/6043
$ git pull https://git.openjdk.java.net/jdk pull/6043/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6043

View PR using the GUI difftool:
$ git pr show -t 6043

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6043.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 20, 2021

👋 Welcome back mbalao! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 20, 2021
@openjdk
Copy link

openjdk bot commented Oct 20, 2021

@martinuy The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Oct 20, 2021
@mlbridge
Copy link

mlbridge bot commented Oct 20, 2021

Webrevs

@AlekseiEfimov
Copy link
Member

Hi Martin,

The change looks reasonable to me.
I would suggest having a CSR logged for this change due to the following behavioral incompatibility:
Before the change - all available endpoints/URLs are tried to create an LDAP context.
With the proposed change - incorrect credentials will prevent other endpoints to be exercised to create an LDAP context.

Having a CSR will also help to document difference in handling AuthenticationException and NamingException during construction of an LDAP context from the list of endpoints acquired from a LDAP DNS provider.

@martinuy
Copy link
Contributor Author

Hi @AlekseiEfimov ,

Thanks for your feedback. I'll open a CSR as suggested and wait for approval before integrating this fix. With that said, I could not find information in the CSR associated to JDK-8160768 (JDK-8192975) about this behavioral change. My intention here is to restore the previous JDK behavior; and not to introduce a new behavior or revert a previously-approved one.

Martin.-

@martinuy
Copy link
Contributor Author

@dfuch
Copy link
Member

dfuch commented Nov 10, 2021

/csr needed

@openjdk openjdk bot added the csr Pull request needs approved CSR before integration label Nov 10, 2021
@openjdk
Copy link

openjdk bot commented Nov 10, 2021

@dfuch this pull request will not be integrated until the CSR request JDK-8276959 for issue JDK-8275535 has been approved.

@martinuy
Copy link
Contributor Author

Hi @AlekseiEfimov

Can you please review the CSR [1]?

Thanks,
Martin.-

--
[1] - https://bugs.openjdk.java.net/browse/JDK-8276959

@bridgekeeper
Copy link

bridgekeeper bot commented Dec 16, 2021

@martinuy This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@martinuy
Copy link
Contributor Author

@martinuy This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

Please do not close, waiting for CSR approval.

@openjdk openjdk bot removed the csr Pull request needs approved CSR before integration label Dec 16, 2021
@michael-o
Copy link

michael-o commented Dec 17, 2021

@martinuy, I am the reporter of JDK-8160768. Regarding this PR, isn't everything protocol related a fail-fast issue? E.g., if the socket is up and running, but the LDAP message is rejected can we assume that all subsequent servers for the same resolution will reject the request as well before authentication has happened?

The purpose of JDK-8160768 was to discover LDAP servers and connect to the first one reachable. BTW, this code has been running for years now at work: https://github.com/michael-o/activedirectory-dns-locator

@michael-o
Copy link

@martinuy Also your Compatibility Risk talks about KDCs, but this is about directory servers. Not sure how this relates here.

@bridgekeeper
Copy link

bridgekeeper bot commented Jan 15, 2022

@martinuy This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@jerboaa
Copy link
Contributor

jerboaa commented Feb 8, 2022

@martinuy The CSR is approved, fwiw.

@martinuy
Copy link
Contributor Author

martinuy commented Feb 8, 2022

@martinuy Also your Compatibility Risk talks about KDCs, but this is about directory servers. Not sure how this relates here.

Correct, it was an unconscious mistake :) I will try to get this fixed (as the CSR was approved, I'll ask before editing directly).

@martinuy
Copy link
Contributor Author

martinuy commented Feb 8, 2022

@martinuy, I am the reporter of JDK-8160768. Regarding this PR, isn't everything protocol related a fail-fast issue? E.g., if the socket is up and running, but the LDAP message is rejected can we assume that all subsequent servers for the same resolution will reject the request as well before authentication has happened?

It looks to me that it's not only a fail-fast issue because the state on the directory side might be altered by each try, as it happened in the reported case. In other words, the client might cause a denial-of-service blocking an account by trying multiple times the same incorrect authentication credentials on each resolved server.

In regards to the 2nd question, I guess that we cannot assume that. But the revert is intended for failed authentication only.

Is there a risk that you foresee by reverting the failed-authentication behavior back to pre-JDK-8160768?

@michael-o
Copy link

@martinuy, I am the reporter of JDK-8160768. Regarding this PR, isn't everything protocol related a fail-fast issue? E.g., if the socket is up and running, but the LDAP message is rejected can we assume that all subsequent servers for the same resolution will reject the request as well before authentication has happened?

It looks to me that it's not only a fail-fast issue because the state on the directory side might be altered by each try, as it happened in the reported case. In other words, the client might cause a denial-of-service blocking an account by trying multiple times the same incorrect authentication credentials on each resolved server.

Let me add a few points for consideration from my usecases, since I don't use any password-based authentication with SASL and Active Directory only:

  • SASL EXTERNAL: What if the certificate is rejected? Trust store is not properly populated, CRL misconfigured, etc. Will an exception also thrown here?
  • SASL GSSAPI/GSS-SPNEGO: Dir server does not manage creds, but KDC does. I am thinking whether fail fast is reasonable. SPN not registered, next server might have. replay/out of sequence, etc.

In regards to the 2nd question, I guess that we cannot assume that. But the revert is intended for failed authentication only.

But the auth is part of the LDAP message as well since the auth does not happen out of band. I would expect that any non-transport issue should fail fast.

Is there a risk that you foresee by reverting the failed-authentication behavior back to pre-JDK-8160768?

Not really, I virtually never had problems, thought might need annotations when this does not work, see above.

@martinuy
Copy link
Contributor Author

martinuy commented Feb 9, 2022

Thanks @michael-o for your input. I see the observations that you raised. In my view, we should revert to the previous behavior (as there are users currently affected by the change), and perhaps we can discuss future improvements from there, based on actual cases.

@AlekseiEfimov can you please review this change? We need the formal review-approval to move forward.

@michael-o
Copy link

@martinuy Agree

@dfuch
Copy link
Member

dfuch commented Feb 9, 2022

Is the new behavior something that could be tested with a new non regression test?
See existing tests in test/jdk/com/sun/jndi/ldap

@AlekseiEfimov
Copy link
Member

Hi Martin,

The source changes looks good to me.

In case you have an LDAP server that can be used to reproduce this problem then maybe you could try to create a test that uses classes from LDAP test library (LDAPServer,LDAPTestUtils)?

In a nutshell, it could be done by following these steps:

  • Create a regression tests which uses LDAPServer/LDAPTestUtils similar to tests available in test/jdk/com/sun/jndi/ldap/blits/AddTests and test/jdk/javax/naming/module/src/test/test
  • Add -trace flag (see test/jdk/com/sun/jndi/ldap/lib/LDAPTestUtils.java for details) as an argument to the test app. When this argument is passed to LDAPTestUtils.initEnv (see first snippet below) '.ldap' trace file is generated by a framework. This is where the real LDAP server is required.
  • Once .ldap trace file is collected use LDAPTestUtils.initEnv and a ServerSocket to create a test server that will replay the collected trace file (see second snippet below).

Snippet 1: How trace file can be collected:

/*
 * @test
 * @library /test/lib ../../lib
 * @build LDAPServer LDAPTestUtils /javax/naming/module/src/test/test/
 * @run main/othervm TraceExampleTest -trace
 */

public static void collectTrace(String [] args) throws Exception {
    Hashtable<Object, Object> env;

    // initialize test
    env = LDAPTestUtils.initEnv(null, "ldap://127.0.0.1:1389",
            TraceExampleTest.class.getName(), args, true);

    DirContext ctx = null;
    // connect to server
    ctx = new InitialDirContext(env);
}

Snippet 2: How trace file can be used (note that the trace file name should match the test class name in this example) it can be used to create an instance of the LDAPServer:

/*
 * @test
 * @library /test/lib ../../lib
 * @build LDAPServer LDAPTestUtils /javax/naming/module/src/test/test/
 * @run main/othervm TraceExampleTest
 */
 
public static void runWithTrace(String [] args) throws Exception {
    // Create unbound server socket
    ServerSocket serverSocket = new ServerSocket();

    // Bind it to the loopback address
    SocketAddress sockAddr = new InetSocketAddress(
            InetAddress.getLoopbackAddress(), 0);
    serverSocket.bind(sockAddr);

    // Construct the provider URL for LDAPTestUtils
    String providerURL = URIBuilder.newBuilder()
            .scheme("ldap")
            .loopback()
            .port(serverSocket.getLocalPort())
            .buildUnchecked().toString();

    Hashtable<Object, Object> env;

    // initialize test
    env = LDAPTestUtils.initEnv(serverSocket, providerURL,
            TraceExampleTest.class.getName(), args, true);

    DirContext ctx = null;
    // connect to the test LDAP server
    ctx = new InitialDirContext(env);
}

@martinuy
Copy link
Contributor Author

martinuy commented Feb 9, 2022

Unfortunately I don't have access to the environment where this problem reproduces and will be difficult/impossible for me to get a real trace from there. What I can say, though, is that the fail-fast authentication behavior previous to the changes in JDK-8160768 was working fine in such environment. Besides that, we didn't have any users reporting issues regarding authentication.

The change to revert to the previous behavior is, in my view, trivial. I can try to build a whole new environment that reproduces this problem or see if it's feasible to mock something, but before getting into that I need to understand what the concerns or motivation for that are. This would require more time than originally planned and might postpone this for a while.

Martin.-

@dfuch
Copy link
Member

dfuch commented Feb 9, 2022

The concerns are two folds:

  • without a regression test, you have to trust that the source changes actually work, and there's typically no verification possible
  • without a regression test, the next refactoring change in this area two years from now could undo the fix and nobody would notice

I agree that the changes seem safe and given the history seem to be doing what the fix/PR says they do, so I'd be more concerned with the latter. So if it's practical to add a test I'd rather have one. If the behavior is too hard to test - then the appropriate keyword would be noreg-hard rather than noreg-trivial (I am not sure trivial actually qualifies here - I can see no obvious flaw but I'm not sure we can say that there are obviously no flaws - or that a flaw would become obvious to future maintainers if that code was refactored in a way that removed the fix).

@martinuy
Copy link
Contributor Author

martinuy commented Feb 9, 2022

The concerns are two folds:

  • without a regression test, you have to trust that the source changes actually work, and there's typically no verification possible
  • without a regression test, the next refactoring change in this area two years from now could undo the fix and nobody would notice

Thanks, that was what I initially thought but wanted to check if I was missing something else given the previous discussion. I should be able to generate a build for the user and ask him to test in his real environment. As for the other concern, I'll evaluate options and let you know.

@AlekseiEfimov
Copy link
Member

Thanks, that was what I initially thought but wanted to check if I was missing something else given the previous discussion. I should be able to generate a build for the user and ask him to test in his real environment. As for the other concern, I'll evaluate options and let you know.

Thanks for the update, Martin.
I'm ok with pushing the fix without a test given that the user will verify it in his real environment.
Maybe, you could also log a bug for a test addition and describe in it an environment, and sort of a high-level scenario of how JDK-8275535 can be reproduced.

@dfuch
Copy link
Member

dfuch commented Feb 10, 2022

I'm ok with pushing the fix without a test given that the user will verify it in his real environment.
Maybe, you could also log a bug for a test addition and describe in it an environment, and sort of a high-level scenario of how JDK-8275535 can be reproduced.

Right - I would be fine with that too.

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 11, 2022

@martinuy This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@jerboaa
Copy link
Contributor

jerboaa commented Mar 11, 2022

Please don't auto-close this bot :)

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 8, 2022

@martinuy This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@gnu-andrew
Copy link
Member

What's the status of this? The last comment reads as if this is good to go.

@dfuch
Copy link
Member

dfuch commented Apr 12, 2022

What's the status of this? The last comment reads as if this is good to go.

I believe we're still waiting for Martin to reply to the last comment:

Maybe, you could also log a bug for a test addition and describe in it an environment, and sort of a high-level scenario of how JDK-8275535 can be reproduced.

Otherwise yes - this would be good to go.

@cmadsen
Copy link

cmadsen commented Apr 13, 2022

Look here for spring based test case.

@bridgekeeper
Copy link

bridgekeeper bot commented May 11, 2022

@martinuy This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@jerboaa
Copy link
Contributor

jerboaa commented May 11, 2022

please don't auto-close this bot.

Copy link
Member

@AlekseiEfimov AlekseiEfimov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @martinuy,

I think this fix is in a good state: code changes look good, the CSR is approved and our CI shows no JNDI test failures related to this change.
As it was mentioned before the only thing we're waiting for is a bug logged for a test addition with a scenario of how issue can be reproduced. If it is not feasible to do that we can proceed without it - I will log a bug and will use a Spring reproducer shared by Carsten (thank you) as a starting point.
Therefore, I'm approving this change.

@openjdk
Copy link

openjdk bot commented May 12, 2022

@martinuy This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8275535: Retrying a failed authentication on multiple LDAP servers can lead to users blocked

Reviewed-by: aefimov, dfuchs

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 2489 new commits pushed to the master branch:

  • 40f43c6: 8286601: Mac Aarch: Excessive warnings to be ignored for build jdk
  • be97b4b: 8278348: [macos12] javax/swing/JTree/4908142/bug4908142.java fails in macos12
  • ff17f49: 8284888: [macos] javax/swing/JInternalFrame/8146321/JInternalFrameIconTest.java failed with "NimbusLookAndFeel] : ERROR: icon and imageIcon not same."
  • 50d47de: 8286582: Build fails on macos aarch64 when using --with-zlib=bundled
  • 89392fb: 8285820: C2: LCM prioritizes locally dependent CreateEx nodes over projections after 8270090
  • 96d48f3: 8286433: Cache certificates decoded from TLS session tickets
  • 7567627: 8286467: G1: Collection set pruning adds one region too many
  • 82d2570: 8283001: windows-x86-cmp-baseline fails in some jvmti native libs
  • e9f45bb: 8282966: AArch64: Optimize VectorMask.toLong with SVE2
  • 57a7670: 8285612: Remove jtreg tag manual=yesno for java/awt/print/PrinterJob/ImagePrinting/ClippedImages.java
  • ... and 2479 more: https://git.openjdk.java.net/jdk/compare/895e2bd7c0bded5283eca8792fbfb287bb75016b...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 12, 2022
@martinuy
Copy link
Contributor Author

Hi @AlekseiEfimov ,

Apologies for the delay. We had no feedback from our customer. However, we released the fix in our builds and there were no regressions reported. I'll proceed with the integration of this fix.

Thanks,
Martin.-

@martinuy
Copy link
Contributor Author

/integrate

@openjdk
Copy link

openjdk bot commented May 12, 2022

Going to push as commit 3be394e.
Since your change was applied there have been 2496 commits pushed to the master branch:

  • cc7560e: 8286287: Reading file as UTF-16 causes Error which "shouldn't happen"
  • 82aa045: 8286015: JFR: Remove jfr.save.generated.asm
  • 1904e9d: 8286423: Destroy password protection in the example code in KeyStore
  • e4439ca: 8284283: javac crashes when several transitive supertypes are missing
  • 752ad1c: 8286422: Add OIDs for RC2 and Blowfish
  • 36bdd25: 8286573: Remove the unnecessary method Attr#attribTopLevel and its usage
  • dea6e88: 8284680: sun.font.FontConfigManager.getFontConfig() leaks charset
  • 40f43c6: 8286601: Mac Aarch: Excessive warnings to be ignored for build jdk
  • be97b4b: 8278348: [macos12] javax/swing/JTree/4908142/bug4908142.java fails in macos12
  • ff17f49: 8284888: [macos] javax/swing/JInternalFrame/8146321/JInternalFrameIconTest.java failed with "NimbusLookAndFeel] : ERROR: icon and imageIcon not same."
  • ... and 2486 more: https://git.openjdk.java.net/jdk/compare/895e2bd7c0bded5283eca8792fbfb287bb75016b...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 12, 2022
@openjdk openjdk bot closed this May 12, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 12, 2022
@openjdk
Copy link

openjdk bot commented May 12, 2022

@martinuy Pushed as commit 3be394e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@martinuy martinuy deleted the JDK-8275535 branch July 25, 2022 17:24
@ryanflegel
Copy link

ryanflegel commented Aug 30, 2022

Will this fix be back-ported to JDK 8/11/17?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

8 participants