Do not throw on initial LDAP connection failure #9

Merged
merged 10 commits into from
Jan 10, 2018

alechenninger
Contributor

Throwing exceptions during LdapRolesProvider initialization can put
applications into a bad state where they do not recover on their own.
If LDAP connectivity is unavailable, we can at least let the
application keep running and report a more meaningful failure, as
opposed to failing to start up completely, unable to report to us
anything automatically (and thus requiring looking at logs and such).
This can also allow applications to recover on their own without
requiring manual intervention.

Member

@dcrissman dcrissman left a comment

So what happens now? Is an empty connection pool returned? Or will it try to make connections?

@alechenninger
Contributor Author

What happens now is what I described in the description. The LdapRolesProvider constructor throws an exception, which bubbles up. Depending on the application, this can prevent the application from deploying or starting successfully, never recovering without a restart.

@alechenninger
Contributor Author

alechenninger commented Jan 7, 2018

I haven't had a chance to test this yet, though; I'd still like to confirm that. Specifically, that it's failing at creating the pool as opposed to an earlier point.

@alechenninger
Contributor Author

Yeah, never mind, it's failing before it gets to the pool... I was hoping that, given the existence of that parameter, it wouldn't try to connect until the pool was created. I'll have to get a little more creative.

@alechenninger alechenninger changed the title Do not throw on initial LDAP connection failure [Do not merge] Do not throw on initial LDAP connection failure Jan 7, 2018
When true, existing behavior remains. Existing behavior does not allow
an instance to be constructed if LDAP connectivity cannot be
established.

When false, an instance will be created even if the connection fails.
getUserRoles will fail until the connection is established. Connection
will be retried on the next role lookup after the configured retry
interval.

Ideally connections would be retried in the background without a lookup
being called, but this requires a thread pool, and a thread pool
requires managing the life cycle of the roles provider, which might be
difficult in the JBoss login module mode. For now this is a simpler
alternative that should more or less work the same under traffic.
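
A minimal sketch of the behavior this commit message describes, assuming a boolean constructor flag and an injectable clock. `LazyRolesProvider`, `requireInitialConnection`, the `Supplier` standing in for the LDAP connect call, and the millisecond clock are all hypothetical names for illustration, not the PR's actual code. It also uses `synchronized` for brevity rather than the atomic flag seen in the diff.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical stand-in for a roles provider; not the PR's actual class. */
final class LazyRolesProvider {
    private final Supplier<String> connect;   // stands in for opening the LDAP connection pool
    private final long retryIntervalMillis;
    private final LongSupplier clockMillis;   // injectable so the retry gate can be tested

    private String connection;                // null until a connect attempt succeeds
    private RuntimeException lastFailure;     // failure from the most recent attempt
    private long lastAttemptMillis;

    LazyRolesProvider(boolean requireInitialConnection, Supplier<String> connect,
                      long retryIntervalMillis, LongSupplier clockMillis) {
        this.connect = connect;
        this.retryIntervalMillis = retryIntervalMillis;
        this.clockMillis = clockMillis;
        try {
            attemptConnect();
        } catch (RuntimeException e) {
            if (requireInitialConnection) {
                throw e;                      // flag is true: construction fails, as before
            }
            // flag is false: keep the instance; the failure is remembered in lastFailure
        }
    }

    /** Fails fast until connected; retries only once the retry interval has elapsed. */
    synchronized List<String> getUserRoles(String user) {
        if (connection == null) {
            boolean tooEarly =
                clockMillis.getAsLong() - lastAttemptMillis < retryIntervalMillis;
            if (tooEarly) {
                throw lastFailure;            // report the earlier failure without reconnecting
            }
            attemptConnect();                 // throws again if LDAP is still unavailable
        }
        return Collections.singletonList("role-for-" + user);
    }

    private void attemptConnect() {
        lastAttemptMillis = clockMillis.getAsLong();
        try {
            connection = connect.get();
            lastFailure = null;
        } catch (RuntimeException e) {
            lastFailure = e;
            throw e;
        }
    }
}
```

With a retry interval of 0 the gate never reports "too early", so every lookup retries the connection.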
@alechenninger alechenninger changed the title [Do not merge] Do not throw on initial LDAP connection failure Do not throw on initial LDAP connection failure Jan 7, 2018
@alechenninger
Contributor Author

Ready for re-review

ldapConfiguration.getBindDn(),
ldapConfiguration.getBindDNPwd()
);
ldapConnection = new LDAPConnection(options);
Contributor Author

This and the above branch use a different LDAPConnection constructor so we create an LDAPConnection without actually connecting yet.

BindRequest bindRequest = new SimpleBindRequest(ldapConfiguration.getBindDn(), ldapConfiguration.getBindDNPwd());
BindResult bindResult = ldapConnection.bind(bindRequest);
if (!readyForConnectionAttempt()) {
// It's too early to retry connecting again.
Member

I know we want to have automated connection retries after a certain time period (if no requests come in that trigger connectIfNeeded()), but if we're getting requests, why would we want to wait? It is possible that the connection works the next time you try it, before the specified time interval passes.

Contributor Author

If all requests wait, and it takes a while, we could fill thread pools server side and/or client side as lots of requests could pile up. It could also cause cascading failure if timeouts are hit by clients. It's the same idea as Hystrix. We can control how often we want to retry with the retry interval config parameter. I don't think there's much advantage to retrying multiple times within a second, for example, but I suppose someone could even configure that time to 0 and then we would constantly retry.

Member

Yeah I guess we wouldn't define a long interval anyway, so what you're saying makes sense.

if (bindResult.getResultCode() != ResultCode.SUCCESS) {
LOGGER.error("Error binding to LDAP" + bindResult.getResultCode());
throw new LDAPException(bindResult.getResultCode(), "Error binding to LDAP");
if (!attemptingConnect.compareAndSet(/*expect*/ false, /*if false then set to*/ true)) {
Member

@derek63 derek63 Jan 8, 2018

Please consider revising how this is documented (either with non-inline comments or by extracting this out to a method call or something), as I find these inline comments very hard to read.

Member

Okay, so the comment is awesome, and describes very well the iceberg of reasoning under this one line, but it still has the /* inline comments */

Maybe I am alone in being bothered by these

Contributor Author

Haha, I usually inline comment any parameters that are ambiguous (like boolean flags), but if you think it hurts readability in this case I'm fine with removing them.
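
For reference, a small sketch of the pattern under discussion: `AtomicBoolean.compareAndSet` used so that only one thread at a time runs a connection attempt. The class and method names (`SingleAttemptGuard`, `tryRun`) are hypothetical, and named locals replace the inline parameter comments as one possible way to address the readability concern; this is not the code that was merged.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: lets at most one thread run an attempt at a time. */
final class SingleAttemptGuard {
    private final AtomicBoolean attemptingConnect = new AtomicBoolean(false);

    /**
     * Runs the attempt only if no other thread is currently running one.
     * Returns true if this call ran the attempt, false if it was skipped.
     */
    boolean tryRun(Runnable attempt) {
        boolean expected = false;  // proceed only if nobody is attempting right now
        boolean update = true;     // and atomically mark that we are
        if (!attemptingConnect.compareAndSet(expected, update)) {
            return false;          // another thread won the race; do not wait for it
        }
        try {
            attempt.run();
            return true;
        } finally {
            attemptingConnect.set(false);  // always release, even if the attempt throws
        }
    }
}
```

Callers that get `false` back can fail fast with the last known error instead of queueing behind the thread that is connecting.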

@@ -63,6 +63,7 @@
public static final String ROLES_CACHE_EXPIRY_MS = "rolesCacheExpiryMS";
public static final String ENVIRONMENT = "environment";
public static final String ALL_ACCESS_OU = "allAccessOu";
public static final String RETRY_INTERVAL_SECONDS = "retryIntervalSeconds";

private static final String[] ALL_VALID_OPTIONS = {
AUTH_ROLE_NAME, SERVER, PORT, SEARCH_BASE, BIND_DN, BIND_PWD, USE_SSL,
Member

You need to add RETRY_INTERVAL_SECONDS to ALL_VALID_OPTIONS; otherwise we'll get warnings in the logs.
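
To illustrate the point, here is a rough sketch of how an allow-list of option names flags anything missing from it. It assumes the warning is triggered by option keys that are not in the list; the actual validation code is not shown in this thread, and `OptionValidator`, `unknownOptions`, and the "server"/"port" key values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical validator; the real login module's check is not shown in this PR. */
final class OptionValidator {
    static final String SERVER = "server";   // assumed key value, for illustration
    static final String PORT = "port";       // assumed key value, for illustration
    static final String RETRY_INTERVAL_SECONDS = "retryIntervalSeconds";

    // A new option must be listed here, or it is reported as unknown.
    private static final Set<String> ALL_VALID_OPTIONS =
        new HashSet<>(Arrays.asList(SERVER, PORT, RETRY_INTERVAL_SECONDS));

    /** Returns the configured option names that are not in the allow-list, sorted. */
    static List<String> unknownOptions(Map<String, ?> options) {
        List<String> unknown = new ArrayList<>();
        for (String key : options.keySet()) {
            if (!ALL_VALID_OPTIONS.contains(key)) {
                unknown.add(key);
            }
        }
        Collections.sort(unknown);
        return unknown;
    }
}
```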

private LDAPConnectionPool connectionPool;
private volatile LDAPException connectionException;
Member

It is odd to have an exception stored in a volatile field.

Contributor Author

Some other thread might have assigned an exception while another thread goes to read it; we want to avoid reading an earlier failure or, worse, seeing null. It's not a huge deal if it's not immediately seen, but because the variable won't be checked that often, I figured it's better to be correct.
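
A minimal sketch of that visibility argument, using hypothetical names (`ConnectionStatus`, `recordFailure`, `checkConnected`) rather than the PR's code. Marking the field `volatile` means a write by the connecting thread is visible to later reads on other threads. Reading it once into a local keeps the null check and the use consistent.

```java
/** Hypothetical holder for the most recent connection failure. */
final class ConnectionStatus {
    // volatile: a failure recorded by one thread is visible to subsequent reads elsewhere
    private volatile Exception connectionException;  // null when the last attempt succeeded

    void recordFailure(Exception e) {
        connectionException = e;
    }

    void recordSuccess() {
        connectionException = null;
    }

    /** Throws if the most recently recorded attempt failed. */
    void checkConnected() {
        Exception e = connectionException;  // single read, so check and use see the same value
        if (e != null) {
            throw new IllegalStateException("LDAP unavailable", e);
        }
    }
}
```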

@dcrissman dcrissman merged commit 4235aa6 into esbtools:master Jan 10, 2018
3 participants