-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not throw on initial LDAP connection failure #9
Do not throw on initial LDAP connection failure #9
Conversation
Throwing exceptions during LdapRolesProvider initialization can put applications into a bad state where they do not recover on their own. If LDAP connectivity is unavailable, we can at least let the application keep running and report a more meaningful failure, as opposed to failing to startup completely, unable to report to us anything automatically (and thus requiring looking at logs and such). This can also allow applications to recover on their own without requiring manual intervention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what happens now? Is an empty connection pool returned? Or will it try to make connections?
What happens now is what I described in the description. The LdapRolesProvider constructor throws an exception, which bubbles up. Depending on the application, this can prevent the application from deploying or starting successfully, never recovering without a restart. |
I haven't had a chance to test this yet though, I'd like confirm that still. Specifically that it's failing at creating the pool as opposed to an earlier point. |
Yeah nevermind, it's failing before it gets to the pool... I was hoping by the existence of that parameter it wouldn't try to connect until the pool was created. I'll have to get a little more creative. |
This reverts commit fee195e.
When true, existing behavior remains. Existing behavior does not allow an instance to be constructed if LDAP connectivity cannot be established. When false, an instance will be created even if the connection fails. getUserRoles will fail until the connection is established. Connection will be retried on the next role lookup after the configured retry interval. Ideally connections would be retried in the background without a lookup being called, but this requires a thread pool, and a thread pool requires managing the life cycle of the roles provider, which might be difficult in the JBoss login module mode. For now this is a simpler alternative that should more or less work the same under traffic.
Ready for re-review |
ldapConfiguration.getBindDn(), | ||
ldapConfiguration.getBindDNPwd() | ||
); | ||
ldapConnection = new LDAPConnection(options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the above branch use a different LDAPConnection constructor so we create an LDAPConnection without actually connecting yet.
BindRequest bindRequest = new SimpleBindRequest(ldapConfiguration.getBindDn(), ldapConfiguration.getBindDNPwd()); | ||
BindResult bindResult = ldapConnection.bind(bindRequest); | ||
if (!readyForConnectionAttempt()) { | ||
// It's too early to retry connecting again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we want to have automated connection retries after a certain time period (if no requests come in that trigger connectIfNeeded()), but if we're getting requests, why would we want to wait, as it is possible that the connection works the next time you try it before the specified time interval passes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If all requests wait, and it takes a while, we could fill thread pools server side and/or client side as lots of requests could pile up. It could also cause cascading failure if timeouts are hit by clients. It's the same idea as hystrix. We can control how often we want to retry with the retry interval config parameter. There's not much advantage I think to retrying multiple times within a second for example, but I suppose someone could even configure that time to 0 and then we would constantly retry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I guess we wouldn't define a long interval anyway, so what you're saying makes sense.
if (bindResult.getResultCode() != ResultCode.SUCCESS) { | ||
LOGGER.error("Error binding to LDAP" + bindResult.getResultCode()); | ||
throw new LDAPException(bindResult.getResultCode(), "Error binding to LDAP"); | ||
if (!attemptingConnect.compareAndSet(/*expect*/ false, /*if false then set to*/ true)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider revising how this is documented, (either with non-inline comments or extracting this out to a method call or something), as I find these inline comments vary hard to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so the comment is awesome, and describes very well the iceberg of reasoning under this one line, but it still has the /* inline comments */
Maybe I am alone in being bothered by these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, I usually inline comment any parameters that are ambiguous (like boolean flags), but if you think it hurts readability in this case I'm fine with removing them.
@@ -63,6 +63,7 @@ | |||
public static final String ROLES_CACHE_EXPIRY_MS = "rolesCacheExpiryMS"; | |||
public static final String ENVIRONMENT = "environment"; | |||
public static final String ALL_ACCESS_OU = "allAccessOu"; | |||
public static final String RETRY_INTERVAL_SECONDS = "retryIntervalSeconds"; | |||
|
|||
private static final String[] ALL_VALID_OPTIONS = { | |||
AUTH_ROLE_NAME, SERVER, PORT, SEARCH_BASE, BIND_DN, BIND_PWD, USE_SSL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to add RETRY_INTERVAL_SECONDS to ALL_VALID_OPTIONS, otherwise we'll get warnings in the logs
if (bindResult.getResultCode() != ResultCode.SUCCESS) { | ||
LOGGER.error("Error binding to LDAP" + bindResult.getResultCode()); | ||
throw new LDAPException(bindResult.getResultCode(), "Error binding to LDAP"); | ||
if (!attemptingConnect.compareAndSet(/*expect*/ false, /*if false then set to*/ true)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so the comment is awesome, and describes very well the iceberg of reasoning under this one line, but it still has the /* inline comments */
Maybe I am alone in being bothered by these
private LDAPConnectionPool connectionPool; | ||
private volatile LDAPException connectionException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is odd to have an exception stored volatile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some other thread might have assigned an exception while another thread goes to read it, we want to avoid reading an earlier failure or worse seeing null. It's not a huge deal if it's not immediately seen but because the variable won't be checked that often I figured better to be correct.
Throwing exceptions during LdapRolesProvider initialization can put
applications into a bad state where they do not recover on their own.
If LDAP connectivity is unavailable, we can at least let the
application keep running and report a more meaningful failure, as
opposed to failing to startup completely, unable to report to us
anything automatically (and thus requiring looking at logs and such).
This can also allow applications to recover on their own without
requiring manual intervention.