Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RedisHealthIndicator unable to recover from authentication failures #19436

Closed
chadlwilson opened this issue Dec 22, 2019 · 10 comments
Closed

RedisHealthIndicator unable to recover from authentication failures #19436

chadlwilson opened this issue Dec 22, 2019 · 10 comments
Labels
for: external-project For an external project and not something we can fix

Comments

@chadlwilson
Copy link

chadlwilson commented Dec 22, 2019

When a Redis (non-cluster mode) server requiring authentication is restarted the Spring Boot Actuator RedisHealthIndicator seems unable to recover. It repeatedly logs:

i.l.c.RedisCommandExecutionException: NOAUTH Authentication required.
	at i.l.c.ExceptionFactory.createExecutionException(ExceptionFactory.java:135)
	at i.l.c.ExceptionFactory.createExecutionException(ExceptionFactory.java:108)
	at i.l.c.p.AsyncCommand.completeResult(AsyncCommand.java:120)
	at i.l.c.p.AsyncCommand.complete(AsyncCommand.java:111)
	at i.l.c.p.CommandWrapper.complete(CommandWrapper.java:59)
	at i.l.c.p.CommandWrapper.complete(CommandWrapper.java:59)
	at i.l.c.p.CommandHandler.complete(CommandHandler.java:654)
	at i.l.c.p.CommandHandler.decode(CommandHandler.java:614)
	at i.l.c.p.CommandHandler.channelRead(CommandHandler.java:565)
	at i.n.c.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
	at i.n.c.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
	at i.n.c.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
	at i.n.h.ssl.SslHandler.unwrap(SslHandler.java:1478)
	at i.n.h.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1227)
	at i.n.h.ssl.SslHandler.decode(SslHandler.java:1274)
	at i.n.h.c.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:503)
	at i.n.h.c.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:442)
	at i.n.h.c.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:281)
	at i.n.c.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
	at i.n.c.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
	at i.n.c.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
	at i.n.c.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422)
	at i.n.c.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
	at i.n.c.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
	at i.n.c.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931)
	at i.n.c.e.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792)
	at i.n.c.e.EpollEventLoop.processReady(EpollEventLoop.java:502)
	at i.n.c.e.EpollEventLoop.run(EpollEventLoop.java:407)
	at i.n.u.c.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1050)
	at i.n.u.i.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at i.n.u.c.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	... 1 common frames omitted
Wrapped by: o.s.d.r.RedisSystemException: Error in execution; nested exception is io.lettuce.core.RedisCommandExecutionException: NOAUTH Authentication required.
	at o.s.d.r.c.l.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:54)
	at o.s.d.r.c.l.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:52)
	at o.s.d.r.c.l.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41)
	at o.s.d.r.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44)
	at o.s.d.r.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42)
	at o.s.d.r.c.l.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:270)
	at o.s.d.r.c.l.LettuceServerCommands.convertLettuceAccessException(LettuceServerCommands.java:571)
	at o.s.d.r.c.l.LettuceServerCommands.info(LettuceServerCommands.java:215)
	at o.s.d.r.c.DefaultedRedisConnection.info(DefaultedRedisConnection.java:1291)
	at o.s.b.a.r.RedisHealthIndicator.doHealthCheck(RedisHealthIndicator.java:64)
	at o.s.b.a.h.AbstractHealthIndicator.health(AbstractHealthIndicator.java:82)
	at o.s.b.a.h.HealthIndicator.getHealth(HealthIndicator.java:37)
	at o.s.b.a.h.HealthEndpointWebExtension.getHealth(HealthEndpointWebExtension.java:95)
	at o.s.b.a.h.HealthEndpointWebExtension.getHealth(HealthEndpointWebExtension.java:43)
	at o.s.b.a.h.HealthEndpointSupport.getContribution(HealthEndpointSupport.java:108)
	at o.s.b.a.h.HealthEndpointSupport.getAggregateHealth(HealthEndpointSupport.java:119)
	at o.s.b.a.h.HealthEndpointSupport.getContri...

After initially losing the connection and having it restored:

December 21st 2019, 22:33:31.857 lettuce-eventExecutorLoop-1-3 INFO io.lettuce.core.protocol.ConnectionWatchdog                 Reconnecting, last destination was REDACTED.apse1.cache.amazonaws.com/aaa.bbb.ccc.ddd:6379
December 21st 2019, 22:33:31.891 lettuce-epollEventLoop-4-2    WARN io.lettuce.core.protocol.ConnectionWatchdog                 Cannot reconnect to [REDACTED.apse1.cache.amazonaws.com:6379]: java.nio.channels.ClosedChannelException
December 21st 2019, 22:33:32.016 lettuce-epollEventLoop-4-3    INFO io.lettuce.core.protocol.ReconnectionHandler                Reconnected to REDACTED.apse1.cache.amazonaws.com:6379
December 21st 2019, 22:33:37.615 XNIO-2 task-12                WARN org.springframework.boot.actuate.redis.RedisHealthIndicator Redis health check failed
... 
repeated warnings

Configuration is standard

spring.redis.host=blah
spring.redis.port=blah
spring.redis.password=blah
spring.redis.ssl=true
spring.redis.timeout=5s

Environment
Spring Boot 2.2.2.RELEASE
Spring Data Redis 2.2.3.RELEASE
Spring Framework 5.2.2.RELEASE
Lettuce 5.2.1.RELEASE
AWS ElastiCache for Redis (underlying Redis is 5.0.3) with authentication enabled

Possibly this is an issue with the underlying Spring Data Redis or Lettuce configuration, but we were a bit surprised that this wasn't working/auto-recovering and re-authenticating out of the box.

Since we use the Actuator endpoint as a Kubernetes readinessProbe this meant traffic was not sent to the application and it was not able to self-heal.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Dec 22, 2019
@snicoll
Copy link
Member

snicoll commented Dec 22, 2019

Thanks for the report. What version of Spring Boot are you using?

@snicoll snicoll added the status: waiting-for-feedback We need additional information before we can continue label Dec 22, 2019
@chadlwilson
Copy link
Author

Hi @snicoll - 2.2.2.RELEASE - and here's the relevant part of our dependency tree

+--- org.springframework.boot:spring-boot-starter-data-redis -> 2.2.2.RELEASE
|    +--- org.springframework.boot:spring-boot-starter:2.2.2.RELEASE (*)
|    +--- org.springframework.data:spring-data-redis:2.2.3.RELEASE
|    |    +--- org.springframework.data:spring-data-keyvalue:2.2.3.RELEASE
|    |    |    +--- org.springframework.data:spring-data-commons:2.2.3.RELEASE (*)
|    |    |    +--- org.springframework:spring-context:5.2.2.RELEASE (*)
|    |    |    +--- org.springframework:spring-tx:5.2.2.RELEASE (*)
|    |    |    \--- org.slf4j:slf4j-api:1.7.26 -> 1.7.29
|    |    +--- org.springframework:spring-tx:5.2.2.RELEASE (*)
|    |    +--- org.springframework:spring-oxm:5.2.2.RELEASE
|    |    |    +--- org.springframework:spring-beans:5.2.2.RELEASE (*)
|    |    |    \--- org.springframework:spring-core:5.2.2.RELEASE (*)
|    |    +--- org.springframework:spring-aop:5.2.2.RELEASE (*)
|    |    +--- org.springframework:spring-context-support:5.2.2.RELEASE
|    |    |    +--- org.springframework:spring-beans:5.2.2.RELEASE (*)
|    |    |    +--- org.springframework:spring-context:5.2.2.RELEASE (*)
|    |    |    \--- org.springframework:spring-core:5.2.2.RELEASE (*)
|    |    \--- org.slf4j:slf4j-api:1.7.26 -> 1.7.29
|    \--- io.lettuce:lettuce-core:5.2.1.RELEASE
|         +--- io.netty:netty-common:4.1.43.Final
|         +--- io.netty:netty-handler:4.1.43.Final (*)
|         +--- io.netty:netty-transport:4.1.43.Final (*)
|         \--- io.projectreactor:reactor-core:3.3.0.RELEASE -> 3.3.1.RELEASE (*)

Also, if it helps for wider context; this is against the AWS ElastiCache Redis implementation. We noticed this problem after scheduling Redis maintenance/service updates in non-prod environments.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Dec 22, 2019
@snicoll
Copy link
Member

snicoll commented Dec 24, 2019

@chadlwilson unfortunately I can't reproduce the problem. To help us identify where the problem might come from, can you please share a small sample (zip or github repo) with instructions on how the problem can be reproduced?

@snicoll snicoll added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Dec 24, 2019
@spring-projects-issues
Copy link
Collaborator

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

@spring-projects-issues spring-projects-issues added the status: feedback-reminder We've sent a reminder that we need additional information before we can continue label Dec 31, 2019
@chadlwilson
Copy link
Author

I've been struggling to reproduce this on a small sample with a local Redis. There are a lot of variables to eliminate which is a bit difficult without knowing all the underlying Spring Boot Actuator/Spring Data Redis/Lettuce magic. I'm thinking it's also possible that the issue lies with the way AWS ElasticCache behaves; and/or a particular state of pooled connections. :-/

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue status: feedback-reminder We've sent a reminder that we need additional information before we can continue labels Jan 4, 2020
@snicoll
Copy link
Member

snicoll commented Jan 4, 2020

Thanks for trying Chad. Perhaps @mp911de has an idea?

@snicoll snicoll added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Jan 4, 2020
@chadlwilson
Copy link
Author

chadlwilson commented Jan 4, 2020

I might raise an issue on the Lettuce GitHub; since after looking through a lot of the code through Spring Boot and Spring Data Redis, it seems this behaviour is intended to be fully managed by the Lettuce ConnectionWatchdog that handles reconnects of the single multiplexed connection.

It seems after reconnect it has left the StatefulRedisConnection in an un-authenticated state so subsequent uses by the RedisHealthIndicator fail.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jan 4, 2020
@snicoll
Copy link
Member

snicoll commented Jan 4, 2020

That’s my feeling as well. Please hold on creating a duplicate to avoid efforts being spent in two places. I’ve pinged the lead of the lettuce driver so we should get additional feedback soon.

@snicoll snicoll added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Jan 4, 2020
@mp911de
Copy link
Member

mp911de commented Jan 4, 2020

I tried to reproduce the issue without luck so far.

Lettuce attempts to reconnect using the initially provided password and the reconnect process does not report failures when the password changed during reconnects. If you can reproduce that issue in any environment, please raise the log level of io.lettuce.core.protocol to DEBUG or TRACE (caution: creates lots of noise and performance drag on busy systems).

That issue isn't Spring Boot related, more likely it is Lettuce-related.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jan 4, 2020
@snicoll
Copy link
Member

snicoll commented Jan 4, 2020

Thanks for the feedback. I am closing this issue now.

Please create an issue in the lettuce issue tracker if you manage to reproduce with the request logging information.

@snicoll snicoll closed this as completed Jan 4, 2020
@snicoll snicoll removed the status: feedback-provided Feedback has been provided label Jan 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
for: external-project For an external project and not something we can fix
Projects
None yet
Development

No branches or pull requests

4 participants