After timeout occurs on MULTI call every following transaction fails #2865

Closed
BadEvil1 opened this issue Mar 11, 2024 · 13 comments
Labels
type: documentation A documentation update

Comments

@BadEvil1

BadEvil1 commented Mar 11, 2024

I have a piece of code:

stringRedisTemplate.execute((RedisCallback<Void>) connection -> {
    connection.multi();
    // some logic
    connection.exec();
    return null;
});

And a timeout occurs on the MULTI call:

"exception":{
      "message":"Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 1 minute(s)",
      "stackTrace":"org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 1 minute(s)
      \tat org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:70)
      \tat org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41)
      \tat org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44)
      \tat org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42)
      \tat org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:277)
      \tat org.springframework.data.redis.connection.lettuce.LettuceConnection.multi(LettuceConnection.java:659)
      \tat org.springframework.data.redis.core.RedisTemplate.lambda$multi$23(RedisTemplate.java:1032)
      \tat org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:224)
      \tat org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:191)
      \tat org.springframework.data.redis.core.RedisTemplate.multi(RedisTemplate.java:1031)

After that, any following MULTI invocation fails with the following exception:
RedisCommandExecutionException: ERR MULTI calls can not be nested

I was able to reproduce it against a live Redis instance, using redis-cli CLIENT PAUSE to simulate a timeout in Redis.

Workaround:
I surrounded it with a try/catch to call DISCARD manually:

stringRedisTemplate.execute((RedisCallback<Void>) connection -> {
    try {
        connection.multi();
        // some logic
        connection.exec();
    } catch (QueryTimeoutException e) {
        connection.discard();
    }
    return null;
});

Actually, I expected that it would not help if DISCARD also failed because of the timeout, but even when the timeout happens, DISCARD somehow rolls back the transaction.

My environment:
Java version: 17.0.9
spring-data-redis version: 2.7.15
Redis version: 3.2.12

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Mar 11, 2024
@mp911de
Member

mp911de commented Mar 11, 2024

Can you reliably reproduce the timeout? If so, please attach debug/trace logs. I expect that this is rather a Lettuce driver issue.

@mp911de mp911de added the status: waiting-for-feedback We need additional information before we can continue label Mar 11, 2024
@BadEvil1
Author

Hello @mp911de, here are the debug logs:

2024-03-11 09:52:05.037 DEBUG  --- [   scheduling-1] o.s.d.redis.core.RedisConnectionUtils    : Fetching Redis Connection from RedisConnectionFactory
2024-03-11 09:52:05.038 DEBUG  --- [   scheduling-1] i.lettuce.core.protocol.DefaultEndpoint  : [channel=0x68d24f27, /127.0.0.1:42182 -> 127.0.0.1/127.0.0.1:16379, epid=0x9] write() writeAndFlush command AsyncCommand [type=MULTI, output=StatusOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]
2024-03-11 09:52:05.040 DEBUG  --- [llEventLoop-9-4] io.lettuce.core.protocol.CommandHandler  : [channel=0x68d24f27, /127.0.0.1:42182 -> 127.0.0.1/127.0.0.1:16379, epid=0x9, chid=0xa] write(ctx, AsyncCommand [type=MULTI, output=StatusOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command], promise)
2024-03-11 09:52:05.040 DEBUG  --- [llEventLoop-9-4] io.lettuce.core.protocol.CommandEncoder  : [channel=0x68d24f27, /127.0.0.1:42182 -> 127.0.0.1/127.0.0.1:16379] writing command AsyncCommand [type=MULTI, output=StatusOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]
2024-03-11 09:52:05.040 DEBUG  --- [   scheduling-1] i.lettuce.core.protocol.DefaultEndpoint  : [channel=0x68d24f27, /127.0.0.1:42182 -> 127.0.0.1/127.0.0.1:16379, epid=0x9] write() done
2024-03-11 09:53:05.066 DEBUG  --- [   scheduling-1] o.s.d.redis.core.RedisConnectionUtils    : Unbinding Redis Connection.
2024-03-11 09:53:10.650 DEBUG  --- [   scheduling-1] o.s.d.redis.core.RedisConnectionUtils    : Unbinding Redis Connection.
2024-03-11 09:53:10.651 DEBUG  --- [   scheduling-1] o.s.d.redis.core.RedisConnectionUtils    : Closing Redis Connection.
2024-03-11 09:53:13.674 ERROR  --- [   scheduling-1] o.s.s.s.TaskUtils$LoggingErrorHandler    : Unexpected error occurred in scheduled task

org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 1 minute(s)
2024-03-11 09:53:13.705 DEBUG  --- [   scheduling-1] o.s.d.redis.core.RedisConnectionUtils    : Fetching Redis Connection from RedisConnectionFactory
2024-03-11 09:53:13.741 DEBUG  --- [   scheduling-1] i.lettuce.core.protocol.DefaultEndpoint  : [channel=0x9098fdf5, /127.0.0.1:49328 -> 127.0.0.1/127.0.0.1:16379, epid=0x3] write() writeAndFlush command AsyncCommand [type=EVALSHA, output=BooleanOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]
2024-03-11 09:53:13.742 DEBUG  --- [   scheduling-1] i.lettuce.core.protocol.DefaultEndpoint  : [channel=0x9098fdf5, /127.0.0.1:49328 -> 127.0.0.1/127.0.0.1:16379, epid=0x3] write() done
2024-03-11 09:53:13.742 DEBUG  --- [llEventLoop-9-2] io.lettuce.core.protocol.CommandHandler  : [channel=0x9098fdf5, /127.0.0.1:49328 -> 127.0.0.1/127.0.0.1:16379, epid=0x3, chid=0x4] write(ctx, AsyncCommand [type=EVALSHA, output=BooleanOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command], promise)
2024-03-11 09:53:13.742 DEBUG  --- [llEventLoop-9-2] io.lettuce.core.protocol.CommandEncoder  : [channel=0x9098fdf5, /127.0.0.1:49328 -> 127.0.0.1/127.0.0.1:16379] writing command AsyncCommand [type=EVALSHA, output=BooleanOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]
2024-03-11 09:53:13.742 DEBUG  --- [llEventLoop-9-2] io.lettuce.core.protocol.CommandHandler  : [channel=0x9098fdf5, /127.0.0.1:49328 -> 127.0.0.1/127.0.0.1:16379, epid=0x3, chid=0x4] Received: 4 bytes, 1 commands in the stack
2024-03-11 09:53:13.742 DEBUG  --- [llEventLoop-9-2] io.lettuce.core.protocol.CommandHandler  : [channel=0x9098fdf5, /127.0.0.1:49328 -> 127.0.0.1/127.0.0.1:16379, epid=0x3, chid=0x4] Stack contains: 1 commands
2024-03-11 09:53:13.742 DEBUG  --- [llEventLoop-9-2] i.l.core.protocol.RedisStateMachine      : Decode done, empty stack: true
2024-03-11 09:53:13.742 DEBUG  --- [llEventLoop-9-2] io.lettuce.core.protocol.CommandHandler  : [channel=0x9098fdf5, /127.0.0.1:49328 -> 127.0.0.1/127.0.0.1:16379, epid=0x3, chid=0x4] Completing command LatencyMeteredCommand [type=EVALSHA, output=BooleanOutput [output=true, error='null'], commandType=io.lettuce.core.protocol.AsyncCommand]
2024-03-11 09:53:13.743 DEBUG  --- [   scheduling-1] o.s.d.redis.core.RedisConnectionUtils    : Closing Redis Connection.
2024-03-11 09:53:13.746 DEBUG  --- [   scheduling-1] o.s.d.redis.core.RedisConnectionUtils    : Fetching Redis Connection from RedisConnectionFactory
2024-03-11 09:53:13.746 DEBUG  --- [   scheduling-1] i.lettuce.core.protocol.DefaultEndpoint  : [channel=0xbf158a77, /127.0.0.1:49338 -> 127.0.0.1/127.0.0.1:16379, epid=0x5] write() writeAndFlush command AsyncCommand [type=SET, output=StatusOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]
2024-03-11 09:53:13.746 DEBUG  --- [   scheduling-1] i.lettuce.core.protocol.DefaultEndpoint  : [channel=0xbf158a77, /127.0.0.1:49338 -> 127.0.0.1/127.0.0.1:16379, epid=0x5] write() done
2024-03-11 09:53:13.746 DEBUG  --- [lEventLoop-10-2] io.lettuce.core.protocol.CommandHandler  : [channel=0xbf158a77, /127.0.0.1:49338 -> 127.0.0.1/127.0.0.1:16379, epid=0x5, chid=0x6] write(ctx, AsyncCommand [type=SET, output=StatusOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command], promise)
2024-03-11 09:53:13.746 DEBUG  --- [lEventLoop-10-2] io.lettuce.core.protocol.CommandEncoder  : [channel=0xbf158a77, /127.0.0.1:49338 -> 127.0.0.1/127.0.0.1:16379] writing command AsyncCommand [type=SET, output=StatusOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]
2024-03-11 09:53:13.746 DEBUG  --- [lEventLoop-10-2] io.lettuce.core.protocol.CommandHandler  : [channel=0xbf158a77, /127.0.0.1:49338 -> 127.0.0.1/127.0.0.1:16379, epid=0x5, chid=0x6] Received: 5 bytes, 1 commands in the stack
2024-03-11 09:53:13.747 DEBUG  --- [lEventLoop-10-2] io.lettuce.core.protocol.CommandHandler  : [channel=0xbf158a77, /127.0.0.1:49338 -> 127.0.0.1/127.0.0.1:16379, epid=0x5, chid=0x6] Stack contains: 1 commands
2024-03-11 09:53:13.747 DEBUG  --- [lEventLoop-10-2] i.l.core.protocol.RedisStateMachine      : Decode done, empty stack: true
2024-03-11 09:53:13.747 DEBUG  --- [lEventLoop-10-2] io.lettuce.core.protocol.CommandHandler  : [channel=0xbf158a77, /127.0.0.1:49338 -> 127.0.0.1/127.0.0.1:16379, epid=0x5, chid=0x6] Completing command LatencyMeteredCommand [type=SET, output=StatusOutput [output=OK, error='null'], commandType=io.lettuce.core.protocol.AsyncCommand]
2024-03-11 09:53:13.747 DEBUG  --- [   scheduling-1] o.s.d.redis.core.RedisConnectionUtils    : Closing Redis Connection.
2024-03-11 09:53:13.899 DEBUG  --- [   scheduling-1] o.s.d.redis.core.RedisConnectionUtils    : Fetching Redis Connection from RedisConnectionFactory
2024-03-11 09:54:13.554 DEBUG  --- [   scheduling-1] i.lettuce.core.protocol.DefaultEndpoint  : [channel=0x68d24f27, /127.0.0.1:42182 -> 127.0.0.1/127.0.0.1:16379, epid=0x9] write() writeAndFlush command AsyncCommand [type=MULTI, output=StatusOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]
2024-03-11 09:54:13.554 DEBUG  --- [lEventLoop-10-2] io.lettuce.core.protocol.CommandHandler  : [channel=0xbf158a77, /127.0.0.1:49338 -> 127.0.0.1/127.0.0.1:16379, epid=0x5, chid=0x6] Completing command LatencyMeteredCommand [type=INFO, output=StatusOutput [output=# Server
redis_version:3.2.12
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:b0df607ad3315254
redis_mode:standalone
os:Linux 5.15.0-76-generic x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:6.3.0
process_id:1
run_id:77c21d7eb04aaf10eb13301b889eb1df825f0210
tcp_port:16379
uptime_in_seconds:3233
uptime_in_days:0
hz:10
lru_clock:15648437
executable:/data/redis-server
config_file:/etc/redis/redis.conf
, error='null'], commandType=io.lettuce.core.protocol.AsyncCommand]
2024-03-11 09:54:13.554 DEBUG  --- [   scheduling-1] i.lettuce.core.protocol.DefaultEndpoint  : [channel=0x68d24f27, /127.0.0.1:42182 -> 127.0.0.1/127.0.0.1:16379, epid=0x9] write() done
2024-03-11 09:54:13.554 DEBUG  --- [llEventLoop-9-4] io.lettuce.core.protocol.CommandHandler  : [channel=0x68d24f27, /127.0.0.1:42182 -> 127.0.0.1/127.0.0.1:16379, epid=0x9, chid=0xa] write(ctx, AsyncCommand [type=MULTI, output=StatusOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command], promise)
2024-03-11 09:54:13.555 DEBUG  --- [llEventLoop-9-4] io.lettuce.core.protocol.CommandEncoder  : [channel=0x68d24f27, /127.0.0.1:42182 -> 127.0.0.1/127.0.0.1:16379] writing command AsyncCommand [type=MULTI, output=StatusOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]
2024-03-11 09:54:13.556 DEBUG  --- [llEventLoop-9-4] io.lettuce.core.protocol.CommandHandler  : [channel=0x68d24f27, /127.0.0.1:42182 -> 127.0.0.1/127.0.0.1:16379, epid=0x9, chid=0xa] Received: 36 bytes, 1 commands in the stack
2024-03-11 09:54:13.556 DEBUG  --- [llEventLoop-9-4] io.lettuce.core.protocol.CommandHandler  : [channel=0x68d24f27, /127.0.0.1:42182 -> 127.0.0.1/127.0.0.1:16379, epid=0x9, chid=0xa] Stack contains: 1 commands
2024-03-11 09:54:13.556 DEBUG  --- [llEventLoop-9-4] i.l.core.protocol.RedisStateMachine      : Decode done, empty stack: true
2024-03-11 09:54:13.557 DEBUG  --- [llEventLoop-9-4] io.lettuce.core.protocol.CommandHandler  : [channel=0x68d24f27, /127.0.0.1:42182 -> 127.0.0.1/127.0.0.1:16379, epid=0x9, chid=0xa] Completing command LatencyMeteredCommand [type=MULTI, output=StatusOutput [output=null, error='ERR MULTI calls can not be nested'], commandType=io.lettuce.core.protocol.AsyncCommand]
2024-03-11 09:54:13.557 DEBUG  --- [   scheduling-1] o.s.d.redis.core.RedisConnectionUtils    : Unbinding Redis Connection.
2024-03-11 09:54:13.557 DEBUG  --- [   scheduling-1] o.s.d.redis.core.RedisConnectionUtils    : Unbinding Redis Connection.
2024-03-11 09:54:13.557 DEBUG  --- [   scheduling-1] o.s.d.redis.core.RedisConnectionUtils    : Closing Redis Connection.
2024-03-11 09:54:13.558 ERROR  --- [   scheduling-1] o.s.s.s.TaskUtils$LoggingErrorHandler    : Unexpected error occurred in scheduled task

org.springframework.data.redis.RedisSystemException: Error in execution; nested exception is io.lettuce.core.RedisCommandExecutionException: ERR MULTI calls can not be nested

Let me know if that's what you asked for.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Mar 11, 2024
@mp911de
Member

mp911de commented Mar 11, 2024

After the write() done of the MULTI command, there is no incoming response. It should look like this:

2024-03-11 10:51:04 [DEBUG] [lettuce-nioEventLoop-4-1] [channel=0x41d9bfc4, /127.0.0.1:56710 -> localhost/127.0.0.1:6379, epid=0x1, chid=0x1] Received: 5 bytes, 1 commands in the stack (CommandHandler:576)
2024-03-11 10:51:04 [DEBUG] [lettuce-nioEventLoop-4-1] [channel=0x41d9bfc4, /127.0.0.1:56710 -> localhost/127.0.0.1:6379, epid=0x1, chid=0x1] Stack contains: 1 commands (CommandHandler:650)
2024-03-11 10:51:04 [DEBUG] [lettuce-nioEventLoop-4-1] Decode done, empty stack: true (RedisStateMachine:322)
2024-03-11 10:51:04 [DEBUG] [lettuce-nioEventLoop-4-1] [channel=0x41d9bfc4, /127.0.0.1:56710 -> localhost/127.0.0.1:6379, epid=0x1, chid=0x1] Completing command LatencyMeteredCommand [type=MULTI, output=StatusOutput [output=OK, error='null'], commandType=io.lettuce.core.protocol.AsyncCommand] (CommandHandler:678)

It looks to me as if the event loop thread was blocked. If you observe that there is no response for over 10-20 seconds, can you take a thread dump and attach it here?

@mp911de mp911de added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Mar 11, 2024
@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Mar 11, 2024
@BadEvil1
Author

BadEvil1 commented Mar 11, 2024

Sorry, I cannot attach the full dump. I attached my working thread and all Lettuce threads:
threads_report.txt

@mp911de
Member

mp911de commented Mar 11, 2024

The dump you attached is free from locks; however, you might want to watch out for blocked threads that correlate with the thread used in the last log message saying that MULTI was sent.

@BadEvil1
Author

Why do you think the timeout was caused by a blocked thread? I would rather consider the timeout 'natural', caused by Redis being unavailable.

@mp911de
Member

mp911de commented Mar 12, 2024

If Redis was unavailable, there would most likely be a disconnect.

@egor-ponomarev
Contributor

Hello @mp911de!
I've prepared an MCVE: https://github.com/egor-ponomarev/spring-boot-lettuce-issue-mcve/
Could you please try it?

@mp911de
Member

mp911de commented Mar 13, 2024

Thanks a lot. The code intentionally pauses the client response to cause a failure. RedisTemplate isn't aware that you put the connection into a transactional state, so you need to clean up the transactional state via DISCARD on error.
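
As a rough sketch of that cleanup, in the RedisCallback style of the original snippet (the flag name and structure are illustrative, not the documented approach):

stringRedisTemplate.execute((RedisCallback<Void>) connection -> {
    boolean completed = false;
    try {
        connection.multi();
        // queue the transactional commands here
        connection.exec();
        completed = true;
    } finally {
        if (!completed) {
            // MULTI may have left the connection in a transactional state; clear it
            connection.discard();
        }
    }
    return null;
});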

@egor-ponomarev
Contributor

egor-ponomarev commented Mar 13, 2024

@mp911de thank you for your feedback. Do I understand correctly that this is not a bug? If so, we could add a note to the docs https://docs.spring.io/spring-data/redis/docs/3.2.0-M2/reference/html/#tx mentioning that such errors should be handled and DISCARD should be called. I can add this note if you don't mind. I think it would be helpful for others: we faced this issue in production when Redis was unavailable for some time, leaving the transactional state of connections invalid.

What do you think about handling this error like this?

	private class SessionCallbackImpl implements SessionCallback<Boolean> {
		@Override
		public Boolean execute(RedisOperations operations) throws DataAccessException {
			boolean transactionStateIsActive = true;
			try {
				operations.multi();
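				// ... queue the commands that should run atomically here ...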

				operations.exec();
				transactionStateIsActive = false;
				return true;
			} finally {
				if (transactionStateIsActive) {
					LOGGER.error("Unexpected transaction state");
					operations.discard();
				}
			}
		}
	}
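
For completeness, the callback above would be invoked through the template, which binds MULTI, EXEC, and DISCARD to the same connection (a sketch; SessionCallbackImpl is the class from the snippet):

Boolean committed = stringRedisTemplate.execute(new SessionCallbackImpl());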

@mp911de
Member

mp911de commented Mar 13, 2024

Yeah, I do not consider this a bug per se. You're right that our documentation should outline how to handle error cases.

@mp911de mp911de added type: documentation A documentation update and removed status: waiting-for-triage An issue we've not yet triaged status: feedback-provided Feedback has been provided labels Mar 13, 2024
@egor-ponomarev
Contributor

@mp911de I've updated the documentation; could you please review it? #2870

I'm not sure yet whether it's better to handle such a case in a finally block or just by catching QueryTimeoutException.
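
For reference, a rough sketch of the catch-based variant (the class name is illustrative, and the exception type mirrors the original workaround; a broader exception type could be caught instead):

private class DiscardingSessionCallback implements SessionCallback<Boolean> {
	@Override
	public Boolean execute(RedisOperations operations) throws DataAccessException {
		try {
			operations.multi();
			// ... queue the commands that should run atomically here ...
			operations.exec();
			return true;
		} catch (QueryTimeoutException e) {
			// DISCARD clears the transactional state left on the connection
			operations.discard();
			throw e;
		}
	}
}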

mp911de added a commit that referenced this issue Mar 14, 2024
Tweak wording.

See #2865
Original pull request: #2870
mp911de pushed a commit that referenced this issue Mar 14, 2024
mp911de added a commit that referenced this issue Mar 14, 2024
Tweak wording.

See #2865
Original pull request: #2870
@mp911de mp911de added this to the 3.2.4 (2023.1.4) milestone Mar 14, 2024
@mp911de
Member

mp911de commented Mar 14, 2024

I updated the code to catch RuntimeException. Thanks for your help.

@mp911de mp911de linked a pull request Mar 14, 2024 that will close this issue