Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connection failures and application hangs with RedisLockRegistry since Spring Boot upgrade #3805

Closed
hameno opened this issue May 19, 2022 · 15 comments

Comments

@hameno
Copy link

hameno commented May 19, 2022

In what version(s) of Spring Integration are you seeing this issue?

5.5.11

Describe the bug

We've just recently updated Spring Boot from 2.5.5 to 2.5.13 (so basically upgrade from 5.5.4 to 5.5.11).
Immediately we received constant connection failure logs:

ERROR 7 --- [lock-registry-1] o.s.d.r.l.RedisMessageListenerContainer  : Connection failure occurred. Restarting subscription task after 5000 ms

The application also randomly hangs for multiple seconds.

We are using RedisLockRegistry with a ElastiCache Redis (6.0.5) in Primary/Replica mode. For RedisLockRegistry we only connect to the primary.

Our configuration is basically this:

@Bean
fun redisConnectionFactory(
    builderCustomizers: ObjectProvider<LettuceClientConfigurationBuilderCustomizer>,
    clientResources: ClientResources
): LettuceConnectionFactory {
    val clientConfig = getLettuceClientConfiguration(builderCustomizers, clientResources, redisProperties.lettuce.pool)
    val config = RedisStaticMasterReplicaConfiguration(
        redisProperties.host,
        redisProperties.port
    )
    config.username = redisProperties.username
    config.password = RedisPassword.of(redisProperties.password)
    config.database = redisProperties.database
    return LettuceConnectionFactory(config, clientConfig)
}

@Bean
fun lockRegistry(
    redisConnectionFactory: RedisConnectionFactory
): ExpirableLockRegistry = RedisLockRegistry(
    redisConnectionFactory,
    "exampleService",
    120_000
)

We had to downgrade back to 2.5.5 to fix the issues for now

To Reproduce

Not sure

Expected behavior

No hangs / error logs

Sample

Not sure how to show

@hameno hameno added status: waiting-for-triage The issue need to be evaluated and its future decided type: bug labels May 19, 2022
@hameno hameno changed the title Connection failuresand application hangs with RedisLockRegistry since Spring Boot upgrade Connection failures and application hangs with RedisLockRegistry since Spring Boot upgrade May 19, 2022
@artembilan
Copy link
Member

Hi @Meteorkor!

Would you mind taking a look here?
Sounds like you you pub-sub feature for lock registry causes some hiccups.

Thank you!

@Meteorkor
Copy link
Contributor

Meteorkor commented May 22, 2022

I looked at the "RedisMessageListenerContainer fail logs" posted by the author.
It looks like the RedisMessageListenerContainer did not start normally.

@hameno
Are there any additional logs?

In ElastiCache Redis (6.0.5)
Can you check if the RedisMessageListenerContainer works normally when I start it alone?
image

image

@hameno
Copy link
Author

hameno commented May 23, 2022

No, there are no additional logs during startup.

Can you check if the RedisMessageListenerContainer works normally when I start it alone?

I don't understand what you mean by this. Can you explan in more detail?

@artembilan
Copy link
Member

I think he means to start Redis subscription manually for that your environment: https://docs.spring.io/spring-data/data-redis/docs/current/reference/html/#redis:pubsub:subscribe.

The point is that starting with version 5.5.7 we have switched RedisLockRegistry to use a Pub/Sub feature for unlocking: #3690.

Since you claim that everything works well without Pub/Sub, then it only might be some configuration on the Redis server: we just use the same connection factory for Pub/Sub without any modifications:

this.redisMessageListenerContainer = new RedisMessageListenerContainer();
setupUnlockMessageListener(connectionFactory);

Any chances that you can use Spring Data 2021.2.0, or at least Spring Data Redis 2.7.0?
Looks like they have changed ListenerContainer error message to include a root cause of the connection problem:

				if (recoveryInterval != BackOffExecution.STOP) {
					logger.error(String.format("Connection failure occurred: %s. Restarting subscription task after %s ms.", ex,
							recoveryInterval), ex);
				}

So, that would give us a little more info about your problem.

We cannot upgrade Spring Integration 5.5.x to Spring Data 2021.2.x, because we have a policy to keep minor versions aligned with minor dependency versions.

@artembilan artembilan added status: waiting-for-reporter Needs a feedback from the reporter and removed status: waiting-for-triage The issue need to be evaluated and its future decided labels May 23, 2022
@hameno
Copy link
Author

hameno commented May 23, 2022

Thanks for looking into it - I'm currently finishing up something else but will try your suggestions soon and comment back

@hameno
Copy link
Author

hameno commented May 24, 2022

I just re-read the official Spring Data documentation and noticed this block:

For environments reporting non-public addresses through the INFO command (for example, when using AWS), use RedisStaticMasterReplicaConfiguration instead of RedisStandaloneConfiguration. Please note that RedisStaticMasterReplicaConfiguration does not support Pub/Sub because of missing Pub/Sub message propagation across individual servers.

See https://docs.spring.io/spring-data/data-redis/docs/current/reference/html/#redis:write-to-master-read-from-replica

Since we use RedisStaticMasterReplicaConfiguration, this simply does not work with ElastiCache Redis.

@artembilan
Copy link
Member

Thanks, @hameno , for investigation!

Well, that does not sound good for our solution then...

@Meteorkor ,

would you mind revising our RedisLockRegistry logic for an optional Pub/Sub?

@Meteorkor
Copy link
Contributor

Meteorkor commented May 24, 2022

@artembilan
How would you like the user to choose the method?

If RedisMessageListenerContainer.start() fails, there seems to be a way to operate in the old way,

Or there seems to be a way to select the lock/unlock method when creating ReidsLockRegistry.

@artembilan
Copy link
Member

Well, I would like to have a "try and fallback if any".

However let's see if a dedicated RedisStandaloneConfiguration for that RedisMessageListenerContainer may work instead!

Looks like Redison has a dedicated non-pub/sub impl instead: https://github.com/redisson/redisson/blob/master/redisson/src/main/java/org/redisson/RedissonSpinLock.java ...

@Meteorkor
Copy link
Contributor

I tested RedisMessageListenerContainer and RedisLockRegistry with "RedisClusterConfiguration" and it was confirmed that they work normally.

@Meteorkor
Copy link
Contributor

I looked up why RedissonSpinLock exists and found this article
https://danilavaratyntsev.medium.com/redisson-spinlocks-299974055d17

Is it better to return to the spinLock method considering the cluster environment?

@artembilan
Copy link
Member

Thank you, @Meteorkor , for looking into this!

I tested RedisMessageListenerContainer and RedisLockRegistry with "RedisClusterConfiguration" and it was confirmed that they work normally.

Does it mean that this even works with the mentioned ElastiCache Redis regardless of the not on the RedisStaticMasterReplicaConfiguration:

 * Configuration class used for setting up {@link RedisConnection} via {@link RedisConnectionFactory} using the provided
 * Master / Replica configuration to nodes know to not change address. Eg. when connecting to
 * <a href="https://aws.amazon.com/documentation/elasticache/">AWS ElastiCache with Read Replicas</a>. <br/>
 * Please also note that a Master/Replica connection cannot be used for Pub/Sub operations.

?

Anyway, according to that article about RedissonSpinLock, it looks like it is better to have a our own similar solution. Perhaps the one we had before.
I guess it could be just enough to have a spinLock = true/false option on our RedisLockRegistry to back off from Pub/Sub in favor of busy-spin algorithm.

WDYT?

@Meteorkor
Copy link
Contributor

Does it mean that this even works with the mentioned ElastiCache Redis regardless of the not on the >RedisStaticMasterReplicaConfiguration:

No, I couldn't test because I couldn't build an environment that can configure RedisStaticMasterReplicaConfiguration.

Perhaps as hameno said, pub/sub doesn't seem to work in RedisStaticMasterReplicaConfiguration.

I just wanted to say that there was no problem with pub/sub operation in RedisClusterConfiguration as well as RedisStandaloneConfiguration.
(I thought that there were rather few environments where pub/sub could not be used)

As you said, it would be good to make it work with spinLock method.

I think it would be better to support the operation without pub/sub as part of the environment where pub/sub cannot be used(spinLock=true).

@artembilan
Copy link
Member

@Meteorkor ,

thank you for confirmation.

Any chances you can contribute the fix?

@artembilan artembilan added this to the 6.0.0-M4 milestone May 31, 2022
@artembilan artembilan added backport 5.5.x and removed status: waiting-for-reporter Needs a feedback from the reporter labels May 31, 2022
@Meteorkor
Copy link
Contributor

Meteorkor commented Jun 1, 2022

I'll fix it and post a PR.

Meteorkor added a commit to Meteorkor/spring-integration that referenced this issue Jun 6, 2022
Meteorkor added a commit to Meteorkor/spring-integration that referenced this issue Jun 6, 2022
Meteorkor added a commit to Meteorkor/spring-integration that referenced this issue Jun 7, 2022
Meteorkor added a commit to Meteorkor/spring-integration that referenced this issue Jun 8, 2022
artembilan pushed a commit that referenced this issue Jun 8, 2022
Fixes #3805

The Redis Pub-Sub doesn't work in all the environment, therefore there has to
be a choice to use old busy-spin algorithm

* Change to select between spinLock method and pub-sub method
* Make spinLock as a default one to let the `RedisLockRegistry` work everywhere
* Fix javadoc, convention, lazy init
* Fix javadoc, convention
* Code clean up and docs for `RedisLockType` feature

**Cherry-pick to 5.5.x**
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants