Redis cluster refresh of large clusters keeps I/O threads busy #2045

Closed
be-hase opened this issue Mar 15, 2022 · 7 comments
Labels
type: enhancement (A general enhancement)
Milestone
6.1.7

Comments

be-hase (Contributor) commented Mar 15, 2022

Bug Report

Current Behavior

For a large Redis cluster, using enablePeriodicRefresh causes serious performance problems.

The processing time of DefaultClusterTopologyRefresh.getNodeSpecificViews increases in proportion to the cluster size.
Unfortunately, this work is executed on the NIO event loop thread.

See the repository README below for details.
https://github.com/be-hase/lettuce-with-large-cluster

Input Code

I have prepared code that reproduces this problem.
https://github.com/be-hase/lettuce-with-large-cluster
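
For quick reference, the relevant part of the client configuration looks roughly like this (the endpoint and interval below are placeholders, not our actual values; the full reproduction is in the repository above):

```java
import java.time.Duration;

import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import io.lettuce.core.cluster.RedisClusterClient;

public class PeriodicRefreshConfig {

    public static void main(String[] args) {

        RedisClusterClient client = RedisClusterClient.create(RedisURI.create("redis://localhost:7000"));

        // Periodic refresh re-fetches CLUSTER NODES from the refresh sources on a fixed interval.
        ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
                .enablePeriodicRefresh(Duration.ofSeconds(60))
                .build();

        client.setOptions(ClusterClientOptions.builder()
                .topologyRefreshOptions(refreshOptions)
                .build());

        // ... connect and run the workload; with a large cluster, each refresh spends
        // noticeable CPU time in DefaultClusterTopologyRefresh.getNodeSpecificViews.
    }
}
```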

Expected behavior/code

Even when enablePeriodicRefresh is used with a large Redis cluster, performance should not deteriorate.

Environment

  • Lettuce version(s): 6.1.6.RELEASE
  • Redis version: 6.2.0

Possible Solution

Idea 1

Stop running DefaultClusterTopologyRefresh.getNodeSpecificViews on the NIO event loop.

I won't go into details, but when I customized DefaultClusterTopologyRefresh to run this step on another thread, performance improved.

My team is considering adopting this method as a workaround.

Idea 2

Looking at the flame graph referenced above, the processing that uses BitSet appears to carry a large overhead.
Why not switch to a more primitive representation such as boolean[]?

However, I'm not sure whether performance would actually improve.

Additional context

NONE

be-hase (Contributor, Author) commented Mar 16, 2022

In my actual environment, I use both periodic refresh and adaptive refresh.

Should I use only adaptive refresh in a large cluster?
Are there any disadvantages to using only adaptive refresh?

be-hase (Contributor, Author) commented Mar 16, 2022

Ah, it seems good to use dynamicRefreshSources(false) for a large cluster.
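
For anyone following along, something like this is what I mean (same builder as in the issue description; the interval is a placeholder):

```java
ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
        .enablePeriodicRefresh(Duration.ofSeconds(60))
        // Query only the initial seed nodes for CLUSTER NODES instead of every
        // discovered node, which keeps the number of topology views to merge small.
        .dynamicRefreshSources(false)
        .build();
```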

mp911de (Collaborator) commented Mar 16, 2022

BitSet has a much smaller memory footprint than boolean[] and it is easier to use. Generally, if you increase the periodic refresh period and rely more on adaptive refresh, you will likely see fewer refreshes.

It would be possible to use a different scheduler in DefaultClusterTopologyRefresh.loadViews(…) since we're using future chaining. What is the underlying cause for the ticket, and how did you find out about this issue? Was it motivated by an incident or something similar?
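
As a sketch, that combination would look along these lines (the intervals are just illustrative):

```java
ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
        // A longer period between full periodic refreshes.
        .enablePeriodicRefresh(Duration.ofMinutes(10))
        // Let MOVED/ASK redirects, reconnect attempts, etc. trigger refreshes instead.
        .enableAllAdaptiveRefreshTriggers()
        // Rate-limit how often adaptive triggers may fire a refresh.
        .adaptiveRefreshTriggersTimeout(Duration.ofSeconds(30))
        .build();
```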

mp911de added the status: waiting-for-feedback label Mar 16, 2022
mp911de changed the title from "Large redis cluster causes serious performance problem due to periodic cluster topology refresh" to "Redis cluster refresh of large clusters keeps I/O threads busy" Mar 16, 2022
be-hase (Contributor, Author) commented Mar 16, 2022

I ran into this problem when I upgraded from Lettuce 5 to Lettuce 6.
We are using a large Redis cluster (96 nodes).

After investigating, I determined the cause to be DefaultClusterTopologyRefresh.getNodeSpecificViews running on the NIO event loop thread.
Asynchronous cluster topology refresh was introduced in Lettuce 6, so I think that change is what exposed it.

mp911de (Collaborator) commented Mar 16, 2022

Feel free to submit a pull request that uses ClientResources.eventExecutorGroup() in the future chaining around DefaultClusterTopologyRefresh.getNodeSpecificViews.
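
To illustrate the technique outside the Lettuce internals (class and variable names below are made up for the example; in the actual change the executor would come from ClientResources.eventExecutorGroup() and the mapping function would be getNodeSpecificViews):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OffloadFromEventLoopExample {

    public static void main(String[] args) {

        // Stand-in for ClientResources.eventExecutorGroup(): any executor that is not an I/O thread.
        ExecutorService computationPool = Executors.newFixedThreadPool(2);

        // Stand-in for the raw CLUSTER NODES responses collected from each node.
        CompletableFuture<List<String>> rawViews =
                CompletableFuture.supplyAsync(() -> List.of("node-1 view", "node-2 view"));

        // thenApply(...) would run the mapping on whichever thread completed the future
        // (in Lettuce, a Netty event loop). thenApplyAsync(..., executor) hops onto the
        // given executor, so the O(cluster size) merge no longer blocks I/O threads.
        CompletableFuture<Integer> merged =
                rawViews.thenApplyAsync(List::size, computationPool);

        System.out.println("merged views: " + merged.join());
        computationPool.shutdown();
    }
}
```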

mp911de added the type: enhancement label and removed the status: waiting-for-feedback label Mar 16, 2022
mp911de added this to the 6.1.7 milestone Mar 16, 2022
be-hase (Contributor, Author) commented Mar 16, 2022

OK, I'll try to submit a PR later.

be-hase (Contributor, Author) commented Mar 17, 2022

@mp911de
Submitted a PR; please check 🙏
#2048

mp911de pushed a commit that referenced this issue Mar 18, 2022
mp911de added a commit that referenced this issue Mar 18, 2022
Use thenApplyAsync(…) instead of supplyAsync(…) to reduce allocations. Adopt tests.

Original pull request: #2048.
mp911de closed this as completed Mar 18, 2022
GilboaAWS pushed a commit to GilboaAWS/lettuce-core-1 that referenced this issue Apr 3, 2022
Use thenApplyAsync(…) instead of supplyAsync(…) to reduce allocations. Adopt tests.

Original pull request: redis#2048.