Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unbounded MULTI transactions cause big latency spikes in Redis #149

Open
elboulangero opened this issue Oct 11, 2023 · 0 comments
Open

Unbounded MULTI transactions cause big latency spikes in Redis #149

elboulangero opened this issue Oct 11, 2023 · 0 comments

Comments

@elboulangero
Copy link
Contributor

elboulangero commented Oct 11, 2023

I'm evaluating Mirrorbits for Kali Linux. Kali is distro based on Debian, and we have around 500,000 files in the repository (this number can go up to a million).

So we have deployed Mirrorbits (not in production yet), and I noticed a warning messages in the logs: Renewing lock for %s failed: lock disappeared. Looking at the code, we can see that this lock is in fact a key in redis, with a 10 seconds TTL, renewed every 5 seconds. As it turns out, the lock disappeared because, every now and then, the Redis server was unavailable for more than 5 seconds.

So I did some research, and I recorded and plotted the Redis latency. Here's the result and it's not pretty:

latency-nonginx-normal

As we can see, we have very big latency spikes (up to 9 seconds).

After more investigation, it turns out that this is caused by unbounded MULTI transactions made by Mirrorbits. There are a couple of places in the code where it's done, and often the number of commands in the transaction is a factor of the number of files in the repo. As said above, we have ~ 500k files in the Kali repo. So it's no surprise if these transactions cause Redis to be unresponsive for too long.

I could prepare a patch where these transactions are broken down in batches of 5k files (following recommendation from https://redis.io/docs/manual/pipelining/, cf. IMPORTANT NOTE). The result looks much better:

latency-nonginx-normal-splitmulti-5kblocksize

We still have latency spikes, however 1) they are under a second and 2) they are caused, this time, by expensive commands SDIFF and SINTERSTORE. Optimizing that might be doable, but it can be the topic for another issue and PR.

In the meantime, I propose #148 to at least fix the latency caused by the unbounded MULTI transactions.


For anyone interested, here's how I measured the latency.

In a shell, where the Redis instance is running, run:

for i in {1..3600}; do { echo -n "$(date +%s) "; redis-cli -i 1 --latency; } >> latency.txt; done

And here's the gnuplot script to plot it:

$ cat script.pl 
# gnuplot -c script.plg FILENAME
set title "Redis latency, measured every second during an hour"
set xdata time
set timefmt "%s"
plot ARG1 using 1:($3/1000) with steps title "Latency in seconds"
pause mouse close

Run the script with gnuplot -c script.plg latency.txt. Replace 1:($3/1000) with 1:3 if you want to see milliseconds instead.

I'd be curious to see how it looks like for other instances, depending on the number of files in the repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants