We randomly see a RedisTimeoutError #6162
Comments
What version is the Redis server? |
Can you include the output from `INFO`? |
Sure.
# Server
redis_version:7.2.0
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:0000000000000000000000000000000000000000
redis_mode:standalone
os:Linux 5.4.0-1071-aws x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:7.5.0
process_id:12089385
run_id:f28c63be14220b56c1c14028bab584487d4d0fb1
tcp_port:13144
server_time_usec:1705005037000000
uptime_in_seconds:121430
uptime_in_days:1
hz:10
lru_clock:0
config_file:
# Clients
connected_clients:7
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:5
maxclients:256
cluster_connections:0
# Memory
used_memory:4135064
used_memory_human:3.94M
used_memory_rss:4135064
used_memory_peak:4319784
used_memory_peak_human:4.11M
used_memory_lua:40960
maxmemory_policy:noeviction
mem_fragmentation_ratio:1
mem_allocator:jemalloc-5.3.0
# Persistence
loading:0
rdb_changes_since_last_save:16636
rdb_bgsave_in_progress:0
rdb_last_save_time:1704883611
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
# Stats
total_connections_received:50
total_commands_processed:97020
instantaneous_ops_per_sec:4
total_net_input_bytes:18699852
total_net_output_bytes:223692477
instantaneous_input_kbps:0.45
instantaneous_output_kbps:2.58
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:5
evicted_keys:0
keyspace_hits:16247
keyspace_misses:235563
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:0
migrate_cached_sockets:0
total_forks:0
total_error_replies:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
total_active_defrag_time:0
current_active_defrag_time:0
# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
# CPU
used_cpu_sys:0.00
used_cpu_user:0.00
used_cpu_sys_children:0.00
used_cpu_user_children:0.00
used_cpu_sys_main_thread:0.00
used_cpu_user_main_thread:0.00
# Cluster
cluster_enabled:0
# Keyspace
db0:keys=15,expires=11,avg_ttl=28671441902 |
I'm assuming that is your local Redis and not a production Redis. "db0:keys=15" is basically empty. I need to see production. |
This is production. I am using Redis Cloud. |
Well that certainly looks lightly used. I can't explain why you'd see any errors. Make sure the |
Confirmed we are using |
We're having the same problem. A lightly used Heroku instance running Sidekiq with a Redis instance hosted on Redis Cloud. Every few minutes we get new errors in the log:
This is our
From our
Our initializer:
Sidekiq.configure_server do |config|
  config.redis = { ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE } }
end

Sidekiq.configure_client do |config|
  config.redis = { ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE } }
end

Our config:
:max_retries: 7
:queues:
  - default
  - low

I've contacted Redis Cloud support, but they haven't provided any helpful guidance yet. |
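As an aside, if the worry is the `verify_mode` override touching non-TLS connections, one option is to only pass `ssl_params` when the URL is actually `rediss://`. A rough sketch, assuming the URL lives in a `REDIS_URL` environment variable (adjust the name to whatever your provider sets):

```ruby
require "openssl"

# Only relax certificate verification for TLS (rediss://) URLs; plain redis://
# connections get no ssl_params at all.
redis_options =
  if ENV["REDIS_URL"].to_s.start_with?("rediss://")
    { ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE } }
  else
    {}
  end

Sidekiq.configure_server { |config| config.redis = redis_options }
Sidekiq.configure_client { |config| config.redis = redis_options }
```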
Ruby and OpenSSL versions in use? |
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) I also ran
I'm starting to wonder if it was a zombie process. I created a Redis instance on Render and pointed my config there instead and have not been seeing the errors. I'll point back at Redis Cloud and see if the errors come back... |
I would recommend trying OpenSSL 3. Mixing modern versions (Ruby 3.3, redis 7.2) with legacy versions like OpenSSL 1.1 may cause subtle issues. That version is almost 4 years old now. |
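For anyone checking, the OpenSSL that matters here is the one Ruby itself is linked against, which can differ from the system `openssl` binary. A quick way to see both the compile-time and runtime versions from the deployed app:

```ruby
# Show which OpenSSL this Ruby build actually uses.
require "openssl"

puts RUBY_DESCRIPTION
puts OpenSSL::OPENSSL_VERSION          # version Ruby was compiled against
puts OpenSSL::OPENSSL_LIBRARY_VERSION  # version loaded at runtime
```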
As soon as I switched back to Redis Cloud I started getting the errors again:
The only difference is Render has a TLS URL whereas Redis Cloud provides a non-TLS URL. Could this have anything to do with SSL verify mode being set to "none" when a non-TLS URL is being used? |
No. I had the same issue when I tried with Heroku Data for Redis which provides a TLS URL. |
@mperham, my OpenSSL version is |
After upgrading to the heroku-22 stack with OpenSSL 3.0.2, the problem appears to have gone away. Perhaps Redis Cloud is running something on their end that does not play nicely with older versions of OpenSSL, whereas Render and Heroku Redis aren't. @mperham thanks for your help. |
I spoke too soon. The error is back again. It took an hour or so to start happening. I restarted the dyno and it went away. I'll wait another hour and see if it comes back. |
It came back after 30 mins this time. Not sure what's going on. Same logs as before:
|
Experiencing the same or a similar issue here. We're using Heroku Redis, also on the heroku-22 stack, and the exact same versions as originally reported. Our error message is
which is different to the 3.0 seconds everyone else is reporting. Ruby version: 3.2.2. However, our redis-client version is 0.19.1. I've raised it with Heroku support and they've sent me here. It is happening very rarely, so it's really just noise (bearing in mind this is an application with very low utilisation). Here is the output from redis-cli info:
Very similar Sidekiq config, except we're using Sidekiq Pro
|
Experiencing the same issue on a self-managed server running Ubuntu 22.04. Like others, we're seeing this intermittently. We get a burst of 5-15 exception reports from our app, then the problem disappears for another week or so. Logs show CPU and memory are NOT under load when this problem occurs. Here is the
|
I've noticed that no one posted a backtrace. Can anyone supply a backtrace or is it swallowed in this case? |
@mperham this is what our app reported to Sentry:
|
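For anyone whose error reporter swallows the trace, a quick way to capture one by hand is to wrap a direct call and log `Exception#full_message`. A rough sketch, assuming Sidekiq 7 (so the yielded connection is a redis-client connection and timeouts surface as `RedisClient::TimeoutError` subclasses):

```ruby
begin
  # Force a round-trip through the same pool Sidekiq uses.
  Sidekiq.redis { |conn| conn.call("PING") }
rescue RedisClient::TimeoutError => e
  # full_message includes the class, message, and backtrace.
  Sidekiq.logger.error(e.full_message)
  raise
end
```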
Ugh, that looks like a bog standard read timeout from an overloaded Redis. I can't see any other hints about what's wrong. I'm still flummoxed. |
We have been using Sidekiq in the same environment for a long time, and this issue started only after updating to Sidekiq 7. Maybe it is a regression in the new version. Right now, it looks like the best way to avoid this issue is to not update to Sidekiq 7.
|
Any update on this? We can confirm this started happening as well after upgrading to Sidekiq 7 / Redis 7. We are using Redis Cloud by Redis Labs. |
@casperisfine do you have any advice for tracking down random read timeouts in redis-client? |
Not particularly, no. You have to experience it yourself and know TCP/SSL etc. to poke at it and figure out what's wrong. It's not really possible to debug these things by proxy. What's really not clear to me is why it would have started with Sidekiq 7. There was of course a bunch of change in the Redis client, but fundamentally they don't work very differently. Timeouts are unfortunately a common thing, especially on public clouds, so I wonder if it's not just a reporting difference: were previous versions of Sidekiq reporting these errors or just swallowing them and retrying? Also
Again, super hard to tell from just reports like this; I also don't see any common point in the various reports. I see some TLS, some no-TLS, some |
Good questions. I've also seen compatibility issues with OpenSSL 3 and Ruby 3 along with actual Redis server bugs with OpenSSL 3. Make sure you're upgrading those stack bits to get network fixes. These errors are "below" Sidekiq in the stack and sadly I don't have really any insight into what's causing the issue. |
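One low-tech way to tell whether the slowness is in the server or network rather than in Sidekiq is to run a PING loop from the same environment the workers run in (for example a one-off dyno). A sketch using redis-client directly, run from a `bundle exec rails console` so the gem is already loaded; it assumes the URL is in `REDIS_URL`, and a `rediss://` URL may need the same `ssl_params` as the Sidekiq config:

```ruby
# Measure PING round-trip time once per second for a minute; sustained spikes
# here point at the network or the Redis server, not at Sidekiq.
client = RedisClient.config(url: ENV.fetch("REDIS_URL"), timeout: 5).new_client

60.times do
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  client.call("PING")
  elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000
  puts format("PING %.1fms", elapsed_ms)
  sleep 1
end
```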
I also have the same problem.
This is the backtrace
but the connection error messages kept repeating, so I downgraded to Sidekiq 6.5.5. I am sorry for not providing any clues toward a solution. |
Not sure how much this will help, but figured I would add some details that might give someone some clues? This week I upgraded 3 small Heroku apps to Sidekiq 7.2.4 using Redis Cloud 30MB and 100MB with Redis 7.2.3. All are Ruby 3.3.1, Rails 7.1.3.2. All use 1 worker dyno each with 5 concurrency. App 1:
App 2:
App 3:
|
@mperham I don't remember if we discussed this before, but it just hit me that for many of these reports, it might simply be due to
This probably helped absorb lots of transient issues with various cloud providers, especially slow ones. Maybe it would make sense to set a higher default timeout in Sidekiq, as it's much less latency-sensitive than a web server. |
@casperisfine yeah, esp if the CPU is saturated and only switching every 100ms. I will get this into Sidekiq 7.3. |
Hi Mike. Do you have a timeframe for when you'll be pushing 7.3 to RubyGems? |
@pmichaeljones You can pick up the change in 7.x with this:
Sidekiq.configure_server do |cfg|
  cfg.redis = { timeout: 3, ... }
end
The 7.3 milestone is in the issues.
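For anyone tuning this, my understanding is that redis-client treats `timeout:` as a shorthand for its three individual timeouts, so the same setting can also be spelled out explicitly (a sketch; the values are only illustrative):

```ruby
Sidekiq.configure_server do |cfg|
  cfg.redis = {
    connect_timeout: 3, # establishing the TCP/TLS connection
    read_timeout: 3,    # waiting for a reply (what the read timeout errors measure)
    write_timeout: 3    # writing the command to the socket
  }
end
```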
Forgot to update: Upgraded to
|
That's great, thanks. Based on redis-rb/redis-client#197 (comment), this suggests that increasing the BRPOP timeout from 3 to 4 seconds might be making a difference. With
This might be giving credence to the timeout theory in #6162 (comment) and 1c2a37d. With Sidekiq 7.3, I believe the timeout will actually now be 3+3+2 = 8, so I think we'd want to drop the extra |
redis-client v0.22.2 automatically adds `read_timeout` to the socket read timeout, so we no longer need to manually add this. Prior to this commit, the total socket read timeout crept up to 8 seconds:
```
conn.read_timeout + conn.read_timeout + TIMEOUT = 3 + 3 + 2 = 8
```
Now it is 5 seconds:
```
conn.read_timeout + TIMEOUT = 3 + 2 = 5
```
This relates to #6162.
Upgraded to |
Just to add more info: |
Upgraded to redis-client (0.22.2) 5 days ago and it looks solved. |
It's been 3 days since I upgraded to |
Unfortunately even after upgrading to |
Same here! |
It is happening to us as well. I also contacted Heroku support and was sent here. We're using |
@mperham here is our raw stack trace:
|
@jeremyhaile What's your process concurrency? The only other cause of timeouts could be if your CPUs are way overloaded. Reduce your concurrency. |
< 100 connections (from several servers/pools). Low activity. Yea, I'm not sure what's causing it. Perhaps we need to move off of Heroku. Or, perhaps there's a bug somewhere... |
@jeremyhaile Your environment is exactly the same as mine, except your redis-client version is 0.22.1
and my sidekiq.rb redis config is like this. I upgraded redis-client on Jun 5th and the timeout error hasn't appeared since. |
@kikikiblue thanks for the idea. We updated to 0.22.2 yesterday, which seems to also increase the default read timeout to 4s. Unfortunately we've seen the error 3 times today already, now as a 4s timeout error. Our load is almost nothing today too, as most of our load is during weekdays. |
In my case, version updates for the sidekiq and redis-client gems did not help. What helped was changing the config:
SIDEKIQ_REDIS_OPTIONS = {
network_timeout: 3,
pool_timeout: 2,
reconnect_attempts: [0.1, 0.25, 0.75], # I think this is major change, which helped to remove RedisTimeoutError errors
size: Integer(ENV.fetch('RAILS_MAX_THREADS', 10))
}.freeze
Sidekiq.configure_client do |config|
config.redis = SIDEKIQ_REDIS_OPTIONS
end
Sidekiq.configure_server do |config|
config.redis = SIDEKIQ_REDIS_OPTIONS
end |
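If I'm reading the redis-client docs correctly, passing an array to `reconnect_attempts` tells the client to retry the connection that many times, sleeping the given number of seconds before each attempt (here 100ms, 250ms and 750ms) instead of surfacing the first hiccup as an error, which would explain why this setting absorbs transient network blips.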
Oh, right, as mentioned above: for those of you having trouble, can you see if increasing the connection pool size helps? Perhaps you're running into #5929 (comment). Also, do your jobs call |
For reference: Running
@le0pard Can you confirm whether your suggestion does improve the situation? |
@stanhu Nope, my code did not use
@rafaelbiriba To repeat, in case it was not clear from my first message: the errors disappeared only after I arrived at these Redis settings in Sidekiq (both server and client); version updates alone did not help with the errors. Now (90-day range in our error tracker, filtered by Redis errors):
The situation 4 months ago: |
To be clear, the default As I mentioned in #5929 (comment):
By increasing the pool size to 10, I think you're effectively doing what Sidekiq 6 did before. |
Only increasing the pool size did not help; what also helped was |
I know this error was already discussed a lot.
We randomly see a RedisTimeoutError. Our app is deployed on Heroku and uses Redis Cloud.
This is our config:
We also checked Redis latency:
3.2.2
7.1.2
7.2