gracefully handle connection errors #2550
Unfortunately it's not as easy as that. connection_pool manages the connections and assumes that connections are self-healing. Furthermore, Sidekiq has no use at all for a readonly Redis connection. In a perfect world, redis-rb would reconnect upon any READONLY errors. I'll ponder if there's something I can do to make it better.
Seems like we'd have to create a shim/proxy around the client object in Edit: I had success using
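I was thinking something like:

module Sidekiq
  def self.redis
    @redis_pool.with do |conn|
      retried = false
      begin
        yield conn
      rescue Redis::BaseError => ex
        # Drop the socket and retry once on READONLY; re-raise anything else.
        (conn.disconnect!; retried = true; retry) if !retried && ex.message =~ /READONLY/
        raise
      end
    end
  end
end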
👍 for
I'm not sure 100% of Sidekiq's redis operations go through
I have an issue redis-rb#550 open for this and a patch gem specifically for Redis on Elasticache https://github.com/craigmcnamara/redis-elasticache
3.5.1 was just released with an EC "READONLY" fix
These errors can occur during Sidekiq's long-running job-fetching command, which uses Redis' blocking BRPOP primitive. On failover in a cluster setup, the server interrupts these commands. The resulting error restarts the worker threads, but because it bubbles up to the top, it also causes a lot of spam in our error logging systems. Since related errors from other commands are already handled this way (see sidekiq#2550 and sidekiq#4495), it seems sensible to handle this one as well.
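One possible shape for handling an interrupted blocking fetch, as a rough sketch only. `ConnectionInterrupted`, `FlakyQueue` and `fetch_with_recovery` are hypothetical names invented for illustration; they are not Sidekiq or redis-rb APIs, and the real error class depends on the client library in use. The idea is to rescue the error inside the fetcher, log it quietly, and retry a bounded number of times rather than letting it bubble up.

```ruby
require "logger"
require "stringio"

# Hypothetical stand-in for the error a client raises when the server
# interrupts a blocking command during failover.
class ConnectionInterrupted < StandardError; end

# Hypothetical queue whose first blocking pop is interrupted, then succeeds.
class FlakyQueue
  def initialize(jobs)
    @jobs = jobs
    @interrupted = false
  end

  def blocking_pop
    unless @interrupted
      @interrupted = true
      raise ConnectionInterrupted, "connection closed by server"
    end
    @jobs.shift
  end
end

# Fetches one job. An interrupted blocking pop is logged at warn level and
# retried up to max_attempts times; after that the error is re-raised.
def fetch_with_recovery(queue, logger, max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    queue.blocking_pop
  rescue ConnectionInterrupted => ex
    logger.warn("fetch interrupted: #{ex.message}")
    retry if attempts < max_attempts
    raise
  end
end

log_output = StringIO.new
job = fetch_with_recovery(FlakyQueue.new(["job-1"]), Logger.new(log_output))
```

Capping the attempts keeps a prolonged outage visible: once the limit is reached, the error still propagates to whatever supervises the worker thread.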
ref: redis/redis-rb#543
When AWS ElastiCache fails over from one node to another node, redis clients can be left connected to the original node which is now read-only. Sidekiq will then be unable to continue accepting or processing jobs until the connections are reset.
It seems that the best place to handle the exception would be within the sidekiq code as there are reasons why a general redis client might want to continue holding a connection to a read-only redis slave.
The specific error raised when this happens is:
Redis::CommandError: READONLY You can't write against a read only slave.
Ideally when this type of error occurs, sidekiq should drop the redis connection and retry the operation.
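A minimal, self-contained sketch of that drop-and-retry behaviour. `ReadOnlyError`, `FakeConnection` and `with_readonly_retry` are hypothetical names used so the example runs without the redis gem; a real version would rescue `Redis::CommandError` and call the client's own disconnect method. The retry is capped at one attempt so that a node which stays read-only cannot cause an endless loop.

```ruby
# Hypothetical stand-in for Redis::CommandError.
class ReadOnlyError < StandardError; end

# Hypothetical fake connection: writes fail with READONLY until it has been
# disconnected once, which simulates reconnecting to the new primary.
class FakeConnection
  attr_reader :disconnects

  def initialize
    @read_only = true
    @disconnects = 0
  end

  def disconnect!
    @disconnects += 1
    @read_only = false
  end

  def set(_key, _value)
    raise ReadOnlyError, "READONLY You can't write against a read only slave." if @read_only
    "OK"
  end
end

# Yields the connection. On a READONLY error it disconnects and retries once;
# any other error, or a second READONLY error, is re-raised to the caller.
def with_readonly_retry(conn)
  retried = false
  begin
    yield conn
  rescue ReadOnlyError => ex
    if !retried && ex.message.include?("READONLY")
      retried = true
      conn.disconnect!
      retry
    end
    raise
  end
end

conn = FakeConnection.new
result = with_readonly_retry(conn) { |c| c.set("queue", "job") }
```

Note that the block is executed twice when a retry happens, so this approach is only safe for operations that can be repeated without side effects.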