Would this work to avoid Redis::InheritedError? #364

myronmarston · 2013-09-06T21:32:54Z

Here at Moz we have a large distributed system that uses lots of forked processes (via our background job queue, Qless, which forks a process to give each job a sandbox like resque does) and also uses a lot of redis connections. We have a total of 112 of them, split both across different roles, and sharding on user id within those roles. We've been dealing with Redis::InheritedError by having each job reconnect to the redis connections that it might use. This works, but has also been causing us pain:

We have to frequently add logic to our jobs to have them reconnect to the appropriate redis connections. We understand why it's needed but it's a lot of logic that is only needed because we're forking a process in our workers.
We're running more than 2000 worker processes in production, and having each one reconnect to the redis connections it might use causes extra network load and has led to semi-frequent redis timeouts.

We'd like to come up with a better way to deal with this, where reconnects will happen lazily happen automatically, and wanted to get your feedback on it. Here's what we're currently thinking:

In our application, use a proxy object (probably based on the delegate stdlib) in place of the Redis instances.
Store the Process.pid when the instance is created.
Before forwarding messages to the wrapped Redis instance, check Process.pid against the stored Process.pid, and, if they are different, reconnect to redis and update the pid instance variable.
Include special handling for multi blocks: if the Process.pid changed since the block opened, allow the normal Redis::Inherited error to be raised (rather than reconnecting), since it would change the semantics of subsequent commands if we reconnected in the middle of a multi block.

A few questions about this:

Will this general plan work OK?
Besides multi, are there any other Redis methods that need special handling?
Is there any reason this logic couldn't be in the redis gem itself? If you're open to it being in the gem, I'd be happy to submit a PR adding it.

Thanks!

/cc @proby @benkirzhner

The text was updated successfully, but these errors were encountered:

pietern · 2013-09-20T16:00:07Z

Hi,

As you say, there's not really a way around using a separate connection per process. There is no simple way to coordinate usage of a single connection across processes, and in general you don't want to go there ;-).

The approach that you mention would work fine, though it is almost exactly what the Redis client already does. The main difference is that it raises the InheritedError where you would want to reconnect. One way to solve this is to just rescue this error, reconnect and retry. That would give lazy reconnection with minimal effort. Another, cleaner way to solve this is to make sure the process that gets forked isn't connected to Redis in the first place. That means that there is nothing to inherit for the child processes and gives you lazy reconnection implicitly (since that is the default behavior of the client to begin with). If you're using a Redis connection in your parent process, you can make sure that you use a different instance that is not used from your background jobs, instead of using a global instance.

I hope this helps.

Cheers,
Pieter

andreacfm · 2013-09-26T11:35:23Z

I am a bit confused here. Looking at https://github.com/redis/redis-rb/wiki/redis-rb-on-Phusion-Passenger looks that from v 3 there is no need to reconnect after a fork.
I was used to reconnect both after Resque and Passenger fork.

  PhusionPassenger.on_event(:starting_worker_process) do |forked|
    if forked
      # If the actual cache respond to reconnect go on.
      Rails.cache.reconnect if Rails.cache.respond_to? :reconnect
      # Reconnect Resque Redis instance.
      Resque.redis.client.reconnect
    end
  end

Is this still needed?
Looks like calling a reconnect raise the Redis::InheritedError.

Are you here suggests that a better option may be something like:

....
if forked
      # If the actual cache respond to reconnect go on.
      Rails.cache.redis = Redis.new
      # Reconnect Resque Redis instance.
      Resque.redis = Redis.new
    end
....

What do you think?

myronmarston · 2013-09-26T12:44:50Z

As you say, there's not really a way around using a separate connection per process. There is no simple way to coordinate usage of a single connection across processes, and in general you don't want to go there ;-).

Agreed. I completely understand the need for separate processes to use separate connections.

The approach that you mention would work fine, though it is almost exactly what the Redis client already does. The main difference is that it raises the InheritedError where you would want to reconnect.

That's a huge difference, though; the solution I proposed above (and implemented in our project via a proxy object) transparently handles reconnecting when needed rather than raising InheritedError.

One way to solve this is to just rescue this error, reconnect and retry. That would give lazy reconnection with minimal effort.

I think that would be a ton of effort: we'd have to wrap every place we use redis in rescue-reconnect-retry logic. You're probably thinking of putting the rescue-reconnect-retry at some higher-up place that can handle it in all our jobs (or whatever) automatically, but the problem is that we have over 100 redis connections in this application, and it would be non-trivial for such a bit of code to guess which redis connection to reconnect to. Plus, if we did that, it could involve wasting a lot of work: imagine if some heavy computation was done, then the error was raised: it would retry the heavy computation as well.

Another, cleaner way to solve this is to make sure the process that gets forked isn't connected to Redis in the first place. That means that there is nothing to inherit for the child processes and gives you lazy reconnection implicitly (since that is the default behavior of the client to begin with). If you're using a Redis connection in your parent process, you can make sure that you use a different instance that is not used from your background jobs, instead of using a global instance.

This isn't really feasible for us. We have lots of objects floating around that, when constructed, have been given a redis connection to talk to, and these objects may be used in the child process. We would basically have to re-architect how our application uses redis, which would be non-trivial.

Given that you said that the approach I suggested above would work fine and is almost exactly what the redis client already does (even though the results are really quite different)...is there any reason you wouldn't accept a PR implementing the approach above? FWIW, we've been using it in production for a couple weeks and we have yet to see a problem.

pietern · 2013-09-28T17:35:52Z

I understand your argument and agree that some handling should be added.

Instead of adding a proxy object, however, what do you think of just handling this in the code path where the InheritedError is raised. I'm thinking of an option reconnect_after_fork that, if set to true, will simply reconnect instead of raising. It can default to false to preserve current behavior and have people be aware what is going on when they do fork and try to reuse an existing connection. It is also more lightweight than a proxy object. I think it would be safe to assume people won't fork inside a MULTI context. If they do, they have bigger problems...

What do you think?

AvnerCohen · 2013-10-01T05:57:02Z

+1 for adding this is a part of the gem itself.

What I had in mind is something in the form of:

          if Process.pid != @pid
            reconnect and return if @autoreconnect
            raise InheritedError,
              "Tried to use a connection from a child process without reconnecting. " +
              "You need to reconnect to Redis after forking."
          end

_@autoreconnect_ should than be an optional parameter, it will simply remove the need of controlling the after_fork sequence from the user side and will maintain backwards compatibility, if it works fine for a while, it can probably than be set to be the default behavior with the next version change.

djanowski · 2013-11-18T17:00:44Z

I'm leaning towards including this.

It will be nice to run a Google Trends search for "Redis::InheritedError" a few months after releasing :)

AvnerCohen · 2013-11-18T19:04:59Z

Will a PR help to make this happen? Or maybe an animated gif?
Just let us know..

NaN1488 · 2013-11-28T19:46:48Z

I just added a PR, these are the changes https://github.com/redis/redis-rb/pull/389/files
it works for me :)

pietern · 2014-01-21T00:09:56Z

@djanowski What do you think about removing the InheritedError altogether and always reconnect?

pietern · 2014-04-02T15:08:38Z

Fixed by #414.

This was referenced Sep 28, 2013

Redis::InheritedError on Puma restart #366

Closed

Redis::InheritedError #363

Closed

pietern mentioned this issue Sep 30, 2013

Why not auto reconnect - Tried to use a connection from a child process without reconnecting. You need to reconnect to Redis after forking. #370

Closed

pietern mentioned this issue Jan 21, 2014

Solved the issue 364 via adding auto_reconnect option #389

Closed

saizai mentioned this issue Mar 14, 2014

client: make Redis::Client#ensure_connected handle fork reconnects #414

Merged

pietern closed this as completed Apr 2, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Would this work to avoid Redis::InheritedError? #364

Would this work to avoid Redis::InheritedError? #364

myronmarston commented Sep 6, 2013

pietern commented Sep 20, 2013

andreacfm commented Sep 26, 2013

myronmarston commented Sep 26, 2013

pietern commented Sep 28, 2013

AvnerCohen commented Oct 1, 2013

djanowski commented Nov 18, 2013

AvnerCohen commented Nov 18, 2013

NaN1488 commented Nov 28, 2013

pietern commented Jan 21, 2014

pietern commented Apr 2, 2014

Would this work to avoid Redis::InheritedError? #364

Would this work to avoid Redis::InheritedError? #364

Comments

myronmarston commented Sep 6, 2013

pietern commented Sep 20, 2013

andreacfm commented Sep 26, 2013

myronmarston commented Sep 26, 2013

pietern commented Sep 28, 2013

AvnerCohen commented Oct 1, 2013

djanowski commented Nov 18, 2013

AvnerCohen commented Nov 18, 2013

NaN1488 commented Nov 28, 2013

pietern commented Jan 21, 2014

pietern commented Apr 2, 2014