Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Would this work to avoid Redis::InheritedError? #364

Closed
myronmarston opened this issue Sep 6, 2013 · 10 comments
Closed

Would this work to avoid Redis::InheritedError? #364

myronmarston opened this issue Sep 6, 2013 · 10 comments

Comments

@myronmarston
Copy link

Here at Moz we have a large distributed system that uses lots of forked processes (via our background job queue, Qless, which forks a process to give each job a sandbox like resque does) and also uses a lot of redis connections. We have a total of 112 of them, split both across different roles, and sharding on user id within those roles. We've been dealing with Redis::InheritedError by having each job reconnect to the redis connections that it might use. This works, but has also been causing us pain:

  • We have to frequently add logic to our jobs to have them reconnect to the appropriate redis connections. We understand why it's needed but it's a lot of logic that is only needed because we're forking a process in our workers.
  • We're running more than 2000 worker processes in production, and having each one reconnect to the redis connections it might use causes extra network load and has led to semi-frequent redis timeouts.

We'd like to come up with a better way to deal with this, where reconnects will happen lazily happen automatically, and wanted to get your feedback on it. Here's what we're currently thinking:

  • In our application, use a proxy object (probably based on the delegate stdlib) in place of the Redis instances.
  • Store the Process.pid when the instance is created.
  • Before forwarding messages to the wrapped Redis instance, check Process.pid against the stored Process.pid, and, if they are different, reconnect to redis and update the pid instance variable.
  • Include special handling for multi blocks: if the Process.pid changed since the block opened, allow the normal Redis::Inherited error to be raised (rather than reconnecting), since it would change the semantics of subsequent commands if we reconnected in the middle of a multi block.

A few questions about this:

  • Will this general plan work OK?
  • Besides multi, are there any other Redis methods that need special handling?
  • Is there any reason this logic couldn't be in the redis gem itself? If you're open to it being in the gem, I'd be happy to submit a PR adding it.

Thanks!

/cc @proby @benkirzhner

@pietern
Copy link
Collaborator

pietern commented Sep 20, 2013

Hi,

As you say, there's not really a way around using a separate connection per process. There is no simple way to coordinate usage of a single connection across processes, and in general you don't want to go there ;-).

The approach that you mention would work fine, though it is almost exactly what the Redis client already does. The main difference is that it raises the InheritedError where you would want to reconnect. One way to solve this is to just rescue this error, reconnect and retry. That would give lazy reconnection with minimal effort. Another, cleaner way to solve this is to make sure the process that gets forked isn't connected to Redis in the first place. That means that there is nothing to inherit for the child processes and gives you lazy reconnection implicitly (since that is the default behavior of the client to begin with). If you're using a Redis connection in your parent process, you can make sure that you use a different instance that is not used from your background jobs, instead of using a global instance.

I hope this helps.

Cheers,
Pieter

@andreacfm
Copy link

I am a bit confused here. Looking at https://github.com/redis/redis-rb/wiki/redis-rb-on-Phusion-Passenger looks that from v 3 there is no need to reconnect after a fork.
I was used to reconnect both after Resque and Passenger fork.

  PhusionPassenger.on_event(:starting_worker_process) do |forked|
    if forked
      # If the actual cache respond to reconnect go on.
      Rails.cache.reconnect if Rails.cache.respond_to? :reconnect
      # Reconnect Resque Redis instance.
      Resque.redis.client.reconnect
    end
  end

Is this still needed?
Looks like calling a reconnect raise the Redis::InheritedError.

Are you here suggests that a better option may be something like:

....
if forked
      # If the actual cache respond to reconnect go on.
      Rails.cache.redis = Redis.new
      # Reconnect Resque Redis instance.
      Resque.redis = Redis.new
    end
....

What do you think?

@myronmarston
Copy link
Author

As you say, there's not really a way around using a separate connection per process. There is no simple way to coordinate usage of a single connection across processes, and in general you don't want to go there ;-).

Agreed. I completely understand the need for separate processes to use separate connections.

The approach that you mention would work fine, though it is almost exactly what the Redis client already does. The main difference is that it raises the InheritedError where you would want to reconnect.

That's a huge difference, though; the solution I proposed above (and implemented in our project via a proxy object) transparently handles reconnecting when needed rather than raising InheritedError.

One way to solve this is to just rescue this error, reconnect and retry. That would give lazy reconnection with minimal effort.

I think that would be a ton of effort: we'd have to wrap every place we use redis in rescue-reconnect-retry logic. You're probably thinking of putting the rescue-reconnect-retry at some higher-up place that can handle it in all our jobs (or whatever) automatically, but the problem is that we have over 100 redis connections in this application, and it would be non-trivial for such a bit of code to guess which redis connection to reconnect to. Plus, if we did that, it could involve wasting a lot of work: imagine if some heavy computation was done, then the error was raised: it would retry the heavy computation as well.

Another, cleaner way to solve this is to make sure the process that gets forked isn't connected to Redis in the first place. That means that there is nothing to inherit for the child processes and gives you lazy reconnection implicitly (since that is the default behavior of the client to begin with). If you're using a Redis connection in your parent process, you can make sure that you use a different instance that is not used from your background jobs, instead of using a global instance.

This isn't really feasible for us. We have lots of objects floating around that, when constructed, have been given a redis connection to talk to, and these objects may be used in the child process. We would basically have to re-architect how our application uses redis, which would be non-trivial.

Given that you said that the approach I suggested above would work fine and is almost exactly what the redis client already does (even though the results are really quite different)...is there any reason you wouldn't accept a PR implementing the approach above? FWIW, we've been using it in production for a couple weeks and we have yet to see a problem.

@pietern
Copy link
Collaborator

pietern commented Sep 28, 2013

I understand your argument and agree that some handling should be added.

Instead of adding a proxy object, however, what do you think of just handling this in the code path where the InheritedError is raised. I'm thinking of an option reconnect_after_fork that, if set to true, will simply reconnect instead of raising. It can default to false to preserve current behavior and have people be aware what is going on when they do fork and try to reuse an existing connection. It is also more lightweight than a proxy object. I think it would be safe to assume people won't fork inside a MULTI context. If they do, they have bigger problems...

What do you think?

@AvnerCohen
Copy link

+1 for adding this is a part of the gem itself.

What I had in mind is something in the form of:

          if Process.pid != @pid
            reconnect and return if @autoreconnect
            raise InheritedError,
              "Tried to use a connection from a child process without reconnecting. " +
              "You need to reconnect to Redis after forking."
          end

_@autoreconnect_ should than be an optional parameter, it will simply remove the need of controlling the after_fork sequence from the user side and will maintain backwards compatibility, if it works fine for a while, it can probably than be set to be the default behavior with the next version change.

@djanowski
Copy link
Collaborator

I'm leaning towards including this.

It will be nice to run a Google Trends search for "Redis::InheritedError" a few months after releasing :)

@AvnerCohen
Copy link

Will a PR help to make this happen? Or maybe an animated gif?
Just let us know..

@NaN1488
Copy link

NaN1488 commented Nov 28, 2013

I just added a PR, these are the changes https://github.com/redis/redis-rb/pull/389/files
it works for me :)

@pietern
Copy link
Collaborator

pietern commented Jan 21, 2014

@djanowski What do you think about removing the InheritedError altogether and always reconnect?

@pietern
Copy link
Collaborator

pietern commented Apr 2, 2014

Fixed by #414.

@pietern pietern closed this as completed Apr 2, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants