Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multi can return nil? #490

Closed
mperham opened this issue Dec 17, 2014 · 11 comments
Closed

multi can return nil? #490

mperham opened this issue Dec 17, 2014 · 11 comments

Comments

@mperham
Copy link
Contributor

mperham commented Dec 17, 2014

I've worked under the assumption that multi always returns an Array of command results. There appears to be an edge case that several Sidekiq users have reported recently where multi returns nil, causing Sidekiq to raise an error.

The only suspicious code I can find is this line.

Specifically the Sidekiq users are reporting this backtrace:

NoMethodError: undefined method `last' for nil:NilClass
[PROJECT_ROOT]/vendor/bundle/ruby/2.1.0/gems/sidekiq-3.3.0/lib/sidekiq/processor.rb:120 :in `block (2 levels) in stats`
[PROJECT_ROOT]/vendor/bundle/ruby/2.1.0/gems/connection_pool-2.1.0/lib/connection_pool.rb:58 :in `with`
[PROJECT_ROOT]/vendor/bundle/ruby/2.1.0/gems/sidekiq-3.3.0/lib/sidekiq.rb:72 :in `redis`
[PROJECT_ROOT]/vendor/bundle/ruby/2.1.0/gems/sidekiq-3.3.0/lib/sidekiq/processor.rb:113 :in `block in stats`
[PROJECT_ROOT]/vendor/bundle/ruby/2.1.0/gems/sidekiq-3.3.0/lib/sidekiq/processor.rb:138 :in `retry_and_suppress_exceptions`
[PROJECT_ROOT]/vendor/bundle/ruby/2.1.0/gems/sidekiq-3.3.0/lib/sidekiq/processor.rb:112 :in `stats`
[PROJECT_ROOT]/vendor/bundle/ruby/2.1.0/gems/sidekiq-3.3.0/lib/sidekiq/processor.rb:50 :in `process`
[PROJECT_ROOT]/vendor/bundle/ruby/2.1.0/gems/celluloid-0.16.0/lib/celluloid/calls.rb:26 :in `public_send`
[PROJECT_ROOT]/vendor/bundle/ruby/2.1.0/gems/celluloid-0.16.0/lib/celluloid/calls.rb:26 :in `dispatch`
[PROJECT_ROOT]/vendor/bundle/ruby/2.1.0/gems/celluloid-0.16.0/lib/celluloid/calls.rb:122 :in `dispatch`
[PROJECT_ROOT]/vendor/bundle/ruby/2.1.0/gems/celluloid-0.16.0/lib/celluloid/cell.rb:60 :in `block in invoke`

from this code:

            result = conn.multi do
              conn.hdel("#{identity}:workers", thread_identity)
              conn.incrby("stat:processed", 1)
              conn.incrby(processed, 1)
            end
            conn.expire(processed, STATS_TIMEOUT) if result.last == 1

I had one of them send me a sample backtrace making a typical Redis call so I could verify there were no monkeypatches (e.g. NewRelic) which might cause invalid returns.

    from /var/app/current/vendor/bundle/ruby/2.1.0/gems/redis-3.0.7/lib/redis/pipeline.rb:129:in `value'
    from /var/app/current/vendor/bundle/ruby/2.1.0/gems/redis-3.0.7/lib/redis/pipeline.rb:121:in `_set'
    from /var/app/current/vendor/bundle/ruby/2.1.0/gems/redis-3.0.7/lib/redis/pipeline.rb:63:in `block in finish'
    from /var/app/current/vendor/bundle/ruby/2.1.0/gems/redis-3.0.7/lib/redis/pipeline.rb:62:in `each'
    from /var/app/current/vendor/bundle/ruby/2.1.0/gems/redis-3.0.7/lib/redis/pipeline.rb:62:in `each_with_index'
    from /var/app/current/vendor/bundle/ruby/2.1.0/gems/redis-3.0.7/lib/redis/pipeline.rb:62:in `each'
    from /var/app/current/vendor/bundle/ruby/2.1.0/gems/redis-3.0.7/lib/redis/pipeline.rb:62:in `map'
    from /var/app/current/vendor/bundle/ruby/2.1.0/gems/redis-3.0.7/lib/redis/pipeline.rb:62:in `finish'
    from /var/app/current/vendor/bundle/ruby/2.1.0/gems/redis-3.0.7/lib/redis/pipeline.rb:86:in `finish'
    from /var/app/current/vendor/bundle/ruby/2.1.0/gems/redis-3.0.7/lib/redis/client.rb:133:in `block in call_pipeline'
    from /var/app/current/vendor/bundle/ruby/2.1.0/gems/redis-3.0.7/lib/redis/client.rb:257:in `with_reconnect'
    from /var/app/current/vendor/bundle/ruby/2.1.0/gems/redis-3.0.7/lib/redis/client.rb:131:in `call_pipeline'
    from /var/app/current/vendor/bundle/ruby/2.1.0/gems/redis-3.0.7/lib/redis.rb:2094:in `block in multi'
    from /var/app/current/vendor/bundle/ruby/2.1.0/gems/redis-3.0.7/lib/redis.rb:37:in `block in synchronize'
    from /opt/rubies/ruby-2.1.2/lib/ruby/2.1.0/monitor.rb:211:in `mon_synchronize'
    from /var/app/current/vendor/bundle/ruby/2.1.0/gems/redis-3.0.7/lib/redis.rb:37:in `synchronize'
    from /var/app/current/vendor/bundle/ruby/2.1.0/gems/redis-3.0.7/lib/redis.rb:2086:in `multi'
    from (irb):2:in `block in irb_binding'
    from /var/app/current/vendor/bundle/ruby/2.1.0/gems/connection_pool-2.0.0/lib/connection_pool.rb:58:in `with'
    from /var/app/current/vendor/bundle/ruby/2.1.0/gems/sidekiq-3.2.6/lib/sidekiq.rb:72:in `redis'

This is happening with 3.0.7, 3.1.0 and 3.2.0. What's the right thing to do here? If nil is expected, what does a nil result mean?

@webandtech
Copy link

It tends to happen for us when we spawn a whole lot (100s or 1000s) of small jobs from one large job worker.

@sbleon
Copy link

sbleon commented Dec 17, 2014

@webandtech We do a lot of that as well!

@badboy
Copy link
Contributor

badboy commented Dec 18, 2014

There is in fact the possibility that multi returns nil for the case a WATCH fails. This should be properly documented (this is the same the actual Redis protocol does it).

About the problem with Sidekiq: Does it use WATCH at all? If not, something else must be triggering it and I would look into reproducing it.

@mperham
Copy link
Contributor Author

mperham commented Dec 18, 2014

Turns out I can avoid the return value so I'll do that. See the linked PR if curious.

@mperham mperham closed this as completed Dec 18, 2014
mperham added a commit to sidekiq/sidekiq that referenced this issue Dec 18, 2014
Refactor redis usage to avoid nil result from multi, see redis/redis-rb#490
@webandtech
Copy link

FYI the sidekiq-unique-jobs gem uses WATCH, which I believe is where this was originating. I had some other issues with that gem so ended up removing it from my Gemfile, which eliminated this issue from recurring for me.

@mperham
Copy link
Contributor Author

mperham commented Dec 26, 2014

Could anyone else experiencing this tell me what Redis service they are using? One Sidekiq user says he saw this when rejiggering ElasticCache and ELB. I wonder if it happens when EC transparently fails over due to load?

@mperham
Copy link
Contributor Author

mperham commented Jan 19, 2015

I believe this issue is actually mperham/connection_pool#67 where an ill-timed Timeout.timeout can cause a connection to become corrupt.

@badboy
Copy link
Contributor

badboy commented Jan 19, 2015

Yikes. Read the discussion in the other issue and on Twitter.
Atleast redis-rb itself is only using it in case of the native Ruby driver (which at best is not used).

@antirez
Copy link
Contributor

antirez commented Jan 19, 2015

EXEC can return an array, null (aborted transaction), or an error reply. This is fine. The corrupted connection is vary bad. Still not sure about what the cause of this it is, but is legit to believe that the timeout leaves the reply of the timed out command into the socket. If this is the case, redis-rb should do one of the following.

  1. After a timeout, force reconnection.
  2. Or, remember there is a reply to discard. And, make sure to reset this flag when reconnecting.

@djanowski
Copy link
Collaborator

I went ahead and implemented a possible solution in #502 by keeping counters. Please take a look and provide feedback there.

@djanowski
Copy link
Collaborator

Fixed in 3.2.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants