Create a class between Resque classes and what they do in redis #1210

davetron5000 · 2014-04-27T17:35:44Z

Problem

I want to write code to access information about what's going on in Resque, but the code needs to work for multiple Resque instances in the same Ruby VM. Because Resque.redis is global, it is very difficult (impossible in some cases) to use the Resque API directly.

Solution

Provide an API, not reliant on a global variable, that encapsulates all the ways in which Resque interacts with Redis: namely, the names of the keys and the kind of data structure expected under each key.

Consider a call like this:

```ruby
decode redis.lpop("queue:#{queue}")
```

This should mean "decode the next job on the given queue", but it actually means "decode whatever is in Redis under the key 'queue:#{queue}', which just so happens to be how we store queues, but don't worry about that right now, just go into Redis and do it".

With this PR, it turns into this:

```ruby
decode(@data_store.pop_from_queue(queue))
```

which is saying "get me the next job in the given queue, however that's done, and decode it."

This means someone else can do the same thing without knowing how to construct the Redis key for a queue.
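To make the idea concrete, here is a minimal sketch of a class that is the only place knowing how queue keys are named. It assumes the 'queue:<name>' list layout shown above, with jobs appended by RPUSH and taken by LPOP. QueueStore and FakeRedis are hypothetical names, and FakeRedis is an in-memory stand-in so the sketch runs without a Redis server; this is not the DataStore code from the PR.

```ruby
require "json"

# Hypothetical in-memory stand-in for a Redis client: implements only the
# list commands this sketch uses.
class FakeRedis
  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
  end

  def rpush(key, value)
    @lists[key].push(value).size
  end

  def lpop(key)
    @lists[key].shift
  end

  def llen(key)
    @lists[key].size
  end
end

# Hypothetical sketch: owns the key naming and the choice of a list per queue.
class QueueStore
  def initialize(redis)
    @redis = redis
  end

  def push_to_queue(queue, encoded_item)
    @redis.rpush(redis_key_for_queue(queue), encoded_item)
  end

  def pop_from_queue(queue)
    @redis.lpop(redis_key_for_queue(queue))
  end

  def queue_size(queue)
    @redis.llen(redis_key_for_queue(queue))
  end

  private

  # The single place that knows how a queue maps to a Redis key.
  def redis_key_for_queue(queue)
    "queue:#{queue}"
  end
end

store = QueueStore.new(FakeRedis.new)
store.push_to_queue(:mailer, JSON.generate("class" => "SendEmail", "args" => [42]))
job = JSON.parse(store.pop_from_queue(:mailer))
puts job["class"] # prints SendEmail
```

Callers depend only on push_to_queue and pop_from_queue; if the key scheme ever changed, only redis_key_for_queue would need to change with it.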

And because that knowledge is now centralized in one class (DataStore) instead of littered throughout the codebase, one could perform these operations on multiple Resque instances from the same Ruby VM, e.g. for monitoring:

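```ruby
require "time"
require "resque" # pulls in redis and redis-namespace

resques = {
    www: '10.0.3.4:2345',
  admin: '10.1.4.5:8765',
    ops: '10.1.4.5:8766',
}

# Redis.new takes options, not a "host:port" string. Resque keeps its keys
# under the :resque namespace by default, so wrap each client the same way.
data_stores = resques.map { |name, location|
  client = Redis.new(url: "redis://#{location}")
  [name, Resque::DataStore.new(Redis::Namespace.new(:resque, redis: client))]
}.to_h

data_stores[:www].num_failed # => how many are failed in www's Resque
data_stores[:admin].num_failed # => what about admin?

# Start times come back as strings; "stuck" means started over an hour ago.
one_hour_ago = Time.now - 60 * 60
stuck_workers = data_stores[:ops].workers.select { |worker|
  started = data_stores[:ops].worker_start_time(worker)
  started && Time.parse(started) < one_hour_ago
}
```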

And so forth.

This is not an ideal design, but it solves the problem without breaking backwards compatibility and is better than what exists now, since it at least centralizes how Resque's data structures are stored in Redis. It could also, in theory, allow a different backing store than Redis.

I hacked together a "concerning" construct to show which calls relate to which concern; these could be split into further classes. It's also possible that versions of the major objects (Resque, Worker, and Job) could be created that don't use a global for redis, but that is for another day.

davetron5000 · 2014-04-27T17:36:17Z

lib/resque.rb

    else
-      @redis = Redis::Namespace.new(:resque, :redis => server)
+      @data_store = Resque::DataStore.new(Redis::Namespace.new(:resque, :redis => server))


I'm not married to this name and am happy to change it if the overall concept is something that's desired.

davetron5000 · 2014-04-27T17:39:38Z

I did not make any changes to resque-web as there appears to be no test coverage for it, and my ultimate goal is to use DataStore to make a better means of monitoring resque anyway.

yaauie · 2014-04-28T22:18:57Z

Yes, yes, yes. I love the concept. I'll dive in and review the pull request in the next day or so.

davetron5000 · 2014-05-02T13:16:37Z

Updated this to start showing what different classes might look like. Will drop a few comments on the diff to explain more.

davetron5000 · 2014-05-02T13:22:07Z

lib/resque/data_store.rb

+      @stats_access        = StatsAccess.new(@redis)
+    end
+
+    def_delegators :@queue_access, :push_to_queue,


Basically, this class is an über-class that acts just like Resque.redis used to, thus maintaining backwards-compatibility.

The real "win" is that we've separated the various aspects of resque into different classes (which I will state now are all named horribly—please help me come up with better names):

How do I access jobs in the queues? QueueAccess

How do I find out about failed jobs? FailedQueueAccess

How do I manage the workers? Workers

How do I use the stats? StatsAccess

Each of these new classes could include the method_missing and respond_to? methods from DataStore and then be substituted for DataStore in the various Resque classes, e.g. Resque could likely use QueueAccess instead of DataStore, and Worker could use Workers.

Given all of that, a way forward could be:

1. Do that, and add deprecation warnings in method_missing. These warnings would be triggered by third parties who access Resque.redis directly, and there is probably a lot of such code.

2. Remove method_missing as a breaking change.

3. Allow replacement of any of the implementations of these classes, e.g. you could implement Workers to store worker metadata in a SQL database (or whatever).
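Here is a rough sketch of the first step of that plan, assuming the focused classes are wired in with Forwardable's def_delegators (as in the data_store.rb excerpt above) and that any other call falls back to the raw client with a deprecation warning. TinyRedis, the trimmed QueueAccess, and DataStoreFacade are hypothetical names for illustration, not code from this PR. The sketch implements respond_to_missing?, Ruby's idiomatic companion to method_missing, rather than overriding respond_to? directly.

```ruby
require "forwardable"

# Hypothetical stand-in for a Redis client with a couple of commands.
class TinyRedis
  def initialize
    @lists = { "queue:default" => ["job-1"] }
    @strings = { "stat:processed" => "7" }
  end

  def lpop(key)
    (@lists[key] || []).shift
  end

  def get(key)
    @strings[key]
  end
end

# Hypothetical focused class: knows only about queue keys.
class QueueAccess
  def initialize(redis)
    @redis = redis
  end

  def pop_from_queue(queue)
    @redis.lpop("queue:#{queue}")
  end
end

# Hypothetical facade: named operations go to focused classes; anything else
# reaches the raw client, with a deprecation warning printed to stderr.
class DataStoreFacade
  extend Forwardable

  def_delegators :@queue_access, :pop_from_queue

  def initialize(redis)
    @redis = redis
    @queue_access = QueueAccess.new(redis)
  end

  def method_missing(name, *args, **kwargs, &block)
    return super unless @redis.respond_to?(name)

    warn "DEPRECATION: calling #{name} on the raw Redis client is deprecated"
    @redis.public_send(name, *args, **kwargs, &block)
  end

  def respond_to_missing?(name, include_private = false)
    @redis.respond_to?(name) || super
  end
end

store = DataStoreFacade.new(TinyRedis.new)
store.pop_from_queue(:default) # => "job-1", via QueueAccess
store.get("stat:processed")    # => "7", with a deprecation warning
```

Because callers only see pop_from_queue, a different QueueAccess (say, one backed by another store) could later be swapped in without touching them, which is what the last step of the plan describes.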

I realize I'm hand-waving over a massive amount of work, but it would avoid rewriting the core of resque, I guess.

Selfishly, this still allows me access to many Resque instances' metadata from one Ruby VM, which is my immediate need.

davetron5000 · 2014-06-12T15:07:51Z

Here is an application that this change enables that would be difficult or impossible to do otherwise: https://github.com/stitchfix/resque-brain

Specific use of this class can be seen here:

https://github.com/stitchfix/resque-brain/blob/master/app/models/resque_instance.rb

steveklabnik · 2015-11-05T14:18:53Z

Hey there! It's been a while, sorry about that.

@hoffmanc and I are going to be working on Resque again, but this PR needs a rebase. If you get the chance, would you mind rebasing? If you're not up for it, that's cool; just let us know so we can investigate.

Thanks! / sorry 😦

davetron5000 · 2015-11-05T14:27:27Z

Rebased. Let me know if I can help. This code has been running in production since I opened this PR, FWIW, as part of https://github.com/stitchfix/resque-brain

steveklabnik · 2015-11-11T15:26:55Z

I'm going to re-queue a build here, because Travis was just being weird.

@hoffmanc what do you think of this PR?

steveklabnik · 2016-01-15T17:21:20Z

This has a merge conflict, would you mind rebasing please? Sorry for the delay in reviewing.

steveklabnik · 2016-01-15T17:21:51Z

(Also, I'm trying to get people back on master Resque for the next release, are there any other patches which you're using in production?)

davetron5000 · 2016-01-15T18:52:27Z

I don't know if I can. I got some very strange conflicts, and it looks like there has been a ton of churn. Let me try squashing my commits and opening a new PR.

davetron5000 · 2016-01-15T19:00:20Z

OK, I think this PR is fundamentally incompatible with the Resque::Backend concept. I don't know if that solves my original problem, but I don't see how this PR can go forward in this state.

davetron5000 · 2016-01-22T19:51:20Z

I didn't realize master was not the main branch. Let me take a day or two to see if I can get this rebased

steveklabnik · 2016-01-22T19:58:12Z

Yeah, sorry. I am hoping to switch it soon, along with the next release.

steveklabnik · 2016-02-04T19:10:36Z

Did we ever get this rebased?

corincerami · 2016-02-10T00:54:48Z

I'm very 👍 this PR. Anything that makes the Ruby code we all use easier to read.

davetron5000 · 2016-02-10T13:40:01Z

OK, this is rebased. Seeing if the build passes. Assuming it does or I can fix any issues that come up, before merging this we should all agree that we are confident in the test coverage. I went to great lengths to make this an internal refactor that doesn't break any interfaces, but it's obviously a lot of code to move around.

steveklabnik · 2016-03-10T09:26:28Z

This is still causing a lot of failures, and is somehow out of date again :(

davetron5000 · 2016-03-10T12:47:08Z

rebased

steveklabnik · 2016-03-10T15:50:33Z

Future work will happen on master, yeah. Basically, I made the old master into a two-oh-is-cancelled branch to save all that work, set master to 1-x-stable, and force pushed.

corincerami · 2016-04-02T02:05:10Z

lib/resque/data_store.rb

+      end
+    end
+
+    class QueueAccess


Is there a reason these classes don't get their own files, since they are full-fledged classes?

corincerami · 2016-04-02T02:41:35Z

Aha! It seems like the timeouts are caused because, at some points in the tests, Resque.redis= is being passed a Resque::DataStore as an argument, and this case is never handled in the new definition of Resque.redis=. If you add

    when Resque::DataStore
      @data_store = server

to the case statement, the timeouts should go away. It seems it was creating a new Redis::Namespace with the Resque::DataStore as its redis argument, nesting the wrappers many levels deep.
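A reduced sketch of why the extra branch stops the nesting. Namespaced and DataStore are hypothetical Struct stand-ins for Redis::Namespace and Resque::DataStore, and MiniResque.redis= is a simplified setter, not Resque's actual implementation. Without the DataStore branch, an existing data store would fall into the last branch and gain another wrapper every time it is assigned.

```ruby
# Hypothetical stand-ins for Redis::Namespace and Resque::DataStore.
Namespaced = Struct.new(:namespace, :client)
DataStore = Struct.new(:redis)

module MiniResque
  # Reduced sketch of a Resque.redis= style setter.
  def self.redis=(server)
    @data_store =
      case server
      when DataStore
        server # already wrapped: use it as-is instead of nesting it again
      when Namespaced
        DataStore.new(server) # caller already chose a namespace
      else
        DataStore.new(Namespaced.new(:resque, server)) # bare client
      end
  end

  def self.data_store
    @data_store
  end
end

raw_client = Object.new
MiniResque.redis = raw_client
store = MiniResque.data_store
MiniResque.redis = store # e.g. a test restoring a saved data store
MiniResque.data_store.equal?(store) # => true
```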

Fixing that should make it easier to figure out what is causing the other failures.

corincerami · 2016-04-02T02:50:55Z

I was able to fix the failing spec around Resque::Failure::Redis.remove with the following change:

      def remove_from_failed_queue(index_in_failed_queue,failed_queue_name=nil)
        failed_queue_name ||= :failed
        hopefully_unique_value_we_can_use_to_delete_job = ""
        @redis.lset(failed_queue_name, index_in_failed_queue, hopefully_unique_value_we_can_use_to_delete_job)
        @redis.lrem(failed_queue_name, 1,                     hopefully_unique_value_we_can_use_to_delete_job)
      end

This was required because in Resque::Failure::Redis.remove, queue defaults to nil:

      def self.remove(id, queue = nil)
        check_queue(queue)
        data_store.remove_from_failed_queue(id, queue)
      end

Passing an explicit nil argument is different from omitting the argument, so nil was used as failed_queue_name in remove_from_failed_queue instead of the default of :failed.
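This is ordinary Ruby default-argument behavior: a default value is used only when the argument is omitted, never when nil is passed explicitly. A minimal illustration with made-up method names:

```ruby
# Default in the signature: applies only when the argument is omitted.
def name_from_default_param(failed_queue_name = :failed)
  failed_queue_name
end

# Default of nil plus ||= in the body, as in the fix above.
def name_from_nil_fallback(failed_queue_name = nil)
  failed_queue_name ||= :failed
  failed_queue_name
end

# A caller shaped like remove(id, queue = nil) that always forwards queue.
def forward_queue(queue = nil)
  [name_from_default_param(queue), name_from_nil_fallback(queue)]
end

p name_from_default_param      # prints :failed
p name_from_default_param(nil) # prints nil
p forward_queue                # prints [nil, :failed]
```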

corincerami · 2016-04-02T02:56:48Z

I believe the failure on line 250 of test/resque_test.rb is because DataStore#all_resque_keys should be:

    def all_resque_keys
      @redis.keys("*").map do |key|
        key.sub("#{Resque.redis.namespace}:", '')
      end
    end

rather than

    def all_resque_keys
      @redis.keys("*").map do |key|
        key.sub("#{redis.namespace}:", '')
      end
    end

corincerami · 2016-04-02T03:07:45Z

The third and final failure I'm seeing is from line 158 of test/worker_test.rb, where previously it read:

Resque.redis.stubs(:get).raises(Redis::CannotConnectError)

it should now read:

Resque.data_store.redis.stubs(:get).raises(Redis::CannotConnectError)

or the equivalent:

Resque.redis.redis.stubs(:get).raises(Redis::CannotConnectError)

although I find the first one clearer.

corincerami · 2016-04-02T03:09:19Z

@davetron5000 I've opened a PR to this branch to fix these test failures, stitchfix#1. If you have better ideas on how to fix these things and want to take a stab yourself, feel free to close that PR; just thought it might make things smoother for you.

davetron5000 · 2016-04-06T23:43:44Z

@chrisccerami thanks so much for your comments and time! Sorry, I was on vacation or would've responded sooner. If you have those improvements handy, push them up and I'll steal from you :)

Also, there's no reason for the classes being in one file other than that they were experimental and I wanted to get some feedback before doing too much. I can/will break them out into their own files.

corincerami · 2016-04-07T01:00:45Z

@davetron5000 I've opened a PR to this PR 😜

stitchfix#1

corincerami · 2016-05-21T22:46:19Z

@davetron5000 if you could merge stitchfix#1 into your branch and then rebase, I think this would be ok to merge.

davetron5000 · 2016-05-23T16:29:43Z

I think I've pushed the rebased branch, but GH is having issues so maybe the web UI isn't showing it?

corincerami · 2016-05-23T16:42:39Z

Yeah, Travis also didn't pick up the changes, likely due to GitHub's issues.

davetron5000 · 2016-05-24T19:28:06Z

ok, now looking at this I think I screwed this up. Let me try again

coveralls · 2016-05-24T19:29:46Z

Coverage increased (+48.4%) to 82.413% when pulling 0cc29de on stitchfix:resque-redis-interface into 1deabd9 on resque:1-x-stable.

davetron5000 · 2016-05-24T19:30:27Z

Hmmm. I'm really confused now. Here's what I did:

> git checkout resque-redis-interface
> git reset --hard bcdaa1c
> git fetch resque # which is the canonical repo
> git rebase resque/1-x-stable
> # fix a conflict in multiple.rb
> git push --force origin resque-redis-interface

Not sure what happened. Any ideas? Sorry I usually don't mess up with git like this :(

coveralls · 2016-05-24T19:31:01Z

Coverage increased (+48.4%) to 82.413% when pulling 0cc29de on stitchfix:resque-redis-interface into 1deabd9 on resque:1-x-stable.

corincerami · 2016-05-28T14:54:52Z

@davetron5000 it seems like what happened is that you somehow pulled in the changes from master that weren't on 1-x-stable yet. There are a few solutions I can see:

1. You could manually clean up the commits that don't belong in this PR.
2. We could merge it into 1-x-stable, since those other changes have already been merged into master anyway, and we're going to cherry-pick your stuff to the right place.
3. You could reopen the pull request and point it to master with the proper commits in it.

I feel a little uneasy about 2, but perhaps @steveklabnik has an opinion.


davetron5000 · 2016-05-28T15:32:31Z

@chrisccerami OK, I think I fixed it. Your confirmation that I pulled in stuff from the wrong branch was helpful. I did a git reset --hard resque/1-x-stable, then cherry-picked my commit, then yours, and now hopefully it's clean :)

coveralls · 2016-05-28T15:33:20Z

Coverage increased (+2.5%) to 36.496% when pulling 8df3f42 on stitchfix:resque-redis-interface into 1deabd9 on resque:1-x-stable.

corincerami · 2016-05-28T15:47:39Z

This looks good to me. I'm going to merge this PR into 1-x-stable now, and then I'll cherry-pick it into master. I believe we're cutting a new release in a couple of days, so I may wait to move it into master until after that point, and this will be in the following release.

davetron5000 · 2016-05-29T15:49:50Z

Awesome, thank you for getting this through!!

Dave


On May 28, 2016, at 11:47 AM, Chris C Cerami notifications@github.com wrote:

Merged #1210.


fw42 · 2016-06-08T17:21:38Z

This looks very helpful to us (at Shopify) and has been bugging me for a while. Thanks for this contribution!

davetron5000 reviewed Apr 27, 2014

davetron5000 reviewed May 2, 2014

davetron5000 force-pushed the resque-redis-interface branch from 372fc85 to b29309a on November 5, 2015 14:26

davetron5000 force-pushed the resque-redis-interface branch from b29309a to d066f85 on January 15, 2016 18:56

davetron5000 closed this Jan 15, 2016

davetron5000 reopened this Jan 22, 2016

davetron5000 force-pushed the resque-redis-interface branch from d066f85 to 5143054 on February 10, 2016 13:38

davetron5000 force-pushed the resque-redis-interface branch from 5143054 to 3cbbcd8 on March 10, 2016 12:46

corincerami reviewed Apr 2, 2016

davetron5000 force-pushed the resque-redis-interface branch 2 times, most recently from bcdaa1c to 0c0f3f8 on May 24, 2016 19:27

davetron5000 force-pushed the resque-redis-interface branch from 0c0f3f8 to 0cc29de on May 24, 2016 19:29

davetron5000 and others added 2 commits May 28, 2016 11:31

Minor fixes to accomodate new DataStore class

8df3f42

davetron5000 force-pushed the resque-redis-interface branch from 0cc29de to 8df3f42 on May 28, 2016 15:31

corincerami merged commit 9f99e10 into resque:1-x-stable May 28, 2016

dylanahsmith mentioned this pull request Jun 16, 2016

1-x-stable / master #1473 (Closed)
