Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a class between Resque classes and what they do in redis #1210

Merged
merged 2 commits into from
May 28, 2016

Conversation

davetron5000
Copy link
Contributor

Problem

I want to write code to access information about what's going on in Resque, but the code needs to work for multiple Resque instances in the same Ruby VM. Because Resque.redis is global, it is very difficult (impossible in some cases) to use the Resque API directly.

Solution

Provide an API that does not rely on a global variable that encapsulates all the ways in which Resque interacts with Redis, namely the names of keys and what sort of data structure is expected in those keys.

Consider a call like this:

decode redis.lpop("queue:#{queue}")

This should mean "decode the job on queue queue", but it actually means "decode whatever is in redis under the key "queue:#{queue}" which just so happens to be how we store queues, but don't worry about that right now, just go in Redis and do it".

With this PR, it turns into this:

decode(@data_store.pop_from_queue(queue))

which is saying "get me the job in queue queue, however that's done, and decode it.

Which means that someone else can do this without knowing how to construct the redis key for queue.

And because that knowledge is now centralized in one class (DataStore) instead of littered throughout the codebase, one could perform these operations on multiple resque queues from the same Ruby VM, e.g. for monitoring:

resques = {
    www: '10.0.3.4:2345',
  admin: '10.1.4.5:8765',
    ops: '10.1.4.5:8766',
}

data_stores = Hash[resques.map { |name,location|
  [name,Resque::DataStore.new(Redis.new(location))]
}]

data_stores[:www].num_failed # => how many are failed in www's Resque
data_stores[:admin].num_failed # => what about admin?
stuck_workers = data_stores[:ops].workers.select { |worker|
  data_stores[:ops].worker_start_time(worker) > 1.hour.ago
}

And so forth.

This is not an ideal design, but it solves the problem without breaking backwards compatibility and is better than what exists now, since it at least centralizes how Resque's data structures are stored in Redis. It could also, in theory, allow a different backing store than Redis.

I hacked a concerning concept to demonstrate which calls were relevant to what—this could be split into further classes. It's also possible that versions of the major objects (Resque, Worker, and Job) could be created to not use a global for redis, but that is for another day.

else
@redis = Redis::Namespace.new(:resque, :redis => server)
@data_store = Resque::DataStore.new(Redis::Namespace.new(:resque, :redis => server))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not married to this name, and am happy to change it if the overall concept is something that's desired

@davetron5000
Copy link
Contributor Author

I did not make any changes to resque-web as there appears to be no test coverage for it, and my ultimate goal is to use DataStore to make a better means of monitoring resque anyway.

@yaauie
Copy link
Member

yaauie commented Apr 28, 2014

yes, yes, yes. I love the concept. I'll dive into and review the pull-request in the next day or so.

@davetron5000
Copy link
Contributor Author

Updated this to start showing what different classes might look like. Will drop a few comments on the diff to explain more

@stats_access = StatsAccess.new(@redis)
end

def_delegators :@queue_access, :push_to_queue,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, this class is an über-class that acts just like Resque.redis used to, thus maintaining backwards-compatibility.

The real "win" is that we've separated the various aspects of resque into different classes (which I will state now are all named horribly—please help me come up with better names):

  • How do I access jobs in the queues? QueueAccess
  • How do I find out about failed jobs? FailedQueueAccess
  • How do I manage the workers? Workers
  • How do I use the stats? StatsAccess

Each of these new classes could include the method_missing and respond_to? methods here and then be replaced for DataStore in the various resque classes. e.g. Resque could likely just use QueueAccess instead of DataStore; Worker could just use Workers.

Given all of that, a way forward could be:

  • do that, and add deprecation warnings in method_missing. This warnings would be triggered by third parties who were access Resque.redis directly. Probably a ton of this would exist
  • Remove method_missing as a breaking change
  • Allow replacement of any of the implementations of these classes. e.g. you could implement Workers to store worker metadata in a SQL database (or whatever)

I realize I'm hand-waving over a massive amount of work, but it would avoid rewriting the core of resque, I guess.

Selfishly, this still allows me access to many resque queues' metadata from one RubyVM, which is my immediate need.

@davetron5000
Copy link
Contributor Author

Here is an application that this change enables that would be difficult or impossible to do otherwise: https://github.com/stitchfix/resque-brain

Specific use of this class can be seen here:

https://github.com/stitchfix/resque-brain/blob/master/app/models/resque_instance.rb

@steveklabnik
Copy link
Member

Hey there! It's been a while, sorry about that.

@hoffmanc and I are going to be working on Resque again, but this PR needs a rebase. If you get the chance, would you mind

  1. reabasing, if you're up for it
  2. if you're not up for it, that's cool, just let us know so we can investigate

Thanks! / sorry 😦

@davetron5000
Copy link
Contributor Author

Rebased. Let me know if I can help. This code has been running in production since I opened this PR, FWIW, as part of https://github.com/stitchfix/resque-brain

@steveklabnik
Copy link
Member

I'm going to re-queue a build here, because Travis was just being weird.

@hoffmanc what do you think of this PR?

@steveklabnik
Copy link
Member

This has a merge conflict, would you mind rebasing please? Sorry for the delay in reviewing.

@steveklabnik
Copy link
Member

(Also, I'm trying to get people back on master Resque for the next release, are there any other patches which you're using in production?)

@davetron5000
Copy link
Contributor Author

I don't know if I can. I got some very strange conflicts and it looked there has been a ton of churn. Let me try squashing my commits and opening up a new PR

@davetron5000
Copy link
Contributor Author

OK, I think this PR is fundamentally incompatible with the Resque::Backend concept. I don't know if that solves my original problem, but I don't see how this PR can go forward in this state.

@davetron5000
Copy link
Contributor Author

I didn't realize master was not the main branch. Let me take a day or two to see if I can get this rebased

@davetron5000 davetron5000 reopened this Jan 22, 2016
@steveklabnik
Copy link
Member

Yeah, sorry. I am hoping to switch it soon, along with the next release.

@steveklabnik
Copy link
Member

Did we ever get this rebased?

@corincerami
Copy link
Member

I'm very 👍 this PR. Anything that makes the Ruby code we all use easier to read.

@davetron5000
Copy link
Contributor Author

OK, this is rebased. Seeing if the build passes. Assuming it does or I can fix any issues that come up, before merging this we should all agree that we are confident in the test coverage. I went to great lengths to make this an internal refactor that doesn't break any interfaces, but it's obviously a lot of code to move around.

@steveklabnik
Copy link
Member

This is still causing a lot of failures, and is somehow out of date again :(

@davetron5000
Copy link
Contributor Author

rebased

@steveklabnik
Copy link
Member

Future work will happen on master, yeah. Basically, I made the old master into a two-oh-is-cancelled branch to save all that work, set master to 1-x-stable, and force pushed.

end
end

class QueueAccess
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason these classes don't get their own files, since they are full fledged classes?

@corincerami
Copy link
Member

Aha! It seems like the timeouts are being caused because at some points in the tests, Resque.redis = is being passed a Resque::DataStore as an argument, and this case is never handled in the new definition of Resque.redis=. If you add

    when Resque::DataStore
      @data_store = server

to the case statement, the timeouts should go away. It seems like it was creating a new Redis::Namespace with the Resque::DataStore set as the redis argument, and then chaining those many levels down.

Fixing that should make it easier to figure out what is causing the other failures.

@corincerami
Copy link
Member

I was able to fix the failing spec around Redis.remove with the following change:

      def remove_from_failed_queue(index_in_failed_queue,failed_queue_name=nil)
        failed_queue_name ||= :failed
        hopefully_unique_value_we_can_use_to_delete_job = ""
        @redis.lset(failed_queue_name, index_in_failed_queue, hopefully_unique_value_we_can_use_to_delete_job)
        @redis.lrem(failed_queue_name, 1,                     hopefully_unique_value_we_can_use_to_delete_job)
      end

This was required because in Redis.remove, queue defaults to nil:

      def self.remove(id, queue = nil)
        check_queue(queue)
        data_store.remove_from_failed_queue(id, queue)
      end

Passing an explicit nil argument is different from not including the argument, so nil was used as the queue_name in remove_from_failed_queue instead of the default of :failed.

@corincerami
Copy link
Member

I believe the failure on line 250 of test/resque_test.rb is because DataStore#all_resque_keys should be:

    def all_resque_keys
      @redis.keys("*").map do |key|
        key.sub("#{Resque.redis.namespace}:", '')
      end
    end

rather than

    def all_resque_keys
      @redis.keys("*").map do |key|
        key.sub("#{redis.namespace}:", '')
      end
    end

@corincerami
Copy link
Member

The third and final failure I'm seeing is from line 158 of test/worker_test.rb, where previously it read:

Resque.redis.stubs(:get).raises(Redis::CannotConnectError)

it should now read:

Resque.data_store.redis.stubs(:get).raises(Redis::CannotConnectError)

or the equivalent:

Resque.redis.redis.stubs(:get).raises(Redis::CannotConnectError)

although I find the first one clearer.

@corincerami
Copy link
Member

@davetron5000 I've opened a PR to this branch to fix these test failures, stitchfix#1. If you have better ideas on how to fix these things and want to take a stab yourself, feel free to close that PR; just thought it might make things smoother for you.

@davetron5000
Copy link
Contributor Author

@chrisccerami thanks so much for your comments and time! Sorry I was on vacation or woud've responded sooner. If you have those improvements handy, push them up and I'll steal from you :)

Also, there's no reason for the classes being in one file other than that they were experimental and I wanted to get some feedback before doing too much. I can/will break them out into their own files.

@corincerami
Copy link
Member

@davetron5000 I've opened a PR to this PR 😜

stitchfix#1

@corincerami
Copy link
Member

@davetron5000 if you could merge stitchfix#1 into your branch and then rebase, I think this would be ok to merge.

@davetron5000
Copy link
Contributor Author

I think I've pushed the rebased branch, but GH is having issues so maybe the web UI isn't showing it?

@corincerami
Copy link
Member

Yeah, Travis also didn't pick up the changes, likely due to Githubs issues.

@davetron5000 davetron5000 force-pushed the resque-redis-interface branch 2 times, most recently from bcdaa1c to 0c0f3f8 Compare May 24, 2016 19:27
@davetron5000
Copy link
Contributor Author

ok, now looking at this I think I screwed this up. Let me try again

@coveralls
Copy link

coveralls commented May 24, 2016

Coverage Status

Coverage increased (+48.4%) to 82.413% when pulling 0cc29de on stitchfix:resque-redis-interface into 1deabd9 on resque:1-x-stable.

@davetron5000
Copy link
Contributor Author

Hmmm. I'm really confused now. Here's what I did:

> git checkout resque-redis-interface
> git reset --hard bcdaa1c
> git fetch resque # which is the canonical repo
> git rebase resque/1-x-stable
> # fix a conflict in multiple.rb
> git push --force origin resque-redis-interface

Not sure what happened. Any ideas? Sorry I usually don't mess up with git like this :(

@coveralls
Copy link

coveralls commented May 24, 2016

Coverage Status

Coverage increased (+48.4%) to 82.413% when pulling 0cc29de on stitchfix:resque-redis-interface into 1deabd9 on resque:1-x-stable.

@corincerami
Copy link
Member

@davetron5000 it seems like what happened is you somehow pulled in the changes from master that weren't on 1-x-stable yet. There's a few solution I can see.

  1. You could clean up the commits manually that don't belong in this PR.
  2. We could merge it into 1-x-stable since those other changes have already been merged into master anyway, and we're going to cherry-pick your stuff to the right place.
  3. You could reopen the pull request and point it to master with the proper commits in it.

I feel a little uneasy about 2, but perhaps @steveklabnik has an opinion.

davetron5000 and others added 2 commits May 28, 2016 11:31
I want to write code to access information about what's going on in
Resque, but the code needs to work for multiple Resque instances in the
same Ruby VM.  Because `Resque.redis` is global, it is very difficult
(impossible in some cases) to use the Resque API directly.

Provide an API that does not rely on a global variable that encapsulates
all the ways in which Resque interacts with Redis, namely the names of
keys and what sort of data structure is expected in those keys.

Consider a call like this:

```ruby
decode redis.lpop("queue:#{queue}")
```

This should mean "decode the job on queue `queue`", but it actually
means "decode whatever is in redis under the key `"queue:#{queue}"`
which just so happens to be how we store queues, but don't worry about
that right now, just go in Redis and do it".

With this PR, it turns into this:

```ruby
decode(@data_store.pop_from_queue(queue))
  ```
  which is saying "get me the job in queue `queue`, however that's done,
  and decode it.

  Which means that someone _else_ can do this without knowing how to
  construct the redis key for queue.

  And because that knowledge is now centralized in one class
  (`DataStore`) instead of littered throughout the codebase, one could
  perform these operations on multiple resque queues from the same Ruby
  VM, e.g. for monitoring:

  ```ruby
  resques = {
        www: '10.0.3.4:2345',
               admin: '10.1.4.5:8765',
                   ops: '10.1.4.5:8766',
  }

data_stores = Hash[resques.map { |name,location|
    [name,Resque::DataStore.new(Redis.new(location))]
}]

data_stores[:www].num_failed # => how many are failed in www's Resque
data_stores[:admin].num_failed # => what about admin?
stuck_workers = data_stores[:ops].workers.select { |worker|
    data_stores[:ops].worker_start_time(worker) > 1.hour.ago
}
```
And so forth.

This is not an ideal design, but it solves the problem without breaking
backwards compatibility and is better than what exists now, since it at
least centralizes how Resque's data structures are stored in Redis.  It
could also, in theory, allow a different backing store than Redis.

I hacked a `concerning` concept to demonstrate which calls were relevant
to what—this could be split into further classes.  It's also possible
that versions of the major objects (`Resque`, `Worker`, and `Job`) could
be created to not use a global for `redis`, but that is for another day.
@davetron5000
Copy link
Contributor Author

@chrisccerami OK, I think I fixed it. Your confirmation that I pulled in stuff from the wrong branch was helpful. I did a git reset --hard resque/1-x-stable, then cherry-picked my comment, then yours, and now hopefully it's clean :)

@coveralls
Copy link

coveralls commented May 28, 2016

Coverage Status

Coverage increased (+2.5%) to 36.496% when pulling 8df3f42 on stitchfix:resque-redis-interface into 1deabd9 on resque:1-x-stable.

@corincerami
Copy link
Member

This looks good to me. I'm going to merge this PR into 1-x-stable now, and then I'll cherry-pick it into master. I believe we're cutting a new release in a couple of days, so I may wait to move it into master until after that point, and this will be in the following release.

@corincerami corincerami merged commit 9f99e10 into resque:1-x-stable May 28, 2016
@davetron5000
Copy link
Contributor Author

Awesome, thank you for getting this through!!

Dave

Sent from my iPad

On May 28, 2016, at 11:47 AM, Chris C Cerami notifications@github.com wrote:

Merged #1210.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

@fw42
Copy link
Member

fw42 commented Jun 8, 2016

This looks very helpful to us (at Shopify) and has been bugging me for a while. Thanks for this contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants