Add resque dyno on Heroku. #917

Merged
merged 2 commits into from Nov 23, 2020


eddierubeiz
Contributor

Just adding a worker dyno to the Procfile, per https://devcenter.heroku.com/articles/ruby-resque-pool.

eddierubeiz self-assigned this Nov 21, 2020
@jrochkind
Contributor

Nice. Note that even after we add worker to the Procfile, if we don't actually want any workers (yet), we can just scale it to zero, e.g. heroku dyno:scale worker=0

The discussion on how we set worker counts makes me realize we'll have to rethink things -- currently we are saying to run (e.g.) 9 "regular" workers and 1 "on_demand_derivatives" worker on our single worker host.

But under Heroku we want to run smaller standard-2x dynos and have more than one of them. So we could say each standard-2x runs 1 'regular' and 1 'on-demand' worker and then scale out to 4 of those -- but that's a 50/50 allocation, which isn't the same.

We could have more than one worker type: an "on_demand_derivatives_worker" that only works that queue, with only one dyno, and a "regular" worker that scales out. (We would have to change our resque_pool.yml -- probably have two of them, and use a different one in each type of dyno!)
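A minimal Python sketch of how the "two worker types" option keeps the on-demand capacity fixed while the regular capacity scales out. The process-type names, the queue lists, and the two-workers-per-regular-dyno figure are all hypothetical, not our actual Procfile or resque_pool.yml.

```python
# Hypothetical process types and queue lists; the real Procfile and
# resque_pool.yml may name and split these differently.
PROCESS_TYPES = {
    "on_demand_worker": ["on_demand_derivatives"],
    "worker": ["default", "mailers"],
}


def worker_counts(formation, workers_per_dyno):
    """Total resque workers per process type.

    formation: {process_type: number of dynos running it}
    workers_per_dyno: {process_type: resque workers started on each of those dynos}
    """
    return {ptype: dynos * workers_per_dyno[ptype] for ptype, dynos in formation.items()}


# One on-demand dyno stays fixed; the regular type is scaled out to 4 dynos.
counts = worker_counts(
    formation={"on_demand_worker": 1, "worker": 4},
    workers_per_dyno={"on_demand_worker": 1, "worker": 2},
)

for ptype, n in counts.items():
    print(f"{ptype}: {n} worker(s) on {', '.join(PROCESS_TYPES[ptype])}")
# on_demand_worker: 1 worker(s) on on_demand_derivatives
# worker: 8 worker(s) on default, mailers
```

With these made-up numbers on-demand gets 1 of 9 workers (about 11%), close to today's 1 in 10, and scaling the regular type up or down only changes the second count.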

Or we could change our resque_pool setup to run one worker that does only "regular" work, and another worker that does on_demand_derivatives, mailers, default -- meaning it works on_demand_derivatives if there's something in that queue, and otherwise does the usual work.

We can talk this out if it doesn't make sense. We'll have to think about it. Not necessarily a pre-req for this PR, may result in future work.

@eddierubeiz
Contributor Author

One really good reason to have the on-demand stuff on its own queue is that those are the only bg jobs that could be requested 24/7. The others are used only by staff and thus could be turned off nights and weekends.

@jrochkind
Contributor

It's really hard to find this in resque docs, but I think if you tell a resque worker to process queues A,B,C, that means: First check queue A, if there are any jobs in it at all, do that job; if it's empty check B and do any job you find there; if B is empty check C and do any job you find there.

So if we start a resque worker with on_demand_derivatives, default (ignoring mailers for the sake of a clearer explanation), that means "If there are ANY on_demand_derivatives jobs in the queue, do them, until there are no more. But if it's empty you are free to work on default jobs too."
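A minimal Python model of that polling order. As far as I can tell this mirrors Resque's Worker#reserve, which loops over its queues in order and takes the first job it finds. It is a simplification: real workers pop from Redis and sleep between polls when every queue is empty. The job names are made up.

```python
from collections import deque


def reserve(queues, queue_order):
    """Return (queue_name, job) from the first non-empty queue in priority order, or None."""
    for name in queue_order:
        q = queues.get(name)
        if q:  # a non-empty deque is truthy
            return name, q.popleft()
    return None


queues = {
    "on_demand_derivatives": deque(["zip-1"]),
    "default": deque(["ingest-1", "ingest-2"]),
}

# Worker started with "on_demand_derivatives, default": drain on-demand first, then default.
order = ["on_demand_derivatives", "default"]
processed = []
while (reserved := reserve(queues, order)) is not None:
    processed.append(reserved)

print(processed)
# [('on_demand_derivatives', 'zip-1'), ('default', 'ingest-1'), ('default', 'ingest-2')]
```

A worker started with only ["on_demand_derivatives"] gets None as soon as that queue is empty and never touches default, however much work is waiting there.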

Right now, we have a (single) worker that works only "on_demand_derivatives". If there's nothing in that queue (and usually there isn't), it just sits there idle, even if there's plenty of "default" work to be done.

So I don't think there's a downside of telling it to also work on default if there's no on_demand_derivatives work.

The main difference would be the proportion of workers working on each when there are both kinds of work at once. Let's say we have a bunch of people asking for on-demand derivatives AND a bunch of ingest happening right now.

Currently, we'd have 9 workers working on ingest and 1 working on on-demand derivatives.

But if we imagine instead we have (e.g.) 5 dynos, each of which has one "default" worker and one "on_demand_derivatives, default" worker, then we end up with 5 workers working on ingest and 5 working on on-demand derivatives. We would have changed the proportion of workers willing to work on on-demand derivatives when there's a queue of both on-demand derivatives and other (mainly ingest) work.
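To put numbers on that comparison, here is a small Python model that counts which queue each worker picks in one polling round, using the first-non-empty-queue rule. It treats the 9 current "regular" workers as polling only default (a simplification; their real queue list may differ), and the backlog sizes are made up.

```python
def first_nonempty(queue_order, backlog):
    """Name of the first queue in queue_order that has work, or None if all are empty."""
    for name in queue_order:
        if backlog.get(name, 0) > 0:
            return name
    return None


def allocation(worker_queue_lists, backlog):
    """Count how many workers pick each queue in one polling round (backlog assumed deep)."""
    counts = {}
    for order in worker_queue_lists:
        picked = first_nonempty(order, backlog)
        if picked is not None:
            counts[picked] = counts.get(picked, 0) + 1
    return counts


# Both kinds of work are backed up: ingest (default) and on-demand derivatives.
backlog = {"on_demand_derivatives": 100, "default": 1000}

# Current single host: 9 regular workers plus 1 dedicated on-demand worker.
current = [["default"]] * 9 + [["on_demand_derivatives"]]
# Hypothetical: 5 dynos, each with one "default" and one "on_demand_derivatives, default" worker.
proposed = [["default"], ["on_demand_derivatives", "default"]] * 5

print(allocation(current, backlog))   # {'default': 9, 'on_demand_derivatives': 1}
print(allocation(proposed, backlog))  # {'default': 5, 'on_demand_derivatives': 5}
```

With the on-demand queue empty ({"on_demand_derivatives": 0, "default": 1000}), the proposed layout puts all 10 workers on default, while the current layout leaves its on-demand worker idle.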

@eddierubeiz
Contributor Author

Yeah -- I would definitely assign the "worker counts and assigning workers to different queues" to a separate issue and PR.

@jrochkind
Contributor

For now I might just set heroku config ON_DEMAND_JOB_WORKER_COUNT to 0, to not have any workers working that, as we instead focus on ingest workers.

And we should set REGULAR_JOB_WORKER_COUNT to, I dunno, 2 or 3? That's part of what we'll experiment with, how high we can make this on a standard-2x dyno (or a standard-1x dyno even?) before we run out of RAM.
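A minimal sketch of what those two settings would do, assuming resque_pool.yml is templated to read ON_DEMAND_JOB_WORKER_COUNT and REGULAR_JOB_WORKER_COUNT into a resque-pool style mapping of comma-separated queue lists to worker counts. The actual file isn't shown in this PR, so the queue lists and the fallback counts (taken from the current 9 + 1 setup) are hypothetical.

```python
import os


def pool_config(env=os.environ):
    """Build a resque-pool style {queue_list: worker_count} mapping from env vars.

    Hypothetical: mirrors how a templated resque_pool.yml might read these counts.
    """
    counts = {
        "on_demand_derivatives": int(env.get("ON_DEMAND_JOB_WORKER_COUNT", "1")),
        "default,mailers": int(env.get("REGULAR_JOB_WORKER_COUNT", "9")),
    }
    # A count of zero means no workers are started for that queue list.
    return {queues: n for queues, n in counts.items() if n > 0}


print(pool_config({"ON_DEMAND_JOB_WORKER_COUNT": "0", "REGULAR_JOB_WORKER_COUNT": "2"}))
# {'default,mailers': 2}
```

Raising REGULAR_JOB_WORKER_COUNT one step at a time while watching dyno memory is then the experiment described above.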

Do you want to go make these Heroku config changes on our scihist-digicoll app?

jrochkind merged commit 1ace256 into master Nov 23, 2020
jrochkind deleted the add_resque_dyno branch November 23, 2020 17:56
@eddierubeiz
Contributor Author

Done!

@eddierubeiz
Contributor Author

(See https://dashboard.heroku.com/apps/scihist-digicoll/settings)


2 participants