Single Client can monopolize server with poorly-defined job subject #12

Open
carlthuringer opened this issue Apr 27, 2016 · 2 comments

@carlthuringer
Contributor

We had an issue today where a mistake caused blocking jobs to be queued onto the same workflow. As the workflow grew, it caused resource contention on the table and on the JVM, where GC activity intensified. In the end, the entire server was monopolized by repeated calls to ScheduleNextNode for the same workflow.

While we cannot prevent clients from behaving badly and creating massive backlogs of workflows, we can reduce the potential for one job to monopolize the worker pool.

A co-worker found https://github.com/mhenrixon/sidekiq-unique-jobs. What do you think?

@courtneyb

I like this idea. Would we want to make the type of locking configurable by backbeat clients? I think we should make sure to include client_id as a unique arg so multiple clients couldn't step on each other with workflows like: { name: "update", subject: {record_uuid: some_uuid } }.

Since we're turning backbeat server into a gem, do you think there are clients that would prefer to add and configure 'sidekiq-unique-jobs' independently of backbeat? Or would we always include it and pass all the configuration through backbeat's configs?
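To make the client_id point concrete, here is a minimal sketch of a uniqueness key that includes the client. It is plain Ruby and does not use the sidekiq-unique-jobs API; `workflow_lock_key` and `canonicalize` are hypothetical names, and the choice of SHA-256 over a JSON payload is an assumption for illustration only.

```ruby
require "digest"
require "json"

# Hypothetical helper: normalizes a subject so that key order and
# symbol-vs-string keys do not change the resulting lock key.
def canonicalize(value)
  case value
  when Hash
    value.map { |k, v| [k.to_s, canonicalize(v)] }.sort_by(&:first).to_h
  when Array
    value.map { |v| canonicalize(v) }
  else
    value
  end
end

# Hypothetical helper: derives a lock key from client_id, workflow name,
# and subject. Including client_id keeps two clients that use the same
# name and subject from sharing (and blocking on) the same lock.
def workflow_lock_key(client_id:, name:, subject:)
  payload = JSON.generate([client_id.to_s, name.to_s, canonicalize(subject)])
  Digest::SHA256.hexdigest(payload)
end
```

With this shape, two clients that both submit `{ name: "update", subject: { record_uuid: some_uuid } }` get different keys, while the same client submitting the same workflow twice gets the same key.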

@carlthuringer
Contributor Author

The more I think about this, the more I think it's a step too far. If we have a performance issue around thousands of blocking jobs under one workflow, then we should address that performance issue rather than constructing arbitrary rules around who gets to use how much of the queue, dynamic partitioning, or other strategies for sharing a limited resource.

I'm considering waiting before implementing some heavy-handed resource protection. Sidekiq is really fast. Perhaps all we need is to enforce a timeout on client callbacks and address the performance issue with having thousands of blocking jobs on one workflow.
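As a rough sketch of what a callback timeout could look like, assuming client callbacks are HTTP requests made with Ruby's standard `Net::HTTP`: the helper names `build_callback_client` and `notify_client` and the default timeout values are hypothetical, not taken from Backbeat.

```ruby
require "net/http"
require "uri"
require "json"

# Hypothetical helper: builds an HTTP client with explicit connect and
# read timeouts so a slow client cannot hold a worker indefinitely.
# The default values are placeholders, not recommendations.
def build_callback_client(url, open_timeout: 2, read_timeout: 5)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = open_timeout # seconds allowed to open the connection
  http.read_timeout = read_timeout # seconds allowed to wait for each read
  http
end

# Hypothetical helper: posts a JSON payload to the client and reports a
# timeout as a distinct outcome instead of letting the worker hang.
def notify_client(url, payload, **timeouts)
  uri = URI.parse(url)
  http = build_callback_client(url, **timeouts)
  request = Net::HTTP::Post.new(uri.request_uri, "Content-Type" => "application/json")
  request.body = JSON.generate(payload)
  response = http.request(request)
  response.is_a?(Net::HTTPSuccess) ? :ok : :failed
rescue Net::OpenTimeout, Net::ReadTimeout
  :timed_out
end
```

A caller could treat `:timed_out` like any other failed callback and apply whatever retry policy already exists, which bounds how long one misbehaving client can occupy a worker.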
