Que 1.x design compatibility #46
Hi @bgentry Thanks for the thought. Actually, Rihanna's design differs slightly from Que 0.x in that it already has the concept of a single process (and pg connection) that dispatches jobs to multiple workers. So in that sense it already works similarly to Que 1.x and can have many more jobs running concurrently than there are active database connections.

The next stage for Rihanna in my mind is one step further than Que: an architecture made possible by distributed Erlang, namely a cluster-wide dispatcher. This enables enormous throughput improvements because it removes lock contention from the database entirely.

In this imagined future model there is one lock, the global consumer lock, and only one connection (cluster) may hold it at any time. The global consumer is a singleton within an Erlang cluster. Once it holds the lock, it can consume jobs via polling/NOTIFY locklessly, since it is the only consumer, and dispatch them within the Erlang cluster to workers as appropriate. You can imagine this single consumer reading jobs in batches of, say, 1,000 at a time and handing jobs from the in-memory buffer to processes as needed to satisfy some maximum concurrency limit. Each process is ephemeral and dies once the job is complete (efficient garbage collection), sending a message back to the dispatcher, which can then remove the job from the database table.

I believe there are staggering performance gains possible with this model (I suspect several orders of magnitude), since by far the largest performance bottleneck is database lock contention. The only downside is that it requires the worker nodes to be in the same Erlang cluster.
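For concreteness, here is a minimal sketch of how the cluster-wide singleton dispatcher described above could look, assuming a `Postgrex` connection, a `jobs` table with `id`, `mod`, and `arg` columns, and worker modules exposing a `perform/1` function. All of these names are hypothetical and not part of Rihanna's actual API:

```elixir
# Hypothetical sketch only: module, table, and column names are illustrative
# and not Rihanna's API. Assumes a Postgrex connection, a `jobs` table with
# `id`, `mod`, and `arg` columns, and worker modules that expose `perform/1`.
defmodule ClusterDispatcher do
  use GenServer

  @batch_size 1_000
  @poll_interval_ms 1_000
  @global_consumer_lock 42

  # Registering under :global means at most one dispatcher process exists per
  # Erlang cluster; a second start returns {:error, {:already_started, pid}}.
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: {:global, __MODULE__})
  end

  @impl true
  def init(opts) do
    send(self(), :poll)
    {:ok, %{conn: Keyword.fetch!(opts, :conn)}}
  end

  @impl true
  def handle_info(:poll, state) do
    # Try to take the single global consumer lock; if another cluster holds
    # it, simply retry on the next poll.
    case Postgrex.query!(state.conn, "SELECT pg_try_advisory_lock($1)", [@global_consumer_lock]) do
      %{rows: [[true]]} -> dispatch_batch(state.conn)
      _ -> :ok
    end

    Process.send_after(self(), :poll, @poll_interval_ms)
    {:noreply, state}
  end

  # Workers report completion; only then is the job row deleted.
  def handle_info({:job_done, id}, state) do
    Postgrex.query!(state.conn, "DELETE FROM jobs WHERE id = $1", [id])
    {:noreply, state}
  end

  defp dispatch_batch(conn) do
    # As the sole consumer, we can read a batch without any row-level locking.
    # (A real implementation would also track in-flight ids to avoid
    # re-dispatching jobs that are still running.)
    %{rows: rows} =
      Postgrex.query!(conn, "SELECT id, mod, arg FROM jobs ORDER BY id LIMIT $1", [@batch_size])

    dispatcher = self()

    for [id, mod, arg] <- rows do
      # One ephemeral process per job: it dies as soon as the job completes,
      # which keeps garbage collection cheap, then notifies the dispatcher.
      Task.start(fn ->
        apply(String.to_existing_atom(mod), :perform, [arg])
        send(dispatcher, {:job_done, id})
      end)
    end
  end
end
```

The key points are the `:global` registration, which guarantees a single consumer per cluster, and the fact that jobs run in short-lived processes anywhere in the cluster rather than in processes that each hold a database connection.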
Hmm, interesting design idea. The other caveat I can see is that it would effectively rule out many deployment setups from using this library, as it's not compatible with Heroku or other infrastructure where nodes can't communicate directly as part of a cluster. Not suggesting you should base your design on that so much as highlighting that it will limit the target audience a bit. Thanks for explaining the current architecture in more detail! ✌️
Correct, it would not be compatible with Heroku. I would suggest those users fall back to the old method, which does not require a cluster and will continue to work just fine.
The number of reductions before GC is (or was) 2,000, so you'd need much smaller batches to avoid hitting GC in this hot path. I suspect the GC here isn't the bottleneck, though, so larger batches are fine.
@lpil I meant GC in the process that runs one particular job. However, I agree you're right that this is not the hot path.
Que v1.x (currently in beta) has had significant design changes over the v0.x design. The changes are summarized in the changelog.
The most relevant portion is this:
There are some major benefits to these design changes (thanks @chanks 👏), in particular the removal of the requirement to open a dedicated connection and hold it for the entire duration of working a job. The v4 schema migration also illustrates the underlying changes to the Postgres schema that enable these upgrades.
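For contrast, a rough sketch of the v0.x-style pattern being removed might look like the following. This is not Que's or Rihanna's actual code, just an illustration (with hypothetical table, column, and module names) of why holding one connection and advisory lock per running job caps concurrency at the connection count:

```elixir
# Rough sketch of the v0.x-style pattern the changelog moves away from: each
# worker holds its own dedicated connection and a session-level advisory lock
# for the full run of the job. Table/column/module names are hypothetical.
defmodule LegacyStyleWorker do
  def work_one(conn) do
    # Grab the first job we can lock. (Real implementations use a more careful
    # query: evaluated this way, the planner may lock rows it never returns.)
    %{rows: rows} =
      Postgrex.query!(
        conn,
        "SELECT id, mod, arg FROM jobs WHERE pg_try_advisory_lock(id) ORDER BY id LIMIT 1",
        []
      )

    case rows do
      [[id, mod, arg]] ->
        # The advisory lock belongs to this connection's session, so the
        # connection must stay checked out until the job finishes.
        apply(String.to_existing_atom(mod), :perform, [arg])
        Postgrex.query!(conn, "DELETE FROM jobs WHERE id = $1", [id])
        Postgrex.query!(conn, "SELECT pg_advisory_unlock($1)", [id])
        :ok

      [] ->
        :no_job
    end
  end
end
```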
Have you looked into this or thought about what might be involved in migrating Rihanna to this design?