v4 migration: App level notifications, queue status tracking, pause/resume #301
Nice, I made some race conditions 😆 Well actually they appear to be just in the tests as a result of the
@brandur alright, I think your fix in #311 did the trick on all the Some notes: we did get a single flaky failure on this run on the I have to break for food but will pick back up on those shortly. Feel free to review meanwhile if you're available. |
Great!!! Lots of useful new stuff in here. River's going to be looking pretty feature complete with this in ...
* Make job args non-null
* Make job metadata non-null
* Drop insert notification trigger and function as this will be done at the application level going forward.
* Add 'pending' state
There are just a few items left here, mainly the name of the pubsub topic(s) we'll use for job and queue controls. Please lmk your thoughts on all the unresolved items! 🙏
Thanks for getting through all that! I added a couple more tiny comments, but looks solid.
Changelog looks great.
Let's get this shipped.
CHANGELOG.md
- Add `pending` job state. This is currently unused, but will be used to build higher level functionality for staging jobs that are not yet ready to run (for some reason other than their scheduled time being in the future). Pending jobs will never be run or deleted and must first be moved to another state by external code. [PR #301](https://github.com/riverqueue/river/pull/301).
- Queue status tracking, pause and resume. [PR #301](https://github.com/riverqueue/river/pull/301).

A useful operational lever is the ability to pause and resume a queue without shutting down clients. In addition to pause/resume being a feature request from #54, as part of the work on River's UI it's been useful to list out the active queues so that they can be displayed and manipulated.
Sorry for the super nit, but mind linking up 54 since it won't be in the rendered changelog.
Suggested change:
A useful operational lever is the ability to pause and resume a queue without shutting down clients. In addition to pause/resume being a feature request from [#54](https://github.com/riverqueue/river/pull/54), as part of the work on River's UI it's been useful to list out the active queues so that they can be displayed and manipulated.
@@ -7,6 +7,32 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

⚠️ Version 0.5.0 contains a new database migration, version 4. This migration is backward compatible with any River installation running the v3 migration. Be sure to run the v4 migration prior to deploying the code from this release.
We may want to put together a little upgrade guide on the website because people's familiarity with the last time they migrated may have become long atrophied at this point (i.e. how to use the CLI, the fact that the CLI will need to be updated separately to know about the new migration, etc.).
Instead of emitting notifications for all inserts with a database trigger, use application-level notification logic with a per-queue debouncer. This enables us to drastically reduce the number of notifications emitted to 1 per inserting client per cooldown interval per queue. The `JobScheduler` has been reworked to utilize this same mechanism. In addition, the scheduler has been updated to use a "look ahead" concept, meaning any jobs that are scheduled before its next planned run are preemptively marked as available (while preserving their `ScheduledAt` time). This makes jobs available for scheduled work sooner and without delay, with the caveat that a notification may not be emitted right when a job is scheduled. On queues without any ongoing insert activity, this could potentially increase the latency before a scheduled job gets picked up by a worker, which is a worthwhile tradeoff for the performance optimization we gain on high throughput queues. This can be counteracted if desired by lowering the client's polling interval.
This is used for both controlling jobs and controlling queues, so a more generalized name is appropriate.
Small bit of follow up from #301.
…mn to single value (#325) With the addition of #301, and more specifically the schema-based namespacing that it brings in around listen/notify, we're fully moving towards a world where the recommendation for running multiple Rivers in a single database is very definitive: isolate them by schema. The elector's had a long-standing parameter called "instance name" that's stored into the `river_leader` table, and which was prospectively going to be used for River namespacing, but we never made use of it. In a world of schema isolation, each schema will have its own `river_leader` table, and no kind of additional namespacing is needed within the table. I was originally going to try an approach wherein we drop `name` out of `river_leader` completely, but looking more closely, it's the table's primary key. We could add a new column that'd act as a primary key instead (e.g. imagine a boolean primary key column with a check constraint that makes sure it's always true), but nothing we'd add would be that much better. Instead, I elected to give `name` a default value of `default` (matching the previous default instance name), and add a check constraint verifying that it's always `default`, making it effectively a single row table. The nice part about this approach is that we can put these changes into the V4 migration in #301, and we won't require any additional changes in any future migration. With `name` now constrained to a single value, we can simplify all the `river_leader`-based queries by removing their name parameters, then remove instance name completely from elector code and drivers, giving us a thorough overall cleanup.
Tees up version `v0.5.0`, which mainly contains the changes in #301, but is notable because it also contains the first ever new database migration beyond the original line.
There's a lot in this PR, all of it somewhat related and dependent on the new v4 database migration.
v4 migration summary
* `river_job.args` non-null (there should be no null values in here anyway)
* `river_job.metadata` non-null (there should be no null values in here anyway)
* `finalized_at` constraint
* `pending` job state. Currently unused, but will be used to build higher level functionality by staging jobs that are not yet ready to run (for some reason other than their scheduled time being in the future).
* `river_queue` table for queue state tracking, pause/resume.

I believe all of these are totally safe to perform on an actively running River installation with no interruptions. We should carefully consider this though. The removal of insert triggers does increase new job execution latency until a subsequent deploy of updated River clients with app-level notification logic, but this is a minor and temporary degradation.
Application level insert notifications
The initial design for River utilized a trigger on job insert that issued notifications (`NOTIFY`) so that listening clients could quickly pick up the work if they were idle. While this is good for lowering latency, it does have the side effect of emitting a large number of notifications any time there are lots of jobs being inserted. This adds overhead, particularly to high-throughput installations.

To improve this situation and reduce overhead in high-throughput installations, the notifications have been refactored to be emitted at the application level. A client-level debouncer ensures that these notifications are not emitted more often than they could be useful. If a queue is due for an insert notification (on a particular Postgres schema), the notification is piggy-backed onto the insert query within the transaction. While this has the impact of increasing insert latency for a certain percentage of cases, the effect should be small.
Additionally, the initial release of River did not properly scope notification topics within the global `LISTEN/NOTIFY` namespace. If two River installations were operating on the same Postgres database but within different schemas (search paths), their notifications would be emitted on a shared topic name. This is no longer the case and all notifications are prefixed with a `{schema_name}.` string.

Queue status tracking, pause and resume
A useful operational lever is the ability to pause and resume a queue without shutting down clients. In addition to pause/resume being a feature request from #54, as part of my work on River's UI I've encountered the need to list out the active queues so that they can be displayed and manipulated.
A new `river_queue` table is introduced in the migration for this purpose. Upon startup, every producer in each River `Client` will make an `UPSERT` query to the database to either register the queue as being active, or if it already exists it will instead bump the timestamp to keep it active. This query will be run periodically as long as the `Client` is alive, even if the queue is paused. A separate query will delete/purge any queues which have not been active in a while (currently fixed to 24 hours). This currently happens independently in each producer, but it really doesn't need to (see TODO below).

This table provides a place to pause and resume a single queue by name, or all queues using the special `*` value. Each producer will watch for notifications on the relevant `LISTEN/NOTIFY` topic unless operating in poll-only mode, in which case they will periodically poll for changes to this record in the database.

TODOs
* `rivertype.JobStates` to include `pending` state

Closes #54.
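As a rough illustration of the pause/resume controls described in the queue status section, the sketch below shows how a producer might apply a control message addressed either to its own queue or to the special `*` value. Everything here is an assumption made for illustration: the JSON payload shape, the `queueControl` and `producer` types, and the `"pause"`/`"resume"` action names are hypothetical and are not taken from River's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queueControl is a hypothetical payload for a queue control notification.
type queueControl struct {
	Action string `json:"action"` // "pause" or "resume" (assumed names)
	Queue  string `json:"queue"`  // a queue name, or "*" for all queues
}

// producer is a simplified stand-in for a per-queue producer.
type producer struct {
	queue  string
	paused bool
}

// handleControl applies a control message if it targets this producer's
// queue or uses the "*" wildcard; messages for other queues are ignored.
func (p *producer) handleControl(payload []byte) error {
	var ctl queueControl
	if err := json.Unmarshal(payload, &ctl); err != nil {
		return fmt.Errorf("decoding control payload: %w", err)
	}
	if ctl.Queue != "*" && ctl.Queue != p.queue {
		return nil // addressed to a different queue
	}
	switch ctl.Action {
	case "pause":
		p.paused = true
	case "resume":
		p.paused = false
	default:
		return fmt.Errorf("unknown action %q", ctl.Action)
	}
	return nil
}

func main() {
	p := &producer{queue: "email"}
	_ = p.handleControl([]byte(`{"action":"pause","queue":"*"}`))
	fmt.Println(p.paused) // true: "*" applies to every queue
	_ = p.handleControl([]byte(`{"action":"resume","queue":"default"}`))
	fmt.Println(p.paused) // true: message was for another queue
	_ = p.handleControl([]byte(`{"action":"resume","queue":"email"}`))
	fmt.Println(p.paused) // false
}
```

A producer in poll-only mode would reach the same state by periodically reading its queue's record instead of receiving a message.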