
feat(postgres): Change the Postgres Adapter to be Partition Aware #591

Merged

george-sentry merged 24 commits into main from george/push-taskbroker/partition-store-by-partition on Apr 14, 2026

Conversation

@george-sentry (Member) commented Apr 10, 2026

Linear

Completes STREAM-868

Description

Currently, taskworkers pull tasks from taskbrokers via RPC. This approach works, but has some drawbacks. Therefore, we want taskbrokers to push tasks to taskworkers instead. Read this page on Notion for more information.

We are also moving from SQLite to Postgres, which means multiple brokers will share a single store. To prevent contention between brokers and unexpected results from conflicting upkeep tasks, each broker will only touch tasks that came from the Kafka partitions it's currently responsible for.

evanh and others added 21 commits January 13, 2026 16:39
This adds a postgres storage adapter for the taskbroker, as well as providing a way to choose
between the adapters in the configuration. This adapter will also work with AlloyDB.

In Postgres, the keyword `offset` is reserved, so that column is named `kafka_offset` in the PG
tables and converted back to `offset` in code.
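A minimal sketch of that rename, assuming a row type mirroring the PG column and a domain type using the original name (both types here are illustrative stand-ins, not the PR's actual structs):

```rust
// The PG table stores the reserved word `offset` as `kafka_offset`;
// the conversion back happens when rows are mapped into domain types.
// Illustrative names only.

struct ActivationRow {
    kafka_offset: i64, // column name in the Postgres table
}

struct Activation {
    offset: i64, // name the rest of the code expects
}

impl From<ActivationRow> for Activation {
    fn from(row: ActivationRow) -> Self {
        Activation { offset: row.kafka_offset }
    }
}
```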

The tests were updated to run with both the SQLite and Postgres adapter using the rstest crate. The
`create_test_store` function was updated to be the standard for all tests, and to allow choosing
between a SQLite and Postgres DB.

A `remove_db` function was added to the trait and the existing adapters, since the tests create a
unique PG database on every run that should be cleaned up.

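The adapter-parametrized test setup described above can be sketched as follows. rstest's `#[case]` attributes would generate one test per adapter; to keep this sketch free of external crates, the parametrization is shown as a loop, and `StoreKind` and `create_test_store` here are illustrative stand-ins for the PR's actual types:

```rust
// Sketch of running the same store test against both adapters.
// Names are illustrative, not the PR's real API.

#[derive(Debug, Clone, Copy)]
enum StoreKind {
    Sqlite,
    Postgres,
}

fn create_test_store(kind: StoreKind) -> String {
    // The real function would build (and later tear down) a unique DB;
    // here we just return a label so the sketch is self-contained.
    format!("{:?} store", kind)
}

fn run_for_all_stores() -> Vec<String> {
    // With rstest this would be two #[case]s on one test function.
    vec![StoreKind::Sqlite, StoreKind::Postgres]
        .into_iter()
        .map(create_test_store)
        .collect()
}
```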
Have the postgres adapter only fetch and do upkeep on activations that are part of the partition
that the consumer is assigned. The broker can still update tasks outside its partitions, in case a
worker is connected to a broker that is then rebalanced. Change the consumer to pass the partitions
to the store whenever partitions are assigned.

This was originally tested with `PARTITION BY`, but that approach requires manually keeping track of
the partition tables, which isn't desirable.
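The row-filtering approach chosen instead can be sketched as a predicate appended to each partition-scoped query. This is a simplified plain-string sketch; the real adapter builds queries with a query builder and bound parameters:

```rust
// Instead of PARTITION BY, every partition-scoped query carries a
// `partition IN (...)` predicate built from the currently assigned
// Kafka partitions. (What happens when the list is empty is discussed
// in the review comments.)

fn partition_predicate(partitions: &[i32]) -> String {
    let list: Vec<String> = partitions.iter().map(|p| p.to_string()).collect();
    format!("partition IN ({})", list.join(", "))
}
```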
…eorge/push-taskbroker/partition-store-by-partition
@george-sentry force-pushed the george/push-taskbroker/partition-store-by-partition branch from 896acfe to 895e119 on April 10, 2026 at 21:56
@linear-code (Bot) commented Apr 10, 2026

@george-sentry george-sentry marked this pull request as ready for review April 10, 2026 22:06
@george-sentry george-sentry requested a review from a team as a code owner April 10, 2026 22:06
Comment thread src/store/postgres_activation_store.rs Outdated
Comment thread src/store/postgres_activation_store.rs
Comment thread src/store/postgres_activation_store.rs Outdated
Comment thread src/store/postgres_activation_store.rs
Comment thread src/kafka/consumer.rs
```rust
        }
        query_builder.push(")");
    }
}
```

Empty partition list causes queries to match all rows

High Severity

add_partition_condition silently becomes a no-op when partitions is empty, causing all partition-scoped queries (upkeep, counts, fetches) to operate on ALL rows instead of NO rows. Since the upkeep task in upkeep.rs runs independently of the consumer lifecycle and shares the same store, after a partition revocation (which calls assign_partitions(vec![])) the upkeep loop continues running and will modify/delete/count tasks belonging to other brokers' partitions. This directly undermines the PR's goal of partition isolation.
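The failure mode can be demonstrated in isolation with a plain-string sketch: an empty list turns a restriction into "match everything", while a guard clause would turn it into "match nothing". The real code uses a query builder, and `add_partition_condition_guarded` is one hypothetical fix, not the PR's:

```rust
// No-op behaviour flagged above: with an empty partition list, the
// WHERE clause is left untouched and the query matches ALL rows.
fn add_partition_condition(sql: &mut String, partitions: &[i32]) {
    if partitions.is_empty() {
        return; // silently scopes the query to every partition
    }
    let list: Vec<String> = partitions.iter().map(|p| p.to_string()).collect();
    sql.push_str(&format!(" AND partition IN ({})", list.join(", ")));
}

// One possible guard: an always-false predicate makes an empty
// assignment match no rows instead of all rows.
fn add_partition_condition_guarded(sql: &mut String, partitions: &[i32]) {
    if partitions.is_empty() {
        sql.push_str(" AND 1 = 0");
        return;
    }
    add_partition_condition(sql, partitions);
}
```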

Additional Locations (1)

Reviewed by Cursor Bugbot for commit f97d2ad.

Member Author

We get the same behavior before partitions are assigned, since the list will be empty in that case too. Is this desired?

Member

I don't think this will break anything; it would just slow the queries down briefly, so I'm inclined to say it's OK. Another option would be to check the partitions and skip the queries when none are assigned, but I don't know if that would be better.

@evanh (Member) left a comment

This looks good to me!

…eorge/push-taskbroker/partition-store-by-partition
Comment thread src/store/postgres_activation_store.rs
Comment thread src/store/postgres_activation_store.rs
@cursor (Bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).


Reviewed by Cursor Bugbot for commit 51c1889.

Comment thread src/kafka/consumer.rs
```rust
    tpl == revoked,
    "Revoked TPL should be equal to the subset of TPL we're consuming from"
);
activation_store.assign_partitions(vec![]).unwrap();
```

Partitions cleared before actors shutdown causes unscoped queries

Medium Severity

assign_partitions(vec![]) is called before handles.shutdown() in both the Revoke and Shutdown handlers. Since add_partition_condition is a no-op when the partition list is empty, any upkeep queries still running during the up-to-4-second shutdown window will execute without partition filtering, operating on ALL rows in the shared database. This defeats partition isolation and could cause issues like duplicate deadletter messages from handle_failed_tasks if another broker is concurrently assigned those partitions. Swapping the order — shutting down actors first, then clearing partitions — would eliminate this race window.

Additional Locations (1)

Reviewed by Cursor Bugbot for commit 51c1889.

Member Author

Evan thinks this is fine.

Comment thread src/kafka/consumer.rs
Comment on lines 396 to 401
```rust
    tpl == revoked,
    "Revoked TPL should be equal to the subset of TPL we're consuming from"
);
activation_store.assign_partitions(vec![]).unwrap();
handles.shutdown(CALLBACK_DURATION).await;
metrics::gauge!("arroyo.consumer.current_partitions").set(0);
```

Bug: Clearing the partition filter via assign_partitions(vec![]) before handles.shutdown() creates a race condition where upkeep tasks can operate on all partitions, not just assigned ones.
Severity: HIGH

Suggested Fix

The partition filter should be cleared after the actor handles have been successfully shut down. Move the activation_store.assign_partitions(vec![]).unwrap(); call to after the handles.shutdown(CALLBACK_DURATION).await; call. This ensures that no upkeep operations can run in the intermediate state where the broker has no assigned partitions but its tasks are still active.
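The suggested reordering can be illustrated with a self-contained sketch. All names here are hypothetical stand-ins for the consumer's actual types; `upkeep_scope` models what a concurrently running upkeep pass would observe during the shutdown window:

```rust
// `Broker` stands in for the consumer. An empty Some(...) scope means
// an upkeep pass could run with no partition filter (unscoped queries);
// None means the actors are stopped and upkeep can no longer fire.

struct Broker {
    partitions: Vec<i32>,
    actors_running: bool,
}

impl Broker {
    fn new() -> Self {
        Self { partitions: vec![0, 1], actors_running: true }
    }

    /// Partition set an upkeep pass would use right now, if any.
    fn upkeep_scope(&self) -> Option<Vec<i32>> {
        self.actors_running.then(|| self.partitions.clone())
    }

    /// Current order: clear the filter, then shut down.
    fn revoke_clearing_first(&mut self) -> Option<Vec<i32>> {
        self.partitions.clear();            // assign_partitions(vec![])
        let seen = self.upkeep_scope();     // upkeep may still fire here
        self.actors_running = false;        // handles.shutdown(...)
        seen
    }

    /// Suggested order: shut down, then clear the filter.
    fn revoke_shutdown_first(&mut self) -> Option<Vec<i32>> {
        self.actors_running = false;        // handles.shutdown(...)
        let seen = self.upkeep_scope();     // upkeep can no longer fire
        self.partitions.clear();            // assign_partitions(vec![])
        seen
    }
}
```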

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/kafka/consumer.rs#L396-L401

Potential issue: During partition revocation, the code calls
`activation_store.assign_partitions(vec![])` to clear the partition list before waiting
for actor handles to shut down via `handles.shutdown(CALLBACK_DURATION)`. An independent
upkeep task runs periodically. If this task executes during the shutdown window, it will
operate with an empty partition list. Because the `add_partition_condition()` method
omits the partition filter when the list is empty, upkeep queries like
`handle_claim_expiration` will run against all partitions in the database. This can
cause a broker to incorrectly modify tasks that have been reassigned to another broker,
violating partition isolation.

Member Author

Evan thinks this is fine.

@george-sentry george-sentry merged commit e7783d9 into main Apr 14, 2026
32 of 33 checks passed
@george-sentry george-sentry deleted the george/push-taskbroker/partition-store-by-partition branch April 14, 2026 17:24