fix(replication): add proper handling for pool restarts during replication #915

Draft
meskill wants to merge 1 commit into main from fix/replication-pool-shutdown

Conversation

Contributor

@meskill meskill commented Apr 20, 2026

No description provided.

port = addr.port,
database = %addr.database_name,
user = %addr.user,
"pool offline: connection pool shut down"
Collaborator

@levkk levkk Apr 20, 2026

You don't want to log that. This "shutdown" isn't actually a shutdown; it just destroys this object, and we replace it with a brand new one atomically. This could make people think we are shutting down, causing panic :D

Contributor Author


The intention was to add more logs to help with tracing similar issues.

I can drop it or change the severity/message.

Collaborator


Yup, makes sense. You can set this to trace level for sure.
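The mechanism described above ("We replace it with a brand new one atomically") can be sketched in std-only Rust. Everything here is hypothetical illustration, not pgdog's actual code: `Pool` and `PoolHandle` are invented names, and pgdog may well use a lock-free primitive rather than an `RwLock`.

```rust
// Sketch: the "shut down" pool object is swapped out atomically for a
// fresh one, so nothing is ever actually offline from a caller's view.
// Type names (Pool, PoolHandle) are hypothetical, not pgdog's.
use std::sync::{Arc, RwLock};

#[derive(Debug)]
struct Pool {
    generation: u64,
}

/// Shared handle: readers snapshot the inner Arc, writers swap it wholesale.
struct PoolHandle {
    inner: RwLock<Arc<Pool>>,
}

impl PoolHandle {
    fn new(pool: Pool) -> Self {
        Self { inner: RwLock::new(Arc::new(pool)) }
    }

    /// Get the current pool; a cheap Arc clone, valid even across a restart.
    fn get(&self) -> Arc<Pool> {
        self.inner.read().unwrap().clone()
    }

    /// "Restart": the old Pool's destructor runs, but the handle already
    /// points at a brand new pool -- so this is not a real shutdown.
    fn replace(&self, pool: Pool) {
        *self.inner.write().unwrap() = Arc::new(pool);
    }
}

fn main() {
    let handle = PoolHandle::new(Pool { generation: 1 });
    let before = handle.get();
    handle.replace(Pool { generation: 2 });
    let after = handle.get();
    assert_eq!(before.generation, 1); // old snapshot still usable
    assert_eq!(after.generation, 2);  // new callers see the new pool
    println!("gen {} -> gen {}", before.generation, after.generation);
}
```

Under this model, logging the destruction of the old object at anything above trace level would indeed be misleading, since the replacement is invisible to clients.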

// Validate all tables support replication before committing to
// what can be a multi-hour copy. A table with no primary key or
// unique replica-identity index cannot be replicated correctly.
for tables in self.tables.values() {
Collaborator


This is great, we should have done it a long time ago.
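The fail-fast validation praised here can be sketched as follows. The `Table`/`Column` structs and the error type are illustrative assumptions, not pgdog's real types; the point is only the shape of the check: reject unreplicable tables before starting a potentially multi-hour copy.

```rust
// Hypothetical sketch of pre-copy validation: every table is checked for a
// usable replica identity up front, instead of failing hours into the COPY.

#[derive(Debug)]
struct Column {
    name: String,
    replica_identity: bool, // part of the PK / replica-identity index
}

#[derive(Debug)]
struct Table {
    name: String,
    columns: Vec<Column>,
}

impl Table {
    /// A table is replicable only if at least one column carries the
    /// replica-identity flag (a primary key or a usable unique index).
    fn supports_replication(&self) -> bool {
        self.columns.iter().any(|c| c.replica_identity)
    }
}

/// Fail fast: validate the whole set before committing to the copy.
fn validate_all(tables: &[Table]) -> Result<(), String> {
    for table in tables {
        if !table.supports_replication() {
            return Err(format!("table {:?} has no replica identity", table.name));
        }
    }
    Ok(())
}

fn main() {
    let good = Table {
        name: "users".into(),
        columns: vec![Column { name: "id".into(), replica_identity: true }],
    };
    let bad = Table {
        name: "logs".into(),
        columns: vec![Column { name: "message".into(), replica_identity: false }],
    };
    assert!(validate_all(&[good]).is_ok());
    assert!(validate_all(&[bad]).is_err());
}
```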

/// Check that the table supports replication.
///
/// Requires at least one column with a replica identity flag. Tables with
/// REPLICA IDENTITY FULL or NOTHING have no identity columns and fail here
Collaborator


I would double check that. If this is true, we need to have a special query to detect REPLICA IDENTITY FULL and use that as the key.

Contributor Author


It is; the added tests validate that. I'll investigate whether we can actually detect this.
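One possible detection route, as a sketch: PostgreSQL records the setting in the `pg_class.relreplident` catalog column (`'d'` default, `'n'` nothing, `'f'` full, `'i'` index). The query string and enum below are assumptions for illustration, not pgdog's implementation.

```rust
// Sketch: detect REPLICA IDENTITY via pg_class.relreplident. With FULL,
// the whole old row acts as the key, so replication remains possible;
// only NOTHING is unreplicable for UPDATE/DELETE.
const REPLICA_IDENTITY_QUERY: &str =
    "SELECT relreplident FROM pg_class WHERE oid = $1::regclass";

#[derive(Debug, PartialEq)]
enum ReplicaIdentity {
    Default, // 'd': primary key, if any
    Nothing, // 'n': no identity at all
    Full,    // 'f': entire old row is the key
    Index,   // 'i': a specific unique index
}

impl ReplicaIdentity {
    fn from_char(c: char) -> Option<Self> {
        match c {
            'd' => Some(Self::Default),
            'n' => Some(Self::Nothing),
            'f' => Some(Self::Full),
            'i' => Some(Self::Index),
            _ => None,
        }
    }

    /// Everything except NOTHING can in principle be replicated.
    fn replicable(&self) -> bool {
        !matches!(self, Self::Nothing)
    }
}

fn main() {
    println!("query: {REPLICA_IDENTITY_QUERY}");
    assert_eq!(ReplicaIdentity::from_char('f'), Some(ReplicaIdentity::Full));
    assert!(ReplicaIdentity::from_char('f').unwrap().replicable());
    assert!(!ReplicaIdentity::from_char('n').unwrap().replicable());
}
```

If FULL is detected, the special-case levkk mentions would be to treat all columns as the key rather than failing the check.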

/// the copy. Instead, only the cluster reference inside the existing
/// publisher is updated so that subsequent pool.get() calls target the
/// live pool rather than a stale, potentially-offline one.
pub(crate) async fn refresh_before_replicate(&mut self) -> Result<(), Error> {
Collaborator


I think we have code to do this already somewhere. If not, we should re-use this function wherever we run these 3 statements.
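The refresh described in the doc comment above can be sketched like this: rather than rebuilding the publisher (which would restart the copy), only the cluster reference it holds is re-pointed at the live one. All names (`Cluster`, `Publisher`, `refresh_before_replicate`'s signature) are hypothetical stand-ins for the real code.

```rust
// Sketch: swap only the cluster reference inside the existing publisher,
// so subsequent pool.get() calls target the live pool instead of a stale,
// potentially-offline one. Copy progress/state is left untouched.
use std::sync::Arc;

#[derive(Debug)]
struct Cluster {
    version: u64,
}

struct Publisher {
    cluster: Arc<Cluster>,
}

impl Publisher {
    /// Re-point at the live cluster before replicating; a no-op if the
    /// publisher already holds the current reference.
    fn refresh_before_replicate(&mut self, live: &Arc<Cluster>) {
        if !Arc::ptr_eq(&self.cluster, live) {
            self.cluster = Arc::clone(live);
        }
    }
}

fn main() {
    let stale = Arc::new(Cluster { version: 1 });
    let live = Arc::new(Cluster { version: 2 });
    let mut publisher = Publisher { cluster: Arc::clone(&stale) };
    publisher.refresh_before_replicate(&live);
    assert_eq!(publisher.cluster.version, 2); // now targets the live cluster
}
```

Factoring this into one shared function, as suggested, would keep every call site that runs the same statements in sync.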


codecov Bot commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 85.71429% with 17 lines in your changes missing coverage. Please review.

Files with missing lines                                Patch %   Lines
...og/src/backend/replication/logical/orchestrator.rs     0.00%   7 Missing ⚠️
...nd/replication/logical/publisher/publisher_impl.rs     0.00%   6 Missing ⚠️
pgdog/src/backend/databases.rs                           25.00%   3 Missing ⚠️
pgdog/src/admin/copy_data.rs                              0.00%   1 Missing ⚠️


@levkk levkk linked an issue Apr 21, 2026 that may be closed by this pull request


Development

Successfully merging this pull request may close these issues.

[Resharding] pool is shut down

2 participants