fix(replication): add proper handling for pool restarts during replication #915
    port = addr.port,
    database = %addr.database_name,
    user = %addr.user,
    "pool offline: connection pool shut down"
You don't want to log that. This "shutdown" isn't actually a shutdown; it just destroys this object. We replace it with a brand new one atomically. The message could make people think we are shutting down and cause panic :D
The intention was to add more logs that will help with tracing similar issues.
I can drop it or change the severity/message.
Yup, makes sense. You can set this to trace level for sure.
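To illustrate the point about atomic replacement, here is a minimal sketch using only the standard library. `Pool`, `Cluster`, `restart`, and the `generation` counter are hypothetical stand-ins, not the project's actual types. It assumes a restart swaps a new pool object into a shared slot and marks the old object offline, so a holder of a stale reference sees "pool offline" while fresh lookups reach the live pool.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a connection pool.
#[derive(Debug)]
struct Pool {
    generation: u64,
    online: AtomicBool,
}

impl Pool {
    fn new(generation: u64) -> Self {
        Pool { generation, online: AtomicBool::new(true) }
    }

    /// Returns the pool's generation, or an error if this pool object
    /// has been replaced by a newer one.
    fn get(&self) -> Result<u64, &'static str> {
        if self.online.load(Ordering::Acquire) {
            Ok(self.generation)
        } else {
            Err("pool offline")
        }
    }
}

/// Hypothetical owner of the "current" pool slot.
struct Cluster {
    current: RwLock<Arc<Pool>>,
}

impl Cluster {
    fn new() -> Self {
        Cluster { current: RwLock::new(Arc::new(Pool::new(1))) }
    }

    /// Fetch a reference to whichever pool is live right now.
    fn pool(&self) -> Arc<Pool> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Swap in a brand new pool under the write lock, then mark the old
    /// object offline. The process itself is not shutting down.
    fn restart(&self) {
        let mut slot = self.current.write().unwrap();
        let next = Arc::new(Pool::new(slot.generation + 1));
        let old = std::mem::replace(&mut *slot, next);
        old.online.store(false, Ordering::Release);
    }
}
```

A long-running task that cached the result of `cluster.pool()` before a restart would need to call `cluster.pool()` again to reach the live pool, which is why an "offline" message on the old object is routine rather than alarming.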
    // Validate all tables support replication before committing to
    // what can be a multi-hour copy. A table with no primary key or
    // unique replica-identity index cannot be replicated correctly.
    for tables in self.tables.values() {
This is great, we should have done it a long time ago.
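The up-front check described in the diff comment can be sketched roughly as follows. This is a simplified illustration under assumptions: `Table`, `ValidationError`, `validate_table`, and `validate_all` are hypothetical names, and the map is assumed to be keyed by publication name with a list of tables per entry.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified table description.
struct Table {
    name: String,
    /// Names of columns flagged as part of the replica identity.
    identity_columns: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum ValidationError {
    NoReplicaIdentity { table: String },
}

/// A table is replicable here only if it has at least one identity column.
fn validate_table(table: &Table) -> Result<(), ValidationError> {
    if table.identity_columns.is_empty() {
        return Err(ValidationError::NoReplicaIdentity {
            table: table.name.clone(),
        });
    }
    Ok(())
}

/// Check every table before any data is copied and return the first
/// failure, so a bad table is reported before the long copy starts.
fn validate_all(tables: &BTreeMap<String, Vec<Table>>) -> Result<(), ValidationError> {
    for group in tables.values() {
        for table in group {
            validate_table(table)?;
        }
    }
    Ok(())
}
```

Failing fast keeps the cost of a misconfigured table to a single pass over metadata instead of a partially completed copy.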
    /// Check that the table supports replication.
    ///
    /// Requires at least one column with a replica identity flag. Tables with
    /// REPLICA IDENTITY FULL or NOTHING have no identity columns and fail here
I would double-check that. If this is true, we need a special query to detect REPLICA IDENTITY FULL and use all columns as the key.
It is; the added tests validate that. I'll investigate whether we can actually detect this.
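One possible shape for the detection discussed above, offered as a sketch rather than the project's implementation: PostgreSQL records a table's replica identity in `pg_class.relreplident` as a single character (`d` default, `n` nothing, `f` full, `i` index). The enum, the query constant, and `key_columns` below are hypothetical names, and treating every column as the key for FULL is an assumption about the desired behavior.

```rust
/// Replica identity setting, mirroring PostgreSQL's pg_class.relreplident.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ReplicaIdentity {
    Default, // 'd': primary key, if any
    Nothing, // 'n': no identity recorded
    Full,    // 'f': the whole row is the identity
    Index,   // 'i': a specific unique index
}

impl ReplicaIdentity {
    /// Parse the single-character catalog value; unknown values yield None.
    fn from_relreplident(c: char) -> Option<Self> {
        match c {
            'd' => Some(ReplicaIdentity::Default),
            'n' => Some(ReplicaIdentity::Nothing),
            'f' => Some(ReplicaIdentity::Full),
            'i' => Some(ReplicaIdentity::Index),
            _ => None,
        }
    }
}

/// Assumed lookup query; the parameter is the (schema-qualified) table name.
const REPLICA_IDENTITY_QUERY: &str =
    "SELECT relreplident FROM pg_class WHERE oid = $1::regclass";

/// Pick the key columns for a table (hypothetical helper).
/// FULL falls back to all columns; NOTHING has no usable key.
fn key_columns(
    identity: ReplicaIdentity,
    flagged: &[&str],
    all: &[&str],
) -> Option<Vec<String>> {
    let chosen: &[&str] = match identity {
        ReplicaIdentity::Full => all,
        ReplicaIdentity::Nothing => return None,
        ReplicaIdentity::Default | ReplicaIdentity::Index => flagged,
    };
    if chosen.is_empty() {
        None
    } else {
        Some(chosen.iter().map(|c| c.to_string()).collect())
    }
}
```

With this split, the validation step could reject only tables for which `key_columns` returns `None`, instead of rejecting FULL and NOTHING alike.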
    /// the copy. Instead, only the cluster reference inside the existing
    /// publisher is updated so that subsequent pool.get() calls target the
    /// live pool rather than a stale, potentially-offline one.
    pub(crate) async fn refresh_before_replicate(&mut self) -> Result<(), Error> {
I think we have code to do this already somewhere. If not, we should re-use this function wherever we run these 3 statements.