Improve database fallbacks #4316
55b37b5 to 9561bbf
comments should all be addressed. I've removed the warning for now. we can still add the metrics thing you mentioned later.
Could you add a metric as part of this PR? Otherwise we have no visibility when this happens.
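A minimal sketch of what such a metric could look like, assuming a plain atomic counter as a stand-in for whatever metrics system the project actually uses. `FallbackMetrics`, `fallbacks_total`, and `get_with_fallback` are hypothetical names, and this version falls back on any error rather than only on an unhealthy pool:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical metrics holder; a real service would register this
/// counter with its metrics system instead of keeping a bare atomic.
#[derive(Default)]
struct FallbackMetrics {
    /// Number of times the preferred pool failed and the fallback was used.
    fallbacks_total: AtomicU64,
}

/// Tries `preferred` first; on failure, records the event and tries `fallback`.
fn get_with_fallback<T, E>(
    metrics: &FallbackMetrics,
    preferred: impl FnOnce() -> Result<T, E>,
    fallback: impl FnOnce() -> Result<T, E>,
) -> Result<T, E> {
    match preferred() {
        Ok(conn) => Ok(conn),
        Err(_) => {
            // Count the fallback so it shows up on a dashboard or alert.
            metrics.fallbacks_total.fetch_add(1, Ordering::Relaxed);
            fallback()
        }
    }
}
```

Counting at the point where the fallback is taken keeps the happy path free of extra work.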
fn db_read_prefer_primary(&self) -> Result<DieselPooledConn<'_>, PoolError> {
    match (
        self.app().primary_database.get(),
Also, just noticed that this returns a read/write connection, not a read-only one. We should make sure connections returned by this method are read-only even when using the primary database, otherwise we might start relying on those being read/write accidentally, causing bugs during failover.
how?
What we could do is store a `ReadOnlyConnection` unit struct in the connection extensions, indicating whether that connection is currently read-only (unit struct in the extensions) or not. Then, whenever we get a connection from the database we check whether the presence of the struct is correct (present for a RO conn, missing for a RW conn), and otherwise issue the SQL query to switch the connection mode.
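The marker idea could be sketched roughly as follows. This is a std-only illustration, not Diesel's or the pool's real API: `MockConn`, its `extensions` map, and `ensure_mode` are hypothetical, and the `SET default_transaction_read_only` statement assumes a PostgreSQL backend.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Marker stored in a connection's extensions while it is read-only.
struct ReadOnlyConnection;

/// Hypothetical stand-in for a pooled connection with typed extensions.
#[derive(Default)]
struct MockConn {
    extensions: HashMap<TypeId, Box<dyn Any>>,
    /// Records the SQL that would have been sent to the database.
    executed: Vec<String>,
}

impl MockConn {
    fn execute(&mut self, sql: &str) {
        self.executed.push(sql.to_string());
    }

    fn is_marked_read_only(&self) -> bool {
        self.extensions
            .contains_key(&TypeId::of::<ReadOnlyConnection>())
    }
}

/// Called on checkout: switches the connection mode only if the marker
/// does not already match the requested mode.
fn ensure_mode(conn: &mut MockConn, want_read_only: bool) {
    if conn.is_marked_read_only() == want_read_only {
        return; // marker already matches, no query needed
    }
    if want_read_only {
        // Assumed PostgreSQL statement for making the session read-only.
        conn.execute("SET default_transaction_read_only = on");
        conn.extensions
            .insert(TypeId::of::<ReadOnlyConnection>(), Box::new(ReadOnlyConnection));
    } else {
        conn.execute("SET default_transaction_read_only = off");
        conn.extensions.remove(&TypeId::of::<ReadOnlyConnection>());
    }
}
```

With this shape, repeated checkouts in the same mode cost no extra round trip; only a mode change issues a query.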
Other notes that come to mind:
- We need to make sure this doesn't get into an inconsistent state, so we should also have a `LastSwitchFailed` or something to indicate whether the previous attempt to switch between RO and RW failed, indicating it should be performed unconditionally in the next attempt.
- This could replace the "read-only" section in the pool.
- Do we need to cache whether the connection is RW or RO, or can we just send the query to switch the connection mode every time we check out the connection from the pool? That could be way simpler and less failure-prone, but I don't know what the performance impact of it would be off the top of my head.
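A rough sketch of how a `LastSwitchFailed`-style flag might interact with the cached mode, again using a hypothetical std-only `MockConn` (the `fail_next` field only simulates a database error, and the SQL assumes PostgreSQL):

```rust
/// Hypothetical stand-in for a pooled connection.
#[derive(Default)]
struct MockConn {
    /// Cached mode, playing the role of the `ReadOnlyConnection` marker.
    read_only: bool,
    /// Set when the previous mode switch failed, so the cache is untrusted.
    last_switch_failed: bool,
    /// Test knob: makes the next `execute` call fail once.
    fail_next: bool,
    executed: Vec<String>,
}

impl MockConn {
    fn execute(&mut self, sql: &str) -> Result<(), String> {
        if self.fail_next {
            self.fail_next = false;
            return Err("simulated database error".to_string());
        }
        self.executed.push(sql.to_string());
        Ok(())
    }
}

/// Called on checkout. Skips the query only when the cached mode matches
/// and the previous switch is known to have succeeded.
fn checkout(conn: &mut MockConn, want_read_only: bool) -> Result<(), String> {
    if conn.read_only == want_read_only && !conn.last_switch_failed {
        return Ok(());
    }
    let sql = if want_read_only {
        "SET default_transaction_read_only = on"
    } else {
        "SET default_transaction_read_only = off"
    };
    match conn.execute(sql) {
        Ok(()) => {
            conn.read_only = want_read_only;
            conn.last_switch_failed = false;
            Ok(())
        }
        Err(e) => {
            // The real mode is now unknown: force a switch next time.
            conn.last_switch_failed = true;
            Err(e)
        }
    }
}
```

The simpler alternative raised in the last bullet would drop both cached fields and run the `SET` statement on every checkout, trading a round trip for less state to keep consistent.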
while I generally like the idea of having a type-safe way of distinguishing between read-only and writeable database connections, this request is increasing the complexity of this PR by multiple orders of magnitude. I guess ideally this sort of thing would be supported by Diesel itself, instead of us having to monkey patch something like this on top.
I'd appreciate it if we could merge this as is for now and then work on further improvements afterwards, instead of blocking this on achieving absolute perfection first. https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good 😉
While in general I agree, for something this crucial to our reliability I'd like to get it right before we merge it.
src/db.rs
Outdated
Some(Ok(connection)) => Ok(connection),

// Replica is not available, but primary might be available
Some(Err(PoolError::UnhealthyPool)) => self.app().primary_database.get(),
This should also return a read-only connection.
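For orientation, the fallback behaviour under review can be modelled with mock pools. Only the `Some(Ok(..))` and `Some(Err(PoolError::UnhealthyPool))` arms and the tuple match on the primary pool appear in the diff; the remaining arms, the `MockPool` type, the `Source` enum, and the `Timeout` variant are assumptions for illustration. The sketch does not address the read-only concern raised in the review.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum PoolError {
    UnhealthyPool,
    Timeout, // hypothetical second error kind
}

/// Which pool a mock connection came from.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Source {
    Primary,
    Replica,
}

/// Hypothetical pool that either hands out a connection or fails.
struct MockPool {
    source: Source,
    error: Option<PoolError>,
}

impl MockPool {
    fn get(&self) -> Result<Source, PoolError> {
        match self.error {
            Some(e) => Err(e),
            None => Ok(self.source),
        }
    }
}

/// Prefers the replica; falls back to the primary if the replica is unhealthy.
fn db_read(primary: &MockPool, replica: Option<&MockPool>) -> Result<Source, PoolError> {
    match replica.map(|pool| pool.get()) {
        Some(Ok(connection)) => Ok(connection),
        // Replica is not available, but primary might be available
        Some(Err(PoolError::UnhealthyPool)) => primary.get(),
        // Assumed: other replica errors are returned unchanged
        Some(Err(error)) => Err(error),
        // Assumed: no replica configured, so use the primary
        None => primary.get(),
    }
}

/// Prefers the primary; falls back to the replica if the primary is unhealthy.
fn db_read_prefer_primary(
    primary: &MockPool,
    replica: Option<&MockPool>,
) -> Result<Source, PoolError> {
    match (primary.get(), replica) {
        (Ok(connection), _) => Ok(connection),
        (Err(PoolError::UnhealthyPool), Some(replica)) => replica.get(),
        (Err(error), _) => Err(error),
    }
}
```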
20b9c27 to 633187e
…wnload` endpoint
633187e to a2e4234
Thinking more about it, this should also have a chaosproxy test to make sure the fallback actually happens as we want.
feel free to continue this PR, if you want. my goal was to incrementally improve the codebase, but all these additional requirements just to get a small improvement merged make it impossibly hard to contribute.
As suggested on Discord (see https://discord.com/channels/442252698964721669/835156566746595386/923210089085665381), this PR:
- `db_read()` and `db_write()`
- `db_read_prefer_primary()` method

This should improve our resilience a little bit in case of database issues.