Reorganize the server so that each request gets its own DataSource,
instead of using a single DataSource for every request.
For now, behavior is unchanged, because every request still
receives the same DataSource. But this paves the way to having
a pool of health-checked DB connections, while still having
each request work with a single connection.
Reviewed-by: Julie Qiu <firstname.lastname@example.org>
That approach will work well assuming a failure mode where one database becomes completely unavailable. If instead every database becomes, say, 5% unavailable, e.g. because of a bug that overloads them all, no instance will find a database that works all the time. Depending on the exact characteristics of the failure, how fast the instances get killed, and how fast they restart, that could turn a minor outage into a very large one.
There's also some risk in pointing each instance at a single database: after an outage on one replica, all the frontend instances will end up pointed at the other replica, concentrating the load there.
Since the failures we've seen so far are (AFAIK) total outages of one replica, this might be fine. But an approach that doesn't crash would be better in all respects, except perhaps for development ease.
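
As a rough illustration of what a non-crashing approach might look like, here is a sketch of a pool that hands each request the next healthy connection and returns an error, rather than exiting, when none is healthy. Everything here (conn, pool, pick, setHealthy, errNoHealthy) is hypothetical and only assumes that some separate health check reports per-connection status.

```go
package main

import (
	"errors"
	"sync"
)

// conn is a hypothetical stand-in for one database connection.
type conn struct {
	name    string
	healthy bool
}

// pool tracks the health of several connections (hypothetical type).
type pool struct {
	mu    sync.Mutex
	conns []*conn
	next  int // index at which the next round-robin scan starts
}

var errNoHealthy = errors.New("no healthy database connection")

// pick returns the next healthy connection in round-robin order, so load
// stays spread across replicas. If none is healthy it returns an error
// for the caller to handle; the process keeps running.
func (p *pool) pick() (*conn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	n := len(p.conns)
	for i := 0; i < n; i++ {
		c := p.conns[(p.next+i)%n]
		if c.healthy {
			p.next = (p.next + i + 1) % n
			return c, nil
		}
	}
	return nil, errNoHealthy
}

// setHealthy records the result of a health check for the named connection.
func (p *pool) setHealthy(name string, ok bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, c := range p.conns {
		if c.name == name {
			c.healthy = ok
		}
	}
}
```

With this shape, a partially unavailable database only fails the requests that hit it, and a replica that recovers starts receiving traffic again as soon as its health check passes.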