New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-26302: Avoid long-lived connections in Database #482
Conversation
Database connections are now short-lived by default, they are created either for the duration of a query or transaction. Additionally `session()` context can be used for persistent connection that does not start transaction, this is needed to manage lifetime of temporary tables. Re-enabled connection pool use in database engines, this will need special care in fork-based mutiprocessing.
8e3bb2c
to
f7a6128
Compare
associated with a Session class. | ||
""" | ||
def __init__(self, db: Database): | ||
self._db = db |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the Session
object hold its Connection
, instead of having Database
have an optional Connection
that's really associated with a particular Session
? I bet that would avoid some None
-checking, and it'd be a tiny step forward for thread/async safety (in that if we resolved other issues, there could be multiple parallel sessions associated with a single Database
).
Is the problem that we would have then have to move the transaction
interface here, too, and that's more disruptive (i.e. it's a big step towards your "solution 3")?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would require full "solution 3", basically any method that needs a connection should be moved into Session class. I agree that having Connection here is the most natural and safer way which reduces unintentional sharing but it would be a big change in API and require manual session management in all methods that did not do it before.
# TODO: if _engine is used to make a table then it uses separate | ||
# connection and should not interfere with current transaction | ||
assert self._session_connection is None or not self._session_connection.in_transaction(), \ | ||
"Table creation interrupts transactions." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, it may be that this already wasn't a problem; it was only with Oracle where I was certain that DDL statements would break transactions (and it may be that with Oracle, using a separate connection for DDL doesn't save you; I'm not sure). I left this assert in after we dropped Oracle because it seemed like a good idea to leave room for future databases (maybe even Oracle again) where that might be a problem, and without the check it would be very easy to write important code that would be hard to use with such a database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I beleive that in Postgres CREATE TABLE is transactional, but I know at least one other RDBMS (mysql) that does not support it, in mysql CREATE TABLE implicitly commits transaction. In general DDL statements are not required to be transaction-safe so it is probably worth keeping this assert just in case.
Co-authored-by: Jim Bosch <jbosch@astro.princeton.edu>
4a2fab6
to
683ceb2
Compare
Database connections are now short-lived by default, they are created
either for the duration of a query or transaction. Additionally
session()
context can be used for persistent connection that does not start
transaction, this is needed to manage lifetime of temporary tables.
Re-enabled connection pool use in database engines, this will need special
care in fork-based mutiprocessing.