Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-37249-v24: backport transaction-level-pooling compatibility to v24 #771

Merged
merged 15 commits into from
Jan 14, 2023

Conversation

TallJimbo
Copy link
Member

@TallJimbo TallJimbo commented Jan 12, 2023

Checklist

  • ran Jenkins

Apparently "#region ... #endregion" is used to trigger some special
linter behavior, and I don't want to fight the battle of learning to
disable it.
This was probably a bug that we've never happened to trigger.
@TallJimbo TallJimbo changed the base branch from main to v24.0.x January 12, 2023 19:25
This addresses a long-standing TODO comment in Database.query (dating
back to the introduction Session and our first real attempt to reduce
connection contention).  In some cases this might seem to be
transforming lazy iteration over database results into aggressive
fetching, via calls to fetchall(), but for at least PostgreSQL it seems
we were already doing _some_ aggressive fetching before without
realizing it, see e.g.

https://docs.sqlalchemy.org/en/14/core/connections.html#using-server-side-cursors-a-k-a-stream-results

So what we're really doing in the new fetchalls() here is invoking
SQLAlchemy's client-side row processing more aggressively.   I don't
think that's a big change, and in exchange we can now guarantee that
we never return iterators to user code that are responsible for closing
a connection when the user is "done" with them - a problematic
definition for any garbage-collected entity, but a particular problem
for iterators that might not ever be exhausted.

This also include some typing adjustments for SQLAlchemy objects.
Those aren't actually checked yet, since typing support in SQLAlchemy
won't be out until it's 2.0 release.
Temporary tables are now always created inside transactions, and via
the temporary_table context-manager-returning method, which has been
moved to Database from Session.  Session has been removed.

This should bring us into compatibility with pgbouncer
transaction-level pooling - we won't care if a connection we hold
actually gets multiplexed onto a different database connection, as long
as it doesn't happen during transactions.
Each of the gazillion small queries we run a butler startup was
previously grabbing a new connection and then returning it.  That was
probably wasn't too bad, since it should have just been getting them
from the pool (for PostgreSQL) or opening a local file (for SQLite),
but this should still be better in both cases.

It should also remove a lot of rollbacks, since SQLAlchemy emits those
whenever connections are returned to the pool.  SQLAlchemy's docs
say those shouldn't matter in terms of performance, but they're still
noisy.
Now that we're careful to consistently use the same connection, we
shouldn't ever need more.
Making SQLite temp tables inside transactions leads to very long
exclusive locks during QG generation, causing timeouts in (at least)
ci_cpp.
We now have public final session() and transaction() methods that
return nothing and protected _session() and _transaction() methods
that return connections, reducing the need to assert on
self._session_connection being not None and avoiding returning things
to users that they don't need.
@codecov
Copy link

codecov bot commented Jan 13, 2023

Codecov Report

Base: 84.52% // Head: 84.56% // Increases project coverage by +0.03% 🎉

Coverage data is based on head (c517410) compared to base (dde7253).
Patch coverage: 95.50% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##           v24.0.x     #771      +/-   ##
===========================================
+ Coverage    84.52%   84.56%   +0.03%     
===========================================
  Files          243      243              
  Lines        31854    31903      +49     
  Branches      5428     5464      +36     
===========================================
+ Hits         26926    26978      +52     
+ Misses        3749     3748       -1     
+ Partials      1179     1177       -2     
Impacted Files Coverage Δ
python/lsst/daf/butler/registry/tests/_registry.py 99.07% <ø> (ø)
...thon/lsst/daf/butler/registry/bridge/monolithic.py 83.49% <70.00%> (-1.05%) ⬇️
...n/lsst/daf/butler/registry/interfaces/_database.py 87.71% <89.65%> (+0.49%) ⬆️
python/lsst/daf/butler/registries/sql.py 81.23% <100.00%> (+0.15%) ⬆️
python/lsst/daf/butler/registry/attributes.py 100.00% <100.00%> (ø)
...thon/lsst/daf/butler/registry/collections/_base.py 87.74% <100.00%> (+0.32%) ⬆️
...n/lsst/daf/butler/registry/databases/postgresql.py 81.46% <100.00%> (+2.27%) ⬆️
...ython/lsst/daf/butler/registry/databases/sqlite.py 84.61% <100.00%> (+0.21%) ⬆️
.../butler/registry/datasets/byDimensions/_manager.py 90.06% <100.00%> (+0.20%) ⬆️
.../butler/registry/datasets/byDimensions/_storage.py 84.54% <100.00%> (+0.14%) ⬆️
... and 8 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@TallJimbo TallJimbo merged commit 01021c5 into v24.0.x Jan 14, 2023
@TallJimbo TallJimbo deleted the tickets/DM-37249-v24 branch January 14, 2023 02:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant