DM-37249: make connection usage compatible with pgbouncer transaction-level pooling #763
Conversation
Force-pushed 40e4c87 to ecb445c.
Codecov Report — Base: 85.22% // Head: 85.24% // Increases project coverage by +0.02%.
Additional details and impacted files@@ Coverage Diff @@
## main #763 +/- ##
==========================================
+ Coverage 85.22% 85.24% +0.02%
==========================================
Files 261 261
Lines 34555 34599 +44
Branches 5810 5851 +41
==========================================
+ Hits 29448 29495 +47
+ Misses 3862 3861 -1
+ Partials 1245 1243 -2
☔ View full report at Codecov.
Force-pushed ecb445c to 8d91dbb.
Looks good, small bunch of questions/comments.
@@ -123,6 +123,7 @@ def transaction(
     if not self.isWriteable():
         with closing(self._session_connection.connection.cursor()) as cursor:
             cursor.execute("SET TRANSACTION READ ONLY")
+            cursor.execute("SET TIME ZONE 0")
At USDF the default server timezone is UTC, so this probably does nothing. I do not remember why we need to set the time zone explicitly?
I'm afraid I don't recall either, but I also don't want to assume all PostgreSQL instances we work with are configured like that.
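The pattern under discussion can be sketched in isolation. This is a hedged, simplified stand-in for the `transaction` method shown in the diff (the function signature and commit/rollback handling here are illustrative, not the actual daf_butler API): under pgbouncer transaction-level pooling, `SET` commands issued outside a transaction may end up on a different multiplexed server connection next time, so per-session state has to be (re)established at the start of every transaction.

```python
from contextlib import closing, contextmanager


@contextmanager
def transaction(connection, writeable: bool):
    """Sketch: re-issue session settings at the start of each transaction.

    With pgbouncer transaction-level pooling, session state does not
    reliably survive between transactions, so it is set here every time.
    """
    with closing(connection.cursor()) as cursor:
        if not writeable:
            cursor.execute("SET TRANSACTION READ ONLY")
        # Force UTC regardless of the server default, since the schema's
        # timestamp columns were declared without time zone information.
        cursor.execute("SET TIME ZONE 0")
    try:
        yield connection
        connection.commit()
    except Exception:
        connection.rollback()
        raise
```

Any DB-API 2.0 connection (e.g. a psycopg2 connection) would fit the `connection` parameter here.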
Haven't read interfaces/_database.py, which is arguably the most important part, but some comments on the rest.
python/lsst/daf/butler/registry/datasets/byDimensions/_storage.py (thread resolved)
        with closing(self._session_connection.connection.cursor()) as cursor:
            cursor.execute("SET TRANSACTION READ ONLY")
            cursor.execute("SET TIME ZONE 0")
    else:
        with closing(self._session_connection.connection.cursor()) as cursor:
            # Make timestamps UTC, because we didn't use TIMESTAMPZ for
Should we have included this with the longer-observation-reason schema change? Or (because it possibly requires modifying column values) would that have been riskier?
I think we probably could have included it. But I'd defer to @andy-slac on that. The real reason it wasn't included was because this had fallen off of everyone's radar completely (it predates even the big suite of schema changes we made to address visit-exposure relationships a while back). I've created DM-37322 to try to keep that from happening again.
            # with our own that does, and tell SQLite to try to acquire a lock
            # as soon as we start a transaction (this should lead to more
            # blocking and fewer deadlocks).
            if writeable:
Can't there still be transactions on read-only databases to ensure consistency between multiple SELECTs? Or is this not a problem?
It's not something we've ever needed, at least not outside the context of doing a write as well in the same transaction.
Last night I discovered that we actually do have a use case for read-only transactions for consistency: at butler startup we fetch a lot of information about dataset types and collections, and it's possible for those queries to be made inconsistent by a concurrent writer (and on `main` we don't have a transaction around that at all). I discovered this while running ci_cpp a lot to try to reproduce an unrelated race condition; I think it's unusual in that it tends to initialize `Butler` instances (for running `PipelineTasks`) while also registering dataset types and collections (when starting to run `PipelineTasks` from a different pipeline), all with SQLite.

So I've pushed a modified version of this commit that just emits `BEGIN` for read-only databases, rather than `BEGIN IMMEDIATE`; that should start a read-only transaction when followed (as it will be) by a `SELECT` query. The overall behavior should then be:
- Multiple transactions in read-only databases can occur at the same time, as they only acquire a shared lock on the database file.
- Any read-write transactions will block until any concurrent read-only or read-write transactions complete.
- An ongoing read-write transaction will block any read-only transactions from starting.
And I've also wrapped those butler startup fetches in a transaction context.
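The locking behavior described above can be sketched with the stdlib `sqlite3` module. This is a simplified stand-in for what the branch does through SQLAlchemy's event hooks; the `begin_transaction` helper and `writeable` flag are illustrative names, not the actual implementation.

```python
import sqlite3


def begin_transaction(conn: sqlite3.Connection, writeable: bool) -> None:
    # BEGIN IMMEDIATE acquires SQLite's write (RESERVED) lock up front, so
    # concurrent writers block early rather than deadlocking later; plain
    # BEGIN starts a deferred transaction that takes only a shared read
    # lock once the first SELECT runs.
    conn.execute("BEGIN IMMEDIATE" if writeable else "BEGIN")


# isolation_level=None disables sqlite3's implicit transaction handling,
# so transaction boundaries are controlled explicitly here.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (x INTEGER)")
begin_transaction(conn, writeable=True)
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("COMMIT")
```

With two connections to the same database file, a second `BEGIN IMMEDIATE` issued while the first transaction is open would fail with "database is locked" (or block, with a busy timeout), which is the early-blocking behavior described above.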
Just pushed two new commits and a number of fixups to address review comments. The new changes were spurred by my one-off testing revealing that we're both using more connections than we need per `Butler` instance and interacting with the SQLAlchemy pool an awful lot during Butler startup (resulting in those ubiquitous rollbacks in the logs).
Force-pushed 4110fcc to b4dbbea.
... and already reverted one of them, because apparently it can cause deadlocks in the tests when those are run in parallel (I suspect due to the artificial way we create tables in PostgreSQL for those tests, but it's still a problem). I'll look into a better fix tomorrow.
Force-pushed 44cf87a to b2761e7.
Revert has now been replaced with a fixup commit that solves the test problem by switching from a transaction to just a connection session around table inspection.
Apparently "#region ... #endregion" is used to trigger some special linter behavior, and I don't want to fight the battle of learning to disable it.
This was probably a bug that we've never happened to trigger.
This addresses a long-standing TODO comment in Database.query (dating back to the introduction of Session and our first real attempt to reduce connection contention). In some cases this might seem to transform lazy iteration over database results into aggressive fetching, via calls to fetchall(), but for at least PostgreSQL it seems we were already doing _some_ aggressive fetching before without realizing it; see e.g. https://docs.sqlalchemy.org/en/14/core/connections.html#using-server-side-cursors-a-k-a-stream-results

So what we're really doing in the new fetchall() calls here is invoking SQLAlchemy's client-side row processing more aggressively. I don't think that's a big change, and in exchange we can now guarantee that we never return iterators to user code that are responsible for closing a connection when the user is "done" with them - a problematic definition for any garbage-collected entity, but a particular problem for iterators that might never be exhausted.

This also includes some typing adjustments for SQLAlchemy objects. Those aren't actually checked yet, since typing support in SQLAlchemy won't be out until its 2.0 release.
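The guarantee described above boils down to fetching eagerly inside the query method itself. A minimal stdlib sketch, with `sqlite3` standing in for SQLAlchemy and a hypothetical `query` helper:

```python
import sqlite3
from contextlib import closing


def query(conn: sqlite3.Connection, sql: str, params=()):
    # Fetch every row before returning, so the caller never holds a live
    # cursor whose lifetime would pin the underlying connection (or leak
    # it entirely if the result iterator is never exhausted).
    with closing(conn.execute(sql, params)) as cursor:
        return cursor.fetchall()
```

The caller gets a plain list; the cursor is closed before `query` returns, so no connection-owning iterator ever escapes.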
Temporary tables are now always created inside transactions, and via the temporary_table context-manager-returning method, which has been moved to Database from Session. Session has been removed. This should bring us into compatibility with pgbouncer transaction-level pooling - we won't care if a connection we hold actually gets multiplexed onto a different database connection, as long as it doesn't happen during transactions.
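A context-manager-returning `temporary_table` method might look roughly like this. This is a hedged sketch (the name comes from the commit message above; the signature and `sqlite3` backend are illustrative, not the actual Database API):

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def temporary_table(conn: sqlite3.Connection, name: str, columns: str):
    # Creating the temp table inside the caller's transaction means that,
    # under pgbouncer transaction-level pooling, it cannot land on a
    # different multiplexed server connection than the queries that use it.
    conn.execute(f"CREATE TEMPORARY TABLE {name} ({columns})")
    try:
        yield name
    finally:
        conn.execute(f"DROP TABLE {name}")
```

The `finally` clause guarantees cleanup even if the body raises, which matters when the same pooled connection will be reused by someone else.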
Each of the gazillion small queries we run at butler startup was previously grabbing a new connection and then returning it. That probably wasn't too bad, since it should have just been getting them from the pool (for PostgreSQL) or opening a local file (for SQLite), but this should still be better in both cases. It should also remove a lot of rollbacks, since SQLAlchemy emits those whenever connections are returned to the pool. SQLAlchemy's docs say those shouldn't matter in terms of performance, but they're still noisy.
Now that we're careful to consistently use the same connection, we shouldn't ever need more.
Force-pushed 2dd5c7f to 0420af9.
Making SQLite temp tables inside transactions leads to very long exclusive locks during QG generation, causing timeouts in (at least) ci_cpp.
Force-pushed 22b26e8 to d56e250.
@andy-slac, could you take a look at the commits I've added since your last review (i.e. the ones after "Add changelog entry.")?
Looks good, a couple of minor comments.
@@ -97,7 +97,7 @@ def __init__(
     @classmethod
     def makeEngine(cls, uri: str, *, writeable: bool = True) -> sqlalchemy.engine.Engine:
-        return sqlalchemy.engine.create_engine(uri)
+        return sqlalchemy.engine.create_engine(uri, pool_size=1)
It would be interesting to detect the situation where there is actually more than one connection (in case we break something in the future) on some future ticket. Maybe Pool events could be used for this.
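The idea could be prototyped without SQLAlchemy at all. Below is a stdlib sketch of a guard that fails loudly when a second concurrent connection is requested; hooking the same counting logic into a real engine via SQLAlchemy's Pool `checkout`/`checkin` events would be the production equivalent. The class and method names here are hypothetical.

```python
import sqlite3


class SingleConnectionGuard:
    """Hand out connections from a factory, raising on the second
    concurrent checkout to catch accidental extra connections."""

    def __init__(self, factory):
        self._factory = factory
        self._live = 0

    def checkout(self):
        if self._live >= 1:
            raise RuntimeError("more than one concurrent connection requested")
        self._live += 1
        return self._factory()

    def checkin(self, conn):
        conn.close()
        self._live -= 1
```

A debug-only guard like this would have surfaced the extra-connection behavior fixed earlier in this PR as a hard error instead of a silent pool interaction.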
- `is_new` (`bool`) whether this is a new (outermost) transaction;
- `connection` (`sqlalchemy.engine.Connection`) the connection.
Double backticks for `is_new` and `connection`?
                "interrupt the active transaction context has been requested."
            )
        savepoint = savepoint or connection.in_nested_transaction()
        trans: sqlalchemy.engine.Transaction
Does this need `| None`?
Yup, good catch. I'm sure MyPy is not complaining only because it sees all the SQLAlchemy types as `Any` (for now).
Force-pushed 2e93b9c to 2129149.
We now have public final session() and transaction() methods that return nothing and protected _session() and _transaction() methods that return connections, reducing the need to assert on self._session_connection being not None and avoiding returning things to users that they don't need.
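The split described might look roughly like this. It is a sketch only: `session`, `_session`, and `_session_connection` are names from the commit message above, but the engine handling and reuse logic here are assumptions about the implementation, not the real daf_butler code.

```python
from contextlib import contextmanager


class Database:
    def __init__(self, engine):
        self._engine = engine
        self._session_connection = None

    @contextmanager
    def _session(self):
        # Protected: yields the shared connection for internal callers,
        # reusing the existing one if a session is already open, so callers
        # never need to assert that _session_connection is not None.
        if self._session_connection is not None:
            yield self._session_connection
            return
        self._session_connection = self._engine.connect()
        try:
            yield self._session_connection
        finally:
            self._session_connection.close()
            self._session_connection = None

    @contextmanager
    def session(self):
        # Public, final: yields nothing, so users can hold a session open
        # without ever being handed the underlying connection.
        with self._session():
            yield
```

Keeping the connection off the public surface means user code cannot accidentally stash it past the session's lifetime.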
Force-pushed 2129149 to 48683ac.