New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Omit BEGIN/COMMIT statements for empty transactions #32647

Merged
merged 1 commit into from Aug 23, 2018

Conversation

@eugeneius
Member

eugeneius commented Apr 19, 2018

Fixes #17937.

If a transaction is opened and closed without any queries being run, we can safely omit the BEGIN and COMMIT statements, as they only exist to modify the connection's behaviour inside the transaction. This removes the overhead of those statements when saving a record with no changes, which makes workarounds like save if changed? unnecessary.

This implementation wraps the raw database connection in a proxy which buffers begin_transaction calls, and materializes them the next time the connection is used, or discards them if end_transaction is called first. This approach makes new connection usage inside the adapters behave correctly by default. Third party adapters can use the begin_transaction/end_transaction APIs to opt in to the new behaviour, but will still work even if they don't.

This implementation buffers transactions inside the transaction manager and materializes them the next time the connection is used. For this to work, the adapter needs to guard all connection use with a call to materialize_transactions. Because of this, adapters must opt in to get this new behaviour by implementing supports_lazy_transactions?.

If raw_connection is used to get a reference to the underlying database connection, the behaviour is disabled and transactions are opened eagerly, as we can't know how the connection will be used. However when the connection is checked back into the pool, we can assume that the application won't use the reference again and reenable lazy transactions. This prevents a single raw_connection call from disabling lazy transactions for the lifetime of the connection.

@rails-bot

This comment has been minimized.

Show comment
Hide comment
@rails-bot

rails-bot Apr 19, 2018

r? @pixeltrix

(@rails-bot has picked a reviewer for you, use r? to override)

rails-bot commented Apr 19, 2018

r? @pixeltrix

(@rails-bot has picked a reviewer for you, use r? to override)

Show outdated Hide outdated activerecord/lib/active_record/connection_adapters/mysql2_adapter.rb
@yahonda

This comment has been minimized.

Show comment
Hide comment
@yahonda

yahonda Apr 20, 2018

Contributor

@eugeneius Let me take a look at if it "breaks" Oracle enhanced adapter. If it does I am happy to update Oracle enhanced adapter.

Contributor

yahonda commented Apr 20, 2018

@eugeneius Let me take a look at if it "breaks" Oracle enhanced adapter. If it does I am happy to update Oracle enhanced adapter.

@eugeneius

This comment has been minimized.

Show comment
Hide comment
@eugeneius

eugeneius Apr 20, 2018

Member

Thanks @yahonda! 🙌 I've tried to make the implementation backwards compatible, so please let me know if it does break - it might mean that I made an incorrect assumption about the adapter interface.

Member

eugeneius commented Apr 20, 2018

Thanks @yahonda! 🙌 I've tried to make the implementation backwards compatible, so please let me know if it does break - it might mean that I made an incorrect assumption about the adapter interface.

@eugeneius

This comment has been minimized.

Show comment
Hide comment
@eugeneius

eugeneius Apr 20, 2018

Member

This test failure was legitimate: https://travis-ci.org/rails/rails/jobs/368924286#L1171-L1175

Executing a prepared statement doesn't touch the connection object (at least on the SQLite adapter), but does require the connection's transaction state to be up to date.

I've fixed the problem by materializing transactions in exec_query before prepared statements are executed, but that weakens the "it just works" property that I hoped to get from the proxy approach.

😕

Member

eugeneius commented Apr 20, 2018

This test failure was legitimate: https://travis-ci.org/rails/rails/jobs/368924286#L1171-L1175

Executing a prepared statement doesn't touch the connection object (at least on the SQLite adapter), but does require the connection's transaction state to be up to date.

I've fixed the problem by materializing transactions in exec_query before prepared statements are executed, but that weakens the "it just works" property that I hoped to get from the proxy approach.

😕

@kbrock

This comment has been minimized.

Show comment
Hide comment
@kbrock

kbrock Apr 22, 2018

Contributor

Thanks @eugeneius This turned out very nicely

/cc @Fryguy

Contributor

kbrock commented Apr 22, 2018

Thanks @eugeneius This turned out very nicely

/cc @Fryguy

@eugeneius

This comment has been minimized.

Show comment
Hide comment
@eugeneius

eugeneius Apr 22, 2018

Member

I've moved the materialize_transactions call from exec_query into a proxy wrapping the statement object (more proxies!), which feels a bit more robust.

Member

eugeneius commented Apr 22, 2018

I've moved the materialize_transactions call from exec_query into a proxy wrapping the statement object (more proxies!), which feels a bit more robust.

@jeremy jeremy requested review from matthewd, kamipo and sgrif Apr 22, 2018

@sgrif

sgrif approved these changes Apr 22, 2018

This feels like a lot of complexity for very little gain, but the implementation seems fine.

Show outdated Hide outdated ...lib/active_record/connection_adapters/abstract/lazy_transaction_proxy.rb

yahonda added a commit to yahonda/oracle-enhanced that referenced this pull request Apr 23, 2018

@yahonda

This comment has been minimized.

Show comment
Hide comment
@yahonda

yahonda Apr 23, 2018

Contributor

Oracle enhanced adapter needs some update to support this pull request. Here are CI results for this pull request https://travis-ci.org/yahonda/oracle-enhanced/builds/370051142 .

I am still having some jet lags due after a long flight, I'll take a look at this result in detail tomorrow.

Contributor

yahonda commented Apr 23, 2018

Oracle enhanced adapter needs some update to support this pull request. Here are CI results for this pull request https://travis-ci.org/yahonda/oracle-enhanced/builds/370051142 .

I am still having some jet lags due after a long flight, I'll take a look at this result in detail tomorrow.

Show outdated Hide outdated ...lib/active_record/connection_adapters/abstract/lazy_transaction_proxy.rb
Show outdated Hide outdated ...lib/active_record/connection_adapters/abstract/lazy_transaction_proxy.rb
@eugeneius

This comment has been minimized.

Show comment
Hide comment
@eugeneius

eugeneius Apr 23, 2018

Member

@yahonda can you try triggering another build? I renamed LazyTransactionProxy#raw_connection to LazyTransactionProxy#proxied_connection so it wouldn't clash with this method.

Member

eugeneius commented Apr 23, 2018

@yahonda can you try triggering another build? I renamed LazyTransactionProxy#raw_connection to LazyTransactionProxy#proxied_connection so it wouldn't clash with this method.

@matthewd

This comment has been minimized.

Show comment
Hide comment
@matthewd

matthewd Apr 23, 2018

Member

A few thoughts from an initial read-through:

Yay! ❤️ As I mentioned in Pittsburgh, it's awesome to have an implementation to look at -- this one's been sitting on the "nice to have" list for a while.

My gut feeling is that this "should" be closer to the transaction manager rather than the connection -- particularly the deferral stack, though the connection of course needs to be responsible for triggering the materialize.

Failing that, I also wonder about putting the functionality on the abstract connection itself rather than a proxy. I don't mind the code complexity needed for more thorough bookkeeping, but the proxy feels architecturally heavyweight.

Side-thought that would probably just distract us to pursue right now, but I'll mention anyway: this has some interesting parallels (deferring work on the connection until we need to run a real statement, and then potentially giving that some special handling) with #28200 and friends.

Finally, the big one, because it's about behaviour rather than implementation: I believe it's important that savepoints don't materialize the transaction -- that they too get deferred. To me, the ideal version of this change allows us to create transactions more freely, which means we can default requires_new to true, eliminating some very surprising behaviours where changes escape a transaction block. Reducing a no-op save from two queries to zero is great, of course.. I just see some other shinies in the distance.

Member

matthewd commented Apr 23, 2018

A few thoughts from an initial read-through:

Yay! ❤️ As I mentioned in Pittsburgh, it's awesome to have an implementation to look at -- this one's been sitting on the "nice to have" list for a while.

My gut feeling is that this "should" be closer to the transaction manager rather than the connection -- particularly the deferral stack, though the connection of course needs to be responsible for triggering the materialize.

Failing that, I also wonder about putting the functionality on the abstract connection itself rather than a proxy. I don't mind the code complexity needed for more thorough bookkeeping, but the proxy feels architecturally heavyweight.

Side-thought that would probably just distract us to pursue right now, but I'll mention anyway: this has some interesting parallels (deferring work on the connection until we need to run a real statement, and then potentially giving that some special handling) with #28200 and friends.

Finally, the big one, because it's about behaviour rather than implementation: I believe it's important that savepoints don't materialize the transaction -- that they too get deferred. To me, the ideal version of this change allows us to create transactions more freely, which means we can default requires_new to true, eliminating some very surprising behaviours where changes escape a transaction block. Reducing a no-op save from two queries to zero is great, of course.. I just see some other shinies in the distance.

@yahonda

This comment has been minimized.

Show comment
Hide comment
@yahonda

yahonda Apr 24, 2018

Contributor

CI is green now with Oracle enhanced adapter. As a nature of Oracle database transaction management begin and end statements are not executed to start and finish transaction. Then some of newly added tests get "failed" but it is okay for me.

Contributor

yahonda commented Apr 24, 2018

CI is green now with Oracle enhanced adapter. As a nature of Oracle database transaction management begin and end statements are not executed to start and finish transaction. Then some of newly added tests get "failed" but it is okay for me.

@eugeneius

This comment has been minimized.

Show comment
Hide comment
@eugeneius

eugeneius Apr 24, 2018

Member

Thanks for the detailed feedback (as always) @matthewd!

I originally wrote a patch that tracked deferred transactions in the transaction manager and materialised them from the adapter, just as you suggested. I wanted to explore the proxy idea instead because I was concerned about the maintenance burden of guarding all connection usage, but if that's an acceptable tradeoff to make for a simpler design, I'll dig up the other implementation and switch to it.

You make a good point that once we have a hook that checks some property of the connection every time it's used and are committed to maintaining that invariant, #28200 seems tantalisingly within reach.

I agree that savepoints should be in scope here, as well as "fixture" transactions; the latter is actually what motivated me to work this issue (10 connection pools * 30,000 tests = lots of empty transactions). Adding support for them breaks dozens of tests that assert that an exact set of queries are executed though, so I wanted to get feedback on the core of the idea before I fixed them all up.

Member

eugeneius commented Apr 24, 2018

Thanks for the detailed feedback (as always) @matthewd!

I originally wrote a patch that tracked deferred transactions in the transaction manager and materialised them from the adapter, just as you suggested. I wanted to explore the proxy idea instead because I was concerned about the maintenance burden of guarding all connection usage, but if that's an acceptable tradeoff to make for a simpler design, I'll dig up the other implementation and switch to it.

You make a good point that once we have a hook that checks some property of the connection every time it's used and are committed to maintaining that invariant, #28200 seems tantalisingly within reach.

I agree that savepoints should be in scope here, as well as "fixture" transactions; the latter is actually what motivated me to work this issue (10 connection pools * 30,000 tests = lots of empty transactions). Adding support for them breaks dozens of tests that assert that an exact set of queries are executed though, so I wanted to get feedback on the core of the idea before I fixed them all up.

@kamipo kamipo added the activerecord label May 1, 2018

@eugeneius

This comment has been minimized.

Show comment
Hide comment
@eugeneius

eugeneius May 1, 2018

Member

I've pushed an alternative implementation that guards all connection use in the adapters instead of wrapping the raw connection in a proxy. The previous implementation is available here.

This version defers savepoints as well as transactions, and is also enabled for "fixture" transactions.

Member

eugeneius commented May 1, 2018

I've pushed an alternative implementation that guards all connection use in the adapters instead of wrapping the raw connection in a proxy. The previous implementation is available here.

This version defers savepoints as well as transactions, and is also enabled for "fixture" transactions.

@matthewd

😍

I definitely prefer this approach. Thanks for preserving the previous one for comparison, too.

@sgrif are you more/less/equally inclined toward this version? It spreads out the materialize_transactions calls, but to places we already need to care about for other similar reasons (logging, connection locking, future #28200)... that aside, the logic feels more semantically contained to me.

I also think this makes my other future nice-to-have more accessible: if we're three unmaterialized transactions deep when we run a statement, we only need to send a single BEGIN. (If that transaction ends up rolled back, we send a ROLLBACK. If it ends up committed, we send nothing, and mark its parent as having been materialized.) But that's for another time.

return if @materializing_transactions
return unless @has_unmaterialized_transactions
@connection.lock.synchronize do

This comment has been minimized.

@matthewd

matthewd May 2, 2018

Member

@matthewd
connection.begin_isolated_db_transaction(options[:isolation])
def materialize!
if isolation_level
connection.begin_isolated_db_transaction(isolation_level)

This comment has been minimized.

@matthewd

matthewd May 2, 2018

Member

I suspect (at least some?) non-default isolation levels will need to be eagerly materialized... for some, the timing of the BEGIN relative to other connections is relevant, IIRC?

@matthewd

matthewd May 2, 2018

Member

I suspect (at least some?) non-default isolation levels will need to be eagerly materialized... for some, the timing of the BEGIN relative to other connections is relevant, IIRC?

This comment has been minimized.

@eugeneius

eugeneius May 2, 2018

Member

I don't think it's necessary, based on my reading of the relevant documentation:

Repeatable read

MySQL doesn't establish a snapshot until a read is performed, so delaying the BEGIN until the first statement should have no observable effect:

https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html#isolevel_repeatable-read

Consistent reads within the same transaction read the snapshot established by the first read.

The PostgreSQL documentation is less clear, but appears to describe the same behaviour:

https://www.postgresql.org/docs/10/static/transaction-iso.html#XACT-REPEATABLE-READ

a query in a repeatable read transaction sees a snapshot as of the start of the first non-transaction-control statement in the transaction

Serializable

Both MySQL and PostgreSQL define the SERIALIZABLE isolation level in terms of REPEATABLE READ, with some additional constraints:

https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html#isolevel_serializable

This level is like REPEATABLE READ, but InnoDB implicitly converts all plain SELECT statements to SELECT ... FOR SHARE

The transaction will be materialized before this constraint applies.

https://www.postgresql.org/docs/10/static/transaction-iso.html#XACT-SERIALIZABLE

this isolation level works exactly the same as Repeatable Read except that it monitors for conditions which could make execution of a concurrent set of serializable transactions behave in a manner inconsistent with all possible serial (one at a time) executions of those transactions.

It's hard to imagine how dodging a conflict by delaying the BEGIN TRANSACTION statement could be undesirable, if the resulting transactions are serializable.

@eugeneius

eugeneius May 2, 2018

Member

I don't think it's necessary, based on my reading of the relevant documentation:

Repeatable read

MySQL doesn't establish a snapshot until a read is performed, so delaying the BEGIN until the first statement should have no observable effect:

https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html#isolevel_repeatable-read

Consistent reads within the same transaction read the snapshot established by the first read.

The PostgreSQL documentation is less clear, but appears to describe the same behaviour:

https://www.postgresql.org/docs/10/static/transaction-iso.html#XACT-REPEATABLE-READ

a query in a repeatable read transaction sees a snapshot as of the start of the first non-transaction-control statement in the transaction

Serializable

Both MySQL and PostgreSQL define the SERIALIZABLE isolation level in terms of REPEATABLE READ, with some additional constraints:

https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html#isolevel_serializable

This level is like REPEATABLE READ, but InnoDB implicitly converts all plain SELECT statements to SELECT ... FOR SHARE

The transaction will be materialized before this constraint applies.

https://www.postgresql.org/docs/10/static/transaction-iso.html#XACT-SERIALIZABLE

this isolation level works exactly the same as Repeatable Read except that it monitors for conditions which could make execution of a concurrent set of serializable transactions behave in a manner inconsistent with all possible serial (one at a time) executions of those transactions.

It's hard to imagine how dodging a conflict by delaying the BEGIN TRANSACTION statement could be undesirable, if the resulting transactions are serializable.

@@ -80,6 +80,8 @@ class AbstractAdapter
attr_reader :schema_cache, :owner, :logger, :prepared_statements, :lock
alias :in_use? :owner
set_callback :checkin, :after, :enable_lazy_transactions!

This comment has been minimized.

@matthewd

matthewd May 2, 2018

Member

🌟

@matthewd
@kamipo

This comment has been minimized.

Show comment
Hide comment
@kamipo

kamipo Aug 6, 2018

Member

How much does this change improve the performance of real apps (real use cases)?
I suspect that there is very few empty transactions.

Member

kamipo commented Aug 6, 2018

How much does this change improve the performance of real apps (real use cases)?
I suspect that there is very few empty transactions.

@matthewd

This comment has been minimized.

Show comment
Hide comment
@matthewd

matthewd Aug 6, 2018

Member

@kamipo I think this is valuable for two use cases: 1) bulk updates, where many/most records aren't actually changed: each save is a pair of DB round-trips, with no benefit (but we have to start the transaction before the before_save, which means it has to happen before we know there's nothing to write), and 2) nested transactions: we currently have very confusing behaviour around transaction do because it defaults to do-nothing if you're already in an outer transaction, but that does surprising things if there's a rollback. I think this change will allow us to default to using proper nested transactions a lot more often, because they become free unless they're used. (And most of the code changes here will be amortized with #28200 -- that needs to hook the same places, so they'll end up sharing the churn cost.)

Basically, I think single empty transactions are likely quite rare, and also uninteresting (a small fixed cost on an unusual request is not much of a concern) -- but there are reasonable-enough patterns that can lead to lots of empty transactions, and while the caller could work around them, it's something we're in a better position to solve.

The two specific use cases I know of are 1) ManageIQ, which carried a custom patch since Rails 2.2 (they lost it during an upgrade, but last I heard it's still an ongoing source of pain), and 2) @jeremy's point in #17937 (comment) about not-actual-writes forcing a switch to the primary DB (likely to become a wider complaint as we improve built-in support for such configurations).

Member

matthewd commented Aug 6, 2018

@kamipo I think this is valuable for two use cases: 1) bulk updates, where many/most records aren't actually changed: each save is a pair of DB round-trips, with no benefit (but we have to start the transaction before the before_save, which means it has to happen before we know there's nothing to write), and 2) nested transactions: we currently have very confusing behaviour around transaction do because it defaults to do-nothing if you're already in an outer transaction, but that does surprising things if there's a rollback. I think this change will allow us to default to using proper nested transactions a lot more often, because they become free unless they're used. (And most of the code changes here will be amortized with #28200 -- that needs to hook the same places, so they'll end up sharing the churn cost.)

Basically, I think single empty transactions are likely quite rare, and also uninteresting (a small fixed cost on an unusual request is not much of a concern) -- but there are reasonable-enough patterns that can lead to lots of empty transactions, and while the caller could work around them, it's something we're in a better position to solve.

The two specific use cases I know of are 1) ManageIQ, which carried a custom patch since Rails 2.2 (they lost it during an upgrade, but last I heard it's still an ongoing source of pain), and 2) @jeremy's point in #17937 (comment) about not-actual-writes forcing a switch to the primary DB (likely to become a wider complaint as we improve built-in support for such configurations).

@kamipo

This comment has been minimized.

Show comment
Hide comment
@kamipo

kamipo Aug 6, 2018

Member

I see. Thanks for the clarification, it would help to understand the benefit for future readers (and me).

Member

kamipo commented Aug 6, 2018

I see. Thanks for the clarification, it would help to understand the benefit for future readers (and me).

Show outdated Hide outdated activerecord/lib/active_record/connection_adapters/abstract/transaction.rb
Omit BEGIN/COMMIT statements for empty transactions
If a transaction is opened and closed without any queries being run, we
can safely omit the `BEGIN` and `COMMIT` statements, as they only exist
to modify the connection's behaviour inside the transaction. This
removes the overhead of those statements when saving a record with no
changes, which makes workarounds like `save if changed?` unnecessary.

This implementation buffers transactions inside the transaction manager
and materializes them the next time the connection is used. For this to
work, the adapter needs to guard all connection use with a call to
`materialize_transactions`. Because of this, adapters must opt in to get
this new behaviour by implementing `supports_lazy_transactions?`.

If `raw_connection` is used to get a reference to the underlying
database connection, the behaviour is disabled and transactions are
opened eagerly, as we can't know how the connection will be used.
However when the connection is checked back into the pool, we can assume
that the application won't use the reference again and reenable lazy
transactions. This prevents a single `raw_connection` call from
disabling lazy transactions for the lifetime of the connection.

@matthewd matthewd merged commit 123fe0c into rails:master Aug 23, 2018

2 checks passed

codeclimate All good!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@matthewd

This comment has been minimized.

Show comment
Hide comment
@matthewd

matthewd Aug 23, 2018

Member

Sorry I left this sitting for ages, and thanks again!

❤️ 💚 💙 💛 💜

Member

matthewd commented Aug 23, 2018

Sorry I left this sitting for ages, and thanks again!

❤️ 💚 💙 💛 💜

@jeremy

This comment has been minimized.

Show comment
Hide comment
@jeremy

jeremy Aug 23, 2018

Member
Member

jeremy commented Aug 23, 2018

@kbrock

This comment has been minimized.

Show comment
Hide comment
@kbrock

kbrock Aug 23, 2018

Contributor

Thanks so much Matthew and Eugene and all others who took a stab at this and helped it go forward

Contributor

kbrock commented Aug 23, 2018

Thanks so much Matthew and Eugene and all others who took a stab at this and helped it go forward

@Nondv

This comment has been minimized.

Show comment
Hide comment
@Nondv

Nondv Aug 23, 2018

Wow. I was just wondering about omiting empty transactions and found out that it was merged several hours ago.

Thank you, rails community. You are awesome <3

Nondv commented Aug 23, 2018

Wow. I was just wondering about omiting empty transactions and found out that it was merged several hours ago.

Thank you, rails community. You are awesome <3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment