Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instrument Active Record transactions #49192

Merged
merged 1 commit into from
Sep 12, 2023

Conversation

ipc103
Copy link
Contributor

@ipc103 ipc103 commented Sep 7, 2023

Motivation / Background

This Pull Request adds events to track when Active Record-managed transactions occur. Tracking Active Record-managed transactions seems to be a common need, but there's currently not a great way to do it. Here's a few examples we've seen:

Detail

This PR adds a new transaction.active_record event that should provide a more reliable solution for these various use cases. It includes the connection in the payload (useful, for example, in differentiating transactions to different databases),

Additional information

  • This instrumentation needs to start and finish at fairly specific times:
    • start on materialize
    • finish after committing or rolling back, but before the after_commit or after_rollback callbacks
    • finish and start again when the transaction restarts (at least for real transactions—we've done it for savepoints as well but I'm not certain we should)
    • ensure it finishes if commit and rollback fail (e.g. if the connection goes away) To make all that work, this commit uses the lower-level #build-handle API instead of #instrument.
  • If this change gets merged we're also planning to add details about what type of transaction it is (savepoint or real) and what the outcome is (commit, rollback, restarted, errored).

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

@ipc103 ipc103 force-pushed the transaction-instrumentation branch 3 times, most recently from 64a1da7 to 4a1b6fd Compare September 8, 2023 16:21
Tracking Active Record-managed transactions seems to be a common need,
but there's currently not a great way to do it. Here's a few examples
I've seen:

* GitHub has custom transaction tracking that monkey patches the Active
  Record `TransactionManager` and `RealTransaction`. We use the tracking
  to prevent opening a transaction to one database cluster inside a
  transaction to a different database cluster, and to report slow
  transactions (we get slow transaction data directly from MySQL as well,
  but it's still helpful to report from the application with backtraces to
  help track them down).
* https://github.com/palkan/isolator tracks transactions to prevent
  non-atomic interactions like external network calls inside a
  transaction. The gem works by subscribing to `sql.active_record`, then
  piecing together the transactions by looking for `BEGIN`, `COMMIT`,
  `SAVEPOINT`, etc., but this is unreliable:
    - palkan/isolator#65
    - palkan/isolator#64
* It looks like GitLab patches `TransactionManager` and `RealTransaction`
  to track nested savepoints. See palkan/isolator#46

This commit adds a new `transaction.active_record` event that should
provide a more reliable solution for these various use cases. It
includes the connection in the payload (useful, for example, in
differentiating transactions to different databases), but if this change
gets merged we're also planning to add details about what type of
transaction it is (savepoint or real) and what the outcome is (commit,
rollback, restarted, errored).

This instrumentation needs to start and finish at fairly specific times:
  - start on materialize
  - finish after committing or rolling back, but before the after_commit
    or after_rollback callbacks
  - finish and start again when the transaction restarts (at least for
    real transactions—we've done it for savepoints as well but I'm not
    certain we should)
  - ensure it finishes if commit and rollback fail (e.g. if the
    connection goes away)
To make all that work, this commit uses the lower-level `#build-handle`
API instead of `#instrument`.

Co-authored-by: Ian Candy <ipc103@github.com>
@rafaelfranca rafaelfranca merged commit 514021f into rails:main Sep 12, 2023
4 checks passed
@ipc103 ipc103 deleted the transaction-instrumentation branch September 12, 2023 13:17
composerinteralia added a commit to composerinteralia/rails that referenced this pull request Sep 13, 2023
Following up on rails#49192, this commit
adds the transaction `outcome` to the payload, helpful for collecting
stats on how many transactions commit, rollback, restart, or (perhaps
most interestingly) are incomplete because of an error.

The one quirk here is that we have to modify the payload on finish. It's
not the only place this sort of thing happens (instrument mutates the
payload with exceptions, for example), but it does mean we need to dup
the payload we initialize with to avoid mutating it for other tracking.

Co-authored-by: Ian Candy <ipc103@github.com>
@fxn
Copy link
Member

fxn commented May 29, 2024

For practical uses, a pity the transaction or some identifier or something is not available in the payload. I am pondering to add a (lazy) UUID to transactions to be able to group queries in logs.

@zzak
Copy link
Member

zzak commented May 30, 2024

I also felt weird the whole connection object was made available, but I don't have a clear reason as to why.
But I guess the transaction is not a public API. 🤔

@fxn
Copy link
Member

fxn commented May 30, 2024

Right, in 7.2 it is going to be. There is now ActiveRecord::Transaction, current_transaction, transaction calls receive a transaction object as block parameter, etc.

I have the transaction UUID in place, and will look at this event now.

@fxn
Copy link
Member

fxn commented May 30, 2024

PR passing the transaction object here #51955.

@composerinteralia
Copy link
Member

composerinteralia commented Jun 3, 2024

I also felt weird the whole connection object was made available, but I don't have a clear reason as to why.

GitHub mainly uses it to differentiate transactions by connection (e.g. using the #connection_class), which gives us a way to detect and prevent nesting transactions to different databases. It looks like the isolator gem is also using the connection for similar purposes.

FWIW sql.active_record also passes the whole connection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants