Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decouple transactional fixtures and active connections #50999

Merged
merged 1 commit into from
Feb 12, 2024

Conversation

casperisfine
Copy link
Contributor

@casperisfine casperisfine commented Feb 7, 2024

Ref: #50793

Transactional fixtures are currently tightly coupled with the pool active connection. It assumes calling pool.connection will memoize the checked out connection and leverage that to start a transaction on it and ensure all subsequent accesses will get the same connection.

To allow to remove checkout caching (or make it optional), we first must decouple transactional fixtures to not rely on it.

The idea is to behave similarly, but store the connection in the pool as a special "pinned" connection, and not as the regular active connection.

This allows to always return the same pinned connection, but without necessarily assigning it as the active connection.

Additionally, this pinning impact all threads and fibers, so that all threads have a consistent view of the database state.

FYI @matthewd

@casperisfine casperisfine force-pushed the refactor-transactional-fixtures branch 7 times, most recently from c52280f to 58a01b8 Compare February 7, 2024 15:22
Copy link
Member

@matthewd matthewd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the conceptual inversion here... pool.lock_thread = true always felt like it made the approach sound hackier than it is, even if imperfect.

connection = checkout_and_verify(acquire_connection(checkout_timeout))
connection.lock_thread = @lock_thread
connection
return @pinned_connection if @pinned_connection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have mixed feelings on a pin overriding an explicit checkout 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I didn't have it initially, I can try removing it, but I think it was necessary to pass some tests (I've been fighting with some state leaks in test that predate my change but are made more apparent now).

But overall this "pin" feature is really meant as a private API for transactional fixtures only. I'm even tempted to look at instead replacing the connection pool by a different implementation when you use transaction fixtures.

But it's already the third major refactoring I'm doing to get rid of the checkout cache, so I'm not super keen on going deeper in the rabbit hole...

def unpin_connection! # :nodoc:
raise "There isn't a pinned connection" unless @pinned_connection

yield @pinned_connection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This yield feels a bit weird... it's strictly safer than callers assuming they hold a pin, doing things, then hitting the raise... but maybe that's okay / their problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not too sure what you mean. The raise is mostly here for me to find issues with the change (and it did find some). Since this is an internal only API it's really meant as some form of assertion. If you unpin twice, something weird likely happened.

@casperisfine casperisfine force-pushed the refactor-transactional-fixtures branch from 58a01b8 to 27a7f5f Compare February 7, 2024 18:52
@casperisfine
Copy link
Contributor Author

For tomorrow:

  • The Active Record suite still suffer from a handful of test leaks
  • test/application/test_runner_test.rb seem to consistently get stuck on CI.

@casperisfine
Copy link
Contributor Author

Alright, I think I got rid of the leaks by detecting when the transaction was committed and properly reseting state when that happen. This is likely a very good thing even for apps using this, it just need to be cleaned up.

The railties test/application/test_runner_test.rb test still appear to get stuck on CI, so now digging into that.

@casperisfine
Copy link
Contributor Author

Alright, the stuck test is: ApplicationTests::TestRunnerTest#test_run_in_parallel_with_threads

What is happening is that it sets up: parallelize(workers: 2, with: :threads) and run:

          require "test_helper"

          class ParallelTest < ActiveSupport::TestCase
            Q1 = Queue.new
            Q2 = Queue.new
            test "one" do
              assert_equal "x", Q1.pop # blocks until two runs

              Q2 << "y"
            end

            test "two" do
              Q1 << "x"

              assert_equal "y", Q2.pop # blocks until one runs
            end
          end

Which dead locks because only one thread can reasonably have transactional fixtures enabled (and realistically do DB operations on the same DB).

So the test was passing before, because the "pinned connection" (the active one really) was the only pinned one. Now we essentially lock the whole pool.

The doc states:

     # The default parallelization method is to fork processes. If you'd like to
     # use threads instead you can pass <tt>with: :threads</tt> to the +parallelize+
     # method. Note the threaded parallelization does not create multiple
     # databases and will not work with system tests.
     #
     #   parallelize(workers: :number_of_processors, with: :threads)
     #

So not quite sure what to do here, I'll think about it. Either way I think there a bunch of things I can cleanup and extract from this PR, starting with the leak handling etc.

casperisfine pushed a commit to Shopify/rails that referenced this pull request Feb 8, 2024
Extracted from: rails#50999

Some tests may use the connections in ways that cause the fixtures
transaction to be committed or rolled back. The typical case being
doing schema change query in MySQL, which automatically commits the
transaction. But ther eare more subtle cases.

The general idea here is to ensure our transaction is correctly
rolling back during teardown. If it fails, then we assume something
might have mutated some of the inserted fixtures, so we invalidate
the cache to ensure the next test will reset them.

This issue is particularly common in Active Record's own test suite
since transaction fixtures are enabled by default but we have
many tests create tables and such.

We could treat this case as an error, but since we can gracefully
recover from it, I don't think it's worth it.
casperisfine pushed a commit to Shopify/rails that referenced this pull request Feb 8, 2024
Extracted from: rails#50999

Some tests may use the connections in ways that cause the fixtures
transaction to be committed or rolled back. The typical case being
doing schema change query in MySQL, which automatically commits the
transaction. But ther eare more subtle cases.

The general idea here is to ensure our transaction is correctly
rolling back during teardown. If it fails, then we assume something
might have mutated some of the inserted fixtures, so we invalidate
the cache to ensure the next test will reset them.

This issue is particularly common in Active Record's own test suite
since transaction fixtures are enabled by default but we have
many tests create tables and such.

We could treat this case as an error, but since we can gracefully
recover from it, I don't think it's worth it.
casperisfine pushed a commit to Shopify/rails that referenced this pull request Feb 8, 2024
Extracted from: rails#50999

Some tests may use the connections in ways that cause the fixtures
transaction to be committed or rolled back. The typical case being
doing schema change query in MySQL, which automatically commits the
transaction. But ther eare more subtle cases.

The general idea here is to ensure our transaction is correctly
rolling back during teardown. If it fails, then we assume something
might have mutated some of the inserted fixtures, so we invalidate
the cache to ensure the next test will reset them.

This issue is particularly common in Active Record's own test suite
since transaction fixtures are enabled by default but we have
many tests create tables and such.

We could treat this case as an error, but since we can gracefully
recover from it, I don't think it's worth it.
casperisfine pushed a commit to Shopify/rails that referenced this pull request Feb 8, 2024
Extracted from: rails#50999

Some tests may use the connections in ways that cause the fixtures
transaction to be committed or rolled back. The typical case being
doing schema change query in MySQL, which automatically commits the
transaction. But ther eare more subtle cases.

The general idea here is to ensure our transaction is correctly
rolling back during teardown. If it fails, then we assume something
might have mutated some of the inserted fixtures, so we invalidate
the cache to ensure the next test will reset them.

This issue is particularly common in Active Record's own test suite
since transaction fixtures are enabled by default but we have
many tests create tables and such.

We could treat this case as an error, but since we can gracefully
recover from it, I don't think it's worth it.
@casperisfine casperisfine force-pushed the refactor-transactional-fixtures branch 2 times, most recently from 89be1c7 to b6cd3a2 Compare February 8, 2024 11:25
@rails-bot rails-bot bot added the railties label Feb 8, 2024
@casperisfine casperisfine force-pushed the refactor-transactional-fixtures branch 3 times, most recently from ca426d4 to b8c1bd0 Compare February 8, 2024 12:29
@casperisfine casperisfine force-pushed the refactor-transactional-fixtures branch 3 times, most recently from 2c503fc to b34ba9f Compare February 8, 2024 16:11
@casperisfine
Copy link
Contributor Author

Alright, I finally figured the last flaky test, mostly need to cleanup now, and add some coverage.

@casperisfine casperisfine deleted the refactor-transactional-fixtures branch February 12, 2024 08:54
Ridhwana pushed a commit to Ridhwana/rails that referenced this pull request Mar 7, 2024
Extracted from: rails#50999

Some tests may use the connections in ways that cause the fixtures
transaction to be committed or rolled back. The typical case being
doing schema change query in MySQL, which automatically commits the
transaction. But ther eare more subtle cases.

The general idea here is to ensure our transaction is correctly
rolling back during teardown. If it fails, then we assume something
might have mutated some of the inserted fixtures, so we invalidate
the cache to ensure the next test will reset them.

This issue is particularly common in Active Record's own test suite
since transaction fixtures are enabled by default but we have
many tests create tables and such.

We could treat this case as an error, but since we can gracefully
recover from it, I don't think it's worth it.
Ridhwana pushed a commit to Ridhwana/rails that referenced this pull request Mar 7, 2024
Extracted from: rails#50999

- Make fixtures setup and teardown methods private.
- Don't run adapter thread safety tests with sqlite3_mem
- Make foreign_key_tests more resilient to leaked state
- Use `exit!` in fork to avoid `at_exit` side effects.
- Disable transactional fixtures in tests that do a lot of low level
  assertions on connections or connection pools.
viralpraxis pushed a commit to viralpraxis/rails that referenced this pull request Mar 24, 2024
Extracted from: rails#50999

Some tests may use the connections in ways that cause the fixtures
transaction to be committed or rolled back. The typical case being
doing schema change query in MySQL, which automatically commits the
transaction. But ther eare more subtle cases.

The general idea here is to ensure our transaction is correctly
rolling back during teardown. If it fails, then we assume something
might have mutated some of the inserted fixtures, so we invalidate
the cache to ensure the next test will reset them.

This issue is particularly common in Active Record's own test suite
since transaction fixtures are enabled by default but we have
many tests create tables and such.

We could treat this case as an error, but since we can gracefully
recover from it, I don't think it's worth it.
viralpraxis pushed a commit to viralpraxis/rails that referenced this pull request Mar 24, 2024
Extracted from: rails#50999

- Make fixtures setup and teardown methods private.
- Don't run adapter thread safety tests with sqlite3_mem
- Make foreign_key_tests more resilient to leaked state
- Use `exit!` in fork to avoid `at_exit` side effects.
- Disable transactional fixtures in tests that do a lot of low level
  assertions on connections or connection pools.
fractaledmind pushed a commit to fractaledmind/rails that referenced this pull request May 13, 2024
Extracted from: rails#50999

Some tests may use the connections in ways that cause the fixtures
transaction to be committed or rolled back. The typical case being
doing schema change query in MySQL, which automatically commits the
transaction. But ther eare more subtle cases.

The general idea here is to ensure our transaction is correctly
rolling back during teardown. If it fails, then we assume something
might have mutated some of the inserted fixtures, so we invalidate
the cache to ensure the next test will reset them.

This issue is particularly common in Active Record's own test suite
since transaction fixtures are enabled by default but we have
many tests create tables and such.

We could treat this case as an error, but since we can gracefully
recover from it, I don't think it's worth it.
fractaledmind pushed a commit to fractaledmind/rails that referenced this pull request May 13, 2024
Extracted from: rails#50999

- Make fixtures setup and teardown methods private.
- Don't run adapter thread safety tests with sqlite3_mem
- Make foreign_key_tests more resilient to leaked state
- Use `exit!` in fork to avoid `at_exit` side effects.
- Disable transactional fixtures in tests that do a lot of low level
  assertions on connections or connection pools.
@palkan
Copy link
Contributor

palkan commented Jun 5, 2024

@casperisfine Hey! I'm super late to the party, but somehow I missed this change :(

(Don't want to open an issue/PR without discussing first, so...)

Context: this change breaks TestProf's before_all (used by tons of apps including, e.g., Discourse) in a way that it would be hard and unpleasant to fix without a decent amount of monkey-patching. Sure, I can do that but maybe it's not too late tweak the implementation.

What are the concerns?

First, it's a bit surprising (to me) that we hid begin_transaction in the #pin_connection!. Previously, test_fixtures.rb was directly responsible for that, and now it's implicit (yeah, it's an internal feature, no one is expected to read through it, but anyway). So, there is no way to pin connection outside of the setup phase and let test_fixtures do its trick (like we do now).

Although pinning is only used in test_fixtures.rb now, I'm not sure that making it transactional by default (and without an option to avoid transactions) would make senses in other scenarios (though I can't imagine one).

My proposal is to move begin_transaction back to test_fixtures.rb and make #pin_connection! responsible only for... well, pinning (as they state). Rolling back transaction in the #unpin_connection! should stay, of course; but I would add explicit #rollback_transaction to test_fixtures.rb, too.

Secondly, I would propose making pinning idempotent, i.e., calling #pin_transaction! twice shouldn't raise—why do we raise at all? (Probably, due the transaction management, so, see above). Same for #unpin_transaction!. Since #pin_transaction! also accepts the lock_thread argument, we may consider raising if @pinned_connection.lock_thread != lock_thread—that makes sense.

I'm happy to work on this changes if there is a green light from the team.

@byroot
Copy link
Member

byroot commented Jun 5, 2024

Although pinning is only used in test_fixtures.rb now, I'm not sure that making it transactional by default (and without an option to avoid transactions) would make senses in other scenarios

If you don't need a transaction, you don't need pining at all, or am I missing something?

@palkan
Copy link
Contributor

palkan commented Jun 5, 2024

If you don't need a transaction, you don't need pining at all, or am I missing something?

Yeah, at least in the known scenarios (test_fixtures.rb, our before_all).

The problem is with nested transactions. Prior to this change, pinning management and transactions management were independent, so we could pin connections and start transactions in any order, and, which is more important, double-pinning wasn't a problem.

We had:

begin # test_fixtures
pin  # test_fixtures
run test
rollback  # test_fixtures
unpin  # test_fixtures

With before_all (context-wide transaction):

begin # before_all
pin  # before_all

savepoint # test_fixtures
pin  # test_fixtures — this pin is no-op since the connection is already pinned, and that's fine—that's exactly what we need
run test
release  # test_fixtures
unpin  # test_fixtures

savepoint # test_fixtures
pin  # test_fixtures
run test 2
release  # test_fixtures
unpin  # test_fixtures

rollback  # before_all

Now, we cannot simply #pin_connection! in the wrapping context. That would break test_fixtures. Not only because it would raise an exception ("already pinned"), we can replace raise with return; but then we need to bring back the begin_transaction somehow.

@byroot
Copy link
Member

byroot commented Jun 5, 2024

I'm sorry, I still don't understand why TestProf would need to concern itself with connection pinning. I'll try to read the code tomorrow.

@byroot
Copy link
Member

byroot commented Jun 5, 2024

      # Make sure ActiveRecord uses locked thread.
      # It only gets locked in `before` / `setup` hook,
      # thus using thread in `before_all` (e.g. ActiveJob async adapter)
      # might lead to leaking connections

Is this the reason?

@palkan
Copy link
Contributor

palkan commented Jun 5, 2024

Thanks!

I'm trying to recall why we added pinning in the first place. Here is the commit (ActiveJob async?) where we first introduced it and here is the corresponding issue (though not too many details, just some "threads vs before_all").

@palkan
Copy link
Contributor

palkan commented Jun 5, 2024

(Oops, I was wiring my long response...).

Yeah, it all goes down to threads. And of the potential causes of threads involved into the setup phase is AJ async (via AR, callbacks, and all that typical Rails stuff).

@byroot
Copy link
Member

byroot commented Jun 6, 2024

I forgot to ask for some repro steps so I could see if there was a better solution in test-prof -_-.

I found the test case that does Active Record queries from a background thread in before_all, removed the lock_thread assignment, and it straight out can't find the users table, I presume because by default it uses a in-memory SQLite. If I use DB=sqlite-file it's not failing though.

So I'm wondering if what you need isn't just a way to trigger db setup early?

@matthewd
Copy link
Member

matthewd commented Jun 6, 2024

My gut feeling is that you basically want to lift setup_transaction_fixtures to much earlier and then swap it to only create a nested transaction when called later (+ corresponding for teardown) -- that is, you want the pin, but you also want the pool config mangling and the late-connection subscriber.


If double-pinning was the only concern, I think you could amend your previous timeline by inserting an unpin:

begin  # before_all
pin    # before_all
(user-supplied before_all hook)
unpin  # before_all

savepoint # test_fixtures
pin       # test_fixtures
...

That doesn't feel great to me, conceptually, but given the fact unpins already occur between the rest of the tests after the first one, it doesn't seem newly bad.

(From a quick re-read of unpin_connection!, I think this wouldn't work today because of our checkin call... but I think that's just a bug: we should only be doing that if [after our rollback] there is not still a transaction open, or perhaps only if it was a fresh checkout in pin_connection.)


Getting beyond what's practical for 7.2 right now, I do wonder if there's a worthwhile object to be teased out here, separating the shared-connection management from the actual fixture population/retrieval. It seems like it would be pretty useful for you to have an easy place to patch in to the subscriber, for example, to avoid your "need to ensure the connection classes are loaded" instruction.

@palkan
Copy link
Contributor

palkan commented Jun 6, 2024

I presume because by default it uses a in-memory SQLite. If I use DB=sqlite-file it's not failing though.

Good catch. I've updated the test to fail and better demonstrate the problem with no lock_thread = true: test-prof/test-prof@58f38d0

@byroot
Copy link
Member

byroot commented Jun 6, 2024

I've updated the test to fail and better demonstrate the problem with no lock_thread = true

Right, so I think this better show that what you are really after it to trigger fixtures insertion / connection pinning earlier. So perhaps that's what we should provide?

@palkan
Copy link
Contributor

palkan commented Jun 6, 2024

you could amend your previous timeline by inserting an unpin:

Currently, #unpin_connection! does rollback_transaction, so we can't just unpin (and already mentioned checkin).

What I can probably do is mark the connection as unpinned (like, remove the instance variable and let the #pin_connection! from test_fixtures do the trick). Maybe, that could work (though, again, it's mangling with AR internals—maintainer hell in the long term 😁).

but I think that's just a bug: we should only be doing that if [after our rollback] there is not still a transaction open, or perhaps only if it was a fresh checkout in pin_connection.

Yeah, looks like a bug if we assume that pinning can be used with outer transactions.

@palkan
Copy link
Contributor

palkan commented Jun 6, 2024

it to trigger fixtures insertion / connection pinning earlier. So perhaps that's what we should provide?

Yeah, kind of. And I still need the current transaction-wrapping done by fixtures.

P.S. Added threaded scenarios to the fixtures scenario: test-prof/test-prof@8fe54fa (currently playing with it and the current Rails main).

@matthewd
Copy link
Member

matthewd commented Jun 6, 2024

unpin_connection! does rollback_transaction

Oh right 🤦 -- ummm... start an extra layer of transaction right before you unpin? 😅

@byroot
Copy link
Member

byroot commented Jun 6, 2024

@palkan, about the initial refactoring you suggested, I don't quite pitcture it myself at this stage, but I'm not hell bent on pinning and the begin_transaction being tied together. So if you think you can come up with a refactor that makes your life easier without making the Rails side worse, I'm totally open to merge it.

As for why pin/unpin raise an exception if called twice, it's because during development I ran into issues caused by the code being called multiple times, so if you want to remove that, we'll need some tests that asserts it's behaving as expected when used "recursively".

@palkan
Copy link
Contributor

palkan commented Jun 6, 2024

As for why pin/unpin raise an exception if called twice, it's because during development I ran into issues caused by the code being called multiple times, so if you want to remove that, we'll need some tests that asserts it's behaving as expected when used "recursively".

That's what I ended up with—removing raise ... from #pin_connection!; all tests pass, so it seems to be enough: test-prof/test-prof@fd2bc8b

So if you think you can come up with a refactor that makes your life easier without making the Rails side worse, I'm totally open to merge it.

👌

@palkan
Copy link
Contributor

palkan commented Jun 6, 2024

Okay, here we go: #52037

@byroot @matthewd Thanks for your help! See your there 🙂

palkan added a commit to test-prof/test-prof that referenced this pull request Jun 8, 2024
palkan added a commit to test-prof/test-prof that referenced this pull request Jun 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants