
Prefer a simple counter-based lock. #47257

Closed
wants to merge 2 commits into from

Conversation

@ioquatix
Contributor

ioquatix commented Feb 5, 2023

This allows acquire_read_lock and release_read_lock to be called from different fibers/threads (which can be the case for falcon).

See 9a4c1e2 which introduced the current implementation.

Motivation / Background

This is causing problems as outlined in detail here: socketry/falcon#166
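
For illustration only (this is not the code in this PR), a counter-based read/write lock might look roughly like the following: the state is plain counters guarded by one monitor, so acquiring and releasing are not tied to any particular thread or fiber.

```ruby
require "monitor"

# Illustrative counter-based lock (not the PR's implementation): because the
# state is just counters behind a Monitor, acquire_read_lock and
# release_read_lock may be called from different threads or fibers.
class CounterLock
  def initialize
    @monitor = Monitor.new
    @cond    = @monitor.new_cond
    @readers = 0
    @writing = false
  end

  def acquire_read_lock
    @monitor.synchronize do
      @cond.wait_while { @writing }
      @readers += 1
    end
  end

  def release_read_lock
    @monitor.synchronize do
      @readers -= 1
      @cond.broadcast if @readers.zero?
    end
  end

  def with_write_lock
    @monitor.synchronize do
      @cond.wait_while { @writing || @readers > 0 }
      @writing = true
    end
    begin
      yield
    ensure
      @monitor.synchronize do
        @writing = false
        @cond.broadcast
      end
    end
  end
end
```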

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.

@@ -2,9 +2,61 @@

module ActionView
  class CacheExpiry
    class ExecutionLock
Contributor

Wouldn't it be possible to contribute this kind of lock to concurrent-ruby itself? 🤔

Contributor Author

I don't have a strong opinion about it, but I don't think it's general enough to be worth upstreaming to concurrent-ruby. That being said, we could ask? It's quite specific to the use case of the cache layer.

This allows `acquire_read_lock` and `release_read_lock` to be called from
different fibers/threads (which can be the case for falcon).
@ioquatix
Contributor Author

ioquatix commented Feb 6, 2023

This PR specifically includes tests for the faulty behaviour in the linked issue, so it definitely fixes that specific issue and can prevent regressions. Additionally, I wrote a few more tests to confirm the correct behaviour of the execution lock.

@ioquatix requested a review from simi February 6, 2023 03:02

execution_lock.with_write_lock do
  assert_equal 1, execution_lock.write_count
  assert_equal 1, execution_lock.read_count
Contributor

Shouldn't the read_count always be 0 inside the with_write_lock block?

Contributor Author

No, because you can upgrade a read lock to a write lock (the reentrant part of the original design).

Contributor

Right, holding the write lock is considered holding both, and here that's tracked explicitly in both counts.
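
For reference, concurrent-ruby's Concurrent::ReentrantReadWriteLock allows exactly this kind of upgrade (shown purely as an illustration of the pattern, not as the lock used in this PR):

```ruby
require "concurrent"

lock = Concurrent::ReentrantReadWriteLock.new

lock.with_read_lock do
  # Upgrade: a thread already holding a read lock may acquire the write lock.
  lock.with_write_lock do
    # exclusive section; the read lock is still held around it
  end
end
```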

end

# Wait for both the threads to be waiting for the lock:
Thread.pass until reader.status == "sleep" && writer.status == "sleep"
Contributor

Why would the reader ever wait/sleep? It can acquire the read lock straight away and run the rest of its body immediately, no?

Contributor Author

Ah, you are right, I should ensure the write lock has been acquired first.
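
One way to enforce that ordering (a sketch only, not necessarily what the follow-up commit does; `execution_lock` is the lock instance from the test above, and the queue is just for signalling):

```ruby
signal = Queue.new

writer = Thread.new do
  execution_lock.with_write_lock { signal.pop }   # hold the write lock until signalled
end

# Start the reader only once the writer actually owns the lock:
Thread.pass until execution_lock.write_count == 1

reader = Thread.new do
  execution_lock.acquire_read_lock                # guaranteed to block behind the writer
  execution_lock.release_read_lock
end

Thread.pass until reader.status == "sleep"        # now the reader is the one waiting
signal << :done
[writer, reader].each(&:join)
```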

@eregon
Contributor

eregon commented Feb 6, 2023

Which test would fail with the previous lock and falcon? test_invoke_cache_expiry_on_different_fibers?

@matthewd
Member

matthewd commented Feb 6, 2023

The lack of safety checks here makes me nervous (e.g. release_read_lock when no such lock is held). While a basic "make sure it can't go negative" would be a slight improvement, it seems generally bad to not ensure we are releasing the same locks we're acquiring.

This also feels more like a symptom of a more general possible concern. Is there something we should be doing to ensure both calls occur in the same execution context? Beyond any locks like this, it seems like it would also break thread/fiber variables, and thus IsolatedExecutionState and all of its dependents.
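
As a concrete example of the thread/fiber variable problem (Ruby's Thread#[] storage is fiber-local, and anything keyed on Fiber.current behaves the same way):

```ruby
# A value set while running on one fiber is invisible once execution moves to
# another fiber, which is why per-request state must stay on a single fiber.
Fiber.new do
  Thread.current[:request_context] = "set during app.call"
end.resume

Thread.current[:request_context]  # => nil back on the original fiber
```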

@eregon
Contributor

eregon commented Feb 6, 2023

Indeed, I also wonder about the need to change this.
Maybe ActionView::CacheExpiry::Executor's run and complete should be a single method taking a block, to ensure the complete part is done on the same Fiber?
What's the need to call them on different Fibers/Threads?
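
Roughly, the suggested shape would be something like this (hypothetical names, not an existing Rails API):

```ruby
class CacheExpiryExecutor
  def run
    # stand-in for the real before-request work (e.g. acquiring the read lock)
  end

  def complete
    # stand-in for the real after-request work (e.g. releasing the read lock)
  end

  # A single block-taking entry point guarantees run and complete happen on
  # the same Fiber, whatever the server does with the response afterwards.
  def wrap
    run
    begin
      yield
    ensure
      complete
    end
  end
end

CacheExpiryExecutor.new.wrap { "handle the request here" }
```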

@ioquatix
Contributor Author

ioquatix commented Feb 6, 2023

> The lack of safety checks here makes me nervous (e.g. release_read_lock when no such lock is held). While a basic "make sure it can't go negative" would be a slight improvement, it seems generally bad to not ensure we are releasing the same locks we're acquiring.

I agree, but this is the design of Rails' Executor mechanism (which I assume we would prefer not to change). In addition, this code path is only used in development environments, so the risk is low.

> This also feels more like a symptom of a more general possible concern. Is there something we should be doing to ensure both calls occur in the same execution context? Beyond any locks like this, it seems like it would also break thread/fiber variables, and thus IsolatedExecutionState and all of its dependents.

Rack makes no guarantee that app.call(env) and response.close will occur on the same thread or fiber. Falcon may choose to stream the response body in a separate fiber if it makes sense. I don't know if this breaks other assumptions in Rails, but all the tests are passing, and multiple things break (in the Rails test suite) if you don't get this behaviour right. We've also had positive reports from users, and my own testing confirms this fix.

> Maybe ActionView::CacheExpiry::Executor's run and complete should be a single method taking a block, to ensure the complete part is done on the same Fiber?

It's impossible to change this due to the way it's used and designed.

https://github.com/rails/rails/blob/4208cf2234988969a485a21233e932a8cb120ccc/actionpack/lib/action_dispatch/middleware/executor.rb
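
Paraphrasing the pattern in that middleware (a simplified sketch, not the exact Rails code; `run!`/`complete!` stand in for the executor callbacks): the executor is run when the request starts, but completion is deferred until the Rack body is closed, which a server may do from another thread or fiber.

```ruby
require "rack/body_proxy"

class ExecutorMiddleware
  def initialize(app, executor)
    @app = app
    @executor = executor
  end

  def call(env)
    state = @executor.run!                       # acquired at the start of the request
    status, headers, body = @app.call(env)
    # Released only when the body is closed, possibly on another thread/fiber:
    [status, headers, Rack::BodyProxy.new(body) { state.complete! }]
  end
end
```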

Personally, I don't think the executor model should be taking long-lived locks (it's a bad design IMHO), but I also understand what it's trying to achieve.

The long-lived lock can cause problems in many ways... designing the caching layer to depend on it seems like a mistake to me. Rather than reloading before each request, why not simply skip caching when it isn't requested, for example? There is no other use case like this in Rails, so if we could change this behaviour it would be vastly simpler.

> What's the need to call them on different Fibers/Threads?

Sometimes async-http needs to move a response body into a separate fiber, e.g. to turn a response body which only responds to #each into one that is streaming, a little bit similar to how Enumerator sometimes creates an internal Fiber.
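
For comparison, the Enumerator trick looks like this (illustrative snippet, not async-http code): an #each-only producer becomes pull-based because Enumerator#next suspends it on an internal Fiber.

```ruby
each_only_body = ["<p>chunk one</p>", "<p>chunk two</p>"]

enum = Enumerator.new do |yielder|
  each_only_body.each { |chunk| yielder << chunk }
end

enum.next  # => "<p>chunk one</p>"  (producer is suspended on an internal Fiber)
enum.next  # => "<p>chunk two</p>"
```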

- Ensure test writer thread is waiting before reader.
@jhawthorn
Member

jhawthorn commented Feb 8, 2023

I think we should not do this. This reloader really should not have its own lock in the first place (it's not different enough from the other things we need to reload). I'll open a PR shortly to move us to reusing the same reloader infrastructure as elsewhere.

It's possible that will work around the immediate error, but I agree with the concerns about other issues. In general, I think everything expects the request to remain on the same thread/fiber (DB connection, CurrentAttributes, much more); it may just be that this is the most obvious way in which it fails.

@ioquatix
Contributor Author

ioquatix commented Feb 9, 2023

Thanks for the review. I'll close this PR. Can you please link me to your proposed fix? Thanks!

@ioquatix closed this Feb 9, 2023
@eregon
Contributor

eregon commented Feb 9, 2023

> Rack makes no guarantee that app.call(env) and response.close will occur on the same thread or fiber. Falcon may choose to stream the response body in a separate fiber if it makes sense.

I think it may be more acceptable to write/stream the response in a separate fiber, but in terms of structured concurrency and safe handling of resources, it seems important/valuable that whatever "opens" the resource also closes it, on the same Fiber.
So you could have: F1 opens, F2 streams, F3 streams, F1 closes.
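
A rough sketch of that ownership rule, using plain Fibers as a stand-in (names purely illustrative):

```ruby
require "stringio"

io = StringIO.new("chunk one\nchunk two\n")   # F1 "opens" the resource

streamer = Fiber.new do                       # F2 only streams; it never owns the resource
  io.each_line { |line| $stdout.write(line) }
end

begin
  streamer.resume                             # F2 does the streaming
ensure
  io.close                                    # F1, the opener, closes what it opened
end
```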

@ioquatix
Contributor Author

> I think it may be more acceptable to write/stream the response in a separate fiber, but in terms of structured concurrency and safe handling of resources, it seems important/valuable that whatever "opens" the resource also closes it, on the same Fiber.
> So you could have: F1 opens, F2 streams, F3 streams, F1 closes.

Can you elaborate on why that's important?

In my case, passing the stream between different clients and servers (in the case of a proxy etc) is fairly useful. The server that's writing the response may not be the same as the one that generated it, etc.

@matthewd
Member

> Can you elaborate on why that's important?

That's this bit:

> it would also break thread/fiber variables, and thus IsolatedExecutionState and all of its dependents

@ioquatix deleted the execution-lock branch February 10, 2023 08:15
@eregon
Contributor

eregon commented Feb 10, 2023

Yeah, and I also meant it breaks structured concurrency; i.e., it's unstructured concurrency if the "parent" Fiber which "opens" the resource doesn't close it, and if the Fibers inside don't finish before that parent Fiber.
