Unexpectedly and often getting Concurrent::IllegalOperationError #166
Here is the request log from Rails 7.0 and Postgres 14:

(request log not captured in this copy)

Or alternatively:

(alternative log not captured in this copy)

The errors seem to appear when requests are coming in simultaneously from the same client.

Any thoughts?

Let me know if there's anything else that I need to provide.

Thanks 🤝
Comments
I will take a look.
Hey @ioquatix, any progress on this? Is there anything I can do to improve the bug report? I would love to be able to use Falcon rather than Puma.
If you have time to make a minimal repro with just Async and concurrent-ruby, that would be awesome.
The 2nd issue reported is probably the same issue as socketry/async#157.
@ioquatix, the behavior does seem alike, but possibly not identical. I don't have the debug gem in my project, the removal of which seems to be the solution for the other issue.
My understanding is that the debug gem introduces an autoload, which can cause the described hang.
@ioquatix, that seems reasonable. I'm not sure what autoload it introduces; I would have to go through a lot of gems to detect that.
We need to fix the problem in Ruby. |
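To make the suspected mechanism concrete, here is a hypothetical sketch (my assumption of how the hang arises, not a confirmed repro from this thread): on Ruby < 3.2 the require lock taken by autoload is not fiber-aware, so if one fiber yields to the scheduler mid-require, a second fiber referencing the same constant can block the entire thread:

```ruby
#!/usr/bin/env ruby
# Hypothetical sketch of an autoload hang under a fiber scheduler (Ruby < 3.2).
require "async"

# A constant whose file takes long enough to load that two fibers can overlap.
File.write("slow_constant.rb", <<~RUBY)
  sleep 0.1 # under a fiber scheduler, this yields control mid-require
  SlowConstant = :loaded
RUBY

autoload :SlowConstant, "./slow_constant"

Async do
  2.times do
    Async do
      # Both fibers race to resolve the same autoload; on older Rubies the
      # second one can block the whole thread while the first is suspended.
      SlowConstant
    end
  end
end
```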
Just chiming in here to say that I'm experiencing this also. I was getting it a lot, and then I upgraded to Ruby 3.2 preview 2 and that seemed to eliminate the problem. But then I started working on a page that deals with a lot of ActiveStorage previews, and my problems came back. It may be relevant that I'm using DiskStorage with ActiveStorage; I haven't noticed the hanging in production, which uses S3 for storage. It's not consistent, but I usually get 2-3 page loads before it hangs and I have to kill the server and restart it. Here's my error output from falcon:

```json
{
"time":"2022-10-16T17:30:47-05:00",
"severity":"error",
"class":"Protocol::Rack::Adapter::Rack2",
"oid":30660,
"pid":29941,
"subject":"#<Protocol::Rack::Adapter::Rack2:0x000000010682aa58>",
"message":"Cannot release a read lock which is not held",
"error":{
"kind":"Concurrent::IllegalOperationError",
"message":"Cannot release a read lock which is not held",
"stack":"/Users/will/.gem/ruby/3.2.0/gems/concurrent-ruby-1.1.10/lib/concurrent-ruby/concurrent/atomic/reentrant_read_write_lock.rb:244:in `release_read_lock'\n/Users/will/.gem/ruby/3.2.0/gems/actionview-7.0.3/lib/action_view/cache_expiry.rb:21:in `complete'\n/Users/will/.gem/ruby/3.2.0/gems/activesupport-7.0.3/lib/active_support/execution_wrapper.rb:37:in `before'\n/Users/will/.gem/ruby/3.2.0/gems/activesupport-7.0.3/lib/active_support/callbacks.rb:423:in `block in make_lambda'\n/Users/will/.gem/ruby/3.2.0/gems/activesupport-7.0.3/lib/active_support/callbacks.rb:199:in `block (2 levels) in halting'\n/Users/will/.gem/ruby/3.2.0/gems/activesupport-7.0.3/lib/active_support/callbacks.rb:687:in `block (2 levels) in default_terminator'\n/Users/will/.gem/ruby/3.2.0/gems/activesupport-7.0.3/lib/active_support/callbacks.rb:686:in `catch'\n/Users/will/.gem/ruby/3.2.0/gems/activesupport-7.0.3/lib/active_support/callbacks.rb:686:in `block in default_terminator'\n/Users/will/.gem/ruby/3.2.0/gems/activesupport-7.0.3/lib/active_support/callbacks.rb:200:in `block in halting'\n/Users/will/.gem/ruby/3.2.0/gems/activesupport-7.0.3/lib/active_support/callbacks.rb:595:in `block in invoke_before'\n/Users/will/.gem/ruby/3.2.0/gems/activesupport-7.0.3/lib/active_support/callbacks.rb:595:in `each'\n/Users/will/.gem/ruby/3.2.0/gems/activesupport-7.0.3/lib/active_support/callbacks.rb:595:in `invoke_before'\n/Users/will/.gem/ruby/3.2.0/gems/activesupport-7.0.3/lib/active_support/callbacks.rb:106:in `run_callbacks'\n/Users/will/.gem/ruby/3.2.0/gems/activesupport-7.0.3/lib/active_support/execution_wrapper.rb:143:in `complete'\n/Users/will/.gem/ruby/3.2.0/gems/activesupport-7.0.3/lib/active_support/execution_wrapper.rb:137:in `complete!'\n/Users/will/.gem/ruby/3.2.0/gems/actionpack-7.0.3/lib/action_dispatch/middleware/executor.rb:15:in `block in call'\n/Users/will/.gem/ruby/3.2.0/gems/rack-2.2.4/lib/rack/body_proxy.rb:29:in `close'\n/Users/will/.gem/ruby/3.2.0/gems/protocol-rack-0.1.6/lib/protocol/rack/body.rb:45:in `block in wrap'\n<internal:kernel>:90:in `tap'\n/Users/will/.gem/ruby/3.2.0/gems/protocol-rack-0.1.6/lib/protocol/rack/body.rb:44:in `wrap'\n/Users/will/.gem/ruby/3.2.0/gems/protocol-rack-0.1.6/lib/protocol/rack/response.rb:66:in `wrap'\n/Users/will/.gem/ruby/3.2.0/gems/protocol-rack-0.1.6/lib/protocol/rack/adapter/generic.rb:108:in `call'\n/Users/will/.gem/ruby/3.2.0/gems/protocol-http-0.23.10/lib/protocol/http/middleware.rb:50:in `call'\n/Users/will/.gem/ruby/3.2.0/gems/protocol-rack-0.1.6/lib/protocol/rack/rewindable.rb:75:in `call'\n/Users/will/.gem/ruby/3.2.0/gems/protocol-http-0.23.10/lib/protocol/http/middleware.rb:50:in `call'\n/Users/will/.gem/ruby/3.2.0/gems/async-http-0.59.2/lib/async/http/server.rb:67:in `block in accept'\n/Users/will/.gem/ruby/3.2.0/gems/async-http-0.59.2/lib/async/http/protocol/http1/server.rb:62:in `each'\n/Users/will/.gem/ruby/3.2.0/gems/async-http-0.59.2/lib/async/http/server.rb:56:in `accept'\n/Users/will/.gem/ruby/3.2.0/gems/async-io-1.34.0/lib/async/io/server.rb:32:in `block in accept_each'\n/Users/will/.gem/ruby/3.2.0/gems/async-io-1.34.0/lib/async/io/socket.rb:73:in `block in accept'\n/Users/will/.gem/ruby/3.2.0/gems/async-2.0.3/lib/async/task.rb:255:in `block in schedule'\n"
}
}
```

If there's anything else I can do to help shed some light on this, I'm game.
Thanks, your report was really useful.
So, I played around with this and could reproduce the same error/failure: ruby-concurrency/concurrent-ruby#962 (comment) What we need to know is whether the Rails code does the same thing, i.e. whether it executes the acquire and release on different threads. If anyone can reproduce the issue, can you monkeypatch the Rails file:

```ruby
module ActionView
  class CacheExpiry
    class Executor
      def initialize(watcher:)
        @execution_lock = Concurrent::ReentrantReadWriteLock.new
        @cache_expiry = ViewModificationWatcher.new(watcher: watcher) do
          clear_cache
        end
      end

      def run
        ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
          @cache_expiry.execute_if_updated
          Console.logger.info(self, "Acquire read lock", thread: Thread.current, backtrace: Thread.current.backtrace)
          @execution_lock.acquire_read_lock
        end
      end

      def complete(_)
        Console.logger.info(self, "Release read lock", thread: Thread.current, backtrace: Thread.current.backtrace)
        @execution_lock.release_read_lock
      end

      private

      def clear_cache
        Console.logger.info(self, "With write lock", thread: Thread.current, backtrace: Thread.current.backtrace)
        @execution_lock.with_write_lock do
          ActionView::LookupContext::DetailsKey.clear
        end
      end
    end
  end
end
```
@ekampp I re-read your initial issue description, and it turns out there are two issues: the first one, which was recently reproduced by @willcosgrove, and the second one, which is a bug in Ruby autoload. The autoload bug can be worked around by forcing Rails to load everything at boot, and is solved by Ruby 3.2.
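Assuming "forcing Rails to load everything at boot" refers to eager loading, the workaround would look like:

```ruby
# config/environments/development.rb
Rails.application.configure do
  # Assumption: "load everything at boot" means eager-loading the whole app,
  # so no autoload fires while requests are being served.
  config.eager_load = true
end
```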
After looking at the backtrace for a while and reading the Executor documentation, I think I see part of the problem. The assumption is that the request is generated on the same thread as the response.
In order for this to have failed, somehow the release must have run on a different thread than the acquire.
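That hypothesis is easy to check in isolation; here is a minimal sketch that produces the exact error from the report (`Thread#join` re-raises the worker's exception in the caller):

```ruby
require "concurrent"

lock = Concurrent::ReentrantReadWriteLock.new
lock.acquire_read_lock # the read lock is held by the main thread

# A different thread that never acquired the lock tries to release it, raising:
#   Concurrent::IllegalOperationError: Cannot release a read lock which is not held
Thread.new { lock.release_read_lock }.join
```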
Okay, looking at the code for CacheExpiry, there looks to be a background thread polling the watcher. Following this, I could make a synthetic repro:

```ruby
#!/usr/bin/env ruby

require 'concurrent'
require 'async'

lock = Concurrent::ReentrantReadWriteLock.new

Q = 0.01

watcher = Thread.new do
  loop do
    lock.with_write_lock do
      sleep Q
    end
    sleep Q
  end
end

Async do |task|
  2.times do |i|
    task.async do
      loop do
        lock.with_read_lock do
          sleep Q
        end
        sleep Q
      end
    end
  end
end
```

It seems like the problem is in concurrent-ruby now.
Okay, I was able to propose a fix: ruby-concurrency/concurrent-ruby#962 (comment)
Thanks for digging into this! I applied your monkey patch listed here (ruby-concurrency/concurrent-ruby#962 (comment)) and I got some new behavior. I still get errors, but the server doesn't hang. Before, when this would happen, the server would completely hang and not serve any other requests. Now it still errors, but the server serves subsequent requests. I refreshed several times, still with sporadic errors, and then it eventually settled into a steady state of no errors. I recorded a screencast to try and capture the behavior; pay no attention to the contents of my silly little demo app 😇 https://wco.sg/v6LpIc You can see that it starts out loading with sporadic errors (which show up as images not loading; the images are served through the app from ActiveStorage). It eventually gets to a point where I cannot get it to error. Then I kill and restart the server, and the behavior starts over.
I also experienced the hanging behaviour in my test case: after the concurrent-ruby lock fails, it stops working for everything else too. I'm just fixing another bug; once I'm done, I'll review your updates in more detail! Thanks.
Just chiming back in to report that I'm unfortunately still experiencing this even after updating concurrent-ruby to 1.2.0. Stacktrace:

(stack trace not captured in this copy)

If there's any other information I can provide to help debug this, let me know.
Hmmm, thanks for this; I'll take a look.
I can't get the above test script to fail with v1.2.0 of concurrent-ruby, so it must be something else.
Ah, great, I was able to reproduce this with the default Rails website.
I did the monkey thing and patched it:

```ruby
class Executor
  def initialize(watcher:)
    @execution_lock = Concurrent::ReentrantReadWriteLock.new
    @cache_expiry = ViewModificationWatcher.new(watcher: watcher) do
      clear_cache
    end
  end

  def run
    ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
      @cache_expiry.execute_if_updated
      Console.logger.info(self, "run:acquire_read_lock")
      @execution_lock.acquire_read_lock
    end
  end

  def complete(_)
    Console.logger.info(self, "complete:release_read_lock")
    @execution_lock.release_read_lock
  end

  private

  def clear_cache
    Console.logger.info(self, "clear_cache:with_write_lock")
    @execution_lock.with_write_lock do
      ActionView::LookupContext::DetailsKey.clear
    end
  end
end
```
We can see the logged lock operations. It looks like something we might be able to fix in

Acquire:
(collapsed log output not captured in this copy)

Release:
(collapsed log output not captured in this copy)
Okay, the root cause is not just that the concurrent-ruby lock wasn't working, but that it's too specific. In the case of Rails, it's not so much a lock as it is a barrier to prevent the cache being cleared while there are requests in flight. I made a PR to Rails to simplify this to use a counter, which should in theory fix the issue. Local tests look okay to me, but maybe you can monkey-patch it into your stack and see what happens?
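The counter approach could look roughly like this minimal sketch (names and structure are mine, not the actual Rails PR): requests increment a counter while in flight, and the sweeper waits for it to reach zero before clearing:

```ruby
require "monitor"

# Hypothetical barrier: tracks in-flight requests with a counter instead of a
# read-write lock, so it does not care which thread or fiber does the release.
class InFlightBarrier
  def initialize
    @monitor = Monitor.new
    @idle = @monitor.new_cond
    @count = 0
  end

  # Called when a request starts.
  def acquire
    @monitor.synchronize { @count += 1 }
  end

  # Called when the request completes, wherever that happens.
  def release
    @monitor.synchronize do
      @count -= 1
      @idle.signal if @count.zero?
    end
  end

  # Wait until nothing is in flight, then run the critical section while
  # holding the monitor so no new request can start mid-clear.
  def quiesce
    @monitor.synchronize do
      @idle.wait_until { @count.zero? }
      yield
    end
  end
end
```

A server would call `acquire` when a request starts, `release` when its response body is finally closed, and the cache sweeper would wrap the clear in `quiesce`.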
@ioquatix it looks like that might have fixed it for me! I haven't been able to reproduce it since patching in your Rails change, and I was previously able to get it to break in under a minute of just refreshing my app. I did run into a new issue though 😄 Looks like something may be off with:

```ruby
# config/application.rb
# https://guides.rubyonrails.org/configuring.html#config-active-support-isolation-level
config.active_support.isolation_level = :fiber
```

Thanks so much for digging into this! 🙏
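For reference, here is a minimal sketch of what that setting controls (assuming Rails 7.0's `ActiveSupport::IsolatedExecutionState` API): with `:fiber`, per-request execution state is keyed off `Fiber.current` rather than `Thread.current`, matching Falcon's fiber-per-request model.

```ruby
# Equivalent runtime setting (Rails >= 7.0, sketch only).
require "active_support/isolated_execution_state"

ActiveSupport::IsolatedExecutionState.isolation_level = :fiber
ActiveSupport::IsolatedExecutionState[:example] = 42 # visible only to this fiber
```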
Unfortunately, I realised the current implementation is not re-entrant for the write lock mechanism. Back to the drawing board.
Okay, I managed to figure out how to implement the re-entrant write lock.
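One general way to get write-side re-entrancy is sketched below (an illustration of the technique, not the actual implementation): track the owning thread and a depth counter, and let the owner pass through nested acquisitions:

```ruby
require "monitor"

# Hypothetical re-entrant exclusive section: the owner may re-enter freely;
# everyone else waits until the outermost exclusive block exits.
class ReentrantExclusive
  def initialize
    @monitor = Monitor.new
    @released = @monitor.new_cond
    @owner = nil
    @depth = 0
  end

  def exclusive
    me = Thread.current
    @monitor.synchronize do
      # Wait until free or already owned by us, then record the nesting depth.
      @released.wait_until { @owner.nil? || @owner == me }
      @owner = me
      @depth += 1
    end
    begin
      yield
    ensure
      @monitor.synchronize do
        @depth -= 1
        if @depth.zero?
          @owner = nil
          @released.broadcast
        end
      end
    end
  end
end
```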
Okay, apparently the correct way to fix this is here: rails/rails#47347. Hopefully that's sufficient; I'll test it after it's merged.
Okay, I've confirmed this is fixed by the above PR, which is now merged. Please open a new issue if you see the problem again.
@ioquatix, thank you for looking into this and for persistently working to fix it! 🤝