-
Notifications
You must be signed in to change notification settings - Fork 21.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Thread safety problem in template loading: "can't add a new key into hash during iteration" #24627
Comments
Looking at the code here, I'm wondering if the issue has something to do with the rather clever nested hashmap thing that's going on there. It seems that the series of proc defaults is constructing an ever deeper set of nested maps. Since @data[key][name][prefix][partial][locals] To this: @data[[key, name, prefix, partial, locals]] It's also possible that this is actually a bug in concurrent-ruby. |
Also of note: it seems that this is a regression. This issue did not occur with earlier versions of Rails, it might be due to the switch from |
cc @matthewd |
Looks like those |
No, looks like the default blocks should be protected by the write mutex. |
I think the iteration is occurring on a different thread, which I imagine would make this a concurrent-ruby issue. @jdantonio @thedarkone should C::Map protect us from this problem (one thread manipulating it while another iterates)? It feels like it should.. but I also may be completely misreading what's going on. |
This shouldn't be happening (obviously 😢). @matthewd you are correct in assuming the iteration must be happening on a diff thread, I can't image it to be an in issue of nested On the other hand, the iteration from the other thread should also never cause this! Since the reported issue is happening on MRI, where I also just torture tested iteration with insertion on MRI and couldn't reproduce 😿.
@jnicklas can you make sure you're not using some code that perhaps pries open PS: Rails doesn't iterate the said |
Only obvious spot where we iterate is when we The other option is that we're being rehashed (that also iterates). That happens when we're duped, so probably e.g. (Confirm, C::Map should give the iterating thread something like a share lock to protect iteration.) |
@jeremy wow, thanks, I did not know that. I should dig into MRI's sources to investigate. Share lock is problematic - one of the design goals of I always assumed that
This is indeed a disaster as I also assumed that |
Maybe it is the same problem with rails/sprockets#242? |
👍 Looks like
If nobody else is iterating, it does just clear the table. Otherwise it also iterates and marks each entry deleted.
I'm not sure the disaster is occurring 😁 Just looking for points where we could have someone else iterating since there are so few possibilities here. |
@rafaelfranca Looks like we aren't handling concurrent cache access at all in the Sprockets loader :) |
I've created a ticket over at concurrent-ruby ruby-concurrency/concurrent-ruby#528. @jeremy, if you have the time, can you take a look at what I have there? |
@thedarkone Thank you very much for jumping on this so quickly! Your insight is always extremely valuable. I'll continue the conversation on the c-r thread. |
@thedarkone Looking good @ ruby-concurrency/concurrent-ruby#529 We can bump concurrent-ruby dep to 1.0.2 once that's merged & released. |
This needn't block 5.0.0 release candidate, though it is highly desirable to coincide with release since it's a regression from apps' point of view. Removing milestone. |
It seems that the fix pushed to concurrent-ruby did resolve this issue for us. |
Good to hear! I'm out of town for work right now. I'll publish a patch release this weekend. |
c-r 1.0.2 has been released. It has the aforementioned fix. |
Bumped to c-r 1.0.2 @ 06dc3fb |
…sh during iteration" Resolved by ruby-concurrency/concurrent-ruby#529 Fixes #24627.
Thanks for the hard work on this fix! I see it's been folded into the Rails 5 release, but with a similar upgrade to c-r 1.0.2 with Rails 4.2, I'm still experiencing the same issue. Any guidance? |
Guidance: troubleshooting continues 😊 We're doing s bit of shadow boxing
|
@arktisklada if you are talking about the sprockets issue, it is not related to this one. |
@rafaelfranca thank you ❤️! I just spent an hour digging through MRI's @arktisklada rails/sprockets#242 has nothing to do with this issue or |
@thedarkone & @rafaelfranca you're absolutely right, it's the exact same error message but a different backtrace. It's from Sprockets, and will continue the dialog there. Need to pay a little more attention to detail before sending someone off into a code hole |
We're still seeing this a ~dozen times a day. Reopening pending 0 exceptions. |
@jeremy to confirm, you are seeing this (non-sprocket) error on a Rails app with c-r 1.0.2? |
@thedarkone Correct. I'll narrow it down. |
@jeremy if you are able to deploy some temporary logging in the form of: rescue RuntimeError => e # "can't add a new key into hash during iteration" is a RuntimeError
if e.message == "can't add a new key into hash during iteration"
# lets see if we are able to pin down the culprit
Thread.list.each do |t|
puts t.backtrace.inspect
end
end
raise this could help track down the "iterator" thread? |
Tracked this down to a separate issue with hash modification during iteration in Jbuilder. All good, @thedarkone. |
This issue still occurs randomly from time to time, I don't know why but everything work fine until it happens without any reason ! it's rare but it happens |
@jimchild49 have you been able to workaround it? |
I've seen a similar error, but on activemodel, I've been trying to repro but no luck
currently in rails |
Got the same issue on on Rails 5.2.0.rc2 It happens randomly and a page reload fixes it until it happens again. |
@thedarkone : I can reproduce this pretty regularly. It requires 2 simultaneous connections requiring asset recompilation (indicated by threads number 3 and 5 in my stacktrace file) Here's the stacktrace after applying your temporary logging: |
@enrico, seems my logging isn't enogh :(. I can't tell from your Could you please try additionally capturing the rescue RuntimeError => e # "can't add a new key into hash during iteration" is a RuntimeError
if e.message == "can't add a new key into hash during iteration"
# lets see if we are able to pin down the culprit
puts 'e.backtrace:'
puts e.backtrace.inspect
puts 'Threads:'
Thread.list.each do |t|
puts t.backtrace.inspect
end
end
raise |
@thedarkone , here's the new exception.txt I changed the rescue clause like this (wrote this to a file in addition to console, since I was afraid of missing the console log): rescue_from RuntimeError do |e|
if e.message == "can't add a new key into hash during iteration"
File.open('exception.txt','a') do |line|
# lets see if we are able to pin down this exception's culprit
line.puts '*' * 128
line.puts 'e.backtrace:'
line.puts e.backtrace.inspect
line.puts '*' * 128
line.puts 'Threads:'
puts '*' * 128
puts 'e.backtrace:'
puts e.backtrace.inspect
puts '*' * 128
puts 'Threads:'
Thread.list.each do |t|
line.puts "Thread #{t.name}:"
line.puts t.backtrace.inspect
line.puts '*' * 128
puts "Thread #{t.name}:"
puts t.backtrace.inspect
puts '*' * 128
end
end
exception = true
end
raise
end |
@thedarkone, I have one more stacktrace for you . I changed the code to get a newline format of the backtrace. File.open('exception.txt','a') do |line|
# lets see if we are able to pin down this exception's culprit
line.puts '*' * 40
line.puts 'e.backtrace:'
e.backtrace.each do |l|
line.puts l
end
line.puts ''
line.puts '****** Threads:'
Thread.list.each do |t|
line.puts "******* Thread #{t.name}:"
t.backtrace.each do |l|
line.puts l
end
end
end It looks to me like these are the 2 threads you should focus on:
and
|
@enrico I've had a look at the backtraces, I think the error stems from multiple threads simultaneously going into The conflict can also stem from other accesses to The whole sprockets code base is not very thread safe. I'm not sure how you'd want to proceed from here, as a simple band-aid I'd suggest a simple mutex on Sprockets compilation, by changing diff --git a/lib/sprockets/cached_environment.rb b/lib/sprockets/cached_environment.rb
index a6ac3690..5c025282 100644
--- a/lib/sprockets/cached_environment.rb
+++ b/lib/sprockets/cached_environment.rb
@@ -14,10 +14,12 @@ module Sprockets
def initialize(environment)
initialize_configuration(environment)
+ @load_monitor = Monitor.new
+
@cache = environment.cache
@stats = Hash.new { |h, k| h[k] = _stat(k) }
@entries = Hash.new { |h, k| h[k] = _entries(k) }
- @uris = Hash.new { |h, k| h[k] = _load(k) }
+ @uris = Hash.new { |h, k| @load_monitor.synchronized { h[k] = _load(k) } }
@processor_cache_keys = Hash.new { |h, k| h[k] = _processor_cache_key(k) }
@resolved_dependencies = Hash.new { |h, k| h[k] = _resolve_dependency(k) } (this only solves some issues and only for cached env that is supposed to be used in production) |
@thedarkone ,there's a typo, it should be In any case, this is not really an issue for me, since I do precompile all of my assets before a deploy. I just happen to see this issue often in development since I have some JS code that will cause a simultaneous page refresh from multiple browser tabs, increasing the likelihood of this bug showing up on my dev box after I've changed some scss files. In the meantime, I've patched my local copy of cached_environments.rb as you suggested and will report my findings later... I do think this bug should stay open, though. |
@thedarkone : so after patching my local copy of cached_environments.rb, I thought the problem was fixed. Normally I would see the exception on a pretty consistent basis, but it had disappeared until today, so I think it did help...I'm attaching another stack trace file in case it sheds some more light on the problem... |
@enrico the last backtrace doesn't help :(, notice how (per its backtrace) the other thread ( So what must be happening is either: the business of throwing exception, obtaining its backtrace, or getting other threads' backtraces -- allows other threads to make progress. This means that when we do get around to asking other threads for their current location via |
hope it will help:
|
I'm honestly still not sure how to trigger this error, however the root cause looks like two threads trying to mutate the same values from the cache. rails/rails#24627 The issue should be mitigated by copying the set to a local copy via `dup` and then mutating that. To accomplish this, two helper methods have been added.
I'm honestly still not sure how to trigger this error, however the root cause looks like two threads trying to mutate the same values from the cache. rails/rails#24627 The issue should be mitigated by copying the set to a local copy via `dup` and then mutating that. To accomplish this, two helper methods have been added.
I'm honestly still not sure how to trigger this error, however the root cause looks like two threads trying to mutate the same values from the cache. rails/rails#24627 The issue should be mitigated by copying the set to a local copy via `dup` and then mutating that. To accomplish this, two helper methods have been added.
Steps to reproduce
This is a thread safety, as such it cannot be reliably reproduced.
Expected behavior
Template loading and compilation is performed lazily, even when
config.eager_load = true
. Since this needs to work in production where multiple threads might try to load a template at the same time, it needs to be thread safe.Actual behavior
Template loading is not thread safe, occasionally causing the following error:
See full trace here: https://gist.github.com/jnicklas/21b570f19c50d958f5024566dfd052f0
System configuration
Rails version: 5.0.0.beta3 (rev db9bc80)
Ruby version: ruby 2.3.0p0 (2015-12-25 revision 53290) [x86_64-darwin15]
The text was updated successfully, but these errors were encountered: