Rails5, concurrency in action method, deadlock, docs needed including for executor/reloader #26847

jrochkind opened this Issue Oct 20, 2016 · 21 comments


@jrochkind
Contributor
jrochkind commented Oct 20, 2016 edited

I sometimes need/want to use some basic threaded concurrency inside a rails action --
execute some things concurrently, wait for them all to complete before returning
the response.

Here is a basic isolated fake example, as used in this demo app:

  # action method
  def example
    futures = 3.times.collect do
      Concurrent::Future.execute do
        SomeWorker.new.value
      end
    end

    @example_values = futures.collect(&:value)
  end

This is a useful thing to do when you really do need to wait for those "SomeWorker"
results before returning a response, but they are a bit slow and can be performed
concurrently/in parallel. (Even with the MRI GIL this is effective, because they
are slow due to IO.)
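
That overlap is easy to demonstrate with plain threads (a dependency-free sketch; Concurrent::Future behaves the same way for this purpose, since it also runs each task on its own thread):

```ruby
# Three 0.2s IO-style waits run concurrently, finishing in roughly 0.2s
# total rather than 0.6s -- MRI releases the GIL while a thread sleeps
# (or waits on a socket), so the waits overlap.
start = Time.now

threads = 3.times.map do
  Thread.new do
    sleep 0.2        # stands in for a slow IO call
    :done
  end
end

values  = threads.map(&:value)
elapsed = Time.now - start

puts values.inspect   # => [:done, :done, :done]
puts elapsed < 0.5    # => true (serial execution would take ~0.6s)
```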

In Rails 4.x, under default configuration in an app, in development mode,
this would mostly work fine, but would occasionally raise a weird
exception involving "class definition has changed" or something, I'm afraid I
forget the exact exception class/message and can't find it.

In Rails 4.x, if you correctly set up the under-documented and inter-related
relevant config keys -- config.eager_load, config.auto_load and
config.cache_classes -- you could make it work entirely correctly, though
probably at the cost of giving up development-mode class reloading.

In Rails 4.x, it works fine under default production configuration, and as
far as I know does in Rails 5 too.

But in development mode in Rails 5.0, due to the new autoloading logic, the "failure mode" has changed --
if you don't configure things correctly, instead of the Rails 4 behavior (mostly working,
with an occasional weird exception), in Rails 5 the worker thread hangs
forever (presumably a deadlock).
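
The shape of that deadlock can be modeled with plain Ruby primitives. This is a toy sketch, not Rails' actual interlock -- the Mutex merely stands in for the load interlock, and try_lock is used so the demo reports the conflict instead of actually hanging:

```ruby
# The request thread holds the load interlock for the whole action; a
# worker thread needs the same interlock to perform an autoload. The
# request thread then blocks waiting on the worker: classic deadlock.
interlock = Mutex.new

result = nil
interlock.synchronize do          # request thread holds the "load lock"
  worker = Thread.new do
    if interlock.try_lock         # the "autoload" needs the lock too
      interlock.unlock
      :loaded
    else
      :would_deadlock             # a real blocking acquire would hang here
    end
  end
  result = worker.value           # request thread waits on the worker
end

puts result   # => would_deadlock
```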

My reproduction/demonstration app has config/development.rb set up to use ENV vars
to make testing under various config combinations more convenient.
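
The ENV-driven setup presumably looks something like this (a sketch of my guess at the demo app's file; only the CONF_CACHE_CLASSES and CONF_EAGER_LOAD variable names come from the findings below):

```ruby
# config/environments/development.rb (sketch): let ENV vars override the
# settings under test, falling back to the generated defaults of false.
Rails.application.configure do
  config.cache_classes = ENV["CONF_CACHE_CLASSES"] == "true"
  config.eager_load    = ENV["CONF_EAGER_LOAD"] == "true"
end
```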

Findings:

  1. Under default generated app config

    config.cache_classes = false
    config.eager_load = false
    config.auto_load => not explicitly set
    

    Rails worker thread hangs forever on first request.

  2. CONF_EAGER_LOAD=false CONF_CACHE_CLASSES=true

    Same, hangs forever on first request.

  3. CONF_EAGER_LOAD=true CONF_CACHE_CLASSES=false rails server

    Works fine on initial requests. However, if you change
    the source of a file on disk, that gets referenced in
    one of the child threads, on next request the worker
    thread will hang forever.

  4. CONF_EAGER_LOAD=true CONF_CACHE_CLASSES=true

    This is the magic configuration that makes it work, at the cost of
    giving up on development-mode class reloading: you need to restart
    the app to pick up any changes.

Additional config or logic that turned out to be irrelevant:

  • The setting for config.auto_load is irrelevant. In each of the above 4 cases,
    whether config.auto_load was set to true, false, or unset (use default) made
    no difference, config.eager_load and config.cache_classes were the only
    settings that seemed to matter for this behavior.

  • I was aware of Rails.application.reloader from seeing it used in a
    sidekiq change. Without really
    understanding it, I tried wrapping my child threads like so:

    Concurrent::Future.execute do
      Rails.application.reloader.wrap do
        SomeWorker.new.value
      end
    end
    

    That made no difference either, all observed behavior was the same in
    above 4 configuration combinations, with or without this wrap. I don't
    know if there's a more useful way to use the reloader for this use case,
    or if it's just irrelevant.

Issues for Rails team:

  1. Is this a bug?

    I suspect the answer will be "no" -- it is no longer supported to use
    threads like this without turning off development-mode class reloading
    entirely.

    This is a bit annoying, as I much preferred having it mostly working
    with occasional exceptions that made me restart the app, instead of
    needing to turn off dev-mode reloading to get it to work at all, and
    always having to restart the app to pick up changes.

    However, it may be that's just how it is, as sad as it makes me. Nevertheless:

  2. Can failure mode be better?

    The "failure mode" for doing it wrong of hanging forever/deadlocking
    is pretty annoying and confusing. It can take someone quite a while
    to figure out what's going on, and that it's even related to auto-loading
    at all. Especially with concurrency getting easier to use for less
    experienced developers (hooray concurrent-ruby), I can see people
    getting really stuck here when they accidentally do it 'wrong' and hit a
    mysterious deadlock.

    Is there any way to get a better failure mode, an actual quick fail with
    an exception with a useful class/message, instead of a hang forever/deadlock?

    Whether this is considered a 'bug' or a 'new feature' has to do with the
    intentions of whoever wrote the new autoloading stuff, I guess. I couldn't
    say.

  3. Documentation.

    All of these things need better documentation -- which has been
    true pre-Rails 5 too, but with the changes, even those of us who kind of
    sort of figured it out in Rails <5 could use some docs now.

    Save others the several hours I spent debugging my deadlock and investigating
    the issue, which resulted in the report you are reading here.

    I can't submit a doc PR myself, because I really don't understand
    well enough what's actually going on, or the intended behavior/configuration.

    Some suggested doc needs:

    • config.eager_load, config.cache_classes and config.auto_load have
      always been poorly-doc'd, especially their interactions with each other,
      and their defaults, which seem to depend on how others of them are set.
      Not only poorly doc'd, but fairly high 'churn' changing from Rails version
      to version.

      • For my use case, I only avoided deadlock with eager_load and
        cache_classes both true. Is there any reason at all to have
        one true and the other false? In what circumstances might
        this make sense, and what does it do?

      • Does config.auto_load do anything at all in Rails5, or has it
        become a no-op? It didn't seem to have an effect on my problem
        case, but maybe it does in other cases? Or is it gone?

      • If it's true that you need to turn off dev-mode reloading in
        order to use threads inside a Rails action method (inside the request loop),
        and it's true that you do that with eager_loading and cache_classes
        both set to true, then some documentation to that effect would be
        welcome, and would save people some confusing debugging time.
        If I haven't actually figured out the right/best/only way to do this,
        and there are other options -- I'm probably not the only one who would
        appreciate some docs!

      • Is there any way to turn off class reloading, to be concurrency-safe
        for the auto-loader, but still have view template (ERB) reloading? I remember
        there was in an old Rails version (3?), but the config seems to have
        disappeared.

      • Test environments. The default generated Rails 5 test env
        is cache_classes == true, eager_load == false. Rails 4
        had some generated comments in config/test.rb
        about eager_load: "If you are using a tool that
        preloads Rails for running tests, you may have to set it to true."

        A bit confusing and in my experience not entirely reliable advice in Rails 4.
        I believe a "tool that preloads rails for running tests" basically
        means "Capybara". Rails5 no longer generates this hint, but still defaults
        eager_load to false. Is this going to cause problems with capybara?

        Concurrency problems with capybara are super frustrating to debug
        and figure out how to deal with, so some instructions
        here would be much appreciated -- especially since the default setting
        is one that, in my use case above, still caused deadlocks.

    • Rails.application.reloader

      The only reason I knew this even existed, or anything at all about how to use
      it, is from the sidekiq pr.

      The only reason it's in a sidekiq PR is because someone from Rails core
      team gave sidekiq hints/code, it would be unlikely for anyone who doesn't
      already know the code to have known that it should be used, and the correct
      way to use it.

      It may be that Rails.application.reloader is entirely irrelevant to
      my use case -- some docs explaining what it is and how it use it could
      have saved me some time in figuring that out myself by experimenting.

      But clearly there are some appropriate use cases for it in non-Rails-core code,
      like in sidekiq. Some docs are needed so people not on the Rails core team
      know when to use it, and how.

      Tagging @matthewd , because he suggested in the sidekiq PR that I open
      a Rails issue on documentation of the "executor/reloader API", and I'm
      hoping he has some input on the general reloading/concurrency issues
      above, since they look pretty intimately related to the executor/reloader
      in Rails5.

@jrochkind jrochkind referenced this issue in ruby-concurrency/concurrent-ruby Oct 20, 2016

Rails5, auto-loading, future, circular dependency #585

@matthewd
Member
matthewd commented Oct 21, 2016 edited

Briefly for now, because I'm about to get on a plane:

1 No it's not a bug, but it is indeed unfortunate. The good news is that it doesn't need to make you sad: you can wrap your futures.collect line with permit_concurrent_loads (under-documented, you say? 😅) -- that will both prevent the deadlock and prevent the occasional exceptions.

The one-line description for when to use permit_concurrent_loads is: wrap blocking operations; the wrapped block must not access any "user-defined" constants, which could trigger an autoload (at risk of the Rails 4 style exceptions).
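
To picture what permit_concurrent_loads buys you, here's a plain-Ruby analogue (a toy model, not Rails' implementation -- the real interlock is a shared/exclusive lock, crudely modeled here as a bare Mutex):

```ruby
# The waiting thread drops its hold on the interlock for the duration of
# a blocking wait, so another thread can take the lock (to "autoload"),
# then reacquires it before resuming the request.
lock = Mutex.new
lock.lock                       # request thread holds the interlock

worker = Thread.new do
  lock.synchronize { :loaded }  # the "autoload" needs the lock
end

# Without releasing the lock first, `worker.value` would deadlock here.
lock.unlock                     # permit_concurrent_loads: drop our hold...
result = worker.value           # ...block on the worker...
lock.lock                       # ...then take the interlock back

puts result   # => loaded
lock.unlock
```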

2 The short answer is no. There's some relevant discussion in #24028.

The problem is that we don't know the other thread is blocked (and not just legitimately doing some work).

5.0.1 will have #25344, which helps explain what/where/why you're getting stuck... but you still need to recognise that's what's happening, and seek it out. 😕

3 Yes, we need docs 😀

@jrochkind
Contributor

Thanks @matthewd, I'll spend some time with your links. But yes, docs, docs, please please please.

the wrapped block must not access any "user-defined" constants, which could trigger an autoload (at risk of the Rails 4 style exceptions).

I'm confused. As far as I can tell, I only have the deadlock (?) issue with user-defined constants -- I thought the problem was triggering an autoload of an autoloadable constant. (By "user-defined", you mean "autoloadable", right?)

If you give me a solution, but it only works if the wrapped code doesn't reference any autoloadable constants... that seems to be a solution to a problem I'm not having? Or are you saying even without referencing autoloadable constants I'd have problems I just haven't noticed yet?

Alas, in my case, I'm not sure it's possible to guarantee avoiding autoloadable constants. It's very hard to keep track for sure of whether you are accessing autoloadable constants -- normally, whether a particular constant is autoloadable is pretty much an invisible abstraction; you don't need to care about it at all. Having to care, and make sure a certain block of code (and all code it calls directly or indirectly) does not reference any autoloadable constants... I'm not sure how feasible that really is.

Although if the danger of messing up is just a rails4-style exception... that seems potentially tolerable, at least no worse than rails4!

I'll play around with it. But yeah, I never would have found permit_concurrent_loads without you telling me about it, and I don't think anyone else will either.

Docs are really needed, maybe even a "concurrency guide" or something. Unfortunately, I think there are probably currently very few people who understand what's up well enough to write such a guide -- you may even be the only one? Which is both why good docs are necessary (so other people can develop the needed understanding)... and why it seems to be not happening.

@matthewd
Member

Oops, I missed the rather critical words "on that thread". The value/wait call you're doing can't itself trigger an autoload, and is thus perfectly safe.

I will be writing docs.. just have to get the much-overdue 5.0.1 out first.

@jrochkind
Contributor

Aha, permit_concurrent_loads basically releases the exclusive autoloading lock? Perfect! Sweet, I even almost asked if there was a way to do that.

So... how do I get the appropriate instance of an ActiveSupport::Dependencies::Interlock to call permit_concurrent_loads on? The only entry point I know into the new autoloading stuff is Rails.application.reloader. That object has no permit_concurrent_loads method, and strangely and unhelpfully Rails.application.reloader.class is just Class, so I'm not even sure what to look at to see if whatever the Rails.application.reloader is has a reference to an Interlock.

Also, is it also still wise to wrap my Future(/Thread) body in a Rails.application.reloader.wrap? And this won't force them all to run serially, it will still allow them to run concurrently?

Also, regarding the annoying failure mode of hanging forever -- what if there was an optional configurable 'max wait' value, for waiting on the autoload lock? And it was by default in development (which is the only place this ordinarily matters, right?) set to 4 or 5 seconds or something. And if maxwait was reached, it would raise an exception "Couldn't get autoload lock, for more info see (url to hypothetically future existing docs on this)"? A developer could always raise the maxwait or disable it completely when running into this, but it would be a hint for the newcomer that at least it's got something to do with autoload and concurrency and locks, instead of just a mysterious deadlock.
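
To illustrate, the bounded wait I'm imagining would look roughly like this (entirely hypothetical -- acquire_with_deadline is not a Rails API, just a plain-Ruby sketch of the idea):

```ruby
# Try to take a lock within a deadline; fail loudly with a hint about
# the autoload interlock instead of blocking forever.
def acquire_with_deadline(mutex, seconds)
  deadline = Time.now + seconds
  until mutex.try_lock
    if Time.now > deadline
      raise "Couldn't get autoload lock within #{seconds}s -- " \
            "another thread may be holding the load interlock"
    end
    sleep 0.01
  end
end

lock = Mutex.new
lock.lock                          # simulate a thread stuck holding the lock

message = Thread.new {
  begin
    acquire_with_deadline(lock, 0.1)
    nil
  rescue RuntimeError => e
    e.message
  end
}.value

puts message   # names the autoload lock instead of hanging forever
```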

@jrochkind
Contributor

Okay, I think I figured it out by spelunking through Rails source:

a) I get the appropriate interlock with ActiveSupport::Dependencies.interlock? I might suggest you delegate this in the Application.reloader, maybe Rails.application.reloader.release_autoload_lock do or something? I feel dirty having to touch multiple global objects, and unsure if I'm using public API or something likely to break without notice -- I think this function does need to be in the public API, and perhaps the public API can be centralized on one accessible object?

b) I do need to put the future body in a Rails.application.reloader.wrap do, or I'm at risk of #<RuntimeError: Circular dependency detected while autoloading constant SomeWorker> from the threads.

But when I do those things, it does seem to work! Sweet! Thanks for designing the internals so well.

@jrochkind
Contributor

Deleted my last comment about apparent serialization, that wasn't going on, mistake in my own instrumentation. Indeed all seems to be well as above. Sorry for the noise, concurrency is hard!

@jrochkind
Contributor

But crap! First load works fine even in dev-mode settings with recommended use of ActiveSupport::Dependencies.interlock.permit_concurrent_loads on blocking wait for futures, and Rails.application.reloader.wrap do inside future body...

But if I change the autoloaded source file on disk, and request again... it seems to still deadlock. Darnit. Okay, enough of this for the night, I'll wait for some feedback.

@matthewd
Member

AS::D.interlock is indeed the thing you need.

Getting this stuff onto the same 'level' of API is tricky. Technically, you could achieve what you want just with the interlock, using interlock.permit_concurrent_loads and interlock.running... the latter is what the executor (which is invoked by the reloader) calls. The challenge is that the executor/reloader are general "this lifecycle thing is happening now" callback hooks; releasing the lock that prevents loads is a concrete low level action, and one that necessarily obliges the caller to have a deeper understanding of what they're actually asking for.

(You don't actually want to use interlock.running, though, because the other things that hook the executor are also desirable... releasing any acquired AR connections, for one.)

Ideally, we might arrange a mechanism to hook c-r's wait, so we can automatically/always do it for you.


Ah, you probably want to wrap with the executor, not the reloader. The reloader... reloads, which isn't necessary or desirable here. You want the reload to occur on the next request, not at the start of one of your futures.

That's my fault for not noticing which one you were using earlier -- and something that'll be made very clear by the future docs: wanting the reloader is rare, and only applies to things handling "top level" actions -- like Action Controller, and Sidekiq.

(The problem you're hitting is that when a reloader decides to reload, it has to wait for all in-flight executor work to complete, and unlike an autoload, it can't take advantage of permit_concurrent_loads -- it needs them to be Really Finished.)

@jrochkind
Contributor

Okay, sweet! So I do want ActiveSupport::Dependencies.interlock.running?

Here's what I have now, which seems to be working!

I've left in my basic instrumentation that demonstrates the futures really are executing concurrently, copied and pasted verbatim so I don't introduce any errors.

Can you kindly confirm that this code looks right? This seems to be exactly what I need, which is awesome.

def example
  s_time = Time.now
  futures = 3.times.collect do |i|
    Concurrent::Future.execute do
      ActiveSupport::Dependencies.interlock.running do
        puts "Thread #{i} starting in #{Time.now - s_time}"
        sleep 3
        SomeWorker.new.value
      end
    end
  end

  ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
    @example_values = futures.collect(&:value)
  end

  @t_time = Time.now - s_time
end
@jrochkind
Contributor

(I'm not sure what Ruby weirdness is going on such that ActiveSupport::Dependencies.interlock.class and Rails.application.reloader.class both inspect as '#<Class...', rather than a specific class name like I expected. It does make 'reverse engineering' what's going on to look up the source code more challenging. Perhaps source-code docs on ActiveSupport::Dependencies.interlock and Rails.application.reloader stating the actual class of the object they hold is all that can be done.)

@matthewd
Member

So I do want ActiveSupport::Dependencies.interlock.running?

You're probably better off with Rails.application.executor.wrap -- which will ultimately call interlock.running, and thus behave the same for purposes of load locking. It just also gives you some other behaviours that are likely to help. (Again, the obvious one that comes to mind is AR pool management.)


Interlock is an ordinary class, so I'd expect it to inspect with its name. The reloader and executor are trickier -- they're anonymous subclasses of the real thing. But we could at least override inspect.

@jrochkind
Contributor
jrochkind commented Oct 22, 2016 edited

Okay, awesome, so I want Rails.application.executor.wrap -- do I still want ActiveSupport::Dependencies.interlock.permit_concurrent_loads for the other part, or is there a way to use Rails.application.executor for that too? (if not, should there be?)

the obvious one that comes to mind is AR pool management.

Can you say more about this? I'm used to having to wrap anything that might use AR in a created thread in a ActiveRecord::Base.connection_pool.with_connection -- are you saying I don't need to do this anymore if it's in a Rails.application.executor.wrap? That'd be interesting and nice.

And all this stuff (wrapping in Rails.application.executor.wrap and ActiveSupport::Dependencies.interlock.permit_concurrent_loads) is safe and has relatively low negative performance impact in production (i.e. with eager_load == cache_classes == true), true?

@matthewd
Member

do I still want ActiveSupport::Dependencies.interlock.permit_concurrent_loads for the other part

Yes. That's where the high-level vs low-level API issue comes in. I agree it feels weird to use different things on each 'side'... but the executor doesn't know anything about load locks.

I'm used to having to wrap anything that might use AR in a created thread in a ActiveRecord::Base.connection_pool.with_connection -- are you saying I don't need to do this anymore if it's in a Rails.application.executor.wrap?

Yes. Active Record registers a 'complete' hook on the executor to return outstanding connections to the pool. (As of 4.2, AR was already able to reclaim connections from threads that had died/completed... but that doesn't apply here, because your futures are running on long-lived c-r pool threads.)

This sort of thing is why the executor is intended to be a higher level API: it's not about load locks, or connection pools, or anything else. It's about "I'm going to do stuff now" -- allowing us (and more generally, any other library that wants to register a handler) to abstract away those details, in a similar way to the middleware stack on a web request.
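
The middleware analogy can be sketched in a few lines of plain Ruby (a toy illustration of the hook-wrapping idea, not Rails' actual ExecutionWrapper):

```ruby
# A minimal execution wrapper: libraries register "run" and "complete"
# hooks, and every unit of work fires them around the block -- the way
# Active Record's complete hook returns connections to the pool.
class ToyExecutor
  def initialize
    @run_hooks = []
    @complete_hooks = []
  end

  def to_run(&block)
    @run_hooks << block
  end

  def to_complete(&block)
    @complete_hooks << block
  end

  def wrap
    @run_hooks.each(&:call)
    yield
  ensure
    @complete_hooks.each(&:call)
  end
end

events = []
executor = ToyExecutor.new
executor.to_run      { events << :before_work }
executor.to_complete { events << :release_resources }

executor.wrap { events << :work }
p events   # => [:before_work, :work, :release_resources]
```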

And all this stuff (wrapping in Rails.application.executor.wrap and ActiveSupport::Dependencies.interlock.permit_concurrent_loads) is safe and relatively low negative performance impact in production (ie with eager_load == cache_classes == true), true?

True. At least, it's certainly supposed to be: the executor doesn't even get told about the lock... and while permit_concurrent_loads will still [needlessly] do its thing, that's just a couple of low-contention lock acquisitions.

@jrochkind
Contributor
jrochkind commented Oct 22, 2016 edited

Awesome. I really like the design of this, and the tricky implementation seems solid. I agree that an "I'm going to do stuff now, please let me do it concurrency-safely" public API was totally needed, thanks for providing one with the executor, that's totally nice.

I do think it would be better if you figured out a way to structure the API so you didn't need to access two different objects to get the permit_concurrent_loads -- it feels like I'm accessing implementation-dependent details subject to change, rather than a solid public API.

I think perhaps you initially thought permit_concurrent_loads would be unlikely to be needed by end-developers, but I suggest this use case shows that indeed it is needed as public API.

So, just to make sure I have it right before I put it in my code, blog about it, and tell everyone about it, can you sign off on this implementation as a correct, as-intended, use of the new executor stuff? I really appreciate the hints @matthewd .

  def example
    futures = 3.times.collect do |i|
      Concurrent::Future.execute do
        Rails.application.executor.wrap do
          SomeWorker.new.value
        end
      end
    end

    ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
      @example_values = futures.collect(&:value)
    end
  end
@jrochkind
Contributor
jrochkind commented Oct 22, 2016 edited

(And I do still suggest: if there were some easy way to give the wait-for-autoload-lock a configurable max-wait timeout that resulted in an exception saying "it's got to do with the autoload lock, friend!", it would be a really good solution to the impenetrability of deadlock-as-failure-mode.

I'm less enthused by the rack middleware approach: by the time you realize you need it, I'm not sure how much additional value it will give you -- the hardest part is knowing you need it in the first place -- although there perhaps could be some situations where you still need the middleware's help in figuring out exactly what's up.)

@matthewd
Member

can you sign off on this implementation as a correct, as-intended, use of the new executor stuff?

👌🏻

configurable max-wait timeout value

Yeah, that does sound feasible -- especially if it just produced a warning. I'll add it to the list.

I'm less enthused by the rack middleware attempt

It helps a lot when you know the autoload lock is involved (e.g., you recognise the symptoms, or have received the above warning), but didn't just add a future.value call to your code: the autoloader is not the only thing that has a hidden lock it manages, so you can easily end up with two requests doing rather harmless-looking things, that actually deadlock against each other. Agreed it's not as informative when your single action is responsible for both halves of the deadlock.

Awesome. I really like the design of this, and the tricky implementation seems solid

Thanks! 😊

And thank you for exploring it, and asking lots of questions along the way; the above discussion will help a lot while I'm framing these things from the API consumer's perspective.

@jrochkind
Contributor
jrochkind commented Oct 23, 2016 edited

Thanks! So one more question, or really two parts:

  • In Rails5, should one always wrap work in threads one creates oneself (or via c_r) in a Rails.application.executor.wrap, regardless of autoloading situation (that is, regardless of value of cache_classes and eager_load)? (I think the answer is yes based on your explanation of the executor, it's a signal of intent, not just about autoloading).
  • But if you know (or have code that checks that eager_load == cache_classes == true, or that autoloading is def not happening, although I'm not sure you can actually disable that with config.autoload = false anymore), there's no reason to use ActiveSupport::Dependencies.interlock.permit_concurrent_loads, right? (I think there is not).
@jrochkind jrochkind added a commit to jrochkind/bento_search that referenced this issue Oct 24, 2016
@jrochkind jrochkind proper concurrency for Rails5
See rails/rails#26847 thanks @matthewd.

Prob needs a controller test to test that we're doing Rails5 concurrency
okay.
4720f24
@jrochkind
Contributor

And, if I call ActiveSupport::Dependencies.interlock.permit_concurrent_loads and I wasn't in a Rails action method or a context where something actually already had an autoload lock reserved... is it a harmless no-op? I'm actually writing library code that may or may not be called in a Rails action method; is it safe to call ActiveSupport::Dependencies.interlock.permit_concurrent_loads?

@matthewd
Member

In Rails5, should one always wrap work in threads one creates oneself (or via c_r) in a Rails.application.executor.wrap

Yes.

But if you know [..] that autoloading is def not happening, [..] there's no reason to use ActiveSupport::Dependencies.interlock.permit_concurrent_loads, right?

Right. But you needn't go out of your way to avoid it, because:

if I call ActiveSupport::Dependencies.interlock.permit_concurrent_loads, and I wasn't in a rails action methods or a context where something actually already had an autoload lock reserved... is it a harmless no-op?

Yes...ish. As currently written, when the wrapped block completes, if there is a thread currently holding the exclusive load lock (i.e., doing an autoload right now), permit_concurrent_loads will wait for it to finish, without realising that [because it didn't have a lock in the first place] it doesn't actually conflict.

@jrochkind
Contributor

Yes...ish. As currently written, when the wrapped block completes, if there is a thread currently holding the exclusive load lock (i.e., doing an autoload right now), permit_concurrent_loads will wait for it to finish, without realising that [because it didn't have a lock in the first place] it doesn't actually conflict.

Cool, that seems like no big deal, probably. It will only happen in development configuration, and even then an actual autoload doesn't happen all the time. And (I think?) it shouldn't take very long until the lock is released, if the lock is only held for the actual autoload rather than until the end of the entire interlock.running block. Even then, hey, it's just development mode.

@jrochkind
Contributor

And one last time, wow, nice job! I don't understand how you've implemented it, but when I was thinking of how it might be possible to get this stuff to work under concurrency in Rails, I kept thinking: yeah, that's just not feasible, I don't see how you'd do that. Thanks! Looking forward to docs, including what executor.wrap might take care of for you in addition to autoloading safety and obviating the need for connection_pool.with_connection, which is all pretty sweet.

@matthewd matthewd self-assigned this Nov 7, 2016