ActiveRecord query plan logging not thread-safe? #4671

Closed
jrochkind opened this Issue Jan 25, 2012 · 16 comments

Comments

Projects
None yet
2 participants
Contributor

jrochkind commented Jan 25, 2012

In a Rails 3.2 app, I am manually spawning threads in application code, wrapping all ActiveRecord calls in a #with_connection block, which the docs say is a recommended 'contract' with ActiveRecord: http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/ConnectionPool.html

However, the auto-explain-logging of slow queries seems not to be thread-safe, it raises, stack trace below. Note that the stack trace does verify that my AR use is inside a with_connection block.

#<NoMethodError: private method `warn' called for nil:NilClass>

["/home/rochkind/.rvm/gems/ruby-1.9.2-p290@umlaut3/gems/activerecord-3.2.0/lib/active_record/explain.rb:32:in `logging_query_plan'",
 "/home/rochkind/.rvm/gems/ruby-1.9.2-p290@umlaut3/gems/activerecord-3.2.0/lib/active_record/relation.rb:158:in `to_a'",
 "/home/rochkind/.rvm/gems/ruby-1.9.2-p290@umlaut3/gems/activerecord-3.2.0/lib/active_record/relation/finder_methods.rb:377:in `find_first'",
 "/home/rochkind/.rvm/gems/ruby-1.9.2-p290@umlaut3/gems/activerecord-3.2.0/lib/active_record/relation/finder_methods.rb:122:in `first'",
 "/home/rochkind/.rvm/gems/ruby-1.9.2-p290@umlaut3/gems/activerecord-3.2.0/lib/active_record/relation/finder_methods.rb:335:in `find_one'",
 "/home/rochkind/.rvm/gems/ruby-1.9.2-p290@umlaut3/gems/activerecord-3.2.0/lib/active_record/relation/finder_methods.rb:311:in `find_with_ids'",
 "/home/rochkind/.rvm/gems/ruby-1.9.2-p290@umlaut3/gems/activerecord-3.2.0/lib/active_record/relation/finder_methods.rb:107:in `find'",
 "/home/rochkind/umlaut/app/models/service_wave.rb:71:in `block (3 levels) in handle'",
 "/home/rochkind/.rvm/gems/ruby-1.9.2-p290@umlaut3/gems/activerecord-3.2.0/lib/active_record/connection_adapters/abstract/connection_pool.rb:118:in `with_connection'",
 "/home/rochkind/umlaut/app/models/service_wave.rb:68:in `block (2 levels) in handle'"]

@ghost ghost assigned fxn Jan 25, 2012

Contributor

jrochkind commented Jan 25, 2012

Aha, wait, this has nothing to do with threads, after all.

This has to do with I had manually set in environments/development.rb

config.active_record.logger = nil

So questions:

  1. I thought this was the recommended way to turn off AR debugging. Is there a better way?
  2. Is it a bug that config.active_record.logger = nil combined with config.active_record.auto_explain_threshold_in_seconds will raise? setting active_record.logger = nil doesn't raise for any other circumstances I've run into, it simply suppresses AR logging.
  3. Is there any way to turn off logging of each individual query, while keeping explain logging of queries beyond threshold? I had config.active_record.logging = nil, because in my app, there was a lot of noise from logging every query making the logs harder to see.
Contributor

jrochkind commented Jan 25, 2012

BAH. But another bug that might be related to threading. It looks to me that when you have an auto_explain_threshold_in_seconds, and use AR in app-created-threads, then EVERY query gets logged as an expensive query, not just those over explain_threshold_in_seconds -- is the code for calculating the time of a query problematic under threading? i'm not sure what's going on, trying to figure it out, sorry, appreciate any guidance.

Owner

fxn commented Jan 25, 2012

@jrochkind I am working now, if you have any more findings please post them here. I'll check the issue asap.

Contributor

jrochkind commented Jan 25, 2012

Thanks @fxn. Looking at source, it looks like other calls to #logger in AR source are guarded by an "if logger" or "if @logger".

I think this one should be too, in #logging_query_plan, so if there's no logger, it'll be silent instead of raising. Perhaps the entire #logging_query_plan method should just yield and return unless logger?

If you need me to, I can prepare a pull request, although I don't have a lot of time either, if someone else can take care of it that would be awesome.

The other half of my problem: That, in multi-threaded use of AR, it seems to be logging 'explain's for queries that are NOT over the threshold --- remains a mystery to me, I have not been able to figure it out, or figure out under what conditions this is reproducible or actually is happening. But it appears to be. In my actual app, every single query executed in an app-created thread ends up logging an explain, even if the explain logged says it took 0.0ms to complete! But I have not been able to figure out why or what's going on.

Owner

fxn commented Jan 25, 2012

On Wed, Jan 25, 2012 at 8:04 PM, Jonathan Rochkind
reply@reply.github.com
wrote:

Thanks @fxn. Looking at source, it looks like other calls to #logger in AR source are guarded by an "if logger" or "if @logger".

I think this one should be too, in #logging_query_plan, so if there's no logger, it'll be silent instead of raising. Perhaps the entire #logging_query_plan method should just yield and return unless logger?

I am unsure about this one. The whole purpose of the threshold is to
log slow queries, so having the logger to nil, and auto logging
enabled is an inconsistent configuration. It could be the case that
you want logger = nil to win assuming the threshold is an overlook.
But it is also possible that the threshold is wanted and logger = nil
is an ovelook, in which case silencing would be misleading.

If we warn about the inconsistency the warning may be also overlooked.

I think I prefer to err so the user is forced to fix the
configuration, since the configuration needs to be fixed.

/cc @josevalim @tenderlove @jonleighton what do you think?

The other half of my problem:  That, in  multi-threaded use of AR,  it seems to be logging 'explain's for queries that are NOT over the threshold ---  remains a mystery to me, I have not been able to figure it out, or figure out under what conditions this is reproducible or actually is happening. But it appears to be. In my actual app, every single query executed in an app-created thread ends up logging an explain, even if the explain logged says it took 0.0ms to complete! But I have not been able to figure out why or what's going on.

Could you please provide a minimal application that reproduces this?

Contributor

jrochkind commented Jan 25, 2012

Your call what it should do, but I feel reasonably strongly that the
current behavior is NOT appropriate: raising with 'calling 'warn' on
nil' but only as a surprise in cases where the explain logging is
triggered. That seems a clear bug to me.

But my argument is that developers (like me) will occasionally toggle
active_record.logging on and off during development, when I want to
focus on other issues and AR logging is getting in the way. Under your
suggestion, every time I want to turn AR logging on or off, I have two lines
in my env.rb to edit instead of one, if I do want explain logging when
AR logging is turned on.

It also seems to me that every other call to logger I can find in AR
is guarded to no-op if there's no logger -- I am not sure if there are
other AR logging that is controlled by other config or not, but if
there is, I think that too is guarded to no-op if there's no logger,
because I could not find a logging statement that was not so guarded.
So it seems to me this one should too, but I dunno.

As far as the other issue -- I will try to create a reproducible test,
I have not been able to yet. If I do, I'll open a seperate issue? I
don't have privileges to change name of issues, but you might want to
change the name of this one to, say, "config.logger = nil conflicts
with config.config.active_record.auto_explain_threshold_in_seconds" to
be more accurate as to what this ticket has ended up reporting.

Thanks for your attention!

Contributor

jrochkind commented Jan 25, 2012

On the other issue (explain getting logged in multi-threaded env even when query is not slow), I am having trouble reproducing in a a minimal application.

One hypothesis I have is maybe: Even under ruby 1.9.3, in a busy multi-threaded app, checking a connection out of the connection pool can itself take more than the default explain threshold, and the timer ends up including that time when calculating if an explain is required. Maybe.

If someone knows the code and can say if this seems plausible, or has any other ideas, would appreciate feedback. Trying to reproduce in a minimal app, but, sadly like all multi-threaded issues, it's very hard to extract the minimal reproduction case out of my app. However, in my app, it is pretty deterministic, nearly every query inside a thread and a #with_connection ends up getting explain-logged, even if they are reported as only taking 0ms to complete. Very frustrating.

Owner

fxn commented Jan 25, 2012

Yes, I see your point, but this case is a bit different in my view, if the threshold is not nil the user is asking Active Record to log slow queries, that's inconsistent with logger = nil. In ordinary AR code traces are informative so it makes sense to check if there's a logger, the user didn't ask explicitly for them.

Regarding the span of time, the implementation sets a few thread local variables, and measure only what happens in AR::Relation#to_a (https://github.com/rails/rails/blob/master/activerecord/lib/active_record/relation.rb#L150-162) or in AR::Base.find_by_sql (https://github.com/rails/rails/blob/master/activerecord/lib/active_record/querying.rb#L36-40). When the code reaches that point the connection is already there.

It could be the case that you see more queries logged in a multi-thread environment because elapsed time is measured using Time.now at the beginning and end of the block. I don't really know if there's a way to compute elapsed time within a thread.

Anyone has a suggestion to address that?

Contributor

jrochkind commented Jan 25, 2012

Thanks.

Hmm, there's no way for there to be more than one thread within a
single call to find_by_sql or relation#to_a, that I can think of. So
the algorithm you describe seems as if it should be just fine, and
it's still a mystery why my app is deciding to 'explain' every query
in a manually created thread. I will continue trying to isolate a
simple reproduction case, its' very tricky, but it's certainly
possible the problem is related to some unusual situation particular
to my app.

As far as logger, I still believe that logger = nil should take
precedence, and suspect we could find analagous cases in other parts
of Rails (can't find any in AR) -- but as long as we agree that the
current behavior is a bug that should be fixed somehow, I'll leave it
to the committers to decide how.

Contributor

jrochkind commented Jan 25, 2012

Okay, a couple of analagous cases.

What if you have config.logger = nil, but

  • active_support.deprecation_behavior = :log

or

  • active_record.log_runtime = true

I suspect that both of these things you've said to send something to
logs will just no-op if there is no logger. Should they instead raise
at boot-time? Should the explain-slow-queries raise at boottime both
if active_record.logger is nil and if config.logger in general is
nil and config.active_record.logger is default (use config.logger)?
(we all agree they should not wait to raise until they happen, right?
Raise only when deprecation behavior is encountered, and you've said
:log but there's no logger? No!)

Perhaps they should raise at boot time, but I think not. Right now
there are not actually THAT many configurable "do I log or not?"
options in Rails. But presumably the number of such options will
steadily increase (or be provided by third party optional gems). I
suggest it is very typical developer behavior to want to toggle
logging on and off, but when it's on have it log all the stuff. The
approach you suggest, the developer would need to go change a whole
bunch of lines every time she wants to turn logging on or off to
accomplish this.

I think active_record.log_runtime may even default to true? Carrying
your approach forward consistently, if a developer set config.logger
or config.active_record.logger to nil, she'd also need to go change
config.active_record.log_runtime from it's default, to avoid a raise
on boot? Madness! It's quite possible
config.auto_explain_threshold_in_seconds could be given a non-nil
default value in the future, resulting in the same situation. The
simplest and most obvious and most maintainable thing to do, seems to
me, is if you've disabled logging, then things you've also configured
to log simply won't be able to, rather than raising. Perhaps a warning
on boot, realizing that, yes, many people will ignore warnings, but
still.

Contributor

jrochkind commented Jan 25, 2012

Also, however, and I'll try to make this the last thing I say on this:

Right now at present, the default starting rails 3.2 app does not
have auto_explain_threshold_in_seconds in production, but does have
it commented out, suggesting you can un-comment to get the behavior in
production.

Rails does not usually log AR in production. But I have no idea if
this is done by setting ActiveRecord::Base.logger to nil, or by some
other magic (that it's just logging at a 'debug' level? That something
in AR is hard-coded to do something different for env == 'production'?
No idea.).

Should probably consider what happens if a user un-comments the
auto_explain_threshold_in_seconds line in production.rb , and I get
confused even thinking about what the 'right' thing is, rails logging
has gotten a bit inconsistent and hard to predict in general.

Owner

fxn commented Jan 25, 2012

@jrochkind yes, let's think about it. I am not opposed to do something different if logger = nil, only wrote some ideas in favor of the current behaviour, but also open to consider other solutions and to listen to your arguments and the opinion of other people in core.

It could be the case that we need to take a decision and make sure those other config params you mention behave consistently in this regard.

Contributor

jrochkind commented Jan 25, 2012

in favor of the current behaviour

You don't really mean that, right? I find what I thought was your
proposed solution acceptable but not ideal: raise on app startup.

I do not find the the current behavior at all acceptable. The current
behavior is just a time-bomb -- app starts out fine with the
problematic configuration setup, but will raise only when and if a
query over the threshold is encountered, which could happen at any non-
predictable future point, surprising the user with an exception at an
indeterminate point in the future, and an exception with an unclear
class/message on top of that (NoMethod, calling #warn on nil), which
takes a look at the rails source code to even figure out was caused by
the (arguably) contradictory config.

I found it right away only because my app is ALSO weirdly explain-
logging all queries, even short ones. But in a typical app not subject
to that issue, this is just a time-bomb waiting to confuse someone.

There are other acceptable solutions that are not the one I propose
-- although the one I propose is probably the simplest to implement.
But I think the current behavior is pretty clearly unacceptable. You
disagree?

Owner

fxn commented Jan 26, 2012

No, no, I agree. I actually had fail fast vs silence in mind, rather than leaving the code as is. Raising an exception from somewhere deep in the framework is not acceptable.

@fxn fxn closed this in 2483460 Jan 26, 2012

Owner

fxn commented Jan 26, 2012

@jrochkind after thinking about this I think you're right and giving logger = nil a high prority makes sense in the most common case.

We then relax the definition of the threshold as a solution: If the threshold is nil, automatic EXPLAIN is disabled, otherwise it is a threshold that is taken into account as long as there is a logger. I have added a note about this in its guide, so we deal with the "inconsistency" via documentation.

arunagw pushed a commit to arunagw/rails that referenced this issue Jan 26, 2012

Contributor

jrochkind commented Jan 26, 2012

Cool, thanks for your time on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment