Add option to log slow queries and recording rules #1315

Open
AlekSi opened this Issue Jan 14, 2016 · 63 comments

AlekSi commented Jan 14, 2016

Having something like PostgreSQL's log_min_duration_statement would be useful to debug performance problems.

AlekSi changed the title from "Add option to log slow queries" to "Add option to log slow queries and recording rules" on Jan 14, 2016

spinus commented May 24, 2016

+1 for that feature

grobie commented May 24, 2016

I was thinking a few times that it would be great to collect detailed query information, like how many chunks were necessary to compute the result and how many had to be loaded from disk.

fabxc commented May 24, 2016

+1 for the more detailed reporting – especially those feeding into metrics.
We are collecting how long various stages of a query evaluation take but
are not making use of it.

beorn7 commented May 30, 2016

We used to spit out a lot of query information on debug log level. But that's gone now.
I'm currently trying to hunt down an abusive query, but it's not possible anymore.

Just bringing back individual query logging on debug level would already be quite helpful.
If we could also create some useful metrics (mysqld_exporter style), even better.

noushi commented Jul 21, 2016

+1

beorn7 commented Aug 2, 2016

Suggestion for logging:

  • On INFO level, log the API accesses, just the URL, response time, status code. (This could nicely feed into log analysis. An ELK stack and Prometheus could then look after each other... ;)
  • On DEBUG level, bring back (and improve) the comprehensive analysis of query time break up.

Rationale: API calls are not that frequent, and whoever doesn't want to log them can switch to WARN level. On the other hand, we want the comprehensive analysis on DEBUG level, so we need a level for the "classical" query logging you might want to routinely feed into log analysis. If that's not INFO, what would it be?
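
For illustration, a minimal sketch of what the proposed INFO-level API access logging could look like as Go net/http middleware. The wrapper, field names, and log format here are illustrative assumptions, not Prometheus's actual code:

```go
package main

import (
	"log"
	"net/http"
	"time"
)

// statusRecorder captures the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// logAPIAccess logs URL, status code and response time for every request,
// roughly the INFO-level access logging proposed above.
func logAPIAccess(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(rec, req)
		log.Printf("level=info msg=\"API access\" url=%q status=%d duration=%s",
			req.URL.String(), rec.status, time.Since(start))
	})
}

func main() {
	api := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Write([]byte("ok"))
	})
	http.ListenAndServe(":8080", logAPIAccess(api))
}
```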

brian-brazil commented Aug 2, 2016

On INFO level, log the API accesses, just the URL, response time, status code. (This could nicely feed into log analysis. An ELK stack and Prometheus could then look after each other... ;)

This would be way too spammy. API calls are frequent: with a few dashboards auto-refreshing every minute, you could easily be hitting 1 qps. Application logging should aim for no more than a few log lines per minute, and even then that's high.

If these are all to be logged, they belong in some other logging mechanism more suited for request logs - not the application logs.

beorn7 commented Aug 2, 2016

WARN level could be the default. What else, in a completely re-thought logging setup, would be on INFO level? Perhaps INFO is already the place for "spam" and should only be activated explicitly by those interested. (Here at SC, I'd love to have API logging on all our Prometheus servers. There are way bigger log producers around...)

A separate logging mechanism would be a solution, too, but requires plenty of configuration.

brian-brazil commented Aug 2, 2016

The problem here is the conflation of request logs and application logs. We currently have application logs, and the level of logging is appropriate.

discordianfish commented Sep 26, 2016

I think the Prometheus maintainers are all on the same page on logging vs. instrumentation, but I also sometimes find myself in situations where I need information that can't easily be inferred from metrics. We should take this issue as an example of that and solve it in a way that can be applied in similar situations.

In general, don't we already have a query timeout? @beorn7, the query you want to investigate doesn't hit that limit, I assume? If it did hit the limit, it should definitely get logged with error severity, which is probably already the case, right?
So I'm wondering: how far would we get if we only logged timed-out queries (maybe with a reduced timeout?) and improved the telemetry so it's easier to isolate issues with that?

beorn7 commented Sep 26, 2016

Instead of creating a partial solution that only logs slow queries, I'd like to have proper query logging (which can be a separate thing from application logs, as @brian-brazil suggested).

Even in the special case when I'm only interested in investigating slow queries, I want to know if the slow query is new, or if the same query has run before but has degraded in performance.

grobie commented Sep 26, 2016

I think query logging should be configurable, i.e. allow specifying that all queries taking >= X seconds are logged (where X could be 0s to log everything). Adding an option to direct query logging to a different destination than the usual application logs sounds reasonable, but I wouldn't make that mandatory. There are many solutions available to split/filter different log lines from a common source.
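
As a sketch of that suggestion (the flag name, threshold variable, and logger wiring below are assumptions for illustration, not an existing Prometheus option), the check in the query path could look roughly like this, using a go-kit-style logger:

```go
package main

import (
	"os"
	"time"

	"github.com/go-kit/kit/log"
	"github.com/go-kit/kit/log/level"
)

// slowQueryThreshold would come from a hypothetical flag such as
// --query.log-min-duration; 0 would mean "log every query".
var slowQueryThreshold = 500 * time.Millisecond

// maybeLogQuery logs the query expression and its runtime when evaluation
// took at least the configured threshold.
func maybeLogQuery(logger log.Logger, expr string, took time.Duration) {
	if took >= slowQueryThreshold {
		level.Info(logger).Log("msg", "slow query", "expr", expr, "duration", took)
	}
}

func main() {
	logger := log.NewLogfmtLogger(log.NewSyncWriter(os.Stdout))
	// In the real query path this would wrap query evaluation; here we just
	// simulate a query that took 2s.
	maybeLogQuery(logger, `sum(rate(http_requests_total[5m]))`, 2*time.Second)
}
```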

matthiasr commented Sep 27, 2016

For the record, I for one am not on the same page with

Application logging should aim for no more than a few log lines per minute, and even then that's high.

Having grown up with the manifesto, I expect applications to log "everything they do" (on the order of one line per request) to a single stream (stdout). If these log lines are reasonably well structured, it's easy to filter them down. A single grep on a weak laptop can easily keep up with the query load of a Prometheus server.

What is the benefit of separate request logs?

brian-brazil commented Sep 27, 2016

It's difficult to find one application-level message mixed among 10k request logs.

fabxc commented Sep 28, 2016

Not with proper structured logging.

brian-brazil commented Sep 28, 2016

I don't think we can assume that the user has the tooling and infrastructure in place to take advantage of that in an emergency. I'm working on the basis that the user has tail, and maybe grep.

fabxc commented Sep 28, 2016

Grep seems perfectly sufficient to filter messages in structured logging. I'm not making a case for any level of verbosity here, but I think with structured logging it's way less of an issue.
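
To illustrate the point, a small sketch (the field names are made up, not Prometheus's actual log schema) of how logfmt-style structured output keeps request-style lines separable with plain grep:

```go
package main

import (
	"os"
	"time"

	"github.com/go-kit/kit/log"
)

func main() {
	logger := log.NewLogfmtLogger(log.NewSyncWriter(os.Stdout))

	// One request-style line and one application-style line; a plain
	//   grep 'component=query'
	// (or grep -v) on the combined stream is enough to separate them.
	logger.Log("component", "query", "expr", `up{job="node"}`, "duration", 42*time.Millisecond)
	logger.Log("component", "tsdb", "msg", "head GC completed", "duration", 59*time.Millisecond)
}
```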

brian-brazil commented Sep 28, 2016

I said maybe grep. There is more than one system out there which can only give you a raw dump of application logs in the browser, with no filtering ability.

juliusv commented Sep 28, 2016

Yeah, I'm with Dave Cheney there. Request/audit logs are a different beast from application logs. The former log a couple of times a day and are read by humans (so structured logging is less relevant there); the latter log potentially thousands of times a second and are usually processed by machines (so good structure is important there). So generally it would make sense to keep them separate - then you can also have different retention times, manage them differently, act on them differently, etc.

But in practice, we have the problem that we don't want to write to any files from Prometheus, and all we have are stdout/stderr (which also don't seem like quite the right split for app/request logs). So I have no idea whether we should ever support the more request-log-style logging (and what would we include in there? every scrape? every query?), or whether the best compromise is to mix a very low amount of request logging (like slow queries, if configured) into the application log stream.

beorn7 commented Sep 28, 2016

Another interesting case is the crash-recovery log. It is by far the most copious chunk of logs Prometheus is creating at the moment. At the same time, it's also the kind of logs I have looked at most often for forensics (not only as a Prometheus dev, also as a user).
Since crash recovery writes to files anyway (namely the orphaned directory), we could also create a separate file for the crash recovery logs.

Logging as a whole is definitely an area we should invest some more careful thought in.

gouthamve commented Mar 6, 2017

Restarting the discussion on this: I would like to give this a shot over the next week. I would like to know:

  1. Are we going to expose them as metrics (as @fabxc suggested)?
  2. What exactly should be logged, and in what format?
  3. Should we log all slow queries based on a log_min_duration flag, or pass a parameter to the API to control whether or not to log the query?

Thanks

brian-brazil commented Mar 6, 2017

There's already a PR out for metrics I believe.

Logrus supports writing to a separate file, so that might be the way to go.

beorn7 commented Mar 6, 2017

I'd just love to have the ability to log every single query with its runtime. Personally, I don't feel strongly about whether the logging is controlled by log level and/or by configurable output files.

tamsky commented Jan 24, 2018

#3491 exposes a per-recording-rule evaluation time metric.

free commented Jul 19, 2018

One of my Prometheus instances just got killed by what I can only assume was a (series of) heavy user queries that caused it to run out of memory after some tens of minutes of pounding: memory usage kept growing, Prometheus was unresponsive (scrapes failed for minutes at a time), CPU was pegged, and /query_range latency reached 2 minutes. (I also managed to get a CPU profile during a later occurrence, and most of the time was being spent on query evaluation triggered by HTTP requests.)

The problem is that, a year and a half after this issue was filed, there was still no mention in the logs of queries having timed out, except for a bunch of rule evaluations, which I know under normal conditions take milliseconds and thus were not the cause of the CPU/memory load. So I am still entirely clueless about what went down just before Prometheus got killed.

Would anyone mind if I proposed a PR adding logging of timed out queries only (no additional threshold or flag for it), for now? Just to get the ball rolling?
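
A sketch of that minimal version (the function and log format below are illustrative, not an actual Prometheus PR): after evaluation, check whether the query's context hit its deadline and only then log the expression and how long it ran:

```go
package main

import (
	"context"
	"log"
	"time"
)

// logIfTimedOut logs a query only when it was cut short by the evaluation
// timeout, i.e. its context deadline was exceeded.
func logIfTimedOut(ctx context.Context, expr string, started time.Time) {
	if ctx.Err() == context.DeadlineExceeded {
		log.Printf("level=info msg=\"query timed out\" expr=%q ran_for=%s",
			expr, time.Since(started))
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()

	start := time.Now()
	<-ctx.Done() // stand-in for a query evaluation that ran into the timeout
	logIfTimedOut(ctx, `sum by (job) (rate(http_requests_total[5m]))`, start)
}
```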

brian-brazil commented Jul 19, 2018

I don't see a problem with adding debug logging of all queries. A query timing out is not an error from Prometheus's standpoint, so should be no more than a debug log.

free commented Jul 19, 2018

I'll try to keep my frustration to myself, but what use is logging it at debug level? Should I enable debug logging on all production instances of Prometheus to find out why they are blowing out their RAM allocation?

I'm not saying it should be an error; info is perfectly fine for me, as long as Prometheus (in its default configuration) gives a clue about why it was using 6x the amount of memory it normally uses. Normally the Prometheus output consists entirely of TSDB WAL and compaction messages. Just how useful is the following?

level=info ts=2018-07-18T19:00:02.41379889Z caller=head.go:348 component=tsdb msg="head GC completed" duration=59.4369ms
level=info ts=2018-07-18T19:00:04.301289768Z caller=head.go:357 component=tsdb msg="WAL truncation completed" duration=1.88729339s

brian-brazil commented Jul 19, 2018

A query timing out is uninteresting to overall Prometheus health in the same way a failed scrape is uninteresting, and such occurrences should not spam the logs as application logs should be kept to application-level issues.

juliusv commented Jul 19, 2018

I agree that this information shouldn't be included in the logs by default, but I can see how providing access to such information without having to enable debug logging everywhere would be useful. Is it maybe time to think about another in-memory store of recent debug-level data (like latest timed-out queries and such) that can be retrieved via the web UI + API?
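
For what it's worth, a sketch of such an in-memory store (all names below are hypothetical): a fixed-size ring of the most recent timed-out queries that a web UI or API handler could serve:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// timedOutQuery is one entry in the ring of recent timed-out queries.
type timedOutQuery struct {
	Expr string
	When time.Time
}

// queryRing keeps only the most recent N entries; an API handler could
// return Recent() as JSON.
type queryRing struct {
	mu      sync.Mutex
	entries []timedOutQuery
	next    int
	full    bool
}

func newQueryRing(size int) *queryRing {
	return &queryRing{entries: make([]timedOutQuery, size)}
}

// Add records a timed-out query, overwriting the oldest entry when full.
func (r *queryRing) Add(q timedOutQuery) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.entries[r.next] = q
	r.next = (r.next + 1) % len(r.entries)
	if r.next == 0 {
		r.full = true
	}
}

// Recent returns the stored entries, oldest first.
func (r *queryRing) Recent() []timedOutQuery {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.full {
		return append([]timedOutQuery(nil), r.entries[:r.next]...)
	}
	return append(append([]timedOutQuery(nil), r.entries[r.next:]...), r.entries[:r.next]...)
}

func main() {
	ring := newQueryRing(10)
	ring.Add(timedOutQuery{Expr: `sum(rate(http_requests_total[5m]))`, When: time.Now()})
	fmt.Println(ring.Recent())
}
```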

free commented Jul 19, 2018

A query timing out is uninteresting to overall Prometheus health in the same way a failed scrape is uninteresting, and such occurrences should not spam the logs as application logs should be kept to application-level issues.

You are, (very) strictly speaking, correct. How about the case of a query timing out at a point in time when memory usage is 5x what it was over the previous 24 hours? Is that still uninteresting to overall Prometheus health? (Leaving aside the importance to overall Prometheus health of "head GC [having] completed" in exactly 59.4369ms.)

And regarding the "strictly speaking" part, how often does a query time out after 2 minutes without having used up multiple GB of RAM? Is that still uninteresting? I mean I am all for identifying the very query that pushed Prometheus over the top (whatever that means, since Prometheus is not necessarily aware of how much memory it is supposed to use before it gets killed), but seeing how this feature request went exactly nowhere in a year and a half, I'm not exactly holding out hope for this ever happening. In my personal view, spamming the log with all 10 queries that timed out because of the one query-of-death is much preferable to holding out for the perfect solution forever.

Is it maybe time to think about another in-memory store of recent debug-level data (like latest timed-out queries and such) that can be retrieved via the web UI + API?

What use would that have been in my case, where Prometheus got killed for blowing out its memory allocation? Should I have it page me when it gets to 90%, in the (highly unlikely, as Prometheus was wholly unresponsive for the last 10 minutes before getting killed) hope that I can quickly get in an API query to check what's killing it?

I get the argument for keeping application and request logging separate, but unless someone is going to actually implement request logging, it's not much of an argument, is it?

brian-brazil commented Jul 19, 2018

Is it maybe time to think about another in-memory store of recent debug-level data (like latest timed-out queries and such) that can be retrieved via the web UI + API?

I've been hoping that we'd be doing that via tracing for the past 2-ish years, but the technology still doesn't seem to be at that point. That's why I ended up hacking up something myself for the blackbox exporter.

In my personal view, spamming the log with all 10 queries that timed out because of the one query-of-death is much preferable to holding out for the perfect solution forever.

I see no problem with printing out the queries that were running when a Prometheus is killed, however that is not the same thing as logging all slow queries. When I last investigated this, there was also no way to hook into an OOM kill from Go, so this isn't technically feasible.

ipstatic commented Jul 19, 2018

What about having a separate log dedicated only to slow queries? It can be disabled by default but enabled via a flag. We have been bitten by this many times as well, and it would be great to see which rules are causing the issues. It frustrates our users when we tell them "one or more of your rules are performing badly, please find and fix them".

juliusv commented Jul 19, 2018

What about having a separate log dedicated only to slow queries? It can be disabled by default but enabled via a flag.

I was thinking that too, or having a flag to log all slow queries at info vs. debug level? Though that gets kind of into the territory of per-component log level options (which could be interesting, just wondering whether we should introduce it ad-hoc like this, and in a way where we have to guarantee feature stability of the flag).

Good point about a UI/API not being useful when the server is already dead.

free commented Jul 19, 2018

I see no problem with printing out the queries that were running when a Prometheus is killed, however that is not the same thing as logging all slow queries.

I'd be more willing to agree with that if said "slow queries" were using a fraction of Prometheus' memory footprint, but when one slow query can easily allocate more memory than everything else put together that easily crosses into "interesting to overall Prometheus health" territory in my eyes. Or is everyone else using Prometheus with 50GB+ of RAM? (Which would indeed make any one "slow query" merely a data point in a log.)

roganartu commented Jul 19, 2018

Or is everyone else using Prometheus with 50GB+ of RAM? (Which would indeed make any one "slow query" merely a data point in a log.)

FWIW, in my experience even a Prometheus with 100GB of RAM or more still doesn't make a slow query just a data point in every case. It's still very possible for a single slow query to take more memory than everything else and OOM a node that big within a single-digit-minute timeout period.

Your problem seems somewhat related to #4383, @free. I have a PR up for it that needs some work. I plan to clean it up over the next few weeks when I get some more time to revisit it, but it should also be possible to tie in logging of the oversized queries in whatever way is decided acceptable, which might help.

ipstatic commented Jul 19, 2018

I'll add that we have very high-usage Prometheus instances with 64GB of RAM that run into this issue as well. As @roganartu mentioned, we dedicate most of that memory to in-memory metrics, so it only takes a bad query or two for the OOM killer to kill Prometheus. Note we are still on Prometheus 1.x, so I have no idea whether 2.x helps solve any of these problems yet.

free commented Jul 25, 2018

So, could someone with the power to later approve a PR decide on a realistic solution? (I.e. not "printing out the queries that were running when a Prometheus is killed, [which] isn't technically feasible".) Someone might be more willing to give it a go as long as there exists a chance of their work not being entirely wasted.

Proposed (realistic) solutions so far:

  • log all queries that time out;
  • log all queries that time out while memory utilization is above some (static or dynamic) threshold;
  • log all, timed-out, or slow queries to a separate request log;
  • introduce per-component log levels (combined with logging timed-out queries at some lower level) -- although this would require explicit setup, so it wouldn't be very useful out of the box.

One other idea would be to record a timeseries for each of the last (say) 10 queries that have timed out, although that would be a weird workaround to simply logging them.

brian-brazil commented Jul 25, 2018

One other idea would be to record a timeseries for each of the last (say) 10 queries that have timed out, although that would be a weird workaround to simply logging them.

That's event logging, so it doesn't belong in the Prometheus TSDB; however, that's roughly how I'd expect a tracing system accessible from the UI to work.

RichiH commented Jul 25, 2018

Personally, I would argue that STDOUT/ERR at the default log level are the best place for this; Pareto, etc.

Having a counter for timed-out queries would enable operators to know when to take a look.

While a web UI would be a nice extra, this data will be gone if the OOM killer comes along, so it cannot be the only way to expose it.

brian-brazil commented Jul 25, 2018

As has been indicated many times above, these are debug logs at best in our current application logs. If we want to log individual queries beyond that we need a new type of log system that's suitable for request logs or have some in-memory store of recent queries (similar to the blackbox exporter).

If we don't have a counter for timed out queries, we should create one.

We can't log anything when the OOM killer comes through, as Prometheus was just killed. We should not focus discussion on a class of log we can't create.
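
A sketch of such a counter using client_golang (the metric name below is made up for illustration and is not an existing Prometheus metric):

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// queryTimeoutsTotal would be incremented from the query path whenever a
// query is aborted by the evaluation timeout.
var queryTimeoutsTotal = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "engine_query_timeouts_total",
	Help: "Total number of queries aborted by the query timeout.",
})

func main() {
	prometheus.MustRegister(queryTimeoutsTotal)

	queryTimeoutsTotal.Inc() // called where the timeout is detected

	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":9099", nil)
}
```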

RichiH commented Jul 25, 2018

This has been pointed out above, but I disagree.

If a query OOMs Prometheus before the query takes more than X time, we can't log it. If it's logged after X time and Prometheus OOMs afterwards, the user would have a (hopefully correct) lead to follow. For debugging and in case of repeated OOMs, the user would set X lower and lower until they have a hit.

Still, operational sanity would mandate that this is on by default.

brian-brazil commented Jul 25, 2018

My definition of operational sanity includes not having debug or request logs mixed in with application logs, as I've seen where that ends up. A better approach is to work on preventing OOMs, not to build complicated code and human procedures to try and catch them by chance.

RichiH commented Jul 25, 2018

I agree that avoidance is better.

Yet, the quick fix until we do have avoidance is to make the failure condition easier to track down.

brian-brazil commented Jul 31, 2018

One thing we could do is maintain a list of running queries in a file, and print them on startup if the file isn't empty.
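
Roughly, a sketch of that idea (the types and file name below are illustrative, not Prometheus's implementation): mirror the set of in-flight queries into a small file as they start and finish, and on startup print whatever the file still contains, since those were the queries running when the previous process died:

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// activeQueries mirrors the set of in-flight queries into a small file.
type activeQueries struct {
	mu      sync.Mutex
	path    string
	running map[string]struct{}
}

// newActiveQueries prints any leftover content from a previous run (the
// queries that were in flight when that process died) and starts fresh.
func newActiveQueries(path string) (*activeQueries, error) {
	if b, err := os.ReadFile(path); err == nil && len(b) > 0 {
		fmt.Printf("queries running when the previous instance stopped:\n%s", b)
	}
	a := &activeQueries{path: path, running: map[string]struct{}{}}
	return a, a.flush()
}

// Start and Done keep the file in sync with the running set; flush errors
// are ignored in this sketch.
func (a *activeQueries) Start(expr string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.running[expr] = struct{}{}
	a.flushLocked()
}

func (a *activeQueries) Done(expr string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	delete(a.running, expr)
	a.flushLocked()
}

func (a *activeQueries) flush() error {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.flushLocked()
}

// flushLocked rewrites the file with the current set of running queries.
func (a *activeQueries) flushLocked() error {
	f, err := os.Create(a.path)
	if err != nil {
		return err
	}
	defer f.Close()
	for q := range a.running {
		fmt.Fprintln(f, q)
	}
	return nil
}

func main() {
	a, err := newActiveQueries("queries.active")
	if err != nil {
		panic(err)
	}
	a.Start(`sum(rate(http_requests_total[5m]))`)
	// If the process were killed here, the query above would be printed on
	// the next startup.
	a.Done(`sum(rate(http_requests_total[5m]))`)
}
```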

RichiH commented Jul 31, 2018

That would be OK by me, but I am somewhat wary of the I/O cost of that approach versus keeping the bulk of the queries in RAM and only exposing the data when needed. STDOUT/ERR go to journald (or screen/tmux), which also does not have linear I/O cost.

RichiH commented Jul 31, 2018

A middle ground could be to dump slow queries to disk, simply appending. Functionally, though, this is equivalent to STDOUT/ERR without the default OS magic.

We should expose a counter for slow queries either way. Maybe even better: a histogram.
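
A sketch of the suggested histogram using client_golang (the metric name and bucket layout are illustrative); slow queries then show up as counts in the upper buckets without logging each one:

```go
package main

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// queryDuration would be observed once per query in the query path.
var queryDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "engine_query_duration_seconds",
	Help:    "Query evaluation time; slow queries land in the upper buckets.",
	Buckets: prometheus.ExponentialBuckets(0.01, 4, 8), // 10ms up to ~164s
})

func main() {
	prometheus.MustRegister(queryDuration)

	start := time.Now()
	// ... evaluate the query ...
	queryDuration.Observe(time.Since(start).Seconds())

	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":9099", nil)
}
```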

brian-brazil commented Jul 31, 2018

An mmapped file is cheap to update as queries come and go.

ipstatic commented Jul 31, 2018

One thing we could do is maintain a list of running queries in a file, and print them on startup if the file isn't empty.

Would this list be appended to or merely replaced? The biggest issue for us is slow queries that don't quite take down a Prometheus instance but that we need to investigate later (days later). If the file is just updated in place, this does not solve the issue for us; an appended-to file would be much better.

brian-brazil commented Jul 31, 2018

Just replaced. It'd be for finding OOMs, not a general logging mechanism.

arunkjn commented Mar 25, 2019

Hi @brian-brazil, so what is the recommended way (if any) to do this today? I am using Prometheus 2.1. Thanks.
