
Add more logging with traceID #1973

Closed
shanson7 opened this issue Apr 1, 2021 · 2 comments · Fixed by #1972

shanson7 (Collaborator) commented Apr 1, 2021


Is your feature request related to a problem? Please describe.

Metrictank has good logging for incoming render requests, but it has a few shortcomings:

  1. It only logs on completion
    • It would be useful to log before completion, so that if a query node dies due to a large request, we can see what it was working on
  2. It only logs high-level stats (e.g. total runtime)
    • It would be nice to have meta.RenderStats logged on completion as well
  3. It doesn't log the request payload when POST application/json queries come in

Describe the solution you'd like

Add logging of high-level render stats (and maybe some other interesting cases), and include the traceID where possible.
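For illustration, here's a minimal sketch of what the "log before completion, tagged with the trace ID" part could look like, assuming opentracing with the Jaeger client and logrus-style logging; `traceIDFromCtx`, `logRenderStart`, and their parameters are hypothetical, not metrictank's actual code:

```go
package renderlog

import (
	"context"

	opentracing "github.com/opentracing/opentracing-go"
	log "github.com/sirupsen/logrus"
	jaeger "github.com/uber/jaeger-client-go"
)

// traceIDFromCtx pulls the Jaeger trace ID out of the span on the context,
// if there is one, so log lines can be correlated with traces.
func traceIDFromCtx(ctx context.Context) string {
	if span := opentracing.SpanFromContext(ctx); span != nil {
		if sc, ok := span.Context().(jaeger.SpanContext); ok {
			return sc.TraceID().String()
		}
	}
	return "" // no active span, or a non-Jaeger tracer
}

// logRenderStart is a hypothetical pre-completion log line: if the query
// node dies mid-request, this is the last record of what it was working on.
func logRenderStart(ctx context.Context, targets []string, from, to int64) {
	log.Infof("render start: traceID=%s from=%d to=%d targets=%v",
		traceIDFromCtx(ctx), from, to, targets)
}
```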

Describe alternatives you've considered

Having tracing enabled is great for some of this, but it is fairly heavyweight compared to logging and (depending on where you store your traces/logs) harder to search than logs. Also, traces are often sampled (we use 5% sampling). Supplementing them with logs would be great.

Additional context

I put up #1972 with a very basic version of what I'm talking about. I expect there are some interesting things to add (possibly on the read node side as well).

Dieterbe (Contributor) commented:

> It would be useful to log before completion, so that if a query node dies due to a large request, we can see what it was working on

Stating the obvious maybe, but this would double the logging volume. That said, I agree it would be valuable enough, and many clusters don't have such a large request rate that this would be an issue.

> It only logs high-level stats (e.g. total runtime)
>
> It would be nice to have meta.RenderStats logged on completion as well

We have to be careful about how much value we ascribe to such stats (whether the total runtime we already have or the breakdown you want to add):
a query being slow often doesn't say much about the query itself, but rather about the state of the system (e.g. a process may be busy executing other queries, waiting for a lock, etc). So a perfectly lightweight, fast query may be reported as slow if it was held up. As soon as there's any concurrency, this becomes a concern. This is also why I never found much value in "slow query logs".
Specifically, read nodes have resources that can be explicitly occupied by other requests (index locks, read concurrency limits, etc), whereas query nodes can execute a query constrained only by CPU, RAM, and network.
So for read nodes it may be worthwhile to track, when a subquery (a call to "/index/find_by_tag", "/index/find" or "/getdata") finishes, whether at any time during its execution other such queries were running. We could then include that in the "finish" log message on the read node, and also include the flag in the response to the query node, so that when the query node logs the finish, it can note whether any part of the query may have been affected by another. A sketch of such a tracker follows.
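A minimal sketch of that overlap flag, purely illustrative and not metrictank code: every in-flight subquery is registered, and whenever two or more are active at once, all of them get marked as overlapped.

```go
package overlap

import "sync"

// call represents one in-flight subquery.
type call struct {
	overlapped bool
}

// Tracker records, per subquery, whether any other subquery was running
// at any moment during its execution.
type Tracker struct {
	mu     sync.Mutex
	active map[*call]struct{}
}

func NewTracker() *Tracker {
	return &Tracker{active: make(map[*call]struct{})}
}

// Start registers a new subquery. If others are already running, both the
// new call and all existing ones are flagged as overlapped.
func (t *Tracker) Start() *call {
	t.mu.Lock()
	defer t.mu.Unlock()
	c := &call{}
	if len(t.active) > 0 {
		c.overlapped = true
		for other := range t.active {
			other.overlapped = true
		}
	}
	t.active[c] = struct{}{}
	return c
}

// Finish deregisters the subquery and reports whether it ever shared the
// node with another; this flag would go into the "finish" log line and
// into the response returned to the query node.
func (t *Tracker) Finish(c *call) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.active, c)
	return c.overlapped
}
```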

Alternatively, if you seek back far enough in the log, you should be able to infer which queries ran concurrently just by processing the log file. But then you'd also need to tie the read node logs back into the query node's logs.

Maybe I'm overthinking/idealizing things too much. I'm fine with your suggestion in general.

shanson7 (Collaborator, Author) commented:

> a query being slow often doesn't say much about the query

True, but in many cases the number of series/points fetched says quite a lot about the cost of the query. Using those counts in conjunction with the timings and the chunk-pull stats is very useful.
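As a rough sketch of the kind of completion line this implies, here's a hedged example pairing the (load-sensitive) runtime with fetch-volume counters that describe the intrinsic cost of the query. The `renderStats` fields below are placeholders, not the actual fields of metrictank's meta.RenderStats:

```go
package renderlog

import (
	"time"

	log "github.com/sirupsen/logrus"
)

// renderStats is a hypothetical stand-in for meta.RenderStats; the real
// field names in metrictank may differ.
type renderStats struct {
	SeriesFetched   int
	PointsFetched   int
	ChunksFromCache int
	ChunksFromStore int
}

// logRenderFinish logs runtime alongside fetch volume: runtime reflects
// system load, while the counters reflect what the query itself cost.
func logRenderFinish(traceID string, runtime time.Duration, s renderStats) {
	log.Infof("render finish: traceID=%s runtime=%s series=%d points=%d chunksCache=%d chunksStore=%d",
		traceID, runtime, s.SeriesFetched, s.PointsFetched,
		s.ChunksFromCache, s.ChunksFromStore)
}
```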
