
Add more logging with traceID #1973

Closed
shanson7 opened this issue Apr 1, 2021 · 2 comments · Fixed by #1972

shanson7 (Collaborator) commented Apr 1, 2021


Is your feature request related to a problem? Please describe.

Metrictank has good logging for incoming render requests, but it has a few shortcomings:

  1. It only logs on completion
    • It would be useful to log before completion, so that if a query node dies due to a large request, we can see what it was working on
  2. It only logs high-level stats (e.g. total runtime)
    • It would be nice to have meta.RenderStats logged on completion as well
  3. It doesn't log the request payload when POST application/json queries come in

Describe the solution you'd like

Add logging of high-level render stats (and maybe some other interesting cases), and include the traceID where possible.
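For illustration, here's a minimal sketch of what the "log before completion, tagged with the trace ID" part could look like, assuming opentracing with the Jaeger client and logrus-style logging; `traceIDFromCtx`, `logRenderStart`, and their parameters are hypothetical, not metrictank's actual code:

```go
package renderlog

import (
	"context"

	opentracing "github.com/opentracing/opentracing-go"
	log "github.com/sirupsen/logrus"
	jaeger "github.com/uber/jaeger-client-go"
)

// traceIDFromCtx pulls the Jaeger trace ID out of the span on the context,
// if there is one, so log lines can be correlated with traces.
func traceIDFromCtx(ctx context.Context) string {
	if span := opentracing.SpanFromContext(ctx); span != nil {
		if sc, ok := span.Context().(jaeger.SpanContext); ok {
			return sc.TraceID().String()
		}
	}
	return "" // no active span, or a non-Jaeger tracer
}

// logRenderStart is a hypothetical pre-completion log line: if the query
// node dies mid-request, this is the last record of what it was working on.
func logRenderStart(ctx context.Context, targets []string, from, to int64) {
	log.Infof("render start: traceID=%s from=%d to=%d targets=%v",
		traceIDFromCtx(ctx), from, to, targets)
}
```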

Describe alternatives you've considered

Having tracing enabled is great for some of this, but it is fairly heavyweight compared to logging and (depending on where you store your traces/logs) harder to search than logs. Also, traces are often sampled (we use 5% sampling). Supplementing them with logs would be great.

Additional context

I put up #1972 with a very basic version of what I'm talking about. I expect there are some interesting things to add (possibly on the read node side as well).

Dieterbe (Contributor) commented:

> It would be useful to log before completion, so that if a query node dies due to a large request, we can see what it was working on

Stating the obvious maybe, but this would double the logging volume. That said, I agree it would be valuable enough, and many clusters don't have such a large request rate that this would be an issue.

> It only logs high-level stats (e.g. total runtime)
>
> It would be nice to have meta.RenderStats logged on completion as well

We have to be careful about how much value we ascribe to such stats (whether the total runtime we already have or the breakdown you want to add):
a query being slow often doesn't say much about the query itself, but rather about the state of the system (e.g. a process may be busy executing other queries, waiting for a lock, etc). So a perfectly lightweight, fast query may be reported as slow if it was held up. As soon as there's any concurrency, this becomes a concern. This is also why I never found much value in "slow query logs".
Specifically, read nodes have resources that can be explicitly occupied by other requests (index locks, read concurrency limits, etc), whereas query nodes can execute a query constrained only by CPU, RAM, and network.
So for read nodes it may be worthwhile to track, when a subquery (a call to "/index/find_by_tag", "/index/find" or "/getdata") finishes, whether at any time during its execution other such queries were running. We could then include that in the "finish" log message on the read node, and also include the flag in the response to the query node, so that when the query node logs the finish, it can note whether any part of the query may have been affected by another. A sketch of such a tracker follows.
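A minimal sketch of that overlap flag, purely illustrative and not metrictank code: every in-flight subquery is registered, and whenever two or more are active at once, all of them get marked as overlapped.

```go
package overlap

import "sync"

// call represents one in-flight subquery.
type call struct {
	overlapped bool
}

// Tracker records, per subquery, whether any other subquery was running
// at any moment during its execution.
type Tracker struct {
	mu     sync.Mutex
	active map[*call]struct{}
}

func NewTracker() *Tracker {
	return &Tracker{active: make(map[*call]struct{})}
}

// Start registers a new subquery. If others are already running, both the
// new call and all existing ones are flagged as overlapped.
func (t *Tracker) Start() *call {
	t.mu.Lock()
	defer t.mu.Unlock()
	c := &call{}
	if len(t.active) > 0 {
		c.overlapped = true
		for other := range t.active {
			other.overlapped = true
		}
	}
	t.active[c] = struct{}{}
	return c
}

// Finish deregisters the subquery and reports whether it ever shared the
// node with another; this flag would go into the "finish" log line and
// into the response returned to the query node.
func (t *Tracker) Finish(c *call) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.active, c)
	return c.overlapped
}
```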

Alternatively, if you seek back far enough in the log, you should be able to infer which queries ran concurrently just by processing the log file. But then you'd also need to tie the read node logs back into the query node's logs.

Maybe I'm overthinking/idealizing things too much. I'm fine with your suggestion in general.

shanson7 (Collaborator, Author) commented:

> a query being slow often doesn't say much about the query

True, but in many cases the number of series/points fetched says quite a lot about the cost of the query. Using those counts in conjunction with the timings and the chunk-pull stats is very useful.
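As a rough sketch of the kind of completion line this implies, here's a hedged example pairing the (load-sensitive) runtime with fetch-volume counters that describe the intrinsic cost of the query. The `renderStats` fields below are placeholders, not the actual fields of metrictank's meta.RenderStats:

```go
package renderlog

import (
	"time"

	log "github.com/sirupsen/logrus"
)

// renderStats is a hypothetical stand-in for meta.RenderStats; the real
// field names in metrictank may differ.
type renderStats struct {
	SeriesFetched   int
	PointsFetched   int
	ChunksFromCache int
	ChunksFromStore int
}

// logRenderFinish logs runtime alongside fetch volume: runtime reflects
// system load, while the counters reflect what the query itself cost.
func logRenderFinish(traceID string, runtime time.Duration, s renderStats) {
	log.Infof("render finish: traceID=%s runtime=%s series=%d points=%d chunksCache=%d chunksStore=%d",
		traceID, runtime, s.SeriesFetched, s.PointsFetched,
		s.ChunksFromCache, s.ChunksFromStore)
}
```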
