New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tracing: Trace to metrics default range #72433
Conversation
Backend code coverage report for PR #72433 |
Frontend code coverage report for PR #72433
|
@joey-grafana why 2m? Seems very arbitrary and not necessary for long spans. Why don't we just make the time range have some minimal value like 1s or 1m if it is smaller? |
So we discussed a time range of about 5 minutes during stand up/in the linked issue a few times so thought that an even 2 either side would suffice. Why is this not necessary for long spans? Are you suggesting to change the start range to -1s and end range to 1s (or 1m)? Up until now we we had no range which meant a trace to metrics link click would basically not show any data in the metrics query unless the range was manually was configured. Now it will work out of the box and I am hoping we get feedback over time so we can tweak the default range. |
Any value we pick here will be arbitrary unless there's some specific metric we can point to as the average time range set by users. If we don't have it then we should pick a sensible default, be it 1s, 1m, 2m, 5m, etc. Without metrics to back our claims there's really no way to know what the best value is for this so 2m is not better or worse than 1s or 1m. |
So how this works now as far as I know is that we take the span duration and start time and use that as a time range. So I don't think the issue is we don't have a range it's just small (and btw other issue is grafana UI does not show <1s diff in the time range that is why it looks like there is no difference between start and end). So the question is similar to logs as what is the purpose of going to the metric, I assume it's to see what was happening during the time of the span. If you have 1s of span and show 4min of timerange that is going to be lots of data that isn't actually connected to the span where you clicked on the link. So imho the requirements would be keep the timerange as close to the span start-end while actually getting some data. To me that seems better served by using the span timerange but also making sure we have at least enough time range to show something. The min time range btw would probably be best defined as a multiple of a prometheus metric scrape interval (which says how often it has data points) though not sure we can easily get that information. Alternatively, we could use some arbitrary time range offsets which should be big enough to get some data and that maybe show a marked region of the true time range of the span which would both give wide context while showing which subset to focus on. |
To be clear, what I mean here is that we don't have a large enough range to show any results. I'm aware we don't show <1s so but the user never actually sees any metrics for their trace at all because of the range. I think with logs we have to be a lot more careful as there can literally be thousands of logs over a very short amount of time but with metrics we're rendering a graph over time and it's easier to drill down into the results they want. Again, with the previous default range of 0 users would see no metrics at all. The goal here was to present the user with some data as opposed to no data, while allowing us time to gather feedback and adjust as necessary. Agreed that we want to keep the time ranges as close to the span start-end while also getting data. Definitely don't want to present the user with too much metrics data when they click through a link but I'm just not sure what is the best range and why. It feels like a bit of a guess at this point in time. If there is a particular time range you have in mind, please lmk. |
@joey-grafana
Seems like you fish for a particular value but as I said I don't think that makes sense overall and there is no "good" value whoever we would ask. That is why I think we should change the approach here. |
* Update default range * Update tests
Highlighting the timerange in the graph would be neat. I didn't fully understand what you were suggesting before but this makes more sense. Not sure how plausible these are atm but with some work could improve on our current approach :) I'll open an issue for this. |
* Update default range * Update tests
What is this feature?
Adds a default range to our trace to metrics feature.
Why do we need this feature?
So trace to metrics works out of the box with useful results.
Who is this feature for?
Tracing users.
Which issue(s) does this PR fix?:
Fixes #64872
Special notes for your reviewer:
Please check that: