Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracing: Trace to metrics default range #72433

Merged
merged 2 commits into from Jul 31, 2023

Conversation

joey-grafana
Copy link
Contributor

What is this feature?

Adds a default range to our trace to metrics feature.

Why do we need this feature?

So trace to metrics works out of the box with useful results.

Who is this feature for?

Tracing users.

Which issue(s) does this PR fix?:

Fixes #64872

Special notes for your reviewer:

Please check that:

  • It works as expected from a user's perspective.
  • If this is a pre-GA feature, it is behind a feature toggle.
  • The docs are updated, and if this is a notable improvement, it's added to our What's New doc.

@github-actions
Copy link
Contributor

Backend code coverage report for PR #72433
No changes

@github-actions
Copy link
Contributor

Frontend code coverage report for PR #72433

Plugin Main PR Difference
explore 86.41% 86.41% 0%

@joey-grafana joey-grafana marked this pull request as ready for review July 27, 2023 12:36
@joey-grafana joey-grafana requested review from a team as code owners July 27, 2023 12:36
@joey-grafana joey-grafana requested review from joshhunt and ashharrison90 and removed request for a team, joshhunt and ashharrison90 July 27, 2023 12:36
@joey-grafana joey-grafana merged commit 813e471 into main Jul 31, 2023
24 checks passed
@joey-grafana joey-grafana deleted the joey/trace-to-metrics-default-range branch July 31, 2023 10:24
@aocenas
Copy link
Member

aocenas commented Aug 1, 2023

@joey-grafana why 2m? Seems very arbitrary and not necessary for long spans. Why don't we just make the time range have some minimal value like 1s or 1m if it is smaller?

@joey-grafana
Copy link
Contributor Author

So we discussed a time range of about 5 minutes during stand up/in the linked issue a few times so thought that an even 2 either side would suffice. Why is this not necessary for long spans? Are you suggesting to change the start range to -1s and end range to 1s (or 1m)? Up until now we we had no range which meant a trace to metrics link click would basically not show any data in the metrics query unless the range was manually was configured. Now it will work out of the box and I am hoping we get feedback over time so we can tweak the default range.

@adrapereira
Copy link
Contributor

Any value we pick here will be arbitrary unless there's some specific metric we can point to as the average time range set by users. If we don't have it then we should pick a sensible default, be it 1s, 1m, 2m, 5m, etc. Without metrics to back our claims there's really no way to know what the best value is for this so 2m is not better or worse than 1s or 1m.

@aocenas
Copy link
Member

aocenas commented Aug 1, 2023

So how this works now as far as I know is that we take the span duration and start time and use that as a time range. So I don't think the issue is we don't have a range it's just small (and btw other issue is grafana UI does not show <1s diff in the time range that is why it looks like there is no difference between start and end).

So the question is similar to logs as what is the purpose of going to the metric, I assume it's to see what was happening during the time of the span. If you have 1s of span and show 4min of timerange that is going to be lots of data that isn't actually connected to the span where you clicked on the link.

So imho the requirements would be keep the timerange as close to the span start-end while actually getting some data. To me that seems better served by using the span timerange but also making sure we have at least enough time range to show something. The min time range btw would probably be best defined as a multiple of a prometheus metric scrape interval (which says how often it has data points) though not sure we can easily get that information.

Alternatively, we could use some arbitrary time range offsets which should be big enough to get some data and that maybe show a marked region of the true time range of the span which would both give wide context while showing which subset to focus on.

@joey-grafana
Copy link
Contributor Author

To be clear, what I mean here is that we don't have a large enough range to show any results. I'm aware we don't show <1s so but the user never actually sees any metrics for their trace at all because of the range.

I think with logs we have to be a lot more careful as there can literally be thousands of logs over a very short amount of time but with metrics we're rendering a graph over time and it's easier to drill down into the results they want. Again, with the previous default range of 0 users would see no metrics at all. The goal here was to present the user with some data as opposed to no data, while allowing us time to gather feedback and adjust as necessary.

Agreed that we want to keep the time ranges as close to the span start-end while also getting data. Definitely don't want to present the user with too much metrics data when they click through a link but I'm just not sure what is the best range and why. It feels like a bit of a guess at this point in time. If there is a particular time range you have in mind, please lmk.

@aocenas
Copy link
Member

aocenas commented Aug 1, 2023

@joey-grafana
As I mentioned in the comment one of these approaches would work better imho:

  • min(span.timerange, N * scrape interval)
  • arbitrary timerange offset but highlight the span.timerange region in the graph
  • or probably a combination of those even

Seems like you fish for a particular value but as I said I don't think that makes sense overall and there is no "good" value whoever we would ask. That is why I think we should change the approach here.

sarahzinger pushed a commit that referenced this pull request Aug 1, 2023
* Update default range

* Update tests
@joey-grafana
Copy link
Contributor Author

Highlighting the timerange in the graph would be neat. I didn't fully understand what you were suggesting before but this makes more sense. Not sure how plausible these are atm but with some work could improve on our current approach :) I'll open an issue for this.

chauchausoup pushed a commit to chauchausoup/grafana that referenced this pull request Sep 15, 2023
* Update default range

* Update tests
@zerok zerok modified the milestones: 10.2.x, 10.2.0 Oct 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tracesToMetrics uses exact time not a range when jumps to metrics so the graph is empty
4 participants