Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix intermittent errors in trace spans comparisons #105

Merged
merged 6 commits into from
May 8, 2023

Conversation

grcevski
Copy link
Contributor

@grcevski grcevski commented May 5, 2023

This PR is a follow-up to #104.

For the purpose of nested spans we compare where the client spans land to find the matching server span. The earlier approach used the wall-clock converted times, which can be unstable and in some local stress testing I noticed that sometimes the span nesting doesn't work properly.

I've changed the HTTPRequestSpan such that it contains the original mono time captured by the bpf probes, and we only convert to wall-clock time when we need to make the trace or metric span. This should also help with performance and memory consumption. Namely size metrics will never have to do any conversion on timestamps and we'll keep lower memory footprint on the span channel.

I also found a bug while investigating this issue, I had put the HTTP client probes on the Gin registration too, so they were firing double. The required modules should notice we need the HTTP eBPF program too.

@grcevski grcevski requested a review from mariomac May 5, 2023 19:43
@grcevski grcevski changed the title Fix intermittent trace spans comparisons Fix intermittent errors in trace spans comparisons May 5, 2023
@codecov-commenter
Copy link

codecov-commenter commented May 5, 2023

Codecov Report

Merging #105 (268dc52) into main (274c458) will decrease coverage by 26.23%.
The diff coverage is 85.00%.

@@             Coverage Diff             @@
##             main     #105       +/-   ##
===========================================
- Coverage   73.99%   47.76%   -26.23%     
===========================================
  Files          20       19        -1     
  Lines        1496     1411       -85     
===========================================
- Hits         1107      674      -433     
- Misses        293      683      +390     
+ Partials       96       54       -42     
Flag Coverage Δ
integration-test ?
unittests 47.76% <85.00%> (+0.38%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/ebpf/nethttp/nethttp.go 0.00% <ø> (-100.00%) ⬇️
pkg/export/debug/debug.go 13.79% <0.00%> (-61.21%) ⬇️
pkg/export/otel/metrics.go 67.10% <66.66%> (-9.57%) ⬇️
pkg/export/otel/traces.go 77.23% <100.00%> (+0.30%) ⬆️
pkg/transform/spanner.go 80.95% <100.00%> (-2.39%) ⬇️

... and 11 files with indirect coverage changes

@grcevski grcevski added the bug Something isn't working label May 5, 2023
Copy link
Contributor

@mariomac mariomac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I have a minor comment but it doesn't have to be done in this PR.

return s.RequestStart >= parent.RequestStart && s.End <= parent.End
}

func (s *HTTPRequestSpan) Timings() (time.Time, time.Time, time.Time) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for better API readability, I'd maybe use named return values in the signature:

func (s *HTTPRequestSpan) Timings() (goroutineStart time.Time, requestStart time.Time, end time.Time)

Or even returning a structure:

type Timings struct {
    GoroutineStart time.Time
    RequestStart time.Time
    End time.Time
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion! I'll fix it in this PR.

@grcevski grcevski merged commit 8cf38af into grafana:main May 8, 2023
3 checks passed
@grcevski
Copy link
Contributor Author

grcevski commented May 8, 2023

Thanks Mario!

@grcevski grcevski deleted the fix_traces branch May 8, 2023 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants