Fix intermittent errors in trace span comparisons #105
Codecov Report
```diff
@@            Coverage Diff             @@
##             main     #105       +/-   ##
===========================================
- Coverage   73.99%   47.76%   -26.23%
===========================================
  Files          20       19        -1
  Lines        1496     1411       -85
===========================================
- Hits         1107      674      -433
- Misses        293      683      +390
+ Partials       96       54       -42
```
Flags with carried forward coverage won't be shown.
LGTM. I have a minor comment but it doesn't have to be done in this PR.
pkg/transform/spanner.go
```go
	return s.RequestStart >= parent.RequestStart && s.End <= parent.End
}

func (s *HTTPRequestSpan) Timings() (time.Time, time.Time, time.Time) {
```
For better API readability, I'd maybe use named return values in the signature:

```go
func (s *HTTPRequestSpan) Timings() (goroutineStart time.Time, requestStart time.Time, end time.Time)
```

Or even return a struct:

```go
type Timings struct {
	GoroutineStart time.Time
	RequestStart   time.Time
	End            time.Time
}
```
Great suggestion! I'll fix it in this PR.
Thanks, Mario!
This PR is a follow-up to #104.
To nest spans, we compare where each client span's timestamps land to find the matching server span. The earlier approach compared timestamps after converting them to wall-clock time, which can be unstable: in some local stress testing, I noticed that span nesting sometimes didn't work properly.
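A minimal sketch of matching a client span to its server span by containment, performed on raw monotonic nanosecond values instead of converted wall-clock times. The containment test mirrors the comparison in the review diff above; the `span` type and `findParent` helper are hypothetical. One plausible reason the converted times were unstable (an assumption, not stated in the PR) is that converting each timestamp separately can apply slightly different offsets, so a client span at the edge of its server span may appear to straddle it.

```go
package main

import "fmt"

// span is a hypothetical, simplified span holding raw monotonic nanosecond
// timestamps as reported by the probes.
type span struct {
	Name         string
	RequestStart uint64
	End          uint64
}

// within reports whether s lies entirely inside parent, using the same
// containment test as the review diff, but on unconverted values.
func (s span) within(parent span) bool {
	return s.RequestStart >= parent.RequestStart && s.End <= parent.End
}

// findParent returns the first server span that fully contains the client
// span, or false if none does.
func findParent(client span, servers []span) (span, bool) {
	for _, srv := range servers {
		if client.within(srv) {
			return srv, true
		}
	}
	return span{}, false
}

func main() {
	servers := []span{
		{Name: "GET /a", RequestStart: 1_000, End: 5_000},
		{Name: "GET /b", RequestStart: 6_000, End: 9_000},
	}
	client := span{Name: "client call", RequestStart: 6_500, End: 8_000}
	if parent, ok := findParent(client, servers); ok {
		fmt.Println(parent.Name) // GET /b
	}
}
```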
I've changed HTTPRequestSpan so that it stores the original monotonic timestamps captured by the BPF probes, and we convert to wall-clock time only when we need to build the trace or metric span. This should also help with performance and memory consumption: size metrics never have to convert timestamps, and the span channel keeps a smaller memory footprint.
While investigating this issue, I also found a bug: I had attached the HTTP client probes to the Gin registration too, so they were firing twice. Instead, the required modules should notice that we also need the HTTP eBPF program.
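To illustrate the idea of letting module requirements pull in the HTTP program, rather than duplicating its probes on the Gin registration, here is a sketch of a dependency resolver that loads each module exactly once. The module names, the `requires` table, and `resolve` are all hypothetical; they are not the project's actual mechanism.

```go
package main

import (
	"fmt"
	"sort"
)

// requires maps each hypothetical module to the modules it depends on.
// Gin is built on net/http, so it lists "nethttp" instead of registering
// the HTTP client probes itself.
var requires = map[string][]string{
	"gin":     {"nethttp"},
	"nethttp": nil,
}

// resolve returns every module to load, each exactly once, following
// dependencies transitively.
func resolve(found []string) []string {
	seen := map[string]bool{}
	var visit func(m string)
	visit = func(m string) {
		if seen[m] {
			return
		}
		seen[m] = true
		for _, dep := range requires[m] {
			visit(dep)
		}
	}
	for _, m := range found {
		visit(m)
	}
	out := make([]string, 0, len(seen))
	for m := range seen {
		out = append(out, m)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Whether only Gin is detected or both libraries are, the HTTP module
	// appears once, so its probes would be attached only once.
	fmt.Println(resolve([]string{"gin"}))            // [gin nethttp]
	fmt.Println(resolve([]string{"gin", "nethttp"})) // [gin nethttp]
}
```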