Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP client spans #87

Merged
merged 18 commits into from
May 3, 2023
Merged

HTTP client spans #87

merged 18 commits into from
May 3, 2023

Conversation

grcevski
Copy link
Contributor

@grcevski grcevski commented May 2, 2023

This PR re-introduces the HTTP client spans. It does so by tapping into http.client send and then it tracks the goroutine creation to match with a parent HTTP/GRPC span.

The PR might be too large to review, so I think I'll start splitting it off into smaller PRs. I had to do some ground work to make this happen:

  • I split off the goruntime into its own bpf program, since we can mix and match HTTP and GRPC the goroutine uprobes were firing twice.
  • I pinned two maps, since it wasn't possible to properly find parent GRPC server request on HTTP client.
  • I added support for map pinning and cleanup.

@grcevski grcevski requested a review from mariomac May 2, 2023 19:56
@grcevski
Copy link
Contributor Author

grcevski commented May 2, 2023

Hey @mariomac, I'm open to splitting this into multiple PRs, let me know if it's unreasonable to review.

@codecov-commenter
Copy link

codecov-commenter commented May 2, 2023

Codecov Report

Merging #87 (decf7b5) into main (a89145e) will increase coverage by 0.87%.
The diff coverage is 81.12%.

@@            Coverage Diff             @@
##             main      #87      +/-   ##
==========================================
+ Coverage   73.07%   73.95%   +0.87%     
==========================================
  Files          19       20       +1     
  Lines        1315     1463     +148     
==========================================
+ Hits          961     1082     +121     
- Misses        264      289      +25     
- Partials       90       92       +2     
Flag Coverage Δ
integration-test 59.21% <32.65%> (-3.50%) ⬇️
unittests 47.89% <58.72%> (+1.16%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/ebpf/grpc/grpc.go 100.00% <ø> (+5.66%) ⬆️
pkg/goexec/structmembers.go 75.45% <ø> (ø)
pkg/export/otel/metrics.go 68.53% <17.39%> (-9.81%) ⬇️
cmd/otelauto/main.go 59.03% <50.00%> (-3.68%) ⬇️
pkg/ebpf/tracer.go 64.08% <80.00%> (+3.21%) ⬆️
pkg/export/otel/traces.go 80.58% <95.00%> (+7.09%) ⬆️
pkg/ebpf/goruntime/goruntime.go 100.00% <100.00%> (ø)
pkg/ebpf/nethttp/nethttp.go 100.00% <100.00%> (+4.05%) ⬆️
pkg/pipe/instrumenter.go 90.62% <100.00%> (+0.96%) ⬆️
pkg/transform/spanner.go 82.50% <100.00%> (+0.22%) ⬆️

Copy link
Contributor

@mariomac mariomac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely it's a huge PR! Amazing job.

There are some points that we should take into the consideration (see code comments), but it's fine to address them after merging so it's easier to review them.

bpf/go_common.h Outdated
__type(key, void *); // key: pointer to the request goroutine
__type(value, func_invocation);
__uint(max_entries, MAX_CONCURRENT_REQUESTS);
__uint(pinning, LIBBPF_PIN_BY_NAME);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: adjust indentation?

bpf/go_common.h Outdated
__uint(max_entries, MAX_CONCURRENT_REQUESTS);
__uint(pinning, LIBBPF_PIN_BY_NAME);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: adjust indentation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe after this PR is merged, we should also evaluate whether we will have any type of conflict/corruption if we use the same BPF map from multiple auto-instrumenters in the same host.

If that happened, maybe we can try to assign a unique per-process map name on load so they are not shared (it seems Cilium library would allow that).

Another option is to override the /sys/fs/bpf path a unique /var/run/otelauto/... subfolder. LibBPF allows doing it but I don't know if Cilium lib can.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point, I think we should privatise these paths based on the otelauto PID, as well as you suggest adding otelauto as prefix. I don't think it would be a problem to mount another path, we should be good to go with any as long as the mount type is bpf, which we already have to do for systems that don't have the bpf file system auto-mounted.

@@ -33,6 +38,14 @@ func main() {
os.Exit(-1)
}

c := make(chan os.Signal, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be deleted, as the last version in the main branch already implements graceful desktop.

Comment on lines +74 to +83
func erasePinnedMaps() {
for _, m := range fs.PinnedMaps {
slog.Debug("cleaning storage used by pinned object", "map", m)
err := os.Remove(filepath.Join(fs.PinnedRoot, m))
if err != nil {
slog.Error("can't remove pinned map "+m, err)
}
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to a previous comment: we should be careful with this. If there are other otelauto instances running in the host, will they have their runtime data erased?

sp.End(trace2.WithTimestamp(spans[i].End))
span := &spans[i]

if span.Type == transform.EventTypeHTTPClient {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since gocyclo threshold has been decreased to 10, maybe it complains about this method. In that case I'd suggest to split some blocks into methods/functions.

@grcevski
Copy link
Contributor Author

grcevski commented May 3, 2023

Thanks for the thorough review Mario, I'll resolve the conflicts, merge and then work on the issues related to erasing conflicting maps.

@grcevski grcevski merged commit 85071ad into grafana:main May 3, 2023
3 checks passed
@grcevski
Copy link
Contributor Author

grcevski commented May 3, 2023

I'll follow-up with a PR to resolve the path sharing.

@grcevski grcevski deleted the http_client_spans branch May 3, 2023 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants