Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add host info to traces #33

Merged
merged 9 commits into from
Mar 28, 2023
Merged

Add host info to traces #33

merged 9 commits into from
Mar 28, 2023

Conversation

grcevski
Copy link
Contributor

With this PR we can track the host and the port used on the request. This additional information along with the peer info can be used to create service graphs.

@grcevski grcevski requested a review from mariomac March 23, 2023 23:18
Copy link
Contributor

@mariomac mariomac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running this and testing it with equivalent calls resulted in different hostnames. For example:

$ curl http://localhost:8080/ping
$ curl http://127.0.0.1:8080/ping
$ curl http://0.0.0.0:8080/ping

Printed the following traces:

200 GET /ping [127.0.0.1]->[192.168.5.15(localhost):8080]
200 GET /ping [127.0.0.1]->[192.168.5.15(127.0.0.1):8080]
200 GET /ping [127.0.0.1]->[192.168.5.15(0.0.0.0):8080]

Maybe we could discard the host part from eBPF and only take the port. And follow a similar approach to getLocalIPv4(): reading the hostname once from Go and decorate the flows with it so we make sure it's always the same name. What do you think?

@@ -19,7 +21,7 @@ import (
"github.com/grafana/http-autoinstrument/test/collector"
)

const testTimeout = 5 * time.Second
const testTimeout = 500 * time.Second
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you were debugging and you forgot to revert it back to 5.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh! Thanks for spotting it!

@grcevski
Copy link
Contributor Author

Maybe we could discard the host part from eBPF and only take the port. And follow a similar approach to getLocalIPv4(): reading the hostname once from Go and decorate the flows with it so we make sure it's always the same name. What do you think?

Yeah I knew this is how it worked, I thought a service will mostly use a single name when talking to a downstream service, but you have a very valid point, it will be inconsistent across different upstreams. Service A might use an IP address, service B a DNS name, which may even be hardcoded in /etc/hosts.

Thanks for the suggestion! I'll discard the host name and use reverse dns lookup on the host to find the hostname matching the ip address.

@grcevski grcevski requested a review from mariomac March 27, 2023 18:04
Copy link
Contributor

@mariomac mariomac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work! I have a comment but I'm fine if you merge this and we address it later in another PR.

For the future we will probably need to do this a bit more complex. Usually Cloud users prefer to set here the Instance ID as hostname. We'd need to check/decide this with our potential users.

Also, just to annotate some future work: some users (<1% but they are very noisy with support) have "exotical" network configurations and the returned os.Hostname() switches from time to time (e.g. from hostName to hostName.localDomain and vice-versa).

In that case, we can offer them to configure the hostname Resolution by contacting DNS (see how New Relic does it), and, as a last fallback solution, allow them overriding the host name via environment variables.

}
}
// We ignore the hostname from the request, and use the hostname of the machine
hostname, _ := os.Hostname()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is Ok for now but in a later PR I'd query this periodically in background (e.g. every 5 seconds). Doing that on each trace requires opening and reading the Open("/proc/sys/kernel/hostname"), which involves some system call and, from my previous experience, this increases the system %CPU reported by the process (even if it is just because the process is in wait time)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right, I should've done what I did for ip address. I missed that. As discussed, I'll revert back to using the host name supplied in the request. Since the collector may run on a different container as the service that's being instrumented, I think the host name queried as I did here will be wrong, so will the ip address. I'm going to look if I can collect the ip address from the request somehow, if not, I'll just use the host name supplied in the request. I expect that in most network configuration, the hostname that the peer is using should be either hardcoded Ip address behind some proxy/loadbalancer or DNS resolved name through some service discovery.

@grcevski grcevski merged commit 1276969 into main Mar 28, 2023
@grcevski
Copy link
Contributor Author

Thanks Mario!

@grcevski grcevski deleted the add_host_info branch March 28, 2023 19:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants