Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce peer.hostPort standard tag #116

Closed
yurishkuro opened this issue Aug 21, 2016 · 11 comments
Closed

Introduce peer.hostPort standard tag #116

yurishkuro opened this issue Aug 21, 2016 · 11 comments

Comments

@yurishkuro
Copy link
Member

We currently have a series of standard peer tags like peer.ipv4, peer.hostname, peer.port, etc. I've discovered that quite often the RPC libraries being instrumented with OT do not actually know the exact format of "host:port" string they are given, because they just pass it to lower-level networking libraries. Therefore when RPC libraries need to identify the peer via peer tags, they often have to duplicate the same parsing code, for example here.

I would like to propose a standard tag peer.hostPort (string) which can take a variety of different formats and let the tracer implementation deal with parsing it (or even send it off to collection tier to process later). Specifically, the tag value could be one of these forms:

  • host:port (port must be a number)
  • host (no port)
  • where host can be
    • host name, e.g.
      • "myhost"
      • "webserver.mydomain.com"
      • "ip-12-34-56-78.us-west-2.compute.internal"
    • IPv4 address, e.g. "10.20.30.40"
    • IPv6 address in brackers [], e.g. "[2001:0db8:85a3:0000:0000:8a2e:0370:7334]"

cc @bensigelman @dkuebric

@bhs
Copy link
Contributor

bhs commented Aug 22, 2016

@yurishkuro what do you think about peer.address? E.g., at google there was a service-naming scheme that wasn't host:port but was widely available and canonical (and did ultimately boil down to host:port after resolution).

@yurishkuro
Copy link
Member Author

yurishkuro commented Aug 22, 2016

@bensigelman address sounds quite different from hostPort, given what you said about resolution. For example, if you're instrumenting a plain HTTP client, I don't think it will work with address that needs resolution, but it will work with all the variants of hostPort I showed above, and that hostPort will be eventually available in the user code after resolution step.

It's certainly useful to capture resolvable address, but I would do it as a separate tag from hostPort. Also, resolvable address probably isn't going to differentiate between the instances of the target service, whereas hostPort will (unless there's additional routing like haproxy).

PS: I am ok to have peer.address in addition to peer.hostPort.

@bhs
Copy link
Contributor

bhs commented Aug 22, 2016

It just seems weird to me to support host, port, and hostPort (not to mention address)

@yurishkuro
Copy link
Member Author

It reflects various degrees of information available to instrumentation:

  1. if you know exactly the host and port, use peer.ipv4 / peer.ipv6 and peer.port
  2. if you only know the unparsed hostPort string, use peer.hostPort
  3. if you only know resolvable/virtual name, use peer.address

Right now we don't have a place to store something like google.com:8080 at all.

@bhs
Copy link
Contributor

bhs commented Aug 22, 2016

@yurishkuro another option would be to write a helper (could even be part of OT/ext) that takes an address string and does its best to convert to the finest-grain tags available. So google.com:8080 would become peer.host and peer.port, though some weird/unparseable custom addressing scheme would just end up being passed through as-is to peer:address.

I am (obviously) trying to avoid a situation where the "official" tags overlap more than is needed.

@yurishkuro
Copy link
Member Author

@bensigelman that's certainly an option, my concern with that was actually about adding the CPU overhead on the client app, because this helper function might turn out to be fairly involved. In contrast, having client app submit the same info via hostPort tag moves the performance concern into tracer implementation, where it can try to amortize the parsing cost via caching, or even not bother with parsing and let the out-of-process collectors do the parsing (NB: in order to submit well-formed Zipkin Thrift spans, the host:port do need to be parsed).

PS: I was wrong about "Right now we don't have a place to store something like google.com:8080 at all" - we do have peer.hostname.

@yurishkuro
Copy link
Member Author

PS: performance implication of always parsing hostPort string is especially pronounced when tracer performs heavy upfront sampling (e.g. our default is 1 in 1000)

@bhs
Copy link
Contributor

bhs commented Aug 23, 2016

@yurishkuro I still find myself feeling more enthusiastic about peer.address than peer.hostPort... In the spirit of "cattle not pets"—especially when things like kubernetes/mesos/etc are further virtualizing/obfuscating the notion of a hostname or the meaning of an ip address—I would rather that OpenTracing was encouraging people to use a more semantically meaningful identifier (than hostPort) for their peer. Or, if not encouraging it, I think it would be nice if OpenTracing chose a peer identifier that's general enough to be useful once hostnames are transient and don't mean anything in 5 years.

On that note, we could look towards the Go Dial approach: https://golang.org/pkg/net/#Dial

This would yield a peer.network and peer.address if we were literal about it. I could imagine s/network/scheme/ or similar.

@dkuebric
Copy link
Contributor

dkuebric commented Aug 23, 2016

For the reason that @yurishkuro mentions that different clients have different levels of visibility into a RPC call, we've adopted a model that allows either reporting an all-in-one RemoteURL key or a set of component keys (RemoteHost,RemotePort, etc...) depending on what is available in the client.

Ultimately we prefer RemoteURL and allow it to be having/missing various components. Format: [protocol://][domain-or-ip][:port][/path][?args], where protocol, port, path, and args are all optional. This is fairly similar to the Dial approach. We continue to support the other both for legacy reasons and also as it can be more performant in some cases where stringifying a URL isn't ideal.

peer.scheme and peer.address both make sense, and with some flexible parsing on address it will fit a large range of cases. However, given the commonness of HTTP RPC it's nice to support a URL-ish format as well, since this is often available--so despite the reservation about overlapping keys what would you think about offering an alternative/supplement there?

@cwe1ss
Copy link
Member

cwe1ss commented Aug 23, 2016

I also like peer.address - its name is clear enough to know what it should contain and broad enough to support a lot of scenarios.

@bhs
Copy link
Contributor

bhs commented Nov 16, 2016

... continued on opentracing/specification#16

@bhs bhs closed this as completed Nov 16, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants