Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More ECS field on spans: "destination" #115

Closed
roncohen opened this issue Jul 10, 2019 · 30 comments
Closed

More ECS field on spans: "destination" #115

roncohen opened this issue Jul 10, 2019 · 30 comments
Milestone

Comments

@roncohen
Copy link

roncohen commented Jul 10, 2019

There are several areas that could benefit from having more data on spans. There's a lot we can do, but i suggest we start with simple things. destination is a good first candidate. This builds on and supersedes this proposal which suggests to introduce the peer namespace.

I suggest we automatically set destination.address as the remote address for spans of type ext. Also for db spans, this could be relevant if the data is available. There are probably other we can think of. Generally, i think we shouldn't constrain ourselves to certain types of spans, but set the destination fields everywhere they make sense. Would be great to get your ideas.

outdated proposal 1

destination.address is the raw address. That is, we use whatever we have. We can set destination.port if we can derive it, e.g. for http libraries where the user did not specify a port we can assume it's port 80 for http and 443 for https. Down the line we can look into also supplying destination.ip or destination.domain, but i don't think it's necessary as a start. The rest like destination.bytes and destination.packets are less relevant in this context.

If users create spans manually and set any of the following, we should also set the ECS fields:

peer.ipv4 -> destination.ip
peer.ipv6 -> destination.ip
peer.port -> destination.port
peer.hostname -> destination.domain

@elastic/apm-agent-devs Please have a look. When there are no more objections or additional clarification needed I'll post the checkbox matrix

outdated proposal 2 from @beniwohli:
  • For databases and similar, we derive the address from the connection string. Drop port if it is the standard port for the given service
  • For HTTP requests, we derive the address from the URL
    • {url.protocol}://{url.domain}:{url.port}, drop port component if it is the standard port for the given protocol

Agents send context.destination.address and context.destination.port, with the exact same meanings as defined by ECS:

  • context.destination.address: should hold either an IP (v4 or v6) or a host/domain name. The server can copy the value to destination.ip or destination.domain, depending on the value, rather than duplicating that logic in each agent, which requires no local context
  • context.destination.port: should hold a port number; agents should report default ports

For database queries, the destination information can be extracted from the connection string. For outgoing HTTP requests, the information should be extracted from the URI's authority.

We will leave translating OpenTracing peer.* tags to destination. field(s) to a later time.

Example 1: client request to https://elastic.co/foo/bar

context.destination.address: elastic.co
context.destination.port: 443 (default HTTPS port)

Example 2: client request to http://[::1]:8080/

context.destination.address: ::1
context.destination.port: 8080

Example 3: query to postgresql://postgres123.local/dbinst

context.destination.address: postgres123.local
context.destination.port: 5432 (default postgres port)

Example 4: query to user:pass@tcp(1.2.3.4:1234)/dbname (MySQL)

context.destination.address: 1.2.3.4
context.destination.port: 1234

Agent Link to agent issue
.NET elastic/apm-agent-dotnet#611
Go elastic/apm-agent-go#676
Java elastic/apm-agent-java#934
Node.js elastic/apm-agent-nodejs#1684
Python elastic/apm-agent-python#618
Ruby elastic/apm-agent-ruby#613
RUM elastic/apm-agent-rum-js#490
APM Server elastic/apm-server#2917
@hmdhk
Copy link
Contributor

hmdhk commented Jul 18, 2019

For the RUM agent the raw address might be a relative url! Since ECS doesn't include relative urls as valid addresses we need to generate address using the page origin. capturing destination.ip is not feasible for the RUM agent.

@roncohen , re. destination.port and destination.domain should we parse these from address on the agent side or can this be done on the APM server. The logic would be the same either way, so we have the opportunity to implement it only once (i.e. in APM server).

@watson
Copy link
Contributor

watson commented Aug 2, 2019

In OpenTracing, the peer.* tags are used a lot. As you mention in elastic/apm-server#813 we should just use destination for outgoing spans, which I'm generally ok with, but how do we know when a user sets peer.*, if it's outgoing on incoming?

@felixbarny
Copy link
Member

Is it safe to say that when setting peer.* on a transaction, it's incoming when setting on a span, it's outgoing?

@axw
Copy link
Member

axw commented Aug 5, 2019

That depends: do we consider the message broker to be the destination or the source for a consumer span? If it's always the destination then yes, but otherwise we might also need to consider the span.kind tag.

@SergeyKleyman
Copy link
Contributor

Can we add source.ip/port and destination.ip/port for transactions in addition to spans?
For example for [SIEM integration] Collect authentication related information #128 we are planning to add authentication attempts info to transaction events and it would make things easier if other properties (such as source.ip/port and destination.ip/port) where available in transaction events as well.

@SergeyKleyman
Copy link
Contributor

@axw By span.kind tag do you mean something like incoming and outgoing?
But wouldn't event with kind incoming be transaction and not a span? Which is okay since as I mentioned above we should add source.ip/port and destination.ip/port to transaction as well.

@axw
Copy link
Member

axw commented Aug 23, 2019

@SergeyKleyman according to https://github.com/opentracing/specification/blob/master/semantic_conventions.md, span.kind can be "producer" or "consumer" for messaging. Message consumption might be represented as a transaction (e.g. for JMS MessageListener style APIs), but you could also have a span for receiving a message from a queue.

@roncohen roncohen mentioned this issue Aug 23, 2019
6 tasks
@axw
Copy link
Member

axw commented Oct 23, 2019

elastic/apm-agent-go#664 is a POC which adds context.destination.address for database spans. If this looks OK, we should press on and get the intake API updated.

@roncohen if we're capturing the destination address for HTTP spans, we should be taking into account proxies in order to ensure we record the peer network address, for SIEM, right?

@beniwohli
Copy link
Contributor

Another POC, elastic/apm-agent-python#618. This adds the destination.address for psycodb2 (aka Python PostgreSQL driver), redis and Elasticsearch. I prioritized services that are used by opbeans-python.

The branch can be tested with

scripts/compose.py start master --with-opbeans-python --opbeans-python-agent-branch=db-destination --opbeans-python-agent-repo=beniwohli/apm-agent-python

@roncohen
Copy link
Author

@axw great! not sure how SIEM deals with proxies. Perhaps there's a separate set of proxy fields?

@beniwohli
Copy link
Contributor

Summarizing some discussion from the weekly meeting:

  • For databases and similar, we derive the address from the connection string.
    • proposal: drop port if it is the standard port for the given service
  • For HTTP requests, we derive the address from the URL
    • proposal: {url.protocol}://{url.domain}:{url.port}, drop port component if it is the standard port for the given protocol)

Summarizing open questions:

  • normalized (destination.protocol, destination.domain, destination.port) vs. denormalized (destination.address)
  • translate opentracing peer.* tags to destination. field(s)?

Proposal: go with the denormalized form, and leave opentracing translation for the next iteration.

@elastic/apm-agent-devs please comment if you disagree with the above proposals, or 👍 if you're good.

@mikker
Copy link
Contributor

mikker commented Nov 13, 2019

I don't really want to maintain a list of default ports if I can help it. Why not just send it? 4-5 digits are not going to add much to the gzipped payloads.

@beniwohli
Copy link
Contributor

beniwohli commented Nov 13, 2019

Sorry, should have added some arguments to the proposals :D this isn't really about saving space, but about trying to harmonize data. If one service is configured to connect to mysql://some-host:3306 and another one to mysql://some-host, some-host will appear as two separate MySQL services in the service map. Unless we do some kind of merging based on the default port in the UI, but I think the agents are in a better position to know default ports of services they instrument.

@roncohen
Copy link
Author

thanks for pushing this forward @beniwohli! I've updated the description to your suggestion and added the check box matrix.

@elastic/apm-agent-devs Please create your individual issues and link them in the list.

@simitt
Copy link
Contributor

simitt commented Nov 18, 2019

@beniwohli not sure you are the right person to ask for, but could you summarize the fields that will be necessary on the Intake API, so we can create the according server issue and plan for it.

@beniwohli
Copy link
Contributor

@simitt as far as I can tell, only context.destination.address for now. @roncohen probably knows if they need to be indexed/stored based on what the service map requires.

@roncohen
Copy link
Author

right, thanks @simitt. context.destination.address is 👍 . Nothing else for now.

@felixbarny
Copy link
Member

IIUC, it's quite important that agents end up sending the same context.destination.address if they connect to the same service, right?

I think we should update the spec (https://github.com/elastic/apm/blob/master/docs/agent-development.md) and add some examples or acceptance tests for things like Postgres, Elasticsearch, MongoDB, Redis, etc.

@wolframhaussig
Copy link

Will context.destination.address support more than 1 value? Background of my question is #107 - Proposal: add Database Link to span context. We connect to a database and jump to a second one using a db link. Therefore we would have 2 target DBs

@simitt
Copy link
Contributor

simitt commented Nov 18, 2019

And just to clarify this should be added only to the context for spans right? Or are there any use cases where a destination would also be available for a transaction or error?

@axw
Copy link
Member

axw commented Nov 19, 2019

@wolframhaussig it would not, and this is only intended to hold network addresses

@simitt yes, only spans

@axw
Copy link
Member

axw commented Nov 19, 2019

Seeing as the agents are only producing one field, the server will presumably need to explode the address into ip/domain/port.

For IPv6 addresses, I think we should format them as in URLs, i.e. by surrounding the IPv6 address component in square brackets, regardless of whether there's a port included.

e.g. given http://[::1]:80, we should report [::1], and given http://[::1]:8080, report [::1]:8080

@beniwohli
Copy link
Contributor

Seeing as the agents are only producing one field, the server will presumably need to explode the address into ip/domain/port.

Can you expand on that? AFAIK, it's not necessary for the service map to have the components of the address split up, or is it?

@axw
Copy link
Member

axw commented Nov 19, 2019

Can you expand on that? AFAIK, it's not necessary for the service map to have the components of the address split up, or is it?

Not directly, but I presume we'll be storing the information in ECS destination.* fields, which do not permit the hostname/IP and port to be stored together. The service maps code would consult those fields to produce the labels. @roncohen is that accurate?

@simitt
Copy link
Contributor

simitt commented Nov 19, 2019

From an APM/SIEM integration points of view we want to at least extract the destination.ip from the destination.address (elastic/apm-server#2917)

@SergeyKleyman
Copy link
Contributor

@axw

Could you please clarify the following?

e.g. given http://[::1]:80, we should report [::1], and given http://[::1]:8080, report [::1]:8080

I was under impression that span.context.destination.address should include protocol as well so http://[::1]:80 should be reported as http://[::1]:80. Although in this case naming this field span.context.destination.address seems a little bit wrong - maybe span.context.destination.url or span.context.destination.link but we can discuss the name of the field after agreeing on its intended content.

@axw
Copy link
Member

axw commented Nov 20, 2019

I had a chat with Ron offline. There's a few issues here.

For service maps, we really want to be able to aggregate on the address and port as one field. That doesn't work if we stick to ECS definitions, as destination.address is expected to hold either an IP or a domain name, while destination.port holds the port number.

Later on we'll likely want to add other types of destination labels for service maps, such as queue or topic names when sending to a message queue or bus. We were deferring this, but given that we'll need a separate field to combine address and port, it probably makes sense for that field to hold these types of labels too.

We've been talking about omitting the port number from the context due to a service maps requirement. This data will also be used for SIEM integration (#115 (comment)); the service maps requirement should not impact SIEM.

Due to the interference caused by these, I'd like to refocus this issue specifically on recording network destination information for SIEM, and create a separate issue for capturing a more abstract, logical service destination label. That label is where we'll consider omitting ports and so on.

So for this specific issue, I'd like to wind back the proposal half way: agents send context.destination.address and context.destination.port, with the exact same meanings as defined by ECS:

  • context.destination.address: should hold either an IP (v4 or v6) or a host/domain name. The server can copy the value to destination.ip or destination.domain, depending on the value, rather than duplicating that logic in each agent, which requires no local context.
  • context.destination.port: should hold a port number; agents will not omit default ports.

Example 1: client request to https://elastic.co/foo/bar

  • context.destination.address: elastic.co
  • context.destination.port: 443

Example 2: client request to http://[::1]:8080/

  • context.destination.address: ::1
  • context.destination.port: 8080

Example 3: query to postgresql://postgres123.local/dbinst

  • context.destination.address: postgres123.local
  • context.destination.port: 5432 (default postgres port)

Example 4: query to user:pass@tcp(1.2.3.4:1234)/dbname (MySQL)

  • context.destination.address: 1.2.3.4
  • context.destination.port: 1234

@axw
Copy link
Member

axw commented Nov 22, 2019

The 👀 have it, I've updated the description. I'll follow up with a separate issue for the logical destination soon.

@axw
Copy link
Member

axw commented Dec 6, 2019

Note: for IPv6 addresses, surrounding the address with square brackets is only relevant where you have (or may have) the port alongside, e.g. [::1]:80, http://[::1]:80, http://[::1] (square brackets still required because the port is optional). The square brackets are there to disambiguate the IPv6 address, which contains colons, from the proceeding port, which is separated by a colon.

When recording an IPv6 address separately from a port, as we are in the case of context.destination.address, then it should be recorded in its canonical form: without square brackets, e.g. ::1.

@graphaelli
Copy link
Member

this is done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests