Proposal: Introduce connection prefix, move source / destination #51

ruflin · 2018-07-17T12:32:17Z

There have been several discussions recently around source, destination and connection, especially in #9. My conclusion is that source and destination normally belong to a connection, and that we are actually missing a connection prefix. Also, some information currently under network, like forwarded_ip, belongs to a connection rather than to network.

An additional change I made to source and destination is that they both now contain a host prefix. All the fields in source and destination also exist in host, so the host prefix can be reused here too. This makes ECS very predictable: every time host.* shows up, it contains the same fields. Source and destination could also contain additional data such as the location; see #50 for more details.

The connection fields now look as follows:

Field	Description	Type
`connection.destination.host.ip`	IP address of the destination. Can be one or multiple IPv4 or IPv6 addresses.	ip
`connection.destination.host.name`	Hostname of the destination.	keyword
`connection.destination.host.port`	Port of the destination.	long
`connection.destination.host.mac`	MAC address of the destination.	keyword
`connection.destination.host.domain`	Destination domain.	keyword
`connection.destination.host.subdomain`	Destination subdomain.	keyword
`connection.source.host.ip`	IP address of the source. Can be one or multiple IPv4 or IPv6 addresses.	ip
`connection.source.host.name`	Hostname of the source.	keyword
`connection.source.host.port`	Port of the source.	long
`connection.source.host.mac`	MAC address of the source.	keyword
`connection.source.host.domain`	Source domain.	keyword
`connection.source.host.subdomain`	Source subdomain.	keyword
`connection.direction`	Direction of the network traffic. Recommended values are: * inbound * outbound * unknown	keyword
`connection.forwarded_ip`	Host IP address when the source IP address is the proxy.	ip

I opened a PR to discuss this instead of an issue, as it allows us to discuss the high-level parts in comments but also the details directly in the code.


ruflin · 2018-07-17T12:41:55Z

As discussed in #9, there are also cases where the host from source and destination should end up in one field. For these cases the copy_to feature could be used. Here is a small example:

PUT ecs
{
  "mappings": {
    "_doc": {
      "properties": {
        "connection.source.host.name": {
          "type": "keyword",
          "copy_to": "host.name" 
        },
        "connection.destination.host.name": {
          "type": "keyword",
          "copy_to": "host.name" 
        },
        "host.name": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT ecs/_doc/1
{
  "connection.source.host.name": "elastic.co",
  "connection.destination.host.name": "ruflin.com"
}

GET ecs/_search
{
  "query": {
    "match": {
      "host.name": { 
        "query": "ruflin.com"
      }
    }
  }
}

The host.name field can now be used to query for the host.name of both source and destination.
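
To make the effect of copy_to in the example above easier to follow, here is a minimal pure-Python sketch. It does not talk to Elasticsearch; it only imitates the index-time behaviour under the assumption that copy_to adds a field's value to the target field's searchable terms while leaving the stored document untouched. The `COPY_TO` table and `index_terms` helper are hypothetical names used for illustration.

```python
# Pure-Python imitation of copy_to; this is not Elasticsearch itself.
# The table mirrors the mapping above: field -> copy_to target.
COPY_TO = {
    "connection.source.host.name": "host.name",
    "connection.destination.host.name": "host.name",
}


def index_terms(doc):
    """Return the searchable values per field after applying the copy_to rules."""
    terms = {}
    for field, value in doc.items():
        terms.setdefault(field, []).append(value)
        target = COPY_TO.get(field)
        if target is not None:
            # The value becomes searchable under the target field as well;
            # the original document is not modified.
            terms.setdefault(target, []).append(value)
    return terms


doc = {
    "connection.source.host.name": "elastic.co",
    "connection.destination.host.name": "ruflin.com",
}
terms = index_terms(doc)
print(terms["host.name"])  # ['elastic.co', 'ruflin.com']
```

A match query on host.name for either "elastic.co" or "ruflin.com" would therefore find this document, even though the document itself never contains a host.name field.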

urso · 2018-07-25T12:30:03Z

As connection.forwarded_ip is for a proxy, I wonder if we want to be able to add more fields to forward? E.g. host, location, datacenter, domain...

strawgate · 2018-07-25T19:45:13Z

This is a pretty fundamental change to the schema -- should we be planning that this will be included if we are hoping to conform to this schema? Do we know when this might be approved or merged?

praseodym · 2018-07-25T20:59:42Z

I quite dislike how this pull request introduces vastly longer field names for mostly superfluous categorisation. In terms of daily usability I much prefer source.ip over connection.source.host.ip.

ruflin · 2018-07-26T11:57:15Z

@urso Interesting idea. So you are basically saying a proxy is also a host with additional info? Is forward_ip always coming from a proxy?

@strawgate There is no conclusion on this topic yet, which is why it is here for discussion. The outcome is not clear yet.

@praseodym Can you share a bit more background on the problem of long field names? If it is the typing, I wonder how much the auto complete in newer Kibana versions solves this issue?

urso · 2018-07-26T12:44:51Z

> Interesting idea. So you are basically saying a proxy is also a host with additional info? Is forward_ip always coming from a proxy?

No idea where forwarded_ip comes from, or what the use case is. The description mentions a proxy, so I wondered why a proxy is not allowed to have a host and other similar settings. Why do source and destination have a rich structure while you only reserve one field for the proxy?

In the schema, the source field can be either a proxy or the actual source. Does the presence of forward_ip change the meaning of the source ip? Why not source.origin_ip? It's just that something doesn't feel consistent.

Checking the proposal again, I wonder if connection and <X>.host carry redundant information. I understand we used to have source.host, which implicitly states that source is a network-based endpoint. But by introducing a connection namespace, the schema already implies that source/destination are network-based endpoints. How about
connection.source.mac, connection.source.ip, connection.source.port, connection.source.hostname, connection.source.fqdn... ?

spartan782 · 2018-08-01T19:12:00Z

I think that there should be a consideration for tools that do not log Source and Destination. I propose there should be 2 different fields. One for tools that use source and destination, and another for tools that are sessionized. For example, the tool bro uses Originating and Responding because it keeps track and logs each conversation rather than individual packets.

dcode · 2018-08-02T03:52:30Z

I like the concept for connection oriented tools, but can't we just put it under network instead of connection?

It's shorter.
network is already a prefix
Where this really pays off is searching across network-oriented logs (i.e. bro, netflow, suricata, etc), and process oriented logs that contain information gleaned from the operating system socket listings.
I'm a huge fan of a catch-all hosts field. One thing we do in RockNSM^1 today is add a pivot for @meta.related_ips, which is a list of all IPs that fall into a given event. This makes pivoting across related logs dead simple for analysts. I would propose a top-level prefix of related which enables pivoting across multiple types (e.g. related.ip, related.hostname, related.mac, etc). This would enable a top-level semantic that makes sense for any sort of relation, based upon type. Kibana + Elastic handles the list of keywords/atomic types really well for purposes of filtering/pivoting.

As for @spartan782's comment, the way Bro actually does this today is that, for the connection log, the origin is marked as the host that started the conversation. At the TCP/IP level this doesn't really make a difference. In the protocol-specific logs, the origin is the host that started that particular protocol conversation. SMTP is one example where this nuance is important. Host A may connect to Host B using TCP on port 25 (keeping it simple). Host A is the TCP/IP originating host. However, upon connection, Host B initiates the SMTP protocol by sending multiple emails for further processing/routing. At the SMTP protocol layer, Host B is actually the originating host. FTP-DATA connections are similar in that the direction of the TCP connection is usually opposite to who initiated the transfer.

I propose that those cases are actually annotated in the protocol-specific prefixes (i.e. smtp.source or smtp.origin <-- this is also semantically valid terminology with respect to the protocol itself.)

^1 Migration to ECS is underway for RockNSM, so this is a timely topic.
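
The related.ip pivot proposed above could be populated roughly as follows. This is only a sketch: the `collect_related_ips` helper and the sample event are hypothetical, and the field layout (source, destination, network.forwarded_ip) is assumed for illustration rather than taken from RockNSM or ECS.

```python
import ipaddress


def collect_related_ips(event):
    """Walk a nested event and return every distinct string that parses as an IP."""
    found = []

    def walk(value):
        if isinstance(value, dict):
            for child in value.values():
                walk(child)
        elif isinstance(value, (list, tuple)):
            for child in value:
                walk(child)
        elif isinstance(value, str):
            try:
                ipaddress.ip_address(value)
            except ValueError:
                return  # not an IP address, ignore it
            if value not in found:
                found.append(value)

    walk(event)
    return found


# Hypothetical event; addresses come from documentation/private ranges.
event = {
    "source": {"ip": "10.0.0.5", "port": 51234},
    "destination": {"ip": "192.0.2.10", "port": 25},
    "network": {"forwarded_ip": ["198.51.100.7", "10.0.0.5"]},
}
event["related"] = {"ip": collect_related_ips(event)}
print(event["related"]["ip"])  # ['10.0.0.5', '192.0.2.10', '198.51.100.7']
```

With a flat list like this, a single filter on related.ip matches the event regardless of which role the address played in it.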

webmat · 2018-08-03T17:39:07Z

Proxy IPs come from reverse proxies. The reason it's an array is that there can be more than one reverse proxy in front of the application logging the event. For example: Cloudflare => NGINX => application. Your list of forwarded IPs would include Cloudflare's edge node, then your NGINX load balancer's. You can have more than two, of course, just add in Varnish and Apache running PHP-fpm as the "application".

I love the idea of building a plain array of all seen IPs for a given event, for ease of pivoting. This would not only help catch situations where a proxy is compromised, and that's the hostile entity, but would also simplify pivoting for situations where we have potentially hostile IPs in "source" as well as in "destination" IPs, like DNS (see this discussion for more context).

ruflin · 2018-08-07T08:17:02Z

++ on having a place for all ip addresses (and other fields). My idea here so far is that the higher up in ECS, the more generic it is. As an example:

connection.source.host.ip: Contains only the source ip
connection.host.ip: Can contain source, destination ip
host.ip: Can contain all ip's appearing in the event.

strawgate · 2018-08-08T18:08:47Z

host.ip containing all ip addresses from the event seems confusing.

The norm is to have things like src_ip and dst_ip, the current ecs makes that source.ip and destination.ip, this now makes it connection.destination.host.ip and connection.source.host.ip and I'm not entirely sure there is a benefit to this.

Fields which relate to a host are vast (architecture, OS, timezone, etc) whereas fields that relate to a host that is part of a connection that a router, switch, or firewall witnessed are minimal so I'm not exactly sure why prefixing each field with the object type here is useful and stuff like this:

connection.source.host.ip: Contains only the source ip
connection.destination.host.ip: Contains only the destination ip
connection.host.ip: Can contain source, destination ip

Don't really help the confusion.

webmat · 2018-08-08T19:06:34Z

I'm not sure I see the benefit of adding host. under connection.source and connection.destination. When I see host, I understand it as the server or process that generated the event or the log.

In a connection scenario, the process only knows host details about its own side of the connection, and not the other side. This means the bulk of the host details will actually shift around, between source and destination. Two examples

Application handling inbound requests:

connection.source.host.ip: A remote IP

connection.destination.host.ip: My server's IP
connection.destination.host.name: My server's hostname
connection.destination.host.id: ...
connection.destination.host.timezone: ...

Application calling out to an external system:

connection.source.host.ip: My server's IP
connection.source.host.name: My server's hostname
connection.source.host.id: ...
connection.source.host.timezone: ...

connection.destination.host.ip: A remote IP

This is what I mean by the host details shifting around.

I'm ok with having connection.source and connection.destination, but I think host. belongs outside of there. I like host at the top level, actually. It's a handful of details I'm used to having on virtually all of my log events. So by keeping host outside of connection, we always have this shape of event, regardless of direction:

connection.source.ip: IP of initiator
connection.destination.ip: IP of queried service
host.name: My hostname
host.id: ...
host.timezone: ...
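
The event shape above can be sketched in a few lines of Python. The `build_event` helper and all values are hypothetical; the point is only that the host.* keys stay identical for inbound and outbound traffic, while the local and remote addresses swap between source and destination.

```python
def build_event(local_ip, remote_ip, inbound, host):
    """Build an event where host.* always describes the machine that logged it."""
    if inbound:
        source_ip, destination_ip = remote_ip, local_ip
    else:
        source_ip, destination_ip = local_ip, remote_ip
    return {
        "connection.source.ip": source_ip,            # IP of initiator
        "connection.destination.ip": destination_ip,  # IP of queried service
        "connection.direction": "inbound" if inbound else "outbound",
        "host.name": host["name"],
        "host.id": host["id"],
        "host.timezone": host["timezone"],
    }


# Hypothetical host details and addresses.
me = {"name": "web-1", "id": "abc123", "timezone": "UTC"}
inbound = build_event("192.0.2.10", "203.0.113.9", True, me)
outbound = build_event("192.0.2.10", "203.0.113.9", False, me)

print(inbound["connection.source.ip"], inbound["host.name"])
print(outbound["connection.source.ip"], outbound["host.name"])
```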

MikePaquette · 2018-08-14T22:15:44Z

@ruflin thanks for this PR. Clearly a great topic and a needed discussion, as it's generated a lot of sub-topics!

Here's my $0.02

I am not in favor of creating a new top-level namespace/object/prefix for connection.*
I am not in favor of re-using the host.* object anywhere except as a top level namespace/object/prefix.
I am in favor of using the existing network.* namespace/object/prefix for flow or connection-related fields.

@strawgate #51 (comment) I agree, this would be a big change, and I prefer to work through any shortcoming with the current set of namespaces/objects/prefixes.

@praseodym #51 (comment) I agree that vastly longer names without significant value, will detract from two key ECS benefits, Ease of Recall, and Ease of Deduction, and therefore should be avoided.

@urso #51 (comment) the network.forwarded_ip field definition may need some improvement. The original intent was to populate this field with the IP address(es) of network entity(ies) (e.g., proxies) forwarding network traffic associated with an event, when the source.ip is extracted from a field such as the x-forwarded-for HTTP header. Since the x-forwarded-for header contains both the "client" IP and the list of other proxies that have forwarded it, the network.forwarded_ip field would hold a list of IP addresses of all the proxies that may have forwarded this network traffic.
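
As a rough illustration of the intent described above, the following sketch splits an x-forwarded-for value into the client address (for source.ip) and the list of forwarding proxies (for network.forwarded_ip). The `split_forwarded_for` helper is hypothetical, and the sketch assumes every entry is a bare IP address; real headers can contain other tokens and are supplied by the client, so they should not be trusted blindly.

```python
import ipaddress


def split_forwarded_for(header_value):
    """Split an X-Forwarded-For value into (client_ip, proxy_ips)."""
    hops = [part.strip() for part in header_value.split(",") if part.strip()]
    for hop in hops:
        # Raises ValueError if an entry is not a plain IPv4/IPv6 address.
        ipaddress.ip_address(hop)
    if not hops:
        return None, []
    # The leftmost entry is the original client; the rest are proxies.
    return hops[0], hops[1:]


client, proxies = split_forwarded_for("203.0.113.9, 198.51.100.7, 192.0.2.1")
event = {
    "source.ip": client,
    "network.forwarded_ip": proxies,
}
print(event)
```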

@spartan782 #51 (comment) The source.* and destination.* namespaces/objects/prefixes are indeed defined in ECS to cover packet-level, session/connection-level, and application-level events, even when those events do not use the names "source" and "destination" to refer to their participants, or don't contain source and destination fields. The only shortcoming with this approach occurs when you need to have multiple-levels in the same documents, as highlighted by Rob Cowart in #9 (comment), which I have a proposal for fixing in the network.* namespace/object/prefix by adding a few fields to be used only in that case. More details soon.

@dcode #51 (comment) +1 to keeping the connection-related fields in the network.* namespace/object/prefix. Also, I am working on a mapping table (sorry not code) of the bro conn.log fields to ECS. Would love to compare this to your mapping. Stay tuned.

@strawgate #51 (comment). Agreed with your point that details (fields) relating to source and destinations in a network event will be fewer than those relating to a host in a host event. This was a key factor in originally choosing host.* , source.* , and destination.* as distinct top-level namespaces/objects/prefixes in ECS.

@webmat #51 (comment) Agreed, thanks.

ruflin · 2018-08-15T08:42:44Z

@webmat Having host inside source and destination would not remove it from the top level. I see host.* like a struct in Golang that can be reused in many places.
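
The "reusable struct" idea can be sketched with Python dataclasses (used here in place of Go structs). The class names and the reduced set of fields are assumptions made for illustration; they are not part of ECS.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Host:
    """Reusable group of host fields; the same shape wherever it appears."""
    ip: Optional[str] = None
    name: Optional[str] = None
    mac: Optional[str] = None


@dataclass
class Endpoint:
    host: Host


@dataclass
class Connection:
    source: Endpoint
    destination: Endpoint


@dataclass
class Event:
    host: Host              # top-level host stays in place
    connection: Connection  # source/destination reuse the same Host type


event = Event(
    host=Host(ip="192.0.2.10", name="web-1"),
    connection=Connection(
        source=Endpoint(Host(ip="203.0.113.9")),
        destination=Endpoint(Host(ip="192.0.2.10", name="web-1")),
    ),
)
doc = asdict(event)
print(doc["connection"]["source"]["host"]["ip"])  # 203.0.113.9
```

Because all three positions share one type, any consumer that understands host.* at the top level can read connection.source.host.* and connection.destination.host.* the same way.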

webmat · 2018-10-25T03:27:44Z

@ruflin Are you ok if we close this? I think it's clear we're not going to move in this direction after all :-)

ruflin · 2018-10-26T10:55:36Z

I don't think this is something we should do for ECS 1.0, but it is still something I think we should do in the long term to support more complex connection data. Based on the recent changes, the initial PR will need updating, but my proposal to have a connection object still stands. I suggest keeping this open but putting it on hold for now.

ruflin · 2018-12-07T08:22:21Z

Now that we are introducing also server / client, let's close this for now. I still like the idea of a connection though ;-)

ruflin added the discuss label Jul 17, 2018

ruflin mentioned this pull request Jul 17, 2018

Doing geoip on more than one address... #9

Closed

ruflin requested review from webmat and andrewkroh July 25, 2018 08:20

willemdh mentioned this pull request Aug 1, 2018

Top level: "client" and "server" #63

Closed

ruflin mentioned this pull request Aug 2, 2018

host.name vs hostname, host.ip vs ip inconsistencies #62

Open

This was referenced Aug 8, 2018

Propose new top-level prefix 'related' #67

Closed

Compose ECS objects vs reuse objects #71

Closed

webmat mentioned this pull request Sep 18, 2018

Getting ECS to 1.0 #115

Closed

26 tasks

This was referenced Oct 31, 2018

Post GA tasks #160

Closed

Beta1 related but not blocker tasks #161

Closed

webmat mentioned this pull request Nov 16, 2018

Proposal for more straightforward network metrics fields. #179

Merged

ruflin closed this Dec 7, 2018

ruflin deleted the connection branch December 7, 2018 08:22

jeffrysleddens mentioned this pull request Aug 18, 2020

forwarded_ip - no place to put geo ... fields #523

Open

webmat mentioned this pull request Aug 18, 2020

[meta] Add support for proxies in ECS #938

Open
