Feature request - Community ID Flow Hashing #3948

Open
Damientinker opened this issue Nov 4, 2019 · 4 comments

@Damientinker

Feature request - Community ID Flow Hashing

Why, How & Usecase

https://github.com/corelight/community-id-spec

(Description is taken from Project page)
When processing flow data from a variety of monitoring applications (such as Zeek and Suricata), it's often desirable to pivot quickly from one dataset to another. While the required flow tuple information is usually present in the datasets, the details of such "joins" can be tedious, particularly in corner cases. This spec describes "Community ID" flow hashing, standardizing the production of a string identifier representing a given network flow, to reduce the pivot to a simple string comparison.

Suggestions for implementation

Maybe a function like ipv42num; I'm not really sure what would be best here.

1:
idflowhash_full(IP src / IP dst / IP proto / source port / dest port)

2:
idflowhash_icmp(IP src / IP dst / IP proto / ICMP type + "counter-type" or code)

3:
idflowhash_basic(IP src / IP dst / IP proto)

This could also be a toggle in the function:

1:
idflowhash(IP src / IP dst / IP proto / source port / dest port, "full")

2:
idflowhash(IP src / IP dst / IP proto / ICMP type + "counter-type" or code, "icmp")

3:
idflowhash(IP src / IP dst / IP proto, "basic")
@davidelang
Contributor

davidelang commented Nov 4, 2019 via email

@Damientinker
Author

Hello David

I was originally playing around a bit with hash functions, but could not really see a way to do it. So it might be possible!

There is a pseudocode example on the project's GitHub page:

function community_id_v1(ipaddr saddr, ipaddr daddr, port sport, port dport, int proto, int seed=0)
{
    # Get seed and all tuple parts into network byte order
    seed = pack_to_nbo(seed); # 2 bytes
    saddr = pack_to_nbo(saddr); # 4 or 16 bytes
    daddr = pack_to_nbo(daddr); # 4 or 16 bytes
    sport = pack_to_nbo(sport); # 2 bytes
    dport = pack_to_nbo(dport); # 2 bytes

    # Flip the endpoints as needed to abstract away directionality
    saddr, daddr, sport, dport = order_endpoints(saddr, daddr, sport, dport);

    # Produce 20-byte SHA1 digest. "." means concatenation. The
    # proto value is one byte in length and followed by a 0 byte
    # for padding.
    sha1_digest = sha1(seed . saddr . daddr . proto . 0 . sport . dport)

    # Prepend version string to base64 rendering of the digest.
    # v1 is currently the only one available.
    return "1:" + base64(sha1_digest)
}

function community_id_icmp(ipaddr saddr, ipaddr daddr, int type, int code, int seed=0)
{
    port sport, dport;

    # ICMP / ICMPv6 endpoint mapping directly inspired by Zeek
    sport, dport = map_icmp_to_ports(type, code);

    # ICMP is IP protocol 1, ICMPv6 would be 58
    return community_id_v1(saddr, daddr, sport, dport, 1, seed); 
}
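
For anyone who wants to experiment, the pseudocode above can be sketched in Python roughly as follows. This is a minimal sketch, not the reference implementation: the function names, the `one_way` parameter, and the `ICMP4_COUNTER_TYPE` table are my own illustrative choices, and the ICMP handling only covers ICMPv4 request/reply pairs (type plus "counter-type"), falling back to type and code without endpoint flipping for everything else.

```python
import base64
import hashlib
import ipaddress
import struct

# ICMPv4 request/reply type pairs (IANA type numbers), used as pseudo-ports.
# Hypothetical helper table for this sketch.
ICMP4_COUNTER_TYPE = {
    8: 0, 0: 8,      # echo request / echo reply
    13: 14, 14: 13,  # timestamp / timestamp reply
    15: 16, 16: 15,  # information request / reply
    10: 9, 9: 10,    # router solicitation / advertisement
    17: 18, 18: 17,  # address mask request / reply
}


def community_id_v1(saddr, daddr, sport, dport, proto, seed=0, one_way=False):
    """Return a "1:"-prefixed Community ID string for one flow tuple."""
    src = ipaddress.ip_address(saddr).packed  # 4 or 16 bytes, network order
    dst = ipaddress.ip_address(daddr).packed

    # Flip the endpoints so that both directions of a flow hash identically.
    if not one_way and (dst, dport) < (src, sport):
        src, dst, sport, dport = dst, src, dport, sport

    data = (
        struct.pack("!H", seed)            # 2-byte seed
        + src
        + dst
        + struct.pack("!BB", proto, 0)     # protocol byte plus one zero pad byte
        + struct.pack("!HH", sport, dport) # 2-byte ports
    )
    digest = hashlib.sha1(data).digest()   # 20-byte SHA-1 digest
    return "1:" + base64.b64encode(digest).decode("ascii")


def community_id_icmp4(saddr, daddr, icmp_type, icmp_code, seed=0):
    """Map ICMPv4 type/code to pseudo-ports, then hash as IP protocol 1."""
    if icmp_type in ICMP4_COUNTER_TYPE:
        counter = ICMP4_COUNTER_TYPE[icmp_type]
        return community_id_v1(saddr, daddr, icmp_type, counter, 1, seed)
    # No reply counterpart known: treat the message as one-way.
    return community_id_v1(saddr, daddr, icmp_type, icmp_code, 1, seed, one_way=True)


if __name__ == "__main__":
    forward = community_id_v1("10.0.0.1", "10.0.0.2", 10, 20, 6)
    reverse = community_id_v1("10.0.0.2", "10.0.0.1", 20, 10, 6)
    print(forward, forward == reverse)
```

The point of the endpoint flip is that a log line seen from either side of the connection yields the same string, which is what makes the "pivot by string comparison" idea work.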

@davidelang
Contributor

davidelang commented Nov 5, 2019 via email

@Damientinker
Author

I'm unsure what you are asking. There is a reference implementation here:
https://github.com/corelight/pycommunityid

It shows an example like this:

$ community-id tcp 10.0.0.1 10.0.0.2 10 20
1:9j2Dzwrw7T9E+IZi4b4IVT66HBI=
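
To make the byte layout concrete, here is a small hand-assembled sketch of that same example in Python, using only the standard library. It assumes the default seed of 0 and relies on the endpoints already being in order (10.0.0.1 sorts before 10.0.0.2), so no flipping is needed. If the layout matches the spec, it should reproduce the string printed by the command above.

```python
import base64
import hashlib
import socket
import struct

seed = 0  # assumed default seed

data = (
    struct.pack("!H", seed)          # 2-byte seed, network byte order
    + socket.inet_aton("10.0.0.1")   # source address, 4 bytes
    + socket.inet_aton("10.0.0.2")   # destination address, 4 bytes
    + struct.pack("!BB", 6, 0)       # TCP is IP protocol 6, then one zero pad byte
    + struct.pack("!HH", 10, 20)     # source and destination ports
)

# 16 bytes go into SHA-1; the 20-byte digest is base64-encoded and prefixed.
community_id = "1:" + base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")
print(community_id)
```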

I think the technical details are here:
https://github.com/corelight/community-id-spec#technical-details
