Normalization Effort: Part 1: Remove string ip fields WIP #734

Phrozyn · 2018-08-13T23:29:33Z

Do not merge!

Purpose of this change:
With ES 5.x the ip field type now supports both ipv4 and ipv6. This reduces the need to add these additional fields as string fields. This change will reduce the 6 field types we currently use to track IP information to only 2.

sourceipaddress (field type "ip") <-> no change
sourceipv4address (field type "string") --> sourceipaddress (field type "ip")
sourceipv6address (field type "string") --> sourceipaddress (field type "ip")
destinationipaddress (field type "ip") <-> no change
destinationipv4address (field type "string") --> destinationipaddress (field type "ip")
destinationipv6address (field type "string") --> destinationipaddress (field type "ip")

Left to do:
Test:

bro event types: These are working fine
sqsFluentd event types:
Aggregations: Will have to check kibana for saved aggregations to see if these will require being rewritten to use the sourceipaddress or destinationipaddress field. There are no aggregations that depend on the string field.
Visualizations: These use the ipv4address field generated by the CollectAttackers.py script, which bases it on the sourceipaddress field, and uses an isIPv4 function to validate the value and send to mongo. So nothing breaks here.
Alerting: Alerts appear to be working fine.
okta2mozdef.py: validated that this script only utilizes the sourceipaddress field and uses a function to validate if it's ipv4.
createIPBlockList.py: validated that this script only searches for the ipv4address in the mongodb. Out of scope for ES field mappings.
CollectAttackers : validated that this script only utilizes the sourceipaddress field for it's functionality. It does add an ipv4address field to mongo, but that is not in the scope of our effort.

Testing Completed:

eventtask event types populate correctly.
unit test completes successfully.

Number of Alerts that depend on the removed fields: 0
Files with ipv4 or ipv6 strings:

CollectAttackers.py
createIPBlockList.py
okta2mozdef.py
geomodel.py
defaultMappingTemplate.json
elasticsearch.yml
examples directory has a lot of files that will need to be cleaned up.
vr.html
ipFixup.py
broFixup.py
fluentdSqsFixup.py
vulnerability.py
test_geomodel.py
test_broFixup.py
test_parse_sshd.py (not sure why this utilizes an ipv4 string field, when the plugin doesn't use one at all.)
test_vulnerability.py

It seems like we can still use netaddr to validate if an IP is ipv4 or ipv6 in these scripts, but they don't necessarily need to be fields within the mapping.

New Finding:
Seems like aggregation of ipv6 is supported in ES, but not in Kibana using the IP range bucket.
To aggregate the ip field you'd need to use "Terms", but "significant terms" will omit the IP field.
I created a post on their site to get more information around this.
https://discuss.elastic.co/t/no-ipv6-range-aggregation-bucket/144928

jeffbryner · 2018-08-14T15:26:37Z

There are alerts and visualizations that use the string version especially for aggregations. I don't see a need to do this since it just reduces the flexibility and leaves us open to issues in ES where they misinterpret an IP (happened before, especially in aggregations).

Lets discuss before further work.

Phrozyn · 2018-08-14T15:38:16Z

@jeffbryner FYI, there are no alerts that actually use this field, they all use sourceipaddress or destinationipaddress. As mentioned in my OP ES now supports both versions of IP addresses (link for reference: https://www.elastic.co/blog/indexing-ipv6-addresses-in-elasticsearch), thus the issue seen previously should no longer be a problem. We can handle what version an IP is within the code without having to explode our mapping.

jeffbryner · 2018-08-14T15:46:08Z

Might not be alerts, but there is definitely other code using these fields. Again, I don't see a need but I do see lots of regression testing that would be necessary.

Phrozyn · 2018-08-14T16:10:40Z

I completely agree on that point. I'll add further test cases to the OP.

jeffbryner · 2018-08-14T21:02:28Z

Hey, caught up with Tristan on this. My apologies for not realizing this is part of the normalization effort, I should have looked for context before commenting.

https://github.com/mozilla/MozDef/search?q=ipaddress&unscoped_q=ipaddress may be a useful start at all the places that may be assuming string/ipv4. In particular we should be cautious about offering up an ipv6 to a routine that is not expecting it.

Phrozyn · 2018-08-14T21:59:05Z

I do understand the concern, and do plan on ensuring everything is fully tested and scoped before any of this is implemented. I was using vscode to search for occurrances of "ipv" in our repo.
Some items that use functions for ipv4 use the ipaddress field to validate what type of IP it is, the code that did transform and populate the ipv4/ipv6 address fields was changed in my PR and tested. However, I have much more testing to do to ensure there aren't weird anomalies presented as a result of this. For example I have not looked through the cron scripts I mentioned in my OP as things to review.

…elds

Phrozyn · 2018-11-20T20:55:20Z

Going to close this PR for now, until Kibana adds a method for Aggregating on ipv6 IPs (they only aggregate on IPv4) we can't remove the text fields as it will cause some problems.

Phrozyn added 4 commits August 13, 2018 18:01

Removing ipv4 and ipv6 fields from mq plugins

14784ac

merging fixes from previous commit

301c481

removing erroneous line from previous commit

d1e5e59

Adding modified unit test with string ip fields removed.

d56cd06

Phrozyn added this to the Release v1.32 milestone Aug 13, 2018

Modifying test for unknown ips being replaced with zeroed address.

a308113

Phrozyn changed the title Remove string ip fields **DO NOT MERGE YET** Normalization Effort: Part 1: Remove string ip fields **DO NOT MERGE YET** Aug 14, 2018

Phrozyn changed the title Normalization Effort: Part 1: Remove string ip fields **DO NOT MERGE YET** Normalization Effort: Part 1: Remove string ip fields **WIP** Aug 14, 2018

Phrozyn added 3 commits August 17, 2018 14:42

Removal of IPv4 and IPv6 string field additions, and comment updates.

59d69fd

Adding ipv6 to initial population of sourceipaddress field in details.

9cd2bd3

removing sourceipv4address from default event, it is not used here.

6690e6d

pwnbus modified the milestones: Release v1.32, Release v1.33 Sep 26, 2018

Merge remote-tracking branch 'origin/master' into remove_string_ip_fi…

739730e

…elds

pwnbus modified the milestones: Release v1.33, Release v1.34 Oct 31, 2018

Phrozyn closed this Nov 20, 2018

pwnbus deleted the remove_string_ip_fields branch December 14, 2018 21:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Normalization Effort: Part 1: Remove string ip fields WIP #734

Normalization Effort: Part 1: Remove string ip fields WIP #734

Phrozyn commented Aug 13, 2018 •

edited

jeffbryner commented Aug 14, 2018

Phrozyn commented Aug 14, 2018

jeffbryner commented Aug 14, 2018

Phrozyn commented Aug 14, 2018

jeffbryner commented Aug 14, 2018

Phrozyn commented Aug 14, 2018

Phrozyn commented Nov 20, 2018

Normalization Effort: Part 1: Remove string ip fields **WIP** #734

Normalization Effort: Part 1: Remove string ip fields **WIP** #734

Conversation

Phrozyn commented Aug 13, 2018 • edited

jeffbryner commented Aug 14, 2018

Phrozyn commented Aug 14, 2018

jeffbryner commented Aug 14, 2018

Phrozyn commented Aug 14, 2018

jeffbryner commented Aug 14, 2018

Phrozyn commented Aug 14, 2018

Phrozyn commented Nov 20, 2018

Normalization Effort: Part 1: Remove string ip fields WIP #734

Normalization Effort: Part 1: Remove string ip fields WIP #734

Phrozyn commented Aug 13, 2018 •

edited