Skip to content
This repository has been archived by the owner on Nov 3, 2021. It is now read-only.

Normalization Effort: Part 1: Remove string ip fields **WIP** #734

Closed
wants to merge 9 commits into from

Conversation

Phrozyn
Copy link
Contributor

@Phrozyn Phrozyn commented Aug 13, 2018

Do not merge!

Purpose of this change:
With ES 5.x the ip field type now supports both ipv4 and ipv6. This reduces the need to add these additional fields as string fields. This change will reduce the 6 field types we currently use to track IP information to only 2.

sourceipaddress (field type "ip") <-> no change
sourceipv4address (field type "string") --> sourceipaddress (field type "ip")
sourceipv6address (field type "string") --> sourceipaddress (field type "ip")
destinationipaddress (field type "ip") <-> no change
destinationipv4address (field type "string") --> destinationipaddress (field type "ip")
destinationipv6address (field type "string") --> destinationipaddress (field type "ip")

Left to do:
Test:

  • bro event types: These are working fine
  • sqsFluentd event types:
  • Aggregations: Will have to check kibana for saved aggregations to see if these will require being rewritten to use the sourceipaddress or destinationipaddress field. There are no aggregations that depend on the string field.
  • Visualizations: These use the ipv4address field generated by the CollectAttackers.py script, which bases it on the sourceipaddress field, and uses an isIPv4 function to validate the value and send to mongo. So nothing breaks here.
  • Alerting: Alerts appear to be working fine.
  • okta2mozdef.py: validated that this script only utilizes the sourceipaddress field and uses a function to validate if it's ipv4.
  • createIPBlockList.py: validated that this script only searches for the ipv4address in the mongodb. Out of scope for ES field mappings.
  • CollectAttackers : validated that this script only utilizes the sourceipaddress field for it's functionality. It does add an ipv4address field to mongo, but that is not in the scope of our effort.

Testing Completed:

  • eventtask event types populate correctly.
  • unit test completes successfully.

Number of Alerts that depend on the removed fields: 0
Files with ipv4 or ipv6 strings:

  • CollectAttackers.py
  • createIPBlockList.py
  • okta2mozdef.py
  • geomodel.py
  • defaultMappingTemplate.json
  • elasticsearch.yml
  • examples directory has a lot of files that will need to be cleaned up.
  • vr.html
  • ipFixup.py
  • broFixup.py
  • fluentdSqsFixup.py
  • vulnerability.py
  • test_geomodel.py
  • test_broFixup.py
  • test_parse_sshd.py (not sure why this utilizes an ipv4 string field, when the plugin doesn't use one at all.)
  • test_vulnerability.py

It seems like we can still use netaddr to validate if an IP is ipv4 or ipv6 in these scripts, but they don't necessarily need to be fields within the mapping.

New Finding:
Seems like aggregation of ipv6 is supported in ES, but not in Kibana using the IP range bucket.
To aggregate the ip field you'd need to use "Terms", but "significant terms" will omit the IP field.
I created a post on their site to get more information around this.
https://discuss.elastic.co/t/no-ipv6-range-aggregation-bucket/144928

@Phrozyn Phrozyn added this to the Release v1.32 milestone Aug 13, 2018
@jeffbryner
Copy link
Collaborator

There are alerts and visualizations that use the string version especially for aggregations. I don't see a need to do this since it just reduces the flexibility and leaves us open to issues in ES where they misinterpret an IP (happened before, especially in aggregations).

Lets discuss before further work.

@Phrozyn
Copy link
Contributor Author

Phrozyn commented Aug 14, 2018

@jeffbryner FYI, there are no alerts that actually use this field, they all use sourceipaddress or destinationipaddress. As mentioned in my OP ES now supports both versions of IP addresses (link for reference: https://www.elastic.co/blog/indexing-ipv6-addresses-in-elasticsearch), thus the issue seen previously should no longer be a problem. We can handle what version an IP is within the code without having to explode our mapping.

@jeffbryner
Copy link
Collaborator

Might not be alerts, but there is definitely other code using these fields. Again, I don't see a need but I do see lots of regression testing that would be necessary.

@Phrozyn
Copy link
Contributor Author

Phrozyn commented Aug 14, 2018

I completely agree on that point. I'll add further test cases to the OP.

@Phrozyn Phrozyn changed the title Remove string ip fields **DO NOT MERGE YET** Normalization Effort: Part 1: Remove string ip fields **DO NOT MERGE YET** Aug 14, 2018
@Phrozyn Phrozyn changed the title Normalization Effort: Part 1: Remove string ip fields **DO NOT MERGE YET** Normalization Effort: Part 1: Remove string ip fields **WIP** Aug 14, 2018
@jeffbryner
Copy link
Collaborator

Hey, caught up with Tristan on this. My apologies for not realizing this is part of the normalization effort, I should have looked for context before commenting.

https://github.com/mozilla/MozDef/search?q=ipaddress&unscoped_q=ipaddress may be a useful start at all the places that may be assuming string/ipv4. In particular we should be cautious about offering up an ipv6 to a routine that is not expecting it.

@Phrozyn
Copy link
Contributor Author

Phrozyn commented Aug 14, 2018

I do understand the concern, and do plan on ensuring everything is fully tested and scoped before any of this is implemented. I was using vscode to search for occurrances of "ipv" in our repo.
Some items that use functions for ipv4 use the ipaddress field to validate what type of IP it is, the code that did transform and populate the ipv4/ipv6 address fields was changed in my PR and tested. However, I have much more testing to do to ensure there aren't weird anomalies presented as a result of this. For example I have not looked through the cron scripts I mentioned in my OP as things to review.

@pwnbus pwnbus modified the milestones: Release v1.32, Release v1.33 Sep 26, 2018
@pwnbus pwnbus modified the milestones: Release v1.33, Release v1.34 Oct 31, 2018
@Phrozyn
Copy link
Contributor Author

Phrozyn commented Nov 20, 2018

Going to close this PR for now, until Kibana adds a method for Aggregating on ipv6 IPs (they only aggregate on IPv4) we can't remove the text fields as it will cause some problems.

@Phrozyn Phrozyn closed this Nov 20, 2018
@pwnbus pwnbus deleted the remove_string_ip_fields branch December 14, 2018 21:22
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants