This repository has been archived by the owner on Mar 22, 2024. It is now read-only.

[Query] Performance impact of geo lookup #5

Closed
bluefangs opened this issue Sep 5, 2017 · 3 comments

Comments

@bluefangs
Contributor

Hi,
I'm fairly new to ELK and NetFlow, and I was taking a look at your implementation. The official documentation on GeoIP lookups states that they are quite expensive.
I would like to understand whether the lookup has any noticeable impact when implemented as it is in this project. Do you have any benchmark metrics with and without the GeoIP lookup?

@robcowart
Owner

robcowart commented Sep 5, 2017

The developer of the Netflow codec for Logstash provides the following guidance:

For high-performance production environments the configuration below will decode up to 15000 flows/sec on a dedicated 16 CPU instance. If your total flowrate exceeds 15000 flows/sec, you should use multiple Logstash instances.

This used to be:

For high-performance production environments the configuration below will decode up to 6000 flows/sec on an 8 CPU instance. If your total flowrate exceeds 6000 flows/sec, you should use multiple Logstash instances.

On my quad-core laptop using the ElastiFlow pipeline I can process 4000-5000 flows/sec, which is in line with, maybe even a bit better than, the guidance above.
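To make that comparison concrete, here is a rough back-of-the-envelope sketch in Python. It assumes throughput scales linearly with CPU count, which is a simplification and not something the codec guidance states. The helper names `flows_per_cpu` and `instances_needed` are made up for illustration, and the 40000 flows/sec input is an arbitrary example value.

```python
import math


def flows_per_cpu(flows_per_sec: float, cpus: int) -> float:
    """Average decode rate per CPU, assuming linear scaling across cores."""
    return flows_per_sec / cpus


def instances_needed(total_flows_per_sec: float,
                     per_instance_limit: float = 15000) -> int:
    """Logstash instances needed if each one tops out at per_instance_limit."""
    return max(1, math.ceil(total_flows_per_sec / per_instance_limit))


# Figures quoted in this thread.
print(flows_per_cpu(15000, 16))  # current guidance: 937.5 flows/sec per CPU
print(flows_per_cpu(6000, 8))    # older guidance:   750.0 flows/sec per CPU
print(flows_per_cpu(4000, 4))    # laptop, low end:  1000.0
print(flows_per_cpu(5000, 4))    # laptop, high end: 1250.0

# Hypothetical total flow rate, chosen only to show the rounding.
print(instances_needed(40000))   # 3
```

Per CPU, the laptop numbers come out at or slightly above the guidance, but different CPUs and traffic mixes make this only a loose comparison.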

Ultimately you will have to test in your environment. It is, however, worth mentioning that the ElastiFlow pipeline is designed to look up ONLY those IP addresses which are part of the public address space (as determined by a CIDR filter). There is no GeoIP lookup for flows that are from private IP to private IP. If your traffic patterns involve a lot of external traffic, then the penalty of the GeoIP lookups may be more noticeable.
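The gating idea can be sketched in Python as follows. This is an illustration of the logic only, not the actual ElastiFlow Logstash configuration. The network list (the RFC 1918 private ranges) and the function names `is_public` and `geoip_candidates` are assumptions made for the example; the ranges the real pipeline checks may differ.

```python
import ipaddress

# Assumed private ranges (RFC 1918); the real pipeline's CIDR list may differ.
PRIVATE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_public(addr: str) -> bool:
    """True if addr falls outside every network in PRIVATE_NETWORKS."""
    ip = ipaddress.ip_address(addr)
    return not any(ip in net for net in PRIVATE_NETWORKS)


def geoip_candidates(src: str, dst: str) -> list:
    """Addresses of a flow that would be passed on to a GeoIP lookup."""
    return [addr for addr in (src, dst) if is_public(addr)]


print(geoip_candidates("10.0.0.5", "192.168.1.20"))  # private to private: []
print(geoip_candidates("10.0.0.5", "8.8.8.8"))       # ['8.8.8.8']
```

With this kind of check, the lookup cost grows with the share of flows that have at least one public endpoint, which is why heavily external traffic may see a larger penalty.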

@bluefangs
Contributor Author

Thanks for the quick reply!

@geertn444

FYI: I am running Logstash (ElastiFlow) on a VM with 4 vCPUs at 2.3 GHz. Using htop I am seeing 100% CPU on all 4 vCPUs (mainly from Logstash), but strangely my throughput varies from 1250 to 2000 flows/second depending on the time of day. I have relatively many public IPs, so I disabled the GeoIP lookup (by commenting out postprocessing).
