-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #91 - Allow using the ISP database #94
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,12 +6,14 @@ | |
|
||
java_import "java.net.InetAddress" | ||
java_import "com.maxmind.geoip2.DatabaseReader" | ||
java_import "com.maxmind.geoip2.model.AnonymousIpResponse" | ||
java_import "com.maxmind.geoip2.model.CityResponse" | ||
java_import "com.maxmind.geoip2.record.Country" | ||
java_import "com.maxmind.geoip2.record.Subdivision" | ||
java_import "com.maxmind.geoip2.record.City" | ||
java_import "com.maxmind.geoip2.record.Postal" | ||
java_import "com.maxmind.geoip2.record.Location" | ||
java_import "com.maxmind.geoip2.model.ConnectionTypeResponse" | ||
java_import "com.maxmind.geoip2.model.CountryResponse" | ||
java_import "com.maxmind.geoip2.model.DomainResponse" | ||
#java_import "com.maxmind.geoip2.model.EnterpriseResponse" | ||
java_import "com.maxmind.geoip2.model.InsightsResponse" | ||
java_import "com.maxmind.geoip2.model.IspResponse" | ||
java_import "com.maxmind.db.CHMCache" | ||
|
||
def suppress_all_warnings | ||
|
@@ -69,11 +71,7 @@ class LogStash::Filters::GeoIP < LogStash::Filters::Base | |
# For the built-in GeoLiteCity database, the following are available: | ||
# `city_name`, `continent_code`, `country_code2`, `country_code3`, `country_name`, | ||
# `dma_code`, `ip`, `latitude`, `longitude`, `postal_code`, `region_name` and `timezone`. | ||
config :fields, :validate => :array, :default => ['city_name', 'continent_code', | ||
'country_code2', 'country_code3', 'country_name', | ||
'dma_code', 'ip', 'latitude', | ||
'longitude', 'postal_code', 'region_name', | ||
'region_code', 'timezone', 'location'] | ||
config :fields, :validate => :array, :default => [] | ||
|
||
# Specify the field into which Logstash should store the geoip data. | ||
# This can be useful, for example, if you have `src\_ip` and `dst\_ip` fields and | ||
|
@@ -88,6 +86,12 @@ class LogStash::Filters::GeoIP < LogStash::Filters::Base | |
# is still valid GeoJSON. | ||
config :target, :validate => :string, :default => 'geoip' | ||
|
||
# Specify the database type. By default "city" is selected (This is the type of the built-in GeoLite2 City database) | ||
config :dbtype, :validate => ['country', 'city', 'anonymousIp', 'connectionType', 'domain', 'enterprise', 'isp' ], :default => 'city' | ||
|
||
# Should we merge the results into target. If set to false, we will override the value of the target field | ||
config :merge, :validate => :boolean, :default => false | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The behavior prior to geoip2 was merging by default with no option to turn it off (reference :https://github.com/logstash-plugins/logstash-filter-geoip/blob/v2.0.0/lib/logstash/filters/geoip.rb#L147-L161) Should this default to true? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will change |
||
|
||
# GeoIP lookup is surprisingly expensive. This filter uses an cache to take advantage of the fact that | ||
# IPs agents are often found adjacent to one another in log files and rarely have a random distribution. | ||
# The higher you set this the more likely an item is to be in the cache and the faster this filter will run. | ||
|
@@ -155,7 +159,7 @@ def filter(event) | |
ip = ip.first if ip.is_a? Array | ||
geo_data_hash = Hash.new | ||
ip_address = InetAddress.getByName(ip) | ||
response = @parser.city(ip_address) | ||
response = @parser.send(@dbtype,ip_address) | ||
populate_geo_data(response, ip_address, geo_data_hash) | ||
rescue com.maxmind.geoip2.exception.AddressNotFoundException => e | ||
@logger.debug("IP not found!", :exception => e, :field => @source, :event => event) | ||
|
@@ -166,8 +170,12 @@ def filter(event) | |
# Dont' swallow this, bubble up for unknown issue | ||
raise e | ||
end | ||
|
||
event[@target] = geo_data_hash | ||
if @merge | ||
event[@target] ||= {} | ||
event[@target].merge!(geo_data_hash) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK. Out of curiosity - why isn't it safe? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Because the api for accessing an event is only available as 'event.get |
||
else | ||
event[@target] = geo_data_hash | ||
end | ||
|
||
if geo_data_hash.empty? | ||
tag_unsuccessful_lookup(event) | ||
|
@@ -176,56 +184,40 @@ def filter(event) | |
|
||
filter_matched(event) | ||
end # def filter | ||
|
||
def populate_geo_data(response, ip_address, geo_data_hash) | ||
country = response.getCountry() | ||
subdivision = response.getMostSpecificSubdivision() | ||
city = response.getCity() | ||
postal = response.getPostal() | ||
location = response.getLocation() | ||
|
||
# if location is empty, there is no point populating geo data | ||
# and most likely all other fields are empty as well | ||
if location.getLatitude().nil? && location.getLongitude().nil? | ||
return | ||
end | ||
|
||
@fields.each do |field| | ||
case field | ||
when "city_name" | ||
def populate_geo_data(response, ip_address, geo_data_hash) | ||
case @dbtype | ||
when "city" | ||
country = response.getCountry() | ||
subdivision = response.getMostSpecificSubdivision() | ||
city = response.getCity() | ||
postal = response.getPostal() | ||
location = response.getLocation() | ||
|
||
# if location is empty, there is no point populating geo data | ||
# and most likely all other fields are empty as well | ||
if location.getLatitude().nil? && location.getLongitude().nil? | ||
return | ||
end | ||
geo_data_hash["city_name"] = city.getName() | ||
when "country_name" | ||
geo_data_hash["country_name"] = country.getName() | ||
when "continent_code" | ||
geo_data_hash["continent_code"] = response.getContinent().getCode() | ||
when "continent_name" | ||
geo_data_hash["continent_name"] = response.getContinent().getName() | ||
when "country_code2" | ||
geo_data_hash["country_code2"] = country.getIsoCode() | ||
when "country_code3" | ||
geo_data_hash["country_code3"] = country.getIsoCode() | ||
when "ip" | ||
geo_data_hash["ip"] = ip_address.getHostAddress() | ||
when "postal_code" | ||
geo_data_hash["postal_code"] = postal.getCode() | ||
when "dma_code" | ||
geo_data_hash["dma_code"] = location.getMetroCode() | ||
when "region_name" | ||
geo_data_hash["region_name"] = subdivision.getName() | ||
when "region_code" | ||
geo_data_hash["region_code"] = subdivision.getIsoCode() | ||
when "timezone" | ||
geo_data_hash["timezone"] = location.getTimeZone() | ||
when "location" | ||
geo_data_hash["location"] = [ location.getLongitude(), location.getLatitude() ] | ||
when "latitude" | ||
geo_data_hash["latitude"] = location.getLatitude() | ||
when "longitude" | ||
geo_data_hash["longitude"] = location.getLongitude() | ||
else | ||
raise Exception.new("[#{field}] is not a supported field option.") | ||
end | ||
geo_data_hash.merge!(JSON.parse(response.toJson())) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It actually works without it. I guess it is loaded implicitly somehow? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I cherry-picked your code into the master branch, so my conditions are different. I get There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unfortunately, I don't know when I will be able to do that. If someone wants to continue from here, fine, if not, I will try to find a time for it in my next sprint. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There should be a better way to get the things out of the response than having it parse some immediately-generated JSON. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jordansissel I am not aware of a better way. The alternative would be to either hard code each field method (e.g. getAutonomousSystemOrganization()) for every possible DB, or use some reflection hacks. toJson() seems like MaxMind's opinionated way of getting all the fields. This resembles what you will get from their API. |
||
end | ||
geo_data_hash.reject! {|key, value| !@fields.include? key} unless @fields.empty? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This mechanism of building a hash and then removing fields that are not selected is doing extra work that could be avoided. The current version of this file selectively adds fields based on @fields -- is there a reason you rewrote it in this way? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes. This is a more generic way. The docs state that if you leave fields I haven't benchmarked it, but I think this might even be faster for use cases where the user wants all the fields. |
||
end | ||
|
||
def tag_unsuccessful_lookup(event) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
[ | ||
{ | ||
"url": "http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz", | ||
"sha1": "76b9e7152e3765298ab89474eaeacb675f4fea03" | ||
"sha1": "19190da72e067b158079ca03964f05781ce475b7" | ||
} | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the database file seems to infer the database type, can we avoid this setting and use the database file to determine what the database type is?
This inference is what the geoip plugin did before we upgraded to geoip2:
https://github.com/logstash-plugins/logstash-filter-geoip/blob/v2.0.0/lib/logstash/filters/geoip.rb#L104-L115
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I will change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into it. The geoip gem is doing some low level logic to understand the type (https://github.com/cjheath/geoip/blob/master/lib/geoip.rb#L546) and as far as I can tell, this is not possible with the Java SDK. In the Java SDK, you can get the database type via the metadata, but I couldn't find a full list of possible values for the databases_type field (https://maxmind.github.io/MaxMind-DB/). Therefore, I am leaving it as is for now, unless someone can provide me with a full list of values, or a better way to detect it.