Fixes #91 - Allow using the ISP database #94

Closed
82 changes: 37 additions & 45 deletions lib/logstash/filters/geoip.rb
@@ -6,12 +6,14 @@

java_import "java.net.InetAddress"
java_import "com.maxmind.geoip2.DatabaseReader"
java_import "com.maxmind.geoip2.model.AnonymousIpResponse"
java_import "com.maxmind.geoip2.model.CityResponse"
java_import "com.maxmind.geoip2.record.Country"
java_import "com.maxmind.geoip2.record.Subdivision"
java_import "com.maxmind.geoip2.record.City"
java_import "com.maxmind.geoip2.record.Postal"
java_import "com.maxmind.geoip2.record.Location"
java_import "com.maxmind.geoip2.model.ConnectionTypeResponse"
java_import "com.maxmind.geoip2.model.CountryResponse"
java_import "com.maxmind.geoip2.model.DomainResponse"
#java_import "com.maxmind.geoip2.model.EnterpriseResponse"
java_import "com.maxmind.geoip2.model.InsightsResponse"
java_import "com.maxmind.geoip2.model.IspResponse"
java_import "com.maxmind.db.CHMCache"

def suppress_all_warnings
@@ -69,11 +71,7 @@ class LogStash::Filters::GeoIP < LogStash::Filters::Base
# For the built-in GeoLiteCity database, the following are available:
# `city_name`, `continent_code`, `country_code2`, `country_code3`, `country_name`,
# `dma_code`, `ip`, `latitude`, `longitude`, `postal_code`, `region_name` and `timezone`.
config :fields, :validate => :array, :default => ['city_name', 'continent_code',
'country_code2', 'country_code3', 'country_name',
'dma_code', 'ip', 'latitude',
'longitude', 'postal_code', 'region_name',
'region_code', 'timezone', 'location']
config :fields, :validate => :array, :default => []

# Specify the field into which Logstash should store the geoip data.
# This can be useful, for example, if you have `src\_ip` and `dst\_ip` fields and
@@ -88,6 +86,12 @@ class LogStash::Filters::GeoIP < LogStash::Filters::Base
# is still valid GeoJSON.
config :target, :validate => :string, :default => 'geoip'

# Specify the database type. By default "city" is selected (This is the type of the built-in GeoLite2 City database)
config :dbtype, :validate => ['country', 'city', 'anonymousIp', 'connectionType', 'domain', 'enterprise', 'isp' ], :default => 'city'
Contributor:

Since the database file appears to encode the database type, can we avoid this setting and use the database file itself to determine what the database type is?

This inference is what the geoip plugin did before we upgraded to geoip2:

https://github.com/logstash-plugins/logstash-filter-geoip/blob/v2.0.0/lib/logstash/filters/geoip.rb#L104-L115

Author:

Good idea. I will change.

Author:

I looked into it. The geoip gem does some low-level logic to determine the type (https://github.com/cjheath/geoip/blob/master/lib/geoip.rb#L546), and as far as I can tell this is not possible with the Java SDK. The Java SDK does expose the database type via the metadata, but I couldn't find a full list of possible values for the database_type field (https://maxmind.github.io/MaxMind-DB/). Therefore I am leaving it as is for now, unless someone can provide a full list of values or a better way to detect it.
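
For reference, a rough sketch of what metadata-based detection could look like, assuming the database_type values follow the GeoIP2 product names (the mapping below is an assumption, not a verified list; DatabaseReader does expose getMetadata().getDatabaseType()):

    # Hypothetical sketch: infer the lookup method from the loaded database's
    # metadata. The regexes assume database_type strings such as
    # "GeoLite2-City" or "GeoIP2-ISP"; unknown types fall back to "city".
    def infer_dbtype
      case @parser.getMetadata().getDatabaseType()
      when /Enterprise/i       then "enterprise"
      when /City/i             then "city"
      when /Country/i          then "country"
      when /Anonymous.?IP/i    then "anonymousIp"
      when /Connection.?Type/i then "connectionType"
      when /Domain/i           then "domain"
      when /ISP/i              then "isp"
      else "city"
      end
    end

Each returned string matches one of the dbtype values accepted by the config above, so the existing @parser.send(@dbtype, ip_address) call would keep working.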


# Should we merge the results into the target field? If set to false, we will override the value of the target field
config :merge, :validate => :boolean, :default => false
Contributor:

The behavior prior to geoip2 was to merge by default, with no option to turn it off (reference: https://github.com/logstash-plugins/logstash-filter-geoip/blob/v2.0.0/lib/logstash/filters/geoip.rb#L147-L161).

Should this default to true?

Author:

Will change
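
Presumably that just means flipping the default, along the lines of:

    # merge into the target by default, matching the pre-geoip2 behavior
    config :merge, :validate => :boolean, :default => true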


# GeoIP lookup is surprisingly expensive. This filter uses a cache to take advantage of the fact that
# IP addresses are often found adjacent to one another in log files and rarely have a random distribution.
# The higher you set this, the more likely an item is to be in the cache and the faster this filter will run.
@@ -155,7 +159,7 @@ def filter(event)
ip = ip.first if ip.is_a? Array
geo_data_hash = Hash.new
ip_address = InetAddress.getByName(ip)
response = @parser.city(ip_address)
response = @parser.send(@dbtype,ip_address)
populate_geo_data(response, ip_address, geo_data_hash)
rescue com.maxmind.geoip2.exception.AddressNotFoundException => e
@logger.debug("IP not found!", :exception => e, :field => @source, :event => event)
@@ -166,8 +170,12 @@
# Don't swallow this, bubble up for unknown issue
raise e
end

event[@target] = geo_data_hash
if @merge
event[@target] ||= {}
event[@target].merge!(geo_data_hash)
Contributor:

.merge! is not a safe thing to do on a Logstash Event. You should insert each key separately.

Author:

OK. Out of curiosity - why isn't it safe?

Contributor:

Because the API for accessing an event is only available as `event.get` and `event.set`. Getting a value and then modifying it, instead of using only the `get` and `set` APIs, is not supported and not guaranteed to work correctly.
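
A minimal sketch of per-key insertion, using the event[] field-reference access already used elsewhere in this file (field names unchanged from the diff; illustrative only, not the final implementation):

    if @merge
      # write each key through the event API instead of mutating a hash
      # obtained from the event
      geo_data_hash.each do |key, value|
        event["[#{@target}][#{key}]"] = value
      end
    else
      event[@target] = geo_data_hash
    end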

else
event[@target] = geo_data_hash
end

if geo_data_hash.empty?
tag_unsuccessful_lookup(event)
@@ -176,56 +184,40 @@

filter_matched(event)
end # def filter

def populate_geo_data(response, ip_address, geo_data_hash)
country = response.getCountry()
subdivision = response.getMostSpecificSubdivision()
city = response.getCity()
postal = response.getPostal()
location = response.getLocation()

# if location is empty, there is no point populating geo data
# and most likely all other fields are empty as well
if location.getLatitude().nil? && location.getLongitude().nil?
return
end

@fields.each do |field|
case field
when "city_name"
def populate_geo_data(response, ip_address, geo_data_hash)
case @dbtype
when "city"
country = response.getCountry()
subdivision = response.getMostSpecificSubdivision()
city = response.getCity()
postal = response.getPostal()
location = response.getLocation()

# if location is empty, there is no point populating geo data
# and most likely all other fields are empty as well
if location.getLatitude().nil? && location.getLongitude().nil?
return
end
geo_data_hash["city_name"] = city.getName()
when "country_name"
geo_data_hash["country_name"] = country.getName()
when "continent_code"
geo_data_hash["continent_code"] = response.getContinent().getCode()
when "continent_name"
geo_data_hash["continent_name"] = response.getContinent().getName()
when "country_code2"
geo_data_hash["country_code2"] = country.getIsoCode()
when "country_code3"
geo_data_hash["country_code3"] = country.getIsoCode()
when "ip"
geo_data_hash["ip"] = ip_address.getHostAddress()
when "postal_code"
geo_data_hash["postal_code"] = postal.getCode()
when "dma_code"
geo_data_hash["dma_code"] = location.getMetroCode()
when "region_name"
geo_data_hash["region_name"] = subdivision.getName()
when "region_code"
geo_data_hash["region_code"] = subdivision.getIsoCode()
when "timezone"
geo_data_hash["timezone"] = location.getTimeZone()
when "location"
geo_data_hash["location"] = [ location.getLongitude(), location.getLatitude() ]
when "latitude"
geo_data_hash["latitude"] = location.getLatitude()
when "longitude"
geo_data_hash["longitude"] = location.getLongitude()
else
raise Exception.new("[#{field}] is not a supported field option.")
end
geo_data_hash.merge!(JSON.parse(response.toJson()))


require "json" is needed for this call

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually works without it. I guess it is loaded implicitly somehow?


I cherry-picked your code into the master branch, so my conditions are different. I get :exception=>#<NameError: uninitialized constant LogStash::Filters::GeoIP::JSON> at this line without including the json module.
Anyway, I think we should work on getting the PR to work against the master branch.

Author:

Unfortunately, I don't know when I will be able to do that. If someone wants to continue from here, fine, if not, I will try to find a time for it in my next sprint.

Contributor:

There should be a better way to get the things out of the response than having it parse some immediately-generated JSON.

Author:

@jordansissel I am not aware of a better way. The alternative would be to either hard-code each field method (e.g. getAutonomousSystemOrganization()) for every possible DB, or use some reflection hacks. toJson() seems like MaxMind's opinionated way of getting all the fields, and it resembles what you get from their API.
The obvious benefit is that we don't need to know the possible fields in advance, and if they add more fields, we will support them seamlessly.
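
For comparison, a sketch of the hard-coded alternative mentioned above, shown for the ISP database only (the IspResponse getters exist in the Java SDK; the hash keys and method name here are illustrative):

    # One explicit getter mapping like this would be needed per database type.
    def populate_isp_data(response, geo_data_hash)
      geo_data_hash["isp"]              = response.getIsp()
      geo_data_hash["organization"]     = response.getOrganization()
      geo_data_hash["asn"]              = response.getAutonomousSystemNumber()
      geo_data_hash["asn_organization"] = response.getAutonomousSystemOrganization()
    end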

end
geo_data_hash.reject! {|key, value| !@fields.include? key} unless @fields.empty?
Contributor:

This mechanism of building a hash and then removing fields that are not selected is doing extra work that could be avoided.

The current version of this file selectively adds fields based on @fields -- is there a reason you rewrote it in this way?

Author (@imriz, Nov 22, 2016):

Yes. This is a more generic way. The docs state that if you leave fields empty, you will get all fields - that was not the case. Previously, if you left it empty, you would have gotten whatever the plugin had in its default list (nitpicking: the docs say there is no default for the fields parameter, yet there is a default value).

Putting aside the fact that it is not semantically the same, the selective approach would mean maintaining a list for every type of DB, plus making multiple calls to the DB wrapper to get specific fields. This way, you make one call, don't need to maintain a list of default fields, and the semantics match the docs better.

I haven't benchmarked it, but I think this might even be faster for use cases where the user wants all the fields.
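
One way to address the extra-work concern while keeping the generic toJson() approach might be to select while parsing instead of building the full hash and rejecting afterwards (sketch only; the keys are whatever toJson() emits):

    require "json"

    parsed = JSON.parse(response.toJson())
    if @fields.empty?
      geo_data_hash.merge!(parsed)   # an empty fields list means "everything"
    else
      @fields.each { |field| geo_data_hash[field] = parsed[field] if parsed.key?(field) }
    end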

end

def tag_unsuccessful_lookup(event)
2 changes: 1 addition & 1 deletion spec/filters/geoip_spec.rb
@@ -189,7 +189,7 @@
end

context "when a IP is not found in the DB" do
let(:ipstring) { "113.208.89.21" }
let(:ipstring) { "10.10.10.10" }

it "should set the target field to an empty hash" do
expect(event["geoip"]).to eq({})
2 changes: 1 addition & 1 deletion vendor.json
@@ -1,6 +1,6 @@
[
{
"url": "http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz",
"sha1": "76b9e7152e3765298ab89474eaeacb675f4fea03"
"sha1": "19190da72e067b158079ca03964f05781ce475b7"
}
]