Fixes #91 - Allow using the ISP database #94
Conversation
ff32cea to 2f1b28e (compare)
Is there any idea of when this will be merged? No support for ISP lookup is a pretty major issue blocking our upgrade to 5.
@tegud well, it needs to be reviewed before it will be merged. :)
I'm happy to review if that's acceptable? :)
end
geo_data_hash.reject! {|key, value| !@fields.include? key} unless @fields.empty?
This mechanism of building a hash and then removing fields that are not selected is doing extra work that could be avoided.
The current version of this file selectively adds fields based on @fields -- is there a reason you rewrote it in this way?
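To make the trade-off concrete, here is a plain-Ruby sketch (with hypothetical field data, not the plugin's real lookup code) contrasting the two approaches: building the full hash and pruning it afterwards, versus selectively adding only the requested fields.

```ruby
# Hypothetical lookup result and field selection for illustration only.
all_fields = { "city_name" => "Oslo", "country_code" => "NO", "latitude" => 59.91 }
selected   = ["city_name"]

# Approach in this PR: build everything, then drop unselected fields.
pruned = all_fields.dup
pruned.reject! { |key, _value| !selected.include?(key) } unless selected.empty?
# pruned => {"city_name"=>"Oslo"}

# Previous approach: only copy the fields that were requested.
picked = {}
selected.each { |field| picked[field] = all_fields[field] if all_fields.key?(field) }
# picked => {"city_name"=>"Oslo"}
```

Both yield the same hash for a non-empty selection; they differ in how much work is done up front and in what happens when the selection is empty.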
Yes. This is a more generic way. The docs state that if you leave `fields` empty, you will get all fields - that was not the case. Previously, if you left it empty, you would have gotten only what the plugin has in its default list (nitpicking: the doc states there is no default for the `fields` parameter, yet there is a default value).
Putting aside the fact that it is not semantically the same, the old approach would mean we need to maintain a list for every type of DB, plus run multiple calls to the DB wrapper to get specific fields. This way, you make one call, don't need to maintain a list of default fields, and the semantics match the docs better.
I haven't benchmarked it, but I think this might even be faster for use cases where the user wants all the fields.
@tegud All are welcome to participate in whatever means they feel comfortable. So, yes, you are welcome to review. :)
geo_data_hash["longitude"] = location.getLongitude()
else
raise Exception.new("[#{field}] is not a supported field option.")
end
geo_data_hash.merge!(JSON.parse(response.toJson()))
`require "json"` is needed for this call.
It actually works without it. I guess it is loaded implicitly somehow?
I cherry-picked your code into the master branch, so my conditions are different. I get `:exception=>#<NameError: uninitialized constant LogStash::Filters::GeoIP::JSON>` at this line without including the `json` module.
Anyway, I think we should work on getting the PR to work against the master branch.
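A minimal sketch of what the require fixes: without `require "json"`, referencing the `JSON` constant raises the `NameError` described above. The JSON string here is a hypothetical stand-in shaped like what `response.toJson()` might return, not real MaxMind output.

```ruby
require "json" # without this, JSON.parse raises NameError: uninitialized constant JSON

# Hypothetical string standing in for MaxMind's response.toJson() output.
json_string = '{"isp":"Example ISP","autonomous_system_number":64496}'

geo_data_hash = {}
geo_data_hash.merge!(JSON.parse(json_string))
# geo_data_hash => {"isp"=>"Example ISP", "autonomous_system_number"=>64496}
```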
Unfortunately, I don't know when I will be able to do that. If someone wants to continue from here, fine, if not, I will try to find a time for it in my next sprint.
There should be a better way to get the things out of the response than having it parse some immediately-generated JSON.
@jordansissel I am not aware of a better way. The alternative would be to either hard-code each field method (e.g. getAutonomousSystemOrganization()) for every possible DB, or use some reflection hacks. toJson() seems like MaxMind's opinionated way of getting all the fields, and it resembles what you would get from their API.
The obvious benefit is that we don't need to be aware of the possible fields in advance, and if they add more fields, we will support them seamlessly.
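The contrast can be sketched in plain Ruby with a stub standing in for a MaxMind response object (the field names and the `to_json_string` method are illustrative, not the real SDK API):

```ruby
require "json"

# Stub standing in for a MaxMind response; names are illustrative only.
Response = Struct.new(:isp, :organization) do
  def to_json_string
    { "isp" => isp, "organization" => organization }.to_json
  end
end

response = Response.new("Example ISP", "Example Org")

# Explicit-getter style: one line of plugin code per field.
explicit = { "isp" => response.isp, "organization" => response.organization }

# Generic style: one call, every field the database exposes.
generic = JSON.parse(response.to_json_string)

# Both produce the same hash, but the generic style needs no per-field code.
explicit == generic # => true
```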
Is this going to allow us to use the ISP database (paid subscription) from MaxMind?
I'm happy to test this in my environment as well.
Any idea when this will be supported? This is the number one reason some of my clients cannot upgrade to version 5. The ability to use the ISP database is highly important for security logs as well as for filtering. I have to keep telling people to hold off on going to 5, as many of their visualizations and dashboards are dependent on the ISP database support. I appreciate your help on this and would also be willing to test.
I would also really like to see this supported in the logstash plugin; in fact, I just raised an Elastic support request asking how it is possible to use this database.
@@ -88,6 +86,12 @@ class LogStash::Filters::GeoIP < LogStash::Filters::Base
# is still valid GeoJSON.
config :target, :validate => :string, :default => 'geoip'

# Specify the database type. By default "city" is selected (this is the type of the built-in GeoLite2 City database)
config :dbtype, :validate => ['country', 'city', 'anonymousIp', 'connectionType', 'domain', 'enterprise', 'isp' ], :default => 'city'
Since the database file seems to infer the database type, can we avoid this setting and use the database file to determine what the database type is?
This inference is what the geoip plugin did before we upgraded to geoip2.
Good idea. I will change.
I looked into it. The geoip gem does some low-level logic to understand the type (https://github.com/cjheath/geoip/blob/master/lib/geoip.rb#L546) and, as far as I can tell, this is not possible with the Java SDK. In the Java SDK, you can get the database type via the metadata, but I couldn't find a full list of possible values for the database_type field (https://maxmind.github.io/MaxMind-DB/). Therefore, I am leaving it as is for now, unless someone can provide me with a full list of values, or a better way to detect it.
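For reference, reading the type string from the metadata via the MaxMind Java SDK looks roughly like the following JRuby sketch (untested here; the file path is a placeholder, and `getDatabaseType()` returns a string such as "GeoIP2-ISP" whose full set of possible values is exactly what is undocumented):

```ruby
# JRuby sketch, assuming the MaxMind geoip2 jar is on the classpath.
java_import "com.maxmind.geoip2.DatabaseReader"
java_import "java.io.File"

reader  = DatabaseReader::Builder.new(java.io.File.new("/path/to/database.mmdb")).build
db_type = reader.getMetadata().getDatabaseType() # e.g. "GeoIP2-ISP"
```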
config :dbtype, :validate => ['country', 'city', 'anonymousIp', 'connectionType', 'domain', 'enterprise', 'isp' ], :default => 'city'

# Should we merge the results into target? If set to false, we will override the value of the target field
config :merge, :validate => :boolean, :default => false
The behavior prior to geoip2 was merging by default, with no option to turn it off (reference: https://github.com/logstash-plugins/logstash-filter-geoip/blob/v2.0.0/lib/logstash/filters/geoip.rb#L147-L161).
Should this default to true?
Will change
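The difference between the two defaults can be sketched in plain Ruby, with the event represented as a bare Hash rather than a real Logstash Event:

```ruby
# Hypothetical lookup result for illustration.
geo_data_hash = { "isp" => "Example ISP" }

# merge => false: the target field is overwritten entirely.
overridden = { "geoip" => { "old" => "value" } }
overridden["geoip"] = geo_data_hash.dup
# overridden["geoip"] => {"isp"=>"Example ISP"}

# merge => true: results are folded into whatever is already there.
merged = { "geoip" => { "old" => "value" } }
merged["geoip"] ||= {}
merged["geoip"].merge!(geo_data_hash)
# merged["geoip"] => {"old"=>"value", "isp"=>"Example ISP"}
```

Defaulting to true would preserve the pre-geoip2 behavior shown in the second branch.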
event[@target] = geo_data_hash
if @merge
event[@target] ||= {}
event[@target].merge!(geo_data_hash)
`.merge!` is not a safe thing to do on a Logstash Event. You should insert each key separately.
OK. Out of curiosity - why isn't it safe?
Because the API for accessing an event is only available as `event.get` and `event.set`. Getting a value, then modifying that value, instead of using just the `get` and `set` APIs, is not supported and not guaranteed to work correctly.
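The per-key pattern can be sketched with a minimal stand-in event class (not the real Logstash Event, just enough to show one `set` call per key using field-reference syntax):

```ruby
# Minimal stand-in for a Logstash Event, for illustration only.
class StubEvent
  def initialize
    @data = {}
  end

  def set(ref, value)
    @data[ref] = value
  end

  def get(ref)
    @data[ref]
  end
end

event  = StubEvent.new
target = "geoip"
geo_data_hash = { "isp" => "Example ISP", "asn" => 64496 }

# One set call per key, instead of mutating a hash obtained from get.
geo_data_hash.each { |key, value| event.set("[#{target}][#{key}]", value) }
# event.get("[geoip][isp]") => "Example ISP"
```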
I received an error while trying to use it in 5: "Direct event field references (i.e. event['field']) have been disabled in favor of using event get and set methods (e.g. event.get('field')). Please consult the Logstash 5.0 breaking changes documentation for more details." After this change is approved/merged, can the same change be applied to plugin-api-v2 by just updating the event references? Or are there other things that should be done differently too?
Seems to work great in 5.2.2 after only this change and updating the various other event references to .set or .get. Thanks for this so far @imriz!
@suyograo I have no objections. My two cents: the new commit again follows the path of using explicit methods to fetch explicit fields. This will require changing code if MaxMind adds fields. I find this rather limiting and not really required.