
Fixes #91 - Allow using the ISP database #94

Closed

Conversation

imriz

@imriz imriz commented Aug 7, 2016

No description provided.

@imriz imriz force-pushed the plugin-api-v1 branch 2 times, most recently from ff32cea to 2f1b28e on August 8, 2016 06:42
@tegud

tegud commented Nov 17, 2016

Is there any idea of when this will be merged in? The lack of ISP lookup support is a pretty major issue for us in upgrading to 5.

@jordansissel
Contributor

@tegud well, it needs to be reviewed before it will be merged. :)

@tegud

tegud commented Nov 18, 2016

I'm happy to review if that's acceptable? :)

end
geo_data_hash.reject! {|key, value| !@fields.include? key} unless @fields.empty?
Contributor


This mechanism of building a hash and then removing fields that are not selected is doing extra work that could be avoided.

The current version of this file selectively adds fields based on @fields -- is there a reason you rewrote it in this way?

Author

@imriz imriz Nov 22, 2016


Yes. This is a more generic way. The docs state that if you leave fields empty, you will get all fields, but that was not the case: previously, if you left it empty, you got only what the plugin has in its default list (nit-picking: the docs state there is no default for the fields parameter, yet there is a default value). Putting aside the fact that it is not semantically the same, the old approach means maintaining a list for every type of DB, plus making multiple calls to the DB wrapper to get specific fields. This way, you make one call, don't need to maintain a list of default fields, and the semantics match the docs better.

I haven't benchmarked it, but I think this might even be faster for use cases where the user wants all the fields.
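For illustration, here is a standalone Ruby sketch of the "build everything, then filter" approach described above. The method name and sample data are hypothetical, not the plugin's actual internals:

```ruby
# Standalone sketch of the "build everything, then filter" approach.
# filter_fields and the sample hash are illustrative, not plugin code.
def filter_fields(geo_data_hash, fields)
  # An empty `fields` list means "return everything", matching the
  # documented semantics of the fields parameter.
  return geo_data_hash if fields.empty?
  geo_data_hash.reject { |key, _value| !fields.include?(key) }
end

all_data = { "ip" => "8.8.8.8", "latitude" => 37.386, "longitude" => -122.084 }

filter_fields(all_data, [])      # => the full hash
filter_fields(all_data, ["ip"])  # => { "ip" => "8.8.8.8" }
```

Because the full hash is built once regardless of the selection, no per-database field list has to be maintained; filtering is a single pass over the result.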

@jordansissel
Contributor

@tegud All are welcome to participate in whatever means they feel comfortable. So, yes, you are welcome to review. :)

geo_data_hash["longitude"] = location.getLongitude()
else
raise Exception.new("[#{field}] is not a supported field option.")
end
geo_data_hash.merge!(JSON.parse(response.toJson()))


require "json" is needed for this call

Author


It actually works without it. I guess it is loaded implicitly somehow?


I cherry-picked your code into the master branch, so my conditions are different. I get :exception=>#<NameError: uninitialized constant LogStash::Filters::GeoIP::JSON> at this line without including the json module.
Anyway, I think we should work on getting the PR to work against the master branch.

Author


Unfortunately, I don't know when I will be able to do that. If someone wants to continue from here, fine, if not, I will try to find a time for it in my next sprint.

Contributor


There should be a better way to get the things out of the response than having it parse some immediately-generated JSON.

Author


@jordansissel I am not aware of a better way. The alternative would be to either hard-code a field method call (e.g. getAutonomousSystemOrganization()) for every possible DB, or use some reflection hacks. toJson() seems like MaxMind's opinionated way of getting all the fields, and it resembles what you get from their API.
The obvious benefit is that we don't need to be aware of the possible fields in advance, and if they add more fields, we will support them seamlessly.

@sh4t

sh4t commented Jan 14, 2017

Is this going to allow us to use the ISP database (paid subscription) from MaxMind?

@sh4t

sh4t commented Jan 18, 2017

I'm happy to test this in my environment as well.

@SMAPPER

SMAPPER commented Jan 29, 2017

Any idea when this will be supported? This is the number one reason some of my clients cannot upgrade to version 5. The ability to use the ISP database is highly important for security logs as well as for filtering. I have to keep telling people to hold off on going to 5, as many of their visualizations and dashboards depend on the ISP database support. I appreciate your help on this and would also be willing to test.

@jjm

jjm commented Feb 23, 2017

I would also really like to see this supported in the logstash plugin; in fact, I just raised an Elastic support request asking how it is possible to use this database.

@@ -88,6 +86,12 @@ class LogStash::Filters::GeoIP < LogStash::Filters::Base
# is still valid GeoJSON.
config :target, :validate => :string, :default => 'geoip'

# Specify the database type. By default "city" is selected (This is the type of the built-in GeoLite2 City database)
config :dbtype, :validate => ['country', 'city', 'anonymousIp', 'connectionType', 'domain', 'enterprise', 'isp' ], :default => 'city'
Contributor


Since the database file seems to infer the database type, can we avoid this setting and use the database file to determine what the database type is?

This inference is what the geoip plugin did before we upgraded to geoip2:

https://github.com/logstash-plugins/logstash-filter-geoip/blob/v2.0.0/lib/logstash/filters/geoip.rb#L104-L115

Author


Good idea. I will change.

Author


I looked into it. The geoip gem does some low-level logic to understand the type (https://github.com/cjheath/geoip/blob/master/lib/geoip.rb#L546), and as far as I can tell, this is not possible with the Java SDK. In the Java SDK, you can get the database type via the metadata, but I couldn't find a full list of possible values for the database_type field (https://maxmind.github.io/MaxMind-DB/). Therefore, I am leaving it as is for now, unless someone can provide me with a full list of values, or a better way to detect it.
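To make the gap concrete, here is a hypothetical sketch of metadata-based detection, assuming the Java SDK's reader.getMetadata().getDatabaseType() and a hand-maintained mapping. The mapping keys are assumed example values, not a verified list; needing such a list is exactly the problem described above:

```ruby
# Hypothetical sketch only: map a metadata database_type string (as the
# Java SDK's reader.getMetadata().getDatabaseType() would return it) to
# the plugin's dbtype values. The keys below are assumed examples, not a
# verified full list.
DBTYPE_BY_METADATA = {
  "GeoLite2-City"    => "city",
  "GeoIP2-City"      => "city",
  "GeoLite2-Country" => "country",
  "GeoIP2-Country"   => "country",
  "GeoIP2-ISP"       => "isp",
  "GeoIP2-Domain"    => "domain",
}

def infer_dbtype(metadata_database_type)
  DBTYPE_BY_METADATA.fetch(metadata_database_type) do
    # Without a guaranteed-complete list, unknown values must fall back
    # to an explicit user setting.
    raise ArgumentError, "Unrecognized database_type #{metadata_database_type.inspect}; please set :dbtype explicitly"
  end
end
```

Any unrecognized value forces the user back to configuring :dbtype by hand, which is why an explicit setting was kept in this PR.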

config :dbtype, :validate => ['country', 'city', 'anonymousIp', 'connectionType', 'domain', 'enterprise', 'isp' ], :default => 'city'

# Should we merge the results into target. If set to false, we will override the value of the target field
config :merge, :validate => :boolean, :default => false
Contributor


The behavior prior to geoip2 was merging by default with no option to turn it off (reference: https://github.com/logstash-plugins/logstash-filter-geoip/blob/v2.0.0/lib/logstash/filters/geoip.rb#L147-L161)

Should this default to true?

Author


Will change

event[@target] = geo_data_hash
if @merge
event[@target] ||= {}
event[@target].merge!(geo_data_hash)
Contributor


.merge! is not a safe thing to do on a Logstash Event. You should insert each key separately.

Author


OK. Out of curiosity - why isn't it safe?

Contributor


Because the API for accessing an event is only available as `event.get` and `event.set`. Getting a value and then modifying that value, instead of using only the `get` and `set` APIs, is not supported and not guaranteed to work correctly.
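To illustrate inserting each key separately, here is a minimal sketch. The StubEvent below is a flat, hypothetical stand-in for the real Java-backed Event, included only so the example runs on its own:

```ruby
# StubEvent is a hypothetical stand-in for the Logstash Event get/set API.
# It stores field-reference strings as flat keys; the real Event is a
# Java-backed object with nested field semantics.
class StubEvent
  def initialize; @data = {}; end
  def get(field); @data[field]; end
  def set(field, value); @data[field] = value; end
end

# Write each key individually via event.set, using field-reference
# strings like "[geoip][latitude]", instead of get-then-merge!.
def write_geo_data(event, target, geo_data_hash)
  geo_data_hash.each do |key, value|
    event.set("[#{target}][#{key}]", value)
  end
end

event = StubEvent.new
write_geo_data(event, "geoip", { "latitude" => 37.386, "isp" => "Google" })
event.get("[geoip][isp]")  # => "Google"
```

Each value goes through a single `set` call, so the plugin never mutates a value obtained from `get`.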


@s1sfa

s1sfa commented Mar 14, 2017

I received error while trying to use it in 5: "Direct event field references (i.e. event['field']) have been disabled in favor of using event get and set methods (e.g. event.get('field')). Please consult the Logstash 5.0 breaking changes documentation for more details"

After this change is approved/merged, can the same change be applied to plugin-api-v2 by just updating the event references? Or should other things be done differently as well?
It's pretty easy to change the event references to .set or .get.
I changed the "if @merge" section a bit to make it work, since there is no event.merge():

    if @merge
      merge_event = event.get(@target)
      merge_event ||= {}
      merge_event.merge!(geo_data_hash)
      event.set(@target, merge_event)
    else
      event.set(@target, geo_data_hash)
    end

Seems to work great in 5.2.2 after only this change and updating the various other event references to .set or .get.

Thanks for this so far @imriz !

@suyograo
Contributor

@imriz thanks for this contribution! For a long time, I wanted to move GeoIP to pure Java since we already use the Java APIs (and not the ruby code). I decided to do this and took a stab at supporting ISP database in the same PR -- #111.

#111 could supersede this PR, if you are ok with it.

@imriz
Author

imriz commented Apr 20, 2017

@suyograo I have no objections. My two cents: the new commit again follows the path of using explicit methods to fetch explicit fields, which will require code changes whenever MaxMind adds fields. I find this rather limiting and not really necessary.
Also, until the documentation is amended, it is inconsistent with how the default behaviour is described: the docs say that if you leave fields empty, you get all the fields, and that is not the case.

@suyograo suyograo closed this May 13, 2017