added private_domains functionality #6
lib/logstash/filters/tld.rb
Outdated
@@ -2,10 +2,32 @@
require "logstash/filters/base"
require "logstash/namespace"

# This example filter will replace the contents of the default
# message field with whatever you specify in the configuration.
# This filter is a domain name parser based on the https://publicsuffix.org/[Public Suffix List]
docs live here: https://github.com/logstash-plugins/logstash-filter-tld/blob/master/docs/index.asciidoc
(This is a recent change)
OK, I'm not familiar with modifying the asciidoc files directly; I thought they were auto-generated from the inline comments. I can change these as well.
lib/logstash/filters/tld.rb
Outdated
require 'public_suffix'
PublicSuffix::List.private_domains = @private_domains
This is an interesting setting. It appears to be a global setting which means the last tld plugin to initialize will win. This impacts users who may have multiple pipelines (coming in Logstash 6.0, iirc) where two pipelines may use a tld filter.
Thoughts?
Maybe we have to modify the upstream library to not use globals.
Good catch. I'll modify according to how the author does it here: https://github.com/weppos/publicsuffix-ruby#private-domains
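To make the "last plugin to initialize wins" concern concrete, here is a minimal sketch. Every class name in it (`ToyGlobalList`, `GlobalFilter`, `ScopedFilter`) is hypothetical and stands in for the real library and plugin; it is not the public_suffix API. It only illustrates why a class-level setting is shared by every filter instance, and how keeping the choice on the instance avoids that.

```ruby
# Hypothetical stand-in for a library that stores its setting in
# class-level (effectively global) state.
class ToyGlobalList
  class << self
    attr_accessor :private_domains
  end
end

# Hypothetical filter that writes the global during registration,
# mirroring the shape of the line under review.
class GlobalFilter
  def initialize(private_domains)
    @private_domains = private_domains
  end

  def register
    ToyGlobalList.private_domains = @private_domains
  end

  # Reads the shared setting, so the answer depends on who registered last.
  def private_domains?
    ToyGlobalList.private_domains
  end
end

# Alternative sketch: keep the choice on the instance and consult it per call.
class ScopedFilter
  def initialize(private_domains)
    @private_domains = private_domains
  end

  def private_domains?
    @private_domains
  end
end

a = GlobalFilter.new(true)
b = GlobalFilter.new(false)
a.register
b.register
puts a.private_domains?  # false: b registered last and overwrote a's choice

c = ScopedFilter.new(true)
d = ScopedFilter.new(false)
puts c.private_domains?  # true: unaffected by d
```

With two pipelines each configuring their own filter, the first pattern makes the effective behavior depend on initialization order, which is the scenario raised above.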
lib/logstash/filters/tld.rb
Outdated
config :target, :validate => :string, :default => "tld"

# Allows private (non-ICANN) domain parsing
config :private_domains, :validate => :boolean, :default => false
I confess to not understanding what this setting enables. From reading the tests, it considers "s3.amazonaws.com" to be a tld? I'm confused.
To ask a more pointed question, how will a user know what this setting does?
The vast majority of customers / users just want public TLD parsing - to see the .com, .net, etc. They are always surprised (and confused) when they see a TLD like s3.amazonaws.com because they expect .com, not the whole thing.
Search for "===BEGIN PRIVATE DOMAINS===" in the list at https://publicsuffix.org/list/public_suffix_list.dat and take a look for yourself.
For all machine learning, cybersecurity, and anomaly detection use cases, customers just want public suffixes, NOT private ones.
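The difference between the two modes can be sketched with a toy longest-suffix match. The suffix arrays below are a tiny hand-picked subset chosen for illustration, not the real Public Suffix List, and `toy_suffix` is a hypothetical helper, not part of the plugin or of the public_suffix gem. The real list also has wildcard and exception rules that this sketch ignores.

```ruby
# Toy illustration only: a hand-picked subset, not the real Public Suffix List.
ICANN_SUFFIXES   = ["com", "net", "co.uk"].freeze
PRIVATE_SUFFIXES = ["s3.amazonaws.com", "blogspot.com"].freeze

# Returns the longest matching suffix for +host+, or nil if none matches.
# When include_private is false, only the ICANN entries are considered.
def toy_suffix(host, include_private: false)
  candidates = ICANN_SUFFIXES.dup
  candidates.concat(PRIVATE_SUFFIXES) if include_private
  labels = host.downcase.split(".")
  # Try progressively shorter tails of the hostname, longest first.
  labels.each_index do |i|
    tail = labels[i..-1].join(".")
    return tail if candidates.include?(tail)
  end
  nil
end

puts toy_suffix("mybucket.s3.amazonaws.com")                        # com
puts toy_suffix("mybucket.s3.amazonaws.com", include_private: true) # s3.amazonaws.com
```

With private entries ignored, the hostname resolves to the `com` suffix that most users expect; with them included, the longer private entry wins.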
Thanks for helping improve this plugin :) I've left some comments on the PR.
@jordansissel I updated the filter to disable private domains by default. Users can enable it using the config
I have this PR on my todo list for this week.
Gemfile.lock
Outdated
@@ -0,0 +1,118 @@
PATH
no need to add Gemfile.lock to the git repo. Can you remove this file?
yes, done.
This file is still here, according to github?
oops.. deleted it. sorry about that.
overall LGTM. I left one comment asking that Gemfile.lock file be removed (or tell me a story why you added it). Once this is resolved, I will merge. Thank you for working on this :)
This PR is failing on Logstash 5.5 branch:
The Logstash master branch (which will become 6.0.0) is on JRuby 9k (which provides Ruby v2.3 compatibility), and the tests are passing there.
ok I'm back. It looks like, since the earlier versions of Logstash use Ruby 1.9.3, we are limited to a really old version of public_suffix (1.4.6). In 1.4.6, private/public parsing is all or nothing: you cannot choose to parse private or public domains on a per-filter basis. Still, the default really should be to ignore the private domains. I'm submitting my changes and will wait for the Logstash 6.0 branch before using the newer public_suffix version, which has more control over domain parsing.
@jordansissel standing by for your input on this request, let me know if there's anything else you need
LGTM
any word if this will get pulled? Asking on behalf of a few customers.
any update on this? @jordansissel
I am also interested in using the updated version of this plugin. Any ETA on when this gets merged? Also, the public_suffix gem has been updated several times since this PR. Any chance this could be included as well?
I changed the default TLD parsing behavior to ignore private domains. If users want private domains, they can set a new config parameter called private_domains to true.
Tests were added, as was documentation. I haven't tested how the inline documentation will render.
I also got rid of the default example filter text found in the inline documentation.