Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added private_domains functionality #6

Closed
wants to merge 16 commits into from
Closed

added private_domains functionality #6

wants to merge 16 commits into from

Conversation

jaredmcqueen
Copy link

I changed the default TLD parsing behavior to ignore private domains. If users want private domains, they can set a new config parameter called private_domains to true.

Tests were added, as was documentation. I haven't tested how the inline documentation will render.

I also got rid of the default example filter text found in the inline documentation.

@@ -2,10 +2,32 @@
require "logstash/filters/base"
require "logstash/namespace"

# This example filter will replace the contents of the default
# message field with whatever you specify in the configuration.
# This filter is a domain name parser based on the https://publicsuffix.org/[Public Suffix List]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, not familiar with modifying asciidocs directly - i thought they were auto-generated based on inline comments. I can change these as well.

require 'public_suffix'
PublicSuffix::List.private_domains = @private_domains
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting setting. It appears to be a global setting which means the last tld plugin to initialize will win. This impacts users who may have multiple pipelines (coming in Logstash 6.0, iirc) where two pipelines may use a tld filter.

Thoughts?

Maybe we have to modify the upstream library to not use globals.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I'll modify according to how the author does it here: https://github.com/weppos/publicsuffix-ruby#private-domains

config :target, :validate => :string, :default => "tld"

# Allows private (non-ICANN) domain parsing
config :private_domains, :validate => :boolean, :default => false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confess to not understanding what this setting enables. From reading the tests, it considers "s3.amazonaws.com" to be a tld? I'm confused.

To ask a more pointed question, how will a user know what this setting does?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vast majority of customers / users just want public TLD parsing - to see the .com, .net, etc. They are always surprised (and confused) when they see a TLD like s3.amazonaws.com because they expect .com, not the whole thing.

Search "===begin private domains===" in the list at https://publicsuffix.org/list/public_suffix_list.dat and take a look for yourself.

for all Machine Learning, cyber security, anomaly detection use cases, customers just want public -- NOT private

@jordansissel
Copy link
Contributor

Thanks for helping improve this plugin :)

I've left some comments on the PR.

@jaredmcqueen
Copy link
Author

@jordansissel I updated the filter to disable private domains by default. Users can enable it using the config ignore_private. I updated the asciidocs and tests.

@jordansissel
Copy link
Contributor

I have this PR on my todo list for this week.

Gemfile.lock Outdated
@@ -0,0 +1,118 @@
PATH
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to add Gemfile.lock to the git repo. Can you remove this file?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is still here, according to github?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops.. deleted it. sorry about that.

@jordansissel
Copy link
Contributor

overall LGTM.

I left one comment asking that Gemfile.lock file be removed (or tell me a story why you added it). Once this is resolved, I will merge.

Thank you for working on this :)

@jordansissel
Copy link
Contributor

jordansissel commented Jun 28, 2017

This PR is failing on Logstash 5.5 branch:

public_suffix-2.0.5 requires ruby version >= 2.0, which is incompatible with the current version, ruby 1.9.3p551 (jruby 1.7.25)

Logstash master (will become 6.0.0) branch is on JRuby 9k (which provides Ruby v2.3 compatibility) is has tests that are passing.

@jaredmcqueen
Copy link
Author

jaredmcqueen commented Jul 6, 2017

ok I'm back. It looks like since the earlier versions of logstash use 1.9.3, we are limited to using a really old version of public_suffix (1.4.6). In 1.4.6, the private / public parsing is all or nothing. You cannot choose to parse private/public on a per-use filter basis.

Still though, the default really should be to ignore the private domains. Submitting my changes, and will wait for Logstash 6.0 branch to use the new and improved public_suffix version that has more control over domain parsing.

@jaredmcqueen
Copy link
Author

@jordansissel standing by for your input on this request, let me know if there's anything else you need

@jordansissel
Copy link
Contributor

LGTM

@jaredmcqueen
Copy link
Author

any word if this will get pulled? Asking on behalf of a few customers.

@jaredmcqueen
Copy link
Author

jaredmcqueen commented Sep 12, 2017

any update on this? @jordansissel

@jeromekleinen
Copy link

jeromekleinen commented Nov 20, 2017

I am also interested in using the updated version of this plugin. Any ETA on when this gets merged?

Also, the public_suffix gem has been updated several times since this PR. Any chance this could be included as well?

@jsvd jsvd deleted the branch logstash-plugins:master November 8, 2021 10:07
@jsvd jsvd closed this Nov 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants