Parse URL and match domain name in filtering to avoid false positives by ks129 · Pull Request #1275 · python-discord/bot

ks129 · 2020-11-07T18:10:38Z

Previously, the bot used a plain `in` (substring) check to find blacklisted domains, which produced false positives like the one shown in the linked issue. Now it finds all URLs, parses them with the urllib parser, extracts the domain name, and checks it against the blacklist.
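As a rough illustration of the difference between the two approaches, here is a minimal sketch. It is not the code from this PR: `URL_RE`, `BLACKLIST`, `substring_check`, and `find_blacklisted` are hypothetical names, and the regex is deliberately simplistic.

```python
import re
from urllib.parse import urlparse

# Hypothetical URL pattern and blacklist, for illustration only.
URL_RE = re.compile(r"https?://[^\s<>]+", re.IGNORECASE)
BLACKLIST = {"bad.example"}


def substring_check(text):
    """Old-style check: flags any message containing a blacklisted string."""
    lowered = text.lower()
    return any(domain in lowered for domain in BLACKLIST)


def find_blacklisted(text):
    """Parse each URL and compare its hostname to the blacklist exactly."""
    hits = []
    for url in URL_RE.findall(text):
        # .hostname is lowercased and excludes any port or credentials.
        host = urlparse(url).hostname
        if host in BLACKLIST:
            hits.append(host)
    return hits
```

With this sketch, a message linking to `https://notbad.example/page` trips the substring check but not the parsed check, since the hostname `notbad.example` is not an exact blacklist entry.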

MarkKoz

This isn't too accurate because not all URLs posted by users will include the HTTP scheme. Unfortunately, this will cause problems with urllib even if the URL regex is adjusted:

Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component.

Furthermore, the netloc may include a port and/or a subdomain e.g. www.cwi.nl:80. Some filters may need to match only on specific subdomains while others will need to match all subdomains. Adding a separate filter for each possible subdomain is infeasible, so the code needs to somehow account for this.
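One possible way to handle the missing scheme, the port, and subdomains is sketched below. This is an assumption about how it could be done, not something proposed in the review: `extract_host` and `matches` are hypothetical helpers, and prefixing `//` is just one workaround for the urlparse behaviour quoted above.

```python
from urllib.parse import urlparse


def extract_host(candidate):
    """Return the lowercased hostname, tolerating a missing scheme."""
    if "://" not in candidate:
        # urlparse only recognizes a netloc introduced by '//'.
        candidate = "//" + candidate
    # .hostname drops the port (and any userinfo) from the netloc.
    return urlparse(candidate).hostname


def matches(host, blocked, include_subdomains=True):
    """Check a hostname against one blocked domain.

    With include_subdomains=False only an exact match counts.
    """
    if host is None:
        return False
    if host == blocked:
        return True
    # Require a leading dot so 'notcwi.nl' does not match 'cwi.nl'.
    return include_subdomains and host.endswith("." + blocked)
```

For example, `extract_host("www.cwi.nl:80/path")` yields `www.cwi.nl`, which `matches(..., "cwi.nl")` accepts as a subdomain, while `urlparse("www.cwi.nl/page").netloc` on its own is an empty string.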

ks129 · 2020-11-21T07:41:05Z

@MarkKoz Issue #1276 covers a new filter that handles subdomains too, not only exact matches.

MarkKoz · 2020-11-24T08:27:35Z

Though I think that issue could be addressed together with this one, even if it is set aside, my other points still need to be addressed. Someone could easily evade the filter by omitting the scheme or specifying a port.

ks129 · 2020-12-04T07:34:45Z

Leaving this to somebody smarter 😅

Parse URL and match domain name in filtering to avoid false positives

2e1adaa

ks129 requested a review from a team as a code owner November 7, 2020 18:10

ks129 requested review from MarkKoz and tagptroll1 and removed request for a team November 7, 2020 18:10

ghost added the needs 2 approvals label Nov 7, 2020

MarkKoz requested changes Nov 16, 2020

ghost added the s: waiting for author label and removed the needs 2 approvals label Nov 16, 2020

ks129 closed this Dec 4, 2020

Akarys42 reopened this Dec 4, 2020

Akarys42 closed this Dec 4, 2020

ks129 wants to merge 1 commit into python-discord:master from ks129:domain-match-fix
