feat: added url parsing to the filter by Kronifer · Pull Request #1889 · python-discord/bot

Kronifer · 2021-10-18T14:13:08Z

Added URL parsing using urllib.parse to stop false positives. To remove them, I parse the URLs and check whether the netloc is the same as the blacklisted one. If it isn't, the match is skipped. I also check whether the netloc is prefixed with www. or any other subdomain, in case people try to circumvent the filter. An example false positive would be deliciouscookies.com triggering cookies.com, a blacklisted URL (not linking the actual example because it's NSFW).
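
A minimal sketch of the netloc comparison described above, using only urllib.parse. The helper names get_netloc and is_blacklisted are hypothetical and this is not the code from the PR; it only illustrates why comparing network locations avoids the substring false positive while still catching www. and other subdomain prefixes.

```python
from urllib.parse import urlparse


def get_netloc(url: str) -> str:
    """Return the lower-cased netloc of a URL, even when the scheme is missing."""
    # urlparse only fills in netloc when the URL has a scheme or starts with "//".
    if "://" not in url:
        url = "//" + url
    return urlparse(url).netloc.lower()


def is_blacklisted(url: str, blacklisted_domain: str) -> bool:
    """Hypothetical helper: True if url points at blacklisted_domain or a subdomain of it."""
    netloc = get_netloc(url)
    blocked = get_netloc(blacklisted_domain)
    # Exact match, or a subdomain prefix such as "www." in front of the blocked domain.
    return netloc == blocked or netloc.endswith("." + blocked)


print(is_blacklisted("https://deliciouscookies.com/recipes", "cookies.com"))  # False
print(is_blacklisted("https://www.cookies.com/", "cookies.com"))  # True
```

Note that netloc can also carry a port or user info, which this sketch does not strip.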

Kronifer · 2021-10-18T14:14:17Z

Realized I patched docker-compose.yml for my system, will remove that

mbaruh · 2021-10-18T14:34:31Z

Can you clarify what false positives you found and how your code deals with them?

Kronifer · 2021-10-18T15:23:01Z

@mbaruh To remove false positives, I parse the URLs and check whether the netloc is the same as the blacklisted one. If it isn't, the match is skipped. I also check for a www. prefix on the netloc, in case people try to circumvent the filter.

mbaruh · 2021-10-18T15:25:31Z

Can you provide examples of false positives? Is there a related issue?

Kronifer · 2021-10-18T15:26:52Z

> Can you provide examples of false positives? Is there a related issue?

@mbaruh yes, I'll link the issue and provide examples in the initial description

Kronifer · 2021-10-18T15:28:56Z

@mbaruh check the original description

ChrisLovering

Had the wrong option selected when requesting changes

ChrisLovering

Tested, looks good 👍

D0rs4n

It looks fine overall, there's only one single thing.

ChrisLovering

Revoking my approval due to subdomains not being filtered here.

See this message and the discussion preceding it for context
https://canary.discord.com/channels/267624335836053506/635950537262759947/902203112834629692

onerandomusername

As a heads up, poetry.lock got all of the dependencies updated.

To fix this, please be sure you are using poetry 1.1.x

First revert poetry.lock locally.

Ensure that pyproject.toml is the same, with tldextract in it.

Next, use poetry lock --no-update
This will relock poetry without updating all of the dependencies.

However, it may be worth updating the dependencies separately, as this updated redis, rapidfuzz, sentry, etc.

Kronifer · 2021-12-03T17:57:31Z

> As a heads up, poetry.lock got all of the dependencies updated.
>
> To fix this, please be sure you are using poetry 1.1.x
>
> First revert poetry.lock locally.
>
> Ensure that pyproject.toml is the same, with tldextract in it.
>
> Next, use poetry lock --no-update
> This will relock poetry without updating all of the dependencies.
>
> However, it may be worth updating the dependencies separately, as this updated redis, rapidfuzz, sentry, etc.

this has been deemed not to be a problem as of https://discord.com/channels/267624335836053506/635950537262759947/916168658651324487

Kronifer · 2021-12-09T19:09:57Z

Just a quick comment:

This PR was made to improve the URL filter by removing false positives, like delicious-cookies.com being deleted for triggering cookies.com in the blacklist. As the work continued, we added support for stripping subdomains from any sent URLs to prevent circumvention. For anyone wondering why this exists, here you go 😄
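
To illustrate the subdomain stripping mentioned above: the PR's commits switched to tldextract for this, which relies on the Public Suffix List. The sketch below is not that code. It is a standard-library stand-in that uses a tiny, hypothetical KNOWN_SUFFIXES set in place of the real list, only to show why reducing a host to its registered domain catches sub.cookies.com without also catching delicious-cookies.com. The names registered_domain and host_is_blacklisted are likewise assumptions.

```python
from typing import Set

# Hypothetical, deliberately tiny stand-in for the Public Suffix List.
KNOWN_SUFFIXES = {"com", "org", "co.uk"}


def registered_domain(host: str) -> str:
    """Strip subdomains, keeping the label before the longest known suffix plus that suffix."""
    host = host.lower().rstrip(".")
    labels = host.split(".")
    # Try the longest candidate suffix first so "co.uk" is preferred over a shorter one.
    for i in range(1, len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in KNOWN_SUFFIXES:
            return ".".join(labels[i - 1:])
    # Unknown suffix: leave the host untouched rather than guess.
    return host


def host_is_blacklisted(host: str, domain_blacklist: Set[str]) -> bool:
    """True if the host's registered domain is an entry in the blacklist."""
    return registered_domain(host) in domain_blacklist


print(host_is_blacklisted("a.b.cookies.com", {"cookies.com"}))  # True
print(host_is_blacklisted("delicious-cookies.com", {"cookies.com"}))  # False
```

A real filter would need the full suffix list, which is the reason to reach for a library rather than a hand-written set like this one.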

onerandomusername · 2021-12-15T06:44:12Z

> this has been deemed not to be a problem as of https://discord.com/channels/267624335836053506/635950537262759947/916168658651324487

After looking into this further: markdownify cannot be updated, as the update would nerf the results of the doc command.

onerandomusername · 2021-12-15T07:08:37Z

Fixed that change in GH-2014 👍

onerandomusername

forgot to come back, approved now

mbaruh

Looks great and seems to be working. Thanks!

mbaruh · 2021-12-23T14:59:14Z

@@ -481,7 +482,10 @@ async def _has_urls(self, text: str) -> Tuple[bool, Optional[str]]:
        for match in URL_RE.finditer(text):
            for url in domain_blacklist:
                if url.lower() in match.group(1).lower():


I'd save url.lower() and match.group(1).lower() into separate variables just because those values are used a couple of times each, but it's not a big deal here.
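
The suggestion above could look roughly like the sketch below. URL_RE here is a simplified, hypothetical pattern and find_blacklisted_url is an illustrative stand-in for _has_urls, not the code merged in this PR. It keeps the substring check from the context line of the diff as-is; the only point is hoisting the two lower-cased values into variables.

```python
import re
from typing import Iterable, Optional, Tuple

# Simplified, hypothetical pattern; the bot's real URL_RE is not shown in this thread.
URL_RE = re.compile(r"(https?://[^\s<]+)", re.IGNORECASE)


def find_blacklisted_url(
    text: str, domain_blacklist: Iterable[str]
) -> Tuple[bool, Optional[str]]:
    """Return (True, entry) for the first URL in text that contains a blacklisted entry."""
    # Lower-case each blacklist entry once, not once per matched URL.
    lowered_blacklist = [url.lower() for url in domain_blacklist]
    for match in URL_RE.finditer(text):
        # Lower-case the matched URL once and reuse it for every comparison.
        matched_url = match.group(1).lower()
        for blacklisted in lowered_blacklist:
            if blacklisted in matched_url:
                return True, blacklisted
    return False, None
```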

ChrisLovering

lgtm :D

Akarys42

Proxy approval

Kronifer requested review from Akarys42, Den4200, MarkKoz, jb3 and mbaruh as code owners October 18, 2021 14:13

ChrisLovering requested changes Oct 18, 2021

Kronifer requested a review from ChrisLovering October 18, 2021 18:38

ChrisLovering approved these changes Oct 18, 2021

ChrisLovering requested changes Oct 18, 2021

Kronifer requested a review from ChrisLovering October 21, 2021 18:32

ChrisLovering approved these changes Oct 21, 2021

ChrisLovering enabled auto-merge (squash) October 21, 2021 19:02

Xithrius requested a review from kosayoda October 23, 2021 01:44

D0rs4n suggested changes Oct 25, 2021

ChrisLovering requested changes Oct 25, 2021

auto-merge was automatically disabled December 3, 2021 03:04
Head branch was pushed to by a user without write access

Kronifer requested review from ChrisLovering and D0rs4n December 3, 2021 03:07

onerandomusername suggested changes Dec 3, 2021

onerandomusername suggested changes Dec 3, 2021

onerandomusername mentioned this pull request Dec 3, 2021

remove default thread archive time #1987

Merged

Kronifer removed the request for review from Akarys42 December 15, 2021 01:36

onerandomusername approved these changes Dec 21, 2021

mbaruh approved these changes Dec 23, 2021

HassanAbouelela mentioned this pull request Dec 23, 2021

Unfurl Redirects #1961

Closed

2 tasks

Xithrius requested review from MrHemlock and removed request for Den4200 and kosayoda December 24, 2021 00:00

ChrisLovering approved these changes Dec 26, 2021

Kronifer added 2 commits December 26, 2021 11:52

feat: added url parsing to filters with support for relative URLs

296e565

feat: changed to tldextract

546aee9

Akarys42 approved these changes Dec 26, 2021

ChrisLovering merged commit 24d0c4e into python-discord:main Dec 26, 2021

Xithrius removed the "s: needs review" label (Author is waiting for someone to review and approve) Feb 20, 2022
