Skip to content

Prevent false-positives of the rich embed filter#450

Merged
sco1 merged 2 commits into
masterfrom
rich-embed-false-positive-fix
Sep 24, 2019
Merged

Prevent false-positives of the rich embed filter#450
sco1 merged 2 commits into
masterfrom
rich-embed-false-positive-fix

Conversation

@SebastiaanZ
Copy link
Copy Markdown
Contributor

Rich Embed Filter Changes

The rich embed filter is plagued by false positives now Discord has added more custom preview embeds for various websites. Since these embeds have the rich type instead of the link type, they triggered the filter we have in place.

This commit remedies that by using the existing URL regex pattern to list all the URLs contained in the message content and then checking if the embed url is a member of that list. If so, it's very likely
that the embed was auto-generated from that URL, so we should ignore it.

This does increase the probability of a false-negative, as a "true" user-generated rich embed could also have a URL that's contained in the message body. However, I've checked most of the triggers we have had in the past and none of the legitimate triggers would have been a false-negative under the new rules. Therefore, I think it's very reasonable to adopt this strategy.

The approach differs slightly from that outlined in #293, since the approach outlined there would would cause more false-negatives.

Pre-compiling regex patterns.

In addition to the change in behavior of the rich embed filter, I have also kaizened the existing regex patterns by compiling them at load time. Since we compare each message the bot receives against a lot of different regex patterns, compiling them only once when the filtering module is loaded should positively impact performance.

@SebastiaanZ SebastiaanZ force-pushed the rich-embed-false-positive-fix branch from b3ebca5 to fcb90a4 Compare September 24, 2019 08:40
#293

The rich embed filter is plagued by false positives now Discord has
added more custom preview embeds for various websites. Since these
embeds have the `rich` type instead of the `link` type, these embeds
triggered the filter we had in place.

This commit remedies that by using the existing URL regex pattern to
list all the URLs contained in the message content and then checking
if the embed url is a member of that list. If so, it's very likely
that the embed was auto-generated from that URL, so we should ignore
it. This approach deviates slightly from that outlined in #293.

This does increase the probability of a false-negative, as a "true"
user-generated rich embed could also have a url that's contained in
the message body. However, I've checked most of the triggers we have
had in the past and none of the legitimate triggers would have been a
false-negative under the new rules. Therefore, I think it's very
reasonable to adopt this strategy.

In addition to the change in behavior of the rich embed filter, I
have also kaizened the existing regex patterns by compiling them at
load time. Since we check a lot of regex patterns for every message
received by the bot, this should be beneficial for performance.
@SebastiaanZ SebastiaanZ force-pushed the rich-embed-false-positive-fix branch from fcb90a4 to 072a0a6 Compare September 24, 2019 08:42
@SebastiaanZ SebastiaanZ added a: filters Related to message filters: (antimalware, antispam, filtering, token_remover) p: 3 - low Low Priority type: enhancement labels Sep 24, 2019
Copy link
Copy Markdown
Contributor

@sco1 sco1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That filter tweak is smooth as silk 💯

@sco1 sco1 merged commit cea29d3 into master Sep 24, 2019
@sco1 sco1 deleted the rich-embed-false-positive-fix branch September 24, 2019 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

a: filters Related to message filters: (antimalware, antispam, filtering, token_remover) p: 3 - low Low Priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants