Prevent false-positives of the rich embed filter #450
Merged
b3ebca5 to fcb90a4
#293
The rich embed filter is plagued by false positives now that Discord has added more custom preview embeds for various websites. Since these embeds have the `rich` type instead of the `link` type, they triggered the filter we had in place.
This commit remedies that by using the existing URL regex pattern to list all the URLs contained in the message content and then checking whether the embed URL is a member of that list. If so, it's very likely that the embed was auto-generated from that URL, so we should ignore it. This approach deviates slightly from that outlined in #293.
This does increase the probability of a false negative, as a "true" user-generated rich embed could also have a URL that's contained in the message body. However, I've checked most of the triggers we have had in the past and none of the legitimate triggers would have been a false negative under the new rules. Therefore, I think it's very reasonable to adopt this strategy.
In addition to the change in behavior of the rich embed filter, I have also kaizened the existing regex patterns by compiling them at load time. Since we check a lot of regex patterns for every message received by the bot, this should be beneficial for performance.
fcb90a4 to 072a0a6
MarkKoz approved these changes on Sep 24, 2019
sco1 approved these changes on Sep 24, 2019
sco1 (Contributor) left a comment:
That filter tweak is smooth as silk 💯
Rich Embed Filter Changes
The rich embed filter is plagued by false positives now that Discord has added more custom preview embeds for various websites. Since these embeds have the `rich` type instead of the `link` type, they triggered the filter we have in place.
This commit remedies that by using the existing URL regex pattern to list all the URLs contained in the message content and then checking whether the embed URL is a member of that list. If so, it's very likely that the embed was auto-generated from that URL, so we should ignore it.
This does increase the probability of a false-negative, as a "true" user-generated rich embed could also have a URL that's contained in the message body. However, I've checked most of the triggers we have had in the past and none of the legitimate triggers would have been a false-negative under the new rules. Therefore, I think it's very reasonable to adopt this strategy.
The approach differs slightly from that outlined in #293, since the approach outlined there would cause more false negatives.
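For readers skimming the description, the check described above can be sketched roughly as follows. This is a minimal illustration, not the code from the diff: `URL_RE`, `Embed`, `Message`, and `has_rich_embed` are hypothetical stand-ins, and the URL pattern is a simplified assumption rather than the bot's existing pattern.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption: a simplified URL pattern for illustration only.
# The bot's existing URL regex is not shown in this description.
URL_RE = re.compile(r"https?://[^\s<>]+", re.IGNORECASE)


@dataclass
class Embed:
    """Hypothetical stand-in for an embed object with a type and an optional URL."""
    type: str
    url: Optional[str] = None


@dataclass
class Message:
    """Hypothetical stand-in for a message with text content and embeds."""
    content: str
    embeds: List[Embed] = field(default_factory=list)


def has_rich_embed(msg: Message) -> bool:
    """Return True if the message has a rich embed that was probably user-generated."""
    # List every URL that appears in the message body.
    urls_in_content = URL_RE.findall(msg.content)

    for embed in msg.embeds:
        if embed.type != "rich":
            continue
        if embed.url and embed.url in urls_in_content:
            # The embed URL also appears in the message text, so the embed
            # was most likely auto-generated as a preview. Ignore it.
            continue
        return True
    return False
```

Under these assumptions, a message whose text contains `https://example.com/post/1` and whose only rich embed points at that same URL would not trigger the filter, while a rich embed with no URL, or with a URL absent from the text, still would.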
Pre-compiling regex patterns
In addition to the change in behavior of the rich embed filter, I have also kaizened the existing regex patterns by compiling them at load time. Since we compare each message the bot receives against a lot of different regex patterns, compiling them only once when the filtering module is loaded should positively impact performance.
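As a rough sketch of what compiling at load time looks like, the patterns can be built once as module-level constants and reused by every check. The names and expressions below (`INVITE_RE`, `URL_RE`, `has_invite`, `find_urls`) are hypothetical examples and are not taken from the filtering module itself.

```python
import re

# Compiled once, when the module is imported (assumed example patterns).
INVITE_RE = re.compile(r"discord(?:\.gg|app\.com/invite)/[a-zA-Z0-9-]+", re.IGNORECASE)
URL_RE = re.compile(r"https?://[^\s<>]+", re.IGNORECASE)


def has_invite(text: str) -> bool:
    """Check the text against the pre-compiled invite pattern."""
    return INVITE_RE.search(text) is not None


def find_urls(text: str) -> list:
    """Return every URL found in the text using the pre-compiled URL pattern."""
    return URL_RE.findall(text)
```

Each call then reuses the already compiled pattern object instead of passing a pattern string to `re.search` or `re.findall` for every message.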