Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FilterLists: Manage whitelisting and blacklisting via the bot #1058

Merged
merged 45 commits into from
Aug 3, 2020

Conversation

lemonsaurus
Copy link
Member

@lemonsaurus lemonsaurus commented Jul 19, 2020

Up until now, we have managed various whitelists and blacklists via adding constants to our default-config.yml file. However, it has often been suggested that we should instead be managing these via a bot command, and storing them in the database.

Related issues:

NOTE: This pull request must be merged after python-discord/site#305!

This is a big pull request, best reviewed commit-by-commit, which makes a number of changes to filtering, antimalware, constants, and implements a new cog for managing blacklists and whitelists.

Cache blacklists and whitelists, to minimize API requests.

We don't want to be making API request for every single message, but we need to scan all messages with our filters and see if they contain blacklisted words, or server invites. The solution is caching. This pull requests makes the bot cache all the AllowDenyList data from the site when the bot starts, and then updates this cache whenever the data changes.

Changes to the exception handler

Currently, we're not returning the error message whenever a BadArgument is raised from a command. This was done because some BadArguments would contain the input data, and we would then post that input data directly into the response from the bot when the exception was raised. This allowed users to exploit the !raw command with something like !raw @Developers in order to have the bot ping 65 000 users in our guild and create absolute chaos.

So, to prevent that, we just made BadArgument not return the exception message at all. But this isn't a great solution, because the exception message often contains valuable information that our users need in order to figure out how to pass a valid argument. For the cog I was making for the whitelist and blacklist, I felt I needed to provide this kind of information to the users, particularly to tell them which kinds of blacklists and whitelists we support.

This PR changes the exception handler so that all types of exceptions handled there will return an embed containing the full exception message. This strikes a nice balance between usability and security, since pings do not work from inside embeds.
image

New cog: FilterLists

The main attraction of this pull request is of course the new FilterLists cog. This cog allows us to add or remove items in the whitelist or the blacklist, and to show all the items that are currently there.

Adding file formats:
image

Adding guild invites:
image

Removing stuff:
image

Listing guild invites:
image

Listing file formats:
image

Changes to AntiMalWare and Filtering

These two existing cogs have gotten some kaizen, and have been refactored to utilize the blacklists and whitelists in the cache instead of relying on constants.

The constants they used before have been removed.

We no longer differentiate between word_watchlist and token_watchlist - if you need word boundaries before and after your regex, you just add a \b. This new system, simply called filter_tokens, allows you then to input any regex you want with no implicit magic.

New converters

In order to validate list types, I've created a converter called ValidAllowDenyListType, which will make an API request to figure out exactly which kinds of blacklists and whitelists we support. This means that the available blacklists and whitelists is managed in realtime from a single location - an enum inside the model definition on the site. This is the DRYest way I could think of to do this.

In order to validate guild invites, I've created a converter called ValidDiscordServerInvite, which does the following:

  • Checks the string against a compiled regex pattern which contains all legal variations of guild invites.
  • If it gets a match, it uses the Discord API to look up the invite and get information about the guild. The converter then returns this information as a dictionary, including stuff like the guild name. We always attach the guild name as a comment when adding a new guild invite to the blacklist or whitelist, so that you can see which Guild ID's belong to which guilds.
  • If the string contains a Guild ID, it raises a BadArgument saying Guild ID's can't be resolved.
  • Otherwise, it raises a BadArgument saying the invite is invalid.

New guild filtering logic

As discussed in https://github.com/python-discord/organisation/issues/261, we wanted a slightly more nuanced approach to filtering guild invites. This PR makes these changes, so that the following is true:

  • A blacklisted guild invite will always trigger an alert
  • A whitelisted guild invite will never trigger an alert
  • Partnered and Verified guild invites only trigger an alert if they are blacklisted.
  • All other guild invites will trigger an alert.

Leon Sandøy added 17 commits July 17, 2020 21:30
We shouldn't be making an API call for every single message posted, so
what we're gonna do is cache the data in the Bot, and then update the
cache whenever we make changes to it via our new AllowDenyList cog.

Since this cog will be the only way to make changes to this, this level
of lazy caching should be enough to always keep the cache up to date.
We'll use this to ensure the input is valid when people try to whitelist
or blacklist stuff. It will fetch its data from an Enum maintained on
the site, so that the types of lists we support will only need to be
maintained in a single place, instead of duplicating that data in the
bot and the site.
Currently, some types of errors are returning plain strings that repeat
the input (which can be exploited to deliver stuff like mentions), and
others are returning generic messages that don't give any exception
information.

This commit unifies our approach around putting as much information as
we can (including the exception message), but always putting it inside
an embed, so that stuff like pings will not fire.

This, combined with the 1.4.0a `allowed_mentions` functionality, seems
like a reasonable compromise between security and usability.
Instead of just dumping the JSON response from the site, we'll build a
data structure that it will be convenient to access from our new cog,
and from the Filtering cog.
This includes commands to add, remove and show the items in the
whitelists and blacklists for the different list types.

Commands are limited to Moderators+.
Instead of fetching the guild invite IDs from config-default.yml, we
will now be using the AllowDenyList cache to check these.
Also updates the tests for this cog.
This gives easier access to the Guild ID in the place where you're most
likely to want to use the whitelist command.
We will now validate and convert any standard discord server invite to a
guild ID, and automatically add the name of the server as a comment.

This will ensure that the list of whitelisted guild IDs will be readable
and nice.

This also makes minor changes to list output aesthetics.
We want to support deletion of both IDs and guild invites, so we need a
bit of special handling for that.
We now filter guild invites the following way:
- Whitelisted invites are always permitted.
- Blacklisted invites are never permitted.
- If the invite is not blacklisted, it is permitted only if it is a
Verified or a Partnered server, otherwise not.

This strategy was decided on during the June 7th staff meeting, see
https://github.com/python-discord/organisation/issues/261
@lemonsaurus lemonsaurus requested a review from a team as a code owner July 19, 2020 19:45
@lemonsaurus lemonsaurus requested review from eivl and MrHemlock and removed request for a team July 19, 2020 19:45
@ghost ghost added the needs 2 approvals label Jul 19, 2020
bot/converters.py Outdated Show resolved Hide resolved
bot/cogs/error_handler.py Show resolved Hide resolved
bot/converters.py Outdated Show resolved Hide resolved
bot/converters.py Outdated Show resolved Hide resolved
bot/cogs/allow_deny_lists.py Outdated Show resolved Hide resolved
bot/cogs/allow_deny_lists.py Outdated Show resolved Hide resolved
bot/cogs/allow_deny_lists.py Outdated Show resolved Hide resolved
bot/cogs/filtering.py Show resolved Hide resolved
bot/cogs/allow_deny_lists.py Outdated Show resolved Hide resolved
bot/cogs/filtering.py Outdated Show resolved Hide resolved
@ghost ghost added s: waiting for author Waiting for author to address a review or respond to a comment and removed needs 2 approvals labels Jul 19, 2020
bot/cogs/filtering.py Outdated Show resolved Hide resolved
bot/bot.py Outdated Show resolved Hide resolved
@@ -38,6 +38,16 @@ class AntiMalware(Cog):
def __init__(self, bot: Bot):
self.bot = bot

def _get_whitelisted_file_formats(self) -> list:
"""Get the file formats currently on the whitelist."""
return [item.get('content') for item in self.bot.allow_deny_list_cache['file_format.True']]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's slightly unfortunate you have to use a comprehension every time you need a list of contents. Can the data be restructured to avoid this? In practice, you only need the content and the comments, so caching them in a dict with the former as keys and latter as values may work. Admittedly, it does sound a bit strange, but the contents are guaranteed to be unique already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, you've got a point. I mean, this is cheap since this list will always be really really short, but we are running this list comprehension for every single message that contains an attachment and isn't sent by a staff member.

I suppose caching may be possible here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "caching", I meant modifying the structure of the cache you've implemented rather than creating a separate cache.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I have no idea how you propose we do that.

Leon Sandøy added 2 commits July 29, 2020 20:15
This means we can validate invites that start with https://, whereas
before we could not.
Copy link
Member

@MarkKoz MarkKoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works really well. I really only have one requested change.

  • Do you think explicit feedback if a command fails would be useful? One can only assume if it lacks a reaction.
  • Having to include the dot for file extensions may trip people up, but I think changing the anti-malware code to support the lack of a dot would be annoying.
  • I wish there was a way to show which list types are supported in the help text for commands. Right now, the only way to see that is to put some bad list type and rely on the converter's error message.
  • I really dislike how big the error messages are now. I think it needs to stop displaying the help command if it's going to be using an embed for the error. Alternatively, somehow merge the help command and error message into one embed (sounds complicated and out of scope). If help isn't displayed, to figure out a command, users would have to 1. know the help command exists 2. move to a different channel to use it (unless they're staff).

bot/cogs/filter_lists.py Outdated Show resolved Hide resolved
@ghost ghost added s: waiting for author Waiting for author to address a review or respond to a comment and removed needs 2 approvals labels Jul 29, 2020
@ghost ghost added needs 2 approvals and removed s: waiting for author Waiting for author to address a review or respond to a comment labels Jul 29, 2020
We want the !help invocations to give you all the information you need
in order to use the command. That also means we need to provide the
valid filterlist types, which are subject to change.

So, we fetch the valid ones from the API and then dynamically insert
them into the docstrings.
For deleting and listing data, we now get some more feedback when things
fail.
@kwzrd kwzrd self-requested a review August 2, 2020 15:51
@Den4200 Den4200 self-requested a review August 2, 2020 16:24
Copy link
Member

@Den4200 Den4200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good and works well! Just a few minor suggestions.

bot/__main__.py Outdated Show resolved Hide resolved
bot/cogs/filter_lists.py Outdated Show resolved Hide resolved
@ghost ghost added s: waiting for author Waiting for author to address a review or respond to a comment and removed needs 1 approval labels Aug 2, 2020
Copy link
Contributor

@kwzrd kwzrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only stylistic nit-picks from an initial look ~ I'll be able to do a more thorough review later in the week if still necessary. But it's looking good.

bot/cogs/filter_lists.py Outdated Show resolved Hide resolved
bot/cogs/filter_lists.py Outdated Show resolved Hide resolved
@ghost ghost added needs 1 approval and removed s: waiting for author Waiting for author to address a review or respond to a comment labels Aug 3, 2020
@lemonsaurus lemonsaurus merged commit 1240fa7 into master Aug 3, 2020
@lemonsaurus lemonsaurus deleted the whitelist_system branch August 3, 2020 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a: API Related to or causes API changes a: filters Related to message filters: (antimalware, antispam, filtering, token_remover) p: 2 - normal Normal Priority t: feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

!whitelist command to whitelist stuff from our filters.
4 participants