-
-
Notifications
You must be signed in to change notification settings - Fork 645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FilterLists: Manage whitelisting and blacklisting via the bot #1058
Conversation
We shouldn't be making an API call for every single message posted, so what we're gonna do is cache the data in the Bot, and then update the cache whenever we make changes to it via our new AllowDenyList cog. Since this cog will be the only way to make changes to this, this level of lazy caching should be enough to always keep the cache up to date.
We'll use this to ensure the input is valid when people try to whitelist or blacklist stuff. It will fetch its data from an Enum maintained on the site, so that the types of lists we support will only need to be maintained in a single place, instead of duplicating that data in the bot and the site.
Currently, some types of errors are returning plain strings that repeat the input (which can be exploited to deliver stuff like mentions), and others are returning generic messages that don't give any exception information. This commit unifies our approach around putting as much information as we can (including the exception message), but always putting it inside an embed, so that stuff like pings will not fire. This, combined with the 1.4.0a `allowed_mentions` functionality, seems like a reasonable compromise between security and usability.
Instead of just dumping the JSON response from the site, we'll build a data structure that it will be convenient to access from our new cog, and from the Filtering cog.
This includes commands to add, remove and show the items in the whitelists and blacklists for the different list types. Commands are limited to Moderators+.
Instead of fetching the guild invite IDs from config-default.yml, we will now be using the AllowDenyList cache to check these.
Also updates the tests for this cog.
This gives easier access to the Guild ID in the place where you're most likely to want to use the whitelist command.
We will now validate and convert any standard discord server invite to a guild ID, and automatically add the name of the server as a comment. This will ensure that the list of whitelisted guild IDs will be readable and nice. This also makes minor changes to list output aesthetics.
We want to support deletion of both IDs and guild invites, so we need a bit of special handling for that.
We now filter guild invites the following way: - Whitelisted invites are always permitted. - Blacklisted invites are never permitted. - If the invite is not blacklisted, it is permitted only if it is a Verified or a Partnered server, otherwise not. This strategy was decided on during the June 7th staff meeting, see https://github.com/python-discord/organisation/issues/261
bot/cogs/antimalware.py
Outdated
@@ -38,6 +38,16 @@ class AntiMalware(Cog): | |||
def __init__(self, bot: Bot): | |||
self.bot = bot | |||
|
|||
def _get_whitelisted_file_formats(self) -> list: | |||
"""Get the file formats currently on the whitelist.""" | |||
return [item.get('content') for item in self.bot.allow_deny_list_cache['file_format.True']] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's slightly unfortunate you have to use a comprehension every time you need a list of contents. Can the data be restructured to avoid this? In practice, you only need the content and the comments, so caching them in a dict with the former as keys and latter as values may work. Admittedly, it does sound a bit strange, but the contents are guaranteed to be unique already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, you've got a point. I mean, this is cheap since this list will always be really really short, but we are running this list comprehension for every single message that contains an attachment and isn't sent by a staff member.
I suppose caching may be possible here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By "caching", I meant modifying the structure of the cache you've implemented rather than creating a separate cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I have no idea how you propose we do that.
This means we can validate invites that start with https://, whereas before we could not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works really well. I really only have one requested change.
- Do you think explicit feedback if a command fails would be useful? One can only assume if it lacks a reaction.
- Having to include the dot for file extensions may trip people up, but I think changing the anti-malware code to support the lack of a dot would be annoying.
- I wish there was a way to show which list types are supported in the help text for commands. Right now, the only way to see that is to put some bad list type and rely on the converter's error message.
- I really dislike how big the error messages are now. I think it needs to stop displaying the help command if it's going to be using an embed for the error. Alternatively, somehow merge the help command and error message into one embed (sounds complicated and out of scope). If help isn't displayed, to figure out a command, users would have to 1. know the help command exists 2. move to a different channel to use it (unless they're staff).
We want the !help invocations to give you all the information you need in order to use the command. That also means we need to provide the valid filterlist types, which are subject to change. So, we fetch the valid ones from the API and then dynamically insert them into the docstrings.
For deleting and listing data, we now get some more feedback when things fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good and works well! Just a few minor suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only stylistic nit-picks from an initial look ~ I'll be able to do a more thorough review later in the week if still necessary. But it's looking good.
Up until now, we have managed various whitelists and blacklists via adding constants to our
default-config.yml
file. However, it has often been suggested that we should instead be managing these via a bot command, and storing them in the database.Related issues:
!whitelist
command to whitelist stuff from our filters. #639NOTE: This pull request must be merged after python-discord/site#305!
This is a big pull request, best reviewed commit-by-commit, which makes a number of changes to filtering, antimalware, constants, and implements a new cog for managing blacklists and whitelists.
Cache blacklists and whitelists, to minimize API requests.
We don't want to be making API request for every single message, but we need to scan all messages with our filters and see if they contain blacklisted words, or server invites. The solution is caching. This pull requests makes the bot cache all the AllowDenyList data from the site when the bot starts, and then updates this cache whenever the data changes.
Changes to the exception handler
Currently, we're not returning the error message whenever a BadArgument is raised from a command. This was done because some BadArguments would contain the input data, and we would then post that input data directly into the response from the bot when the exception was raised. This allowed users to exploit the
!raw
command with something like!raw @Developers
in order to have the bot ping 65 000 users in our guild and create absolute chaos.So, to prevent that, we just made BadArgument not return the exception message at all. But this isn't a great solution, because the exception message often contains valuable information that our users need in order to figure out how to pass a valid argument. For the cog I was making for the whitelist and blacklist, I felt I needed to provide this kind of information to the users, particularly to tell them which kinds of blacklists and whitelists we support.
This PR changes the exception handler so that all types of exceptions handled there will return an embed containing the full exception message. This strikes a nice balance between usability and security, since pings do not work from inside embeds.
New cog: FilterLists
The main attraction of this pull request is of course the new FilterLists cog. This cog allows us to add or remove items in the whitelist or the blacklist, and to show all the items that are currently there.
Adding file formats:
Adding guild invites:
Removing stuff:
Listing guild invites:
Listing file formats:
Changes to AntiMalWare and Filtering
These two existing cogs have gotten some kaizen, and have been refactored to utilize the blacklists and whitelists in the cache instead of relying on constants.
The constants they used before have been removed.
We no longer differentiate between
word_watchlist
andtoken_watchlist
- if you need word boundaries before and after your regex, you just add a\b
. This new system, simply calledfilter_tokens
, allows you then to input any regex you want with no implicit magic.New converters
In order to validate list types, I've created a converter called
ValidAllowDenyListType
, which will make an API request to figure out exactly which kinds of blacklists and whitelists we support. This means that the available blacklists and whitelists is managed in realtime from a single location - an enum inside the model definition on the site. This is the DRYest way I could think of to do this.In order to validate guild invites, I've created a converter called
ValidDiscordServerInvite
, which does the following:New guild filtering logic
As discussed in https://github.com/python-discord/organisation/issues/261, we wanted a slightly more nuanced approach to filtering guild invites. This PR makes these changes, so that the following is true: