-
-
Notifications
You must be signed in to change notification settings - Fork 734
Cross-channel and deleted messages anti-spam #1760
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The on_message event calculated the max interval value every time for no reason. The value is constant throughout the bot's up time.
The anti-spam cog now uses a cache instead of reading channel history. The cache is for all channels in the guild, and does not remove deleted messages. That means that the anti-spam logic now works cross-channel and counts deleted messages. The size of the cache is determined via a new field in the config YAML file. The cache was implemented as a separate class, MessageCache, which uses circular buffer logic. This allows for constant time addition and removal form either side, and lookup. The cache does not support removal from the middle of the cache. The cache additionally stores a mapping from message ID's to the index of the message in the cache, to allow constant time lookup by message ID. The commit additionally adds accompanying tests, and renames `cache.py` to `caching.py` to better distinguish it from the new `message_cache.py` and convey that it's for general caching utilities.
The anti-spam cog was amended to handle cross-channel spam.
Since the anti-spam now works cross-channels, it makes no sense to identify it by the channel in which it was invoked. The DeletionContext class was changed to accept a frozenset of members, and the message_deletion_queue dict uses the frozensets as keys. DeletionContext still accepts a channel on creation, because while it might get added more channels, there's only one channel in which the mute message will be sent. Using members as the key can run into the issue of one member becoming irrelevant to the filter while others still are, resulting in another log message being sent, but it's an unlikely edge case since the users should be muted almost immediately, and we're currently not using any multi-member filters in the first place.
Akarys42
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is very well written, and the feature works perfectly well. Thanks a lot, Zig, favourite zebra ever!
Removed unused import, corrected docstring, and removed unnedded type annotation.
getitem based iteration included operations that aren't necessary when iterating over the cache continuously. Adding an iter method to the class seems to have improved iteration speed by several orders of magnitude.
| channel_messages = defaultdict(list) | ||
| for message in messages: | ||
| channel_messages[message.channel].append(message) | ||
| for channel, messages in channel_messages.items(): | ||
| await channel.delete_messages(messages) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand this, you go through each message and add them to the list in the dictionary?
Then you go through this dictionary and delete those messages in bulk?
Why is the channels a default dict? I thought we already had all channels because of the set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm grouping the messages per channel to be able to use the delete_messages method on all messages of that channel at the same time.
| """Add the received message to the end of the cache.""" | ||
| if self._is_full(): | ||
| del self._message_id_mapping[self._messages[self._start].id] | ||
| self._start = (self._start + 1) % self.maxlen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps _start could be a property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the benefit?
| index = self._message_id_mapping.get(message_id, None) | ||
| return self._messages[index] if index is not None else None | ||
|
|
||
| def update(self, message: Message) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When is this used? This should be a reference to the same Message object, so it will be changed "automatically".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used in the antispam's on_message_edit. The listener receives two different Message objects.
|
|
||
| def __iter__(self) -> t.Iterator[Message]: | ||
| if self._is_empty(): | ||
| return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not need to return an empty iterator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iteration stops when you return. There's no difference between having something to iterate on first and then return, and having nothing to iterate on and then return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah interesting, doesn't type checkers complain though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing that I've seen so far. Once you have a yield statement the function returns a generator which you use to get the values, and a generator is an iterator.
wookie184
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked through the MessageCache implementation too closely, but the rest looks great and seems to work perfectly.
Bluenix2
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you 🦓. Looks good!
Closes #622, closes #665, closes #1522
This PR gives the AntiSpam cog the ability to detect spam of users across channels, and even when their messages (or part of them) were deleted, either by them or automatically because of another filter.
Implementation
The feature was implemented by making the on_message listener search for messages in a cache, rather than the history of the channel the message was sent in.
The MessageCache class
The cache used is an instance of a class added in this PR, rather than the one provided by discord.py. This has several advantages:
Not all features of the MessageCache (particularly slicing) are currently needed, but I preferred making a complete feature that didn't require too many future alterations, as we were recently discussing doing more things with it.
Changes to DeletionContext
As a spam event is no longer defined by a single channel, I couldn't use it anymore to identify a deletion context in the
message_deletion_queue. I chose to identify a spam event by the users causing it instead, allowing additional channels to be added to the deletion context, instead of allowing adding additional members as was before.This can run into the edge-case of one member becoming irrelevant to the filter while others still are, resulting in another log message being sent, but it's an unlikely edge case since the users should be muted almost immediately, and we're currently not using any multi-member filters in the first place. If new members are causing spam in the same channels in the period of time between the creation of the deletion context, and the alerting, alerting for the new members separately seems acceptable.