Cross-channel and deleted messages anti-spam #1760

mbaruh · 2021-08-21T12:44:49Z

Closes #622, closes #665, closes #1522

This PR gives the AntiSpam cog the ability to detect spam by users across channels, even when some or all of their messages were deleted, either by the users themselves or automatically by another filter.

Implementation

The feature was implemented by making the on_message listener search for messages in a cache, rather than the history of the channel the message was sent in.
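As a rough illustration of searching a cache instead of a single channel's history, the sketch below filters a guild-wide collection of cached messages by author and age. `CachedMessage` and `recent_messages_by` are hypothetical names invented for this example, and a plain list stands in for the cache; none of this is the code from the PR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CachedMessage:
    """Hypothetical stand-in for a Discord message; not the discord.py class."""
    id: int
    author_id: int
    channel_id: int
    created_at: datetime


def recent_messages_by(cache, author_id, now, max_interval):
    """Return the author's cached messages from the last `max_interval` seconds, in any channel."""
    earliest = now - timedelta(seconds=max_interval)
    return [
        msg for msg in cache
        if msg.author_id == author_id and msg.created_at >= earliest
    ]


now = datetime(2021, 8, 21, 12, 0, 0, tzinfo=timezone.utc)
# A plain list stands in for the guild-wide cache (assumption for this sketch).
cache = [
    CachedMessage(1, author_id=10, channel_id=100, created_at=now - timedelta(seconds=30)),
    CachedMessage(2, author_id=10, channel_id=100, created_at=now - timedelta(seconds=5)),
    CachedMessage(3, author_id=20, channel_id=200, created_at=now - timedelta(seconds=4)),
    CachedMessage(4, author_id=10, channel_id=200, created_at=now - timedelta(seconds=2)),
]
relevant = recent_messages_by(cache, author_id=10, now=now, max_interval=10)
```

Because the filter never looks at `channel_id`, messages 2 and 4 are counted together even though they were sent in different channels.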

The MessageCache class

The cache used is an instance of a class added in this PR, rather than the one provided by discord.py. This has several advantages:

The collection of messages to search in can be defined to be significantly smaller, as the cog only cares about the messages sent in the last few seconds.
discord.py removes deleted messages from the cache, while the new cache implementation does not. This allows easily handling deleted messages in the anti-spam cog.
Because elements are never removed from the middle of the sequence, the cache can be implemented so that appending and popping at either end, as well as lookup by index and by message ID, all run in constant time.
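The constant-time claims above can be illustrated with a small circular buffer plus an ID-to-slot dictionary. This is a minimal sketch under assumed names (`RingCache`, `_slots`, `_index_by_id`), storing `(id, item)` pairs rather than real messages; it is not the MessageCache implementation from this PR and omits slicing and popping from the right.

```python
class RingCache:
    """Illustrative fixed-size cache; names and details are assumptions, not the PR's code."""

    def __init__(self, maxlen):
        self.maxlen = maxlen
        self._slots = [None] * maxlen
        self._start = 0          # slot index of the oldest item
        self._size = 0
        self._index_by_id = {}   # item ID -> slot index

    def append(self, item_id, item):
        """Add an item at the end, evicting the oldest one if the cache is full."""
        if self._size == self.maxlen:
            evicted_id, _ = self._slots[self._start]
            del self._index_by_id[evicted_id]
            self._start = (self._start + 1) % self.maxlen
            self._size -= 1
        slot = (self._start + self._size) % self.maxlen
        self._slots[slot] = (item_id, item)
        self._index_by_id[item_id] = slot
        self._size += 1

    def popleft(self):
        """Remove and return the oldest item."""
        if self._size == 0:
            raise IndexError("pop from an empty cache")
        item_id, item = self._slots[self._start]
        self._slots[self._start] = None
        del self._index_by_id[item_id]
        self._start = (self._start + 1) % self.maxlen
        self._size -= 1
        return item

    def __getitem__(self, index):
        """Look up by position, where 0 is the oldest item."""
        if not 0 <= index < self._size:
            raise IndexError("cache index out of range")
        return self._slots[(self._start + index) % self.maxlen][1]

    def get(self, item_id):
        """Look up by ID; returns None if the item is not cached."""
        slot = self._index_by_id.get(item_id)
        return None if slot is None else self._slots[slot][1]

    def __len__(self):
        return self._size


cache = RingCache(maxlen=3)
for message_id, content in [(1, "a"), (2, "b"), (3, "c"), (4, "d")]:
    cache.append(message_id, content)
```

Every operation touches a fixed number of list slots and dictionary entries, which is only possible because nothing is ever removed from the middle: a removal there would shift the slot indices stored in the mapping.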

Not all features of the MessageCache (particularly slicing) are currently needed, but I preferred making a complete feature that didn't require too many future alterations, as we were recently discussing doing more things with it.

Changes to DeletionContext

As a spam event is no longer confined to a single channel, I could no longer use the channel to identify a deletion context in the message_deletion_queue. I chose to identify a spam event by the users causing it instead. This allows additional channels to be added to a deletion context, rather than additional members as before.

This can run into the edge case of one member becoming irrelevant to the filter while others are still relevant, resulting in another log message being sent. It's an unlikely case, since the users should be muted almost immediately, and we're currently not using any multi-member filters in the first place. If new members cause spam in the same channels between the creation of the deletion context and the alert, alerting for the new members separately seems acceptable.

The on_message event calculated the max interval value every time for no reason. The value is constant throughout the bot's up time.
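A minimal sketch of the idea, assuming a rule config shaped as a dictionary of rules that each carry an `interval` in seconds. `AntiSpamSketch` and the rule names and values are made up for illustration and are not taken from the bot's config.

```python
class AntiSpamSketch:
    """Hypothetical cog skeleton; the rule config shape below is an assumption."""

    def __init__(self, rules_config):
        self.rules_config = rules_config
        # Computed once: the config does not change while the bot is running.
        self.max_interval = max(rule["interval"] for rule in rules_config.values())

    def on_message(self, message):
        # The listener reuses the precomputed value instead of recalculating it.
        return self.max_interval


cog = AntiSpamSketch({
    "burst": {"interval": 10, "max": 7},
    "links": {"interval": 10, "max": 10},
    "newlines": {"interval": 30, "max": 100},
})
```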

The anti-spam cog now uses a cache instead of reading channel history. The cache covers all channels in the guild and does not remove deleted messages, so the anti-spam logic now works cross-channel and counts deleted messages. The size of the cache is determined by a new field in the config YAML file. The cache was implemented as a separate class, MessageCache, which uses circular buffer logic. This allows for constant-time addition and removal at either end, as well as constant-time lookup. The cache does not support removal from the middle. It additionally stores a mapping from message IDs to the index of each message in the cache, to allow constant-time lookup by message ID. The commit also adds accompanying tests, and renames `cache.py` to `caching.py` to better distinguish it from the new `message_cache.py` and convey that it's for general caching utilities.

The anti-spam cog was amended to handle cross-channel spam.

Since anti-spam now works across channels, it makes no sense to identify a spam event by the channel in which it was invoked. The DeletionContext class was changed to accept a frozenset of members, and the message_deletion_queue dict uses the frozensets as keys. DeletionContext still accepts a channel on creation because, while more channels might be added to it later, there's only one channel in which the mute message will be sent. Using members as the key can run into the issue of one member becoming irrelevant to the filter while others are still relevant, resulting in another log message being sent, but it's an unlikely edge case since the users should be muted almost immediately, and we're currently not using any multi-member filters in the first place.
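The keying scheme can be sketched as follows. `DeletionContextSketch`, its fields, and `queue_spam` are hypothetical simplifications written for this example; strings stand in for members, channels and messages, and the real DeletionContext has more responsibilities than shown here.

```python
from dataclasses import dataclass, field


@dataclass
class DeletionContextSketch:
    """Hypothetical simplification of DeletionContext; fields are illustrative."""
    members: frozenset
    triggered_in: str            # the one channel where the mute message is sent
    channels: set = field(default_factory=set)
    messages: list = field(default_factory=list)

    def add(self, channel, messages):
        """Record another channel and its offending messages for the same spam event."""
        self.channels.add(channel)
        self.messages.extend(messages)


message_deletion_queue = {}


def queue_spam(members, channel, messages):
    """Create or extend the deletion context identified by the offending members."""
    key = frozenset(members)
    if key not in message_deletion_queue:
        message_deletion_queue[key] = DeletionContextSketch(key, triggered_in=channel)
    message_deletion_queue[key].add(channel, messages)
    return message_deletion_queue[key]


queue_spam(["alice"], "#general", ["m1", "m2"])
context = queue_spam(["alice"], "#help", ["m3"])
```

A frozenset is hashable and ignores ordering, so the same group of members always maps to the same queue entry regardless of which channel triggered the rule.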

Akarys42

The code is very well written, and the feature works perfectly well. Thanks a lot, Zig, favourite zebra ever!

bot/utils/message_cache.py

Removed unused import, corrected docstring, and removed unneeded type annotation.

bot/utils/message_cache.py

tests/bot/utils/test_message_cache.py

__getitem__-based iteration included operations that aren't necessary when iterating over the cache continuously. Adding an __iter__ method to the class seems to have improved iteration speed by several orders of magnitude.
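To show why a dedicated iterator can skip per-item work, here is a sketch of a circular buffer that offers both access paths. The `Ring` class and its attribute names are assumptions made for this example, and the `__iter__` shown is one possible approach, not necessarily the one used in the commit.

```python
class Ring:
    """Illustrative circular buffer; attribute names are assumptions."""

    def __init__(self, maxlen, items=()):
        self.maxlen = maxlen
        self._slots = [None] * maxlen
        self._start = 0
        self._size = 0
        for item in items:
            self.append(item)

    def append(self, item):
        if self._size == self.maxlen:
            # Overwrite the oldest item by advancing the start pointer.
            self._start = (self._start + 1) % self.maxlen
            self._size -= 1
        self._slots[(self._start + self._size) % self.maxlen] = item
        self._size += 1

    def __getitem__(self, index):
        # Per-access work: a bounds check plus modular arithmetic.
        if not 0 <= index < self._size:
            raise IndexError("index out of range")
        return self._slots[(self._start + index) % self.maxlen]

    def __iter__(self):
        # Walk the underlying list in at most two contiguous runs.
        end = self._start + self._size
        yield from self._slots[self._start:min(end, self.maxlen)]
        if end > self.maxlen:
            yield from self._slots[:end - self.maxlen]


ring = Ring(4, items=range(6))   # holds 2, 3, 4, 5 after wrapping around
via_iter = list(ring)
via_getitem = [ring[i] for i in range(len(via_iter))]
```

Both paths produce the same sequence; the iterator simply avoids repeating the bounds check and index translation for every element.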

Bluenix2 · 2021-08-23T09:54:35Z

bot/exts/filters/antispam.py

+                channel_messages = defaultdict(list)
+                for message in messages:
+                    channel_messages[message.channel].append(message)
+                for channel, messages in channel_messages.items():
+                    await channel.delete_messages(messages)


I don't think I understand this. You go through each message and add it to the list in the dictionary?

Then you go through this dictionary and delete those messages in bulk?

Why is the channels dictionary a defaultdict? I thought we already had all the channels because of the set?

I'm grouping the messages per channel to be able to use the delete_messages method on all messages of that channel at the same time.
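The grouping step can be tried in isolation with stand-in objects. `FakeChannel` and `FakeMessage` below are hypothetical stubs written for this sketch; only the `delete_messages` method name comes from the diff above, and the loop variable is renamed to `grouped` to avoid shadowing `messages`.

```python
import asyncio
from collections import defaultdict


class FakeChannel:
    """Hypothetical stand-in for a text channel; records bulk deletions."""

    def __init__(self, name):
        self.name = name
        self.deleted_batches = []

    async def delete_messages(self, messages):
        self.deleted_batches.append(list(messages))


class FakeMessage:
    """Hypothetical stand-in for a message that knows its channel."""

    def __init__(self, id, channel):
        self.id = id
        self.channel = channel


async def delete_grouped(messages):
    """Group messages by channel, then issue one bulk delete per channel."""
    channel_messages = defaultdict(list)
    for message in messages:
        channel_messages[message.channel].append(message)
    for channel, grouped in channel_messages.items():
        await channel.delete_messages(grouped)


general, help_channel = FakeChannel("general"), FakeChannel("help")
spam = [FakeMessage(1, general), FakeMessage(2, help_channel), FakeMessage(3, general)]
asyncio.run(delete_grouped(spam))
```

The defaultdict removes the need to check whether a channel already has a list before appending to it.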

Bluenix2 · 2021-08-23T10:00:08Z

bot/utils/message_cache.py

+        """Add the received message to the end of the cache."""
+        if self._is_full():
+            del self._message_id_mapping[self._messages[self._start].id]
+            self._start = (self._start + 1) % self.maxlen


Perhaps _start could be a property?

What is the benefit?

Bluenix2 · 2021-08-23T10:04:06Z

bot/utils/message_cache.py

+        index = self._message_id_mapping.get(message_id, None)
+        return self._messages[index] if index is not None else None
+
+    def update(self, message: Message) -> bool:


When is this used? This should be a reference to the same Message object, so it will be changed "automatically".

This is used in the antispam's on_message_edit. The listener receives two different Message objects.
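A small sketch of why an explicit update is needed when the edit event delivers a new object. `Message` and `TinyCache` are hypothetical minimal classes for this example; only the `update` and `_message_id_mapping` names mirror the diff above, and a growing list replaces the circular buffer for brevity.

```python
class Message:
    """Hypothetical minimal message; not the discord.py class."""

    def __init__(self, id, content):
        self.id = id
        self.content = content


class TinyCache:
    """Illustrative cache with ID lookup; a plain list replaces the ring buffer."""

    def __init__(self):
        self._messages = []
        self._message_id_mapping = {}

    def append(self, message):
        self._message_id_mapping[message.id] = len(self._messages)
        self._messages.append(message)

    def get_message(self, message_id):
        index = self._message_id_mapping.get(message_id)
        return self._messages[index] if index is not None else None

    def update(self, message):
        """Replace the cached message that has the same ID; report whether it was found."""
        index = self._message_id_mapping.get(message.id)
        if index is None:
            return False
        self._messages[index] = message
        return True


cache = TinyCache()
before = Message(1, "hello")
cache.append(before)
after = Message(1, "hello, edited")   # a distinct object, as in an edit event
updated = cache.update(after)
```

Without the update call, the cache would keep pointing at `before`, whose content never changes.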

Bluenix2 · 2021-08-23T10:07:15Z

bot/utils/message_cache.py

+
+    def __iter__(self) -> t.Iterator[Message]:
+        if self._is_empty():
+            return


Do we not need to return an empty iterator?

Iteration stops when you return. There's no difference between having something to iterate on first and then return, and having nothing to iterate on and then return.

Ah, interesting. Don't type checkers complain, though?

Nothing that I've seen so far. Once you have a yield statement the function returns a generator which you use to get the values, and a generator is an iterator.
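The behaviour described here is easy to check in isolation. The function below is a made-up example, not the cache's `__iter__`: any function containing `yield` is a generator function, so a bare `return` just ends iteration.

```python
import collections.abc
import typing as t


def iterate(items: t.List[int]) -> t.Iterator[int]:
    if not items:
        return  # ends the generator immediately; nothing is yielded
    for item in items:
        yield item


empty_result = list(iterate([]))
full_result = list(iterate([1, 2, 3]))
# Calling the function returns a generator object, which is an iterator.
is_iterator = isinstance(iterate([]), collections.abc.Iterator)
```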

wookie184

I haven't looked through the MessageCache implementation too closely, but the rest looks great and seems to work perfectly.

Bluenix2

Thank you 🦓. Looks good!

mbaruh and others added 5 commits August 17, 2021 17:58

Move max_interval to init

20aea4f

The on_message event calculated the max interval value every time for no reason. The value is constant throughout the bot's up time.

AntiSpam deletes from all spammed channels

e1e104d

The anti-spam cog was amended to handle cross-channel spam.

Fix MessageCache slicing bugs, improve tests

b4ddc0b

mbaruh requested review from Akarys42, Den4200 and jb3 as code owners August 21, 2021 12:44

mbaruh added the labels a: filters (related to message filters: antimalware, antispam, filtering, token_remover) and p: 1 - high (high priority) Aug 21, 2021

Akarys42 approved these changes Aug 21, 2021

bot/utils/message_cache.py (outdated, resolved)

Clean up code

aa2a6b4

Removed unused import, corrected docstring, and removed unneeded type annotation.

bast0006 reviewed Aug 21, 2021

bot/utils/message_cache.py (resolved)

bast0006 reviewed Aug 21, 2021

tests/bot/utils/test_message_cache.py (outdated, resolved)

mbaruh and others added 2 commits August 21, 2021 20:52

Additional comments and tests for slicing

0531b1e

Improve cache iteration speed

d0d1404

__getitem__-based iteration included operations that aren't necessary when iterating over the cache continuously. Adding an __iter__ method to the class seems to have improved iteration speed by several orders of magnitude.

Xithrius requested review from MarkKoz, SebastiaanZ and bast0006 August 23, 2021 06:55

Xithrius added the t: feature (new feature or request) label Aug 23, 2021

Bluenix2 reviewed Aug 23, 2021

wookie184 approved these changes Aug 23, 2021

Merge branch 'main' into mbaruh/anti-spam

c59f13a

Bluenix2 approved these changes Aug 23, 2021

mbaruh merged commit 697c0da into main Aug 23, 2021

mbaruh deleted the mbaruh/anti-spam branch August 23, 2021 18:55
