Skip to content

Clean Cog Rework#1793

Merged
jchristgit merged 61 commits into
mainfrom
cleanrework
Oct 30, 2021
Merged

Clean Cog Rework#1793
jchristgit merged 61 commits into
mainfrom
cleanrework

Conversation

@mbaruh
Copy link
Copy Markdown
Member

@mbaruh mbaruh commented Aug 29, 2021

This pull request restructures the clean cog and adds additional features to it.

Changes

  • !clean message renamed to !clean until
  • Added !clean between which accepts two limits to clean between.
  • The limits are now exclusive. Meaning, unlike now where !clean message <message> deletes the message specified, !clean until will delete the messages and stop at the message specified without deleting it. Same way with between.
  • The limits can now be specified not just via messages, but through ISO datetimes and duration deltas as well. For example, !clean between 5M 15M will delete all message in the channel older than 5 minutes but younger than 15 minutes. The order of the two limits doesn't matter.
  • Added more cancellation checkpoints throughout the deletion process, to be able to stop even if deletion already started. discord.py's purge is no longer used, but instead single-message deletion and per-channel deletion are used where possible, with a check for the value of self.cleaning after every delete call.
  • For the user, regex, bots, and all subcommands, all channels can now be specified for deletion using the * value.
  • As the task of iterating over a set of messages in every single text channel is costly, using the * value will make user of the message cache by default. The commands can be forced to not use the cache through an added flag in those commands. It should be noted that when not using all channels, there is no cache usage in any case.
  • Command invocation is no longer deleted in a mod channel. When the command is done, a checkmark reaction will be added to the command invocation.
  • When cancelling a clean, the message will be deleted after 5 seconds in a non-mod channel, and will not be deleted in a mod channel.
  • The clean cog was moved to the moderation extension. It's a moderation tool that only moderators can use.
  • A master command was added (the top-level command now receives arguments), from which the functionality of all subcommands can be derived. I tried to keep the subcommands simple and applicable to the most common use cases, but that doesn't mean we don't need to be able to use some arbitrary combination of the predicates available through the subcommands. For exmaple, deleting the messages of a certain user which match a certain regex pattern. Or delete the messages from the bots but only in a specific time range. The master command also offers extra functionalities such as accepting more than one user.

The Master Command Usage

An extreme example of the master command being used to delete messages of two users between the specified ISO datetime and creation date of the specified message, but only those messages which contain digits:
image

Examples of using subcommand functionalities through the master command

  • !clean all 10 => !clean 10
  • !clean user <user id> 15 <channel1> channel2> => !clean 15 <user id> <channel1> <channel2>
  • !clean between <limit1> <limit2> => !clean <limit1> <limit2>

Questionable Choices

  • As discord.py's Greedy is an object, not a type, I was unable to use in a Union to create Union[Literal["*"], Greedy[TextChannel]] for the channels. As such I created a special converter for it and placed the channels argument as positional only, as discord.py's treats such a command argument as a dump of all the remaining arguments.
  • The master command having the regex as an optional argument is an issue, as it's a string that can contain anything, including the values of other arguments such as channel mentions. I had to find a way to distinguish the regex pattern from other arguments, and I did so by enforcing the pattern to be enclosed in backticks. For example: \d+. This came after some discussion with moderators, and useful anyway as it cancels formatting while trying to write a pattern, but I'm open to alternatives or to more convenient ways to denote a regex pattern. To make it consistent I made the change in the clean regex subcommand as well.
  • I added an Age converter which takes a time delta and returns the date-time of now minues the delta. Meaning, the opposite of DurationDelta. I placed it in converters.py even though it's not used anywhere else, as because of the added converter annotations, the imported Duration converter is interpreted by type checkers as a timedelta, and so they can't resolve it as having the converter method.

Known Issues

I found a bug which causes logging to fail. It happened after canceling a clean, and the command gave an error that the messages were already added to the DB as deleted. However, reloading the cog made the clean command work properly again. It happened only a couple of times and I was unable to reproduce it by demand.

EDIT: Managed to reproduce a bug, not sure if it's the same one:
image
Addressed in python-discord/site#585

Special Thanks

To @Senjan21 who made the initiative to improve the cog and implemented some of the functionality and to @ChrisLovering who helped fix some of the issues.

Senjan21 and others added 30 commits November 22, 2020 12:31
new name should be more selfexplanatory
The cog is moderation related and all commands are exclusive to moderators.
Between the concurrency check and setting the cleaning flag to True there was an await statement, which could potentially cause race conditions.The setting of the flag was moved to right below the concurrency check.
Added a cog_command_error method that sets cleaning to False when a command ends on an exception.
I don't have anything in mind that might cause this, but it will ensure that in any case the cog will still be usable.
Copy link
Copy Markdown
Contributor

@Shivansh-007 Shivansh-007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works well and looks great!

Copy link
Copy Markdown
Contributor

@wookie184 wookie184 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, love the new interface and look forward to this being merged 👌, just a couple of comments.

I managed to get this error

bot_1        | 2021-09-19 15:44:56 | bot.exts.backend.error_handler | DEBUG | API responded with 400 for command clean bots: {'deletedmessage_set': [{'author': ['Invalid pk "814417186864496711" - object does not exist.']}, {'author': ['Invalid pk "814417186864496711" - object does not exist.']}, {'author': ['Invalid pk "814417186864496711" - object does not exist.']}]}.

When running !clean bots (don't think the fact it was bots is relevant though) on a webhook. I think 814417186864496711 may have been the message ID, but not completely sue.

Also, when "stress testing" by trying to run a bunch of commands at the same time, I managed to get this error

bot_1       | 2021-09-19 12:56:02 | bot.exts.backend.error_handler | DEBUG | API responded with 400 for command clean all: {'deletedmessage_set': [{}, {}, {'id': ['deleted message with this ID already exists.']}, {'id': ['deleted message with this ID already exists.']}, {}, {}, {}, {}, {}, {}]}.

I'm thinking this one could be because I triggered antispam while running clean commands so clean and antispam were both trying to delete the same messages.

I'm not sure if these errors are particularly relevant to this PR, or should be fixed elsewhere.

Comment thread bot/exts/moderation/clean.py Outdated
Comment thread bot/exts/moderation/clean.py Outdated
Comment thread bot/exts/moderation/clean.py
Comment thread bot/exts/moderation/clean.py Outdated
Comment thread bot/exts/moderation/clean.py Outdated
Comment thread bot/exts/moderation/clean.py Outdated
Comment thread bot/exts/moderation/clean.py Outdated
Comment thread bot/exts/moderation/clean.py
Comment thread bot/exts/moderation/clean.py Outdated
Comment thread bot/exts/moderation/clean.py
@mbaruh
Copy link
Copy Markdown
Member Author

mbaruh commented Sep 19, 2021

Looks great, love the new interface and look forward to this being merged 👌, just a couple of comments.

I managed to get this error

bot_1        | 2021-09-19 15:44:56 | bot.exts.backend.error_handler | DEBUG | API responded with 400 for command clean bots: {'deletedmessage_set': [{'author': ['Invalid pk "814417186864496711" - object does not exist.']}, {'author': ['Invalid pk "814417186864496711" - object does not exist.']}, {'author': ['Invalid pk "814417186864496711" - object does not exist.']}]}.

When running !clean bots (don't think the fact it was bots is relevant though) on a webhook. I think 814417186864496711 may have been the message ID, but not completely sue.

Also, when "stress testing" by trying to run a bunch of commands at the same time, I managed to get this error

bot_1       | 2021-09-19 12:56:02 | bot.exts.backend.error_handler | DEBUG | API responded with 400 for command clean all: {'deletedmessage_set': [{}, {}, {'id': ['deleted message with this ID already exists.']}, {'id': ['deleted message with this ID already exists.']}, {}, {}, {}, {}, {}, {}]}.

I'm thinking this one could be because I triggered antispam while running clean commands so clean and antispam were both trying to delete the same messages.

I'm not sure if these errors are particularly relevant to this PR, or should be fixed elsewhere.

Yeah I mentioned this kind or error in the PR description. Honestly not sure how to address it

But it makes sense that it's related to a race condition with other commands/cogs. Thanks for looking into it.

@mbaruh
Copy link
Copy Markdown
Member Author

mbaruh commented Sep 19, 2021

I see two possible avenues to handle these errors:

  • Hold a cache in the modlog to store deleted message ID's. Before relaying messages to the API it will filter out messages already cached. This has the issue of storing another cache which is meh, and also I'm not sure that the messages are deleted correctly when this error occurs.
  • Change the site to ignore duplicate keys. This has the issue that we need to scope all of the places in our code where we use this API endpoint to understand whether this will affect something it shouldn't.

@wookie184
Copy link
Copy Markdown
Contributor

I see two possible avenues to handle these errors:

* Hold a cache in the modlog to store deleted message ID's. Before relaying messages to the API it will filter out messages already cached. This has the issue of storing another cache which is meh, and also I'm not sure that the messages are deleted correctly when this error occurs.

* Change the site to ignore duplicate keys. This has the issue that we need to scope all of the places in our code where we use this API endpoint to understand whether this will affect something it shouldn't.

I haven't looked into modlog too much yet, but i'd guess we could probably get away with changing site to ignore duplicate keys (or getting bot to handle the errors site raises because of this). That would probably be the easiest option anyway, I wouldn't mind a cache but it would be nice not to have to add one, especially since this is something that would probably happen so rarely.

I think the webhook-related error I posted is separate though, as webhooks don't have authors it might just be that site just isn't built to handle them at the moment, not sure.

@mbaruh mbaruh requested a review from wookie184 September 20, 2021 20:12
@mbaruh
Copy link
Copy Markdown
Member Author

mbaruh commented Sep 20, 2021

I'll leave handling the 400 errors to a separate PR (probably on the site repo). As far as I can see those are bugs which already exist in prod.

Copy link
Copy Markdown
Contributor

@wookie184 wookie184 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Some of the line-breaks make the help command look a bit off
Discord_YmvGEcZT85
This could be fixed by adding \s to the end of lines we don't want newlines on in the docstring, but I believe this is a greater issue across the bot, and i'm not sure what the current status of #1285 is, so it might be better to decide on a solution across the bot first I guess.

@Xithrius Xithrius requested a review from jchristgit October 10, 2021 08:56
Copy link
Copy Markdown
Contributor

@jchristgit jchristgit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this. I have a few comments, but I like it.

Comment thread bot/converters.py
Comment thread bot/exts/moderation/clean.py
Comment thread bot/exts/moderation/clean.py Outdated
Comment thread bot/exts/moderation/clean.py
Comment thread bot/exts/moderation/clean.py
Comment thread bot/exts/moderation/clean.py
Comment thread bot/exts/moderation/clean.py Outdated
Comment thread bot/exts/moderation/clean.py

# Reverse the list to have reverse chronological order
log_messages = reversed(messages)
log_url = await self.mod_log.upload_log(log_messages, ctx.author.id)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a display for deleted log messages? I think we do, but I don't remember. How does that display deal with multiple channels?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't show the channels, this is something that I noticed the anti-spam needs too. I'm leaving it for a separate PR (needs to be on site as far as I can tell).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Is there an issue open for anti-spam already?

@jchristgit jchristgit enabled auto-merge October 10, 2021 10:51
Copy link
Copy Markdown
Contributor

@wookie184 wookie184 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bullet point bits in the docstrings will need to be changed because of this: #1876, adding the \u2003 override should do the job.

@wookie184 wookie184 linked an issue Oct 17, 2021 that may be closed by this pull request
Discussion in the pull request raised some legitimate use cases for supplying a time range for multiple channels (e.g clean the last couple of minutes instead of specifying number of messages to traverse).
Copy link
Copy Markdown
Contributor

@jchristgit jchristgit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rock solid.

Comment thread bot/exts/moderation/clean.py
Copy link
Copy Markdown
Contributor

@wookie184 wookie184 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All seems good to me :shipit:

@jchristgit jchristgit merged commit af8fe4a into main Oct 30, 2021
@jchristgit jchristgit deleted the cleanrework branch October 30, 2021 09:38
@Xithrius Xithrius removed the s: needs review Author is waiting for someone to review and approve label Nov 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

a: backend Related to internal functionality and utilities (error_handler, logging, security, utils and core) a: frontend Related to output and formatting a: moderation Related to community moderation functionality: (moderation, defcon, verification) p: 2 - normal Normal Priority t: enhancement Changes or improvements to existing features t: feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cancelling a message purge doesn't work and has a race condition

8 participants