Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chat sanitization refactor #17313

Open
Tracked by #23246
Chief-Engineer opened this issue Jun 13, 2023 · 8 comments
Open
Tracked by #23246

Chat sanitization refactor #17313

Chief-Engineer opened this issue Jun 13, 2023 · 8 comments
Labels
Difficulty: 2-Medium A good amount of codebase knowledge required. Issue: Feature Request This issue is a feature request. Issue: Needs Refactor This issue needs a refactor to be fixed. Issue: UI Priority: 1-Urgent GET ON IT STAT

Comments

@Chief-Engineer
Copy link
Contributor

Chief-Engineer commented Jun 13, 2023

This issue has had some scope/intention changes since it was created. Please ping me on discord to talk to me about what admins want/need from it if you're interested in working on it. It's still mostly the same but I'd like to make sure that the intended use cases are going to be covered.

Description

The entire system could use a refactor to work with regex and have prototypes or something define the replacements instead of hardcoding them. It probably can/should also be client side instead of server side. If the client does sanitization before sending each message, the server doesn't have to worry about running sanitizations on each message.

Sanitization should be able to match based on any combination of:

  • Regex match
  • Chat type (speak/whisper/emote)
  • Chat channel (OOC/LOOC/Deadchat/IC)

Ideally, it'd support negative matches, (example: not a whisper) but that isn't super important. There should probably also be some way to order the substitution priority.

Sanitizations should be able to perform any combination of:

  • Replace regex match
  • Remove regex match
  • Block message
  • Send custom message to client
  • Send custom message to server (like making the client emote)
  • Play audio (accessibility and "visibility" feature)

Examples

lol

Conditions:

  • Regex match \blol\b
  • speak/whisper/emote
  • IC
    Actions:
  • Remove regex match
  • Send custom message to server: emote laughs

tbh

Conditions:

  • Regex match \btbh\b
  • speak/whisper/emote
  • IC
    Actions:
  • Replace regex match with to be honest

tbh for people too afraid to replace it

Conditions:

  • Regex match \btbh\b
  • speak/whisper/emote
  • IC
    Actions:
  • Send custom message to client: Please avoid text speak, use "to be honest" instead of "tbh"

Double spaces

Fixes double spaces caused by removals, should happen last
Conditions:

  • Regex match {2,}
  • speak/whisper/emote
  • LOOC/OOC/IC/Deadchat
    Actions:
  • Remove regex match
@Chief-Engineer Chief-Engineer added Issue: Feature Request This issue is a feature request. Issue: Needs Refactor This issue needs a refactor to be fixed. labels Jun 13, 2023
@ShadowCommander ShadowCommander added Priority: 1-Urgent GET ON IT STAT Difficulty: 2-Medium A good amount of codebase knowledge required. Issue: UI labels Dec 30, 2023
@VasilisThePikachu
Copy link
Member

Is it really a good idea to move this to the client? If the client is hacked then they can bypass the sanitization from how I'm imagining this

@Chief-Engineer
Copy link
Contributor Author

Is it really a good idea to move this to the client? If the client is hacked then they can bypass the sanitization from how I'm imagining this

Originally it was intended to be used for stuff like correcting text speak. Since we want to use it for slurs now, it should have the ability to be server side, but having the option to make some of them client side could still be good because all the text speak stuff can be client side and it'll prevent the server from having to match the messages.

@Skarletto
Copy link
Contributor

Skarletto commented Jan 26, 2024

The steps I take when searching logs for slurs and other unacceptable words/sentences are as follow

  1. Regex match slurs and whatnot
  2. Look up the notes of whoever typed slurs and whatnot
  3. If no actions were taken against their behavior, search Chat weblogs using the date the issue happened to find the specific round in which the issue occured
  4. Read the 5 previous and 5 following messages
  5. Find out from context whether the issue is bannable or not
  6. Ban or send a message note depending on findings

@FairlySadPanda
Copy link
Contributor

For #24680: I'm religiously avoiding any form of regex for the chat sanitization work for two reasons:

  1. The number of people who understand how regex works is far smaller than the number of people capable of copy-and-pasting regex off of Stack Overflow into their project. "If you solve technical debt with regex, you create technical debt".
  2. Regex is insufficient for the needed work. For example, there's a responsibility to create a buildable sanitization job (via local events) that does different things, with the job being able to be modified easily via cvars. Dense regex makes this harder than it needs to be.

@Chief-Engineer
Copy link
Contributor Author

@FairlySadPanda The number of people who understand something is always far smaller than the number of people capable of copy/pasting it off of stack overflow. I think having basic text matching and highly encouraging people to use that over regex is fine, but I also think regex needs to be supported to enable filtering that isn't possible with basic text matching.

I don't think I'm familiar with any sort of half decent chat/text filtering system that doesn't support regex. In case it's unclear, I'm not saying that the filtering system should be based on regex substitutions, just that matching with regex should be supported

@FairlySadPanda
Copy link
Contributor

OK: I think the best compromise here is allow regex expressions to be specified as a filter step, but having a robust (ahem) filtration system otherwise.

Regex is just one of those things that tends to creep in as an innocent change. See also SQL 😉

@Chief-Engineer
Copy link
Contributor Author

Ya, the way I imagine it being used is as a condition for filtering

When something like
If SimpleTextMatch Then DoSomething
isn't good enough,
If RegexMatch Then DoSomething
can be used. DoSomething doesn't need to involve any regex in it. As nice as it'd be for it to support having capture groups passed to it, all it needs at a minimum is the full match (or match position), similar to how the simplematch would make DoSomething aware of where the match is so that, if replacement or something needs to happen, it knows where to do it. I think other than replacement, none of the possible filter actions need to know what was matched, just what the filter rule wants done

@FairlySadPanda
Copy link
Contributor

I've closed #24680 and will put a PR in for specifically text filtration that'll have regex as a specific option

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Difficulty: 2-Medium A good amount of codebase knowledge required. Issue: Feature Request This issue is a feature request. Issue: Needs Refactor This issue needs a refactor to be fixed. Issue: UI Priority: 1-Urgent GET ON IT STAT
Projects
Status: 🔖 Defined
Development

No branches or pull requests

5 participants