Skip to content

Update username checks#1892

Merged
ChrisLovering merged 5 commits into
mainfrom
Update-Username-Checks
Dec 6, 2021
Merged

Update username checks#1892
ChrisLovering merged 5 commits into
mainfrom
Update-Username-Checks

Conversation

@ChrisLovering
Copy link
Copy Markdown
Member

@ChrisLovering ChrisLovering commented Oct 18, 2021

Closes #1891

Currently, we check people's usernames when they send a message on the server, to ensure it isn't inappropriate. To get more coverage, we should actually be checking member join and member update events, as people can still have their nickname be visible on server without ever sending a message. They can also update their usernames after sending a message.

I have also:

  • Added a check for the normalised version of the name too, using unicodedata.normalize, to ensure unicode versions of names don't bypass our filters.
  • Returned the first bad match we find, rather than collecting them all, so we don't run it against all of our filters if not needed. If a single bad match is found, I trust our mods to actually read the user's name
  • Moved the check_send_alert() call to the first thing we do, as a redis call is cheaper than checking all our filter tokens

Comment thread bot/exts/filters/filtering.py Outdated
@ChrisLovering ChrisLovering force-pushed the Update-Username-Checks branch from 5616455 to fa501a9 Compare October 18, 2021 21:39
@ChrisLovering ChrisLovering added a: filters Related to message filters: (antimalware, antispam, filtering, token_remover) a: moderation Related to community moderation functionality: (moderation, defcon, verification) p: 2 - normal Normal Priority review: do not merge The PR can be reviewed but cannot be merged now s: needs review Author is waiting for someone to review and approve labels Oct 18, 2021
Copy link
Copy Markdown
Contributor

@HassanAbouelela HassanAbouelela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All clear! The alphabetical sorting was unexpected, but appreciated, nicely done.

@ChrisLovering ChrisLovering force-pushed the Update-Username-Checks branch from fa501a9 to 5645f79 Compare October 18, 2021 21:44
@wookie184
Copy link
Copy Markdown
Contributor

Normalizing strings before checking if they match on all filters removes our ability to match specific unnormalized forms. I guess this should be ok, although we should look into our current filters and see what can no longer match. I'm not sure how aggressive the normalization by confusables is.

@ChrisLovering ChrisLovering force-pushed the Update-Username-Checks branch 3 times, most recently from 6a5a3fa to 2fbcec6 Compare October 18, 2021 22:31
@wookie184
Copy link
Copy Markdown
Contributor

I had a bit more of a look into the string normalisation stuff, so I just want to outline my findings.

  1. Just as a note, we should definitely not use confusables.normalize on the whole string, it has... problems. For some letters it seems to have multiple possible confusable characters, and instead of choosing one, it returns a list of all the possible combinations. If somebody sent 100 characters to the bot (it seems to think it could be an f or an s 🤷), it would have tried to generate a list of length 2**100 (1267650600228229401496703205376).
  2. unicodedata.normalize seems pretty good, although it can't remove accents entirely by itself. However, we can remove them by using the NFKD form which decomposes accented letters into the letter and accent characters separately. We can then remove the accent by filtering whether it is a combining character with unicodedata.combining
''.join(
        c
        for c in unicodedata.normalize("NFKD", text)
        if not unicodedata.combining(c)
    )
  1. That still doesn't normalize quite a lot of other letters, e.g. cyrillic letters. If we did want to match that, we could use confusables.normalize as well but on individual characters at a time to avoid the problem mentioned above (although of course that still wouldn't be perfect).

Our solution doesn't need to be perfect so we can probably just go with the simple option (2) and revisit it if we feel it's necessary.

@ChrisLovering ChrisLovering force-pushed the Update-Username-Checks branch from f12b5f2 to 7e395ae Compare October 19, 2021 16:03
@ChrisLovering ChrisLovering removed the review: do not merge The PR can be reviewed but cannot be merged now label Oct 22, 2021
Comment thread bot/exts/filters/filtering.py Outdated
Comment thread bot/exts/filters/filtering.py Outdated
Copy link
Copy Markdown
Contributor

@onerandomusername onerandomusername left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good to me, keep up the moderation tooling! :shipit:

(side: I agree with bluenix's comments)

Copy link
Copy Markdown
Contributor

@jchristgit jchristgit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice use of unicodedata 👍

Comment thread bot/exts/filters/filtering.py Outdated
@ChrisLovering ChrisLovering force-pushed the Update-Username-Checks branch from 7e395ae to 16f389e Compare December 6, 2021 23:07
@ChrisLovering ChrisLovering force-pushed the Update-Username-Checks branch from 16f389e to d0dc7a0 Compare December 6, 2021 23:08
@ChrisLovering ChrisLovering merged commit d56e1b0 into main Dec 6, 2021
@ChrisLovering ChrisLovering deleted the Update-Username-Checks branch December 6, 2021 23:21
@Xithrius Xithrius removed the s: needs review Author is waiting for someone to review and approve label Dec 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

a: filters Related to message filters: (antimalware, antispam, filtering, token_remover) a: moderation Related to community moderation functionality: (moderation, defcon, verification) p: 2 - normal Normal Priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update Username Checks

7 participants