enhancement(filters): use a stricter bot token regex #2006
c676f1a to b5f1305
I'll be fixing this commit history tonight.
033ca53 to 941ea8c
Bluenix2 left a comment
Works as usual, though I can't get access to an MFA token to test. Thanks for this; I just have one comment that won't affect my review.
Akarys42 left a comment
LGTM, just a small style change
wookie184 left a comment
Just a couple of things. Could you also make the message match the other token message a bit more closely, particularly:
- having the user as a clickable mention and the ID in a codeblock afterwards
- having the channel as a clickable mention
- putting the censored token in a codeblock
- adding the pfp of the user that sent the message as a thumbnail
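A minimal sketch of the requested layout, assuming the embed description is plain Discord markdown. `<@id>` and `<#id>` are Discord's user and channel mention syntax; `format_token_log`, `censor_token` and the rule of leaving three characters of the last section visible are hypothetical, not the bot's actual helpers.

```python
FENCE = "`" * 3  # triple backticks, built at runtime so this example's own fence stays intact


def censor_token(token: str, visible: int = 3) -> str:
    """Keep the first two sections and mask all but the first few characters of the last one.

    Hypothetical redaction rule; the bot's real one may differ.
    """
    user_part, timestamp_part, hmac_part = token.split(".", 2)
    masked = hmac_part[:visible] + "x" * max(len(hmac_part) - visible, 0)
    return f"{user_part}.{timestamp_part}.{masked}"


def format_token_log(author_id: int, channel_id: int, token: str) -> str:
    """Build a description in the requested layout (hypothetical helper)."""
    return (
        # Clickable user mention, raw ID in inline code, clickable channel mention.
        f"Censored a token sent by <@{author_id}> (`{author_id}`) in <#{channel_id}>.\n"
        # Censored token in a code block.
        f"{FENCE}\n{censor_token(token)}\n{FENCE}"
    )
```

With discord.py, this string would go into an `Embed`'s `description`, and the author's avatar would be attached separately with `Embed.set_thumbnail(url=...)`.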
Co-authored-by: wookie184 <wookie1840@gmail.com>
This has been inactive for a while, so I'll put it up for grabs.
What's left to do? Addressing your comment and fixing the merge conflict?
Yeah: merge main, match the output embed to the existing one, and fix the bug.
As far as I can tell, the MFA code is no longer necessary, as Discord doesn't have MFA tokens like that anymore.
There's also the matter that this regex is now out of date; I don't know what the current token length is.
3234e2a to 8f99065
I can't recall any problems with false positives, even with the current regex, so we can afford to be very general in what we match. We could just check that the first and last parts are at least 10 characters long and the middle is at least 5, or something like that.
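The length-only check suggested above could look like the following sketch. `looks_like_token` is a hypothetical name, and restricting every section to the base64url alphabet is an added assumption; the length thresholds are the ones from the comment.

```python
import re

# Assumption: each section uses only the base64url alphabet (ASCII letters, digits, "-", "_").
_B64URL = re.compile(r"[A-Za-z0-9_-]+")


def looks_like_token(candidate: str) -> bool:
    """Loose shape check: three dot-separated sections with minimum lengths only."""
    parts = candidate.split(".")
    if len(parts) != 3:
        return False
    if not all(_B64URL.fullmatch(part) for part in parts):
        return False
    first, middle, last = parts
    # No upper limits, so future changes to token length don't require touching this.
    return len(first) >= 10 and len(middle) >= 5 and len(last) >= 10
```

Leaving out upper bounds is the trade-off discussed in the thread: it is future-proof, and the later decoding checks are what filter out false positives.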
I agree with wookie. I think broad matching, even just for the sake of future-proofing, is okay, considering additional checks are being performed.
The only inaccurate part is that the last section is too short as it stands, but because of how the parsing works, that isn't an issue, I suppose.
If that's the case, I think we should just have no upper limit on the length of the last section. Otherwise the code is somewhat misleading about how it works.
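To illustrate why the short last section "isn't an issue": assuming the pattern is searched for anywhere in the message with no anchors or word boundaries, a fixed `{27}` quantifier simply stops after 27 characters of a longer section instead of failing. The sample token is made up; `OPEN_ENDED` is the same regex with the upper limit removed, as proposed.

```python
import re

# Old-style pattern: the last section is exactly 27 characters.
FIXED = re.compile(r"([a-z0-9_-]{23,28})\.([a-z0-9_-]{6,7})\.([a-z0-9_-]{27})", re.IGNORECASE)
# Same pattern with no upper limit on the last section.
OPEN_ENDED = re.compile(r"([a-z0-9_-]{23,28})\.([a-z0-9_-]{6,7})\.([a-z0-9_-]{27,})", re.IGNORECASE)

# Made-up token whose last section is longer than 27 characters.
sample = "a" * 24 + "." + "b" * 6 + "." + "c" * 38

fixed_match = FIXED.search(sample)
open_match = OPEN_ENDED.search(sample)

print(len(fixed_match.group(3)))  # the match silently stops early
print(len(open_match.group(3)))   # the whole section is captured
```

Both patterns detect the token, but only the open-ended one captures what is actually there, which is why a fixed upper bound reads as more precise than it really is.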
Hey @onerandomusername, thanks for your work so far. We needed to get this merged, so we went ahead and implemented the comments. Sorry the MFA part didn't end up being used 😅
Requested changes were related to the removed MFA part.
  # Each part only matches base64 URL-safe characters.
  # These regexes were taken from discord-developers, which are used by the client itself.
- TOKEN_RE = re.compile(r"([a-z0-9_-]{23,28})\.([a-z0-9_-]{6,7})\.([a-z0-9_-]{27})", re.IGNORECASE)
+ TOKEN_RE = re.compile(r"([\w_-]{10,})\.([\w_-]{5,})\.([\w_-]{10,})", re.IGNORECASE)
Way too lenient: this makes clearly impossible tokens match, and preventing that was the entire point of this PR.
"Way too lenient" is defined not by what the regex can match but by what it will match in practice, which so far has not been an issue. This PR made the regex a little more flexible with respect to false positives, without increasing the chance of false negatives or creating something that will have to be modified in the future.
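To make the disagreement concrete, the sketch below runs both patterns from the diff over a few made-up strings (none is a real token). Sections of 10, 5 and 10 characters, or a first section made of non-ASCII letters, match only the lenient pattern; whether such strings ever show up in practice is exactly what is being debated.

```python
import re

# The two patterns from the diff.
STRICT = re.compile(r"([a-z0-9_-]{23,28})\.([a-z0-9_-]{6,7})\.([a-z0-9_-]{27})", re.IGNORECASE)
LENIENT = re.compile(r"([\w_-]{10,})\.([\w_-]{5,})\.([\w_-]{10,})", re.IGNORECASE)

# Made-up inputs for illustration only.
samples = {
    "plausible shape": "a" * 24 + "." + "b" * 6 + "." + "c" * 27,
    "short sections": "abcdefghij.klmno.pqrstuvwxy",
    "non-ASCII letters": "é" * 25 + "." + "f" * 6 + "." + "g" * 27,
}

# Map each sample name to (matches strict, matches lenient).
results = {
    name: (bool(STRICT.search(text)), bool(LENIENT.search(text)))
    for name, text in samples.items()
}

for name, (strict, lenient) in results.items():
    print(f"{name}: strict={strict} lenient={lenient}")
```

Part of the extra leniency comes from `\w`: in Python 3 `str` patterns it matches Unicode word characters (and already includes `_`) unless `re.ASCII` is passed, so `[\w_-]` accepts characters outside the base64url alphabet.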
Approval: https://canary.discord.com/channels/267624335836053506/635950537262759947/919203386660884560
Enhances the token remover's regex to use the same regex that Discord itself uses, with a slight modification. The MFA section was removed but, depending on an updated #1421, may be implemented. Additionally, the sections were grouped so that the current code keeps working. I kept the existing validation to keep false positives to a minimum: the current code checks that the user resolves, that the timestamp is valid, and that the last section has at least 3 different characters.
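The three validation steps listed above can be sketched as follows. This is a standalone approximation, not the token remover's code: every function name is hypothetical, the timestamp lower bound (`DISCORD_EPOCH`, the 2015-01-01 Discord epoch in Unix seconds) is an assumed rule, and the "user resolves" step is reduced to decoding the ID, since resolving it needs the Discord API.

```python
import base64
from typing import Optional

# Assumption: Discord epoch (2015-01-01T00:00:00Z) in Unix seconds, used as a
# hypothetical lower bound for the decoded timestamp. The bot's real rule may differ.
DISCORD_EPOCH = 1_420_070_400


def _b64url_decode(section: str) -> bytes:
    """Decode a base64url section, restoring the "=" padding that tokens omit."""
    return base64.urlsafe_b64decode(section + "=" * (-len(section) % 4))


def extract_user_id(section: str) -> Optional[int]:
    """Return the user ID encoded in the first section, or None if it isn't one."""
    try:
        text = _b64url_decode(section).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
    if not (text.isascii() and text.isdigit()):
        return None
    return int(text)


def is_plausible_timestamp(section: str, minimum: int = DISCORD_EPOCH) -> bool:
    """Read the second section as a big-endian integer and require it to be >= minimum."""
    try:
        raw = _b64url_decode(section)
    except ValueError:
        return False
    return int.from_bytes(raw, "big") >= minimum


def has_enough_unique_chars(section: str, min_unique: int = 3) -> bool:
    """Rule from the PR description: at least 3 different characters in the last section."""
    return len(set(section)) >= min_unique


def passes_checks(token: str) -> bool:
    user, timestamp, hmac = token.split(".", 2)
    # "User resolves" would need an API lookup; this sketch only checks the ID decodes.
    return (
        extract_user_id(user) is not None
        and is_plausible_timestamp(timestamp)
        and has_enough_unique_chars(hmac)
    )
```

Because these checks run after the regex, a permissive pattern mostly costs a little extra decoding work rather than extra deletions.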
As per @jb3, to implement:
MFA User Token filter #1421, now irrelevant.
Rejected: