Allow special characters in usernames #6830
Comments
The regex for detecting mentions is widely shared between different systems, and it does not expect a dot in the first part of the username, so this really cannot be changed. I suggest maybe stripping out the dot, so it becomes firstnamelastname, or replacing it, so it becomes firstname_lastname |
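The suggestion above could be sketched roughly as follows. `username_from_ldap_uid` is a hypothetical helper written for this thread, not something that exists in Mastodon, and where it would hook into the LDAP sign-in flow is left open.

```ruby
# Hypothetical helper, not part of Mastodon: derive a username from an
# LDAP uid by removing dots or swapping them for another character.
def username_from_ldap_uid(uid, replacement: "")
  # A String pattern is matched literally by gsub, so "." means a real dot.
  uid.gsub(".", replacement)
end

puts username_from_ldap_uid("firstname.lastname")
# prints "firstnamelastname"
puts username_from_ldap_uid("firstname.lastname", replacement: "_")
# prints "firstname_lastname"
```

One thing to keep in mind with this approach: two different uids can end up as the same username (for example "first.name" and "firstname" when dots are removed), so a real implementation would need to handle collisions.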
I understand the difficulty, but it is quite annoying if Mastodon is the only service that requires a specific username. For regular accounts, it is possible to log in via email. In this case I see no limitation on the email address. What about the following workflow for LDAP connection:
|
As far as I understand, this issue is not really fixed, so it should not be closed. @Gargron, what do you think of the workflow above? |
this definitely needs to be fixed! |
Is there any chance of opening this back up for discussion? I found this issue after trying to set up a username that used a character I don't think of as particularly 'special': à. An inclusive web needs better representation than 'only Latin script without diacritics'. We should be past the days of technological limitations preventing use of characters outside of an anglo-centric set of letters. |
Indeed. For the entirety of UTF-8 not to be supported by default is genuinely insane, when surely all that is necessary is sanitization by quotation of the username string when it is stored and processed. |
I retract this, I just finished reading #8503. There are security concerns. Summarizing for future non-dev readers: Unicode contains many invisible and other tricky characters that would make phishing easier (We are already seeing the consequences of Note: It is not specified why only the "problematic" characters aren't removed, instead of limiting usernames to just |
@jaacko-torus, surely the solution then is to use a blacklist rather than a whitelist (by default)? Or, if an instance's administrators are seriously concerned about security, allow them to switch the rules to a whitelist instead. Consider a personal server for a Chinese or Arabian family. Why would we restrict them from using their own names? To force them to be Latinized seems genuinely dismissive of their needs simply because we are not them. |
@jaacko-torus, I think you referenced the wrong issue? The one you linked is about the UX of the follow button on a profile that has been migrated. If that's the right issue, I'll need more context on how you believe that's demonstrating the issues you're talking about, if you can 😅 A user handle is a user's identity on a service. It's not about having characters in a username, it's about being able to genuinely represent identity. In most definitions, 'à' is not even a special character. Special characters are usually defined as something along the lines of 'non-alphanumeric characters', that is, anything other than letters and numbers. 'À' is a letter. I understand the security issues inherent in having all of Unicode available. I suppose that you referencing the Crucially, the character which makes that phishing tactic so strong, "∕" (U+2215), is not an alphanumeric character. It is indeed a special character. It's not commonly used to form words, and isn't commonly used as a way to define a person's identity. There are literal names that use the character 'à'. I agree with @RokeJulianLockhart: settling on "it's good enough" because you can meet your needs for expressing your identity within the limits seems dismissive. When we talk about what is "good enough", I think it's always important to consider "for whom". Is the character set being limited to the set of Latin characters used in English and some other languages good enough for all users? |
But: I do not have any security background, and it's quite a serious and heavy topic. As such, I believe non developers can be satisfied. The reason is simple, the Why are dots We can talk about it, or we can fix it. Let's go. First of all, I would like us to define "special characters" as anything that is NOT in the regex, please; otherwise, give me an alternative, as I don't want to say "anything that is NOT in the regex" every time. The regex for usernames is the following:
USERNAME_RE = /[a-z0-9_]+([a-z0-9_\.]+[a-z0-9_]+)?/i
# EDIT: It also appears that `"jaacko.torus" == "jaackotorus"` for `"."`.
# We might have to make similar considerations, when adding some of these characters.
# For later: https://github.com/kibousoft/mastodon/blob/72d8d3160a63e61ba2af59bf9e90006048bce835/app/validators/unique_username_validator.rb
It can be found in this file, which was modified by @Gargron (remember to give the man some love, he added Once more, I am no security expert, and as such, any recommendations as to which characters we actually blacklist should be thoroughly inspected by you guys. Let's define scope. I believe all "word" characters should be included, that is, everything that can be rendered by, say, Noto Sans, minus symbols and characters that can be used for phishing. I recommend: have a whitelist AND a blacklist, with the blacklist running after the whitelist. Looking through the Unicode general categories, I would say the whitelist should include "by default": Ll, Lm, Lo, Lt, Lu, Mn, Nd, Nl, No, Pc, Pd, and Sk. As a little side note: I agree with a previous comment by @RokeJulianLockhart that this stuff should be configurable. As I am no Mastodon admin, idk where that code lies, but I think that should be a separate issue. For now though, it might be good to come up with a set of good defaults. Does that sound good? |
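To make the proposal a bit more concrete, here is a rough Ruby sketch comparing the quoted regex with a whitelist-then-blacklist check. Everything in it is an assumption for illustration: the method names are hypothetical, the `\A`/`\z` anchors are added so the whole string must match, and the two blacklist entries are placeholders, not a vetted security list.

```ruby
# Hypothetical sketch for discussion, not Mastodon code.

# The pattern quoted above, with \A and \z added (an assumption) so that
# the entire string has to match rather than just a substring.
def matches_current?(name)
  name.match?(/\A[a-z0-9_]+([a-z0-9_\.]+[a-z0-9_]+)?\z/i)
end

# Whitelist: one or more characters from the proposed general categories.
def whitelisted?(name)
  name.match?(/\A[\p{Ll}\p{Lm}\p{Lo}\p{Lt}\p{Lu}\p{Mn}\p{Nd}\p{Nl}\p{No}\p{Pc}\p{Pd}\p{Sk}]+\z/)
end

# Blacklist: placeholder entries only. Both pass the whitelist
# (U+0060 is in Sk, U+FF0D is in Pd), which is why a second pass is useful.
def blacklisted?(name)
  name.match?(/[\u0060\uFF0D]/)
end

# The blacklist runs after the whitelist, as proposed.
def allowed_by_proposal?(name)
  whitelisted?(name) && !blacklisted?(name)
end

puts matches_current?("\u00E0")              # false: "à" is outside a-z
puts allowed_by_proposal?("\u00E0")          # true: "à" is in Ll
puts allowed_by_proposal?("user\u2215name")  # false: U+2215 is in Sm
puts allowed_by_proposal?("a`b")             # false: whitelisted, then blacklisted
```

Note that "." belongs to the category Po, so with exactly these categories a name like "firstname.lastname" would still be rejected; the dot would have to be added to the whitelist explicitly if that is wanted.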
Ah, almost forgot, the database that stores this information would also need to be modified, since most databases don't allow willy-nilly UTF characters in strings without some more custom setup 🙄 (This is another one of those "idk why, but it's that way" problems). Here's a Stack Overflow Q&A that goes over the issue. (I believe Mastodon only uses PostgreSQL) |
Currently, the username can only be letters, numbers and underscores.
The users in my LDAP directory have uids like "firstname.lastname" with their mail "firstname.lastname@example.com".
With the current implementation, mastodon does not create a profile for the users as there is a "dot" in the username.
I think that my situation can be quite common, especially with LDAP integration as sysadmins will not want to create a new ID just for mastodon.
Why doesn't mastodon support extended usernames? Am I missing some obvious reason?
As another solution, it should be possible to authenticate to mastodon with the email address from the LDAP directory. With the current solution (setting LDAP_UID=mail), the user creation fails with the error regarding the "letters/numbers only".
master
(If you're a user, don't worry about this).