Ensure unicode characters are handled correctly #282
Comments
Ok, this is quite confusing. Currently ENS manager counts the name as 7 characters because they are double-byte characters. When I copy and paste the URL, the URL bar shows 3 characters. However, if I copy and paste the emoji part into the console, I can see that it is actually 5 characters, because there are 2 whitespace characters separating the 3 characters. I can further verify that by converting the string into an array, which is suggested as an easy way to count unicode characters.
To add fuel to the fire, 👩❤👨 and 👩❤️👨 are actually the same according to this. Can you point to the rule our smart contract follows, with some example emojis and their expected lengths, so that I can try my best to match it? 🤯 |
Should 💩💩💩💩.eth be counted as 4 chars or 8 chars? Apparently Alex Van de Sande owned it and just migrated http://localhost:3000/name/%F0%9F%92%A9%F0%9F%92%A9%F0%9F%92%A9%F0%9F%92%A9.eth |
We should be counting unicode characters. That means that 'wide' characters only count as one (contrary to JS). This is what the contract does, too - it accepts UTF-8 encoded characters and counts codepoints. 👩❤️👨 is five characters (👩❤👨 with ZWJ characters between them). Alex's name is four. |
Then does converting the string into an array, as below, cover all the cases, or are there any edge cases I have to consider?
|
I'm not a JS expert, but if it converts it to unicode codepoints, then you're sorted. |
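As a minimal sketch of the difference discussed above (not the snippet originally posted in this thread), the following compares JavaScript's `.length`, which counts UTF-16 code units, with a code point count obtained by spreading the string into an array. The helper name `countCodePoints` is made up for illustration, and the emoji are written as escapes so that the invisible characters are explicit.

```javascript
// Hypothetical helper: count Unicode code points rather than UTF-16 code units.
// The string iterator used by the spread operator walks code points.
function countCodePoints(str) {
  return [...str].length;
}

// Four PILE OF POO code points (each needs two UTF-16 code units).
const poo = "\u{1F4A9}".repeat(4);
// WOMAN, ZWJ, HEAVY BLACK HEART, ZWJ, MAN
const couple = "\u{1F469}\u200D\u2764\u200D\u{1F468}";
// Same sequence with VARIATION SELECTOR-16 (U+FE0F) after the heart
const coupleVs16 = "\u{1F469}\u200D\u2764\uFE0F\u200D\u{1F468}";

console.log(poo.length, countCodePoints(poo));               // 8 4
console.log(couple.length, countCodePoints(couple));         // 7 5
console.log(coupleVs16.length, countCodePoints(coupleVs16)); // 8 6
```

Note that this counts code points, not user-perceived characters, so a ZWJ sequence that renders as a single glyph still counts as several.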
They look different and the URLs are also different, but the namehash looks the same. emoji 👩❤️👨👩❤️👨👩❤️👨 emoji 👩❤👨👩❤👨👩❤👨 Because the namehashes are the same, ENS manager treats them the same. Is this a bug on |
What's happening is that We should make sure we're checking the length of a name after normalisation. |
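A rough sketch of checking the length after normalisation might look like the following. Everything here is an assumption for illustration: `isTooShort` is a hypothetical helper, the normaliser is passed in by the caller, `dropVs16` is a stand-in that is NOT UTS-46, and the threshold of 7 is an arbitrary value picked for the demo.

```javascript
// Decide whether a label is too short, counting code points AFTER normalisation.
// `normalise` and `minLength` are supplied by the caller; both are assumptions here.
function isTooShort(label, normalise, minLength) {
  const normalised = normalise(label);
  return [...normalised].length < minLength;
}

// Stand-in normaliser for the demo only (NOT UTS-46): drops U+FE0F.
const dropVs16 = (s) => s.replace(/\uFE0F/g, "");

// HEAVY BLACK HEART + VARIATION SELECTOR-16, four times:
// 8 code points before the stand-in normalisation, 4 after.
const label = "\u2764\uFE0F".repeat(4);

console.log(isTooShort(label, (s) => s, 7)); // false: the raw count is 8
console.log(isTooShort(label, dropVs16, 7)); // true: the normalised count is 4
```

The point of the design is only the ordering: normalise first, then count, so that two spellings of the same name always get the same verdict.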
So 👩❤️👨 is displayed as 👩❤👨 and counted as 3? Also, do we need to normalize the URL string? |
That's right.
Yes; any time a user visits a non-normalised URL, we should normalise it and redirect them. |
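The normalise-and-redirect idea could be sketched roughly as follows. The `/name/<name>` route shape is taken from the URLs quoted in this thread; `redirectTarget` and `placeholderNormalise` are hypothetical names, and the placeholder is deliberately NOT UTS-46. A real app would pass in a proper UTS-46 normaliser, such as the one the eth-ens-namehash library is said to provide later in this thread.

```javascript
// Placeholder normaliser, NOT UTS-46: lower-cases and strips U+FE0F only.
// It exists purely so the sketch can run without external dependencies.
function placeholderNormalise(name) {
  return name.toLowerCase().replace(/\uFE0F/g, "");
}

// Returns the path to redirect to, or null if no redirect is needed
// (either the path is not a name route or the name is already normalised).
function redirectTarget(path, normalise) {
  const prefix = "/name/";
  if (!path.startsWith(prefix)) return null;
  const raw = decodeURIComponent(path.slice(prefix.length));
  const normalised = normalise(raw);
  return normalised === raw ? null : prefix + encodeURIComponent(normalised);
}

// Heart followed by U+FE0F: redirected to the form without the selector.
console.log(redirectTarget("/name/%E2%9D%A4%EF%B8%8F.eth", placeholderNormalise));
// "/name/%E2%9D%A4.eth"

// Already in the placeholder's normal form: no redirect.
console.log(redirectTarget("/name/%E2%9D%A4.eth", placeholderNormalise)); // null
```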
Ok, here is a more detailed analysis. Please have a look at this. 👩❤️👨 actually has 6 chars because there are 2 zwj between the heart and man. Parsing it with the idna-uts46 library (which was added by @jefflau) removes 1 zwj char, deconstructing the combined char into 3 chars, but there are still 2 zwj chars. If I manually remove all zwj chars (eg: Now I have the following options.
When I check how they are counted in Solidity, using Remix with this gist, they are counted the same, as (a) and (b) generate the same namehash while (c) is different. I will convert all into (b), making it count as 5: FYI, 🏳️🌈 also looks like a combined emoji of 🏳🌈, but the rainbow flag shape is retained after being parsed. Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞ . gets malformed. |
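When comparing variants like these, it can help to list the code points explicitly, since ZWJ (U+200D) and the variation selector (U+FE0F) are invisible when rendered. The following is a small debugging sketch; `listCodePoints` is a hypothetical helper, and the sample string is built by hand here rather than copied from the thread.

```javascript
// Hypothetical debugging helper: list each code point as U+XXXX.
// Array.from iterates the string by code point, so surrogate pairs stay together.
function listCodePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const ZWJ = "\u200D";  // ZERO WIDTH JOINER
const VS16 = "\uFE0F"; // VARIATION SELECTOR-16

// WOMAN, ZWJ, HEAVY BLACK HEART, VS16, ZWJ, MAN
const sample = "\u{1F469}" + ZWJ + "\u2764" + VS16 + ZWJ + "\u{1F468}";

console.log(listCodePoints(sample));
// [ 'U+1F469', 'U+200D', 'U+2764', 'U+FE0F', 'U+200D', 'U+1F468' ]
```

Running the same helper on a name before and after normalisation shows exactly which invisible code points were dropped.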
All that needs to be done is this:
Whatever UTS-46 normalisation does is correct, and what we should do everywhere. For normalisation and hashing alike, you can use the eth-ens-namehash library, rather than importing |
For example, https://manager.ens.domains/name/%F0%9F%91%A9%E2%80%8D%E2%9D%A4%EF%B8%8F%E2%80%8D%F0%9F%91%A8.eth should be treated as too short (it's 4 unicode characters) - but the UI doesn't say it's too short, or offer the opportunity to register it.