Unicode Character Expansion #42
Comments
I'm fine with this, as long as it expands to the correct Unicode normalization form, whatever the specs say IRIs are expanded to. |
@dougwilson Awesome, I'll start an investigation on that 👍 |
Looks like it's RFC 3987 section 3.1 :) Also, if it helps, Mojolicious already does this stuff, so it'd be interesting to know how they are doing their transformation, since it has worked well for a long time. |
Very nice, thank you 😄 |
@dougwilson Just looking through the implementation, it looks like they decode it at the framework level before passing it to route matches? Did I understand that correctly? http://mojolicio.us/perldoc/Mojo/Path#to_route Is that something that could be considered for Express 5.0? |
I'll have to look more at their source code (which can be browsed at https://metacpan.org/source/SRI/Mojolicious-6.0/lib), but I'm open to whatever we want to do for Express 5.0. Currently Express 5.0 just passes the raw But really, what I wanted to know is how do they do the UTF-8 -> URI transformation. The characters in JavaScript source code are UCS-2 and so we need to have some kind of way to transform source code strings to URIs reliably. For example, if I type
Example:
decodeURIComponent('%C3%BA') // -> ú
decodeURIComponent('u%CC%81') // -> ú
Should a user have to understand that the ú they type in their editor may not match the ú that comes in the URI? |
This is a great idea, especially if there is a way to get it working reliably for all edge cases. It should solve the problem where there are multiple ways to encode a single character, be it It may be possible to implement it in a backward-compatible way if we URL-decode the path inputs passed by API users too. |
@dougwilson So ES6 has a built in method for this: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize decodeURIComponent('u%CC%81').normalize() === decodeURIComponent('%C3%BA') Edit: Only |
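A minimal sketch of the comparison discussed above, assuming NFC is the target form (the thread does not settle which form to use). The helper name `matchesAfterNfc` is hypothetical and only illustrates that the two percent-encoded spellings of ú differ until both are normalized.

```javascript
// Hypothetical helper: compare two percent-encoded strings after decoding
// and normalizing both to NFC (an assumption; the thread leaves the form open).
function matchesAfterNfc (a, b) {
  return decodeURIComponent(a).normalize('NFC') === decodeURIComponent(b).normalize('NFC')
}

const decomposed = decodeURIComponent('u%CC%81') // "u" followed by U+0301
const precomposed = decodeURIComponent('%C3%BA') // single code point U+00FA

console.log(decomposed === precomposed) // false: different code point sequences
console.log(matchesAfterNfc('u%CC%81', '%C3%BA')) // true
```

Normalizing both sides means a route typed in an editor and a path arriving in a request compare equal regardless of which form each one used.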
Any news on this? |
@nicooprat This issue seems like the opposite to what you're looking for (this was about unicode inputs to path matching). From kadirahq/flow-router#599, it seems like what you'd like is the ability to specify the way to |
By the way, I took another stab at doing this for the For a simple example, type |
Here's a framework-level function that should work (but could use some improvements):
function encoding (ignore) {
return {
encode: function (value) {
return encodeURIComponent(value)
},
decode: function (value) {
return value.replace(/(?:%[ef][0-9a-f](?:%[0-9a-f]{2}){2}|%[cd][0-9a-f]%[0-9a-f]{2}|%[0-9a-f]{2})/ig, function (m) {
const char = decodeURIComponent(m)
if (ignore.indexOf(char) > -1) return m
return char
})
}
}
}
Note: You cannot technically just use |
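One possible improvement to the function above, offered as a sketch rather than what the module ships: its pattern has no alternative for four-byte UTF-8 sequences, so input such as `%F0%9F%98%80` is cut after three bytes and `decodeURIComponent` throws a `URIError`. The variant below (hypothetical name `safeEncoding`) adds a four-byte alternative, restricts continuation bytes to `80`-`BF`, and leaves malformed sequences untouched instead of throwing.

```javascript
// Hypothetical variant of `encoding`; not part of the module.
function safeEncoding (ignore) {
  // Longest sequences first: 4-byte (F0-F7), 3-byte (E0-EF), 2-byte (C0-DF),
  // then any single percent-encoded byte.
  const UTF8_SEQUENCE = /%f[0-7](?:%[89ab][0-9a-f]){3}|%e[0-9a-f](?:%[89ab][0-9a-f]){2}|%[cd][0-9a-f]%[89ab][0-9a-f]|%[0-9a-f]{2}/ig

  return {
    encode: function (value) {
      return encodeURIComponent(value)
    },
    decode: function (value) {
      return value.replace(UTF8_SEQUENCE, function (m) {
        let char
        try {
          char = decodeURIComponent(m)
        } catch (err) {
          return m // malformed or incomplete sequence: keep it encoded
        }
        // Characters listed in `ignore` stay percent-encoded.
        return ignore.indexOf(char) > -1 ? m : char
      })
    }
  }
}

const codec = safeEncoding(['/'])
console.log(codec.decode('%E6%88%91')) // 我
console.log(codec.decode('a%2Fb%20c')) // a%2Fb c
```

Keeping the ignored characters encoded preserves the distinction between a literal `/` delimiter and an encoded `%2F` inside a segment.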
I have added a basic
|
Unfortunately that was a quick 9 hours. I've released a 5.x which removes Note for implementors: if you're using |
I'm opening this issue to create a conversation around supporting Unicode and other illegal characters in the path string. For example, matching 我 currently requires using %E6%88%91 as part of the path in the current version of the module. Any thoughts?
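To illustrate the workaround described above, here is a small sketch that percent-encodes a literal path segment by segment. The helper name `encodeLiteralPath` is hypothetical, and it assumes the path contains no parameter tokens such as `:id`, since those would be escaped too.

```javascript
// Hypothetical helper: percent-encode each segment of a literal path so it can
// be used where the module expects an already-encoded string.
// Assumption: the path has no parameter syntax (":" would be escaped).
function encodeLiteralPath (path) {
  return path.split('/').map(encodeURIComponent).join('/')
}

console.log(encodeLiteralPath('/我/profile')) // /%E6%88%91/profile
```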