Fix åäö in url #215

Closed
wants to merge 3 commits into base: develop

6 participants
Contributor

codler commented Dec 23, 2011

Make the router accept å, ä and ö in "(:any)".
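To make the quick fix concrete, here is a minimal sketch in Python, used only for illustration since the router itself is PHP. The ASCII-only character class is an assumption about what a wildcard like "(:any)" might expand to, not Laravel's actual router pattern, and `match_any` is a hypothetical helper. It shows that adding å, ä and ö to the class lets those words through, while any other non-ASCII letter is still rejected.

```python
import re

# Hypothetical ASCII-only wildcard: an assumption about what an "(:any)"
# placeholder might expand to, not Laravel's actual pattern.
ASCII_ANY = r"([a-zA-Z0-9.\-_]+)"

# The quick fix proposed in this PR: add å, ä, ö (and their uppercase
# forms) to the same character class.
ASCII_PLUS_AAO = r"([a-zA-Z0-9.\-_åäöÅÄÖ]+)"


def match_any(pattern: str, path: str):
    """Match 'page/<segment>' and return the captured segment, or None.

    Assumes the path has already been percent-decoded to a UTF-8 string.
    """
    m = re.fullmatch(r"page/" + pattern, path)
    return m.group(1) if m else None


print(match_any(ASCII_ANY, "page/hello"))          # hello
print(match_any(ASCII_ANY, "page/smörgås"))        # None
print(match_any(ASCII_PLUS_AAO, "page/smörgås"))   # smörgås
print(match_any(ASCII_PLUS_AAO, "page/фото"))      # None: Cyrillic still fails
```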

sparksp commented on 8cccfff Dec 24, 2011

@codler, what will happen if you try to use other characters, such as ƒф for a random example? Would it be better to pass the URL through Str::ascii before matching it, instead of adding a handful of characters to the pattern?
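A rough sketch of the transliterate-before-matching idea follows. It assumes a helper that decomposes characters and then drops anything that is not ASCII. It uses Python's `unicodedata` NFKD normalization as a stand-in, which is not Laravel's Str::ascii implementation, and `to_ascii` is a hypothetical name. Letters with a diacritic reduce to their base letter. Characters such as ƒ, ф or Japanese text have no ASCII decomposition, so they simply disappear.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Approximate transliteration: decompose, then drop non-ASCII code points.

    A stand-in for a helper like Str::ascii, not its actual implementation.
    """
    decomposed = unicodedata.normalize("NFKD", text)  # "ö" -> "o" + combining diaeresis
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("smörgås"))       # smorgas
print(to_ascii("ƒф"))            # empty string: no ASCII decomposition
print(to_ascii("最近の出来事"))    # empty string as well
```

This is the limitation raised in the replies. Transliteration works for accented Latin letters but silently erases non-Latin scripts.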

Contributor

codler commented Dec 25, 2011

This fix only covers å, ä and ö.

Yes, it would be better if you could somehow use the ASCII list instead of adding characters to the pattern. But this is a quick fix for åäö :)

Contributor

ianlandsman commented Dec 27, 2011

Stripping URLs to ASCII (or accepting only a few selected extra characters) may not be a great long-term solution. UTF-8 URLs are fairly widely accepted, and limiting URLs to ASCII is bad for every non-Latin language where people want clean URLs, whether in Japanese or any other script. I think properly handling UTF-8 URLs will be the better long-term solution. I know it will be a requirement for the projects we intend to use Laravel for at UserScape.

More on URL characters in HTML5: http://lists.w3.org/Archives/Public/public-html/2009Mar/att-0444/draft.html
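Properly handling UTF-8 URLs largely comes down to percent-encoding. Each UTF-8 byte of a path segment travels as a %XX triplet and is decoded on the server. Below is a minimal sketch using Python's `urllib.parse`, where PHP's `rawurlencode`/`rawurldecode` would play the same role. The Japanese segment is just a sample value.

```python
from urllib.parse import quote, unquote

segment = "最近の出来事"  # sample non-Latin path segment

# What goes over the wire: every UTF-8 byte becomes a %XX triplet,
# so the encoded form is plain ASCII.
encoded = quote(segment)
print(encoded)  # %E6%9C%80%E8%BF%91%E3%81%AE%E5%87%BA%E6%9D%A5%E4%BA%8B

# The server (or router) can decode it back to the original UTF-8 text
# before matching it against routes.
decoded = unquote(encoded)
print(decoded == segment)  # True
```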

Outdated review thread on laravel/routing/router.php
Member

ericlbarnes commented Dec 27, 2011

I did some more research on this, and here is what I came up with:
https://gist.github.com/1525052

That seems to allow any kind of character to be used in routes. I don't believe it's the cleanest method, but it works nonetheless. :)

Contributor

codler commented Dec 28, 2011

I made some changes, and now it really accepts any character.
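One way to read "really any character" is a wildcard that accepts everything except the segment separator "/". The sketch below assumes that interpretation. It is not necessarily what the commits in this PR do, and `match_route` and `ANY_SEGMENT` are hypothetical names.

```python
import re

# Hypothetical wildcard: one or more of anything except "/".
ANY_SEGMENT = r"([^/]+)"


def match_route(route: str, path: str):
    """Expand '(:any)' placeholders in route and match the whole path.

    Literal parts of the route are escaped so that characters such as '.'
    match literally. Returns the captured segments as a tuple, or None.
    """
    literal_parts = route.split("(:any)")
    regex = ANY_SEGMENT.join(re.escape(part) for part in literal_parts)
    m = re.fullmatch(regex, path)
    return m.groups() if m else None


print(match_route("news/(:any)", "news/smörgås"))        # ('smörgås',)
print(match_route("news/(:any)", "news/最近の出来事"))     # ('最近の出来事',)
print(match_route("news/(:any)", "news/a/b"))            # None: '/' ends a segment
print(match_route("news/(:any)/(:any)", "news/a/b"))     # ('a', 'b')
```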

Contributor

sparksp commented Dec 28, 2011

@codler, if anything, you should probably check that it's not "/", ";" or "?", as these are the reserved characters in a URL. From the RFC:

Within the <path> and <searchpart> components, "/", ";", "?" are reserved.

Also from the URL RFC:

In addition, octets may be encoded by a character triplet consisting
of the character "%" followed by the two hexadecimal digits (from
"0123456789ABCDEF") which forming the hexadecimal value of the octet.
(The characters "abcdef" may also be used in hexadecimal encodings.)

Octets must be encoded if they have no corresponding graphic
character within the US-ASCII coded character set, if the use of the
corresponding character is unsafe, or if the corresponding character
is reserved for some other interpretation within the particular URL
scheme.

In other words, only US-ASCII characters are valid in a URL, and everything else must be %-encoded. With this in mind, @ericlbarnes's solution is probably the safest, assuming it works.
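The following minimal sketch combines both points above. It assumes the router sees the raw, still-encoded request path: it matches a segment that excludes "/", ";" and "?", then percent-decodes only the captured value. The quoted text appears to come from RFC 1738. `match_segment` and `SEGMENT` are hypothetical names. Rejecting unencoded non-ASCII input is a deliberately strict reading that a real application may choose to relax.

```python
import re
from urllib.parse import unquote

# Hypothetical segment pattern excluding "/", ";" and "?", the characters
# reserved within HTTP paths according to the quoted RFC text.
SEGMENT = r"([^/;?]+)"


def match_segment(prefix: str, raw_path: str):
    """Match a raw request path of the form '<prefix>/<segment>'.

    The still-encoded path is matched first and only the captured segment
    is decoded, so an encoded %2F cannot split one segment into two.
    """
    if not raw_path.isascii():
        # Strict reading of the RFC: non-ASCII octets must arrive %-encoded.
        return None
    m = re.fullmatch(re.escape(prefix) + "/" + SEGMENT, raw_path)
    if m is None:
        return None
    return unquote(m.group(1))  # decode %XX triplets as UTF-8


print(match_segment("news", "news/sm%C3%B6rg%C3%A5s"))  # smörgås
print(match_segment("news", "news/a%2Fb"))               # a/b
print(match_segment("news", "news/a;b"))                 # None
print(match_segment("news", "news/smörgås"))             # None (not encoded)
```

Matching before decoding is the key design choice. Decoding first would turn %2F into a real "/" and change how the path splits into segments.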

Contributor

ianlandsman commented Dec 28, 2011

If you go to the spec link above and search for "octets", you'll find the relevant section. So encoding should do it, and browsers will then handle it. I believe modern browsers display the text in the address bar as UTF-8 characters even though it was encoded. I'm still not 100% clear on what happens if I just type a set of UTF-8 characters into my address bar manually. For example, if I were Japanese and knew the URL I wanted was company.com/最近の出来事, what would happen? I'm not sure. I think it may not work, but I'm also not clear on whether it's supposed to work or whether doing that is actually invalid. It sounds like it may not be valid, or that the browser should encode it for me, which perhaps some browsers do and some don't.

I'll try and check it out later today if I have time.

Member

ericlbarnes commented Dec 28, 2011

Just to throw out some more info: an old customer of mine, who is now a friend, runs this site: http://q8board.com/

I had to make some changes for his site, but it uses UTF-8 URIs. I will ping him today as well and ask about any gotchas we should consider. I'm sure he would know whether IE 6 or 7 supports this, but I doubt he would be using it if they didn't.

Contributor

ianlandsman commented Dec 28, 2011

Ah, great example. Yes, most bigger sites I've seen encode their URLs and the browser displays them decoded, but if you view the source you of course see the encoding in the links. His site has the raw UTF-8 characters in the links. It would be useful to know his practical experience with it.

q8coder commented Dec 28, 2011

Hi, I'm very proud to be invited to participate in this issue. UTF-8 URLs are the best way to accept all languages in the browser. I have used Eric's script for 3 years without any problems with UTF-8 URLs.

Member

ericlbarnes commented Dec 28, 2011

@q8coder, I don't remember exactly which changes needed to be made to your setup. Do you know whether we had to decode your URLs, or whether all browsers handled the decoding automatically? Also, have you run into any issues with your current setup?

q8coder commented Dec 28, 2011

I only set the .htaccess file to accept all characters (,) in the URL. After that, all browsers display it fine.

Member

taylorotwell commented Feb 3, 2012

Wildcards can now handle UTF-8 segments.

flap152 pushed a commit to flap152/laravel that referenced this pull request Nov 24, 2017
