IDN SITE_URL is not converted to Punycode #1644
Comments
Interesting. It looks easy-ish :-)
This is equivalent to:
To solve this, we could just
PS. the issue is caused by a dumb algorithm (in lxml?) that is handling links like it’s 1999:
This is the problem. Firefox wouldn’t mind the punycode form, and it also wouldn’t mind the real unicode:
On a side note, Chrome supports the percent-escaped link; IE and Safari fail as well. Test: https://dl.dropboxusercontent.com/u/1933476/IDN.html This is not a statement of support for the Russian Federation.
As a side note, maybe it would be good to write the variables in conf.py directly in UTF-8. Escaping everything reduces readability to zero and is not necessary, because conf.py's encoding is declared as UTF-8 either way.
Requires UTF-8 input on Python 2. Signed-off-by: Chris Warrick <kwpolska@gmail.com>
Done in fb3a7db. Requires UTF-8 input for this to work.
I found that the bug has a slightly larger impact when using Isso as the comment system: the "script src" in the output HTML file is incorrect, so the script is not loaded and comments don't work. Workaround: same as above, write the domain in Punycode in COMMENT_SYSTEM_ID.
It looks like the best solution would be to fix things in
Well, in my (humble) opinion, it would be best if Nikola did not convert the UTF-8 to anything: no Punycoding, no escaping. All current web browsers should understand URLs like http://президент.рф/президент.html. Therefore I don't see the need to convert this to http://xn--d1abbgf6aiiy.xn--p1ai/%D0%BF%D1%80%D0%B5%D0%B7%D0%B8%D0%B4%D0%B5%D0%BD%D1%82.html. Readability would improve greatly if no conversions were made. First, in conf.py: if you actually want to read what you have entered in SITE_URL, non-Punycode is better. The HTML source code is, of course, less of a problem, but it would be easier to read as well (and save some bytes :D) ... but of course I understand that this may need an extreme overhaul.
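For reference, the two forms of the URL in the comment above can be related with a few lines of Python. This is only a minimal sketch, assuming Python 3 and the standard library; `to_ascii_url` is a hypothetical helper name, not something Nikola or lxml provides, and it deliberately ignores ports, user info, and non-ASCII query strings.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Hypothetical helper: return an ASCII-only form of a Unicode URL."""
    parts = urlsplit(url)
    # Host name: Punycode via the stdlib "idna" codec (IDNA 2003 rules).
    host = parts.hostname.encode("idna").decode("ascii")
    # Path: percent-escape the UTF-8 bytes, keeping "/" and existing escapes.
    path = quote(parts.path, safe="/%")
    # Simplification: port and user info are dropped, query/fragment untouched.
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


print(to_ascii_url("http://президент.рф/президент.html"))
```

Both spellings name the same resource; the question in this thread is only which one ends up in conf.py and in the generated HTML.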
We can’t fix this on our own. lxml or one of their upstreams can; talk to the appropriate vendor if you want this fixed nicely.
Ah ok, sorry for the misunderstanding :) I'll try it.
@Kwpolska so, if I understand this correctly there's nothing more we can do? Close it?
@ralsina Possible solutions include: (a) trying to get the link replacer to fix this (which will probably not fix everything); Which one do we choose?
I'd say b) which looks much easier.
I tried to fix it with (a) and I failed. Not only did the aforementioned isso src links blow up, it also looks like the URL replacer does not touch the logo link and many others. But, we could leave the patch in for when people want to link to IDN domain names and have Unicode input. Fix in #1668. |
fix #1644 -- work around issues with IDNs
SITE_URL is not converted to Punycode correctly. For example, when initialising a new blog and entering:
Site URL [http://getnikola.com/]: http://exämple.com/täst/
the resulting line in conf.py is:
SITE_URL = "http://ex\u00e4mple.com/t\u00e4st/"
The correct behaviour would be to convert the domain name to Punycode:
SITE_URL = "http://xn--exmple-cua.com/t\u00e4st/"
As a result, Firefox (for example) throws an error when the logo link is clicked.
I guess that (only) the domain part needs to be isolated from SITE_URL and then converted with "exämple.com".encode("idna") to xn--exmple-cua.com.
Nikola should also keep in mind that the user may edit SITE_URL in conf.py directly and write the IDN without Punycode, for example:
SITE_URL = "http://exämple.com/täst"
Therefore the Punycode conversion is best applied at build time, not during blog init.
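The approach suggested above (isolate the domain part, then encode it with the "idna" codec) could be sketched roughly as follows. This assumes Python 3 and the standard library only; `punycode_site_url` is a hypothetical name and is not the fix that was actually committed to Nikola.

```python
from urllib.parse import urlsplit, urlunsplit


def punycode_site_url(site_url):
    """Hypothetical helper: convert only the host name of site_url to Punycode."""
    parts = urlsplit(site_url)
    if not parts.hostname:
        return site_url  # nothing to convert (e.g. a relative URL)
    # The stdlib "idna" codec encodes each label of the host (IDNA 2003 rules).
    netloc = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc = "{0}:{1}".format(netloc, parts.port)
    # Simplification: user info is dropped; path, query and fragment are kept as-is.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(punycode_site_url("http://exämple.com/täst/"))
```

Calling such a helper while building, rather than in the init wizard, would also cover a SITE_URL that was typed into conf.py by hand.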