This repository has been archived by the owner. It is now read-only.

Handling IDNA by adding Punycode encoding in urlParse. #1149 #1174

Closed
wants to merge 7 commits into
from

Conversation

Projects
None yet
4 participants

jeremys commented Jun 13, 2011

  • Adding punycode implementation from http://stackoverflow.com/questions/183485/useless/301287#301287
  • Always encode during urlParse.
  • Adding test cases to test this new encoding.
  • Code pass JSLint.
  • Disabled 2 tests cases. Because of IDNA encoding and the fact that the hostname is not clearly separated. We encode part of the path. This probably could be fixed somewhat but I'm not even sure we're supposed to support this syntax. For example Safari and Chrome consider those 2 URLs incorrect.

We could also export both functions toASCII and toUnicode.

ry commented Jun 13, 2011

Looking good. Punycode stuff should be moved to a separate file with its own copyright header and the LICENSE file should be updated.

isaacs commented Jun 13, 2011

I would like to not remove those two tests. They were added because people use the url.parse function to screen out "valid" urls from user input, so keeping invalid characters out of the hostname is important.

I agree with @ry that punycode should be in lib/punycode.js. We can update the invalid host chars regexps to only screen out the invalid characters that are in the ascii range.

jeremys added some commits Jun 14, 2011

Move punycode in its own file. Update LICENSE accordingly. Enable bac…
…k 2 test cases and update urlParse to make sure we skip only invalid chars in the ASCII range.

jeremys commented Jun 14, 2011

I've enable back the 2 test cases. I've tweak urlParse to skip only chars within the ASCII range, maybe the way I did it is a little bit crappy? By doing this I can apply IDNA encode after the validation and I'm sure the encoding is done on the hostname only.

ry commented Jun 14, 2011

I'm uncomfortable with pulling code off a random website like this and attaching a random copyright header to it. Especially since the author says he created it by modifying a python version. Who created the python version - what's its license? Does it allow that?

Can we please rewrite this from scratch using only the RFC?

jeremys commented Jun 14, 2011

He derived his work from the C algorithm that come from the RFC (see: http://tools.ietf.org/html/rfc3492#appendix-C)
About headers, I've put the same headers as the one in the C implementation of Punycode in libidn, see: http://www.gnu.org/software/libidn/doxygen/punycode_8c_source.html

If you think it's not enough, I'll rewrite the whole implementation of this based on the RFC and the C sample provided in it.

Member

bnoordhuis commented Jun 14, 2011

@ry Punycode or full IDNA?

Member

bnoordhuis commented Jun 14, 2011

@jeremys I have some C code I could tidy up and publish but I'll hold off if you pick up this one.

Member

bnoordhuis commented Jun 20, 2011

The one-stop MIT-licensed solution to all your JS punycode encoding and decoding needs: https://gist.github.com/1035853

jeremys commented Jun 20, 2011

Ok, I'll replace the concerned code with your gist into my pull request. Thank you very much @bnoordhuis.

jeremys commented Jun 20, 2011

So I've replaced the punycode lib with @bnoordhuis's one. I've just added one more test case. And updated your code to pass JSLint. I've also moved the parsing of the domain and detection to use punycode into url.js.

Let me know if it needs anything else.

jeremys commented Jun 27, 2011

Hey there, is there something wrong with my pull request that I should change?

Member

bnoordhuis commented Jun 27, 2011

@jeremys: Ryan is on holiday so it's quiet on the node front.

jeremys commented Jun 27, 2011

Ok @bnoordhuis, thanks for the info, I was just wondering if I was holding up something :)

jeremys commented Jul 6, 2011

@ry with the release of 0.5.0 should I update my pull request? or is there something wrong with it?

isaacs commented Jul 6, 2011

@jeremys Nothing wrong with it. Just been buried under some other things, sorry.

jeremys commented Jul 6, 2011

No worries!

ry commented Jul 6, 2011

I haven't tested - but after a cursory glance it looks good to me.

@bnoordhuis or @isaacs can test and land at will.

isaacs commented Jul 6, 2011

merging this now.

@isaacs isaacs closed this in 2a848fa Jul 6, 2011

isaacs commented Jul 6, 2011

Squashed into 2a848fa

isaacs added a commit to isaacs/node-v0.x-archive that referenced this pull request Jul 19, 2011

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.