Handle unicode/punycode #83
Comments
whois has supported IDN since 2003. I am quite sure that .dk used to work and indeed the code does contain "special rules" for it, so maybe they changed something on their side. Feel free to investigate, or else I will get to this later. Check the source and use "whois --verbose".
I'll investigate. Thanks for your quick response.
Works for me. Probably you compiled whois without IDN support.
Yes, apparently Ubuntu 18.04 does. The one I have available on Debian 10 is fine.
TL;DR: Is there any interest in conversion to punycode in the whois program?

The lead-up to this question:
When I whois ål.dk, DK-Hostmaster's WHOIS server sends back a negative result, even though their web interface says that the domain is taken.
There are three ways I should be able to get a positive result here:
1. Had I converted to punycode: whois xn--l-1fa.dk does work.
2. Had I used ISO-8859-15, that would have worked as well.
3. (Doesn't work, notified.) Following DK-Hostmaster's own UTF-8 recommendations.
Now, I really don't know if --charset is common in WHOIS servers. In fact, since .dk is one of the very few TLDs that have any special rules in whois (--show-handles), I actually doubt it. I assume we're in absolutely-no-standards-land and any general support here is futile.

So without having done extensive surveys (we could), as an alternative to sending arbitrary unicodey bytestrings and hoping for the best, one could punycode them. This is much easier in my Haskell library, since I can assume UTF-8, and not so easy for the present whois program, since we also have to figure out the calling terminal's encoding first (and possibly forgo conversion if we can't).

Some TLDs limit the allowable extended characters, which makes guessing without knowing the encoding easier. For example, if .dk only allows æøåöäüé, I doubt there is any overlap in the way those letters are encoded. Still, I'd prefer a generalised method over any TLD-specific knowledge, since there are so many TLDs with special behavior to keep track of.
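
To make the proposal concrete, here is a minimal Python sketch. It is neither the whois program's C code nor the Haskell library mentioned above. It shows the two ideas discussed: converting a query to punycode once the input encoding is known, and guessing the encoding from a restricted character set. The names to_punycode, guess_decode and ALLOWED_DK are hypothetical, the allowed set is only the example floated above (not a verified .dk policy), and Python's built-in "idna" codec implements the older IDNA 2003 rules, which may differ from what a given registry expects.

```python
import locale

# Hypothetical whitelist: only the example set mentioned in the issue,
# not a verified list of what .dk actually permits.
ALLOWED_DK = set("æøåöäüé")


def to_punycode(raw: bytes, encoding=None) -> bytes:
    """Convert a domain given as raw bytes to its ASCII (xn--) form.

    Falls back to returning the bytes unchanged when they cannot be
    decoded or converted, i.e. conversion is skipped rather than guessed.
    """
    # Assumption: the locale's preferred encoding stands in for
    # "the calling terminal's encoding".
    encoding = encoding or locale.getpreferredencoding(False)
    try:
        return raw.decode(encoding).encode("idna")
    except (UnicodeError, LookupError):
        return raw


def guess_decode(raw: bytes, candidates=("utf-8", "iso-8859-15")):
    """Return the first decoding whose non-ASCII characters are all
    in the allowed set, or None if no candidate encoding fits."""
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue
        if all(ord(ch) < 128 or ch in ALLOWED_DK for ch in text):
            return text
    return None


if __name__ == "__main__":
    print(to_punycode("ål.dk".encode("utf-8"), "utf-8"))
    print(guess_decode("ål.dk".encode("iso-8859-15")))
```

Trying UTF-8 first is a deliberate choice in this sketch: a UTF-8 encoded å starts with byte 0xC3, which ISO-8859-15 would read as Ã, a letter outside the example set, so the two candidates should not be confused for these inputs. A real implementation would still need the generalised, TLD-independent path, with the whitelist only as an optional hint.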