Handle unicode/punycode #83

Closed
sshine opened this issue Dec 20, 2019 · 4 comments

sshine commented Dec 20, 2019

TL;DR: Is there any interest in conversion to punycode in the whois program?


The lead-up to this question:

When I run whois ål.dk, DK-Hostmaster's WHOIS server sends back

No entries found for the selected source.

even though their web interface says that the domain is taken.

There are three ways I should be able to get a positive result here:

  1. Had I converted to punycode, it would have worked: whois xn--l-1fa.dk does return a result.

  2. Had I used ISO-8859-15, e.g.

    $ whois "$(echo -n "ål.dk" | iconv -f UTF-8 -t ISO-8859-15)"
    

    that would have worked as well.

  3. (Doesn't work, notified.) Following DK-Hostmaster's own UTF-8 recommendations,

    # Assuming the 'å' is made with UTF-8, this should work but doesn't
    $ whois " --charset=utf8 ål.dk"
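
To make the difference between the three queries concrete, here is a minimal Python sketch of the bytes each variant would carry for the domain part. It uses only Python's built-in codecs (`idna`, `iso-8859-15`, `utf-8`); the helper name `query_variants` is made up for illustration and is not part of the whois program.

```python
def query_variants(domain: str) -> dict:
    """Return the byte strings the three approaches above would send."""
    return {
        # 1. Punycode (ACE) form via Python's built-in IDNA 2003 codec.
        "punycode": domain.encode("idna"),
        # 2. Legacy 8-bit encoding, matching the iconv call above.
        "iso-8859-15": domain.encode("iso-8859-15"),
        # 3. Raw UTF-8, which is what a UTF-8 terminal hands to whois.
        "utf-8": domain.encode("utf-8"),
    }


if __name__ == "__main__":
    for name, raw in query_variants("ål.dk").items():
        print(f"{name:12} {raw!r}")
```

For ål.dk the punycode entry is `b'xn--l-1fa.dk'`, while the other two differ only in how the `å` is represented (`\xe5` versus `\xc3\xa5`).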
    

Now, I really don't know if --charset is common in WHOIS servers. In fact, since .dk is one of the very few TLDs that have any special rules in whois (--show-handles), I actually doubt it. I assume we're in absolutely-no-standards-land and any general support here is futile.

So, without having done an extensive survey (we could), an alternative to sending arbitrary Unicode byte strings and hoping for the best would be to punycode them. This is much easier in my Haskell library, since I can assume UTF-8, and not so easy for the present whois program, since we would also have to figure out the calling terminal's encoding first (and possibly forgo conversion if we can't).
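As a rough sketch of that idea (in Python rather than the C of the whois program, and not how whois itself does it): decode the raw argument using the locale's encoding, convert it with the built-in `idna` codec, and pass the bytes through unchanged if either step fails. The function name `to_ace` and the assumption that the locale's preferred encoding matches the terminal are mine.

```python
import locale
from typing import Optional


def to_ace(raw: bytes, encoding: Optional[str] = None) -> bytes:
    """Convert a raw command-line argument to its punycode (ACE) form.

    Returns the input unchanged when it cannot be decoded in the assumed
    encoding or is not a valid IDN.
    """
    if encoding is None:
        # Assumption: the locale's preferred encoding matches the terminal.
        encoding = locale.getpreferredencoding(False)
    try:
        return raw.decode(encoding).encode("idna")
    except (UnicodeError, LookupError):
        # Undecodable input or unknown encoding: forgo the conversion.
        return raw
```

With this fallback a plain ASCII name is untouched, and a name typed in the wrong encoding is sent as-is rather than being converted to a wrong punycode label.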

Some TLDs restrict which non-ASCII characters are allowed, which makes guessing easier when the encoding is unknown. For example, if .dk only allows æøåöäüé, I doubt there is any overlap in the way those letters are encoded in the common encodings. Still, I'd prefer a generalised method over any TLD-specific knowledge, since there are so many TLDs with special behavior to keep track of.
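The guessing approach could look roughly like the sketch below. The allowed set is just the hypothetical one from the example above (the real .dk policy may differ), the two candidate encodings are my assumption, and `guess_decode` is an illustrative name. It tries strict UTF-8 first, then ISO-8859-15, and accepts a decoding only if every non-ASCII character is in the allowed set.

```python
from typing import Optional

# Assumption taken from the example above; the real .dk policy may differ.
ALLOWED_EXTRA = set("æøåöäüé")
# Assumption: only these two encodings are considered.
CANDIDATE_ENCODINGS = ("utf-8", "iso-8859-15")


def guess_decode(raw: bytes) -> Optional[str]:
    """Return the decoded name, or None if no candidate encoding fits."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        # Accept only ASCII plus the characters the TLD is assumed to allow.
        if all(ord(ch) < 128 or ch in ALLOWED_EXTRA for ch in text):
            return text
    return None
```

For this particular character set the two encodings do not collide: the ISO-8859-15 bytes for these letters are never valid UTF-8 on their own, and their UTF-8 forms all start with `\xc3`, which decodes to a disallowed character in ISO-8859-15.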

rfc1036 (Owner) commented Dec 20, 2019

whois has supported IDN since 2003. I am quite sure that .dk used to work and indeed the code does contain "special rules" for it, so maybe they changed something on their side.

Feel free to investigate, or else I will get to this later. Check the source and use "whois --verbose".

sshine (Author) commented Dec 20, 2019

I'll investigate. Thanks for your quick response.

rfc1036 (Owner) commented Dec 31, 2019

Works for me. Probably you compiled whois without IDN support.

rfc1036 closed this as completed Dec 31, 2019

sshine (Author) commented Jan 3, 2020

Yes, apparently the whois package on Ubuntu 18.04 is compiled without IDN support. The one I have available on Debian 10 is fine.
