Displaying internationalized domain names in Unicode format #763

Open
iiska opened this Issue Jun 9, 2014 · 11 comments


9 participants

@iiska
iiska commented Jun 9, 2014

It would be nice if IDNs (internationalized domain names) could be displayed in Unicode format when listing verified domains.

Currently the Keybase website displays them as ASCII punycode, as shown in the attached screenshot.

E.g. the Punycode.js library could be used for the conversion.

[Screenshot, 2014-06-09: Keybase listing the verified domain in punycode]
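The issue suggests Punycode.js. As an illustration only, here is a minimal Python sketch of the same conversion using the standard library's `punycode` codec (RFC 3492). `to_unicode_display` is a hypothetical helper, not Keybase code. It also skips the IDNA validation steps (mapping, normalization, round-trip checks) that a production decoder would need.

```python
def to_unicode_display(domain: str) -> str:
    """Decode each 'xn--' label of an ASCII domain from punycode (RFC 3492).

    Sketch only: no IDNA validation (mapping, bidi or round-trip checks) is done.
    """
    labels = []
    for label in domain.split("."):
        if label.lower().startswith("xn--"):
            # Strip the ACE prefix and decode the rest with the stdlib punycode codec.
            labels.append(label[4:].encode("ascii").decode("punycode"))
        else:
            # Plain ASCII labels are passed through unchanged.
            labels.append(label)
    return ".".join(labels)


print(to_unicode_display("xn--niemel-gua.fi"))  # niemelä.fi
```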

@malgorithms
Contributor

hi @iiska - we don't present non-ASCII characters for the same reason we present usernames, domains, etc., all in lowercase. See #397 - we'd rather things looked worse but were less likely to fool a human checking whether two names match. For example, it would be easy for you to pretend you owned keybase.io or some other domain just by including a Unicode character that looks almost identical to one of the ASCII characters.

We might revisit this in the future (for example, presenting it in Unicode but highlighting any characters that aren't ASCII), but for now we think this is the right move for security reasons.
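A minimal sketch of the "present it in Unicode but highlight non-ASCII characters" idea. It assumes plain-text brackets as a stand-in for whatever visual highlighting a real UI would use. `highlight_non_ascii` is hypothetical. The usage lines show a homograph of keybase.io that puts U+0435 (Cyrillic small letter ie) in place of the Latin "e".

```python
def highlight_non_ascii(domain: str, left: str = "[", right: str = "]") -> str:
    """Wrap every non-ASCII character so look-alike characters stand out."""
    return "".join(
        f"{left}{ch}{right}" if ord(ch) > 127 else ch
        for ch in domain
    )


spoof = "k\u0435ybase.io"  # U+0435 CYRILLIC SMALL LETTER IE, not the Latin "e"
print(spoof == "keybase.io")              # False, although they render almost identically
print(highlight_non_ascii(spoof))         # the Cyrillic letter is bracketed
print(highlight_non_ascii("niemelä.fi"))  # niemel[ä].fi
```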

@boegh
boegh commented Jun 9, 2014

Couldn't a working solution be to show both parts, with the decoded version in parentheses?
So it would show 'xn--niemel-gua.fi (decodes to: http://niemelä.fi/)'?

I would think that the xn-something-something version would more easily fail a human review than the decoded version (most people recognize the decoded version), and when I click the link I end up on http://niemelä.fi/ rather than xn--niemel-gua.fi anyway (Firefox 29.0.1). Having both makes for good checking options, both web-wise and DNS-wise.
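A sketch of the "show both parts" display. It assumes the ASCII form stays first and that the decoded form is appended only when decoding actually changes something. `decode_labels` and `display_both` are hypothetical helpers, and decoding uses Python's stdlib `punycode` codec without IDNA validation.

```python
def decode_labels(domain: str) -> str:
    # Decode "xn--" labels with the stdlib punycode codec (no IDNA validation).
    return ".".join(
        part[4:].encode("ascii").decode("punycode")
        if part.lower().startswith("xn--") else part
        for part in domain.split(".")
    )


def display_both(ascii_domain: str) -> str:
    """Return the ASCII form, plus the decoded form in parentheses if it differs."""
    unicode_domain = decode_labels(ascii_domain)
    if unicode_domain == ascii_domain:
        return ascii_domain
    return f"{ascii_domain} (decodes to: {unicode_domain})"


print(display_both("xn--niemel-gua.fi"))  # xn--niemel-gua.fi (decodes to: niemelä.fi)
print(display_both("keybase.io"))         # keybase.io
```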

@malgorithms
Contributor

Couldn't a working solution be to show both parts, with the decoded version in parenthesis

What we've been talking about as the most likely solution is (1) showing the Unicode version on mouseover (perhaps in parentheses, as you suggest, on touch devices), and (2) linking to the Unicode version.

Showing both side by side is too much, presentation-wise, I think, and I fear it might make people ignore the ASCII representation.
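A sketch of the mouseover-plus-link idea. It assumes an HTML `title` attribute is an acceptable tooltip and that the link target uses `http://` as in the examples above. `domain_link` and `decode_labels` are hypothetical. The visible link text stays ASCII, so the Unicode form is never the only thing displayed. The touch-device variant (parentheses) is not covered here.

```python
import html


def decode_labels(domain: str) -> str:
    # Decode "xn--" labels with the stdlib punycode codec (no IDNA validation).
    return ".".join(
        part[4:].encode("ascii").decode("punycode")
        if part.lower().startswith("xn--") else part
        for part in domain.split(".")
    )


def domain_link(ascii_domain: str) -> str:
    """Render ASCII link text, with the Unicode form as tooltip and as link target."""
    unicode_domain = decode_labels(ascii_domain)
    href = html.escape(f"http://{unicode_domain}/", quote=True)
    title = html.escape(unicode_domain, quote=True)  # shown by browsers on mouseover
    text = html.escape(ascii_domain)
    return f'<a href="{href}" title="{title}">{text}</a>'


print(domain_link("xn--niemel-gua.fi"))
# <a href="http://niemelä.fi/" title="niemelä.fi">xn--niemel-gua.fi</a>
```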

I would think that the xn-something-something version would easier fail a human review

Hmm... I think I'll disagree on this. Allowing Unicode as the default display allows two completely different domains to be printed with effectively no visible difference between them. If we simply printed the Unicode version with no other indicator, it would be trivial for you to impersonate almost any site, down to the pixel. With the way we're doing it now, faking an ASCII site (which most sites are) is impossible; faking a Unicode site is possible if you can convince the reader not to check its ASCII representation (but still perhaps harder than if we printed the Unicode).

But either way, your point highlights how seriously we need to take this: we have to minimize the attack angles here.

@malgorithms
Contributor

(I am leaving this open as we think about it; the discussion definitely isn't closed)

@zQueal
Member
zQueal commented Jun 10, 2014

(1) show the unicode version on mouseover (perhaps in parens, as you suggest, on touch devices)

Although my opinion on the matter probably has little sway, I think this could be a happy medium.

@rummik
rummik commented Jun 10, 2014

I definitely like the unicode version in parens idea.

@orcmid
orcmid commented Jun 15, 2014

I think you should be using the conventions for Internationalized Resource Identifiers (IRIs). There is a canonical way of mapping IRIs to ASCII-clean URIs using %-escaping, which is completely unambiguous. (This will not solve the http: vs. https: scheme name problem raised in another issue, however.) This is likely happening on the wire anyhow, since HTTP headers are all ASCII. The %-escaping is applied to the UTF-8 bytes of non-ASCII characters and to the few ASCII characters that are not permitted in URIs unless escaped.

The applicable specification is RFC 3987, at http://www.ietf.org/rfc/rfc3987.txt
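For reference, the core of the RFC 3987 IRI-to-URI mapping (section 3.1) is to encode each non-ASCII character as UTF-8 and percent-encode every resulting octet. Below is a minimal sketch, assuming the input is already a valid IRI. `iri_to_uri` is hypothetical. It does not implement the alternative the RFC also permits, which is converting the host part with IDNA ToASCII (punycode) instead.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 octets of every non-ASCII character (RFC 3987, 3.1)."""
    parts = []
    for ch in iri:
        if ord(ch) < 0x80:
            parts.append(ch)  # ASCII characters are copied unchanged
        else:
            # e.g. "ä" (U+00E4) -> UTF-8 bytes C3 A4 -> "%C3%A4"
            parts.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(parts)


print(iri_to_uri("http://niemelä.fi/"))  # http://niemel%C3%A4.fi/
```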

@Bengt
Bengt commented Oct 25, 2014

While I do not follow all of the discussion here, I would like to argue that the ASCII representation of internationalized domain names is somewhat hard for humans to read, too. I still do not know how to spell xn--bengtlers-v9a.de (bengtlüers.de) correctly, and I have had that domain for 10 years or so.

To German-speaking people who get to know me, I am "Bengt Lüers with an Ü and an E". That sticks immediately because it is odd. When they click that link, their browser most likely translates the domain name to a Unicode representation that they can easily validate visually. I do not know how well that would work for other names, but in my case the obvious solution was to have another, non-internationalized domain name (bengt.me) pointing to the same place, thereby mapping an ASCII domain name to a domain name with an umlaut. This is useful not only when one does not have a keyboard with umlauts at hand, but also for visual validation.

@habi
habi commented Feb 26, 2015

I would also like a "nicer" representation of IDNs, since I am also one of those people with a funky Central European umlaut name.
I think showing the representation in parentheses, or punycode on mouseover, would be nice.

[Screenshot, 2015-02-26]

@abjugard
abjugard commented Feb 6, 2016

How about just allowing certain characters for certain top-level domains? This is how Google Chrome does it. For .se, the characters åäö are allowed to be displayed, as they are part of our alphabet.

E.g. xn--domnnamn-2za.se has the TLD .se, and its conversion from punycode (domännamn.se) only contains [a-z] plus [åäö], so it's OK to display domännamn.se instead of the punycode. But if, for example, xn--bengtlers-v9a.se were added, it would not be OK, because it contains a character that is not in the Swedish alphabet (ü); in that case xn--bengtlers-v9a.se would be displayed instead of the domain converted from punycode.
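A sketch of the per-TLD rule as described above. It is not Chrome's actual implementation. The allowlist `EXTRA_LETTERS` and the helpers are hypothetical and only cover .se. Any domain with a character outside `[a-z0-9-]` plus the TLD's extra letters falls back to punycode.

```python
import string

# Hypothetical per-TLD allowlist of letters beyond ASCII that may be shown decoded.
EXTRA_LETTERS = {
    "se": set("åäö"),
}
BASIC = set(string.ascii_lowercase + string.digits + "-")


def decode_label(label: str) -> str:
    # Decode one "xn--" label with the stdlib punycode codec (no IDNA validation).
    if label.startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


def display_domain(ascii_domain: str) -> str:
    """Show the decoded domain only if every character is allowed for its TLD."""
    labels = ascii_domain.lower().split(".")
    allowed = BASIC | EXTRA_LETTERS.get(labels[-1], set())
    decoded = [decode_label(label) for label in labels]
    if all(ch in allowed for label in decoded for ch in label):
        return ".".join(decoded)
    return ascii_domain  # fall back to the punycode form


print(display_domain("xn--domnnamn-2za.se"))   # domännamn.se
print(display_domain("xn--bengtlers-v9a.se"))  # xn--bengtlers-v9a.se
```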

@Bengt
Bengt commented Feb 7, 2016

@abjugard Allowing certain characters for certain TLDs is an interesting idea. There are, however, many corner cases to consider. For example, xn--spa-7ka.de (spaß.de) would be valid, because "ß" is in the German alphabet. On the other hand, xn--spa-7ka.ch would not be valid, because "ß" is no longer used in Switzerland, although the Swiss technically use the German alphabet.
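The same per-TLD idea applied to the ß corner case. It uses hypothetical, deliberately incomplete allowlists for .de and .ch, and the real registries' character tables are not reproduced here. Returning the offending characters rather than a yes/no answer could also feed the "highlight non-ASCII characters" idea from earlier in the thread. `disallowed_chars` is a hypothetical helper.

```python
import string

# Hypothetical, illustrative allowlists: "ß" is accepted for .de but not for .ch.
# Real registry policies may differ and include more characters.
EXTRA_LETTERS = {
    "de": set("äöüß"),
    "ch": set("äöü"),
}
BASIC = set(string.ascii_lowercase + string.digits + "-")


def disallowed_chars(ascii_domain: str) -> set:
    """Return the decoded characters that are not allowed for the domain's TLD."""
    labels = ascii_domain.lower().split(".")
    allowed = BASIC | EXTRA_LETTERS.get(labels[-1], set())
    bad = set()
    for label in labels:
        if label.startswith("xn--"):
            # Decode with the stdlib punycode codec (no IDNA validation).
            label = label[4:].encode("ascii").decode("punycode")
        bad |= {ch for ch in label if ch not in allowed}
    return bad


print(disallowed_chars("xn--spa-7ka.de"))  # set()
print(disallowed_chars("xn--spa-7ka.ch"))  # {'ß'}
```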
