Support Unicode domain names #10

Closed
makew0rld opened this issue Dec 4, 2020 · 3 comments
Labels: bug (Something isn't working), question (Further information is requested)

makew0rld commented Dec 4, 2020

This all applies ONLY to domain names. Only IDNs are being talked about here, not IRIs.


For example for gemini://gémeaux.bortzmeyer.org/:

Amfora claims the domain name does not exist (it does exist), "Failed to connect to the server: dial tcp: lookup gémeaux.bortzmeyer.org: no such host."

Source

The solution to this is not standardized yet, but it should probably be:

  • Punycode all domains for DNS lookup
  • Convert to Unicode for domains sent to the server (retracted: this is a weird mix that will create invalid URLs, and I no longer think it makes sense)

This should be applied whether the domain arrives already punycoded or in Unicode. Punycoding an already punycoded domain has no ill effect.

Domains in link lines could maybe also be subject to this, meaning that Unicode would be allowed in a gemtext link line for the domain part only. Probably not, though: that would make the link an IRI.

Punycoding can be done with golang.org/x/net/idna.

makew0rld transferred this issue from makew0rld/amfora on Dec 6, 2020
makew0rld commented

There's now a summary of the issue here: gemini://gemini.bortzmeyer.org/gemini/idn.gmi.

For certs, I think both the punycoded and Unicode version of the domain should be accepted, for compatibility.

makew0rld added the bug and question labels on Dec 6, 2020

makew0rld commented Dec 7, 2020

The idea of Unicode normalization is still being discussed. If it should happen, golang.org/x/text/unicode/norm should be used.

If normalization should happen, then it should be applied at the highest level, right after receiving user input and before anything else like sending the URL or punycoding it. (It should not happen in go-gemini.)

Only NFC normalization should be used. Not NFD, NFKC, or NFKD.

Relevant article: https://blog.golang.org/normalization

Probably what should happen is that Unicode normalization is something a client can do to be nice, but isn't required by the spec.

Questions

  • Should Unicode normalization only happen to domains, or to the whole URL?
  • What if the user named a domain/file/folder in a non-NFC way? Does the server now need to support NFC as well, and apply it to vhost recognition or local file paths to correctly match requests? That seems wrong. But so does the user entering something visually identical to what the sysadmin typed, and things not working.

makew0rld commented

This was added in 1aaf92e.

  • Domains and hosts are punycoded before being sent over the network
  • Punycode, then Unicode, is tried for verifying certs
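
The certificate fallback could be sketched roughly like this. It is not the code from the commit: it assumes the check goes through `crypto/x509`'s `VerifyHostname`, and `verifyCertHost` and `stubCert` are hypothetical names. `stubCert` builds an unsigned stand-in certificate for the demo only.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// verifyCertHost (hypothetical helper) accepts the certificate if it matches
// the punycoded host, and falls back to the Unicode form otherwise.
func verifyCertHost(cert *x509.Certificate, punycoded, unicode string) error {
	if err := cert.VerifyHostname(punycoded); err == nil {
		return nil
	}
	return cert.VerifyHostname(unicode)
}

// stubCert (demo only) returns an unsigned certificate that carries nothing
// but the given DNS names. A real client would use the peer certificate.
func stubCert(dnsNames ...string) *x509.Certificate {
	return &x509.Certificate{DNSNames: dnsNames}
}

func main() {
	punycoded := "xn--gmeaux-bva.bortzmeyer.org"
	unicode := "g\u00e9meaux.bortzmeyer.org"

	certs := map[string]*x509.Certificate{
		"punycoded cert": stubCert(punycoded),
		"unicode cert":   stubCert(unicode),
		"unrelated cert": stubCert("example.org"),
	}
	for label, cert := range certs {
		err := verifyCertHost(cert, punycoded, unicode)
		fmt.Printf("%s accepted: %t\n", label, err == nil)
	}
}
```

Trying the punycoded name first keeps the common case on the normal hostname-matching path; the Unicode attempt only matters for certificates that were issued with the raw Unicode name.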
