DNS name binary encoding #94

Closed
ntninja opened this issue Jul 9, 2019 · 4 comments

Comments

@ntninja
Contributor

ntninja commented Jul 9, 2019

As mentioned in #22, /dns, /dns4, /dns6 & /dnsaddr do not currently have an official binary encoding defined.

In py-multiaddr we currently use Unicode for the text representation and IDNA-2008/Punycode for the binary representation. Obviously this distinction is only relevant for domains with labels containing non-ASCII characters.
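To make the distinction concrete, here is a minimal sketch of how the two choices would differ on the wire for a `/dns4` component. It assumes the usual multiaddr binary layout (varint protocol code, varint length, value bytes) and the multicodec code `0x36` for `dns4`. It uses Python's built-in `idna` codec, which implements IDNA 2003 rather than the IDNA-2008 rules py-multiaddr uses, so treat it as an illustration only. `encode_varint` and `dns4_component` are hypothetical helpers, not py-multiaddr APIs.

```python
def encode_varint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used by multiaddr for protocol codes and lengths."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


DNS4_CODE = 0x36  # assumed multicodec code for /dns4


def dns4_component(name: str, value_encoding: str) -> bytes:
    """Hypothetical helper: protocol code + length prefix + encoded domain name."""
    value = name.encode(value_encoding)
    return encode_varint(DNS4_CODE) + encode_varint(len(value)) + value


name = "bücher.example"
utf8_form = dns4_component(name, "utf-8")  # stores the Unicode text as UTF-8 (15 bytes)
idna_form = dns4_component(name, "idna")   # stores the Punycode form "xn--bcher-kva.example" (21 bytes)
```

For an all-ASCII name such as `example.com` both calls produce identical bytes, which is why the question only matters for internationalized labels.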

How is this handled in other implementations? Is the current behaviour something that could be standardized or is another behaviour more desirable?

@Stebalien
Member

I believe the current de facto standard in Go and JS is to just encode the Unicode text as UTF-8. IIRC, Punycode is only really needed for compatibility with the DNS protocol.

However, this may have canonicalization issues.
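A minimal sketch of the canonicalization problem with raw UTF-8: the three spellings below name the same domain, yet their UTF-8 bytes differ, so a byte-wise comparison of two multiaddrs would treat them as different. The hypothetical `naive_canonical` helper only applies case folding and NFC normalization; it is a rough stand-in for, not an implementation of, full UTS #46 processing.

```python
import unicodedata

spellings = [
    "b\u00fccher.example",   # precomposed ü (U+00FC)
    "bu\u0308cher.example",  # u followed by combining diaeresis (U+0308)
    "B\u00dcCHER.example",   # upper-case letters
]

# Raw UTF-8 bytes differ, so byte-wise comparison sees three distinct names.
raw = {s.encode("utf-8") for s in spellings}


def naive_canonical(name: str) -> str:
    """Case-fold, then NFC-normalize (only a small subset of what UTS #46 does)."""
    return unicodedata.normalize("NFC", name.casefold())


canonical = {naive_canonical(s).encode("utf-8") for s in spellings}
print(len(raw), len(canonical))  # 3 1
```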

@ntninja
Contributor Author

ntninja commented Aug 11, 2019

> However, this may have canonicalization issues.

Canonicalizing domain name strings is specified by UTS #46. In particular, that specification requires one to ship a mapping table and to decode all domain name labels starting with `xn--` using Punycode in order to validate and lower-case their contents. So while we could store the end result of this process as UTF-8 rather than IDNA, we'd still have to implement/ship most of the IDNA machinery anyway to ensure that the generated values are canonical.
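As a rough illustration of why IDNA machinery is still needed even when storing UTF-8, here is a heavily simplified sketch: it decodes `xn--` labels with Python's built-in `punycode` codec, then case-folds and NFC-normalizes each label. The function name `canonicalize_dns_name` is hypothetical, and the sketch deliberately omits the UTS #46 mapping table and validity checks, so it is not a conforming implementation.

```python
import unicodedata


def canonicalize_dns_name(name: str) -> str:
    """Hypothetical, heavily simplified canonicalizer (not full UTS #46).

    Decodes Punycode (xn--) labels, then case-folds and NFC-normalizes each label.
    The UTS #46 mapping table and validity checks are omitted.
    """
    labels = []
    for label in name.split("."):
        if label[:4].lower() == "xn--":
            # The payload after the ACE prefix is Punycode; the stdlib codec decodes bytes.
            label = label[4:].encode("ascii").decode("punycode")
        labels.append(unicodedata.normalize("NFC", label.casefold()))
    return ".".join(labels)


print(canonicalize_dns_name("xn--bcher-kva.example"))  # bücher.example
print(canonicalize_dns_name("BU\u0308CHER.Example"))   # bücher.example
```

Both the Punycode input and the decomposed, upper-case input end up as the same UTF-8 string, which is the property a canonical binary encoding needs.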

@Stebalien
Member

(belatedly)

I believe most DNS resolvers will do this internally. I'm fine saying that any domain in a /dns string must have already been converted from Punycode to UTF-8.
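One way such a rule could be enforced when parsing the binary value of a `/dns` component is sketched below. It is a minimal sketch, assuming the value bytes have already been split off from the code and length prefix; `validate_dns_value` is a hypothetical name, not part of any multiaddr library.

```python
def validate_dns_value(value: bytes) -> str:
    """Hypothetical check for the value bytes of a /dns component.

    Requires strictly valid UTF-8 and rejects labels still in Punycode (xn--) form,
    i.e. names that were not converted to UTF-8 before being stored.
    """
    name = value.decode("utf-8")  # raises UnicodeDecodeError on invalid bytes
    for label in name.split("."):
        if label[:4].lower() == "xn--":
            raise ValueError(f"label {label!r} is still Punycode-encoded")
    return name
```

Since `UnicodeDecodeError` is a subclass of `ValueError`, callers can catch both failure modes with a single `except ValueError`.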

(unless I'm missing something important)

@ntninja
Contributor Author

ntninja commented Jan 12, 2020

@Stebalien and I settled on standardizing UTF-8 encoding with UTS #46 normalization/canonicalization in #101.
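A minimal round-trip sketch of the resulting scheme for a `/dns` component: normalize, store as UTF-8 behind a varint code and length, and decode back. It assumes the multicodec code `0x35` for `dns` and substitutes case folding plus NFC for real UTS #46 processing; `encode_dns` and `decode_dns` are hypothetical helpers rather than py-multiaddr functions.

```python
from __future__ import annotations

import unicodedata

DNS_CODE = 0x35  # assumed multicodec code for /dns


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, next_position) for an unsigned LEB128 varint starting at pos."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_dns(name: str) -> bytes:
    # Simplified stand-in for UTS #46: case folding + NFC only.
    canonical = unicodedata.normalize("NFC", name.casefold())
    value = canonical.encode("utf-8")
    return encode_varint(DNS_CODE) + encode_varint(len(value)) + value


def decode_dns(buf: bytes) -> str:
    code, pos = decode_varint(buf, 0)
    if code != DNS_CODE:
        raise ValueError(f"unexpected protocol code {code:#x}")
    length, pos = decode_varint(buf, pos)
    value = buf[pos:pos + length]
    if len(value) != length:
        raise ValueError("truncated value")
    return value.decode("utf-8")
```

With normalization applied before encoding, differently spelled but equivalent names such as `B\u00dcCHER.example` and `bu\u0308cher.example` yield byte-identical components, so binary multiaddrs can be compared directly.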

@ntninja ntninja closed this as completed Jan 12, 2020
ntninja added a commit to ntninja/py-multiaddr that referenced this issue Jan 12, 2020
ntninja added a commit to ntninja/py-multiaddr that referenced this issue Aug 14, 2020