Composable and future-proof network addresses
Captain: @lgierth
Multiaddr aims to make network addresses future-proof, composable, and efficient.
Current addressing schemes have a number of problems.
- They hinder protocol migrations and interoperability between protocols.
- They don't compose well. There are plenty of X-over-Y constructions, but only a few of them can be addressed in a classic URI/URL or host:port scheme.
- They don't multiplex: they address ports, not processes.
- They're implicit, in that they presume out-of-band values and context.
- They don't have efficient machine-readable representations.
Multiaddr solves these problems by modelling network addresses as arbitrary encapsulations of protocols.
- Multiaddrs support addresses for any network protocol.
- Multiaddrs are self-describing.
- Multiaddrs conform to a simple syntax, making them trivial to parse and construct.
- Multiaddrs have human-readable and efficient machine-readable representations.
- Multiaddrs encapsulate well, allowing trivial wrapping and unwrapping of encapsulation layers.
Multiaddr was originally thought up by @jbenet.
Multiaddrs are parsed from left to right, but they should be interpreted from right to left. Each component of a multiaddr wraps all the components to its left in its context. For example, the multiaddr `/dns4/example.com/tcp/1234/tls/ws/tls` (ignore the double encryption for now) is interpreted by taking the first `tls` component from the right and interpreting it as the libp2p security protocol to use for the connection, then passing the rest of the multiaddr to the websocket transport to create the websocket connection. The websocket transport sees `/dns4/example.com/tcp/1234/tls/ws/` and interprets the `tls` in this context to mean that this is going to be a secure websocket connection. The websocket transport also gets the host to dial, along with the TCP port, from the rest of the multiaddr.
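The parse-left-to-right, interpret-right-to-left rule can be sketched as follows. This is a minimal illustration, not the API of any multiaddr library: the `TAKES_VALUE` table, `parse`, and `peel` are hypothetical names, and the table covers only the protocols used in the example above.

```python
from typing import List, Optional, Tuple

Component = Tuple[str, Optional[str]]

# Hypothetical subset of the protocol table: True means the protocol
# is followed by a value, False means it stands alone.
TAKES_VALUE = {"dns4": True, "ip4": True, "tcp": True, "tls": False, "ws": False}


def parse(addr: str) -> List[Component]:
    """Parse a human-readable multiaddr from left to right."""
    parts = addr.strip("/").split("/")
    components = []
    i = 0
    while i < len(parts):
        name = parts[i]
        if TAKES_VALUE[name]:
            components.append((name, parts[i + 1]))
            i += 2
        else:
            components.append((name, None))
            i += 1
    return components


def peel(components: List[Component]) -> Tuple[Component, List[Component]]:
    """Split off the right-most component, which interprets everything to its left."""
    return components[-1], components[:-1]


components = parse("/dns4/example.com/tcp/1234/tls/ws/tls")
security, rest = peel(components)   # outer tls: the security protocol
transport, inner = peel(rest)       # ws: sees the inner tls as "secure websocket"
```

Each call to `peel` hands the remaining components to the next layer, which mirrors how the websocket transport receives everything to the left of the outer `tls`.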
Components to the right can also provide parameters to components to the left, since they are in charge of the rest of the multiaddr's interpretation. For example, in `/ip4/1.2.3.4/tcp/1234/tls/p2p/QmFoo` the `p2p` component has the value of the peer id and passes it to the next component, in this case the `tls` security protocol, as the expected peer id for this connection. Another example is `/ip4/.../p2p/QmR/p2p-circuit/p2p/QmA`: here `p2p/QmA` is passed to `p2p-circuit`, and the `p2p-circuit` component then knows it needs to use the rest of the multiaddr as the information to connect to the relay node.
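As a rough sketch of the relay case, the function below splits a circuit address at its right-most `p2p-circuit` component into the relay's address (everything to the left) and the parameter handed to `p2p-circuit` (everything to the right). The function name `split_circuit` and the concrete address are made up for illustration, and the sketch assumes no value in the address is literally the string `p2p-circuit`.

```python
from typing import Tuple


def split_circuit(addr: str) -> Tuple[str, str]:
    """Return (relay address, destination parameter) for a circuit multiaddr."""
    parts = addr.strip("/").split("/")
    if "p2p-circuit" not in parts:
        raise ValueError("not a circuit relay address")
    # Interpretation runs right to left, so use the right-most occurrence.
    idx = len(parts) - 1 - parts[::-1].index("p2p-circuit")
    relay = "/" + "/".join(parts[:idx])
    target = "/" + "/".join(parts[idx + 1:])
    return relay, target


# Hypothetical address, used only to exercise the function.
relay, target = split_circuit("/ip4/1.2.3.4/tcp/4001/p2p/QmR/p2p-circuit/p2p/QmA")
```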
This enables nesting and arbitrary parameters. A component can parse arbitrary data with some encoding and pass it as a parameter to the next component of the multiaddr. For example, we could reference a specific HTTP path by composing `path` and `percentencode` components along with an `http` component. This would look like `/dns4/example.com/http/path/percentencode/somepath%2ftosomething`. The `percentencode` component parses the data and passes it as a parameter to `path`, which passes it on as a named parameter (`path=somepath/tosomething`). A user may not like `percentencode` for their use case and may prefer to use `lenprefixencode`, so that the multiaddr instead looks like `/dns4/example.com/http/path/lenprefixencode/20_somepath/tosomething`. This would work the same and require no changes to the `path` or `http` component.
It's important to note that the binary representation of the data in `percentencode` and `lenprefixencode` would be the same. The only difference is how it appears in the human-readable representation.
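To make the "same bytes, different text" point concrete, here is a small sketch of how the two encodings from the example could be decoded. Both components are illustrative in the text above, so the decoders and their names are assumptions; the `<length>_<data>` layout is inferred from the `20_somepath/tosomething` example only.

```python
from urllib.parse import unquote_to_bytes


def decode_percentencode(text: str) -> bytes:
    """Decode a percent-encoded value into raw bytes."""
    return unquote_to_bytes(text)


def decode_lenprefixencode(text: str) -> bytes:
    """Decode a '<length>_<data>' value, checking the declared length."""
    length, sep, payload = text.partition("_")
    if not sep:
        raise ValueError("missing length prefix")
    data = payload.encode("utf-8")
    if len(data) != int(length):
        raise ValueError("declared length does not match payload")
    return data


a = decode_percentencode("somepath%2ftosomething")
b = decode_lenprefixencode("20_somepath/tosomething")
```

Both calls yield the same byte string, which is what the `path` component would receive in either case.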
- TODO: unpack the shortcomings of URLs
  - example: hostnames in https://
    - can't sidestep DNS
    - can't use different SNI vs. Host headers
  - can't do http-over-utp
  - TODO check out how http/1.1 vs. http/2 is distinguished
  - rift between filesystem, web, and databases
- TODO: case study: domain fronting
- TODO: case study: tunnelling
- TODO: case study: http proxying
- TODO: case study: multi-hop circuit relay
- TODO: case study: protocol migrations (e.g. ip4/ip6, 4in6, 6in4)
Although multiaddrs are self-describing, it's possible to further encapsulate them based on context. For example, in a web browser it's obvious that, given a hostname, HTTP should be spoken. The specifics of this HTTP connection are not important (except maybe the use of TLS), and will be derived from the browser's capabilities and configuration.
- example.com/index.html
- /http/example.com/index.html
- /tls/sni/example.com/http/example.com/index.html
- /dns4/example.com/tcp/443/tls/sni/example.com/http/example.com/index.html
- /ip4/1.2.3.4/tcp/443/tls/sni/example.com/http/example.com/index.html
The resulting layers of encapsulation reflect exactly how the bidirectional stream between client and server is constructed.
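The relationship between the forms listed above can be sketched by treating the full address as a stack of layers, lowest first, and leaving the lower layers to context. The layer boundaries and the helper `with_context` are assumptions made for this illustration.

```python
from typing import List

# Layers of the fully explicit address, lowest layer first.
LAYERS = [
    "/ip4/1.2.3.4",
    "/tcp/443",
    "/tls/sni/example.com",
    "/http/example.com/index.html",
]


def with_context(layers: List[str], explicit: int) -> str:
    """Spell out only the top `explicit` layers; the rest come from context."""
    return "".join(layers[len(layers) - explicit:])


http_only = with_context(LAYERS, 1)
with_tls = with_context(LAYERS, 2)
full = with_context(LAYERS, len(LAYERS))
```

Every shorter form is a suffix of the fully explicit address, so adding context never changes the layers that were already written down.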
Depending on the browser's configuration, the multiaddr might look different. For example, you could use HTTP proxying or SOCKS proxying, or use domain fronting to evade censorship. This kind of proxying is of course possible without multiaddr, but only with multiaddr do we have a way of consistently addressing these networking constructions.
- Human-readable multiaddr: `(/<protoName string>/<value string>)+`
  - Example: `/ip4/127.0.0.1/udp/1234`
- Machine-readable multiaddr: `(<protoCode uvarint><value []byte>)+`
  - Same example: `0x4 0x7f 0x0 0x0 0x1 0x91 0x2 0x4 0xd2`
  - Values are usually length-prefixed with a uvarint
Multiaddr and all other multiformats use unsigned varints (uvarint). Read more about it in multiformats/unsigned-varint.
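As an illustration of the machine-readable form, the sketch below encodes `/ip4/127.0.0.1/udp/1234` into the bytes shown in the example. It is not a specification of the encoding procedure: the `PROTOCOLS` table is a hand-picked subset with codes as listed in protocols.csv, it only handles protocols whose values have a fixed size (so no length prefix is needed), and the function names are made up for this sketch.

```python
import ipaddress


def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte, low bits first)."""
    if value < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


# Illustrative subset: protocol name -> (code, encoder for its fixed-size value).
PROTOCOLS = {
    "ip4": (4, lambda v: ipaddress.IPv4Address(v).packed),
    "tcp": (6, lambda v: int(v).to_bytes(2, "big")),
    "udp": (273, lambda v: int(v).to_bytes(2, "big")),
}


def encode_multiaddr(text: str) -> bytes:
    """Encode a multiaddr made only of name/value pairs from PROTOCOLS."""
    parts = text.strip("/").split("/")
    if len(parts) % 2:
        raise ValueError("expected alternating protocol names and values")
    out = bytearray()
    for name, value in zip(parts[0::2], parts[1::2]):
        code, encode_value = PROTOCOLS[name]
        out += encode_uvarint(code)
        out += encode_value(value)
    return bytes(out)


encoded = encode_multiaddr("/ip4/127.0.0.1/udp/1234")
print(encoded.hex(" "))  # 04 7f 00 00 01 91 02 04 d2
```

The `udp` code 273 does not fit in seven bits, so it takes two bytes (`0x91 0x02`), while the `ip4` code 4 fits in one.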
TODO: specify the encoding (byte-array to string) procedure
TODO: specify the decoding (string to byte-array) procedure
See protocols.csv for a list of protocol codes and names, and protocols/ for specifications of the currently supported protocols.
TODO: most of these are way underspecified
- /ip4, /ip6
- /ipcidr
- /dns4, /dns6
- /dnsaddr
- /tcp
- /udp
- /utp
- /tls
- /ws, /wss
- /ipfs
- /p2p-circuit
- /p2p-webrtc-star, /p2p-webrtc-direct
- /p2p-websocket-star
- /onion
- js-multiaddr - stable
- kotlin-multiaddr - stable
- go-multiaddr - stable
- java-multiaddr - stable
- haskell-multiaddr - stable
- py-multiaddr - stable
- rust-multiaddr - beta
- cs-multiaddress - alpha
- net-ipfs-core - stable
- swift-multiaddr - stable
- elixir-multiaddr - alpha
- multiaddr sub-module of Python module multiformats - alpha
TODO: reconsider these alpha/beta/stable labels
Contributions welcome. Please check out the issues.
Check out our contributing document for more information on how we work, and about contributing in general. Please be aware that all interactions related to multiformats are subject to the IPFS Code of Conduct.
Small note: If editing the README, please conform to the standard-readme specification.
This repository is only for documents. All of these are licensed under the CC-BY-SA 3.0 license, © 2016 Protocol Labs Inc. Any code is under the MIT license, © 2016 Protocol Labs Inc.