Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add HTTP spec #508

Open
wants to merge 35 commits into
base: master
Choose a base branch
from
Open
Changes from 1 commit
Commits
Show all changes
35 commits
Select commit Hold shift + click to select a range
bc1aa59
add HTTP spec
marten-seemann Jan 22, 2023
1f075f6
2nd attempt for server auth
marten-seemann Jan 29, 2023
12f86b8
require client to authenticate the server when doing client auth
marten-seemann Jan 29, 2023
146c09a
better motivation for libp2p+HTTP (#515)
marten-seemann Feb 14, 2023
5398f5d
fix a few typos
marten-seemann Feb 14, 2023
b6c1bc2
http: use .well-known/libp2p.json for configuration
marten-seemann Mar 2, 2023
8a57943
http: nest libp2p.json config to allow for future configuration
marten-seemann Mar 2, 2023
d506145
Merge pull request #529 from libp2p/http-well-known-configuration
MarcoPolo Jun 1, 2023
946f516
Reformat the spec from the Point of View of an implementer
MarcoPolo Jul 7, 2023
3681472
Add link
MarcoPolo Jul 7, 2023
dd5d07c
Merge comments
MarcoPolo Jul 10, 2023
46d1857
Merge pull request #556 from libp2p/marco/http-update
MarcoPolo Jul 10, 2023
ebe612c
Add note about how this is just one possible auth mechanism
MarcoPolo Jul 10, 2023
7e5a077
Add lidel to interest group
MarcoPolo Jul 14, 2023
db2b3b5
Update http/README.md
MarcoPolo Jul 17, 2023
6319458
Formatting
MarcoPolo Jul 17, 2023
c7c9c43
Add thomas
MarcoPolo Jul 17, 2023
454e25c
Use metadata map and call it protocols
MarcoPolo Jul 17, 2023
a25267b
Add mermaid diagrom for HTTP semantics vs transport
MarcoPolo Jul 17, 2023
3014b22
Grammar fixes
MarcoPolo Jul 17, 2023
f96359b
Lidel suggestions
MarcoPolo Jul 17, 2023
1e87960
Define where the libp2p-token will be
MarcoPolo Jul 17, 2023
d0f0d93
Grammar fix
MarcoPolo Jul 17, 2023
8fbd64a
Specify IX vs NX in auth scheme
MarcoPolo Jul 19, 2023
71415b0
Add SNI and HTTP_libp2p_token to Noise extensions
MarcoPolo Jul 19, 2023
4a03bb0
Reword Namespace section a bit
MarcoPolo Aug 2, 2023
877899d
Remove SNI and token from extensions
MarcoPolo Aug 2, 2023
d8850aa
update protocol name for IPFS gateway
marten-seemann Oct 4, 2023
78e8ca1
Be clear about no pipelining
MarcoPolo Mar 14, 2024
d30efda
Use SHOULD instead of MUST
MarcoPolo Mar 18, 2024
8628b5a
Update RFC for connection: close
MarcoPolo Apr 3, 2024
3c0ac40
Rename well-known
MarcoPolo Apr 3, 2024
75bc635
Add sentence on why POST and other mappings
MarcoPolo Apr 3, 2024
f95e4db
Sukun's review comments
MarcoPolo Apr 15, 2024
e3eb9dc
Small typo fixes
MarcoPolo Apr 15, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
161 changes: 161 additions & 0 deletions http/README.md
@@ -0,0 +1,161 @@
# libp2p + HTTP: the spec

| Lifecycle Stage | Maturity | Status | Latest Revision |
|-----------------|--------------------------|--------|-----------------|
| 1A | Working Draft | Active | r0, 2023-01-23 |

Authors: [@marten-seemann]

Interest Group: [@MarcoPolo]

[@marten-seemann]: https://github.com/marten-seemann
[@MarcoPolo]: https://github.com/MarcoPolo

MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved
## Introduction

This document defines how libp2p nodes can offer a HTTP endpoint next to (or instead of) their full libp2p node. Services can be offered both via traditional libp2p protocols and via HTTP, allowing a wide variety of nodes to access these services. Crucially, this for the first time, allows browsers to access libp2p services without spinning up a Web{Socket, Transport, RTC} connection first. It also allows interacting with libp2p services from environments where plain HTTP is the only option, e.g. curl from the command line, and certain cloud edge workers and lambdas.

At the same time, nodes that are already connected via a libp2p connection, will be able to (re)use this connection to issue the same kind of requests, without dialing a dedicated HTTP connection.

Any protocol that follows request-response semantics can easily be mapped onto HTTP (mapping protocols that don’t follow a request-response flow can be more challenging). Protocols are encouraged to follow best practices for building REST APIs. Once a mapping has been defined, a single implementation can be used to serve both traditional libp2p as well as libp2p-HTTP clients.
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

## Addressing

Nodes may advertise HTTP multiaddresses to signal support for libp2p over HTTP. An address might look like this: `/ip4/1.2.3.4/tcp/443/tls/sni/example.com/http/p2p/<peer id>` (for HTTP/1.1 and HTTP/2), or `/ip4/1.2.3.4/udp/443/quic/sni/example.com/http/p2p/<peer id>` (for HTTP/3).
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

Nodes MUST use HTTPS (i.e. they MUST NOT use unencrypted HTTP). It is RECOMMENDED to use HTTP/2 and HTTP/3, but the protocols also work over HTTP/1.1.

Note that the peer ID in this address is 1. optional and 2. advisory and not (necessarily) verified during the HTTP handshake (depending on the HTTP client). If and when desired, clients can cryptographically verify the peer ID once the HTTP connection has been established, see [Authentication] for details on peer authentication.
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

Nodes can also link to a specific resource directly, similar to how a URL includes a path. This will require us to resolve [https://github.com/multiformats/multiaddr/issues/63](https://github.com/multiformats/multiaddr/issues/63) first. For example, the URL of a specific CID might be: `/ip4/1.2.3.4/tcp/443/tls/sni/example.com/http/<my data transfer protocol>/{/path/to/<cid>}`.
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

## Namespace

libp2p does not squat the global namespace. By convention, all libp2p services are located at a well-known URL: `http://example.com/.well-known/libp2p/<service name>/<path (optional)>`.
marten-seemann marked this conversation as resolved.
Show resolved Hide resolved

Putting the service name into the URL allows for future extensibility. It is easy to define new protocols, and the replace existing protocols by newer versions.

Applications MAY expose services under different URIs. For example, an application might decide to generate nicer-looking (and probably more SEO-friendly) URLs, and map paths under `[https://example.com/dht/](https://example.com/dht/)` to `https://example.com/.well-known/libp2p/kad-dht-v1/`.
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved
marten-seemann marked this conversation as resolved.
Show resolved Hide resolved

### Service Names

Traditionally, libp2p protocols have used path-like protocol identifiers, e.g. `/libp2p/autonat/1.0.0`. Due to the use of `/`s, this doesn’t work well with the naming convention defined above.
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

Protocols that wish to use the libp2p request-response mechanism MUST define a service name that is a valid URI component (according to RFC 8820).

In practice, this isn’t expected cause too much friction, since current libp2p protocols were not designed to use the request-reponse mechanism, and will need to make arrangements to support it anyway (e.g. define how requests and responses are serialized).
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

### Privacy Properties

This leads to some very desirable properties:

1. It is possible to run libp2p alongside a normal HTTP web service, i.e. on the same domain and port, without having to worry about collisions.
1. As an on-path observer only sees SNI and ALPN, this effectively hides the fact that a client is establishing a connection in order to speak libp2p.
2. Since authentication is flexible (see below), this enables servers to
1. require authentication to (some) paths below `.well-known/libp2p`, and to enforce ACLs
2. stealth mode: return 404 for paths below `.well-known/libp2p`, *unless* the client has already authenticated itself, thereby hiding the fact that it runs a libp2p server, even if probed explicitly

## Certificates

libp2p doesn’t prescribe how nodes obtain the TLS certificate to secure the HTTPS connection. Since browsers are expected to connect to the node, the certificate’s trust chain must end in the browser’s trust store.

This is somewhat tricky in a p2p context, as nodes might not have a (sub)domain, which for many CAs is a requirement to obtain a certificate. Specifically, Let’s Encrypt doesn’t support IP certificates at the moment. ZeroSSL does, however, this requires setting up a (free) account.

To speed of server authentication, a node MAY include the libp2p TLS extension in its certificate. Note that this is currently not possible when using Let’s Encrypt, since the libp2p TLS extension is not whitelisted by LE. Not every HTTP client will have access to the TLS certificate (for example, browsers usually don’t expose an API for that), but if an HTTP client does, it SHOULD use that information.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any CA that does allow this extension? (I don't think so)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently there isn't any, as far as I'm aware. Once we have a strong use case and a spec, we could start reaching out to CAs to see if it's possible to be included in their whitelist.


## Authentication

Traditionally, libp2p was built on the assumption that both peers authenticate each other during the libp2p handshake. libp2p+HTTP acknowledges that this isn’t always possible, or even desirable, and that different use cases call for different authentication modes. For example, a server might offer a certain set of services to any client, like a HTTP webserver does.

### Server Authentication

This document defines a simple request-response protocol to authenticate a server. This protocol is run on an already established connection. The service name of the protocol is `server-auth` (and the URI therefore is `.well-known/libp2p/server-auth`).

The client send a POST request containing a random payload of at least 8 and up to 1024 bytes:

```json
{
"random": <multibase-encoded random bytes>
}
```

The server signs the concatenation of `libp2p-server-auth:` (in ASCII) and that payload using its host key. It then send the following response:

```json
{
"peer-id": <peer ID, in string reprensentation>,
"signature": <multibase-encoded signature>
}
```

TODO: is this the best way to encode the information? We could also put `peer-id` and / or `signature` into an HTTP header.

### Client Authentication
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

When an unauthenticated client tries to access a resource that requires authentication, the server SHOULD use a [401 HTTP status code](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401). The client MAY then authenticate itself using the protocol described below, and then retry the request.

Support for client authentication is an optional feature. It is expected that only a subsection of clients will implement it. For example, browser simply retrieving a few (elements of) web pages from IPFS probably won’t have any need to even generate a libp2p identity in the first place.

The protocol defined here takes 2 RTTs to authenticate the client. It is designed to be stateless on the server side. In the first round-trip, the client obtains a (pseudo-) random value from the server, which it then signs with its host key and sends back to the server, which then issues an authentication token (acting somewhat like a cookie) which can be included on future requests.

The service name is `client-auth`. For the first step, the client sends a GET request to this HTTP endpoint, and the server responds with at least 8 and up to 1024 bytes of pseudorandom data:

```json
{
"random": <multibase-encoded random bytes>
}
```

In order to keep this exchange stateless, the server SHOULD 1. include the current timestamp or an expiry data and 2. a signature in that data. This allows it to check in step 2 that it actually generated that data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused. Are you saying that the server needs to include the expiry data and their signature as part of the random field? Are you suggesting they put the useful state in the field and encrypt it? And that's the random field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this basically offloads that state to the client, in a way that the client can't manipulate.


The client signs the data received in step 1, and sends a POST request with the following JSON object to the server:

```json
{
"data": <the random bytes received from the server, multibase encoded>,
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved
"peer-id": <peer ID, in string reprensentation>,
marten-seemann marked this conversation as resolved.
Show resolved Hide resolved
"signature": <multibase-encoded signature>
}
```

The server verifies the signature and issues an authentication token. In order to allow stateless operation, at the very minimum, the authentication token SHOULD contain the peer ID. It SHOULD also contain an expiry date and it MAY be bound to the client’s IP address. The token is sent in the response body.
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

The client uses the auth token on requests that require client authentication, by setting the `libp2p-auth-token` HTTP header.
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

## Mapping to libp2p Streams

libp2p services whose service is specified as request-response protocols can use a single protocol implementation to make the service available over HTTP as well as on top of libp2p streams.

The libp2p protocol identifier is `/http1.1`. After negotiating this protocol using multistream-select, nodes treat the stream as a HTTP/1.1 stream for a single HTTP request (i.e. nodes MUST NOT use request pipelining).
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

## Outlook: Interaction with Intermediaries

One of the advantages of running HTTP is that there’s widely deployed caching infrastructure (CDNs). Content-addressed data is infinitely cacheable. Assuming a properly design data transfer protocol, retrieval for CIDs could be cached by the CDN and made available via a POP (geographically) close to the user, dramatically reducing retrieval latencies.

Services SHOULD specify the caching properties (if any), and set the appropriate cache headers (according to RFC 9111).
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

CDNs can also be used to increase censorship resistance, since the CDN effectively hides the IP address of the origin server. With the upcoming introduction of ECHO (Encrypted ClientHello) in TLS, all that an on-path observer will be able to see is that a client is establishing a connection to a certain CDN, but not to which domain name.

The level of delegation between the origin node and the CDN can be adjusted. In the simplest configuration, the origin node is the only node that holds the libp2p private key, thus requests to the `server-auth` protocol would be forwarded from the CDN to the origin server. In a more advanced configuration, it would be possible to move the private key to a worker on the edge of the CDN, and perform the signing operation there (thereby reducing the request latency for `server-auth` requests).

## FAQ

### Why not gRPC?

This would be the perfect fit, allowing both request-response schemes as well as variations with multiple requests and multiple responses. However, it’s not possible to use gRPC from the browser.
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

### Why tie ourselves to HTTP when mapping onto libp2p? Can’t we have a more general serialization format?

We could, but rolling our own serialization comes with some costs. First of all, we’d have define how HTTP request and response header, bodies, trailers are serialized onto the wire. Most likely, we’d define a Protobuf for that. Second, once we add more features to that format, they would need to be back-ported to HTTP, so that nodes that only speak HTTP can make use of them as well.

It’s just simpler to commit to HTTP.

### Why not use HTTP/3 for the libp2p mapping?

I’d love to! This would allow us to use HTTP header compression using QPACK, and a binary format instead of a text-based one. However, HTTP/3 requires the peers to exchange HTTP/3 SETTINGS frames first, and it’s not immediately obvious when / how this would be done in libp2p. It’s also not clear how easy it would be to use HTTP/3 in JavaScript.

The good news is that once we’ve come up with a solution for these two problems, it will be rather easy to add support for HTTP/3: nodes will just offer `/http3` in addition (and one day, instead of) `/http1.1`, and nodes that support can hit that endpoint. Nothing in the implementation of the protocols will need to change, since protocols only deal with (deserialized) HTTP requests and responses.

### Can I run QUIC, WebTransport and an HTTP/3 server on the same IP and port?

Yes, once [https://github.com/libp2p/specs/issues/507](https://github.com/libp2p/specs/issues/507) is resolved.