diff --git a/IPIP/0337-delegated-routing-http-api.md b/IPIP/0337-delegated-routing-http-api.md
new file mode 100644
index 00000000..ab9df067
--- /dev/null
+++ b/IPIP/0337-delegated-routing-http-api.md
@@ -0,0 +1,118 @@
+# IPIP-337: Delegated Content Routing HTTP API
+
+- Start Date: 2022-10-18
+- Related Issues:
+  - https://github.com/ipfs/specs/pull/337
+
+## Summary
+
+This IPIP specifies an HTTP API for delegated content routing.
+
+## Motivation
+
+Idiomatic and first-class HTTP support for delegated routing is an important requirement for large content routing providers,
+and supporting large content providers is a key strategy for driving down IPFS content routing latency.
+These providers must handle high volumes of traffic and support many users, so leveraging industry-standard tools and services
+such as HTTP load balancers, CDNs, reverse proxies, etc. is a requirement.
+To maximize compatibility with standard tools, IPFS needs an HTTP API specification that uses standard HTTP idioms and payload encoding.
+The [Reframe spec](https://github.com/ipfs/specs/blob/main/reframe/REFRAME_PROTOCOL.md) for delegated content routing is an experimental attempt at this,
+but it has resulted in a very unidiomatic HTTP API that is difficult to implement and is incompatible with many existing tools.
+The cost of properly redesigning, reimplementing, and maintaining Reframe is too high relative to the urgency of having a delegated content routing HTTP API.
+
+Note that this neither supplants nor deprecates Reframe. Ideally, in the future Reframe and its implementation would receive the resources needed to map the IDL to idiomatic HTTP,
+and implementations of this spec could then be rewritten in the IDL, maintaining backwards compatibility.
+
+We expect this API to be extended beyond "content routing" in the future, so additional IPIPs may rename this to something more general such as "Delegated Routing HTTP API".
+
+## Detailed design
+
+See the [Delegated Content Routing HTTP API spec](../routing/DELEGATED_CONTENT_ROUTING_HTTP.md) included with this IPIP.
+
+## Design rationale
+
+To understand the design rationale, it is important to consider the concrete Reframe limitations that we know about:
+
+- Reframe [method types](../reframe/REFRAME_KNOWN_METHODS.md) using the HTTP transport are encoded inside IPLD-encoded messages
+  - This prevents URL-based pattern matching on methods, which makes it hard and expensive to do basic HTTP scaling and optimizations:
+    - Configuring different caching strategies for different methods
+    - Configuring reverse proxies on a per-method basis
+    - Routing methods to specific backends
+    - Method-specific reverse proxy config such as timeouts
+  - Developer UX is poor as a result, e.g. for CDN caching you must encode the entire request message and pass it as a query parameter
+    - This was initially done by URL-escaping the raw bytes
+      - Not possible to consume correctly using standard JavaScript (see [edelweiss#61](https://github.com/ipld/edelweiss/issues/61))
+      - Shipped in Kubo 0.16
+    - Packing a CID into a struct, encoding it with DAG-CBOR, multibase-encoding that, percent-encoding that, and then passing it in a URL, rather than merely passing the CID in the URL, is needlessly complex from a user's perspective, and has already made it difficult to manually construct requests or interpret logs
+  - Added complexity of "Cacheable" methods supporting both POSTs and GETs
+- The required streaming support and message groups add a lot of implementation complexity, but streaming does not currently work for cacheable methods sent over HTTP
+  - For example, for FindProviders the response is buffered anyway for ETag calculation
+  - There are no limits on response sizes, nor any way to impose limits or paginate
+  - Streaming is useful for routers that have highly variable resolution time, to send results as soon as possible, but this is not a use case we are focusing on right now and we can add it later
+- The Identify method is not implemented because it is not currently useful
+  - This is because Reframe's ambition is to be a generic catch-all bag of methods across protocols, while the delegated routing use case only requires a subset of its methods.
+- Client and server implementations are difficult to write correctly because of the non-standard wire formats and conventions
+  - Examples: [a bug reported by an implementer](https://github.com/ipld/edelweiss/issues/62), and [another one](https://github.com/ipld/edelweiss/issues/61)
+- The Go implementation is [complex](https://github.com/ipfs/go-delegated-routing/blob/main/gen/proto/proto_edelweiss.go) and [brittle](https://github.com/ipfs/go-delegated-routing/blame/main/client/provide.go#L51-L100), and is currently maintained by IPFS Stewards who are already over-committed with other priorities
+- Only the HTTP transport has been designed and implemented, so it's unclear whether the existing design will work for other transports, and what their use cases and requirements are
+  - This means Reframe can't be trusted to be transport-agnostic until at least a second transport is implemented (e.g. as a reframe-over-libp2p protocol)
+- There's naming confusion around "Reframe, the protocol" and "Reframe, the set of methods"
+
+So this API proposal makes the following changes:
+
+- The Delegated Content Routing API is defined using HTTP semantics, and can be implemented without introducing Reframe concepts or IPLD
+- There is a clear distinction between the RPC protocol (HTTP) and the API (Delegated Content Routing)
+- "Method names" and cache-relevant parameters are pushed into the URL path
+- Streaming support is removed, and default response size limits are added.
+  - We will add streaming support in a subsequent IPIP, but we are trying to minimize the scope of this IPIP to what is immediately useful
+- Bodies are encoded using idiomatic JSON, instead of using IPLD codecs, and are compatible with OpenAPI specifications
+- The JSON uses human-readable string encodings of common data types
+  - CIDs are encoded as CIDv1 strings with a multibase prefix (e.g. base32), for consistency with CLIs, browsers, and [gateway URLs](https://docs.ipfs.io/how-to/address-ipfs-on-web/)
+  - Multiaddrs use the [human-readable format](https://github.com/multiformats/multiaddr#specification) that is used in existing tools and Kubo CLI commands such as `ipfs id` or `ipfs swarm peers`
+  - Byte array values, such as signatures, are multibase-encoded strings (with an `m` prefix indicating Base64)
+- The "Identify" method and "message groups" are not included
+- The "GetIPNS" and "PutIPNS" methods are not included
+
+### User benefit
+
+The cost of building and operating content routing services will be much lower, as developers will be able to maximally reuse existing industry-standard tooling.
+Users will not need to learn a new RPC protocol and tooling to consume or expose the API.
+This will result in more content routing providers, each providing a better experience for users, driving down content routing latency across the IPFS network
+and increasing data availability.
+
+### Compatibility
+
+#### Backwards Compatibility
+
+IPFS Stewards will implement this API in [go-delegated-routing](https://github.com/ipfs/go-delegated-routing), using breaking changes in a new minor version.
+Because the existing Reframe spec can't be safely used in JavaScript and we won't be investing time and resources into changing the wire format implemented in edelweiss to fix it,
+the experimental support for Reframe in Kubo will be deprecated in the next release and delegated content routing will subsequently use this HTTP API.
+We may decide to re-add Reframe support in the future once these issues have been resolved.
+
+#### Forwards Compatibility
+
+Standard HTTP mechanisms for forward compatibility are used:
+
+- The API is versioned using a version number prefix in the path
+- The `Accept` and `Content-Type` headers are used for content type negotiation, allowing for backwards-compatible additions of new MIME types, hypothetically such as:
+  - `application/cbor` for binary-encoded responses
+  - `application/x-ndjson` for streamed responses
+  - `application/octet-stream` if the content router can provide the content/block directly
+- New paths and methods can be introduced in a backwards-compatible way
+- Parameters can be added using either new query parameters or new fields in the request/response body.
+- Provider records are both opaque and versioned to allow evolution of schemas and semantics for the same transfer protocol
+
+As a proof-of-concept, the tests for the initial implementation of this HTTP API were successfully run over a libp2p transport using [libp2p/go-libp2p-http](https://github.com/libp2p/go-libp2p-http), demonstrating the viability of also using this API over libp2p.
+
+### Security
+
+- All CID requests are sent to a central HTTPS endpoint as plain text, with TLS being the only protection against third-party observation.
+- While privacy is not a concern in the current version, plans are underway to add a separate endpoint that prioritizes lookup privacy. Progress can be followed in the related pre-work in [IPIP-373 (double hashed DHT)](https://github.com/ipfs/specs/pull/373/) and [ipni#5 (reader privacy in indexers)](https://github.com/ipni/specs/pull/5).
+- The usual JSON parsing rules apply. To prevent potential Denial of Service (DoS) attacks, clients should ignore responses with more than 100 providers and introduce a byte size limit that is applicable to their use case.
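The Security guidance above can be illustrated with a minimal Python client sketch for `GET /routing/v1/providers/{CID}`. It applies the spec's status-code semantics, where `404` means no matching records. It also applies the client-side limits: a response listing more than 100 providers is ignored entirely, and the body is capped at a byte limit. The 1 MiB cap, the endpoint argument, and the names `fetch_providers`, `parse_providers`, and `ResponseRejected` are illustrative assumptions, not part of the spec.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

MAX_PROVIDERS = 100            # response limit stated by the spec
MAX_RESPONSE_BYTES = 1 << 20   # assumption: 1 MiB cap, tune per use case


class ResponseRejected(Exception):
    """Raised when a response violates the client's safety limits."""


def parse_providers(status: int, body: bytes, max_bytes: int = MAX_RESPONSE_BYTES) -> list:
    """Apply the spec's status-code semantics and the client-side limits."""
    if status == 404:
        # 404 means "no matching records", not a transport failure.
        return []
    if status != 200:
        raise ResponseRejected(f"unexpected status {status}")
    if len(body) > max_bytes:
        raise ResponseRejected("response exceeds byte limit")
    providers = json.loads(body).get("Providers") or []
    if len(providers) > MAX_PROVIDERS:
        # Ignore the whole response rather than silently truncating it.
        raise ResponseRejected("response exceeds provider limit")
    return providers


def fetch_providers(endpoint: str, cid: str, timeout: float = 10.0) -> list:
    """GET {endpoint}/routing/v1/providers/{cid} and parse the result."""
    url = f"{endpoint.rstrip('/')}/routing/v1/providers/{urllib.parse.quote(cid)}"
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            # Read one byte past the limit so an oversized body is detected
            # without buffering an unbounded payload.
            body = resp.read(MAX_RESPONSE_BYTES + 1)
            return parse_providers(resp.status, body)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; route the status through the same rules.
        return parse_providers(err.code, b"")
```

This sketch reports a `429` as `ResponseRejected`. A real client would probably read the `Retry-After` header and back off before retrying.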
+
+### Alternatives
+
+- Reframe (general-purpose RPC) was evaluated; see the "Design rationale" section for why it was not selected.
+
+### Copyright
+
+Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
diff --git a/routing/DELEGATED_CONTENT_ROUTING_HTTP.md b/routing/DELEGATED_CONTENT_ROUTING_HTTP.md
new file mode 100644
index 00000000..1f441230
--- /dev/null
+++ b/routing/DELEGATED_CONTENT_ROUTING_HTTP.md
@@ -0,0 +1,175 @@
+# Delegated Content Routing HTTP API
+
+![reliable](https://img.shields.io/badge/status-reliable-green.svg?style=flat-square) Delegated Content Routing HTTP API
+
+**Author(s)**:
+
+- Gus Eggert
+
+**Maintainer(s)**:
+
+* * *
+
+**Abstract**
+
+"Delegated content routing" is a mechanism that IPFS implementations can use to offload content routing to another process/server. This spec describes an HTTP API for delegated content routing.
+
+## API Specification
+
+The Delegated Content Routing HTTP API uses the `application/json` content type by default.
+
+As such, human-readable encodings of types are preferred. This spec may be updated in the future with a compact `application/cbor` encoding, in which case compact encodings of the various types would be used.
+
+## Common Data Types
+
+- CIDs are always string-encoded using a [multibase](https://github.com/multiformats/multibase)-encoded [CIDv1](https://github.com/multiformats/cid#cidv1).
+- Multiaddrs are string-encoded according to the [human-readable multiaddr specification](https://github.com/multiformats/multiaddr#specification)
+- Peer IDs are string-encoded according to the [PeerID string representation specification](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#string-representation)
+- Multibase bytes are string-encoded according to [the Multibase spec](https://github.com/multiformats/multibase), and *should* use base64.
+- Timestamps are Unix millisecond epoch timestamps
+
+Until required for business logic, servers should treat these types as opaque strings, and should preserve unknown JSON fields.
+
+### Versioning
+
+This API uses a standard version prefix in the path, such as `/v1/...`. If a backwards-incompatible change must be made, then the version number should be increased.
+
+### Provider Records
+
+A provider record contains information about a content provider, including the transfer protocol and any protocol-specific information useful for fetching the content from the provider.
+
+The information required to write a record to a router (*"write" provider records*) may differ from the information returned when reading provider records (*"read" provider records*).
+
+For example, indexers may require a signature in `bitswap` write records to authenticate the peer contained in the record, but the read records may not include this authentication information.
+
+Both read and write provider records have a minimal required schema as follows:
+
+```json
+{
+  "Protocol": "",
+  "Schema": "",
+  ...
+}
+```
+
+Where:
+
+- `Protocol` is the multicodec name of the transfer protocol or an opaque string (for experimenting with novel protocols without a multicodec)
+- `Schema` denotes the schema to use for encoding/decoding the record
+  - This is separate from the `Protocol` to allow this HTTP API to evolve independently of the transfer protocol
+  - Implementations should switch on this when parsing records, not on `Protocol`
+- `...` denotes opaque JSON, which may contain information specific to the transfer protocol
+
+Specifications for some transfer protocols are provided in the "Known Transfer Protocols" section.
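The next sketch combines two rules stated above: switch on `Schema` rather than `Protocol`, and preserve unknown JSON fields. The Python decoder below understands the `bitswap` schema (listed under "Known Transfer Protocols") and carries any other record through untouched. The class and function names are hypothetical. A real client would add decoders for whichever schemas it supports.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union

_BITSWAP_KEYS = {"Protocol", "Schema", "ID", "Addrs"}


@dataclass
class BitswapRecord:
    protocol: str
    id: str
    addrs: List[str]
    # Fields this client does not understand, kept so they survive re-encoding.
    extra: Dict[str, Any] = field(default_factory=dict)


@dataclass
class OpaqueRecord:
    # A record whose Schema this client does not know, carried verbatim.
    raw: Dict[str, Any]


Record = Union[BitswapRecord, OpaqueRecord]


def decode_record(obj: Dict[str, Any]) -> Record:
    # Choose the decoder from Schema only; Protocol never selects the decoder.
    if obj.get("Schema") == "bitswap":
        return BitswapRecord(
            protocol=obj.get("Protocol", ""),
            id=obj["ID"],
            addrs=list(obj.get("Addrs") or []),
            extra={k: v for k, v in obj.items() if k not in _BITSWAP_KEYS},
        )
    return OpaqueRecord(raw=dict(obj))


def encode_record(rec: Record) -> Dict[str, Any]:
    if isinstance(rec, BitswapRecord):
        return {
            **rec.extra,
            "Protocol": rec.protocol,
            "Schema": "bitswap",
            "ID": rec.id,
            "Addrs": rec.addrs,
        }
    return dict(rec.raw)
```

In this sketch, a record with `"Protocol": "transport-bitswap"` but an unrecognized `Schema` is not decoded as Bitswap. This lets the record format evolve without breaking older clients.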
+
+## API
+
+### `GET /routing/v1/providers/{CID}`
+
+#### Response codes
+
+- `200` (OK): the response body contains 0 or more records
+- `404` (Not Found): must be returned if no matching records are found
+- `422` (Unprocessable Entity): request does not conform to schema or semantic constraints
+
+#### Response Body
+
+```json
+{
+  "Providers": [
+    {
+      "Protocol": "",
+      "Schema": "",
+      ...
+    }
+  ]
+}
+```
+
+Response limit: 100 providers
+
+Each object in the `Providers` list is a *read provider record*.
+
+## Pagination
+
+This API does not support pagination, but optional pagination can be added in a backwards-compatible spec update.
+
+## Streaming
+
+This API does not currently support streaming; however, it can be added in the future through a backwards-compatible update by using a content type other than `application/json`.
+
+## Error Codes
+
+- `501` (Not Implemented): must be returned if a method/path is not supported
+- `429` (Too Many Requests): may be returned along with an optional [Retry-After](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After) header to indicate to the caller that it is issuing requests too quickly
+- `400` (Bad Request): must be returned if an unknown path is requested
+
+## CORS and Web Browsers
+
+Browser interoperability requires implementations to support
+[CORS](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS).
+
+JavaScript clients running on a third-party Origin must be able to send HTTP
+requests to the endpoints defined in this specification, and read the received
+values. This means that HTTP servers implementing this API must (1) support
+[CORS preflight requests](https://developer.mozilla.org/en-US/docs/Glossary/Preflight_request)
+sent as HTTP OPTIONS, and (2) always respond with headers that remove CORS
+limits, allowing every site to query the API for results:
+
+```plaintext
+Access-Control-Allow-Origin: *
+Access-Control-Allow-Methods: GET, OPTIONS
+```
+
+## Known Transfer Protocols
+
+This section contains a non-exhaustive list of known transfer protocols (by name) that may be supported by clients and servers.
+
+### Bitswap
+
+- Multicodec name: `transport-bitswap`
+- Schema: `bitswap`
+- Specification: [ipfs/specs/BITSWAP.md](https://github.com/ipfs/specs/blob/main/BITSWAP.md)
+
+#### Bitswap Read Provider Records
+
+```json
+{
+  "Protocol": "transport-bitswap",
+  "Schema": "bitswap",
+  "ID": "12D3K...",
+  "Addrs": ["/ip4/..."]
+}
+```
+
+- `ID`: the [Peer ID](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md) to contact
+- `Addrs`: a list of known multiaddrs for the peer
+  - This list may be incomplete or incorrect and should only be treated as *hints* to improve performance by avoiding extra peer lookups
+
+The server should respect a passed `transport` query parameter by filtering against the `Addrs` list.
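The spec does not define the exact format of the `transport` query parameter. The Python sketch below makes three assumptions about it:

- It carries a multiaddr protocol name, such as `tcp` or `quic-v1`.
- It may be repeated.
- An address matches if any requested name appears among that address's protocols.

`NO_VALUE_PROTOCOLS` is a deliberately small, illustrative subset of the multiaddr protocol table, and all function names are hypothetical.

```python
from typing import Any, Dict, Iterable, List
from urllib.parse import parse_qs, urlsplit

# Assumption: an illustrative subset of multiaddr protocols that take no value
# in the human-readable form (e.g. "/quic-v1" or "/ws").
NO_VALUE_PROTOCOLS = {
    "quic", "quic-v1", "webtransport", "ws", "wss", "tls", "noise",
    "http", "https", "p2p-circuit", "webrtc", "webrtc-direct",
}


def multiaddr_protocols(addr: str) -> List[str]:
    """List the protocol names in a human-readable multiaddr string."""
    parts = addr.strip("/").split("/")
    names: List[str] = []
    i = 0
    while i < len(parts):
        names.append(parts[i])
        # Skip over the value segment unless this protocol has none.
        i += 1 if parts[i] in NO_VALUE_PROTOCOLS else 2
    return names


def filter_addrs(addrs: Iterable[str], wanted: Iterable[str]) -> List[str]:
    wanted_set = set(wanted)
    return [a for a in addrs if wanted_set.intersection(multiaddr_protocols(a))]


def apply_transport_filter(record: Dict[str, Any], request_url: str) -> Dict[str, Any]:
    """Return a bitswap read record with Addrs filtered by ?transport=..."""
    wanted = parse_qs(urlsplit(request_url).query).get("transport")
    if not wanted:
        return record  # no filter requested: return the record unchanged
    return {**record, "Addrs": filter_addrs(record.get("Addrs") or [], wanted)}
```

Filtering returns a copy of the record rather than mutating it. A server could then cache unfiltered records and apply the parameter per request.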
+
+### Filecoin Graphsync
+
+- Multicodec name: `transport-graphsync-filecoinv1`
+- Schema: `graphsync-filecoinv1`
+- Specification: [ipfs/go-graphsync/blob/main/docs/architecture.md](https://github.com/ipfs/go-graphsync/blob/main/docs/architecture.md)
+
+#### Filecoin Graphsync Read Provider Records
+
+```json
+{
+  "Protocol": "transport-graphsync-filecoinv1",
+  "Schema": "graphsync-filecoinv1",
+  "ID": "12D3K...",
+  "Addrs": ["/ip4/..."],
+  "PieceCID": "",
+  "VerifiedDeal": true,
+  "FastRetrieval": true
+}
+```
+
+- `ID`: the peer ID of the provider
+- `Addrs`: a list of known multiaddrs for the provider
+- `PieceCID`: the CID of the [piece](https://spec.filecoin.io/systems/filecoin_files/piece/#section-systems.filecoin_files.piece) within which the data is stored
+- `VerifiedDeal`: whether the deal corresponding to the data is verified
+- `FastRetrieval`: whether the provider claims there is an unsealed copy of the data available for fast retrieval
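The following is a minimal Python sketch of decoding a `graphsync-filecoinv1` read record into a typed value. It rejects records whose `VerifiedDeal` or `FastRetrieval` fields are not JSON booleans. The `rank_for_retrieval` ordering tries providers claiming `FastRetrieval` first, then `VerifiedDeal`. It is a hypothetical client policy used only for illustration; the spec does not prescribe how clients choose among providers.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class GraphsyncFilecoinV1Record:
    id: str
    addrs: Tuple[str, ...]
    piece_cid: str
    verified_deal: bool
    fast_retrieval: bool


def decode_graphsync(obj: Dict[str, Any]) -> GraphsyncFilecoinV1Record:
    if obj.get("Schema") != "graphsync-filecoinv1":
        raise ValueError(f"unexpected schema: {obj.get('Schema')!r}")
    for key in ("VerifiedDeal", "FastRetrieval"):
        # JSON true/false decode to Python bool; reject strings, numbers, or null.
        if not isinstance(obj.get(key), bool):
            raise ValueError(f"{key} must be a JSON boolean")
    return GraphsyncFilecoinV1Record(
        id=obj["ID"],
        addrs=tuple(obj.get("Addrs") or ()),
        piece_cid=obj["PieceCID"],
        verified_deal=obj["VerifiedDeal"],
        fast_retrieval=obj["FastRetrieval"],
    )


def rank_for_retrieval(records: List[GraphsyncFilecoinV1Record]) -> List[GraphsyncFilecoinV1Record]:
    # Hypothetical policy: False sorts before True, so negate the flags to put
    # fast-retrieval (then verified-deal) providers first; sorted() is stable.
    return sorted(records, key=lambda r: (not r.fast_retrieval, not r.verified_deal))
```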