IPIP-328: JSON and CBOR Response Formats on HTTP Gateways #328

Merged
merged 50 commits into from
Feb 13, 2023
bacd7d4
wip: cbor and json ipip
hacdias Oct 7, 2022
72b61e6
wip: update related issue
hacdias Oct 7, 2022
9c2aa8e
add new json and cbor response formats
hacdias Oct 18, 2022
2936f85
add removed header
hacdias Oct 18, 2022
bf97587
add summary and motivation
hacdias Oct 18, 2022
c442b6a
add detailed design
hacdias Oct 18, 2022
f0ca426
add design rationality and compatibility
hacdias Oct 18, 2022
bed5515
add some user benefit
hacdias Oct 18, 2022
be230bd
Update IPIP/0000-gateway-json-cbor-response-format.md
hacdias Oct 19, 2022
d7ff204
Update IPIP/0000-gateway-json-cbor-response-format.md
hacdias Oct 19, 2022
4c8e3ca
update application/json and /cbor
hacdias Oct 19, 2022
4563155
Update IPIP/0000-gateway-json-cbor-response-format.md
hacdias Oct 19, 2022
d2f4a61
Update IPIP/0000-gateway-json-cbor-response-format.md
hacdias Oct 19, 2022
948829b
"IPLD model" -> "IPLD Data Model"
hacdias Oct 19, 2022
08c3158
updated response payload
hacdias Oct 19, 2022
c0923c1
add some test fixtures
hacdias Oct 19, 2022
2cfe769
add links
hacdias Oct 19, 2022
c4f9fa5
add security section
hacdias Oct 20, 2022
a0c86bb
write alternatives section
hacdias Oct 20, 2022
f3dfb4b
Merge branch 'main' into feat/gateway-json-cbor
hacdias Nov 7, 2022
ce5eb3e
Merge branch 'main' into feat/gateway-json-cbor
hacdias Nov 10, 2022
628775a
fix: duplicate links
hacdias Nov 10, 2022
9581ac1
refactor: rename ipip
hacdias Nov 10, 2022
33b4bea
ipip: clarify no paths
hacdias Nov 11, 2022
378c5b9
info validation dag-json and dag-cbor
hacdias Nov 15, 2022
61f43ce
ipip: update path gateway sec
hacdias Nov 15, 2022
11a6b0d
Merge branch 'feat/gateway-json-cbor' of https://github.com/ipfs/spec…
hacdias Nov 15, 2022
566dabd
Update IPIP/0328-gateway-json-cbor-response-format.md
hacdias Nov 15, 2022
239b17b
Update IPIP/0328-gateway-json-cbor-response-format.md
hacdias Nov 16, 2022
79674a0
Update IPIP/0328-gateway-json-cbor-response-format.md
hacdias Nov 16, 2022
ea486b6
Update IPIP/0328-gateway-json-cbor-response-format.md
hacdias Nov 16, 2022
542315a
Update IPIP/0328-gateway-json-cbor-response-format.md
hacdias Nov 16, 2022
3a84f28
document current traversal behaviour
hacdias Nov 16, 2022
5720d6a
add toc link
hacdias Nov 16, 2022
79636c9
improve ipip and clarify things
hacdias Nov 16, 2022
ba09206
update path gateway
hacdias Nov 16, 2022
6d7f8f0
add 400
hacdias Nov 16, 2022
d8310e2
revert prev commit as it is already mentioned in traversal errors
hacdias Nov 16, 2022
a50536e
ipip-328: code review based on working code
lidel Nov 25, 2022
52a2caf
clarify alternatives section
hacdias Nov 28, 2022
51f4762
remove test fixtures todos
hacdias Nov 28, 2022
8007584
add UX and DX note
hacdias Nov 28, 2022
556e500
add some unixfs traversing info
hacdias Nov 28, 2022
b2ac007
ipip-328: fix title
lidel Nov 28, 2022
cfec9bd
fix: tar format
hacdias Dec 2, 2022
e4cbfd1
remove IPLD mentions from IPIP
hacdias Dec 6, 2022
e20095e
remove application/json and cbor accept types
hacdias Jan 19, 2023
46272ae
Revert "remove application/json and cbor accept types"
hacdias Jan 24, 2023
30f9631
Merge branch 'main' into feat/gateway-json-cbor
hacdias Jan 24, 2023
e61c242
ipip-328: final editorial pass
lidel Feb 13, 2023
174 changes: 174 additions & 0 deletions IPIP/0328-gateway-json-cbor-response-format.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,174 @@
# IPIP-328: JSON and CBOR Response Formats on HTTP Gateways

- Start Date: 2022-10-07
- Related Issues:
- [ipfs/in-web-browsers/issues/182]
- [ipfs/specs/pull/328]
- [ipfs/kubo/issues/8823]
- [ipfs/kubo/pull/9335]
- [ipfs/go-ipfs/issues/7552]

## Summary

Add support for the [DAG-JSON], [DAG-CBOR], JSON and CBOR response formats in
the [HTTP Gateway](../http-gateways/).

## Motivation

Currently, the gateway supports requesting data in the [DAG-PB], RAW, [CAR] and
TAR formats. In addition, it allows for traversing of links encoded through CBOR
Tag 42, as long as they are intermediate links, and not the final document.
It works on both DAG-CBOR, and its JSON representation, DAG-JSON. However, it
should be possible to download deserialized versions of the final JSON/CBOR document
in raw format (not wrapped in UnixFS).

The main functional gap in the IPFS ecosystem is the lack of support for
non-UnixFS DAGs on HTTP gateways. Users can create custom DAGs from traversable
DAG-CBOR documents (thanks to [CBOR tag 42 being reserved for CIDs][cbor-42])
and from DAG-JSON documents, but they cannot load the deserialized documents
from a local gateway, which severely limits the utility of non-UnixFS DAGs.

Adding JSON and CBOR response types will also benefit UnixFS. DAG-PB has a
[logical format][dag-pb-format] which makes it possible to represent a DAG-PB
directory as a [DAG-JSON] document. This means that, if we support DAG-JSON in
the gateway, then we would support
[JSON responses for directory listings][ipfs/go-ipfs/issues/7552], which has been
requested by our users in the past.

In addition, this functionality is already present on the current Kubo CLI. By
bringing it to the gateways, we provide users with more power when it comes
to storing and fetching CBOR and JSON in IPFS.

## Detailed design

The solution is to allow the Gateway to support serializing data as [DAG-JSON],
[DAG-CBOR], JSON and CBOR by requesting them using either the `Accept` HTTP header
or the `format` URL query. In addition, if the resolved CID is of one of the
aforementioned types, the gateway should be able to resolve them instead of
failing with `node type unknown`.

## Test fixtures

- [`bafybeiegxwlgmoh2cny7qlolykdf7aq7g6dlommarldrbm7c4hbckhfcke`][f-dag-pb] is a
DAG-PB directory.
- [`bafkreidmwhhm6myajxlpu7kofe3aqwf4ezxxn46cp5fko7mb6x74g4k5nm`][f-dag-pb-json]
is the aforementioned DAG-PB directory's [Logical DAG-JSON representation][dag-pb-format] that
is expected to be returned when using `?format=dag-json`.

## Design rationale

The current gateway already supports different response formats via the
`Accept` HTTP header and the `format` URL query. This IPIP proposes adding
JSON and CBOR formats to that list.

In addition, the current gateway already supports traversing through DAG-CBOR
and DAG-JSON links if they are intermediary documents. With this IPIP, we aim
to be able to download the DAG-CBOR, DAG-JSON, JSON and CBOR documents
themselves, with correct `Content-Type` headers.

### User benefit

The user benefits from this change as they will now be able to retrieve
content encoded in the traversable DAG-JSON and DAG-CBOR formats. This is
something that has been [requested before][ipfs/go-ipfs/issues/7552].

In addition, both UX and DX are significantly improved, since every UnixFS directory can
now be inspected in a regular web browser via `?format=json`. This can remove the
need for parsing the generated HTML directory listing.

### Compatibility

This IPIP adds new response types and does not modify existing ones,
making it a backwards-compatible change.

### Security

Serializers and deserializers for JSON and CBOR must follow the security
considerations of the original specifications, found in:

- [RFC 8259 (JSON), Section 12][rfc8259-sec12]
- [RFC 8949 (CBOR), Section 10][rfc8949-sec10]

DAG-JSON and DAG-CBOR follow the same security considerations as JSON and CBOR.
Note that DAG-JSON and DAG-CBOR are stricter subsets of JSON and CBOR, respectively.
Implementations must therefore follow the respective specification and return an
error if the payload does not conform:

- [DAG-JSON Spec][dag-json-spec]
- [DAG-CBOR Spec][dag-cbor-spec]

### Alternatives

#### Why four content types?

If we do not introduce DAG-JSON, DAG-CBOR, JSON and CBOR response formats in
the gateway, the usage of IPFS is restricted to files and directories represented
by the UnixFS (DAG-PB) codec. Therefore, if a user wants to store JSON and/or CBOR
in IPFS, they have to wrap it as a UnixFS file in order to be able to fetch it
through the gateway. That adds size and processing overhead.

In addition, we could introduce only DAG-JSON and DAG-CBOR. However, not
supporting the generic variants, JSON and CBOR, would lead to poor UX. The
ability to retrieve DAG-JSON as `application/json` is an important step
for the interoperability of the HTTP Gateway with web browsers and other tools
that expect specific Content Types. Namely, `Content-Type: application/json` with
`Content-Disposition: inline` allows for JSON preview to be rendered in a web browser
and webdev tools.

#### Why is JSON/CBOR pathing limited to full blocks?

Finally, we considered supporting pathing within both DAG and non-DAG variants
of the JSON and CBOR codecs. Pathing within these documents could lead to responses
with extracts from the document. For example, if we have the document:

```json
{
  "link": {
    "to": {
      "some": {
        "cid2": <cbor tag 42 pointing at different CID>
      }
    }
  }
}
```

With CID `bafy`, and we navigate to `/ipfs/bafy/link/to`, we would be able to
retrieve an extract from the document.

```json
{
  "some": {
    "cid2": <cbor tag 42 pointing at different CID>
  }
}
```

However, supporting this raises questions whose answers are not clearly defined
or agreed upon yet. Right now, pathing is only supported over CID-based Links,
such as Tag 42 in CBOR. In addition, some HTTP caching headers are derived from
the CID, and it is not clear how they should behave for extracted fragments.
Giving users the possibility to retrieve JSON, CBOR, DAG-JSON and DAG-CBOR
documents through the gateway is, in itself, progress and will open the door to
new tools and explorations.

### Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).

[cbor-42]: https://github.com/core-wg/yang-cbor/issues/13#issuecomment-524378859
[DAG-PB]: https://ipld.io/docs/codecs/known/dag-pb/
[dag-pb-format]: https://ipld.io/specs/codecs/dag-pb/spec/#logical-format
[DAG-JSON]: https://ipld.io/docs/codecs/known/dag-json/
[DAG-CBOR]: https://ipld.io/docs/codecs/known/dag-cbor/
[CAR]: https://ipld.io/specs/transport/car/
[ipfs/in-web-browsers/issues/182]: https://github.com/ipfs/in-web-browsers/issues/182
[ipfs/specs/pull/328]: https://github.com/ipfs/specs/pull/328
[ipfs/kubo/issues/8823]: https://github.com/ipfs/kubo/issues/8823
[ipfs/kubo/pull/9335]: https://github.com/ipfs/kubo/pull/9335
[ipfs/go-ipfs/issues/7552]: https://github.com/ipfs/go-ipfs/issues/7552
[f-dag-pb]: https://dweb.link/ipfs/bafybeiegxwlgmoh2cny7qlolykdf7aq7g6dlommarldrbm7c4hbckhfcke
[f-dag-pb-json]: https://dweb.link/ipfs/bafkreidmwhhm6myajxlpu7kofe3aqwf4ezxxn46cp5fko7mb6x74g4k5nm
[rfc8259-sec12]: https://datatracker.ietf.org/doc/html/rfc8259#section-12
[rfc8949-sec10]: https://datatracker.ietf.org/doc/html/rfc8949#section-10
[dag-json-spec]: https://ipld.io/specs/codecs/dag-json/spec/
[dag-cbor-spec]: https://ipld.io/specs/codecs/dag-cbor/spec/
113 changes: 80 additions & 33 deletions http-gateways/PATH_GATEWAY.md
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,8 @@ where client prefers to perform all validation locally.
- [Content resolution](#content-resolution)
- [Finding the content root](#finding-the-content-root)
- [Traversing remaining path](#traversing-remaining-path)
- [Traversing through UnixFS](#traversing-through-unixfs)
- [Traversing through DAG-JSON and DAG-CBOR](#traversing-through-dag-json-and-dag-cbor)
- [Handling traversal errors](#handling-traversal-errors)
- [Best practices for HTTP caching](#best-practices-for-http-caching)
- [Denylists](#denylists)
Expand Down Expand Up @@ -182,10 +184,10 @@ For example:
- [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw) – disables [IPLD codec deserialization](https://ipld.io/docs/codecs/), requests a verifiable raw [block](https://docs.ipfs.io/concepts/glossary/#block) to be returned
- [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) – disables [IPLD codec deserialization](https://ipld.io/docs/codecs/), requests a verifiable [CAR](https://docs.ipfs.io/concepts/glossary/#car) stream to be returned
- [application/x-tar](https://en.wikipedia.org/wiki/Tar_(computing)) – returns UnixFS tree (files and directories) as a [TAR](https://en.wikipedia.org/wiki/Tar_(computing)) stream. Returned tree starts at a root item whose name is the same as the requested CID. Produces 400 Bad Request for content that is not UnixFS.
<!-- TODO: https://github.com/ipfs/go-ipfs/issues/8823
- application/vnd.ipld.dag-json OR application/json – requests IPLD Data Model representation serialized into [DAG-JSON format](https://ipld.io/docs/codecs/known/dag-json/)
- application/vnd.ipld.dag-cbor OR application/cbor - requests IPLD Data Model representation serialized into [DAG-CBOR format](https://ipld.io/docs/codecs/known/dag-cbor/)
-->
- [application/vnd.ipld.dag-json](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-json) – requests [IPLD Data Model](https://ipld.io/docs/data-model/) representation serialized into [DAG-JSON format](https://ipld.io/docs/codecs/known/dag-json/). If the requested CID already has `dag-json` (0x0129) codec, data is validated as DAG-JSON before being returned as-is. Invalid DAG-JSON produces HTTP Error 500.
- [application/vnd.ipld.dag-cbor](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-cbor) – requests [IPLD Data Model](https://ipld.io/docs/data-model/) representation serialized into [DAG-CBOR format](https://ipld.io/docs/codecs/known/dag-cbor/). If the requested CID already has `dag-cbor` (0x71) codec, data is validated as DAG-CBOR before being returned as-is. Invalid DAG-CBOR produces HTTP Error 500.
- [application/json](https://www.iana.org/assignments/media-types/application/json) – same as `application/vnd.ipld.dag-json`, unless the CID's codec already is `json` (0x0200). Then, the raw JSON block can be returned as-is without any conversion.
- [application/cbor](https://www.iana.org/assignments/media-types/application/cbor) – same as `application/vnd.ipld.dag-cbor`, unless the CID's codec already is `cbor` (0x51). Then, the raw CBOR block can be returned as-is without any conversion.
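
The four JSON and CBOR entries above boil down to a small decision: pass the block through, validate it and then return it, or convert it. The following is a minimal sketch of that decision, not the behaviour of any particular gateway implementation. The function name `plan_response` and the labels `"as-is"`, `"validate"` and `"convert"` are hypothetical; only the media types and codec codes come from the list above.

```python
# Multicodec codes as quoted in the list above.
DAG_JSON, DAG_CBOR, JSON, CBOR = 0x0129, 0x71, 0x0200, 0x51

# Generic media type -> (codec passed through untouched, DAG media type it otherwise aliases).
GENERIC = {
    "application/json": (JSON, "application/vnd.ipld.dag-json"),
    "application/cbor": (CBOR, "application/vnd.ipld.dag-cbor"),
}

# DAG media type -> codec whose blocks only need validation, not conversion.
DAG = {
    "application/vnd.ipld.dag-json": DAG_JSON,
    "application/vnd.ipld.dag-cbor": DAG_CBOR,
}


def plan_response(cid_codec: int, accept: str) -> str:
    """Return a hypothetical action label for one of the four JSON/CBOR media types.

    "as-is":    raw block returned without conversion (json/cbor codecs)
    "validate": block validated as DAG-JSON/DAG-CBOR, then returned as-is
                (invalid data would produce HTTP 500)
    "convert":  data serialized into the requested DAG format
    """
    if accept in GENERIC:
        passthrough_codec, dag_type = GENERIC[accept]
        if cid_codec == passthrough_codec:
            return "as-is"
        accept = dag_type  # otherwise "same as" the DAG variant
    if accept not in DAG:
        raise ValueError(f"not a JSON/CBOR response type: {accept}")
    return "validate" if cid_codec == DAG[accept] else "convert"
```

For example, a `dag-pb` (0x70) directory requested with `application/vnd.ipld.dag-json` falls into the `"convert"` case, which is how the JSON directory listings mentioned in the IPIP would be produced.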

### `Range` (request header)

Expand Down Expand Up @@ -246,11 +248,14 @@ parameter, if present)

Optional, `format=<format>` can be used to request specific response format.

This is a URL-friendly alternative to sending
`Accept: application/vnd.ipld.<format>` header, see [`Accept`](#accept-request-header)
for more details.

In case of `Accept: application/x-tar`, the `?format=` equivalent is `tar`.
This is a URL-friendly alternative to sending an [`Accept`](#accept-request-header) header. These are the equivalents:
- `format=raw` → `Accept: application/vnd.ipld.raw`
- `format=car` → `Accept: application/vnd.ipld.car`
- `format=tar` → `Accept: application/x-tar`
- `format=dag-json` → `Accept: application/vnd.ipld.dag-json`
- `format=dag-cbor` → `Accept: application/vnd.ipld.dag-cbor`
- `format=json` → `Accept: application/json`
- `format=cbor` → `Accept: application/cbor`
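
The equivalence list above can be expressed as a lookup table. The sketch below is illustrative only: the helper name `accept_for_url` is hypothetical, and returning `None` for an unknown `format` value is an assumption of this example rather than something this section specifies.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Direct transcription of the equivalence list above.
FORMAT_TO_ACCEPT = {
    "raw": "application/vnd.ipld.raw",
    "car": "application/vnd.ipld.car",
    "tar": "application/x-tar",
    "dag-json": "application/vnd.ipld.dag-json",
    "dag-cbor": "application/vnd.ipld.dag-cbor",
    "json": "application/json",
    "cbor": "application/cbor",
}


def accept_for_url(url: str) -> Optional[str]:
    """Return the Accept value equivalent to the URL's ?format= parameter.

    Returns None when the parameter is absent or not in the table
    (how a gateway treats unknown values is not covered by this sketch).
    """
    values = parse_qs(urlsplit(url).query).get("format")
    if not values:
        return None
    return FORMAT_TO_ACCEPT.get(values[0])
```

With this table, `https://example.com/ipfs/<cid>?format=dag-json` and a request to the same path carrying `Accept: application/vnd.ipld.dag-json` ask for the same response type. The host name here is a placeholder.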

<!-- TODO Planned: https://github.com/ipfs/go-ipfs/issues/8769
- `selector=<cid>` can be used for passing a CID with [IPLD selector](https://ipld.io/specs/selectors)
Expand Down Expand Up @@ -584,24 +589,38 @@ A good practice is to always return it with HTTP error [status codes](#response-

## Response Payload

Data sent with HTTP response depends on the type of requested IPFS resource:
Data sent with HTTP response depends on the type of the requested IPFS resource, and the requested response type.

By default, implicit deserialized response type is based on `Accept` header and the codec of the resolved CID:

- UnixFS (implicit default)
- File
- Bytes representing file contents
- UnixFS, either `dag-pb` (0x70) or `raw` (0x55)
- File or `raw` block
- Bytes representing file/block contents
- When `Range` is present, only the requested byte range is returned.
- Directory
- Generated HTML with directory index (see [additional notes here](#generated-html-with-directory-index))
- When `index.html` is present, gateway can skip generating directory index and return it instead
- Raw block
- Opaque bytes, see [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw)
- CAR
- Arbitrary DAG as a verifiable CAR file or a stream, see [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)
- TAR
- Deserialized UnixFS files and directories as a TAR file or a stream, see [application/x-tar](https://en.wikipedia.org/wiki/Tar_(computing))
<!-- TODO: https://github.com/ipfs/go-ipfs/issues/8823
- dag-json / dag-cbor
- See [https://github.com/ipfs/go-ipfs/issues/8823](https://github.com/ipfs/go-ipfs/issues/8823)
-->
- When `index.html` is present, gateway MUST skip generating directory index and return content from `index.html` instead.
- JSON (0x0200)
- Bytes representing a JSON file, see [application/json](https://www.iana.org/assignments/media-types/application/json).
- Works exactly the same as `raw`, but returned `Content-Type` is `application/json`
- CBOR (0x51)
- Bytes representing a CBOR file, see [application/cbor](https://www.iana.org/assignments/media-types/application/cbor)
- Works exactly the same as `raw`, but returned `Content-Type` is `application/cbor`
- DAG-JSON (0x0129)
- If the `Accept` header includes `text/html`, implementation should return a generated HTML with options to download DAG-JSON as-is, or converted to DAG-CBOR.
- Otherwise, response works exactly the same as `raw` block, but returned `Content-Type` is [application/vnd.ipld.dag-json](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-json)
- DAG-CBOR (0x71)
- If the `Accept` header includes `text/html`: implementation should return a generated HTML with options to download DAG-CBOR as-is, or converted to DAG-JSON.
- Otherwise, response works exactly the same as `raw` block, but returned `Content-Type` is [application/vnd.ipld.dag-cbor](https://www.iana.org/assignments/media-types/application/vnd.ipld.dag-cbor)
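
For the four JSON and CBOR codecs, the implicit defaults above amount to a mapping from the resolved CID's codec to a `Content-Type`, with an HTML page offered for the DAG variants when the client accepts `text/html`. A rough sketch follows. The function name `default_content_type` is hypothetical, and the substring check on the `Accept` header is a simplification made for this example, not full HTTP content negotiation.

```python
# Multicodec codes as quoted in the list above.
DAG_JSON, DAG_CBOR, JSON, CBOR = 0x0129, 0x71, 0x0200, 0x51

DEFAULT_TYPES = {
    JSON: "application/json",
    CBOR: "application/cbor",
    DAG_JSON: "application/vnd.ipld.dag-json",
    DAG_CBOR: "application/vnd.ipld.dag-cbor",
}


def default_content_type(cid_codec: int, accept: str = "") -> str:
    """Pick the implicit Content-Type when no explicit format was requested."""
    if cid_codec not in DEFAULT_TYPES:
        raise ValueError(f"codec {cid_codec:#x} is outside this sketch")
    # DAG-JSON and DAG-CBOR get a generated HTML page with download options
    # when the client (typically a browser) accepts text/html.
    if cid_codec in (DAG_JSON, DAG_CBOR) and "text/html" in accept:
        return "text/html"
    # Otherwise the bytes are returned like a raw block, with a codec-specific type.
    return DEFAULT_TYPES[cid_codec]
```

Plain `json` and `cbor` blocks never take the HTML branch in this sketch, which matches the list above: only the DAG variants mention a generated HTML page.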

The following response types require an explicit opt-in and can only be requested with the [`format`](#format-request-query-parameter) query parameter or the [`Accept`](#accept-request-header) header:

- Raw Block (`?format=raw`)
- Opaque bytes, see [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw).
- CAR (`?format=car`)
- Arbitrary DAG as a verifiable CAR file or a stream, see [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car).
- TAR (`?format=tar`)
- Deserialized UnixFS files and directories as a TAR file or a stream, see [IPIP-288](https://github.com/ipfs/specs/pull/288)

# Appendix: notes for implementers

Expand All @@ -627,13 +646,32 @@ and [DNSLINK_GATEWAY.md](./DNSLINK_GATEWAY.md)).

### Traversing remaining path

UnixFS pathing over files and directories is the implicit default used for
resolving content paths that start with `/ipfs/` and `/ipns/`. It allows for
traversal based on link names, which provides a better user experience than
low level logical pathing from IPLD:
After the content root CID is found, the remainder of the path should be traversed
and resolved. Depending on the data type, that may occur through UnixFS pathing,
or through DAG-JSON and DAG-CBOR pathing.

### Traversing through UnixFS

UnixFS is an abstraction over the low level [logical DAG-PB pathing][dag-pb-format]
from IPLD, providing a better user experience:

- Example of UnixFS pathing: `/ipfs/cid/dir-name/file-name.txt`

For more details regarding DAG-PB pathing, please read the "Path Resolution" section
of [this document](https://ipld.io/design/tricky-choices/dag-pb-forms-impl-and-use/#path-resolution).

### Traversing through DAG-JSON and DAG-CBOR

Traversing through [DAG-JSON][dag-json] and [DAG-CBOR][dag-cbor] is possible
through fields that encode a link:

- DAG-JSON: links are represented as a base encoded CID under the `/` reserved
namespace, see [specification](https://ipld.io/specs/codecs/dag-json/spec/#links).
- DAG-CBOR: links are tagged with CBOR tag 42, indicating that they encode a CID,
see [specification](https://ipld.io/specs/codecs/dag-cbor/spec/#links).

Note: pathing into [IPLD Kind](https://ipld.io/docs/data-model/kinds/) other than Link (CID) is not supported at the moment. Implementations should return HTTP 501 Not Implemented when fully resolved content path has any remainder left. This feature may be specified in a future [IPIP that introduces data onboarding](https://github.com/ipfs/in-web-browsers/issues/189) and [IPLD Patch](https://ipld.io/specs/patch/) semantics.
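
To make the link-only rule concrete, here is a simplified sketch of traversal over DAG-JSON blocks. It assumes a caller-supplied `load_block` function that returns the bytes of a block for a CID string; that function, the names `link_target` and `traverse`, and the use of `LookupError` for a missing field are all hypothetical. The DAG-JSON link shape `{"/": "<cid>"}` comes from the specification linked above. Mapping a missing field to an HTTP status is left to the traversal error handling described below.

```python
import json
from typing import Callable, List, Optional, Tuple


def link_target(value) -> Optional[str]:
    """Return the CID string if value is a DAG-JSON link ({"/": "<cid>"})."""
    if isinstance(value, dict) and len(value) == 1 and isinstance(value.get("/"), str):
        return value["/"]
    return None


def traverse(root_cid: str, segments: List[str],
             load_block: Callable[[str], bytes]) -> Tuple[int, str]:
    """Follow path segments through DAG-JSON blocks.

    Returns (200, cid) when the path ends exactly on a block root, and
    (501, cid) when the path ends inside a block, i.e. on a kind other
    than Link.
    """
    cid = root_cid
    node = json.loads(load_block(cid))
    at_block_root = True
    for segment in segments:
        if not isinstance(node, dict) or segment not in node:
            raise LookupError(f"no field {segment!r} under {cid}")
        node = node[segment]
        at_block_root = False
        target = link_target(node)
        if target is not None:
            # Crossing a link: continue from the root of the linked block.
            cid = target
            node = json.loads(load_block(cid))
            at_block_root = True
    return (200, cid) if at_block_root else (501, cid)
```

The same shape would apply to DAG-CBOR, with the link test replaced by a check for CBOR tag 42 in the decoded data.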

### Handling traversal errors

Gateway MUST respond with HTTP error when it is not possible to traverse the requested content path:
Expand Down Expand Up @@ -693,15 +731,24 @@ It should be always fast, even when a directory has 10k of items.
The usual optimizations involve:

- Skipping size and type resolution for child UnixFS items, and using `Tsize`
from [logical format](https://ipld.io/specs/codecs/dag-pb/spec/#logical-format)
instead, allows gateway to respond much faster, as it no longer need to fetch
root nodes of child items.
- Additional information about child nodes can be fetched lazily
with JS, but only for items in the browser's viewport.
from [logical format][dag-pb-format] instead, allows the gateway to respond much
faster, as it no longer needs to fetch root nodes of child items.
- Instead of showing "file size" GUIs should show "IPFS DAG size". This
remains useful for quick inspection, but does not require fetching child
blocks, making directory listing fast, even with tens of thousands of
blocks. Example with 10k items:
`bafybeiggvykl7skb2ndlmacg2k5modvudocffxjesexlod2pfvg5yhwrqm`.
- Additional information about child nodes, such as exact file size without
DAG overhead, can be fetched lazily with JS, but only for items in the
browser's viewport.

- Alternative approach is resolving child items, but providing pagination UI.
- Opening a big directory can return HTTP 302 to the current URL with
additional query parameters (`?page=0&limit=100`),
limiting the cost of a single page load.
- The downside of this approach is that it will always be slower than
skipping child block resolution.
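
As an illustration of the pagination alternative, the sketch below computes the target of such a redirect. The helper name `pagination_redirect` is hypothetical, and so is the choice to skip the redirect once both parameters are present. Only the parameter names and the example values `page=0` and `limit=100` come from the text above.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def pagination_redirect(url: str, page: int = 0, limit: int = 100) -> Optional[str]:
    """Return the URL to redirect to (HTTP 302), or None if already paginated."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    if "page" in query and "limit" in query:
        return None  # request already carries pagination parameters
    query.setdefault("page", str(page))
    query.setdefault("limit", str(limit))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Existing query parameters are preserved, so the redirect only adds the pagination parameters that are missing.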

[dag-pb-format]: https://ipld.io/specs/codecs/dag-pb/spec/#logical-format
[dag-json]: https://ipld.io/specs/codecs/dag-json/spec/
[dag-cbor]: https://ipld.io/specs/codecs/dag-cbor/spec/