Proposed IPLD changes #111

davidar · 2016-06-13T02:21:01Z

This patch fixes some of the inconsistencies in the IPLD spec, as well as resolving some existing issues, namely:

Blocks don't have a nice size kubo#2053 - allow raw data in leaf nodes
JSON encoding corrupts non-UTF-8 binary data kubo#1582 - use EJSON for encoding (non-UTF8) binary data
mandate CBOR Strict Mode to avoid having to deal with duplicated keys

Also some possible changes to the examples, but these are more to do with suggested convention rather than IPLD itself:

change link to content (although we could also use data or something similar), as file content needn't be merkle-links, and can in fact be directly embedded into small files
removed the extraneous subfiles wrapper, as file content can be either (a merkle-link to) a bytestring, a unicode string, or a list thereof

I can split these into separate PRs if that would be more helpful.

This patch fixes some of the inconsistencies in the IPLD spec, as well as resolving some existing issues, namely: - ipfs/kubo#2053 - allow raw data in leaf nodes - ipfs/kubo#1582 - use EJSON for encoding (non-UTF8) binary data - mandate CBOR Strict Mode to avoid having to deal with duplicated keys - change `link` to `content` (although we could also `data` or something similar), as file content needn't be merkle-links, and can in fact be directly embedded into small files - removed the extraneous `subfiles` wrapper, as file content can be either (a merkle-link to) a bytestring, a unicode string, or a list thereof

davidar · 2016-06-14T07:02:03Z

@Kubuxu pointed out that this depends on whether the multicodec header is appended to data prior to hashing and storage. I'm assuming this will be the case in general (@jbenet please correct me if I've misunderstood), but I'd like to argue for an exemption for raw binary data (/bin).

Specifically, I propose that only the raw data (without the multicodec header) be stored in the repo (for the reasons discussed in ipfs/kubo#2053), with the /bin codec being marked out-of-band in some manner (cc @whyrusleeping). The multicodec header can be trivially added to the raw data when being transmitted over the wire.

I'd also like to propose that the hash for raw binary objects only be based on the raw data (I'm assuming the hash includes the multicodec header in general). This is mainly to promote interoperability with non-ipfs content addressing systems (cc @btrask). I think (?) this should be safe, as it wouldn't be possible for a peer to alter the codec without altering the hash anyway. (Alternatively multihash could reserve a bit to differentiate between raw hashes and multicodec-tagged-data hashes, perhaps?)

btrask · 2016-06-16T06:21:52Z

Sounds promising. Thanks @davidar!

mildred · 2016-06-16T09:16:49Z

I noticed the use of EJSON which is very similar to the links we now have.

To make it simple, I saw in the documentation that EJSON could represent dates and binary data this way:

{
  "d": {"$date": 1358205756553},
  "b": {"$binary": "c3VyZS4="}
}

We do represent links this way:

{
  "content": {"/": "QmCCC...222"},
  ...
}

Byt then, why not be compatible with EJSON and implement it this way?

{
  "content": {"$type": "merkle-link", "$value": "QmCCC...222"}
}

Or:

{
  "content": {"$merkle-link", "QmCCC...222"}
}

Also, don't forget that escape is needed in case there is a legitimate key in JSON that is names "$date" for example. the EJSON way of escaping is:

{
  "d": {"$escape": {"$date": "not a date"}}
}

For the format, see this page where I found more information.

Mildred Ki'Lya mildred@mildred.fr

mildred · 2016-06-16T09:22:24Z

Having native formats without custom formatting is bound to simplify implementations a lot. Basically, we would just have either CBOR with extra tags (for dates and links) or EJSON with an extra type for links. We do not invent our own syntax over JSON, we reuse EJSON.

davidar · 2016-06-16T10:08:12Z

@mildred I don't see the $type or $escape keys you mention on the page you linked?

Edit: found $escape and $type, unfortunately it seems to be poorly documented (although there is this)

mildred · 2016-06-16T12:32:30Z

On Thu, 16 Jun 2016 03:08:12 -0700
David A Roberts notifications@github.com wrote:

@mildred I don't see the $type or $escape keys you mention on the page you linked?

See this code: https://github.com/primus/ejson/blob/master/index.js#L282-331

And this README: https://github.com/sixolet/ejson/blob/master/README.md

I don't really like that EJSON does not seem to be formally described

Mildred Ki'Lya mildred@mildred.fr

mildred · 2016-06-16T14:55:52Z

Please note that in my previous example { "content": {"$merkle-link", "QmCCC...222"} } is probably not compatible with existing EJSON libraries (although it looks better). The actual compatible syntax seems to be { "content": {"$type": "merkle-link", "$value": "QmCCC...222"} }

I prefer the first syntax because it doesn't introduce a difference with native types that are represented like { "d": {"$date": 1358205756553} }. It will also be easier to parse in a stream oriented parser. Unfortunately it is not compatible with EJSON.

I'd say: let's stick to EJSON as it is implemented (even though I would prefer a slightly different syntax), but make a formal specification (because EJSON has none).

davidar · 2016-06-17T04:05:48Z

At the risk of bikeshedding over the issue of how to represent merkle links in JSON, I'm inclined to agree with @mildred's suggestion that we use an existing convention like EJSON rather than inventing our own.

I'll also note that

{ "content": {"$type": "merkle-link", "$value": "QmCCC...222"} }

maps quite naturally into YAML

content: !merkle-link QmCCC...222

whyrusleeping · 2016-06-17T05:57:44Z

ping @jbenet

nicola · 2016-07-11T12:04:28Z

how important is to be compatible with EJSON?

davidar · 2016-07-12T09:14:39Z

@nicola not sure, I just used it because it had already been mentioned in the original spec

Edit: We need some way of differentiating binary data from Unicode text to fix ipfs/kubo#1582 and EJSON seems sensible enough. Regarding merkle links, I'm inclined to agree with @mildred's argument, but I'm not overly fussed either way.

nicola · 2016-07-12T09:17:23Z

@davidar, to me ejson is just another encoding, that of course we should support. The way I see it is that a JSON with / and an EJSON with mildred construction should lead to the same CBOR, hence the same hash.

davidar · 2016-07-12T09:48:12Z

JSON with / and an EJSON with mildred construction should lead to the same CBOR, hence the same hash.

Yes, definitely.

daviddias · 2017-02-13T15:35:16Z

Closing the PR here in favour of moving the conversation to ipld/specs.

Created a issue to make sure this is handled there - ipld/specs#39

jbenet added the status/deferred Conscious decision to pause or backlog label Jun 13, 2016

This was referenced Jun 15, 2016

Added a "raw" tag for an edge node in the chunked file example. #93

Closed

Blocks don't have a nice size ipfs/kubo#2053

Closed

This was referenced Jun 16, 2016

Add NodeReader interface ipld/go-ipld-deprecated#19

Closed

Add EJSON spec #116

Closed

daviddias mentioned this pull request Feb 13, 2017

Handle IPLD changes proposed by @davidar and @mildred ipld/specs#39

Closed

daviddias closed this Feb 13, 2017

daviddias removed the status/deferred Conscious decision to pause or backlog label Feb 13, 2017

thisconnect mentioned this pull request Aug 25, 2017

reading data via object get vs. cat vs. object data ipfs/kubo#4115

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposed IPLD changes #111

Proposed IPLD changes #111

davidar commented Jun 13, 2016 •

edited

Loading

davidar commented Jun 14, 2016 •

edited

Loading

btrask commented Jun 16, 2016

mildred commented Jun 16, 2016

mildred commented Jun 16, 2016

davidar commented Jun 16, 2016 •

edited

Loading

mildred commented Jun 16, 2016

mildred commented Jun 16, 2016

davidar commented Jun 17, 2016

whyrusleeping commented Jun 17, 2016

nicola commented Jul 11, 2016

davidar commented Jul 12, 2016 •

edited

Loading

nicola commented Jul 12, 2016 •

edited

Loading

davidar commented Jul 12, 2016

daviddias commented Feb 13, 2017

Proposed IPLD changes #111

Proposed IPLD changes #111

Conversation

davidar commented Jun 13, 2016 • edited Loading

davidar commented Jun 14, 2016 • edited Loading

btrask commented Jun 16, 2016

mildred commented Jun 16, 2016

mildred commented Jun 16, 2016

davidar commented Jun 16, 2016 • edited Loading

mildred commented Jun 16, 2016

mildred commented Jun 16, 2016

davidar commented Jun 17, 2016

whyrusleeping commented Jun 17, 2016

nicola commented Jul 11, 2016

davidar commented Jul 12, 2016 • edited Loading

nicola commented Jul 12, 2016 • edited Loading

davidar commented Jul 12, 2016

daviddias commented Feb 13, 2017

davidar commented Jun 13, 2016 •

edited

Loading

davidar commented Jun 14, 2016 •

edited

Loading

davidar commented Jun 16, 2016 •

edited

Loading

davidar commented Jul 12, 2016 •

edited

Loading

nicola commented Jul 12, 2016 •

edited

Loading