Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed IPLD changes #111

Closed
wants to merge 1 commit into from
Closed

Proposed IPLD changes #111

wants to merge 1 commit into from

Conversation

davidar
Copy link
Member

@davidar davidar commented Jun 13, 2016

This patch fixes some of the inconsistencies in the IPLD spec, as well as resolving some existing issues, namely:

Also some possible changes to the examples, but these are more to do with suggested convention rather than IPLD itself:

  • change link to content (although we could also use data or something similar), as file content needn't be merkle-links, and can in fact be directly embedded into small files
  • removed the extraneous subfiles wrapper, as file content can be either (a merkle-link to) a bytestring, a unicode string, or a list thereof

I can split these into separate PRs if that would be more helpful.

This patch fixes some of the inconsistencies in the IPLD spec, as well as resolving some existing issues, namely:

- ipfs/kubo#2053 - allow raw data in leaf nodes
- ipfs/kubo#1582 - use EJSON for encoding (non-UTF8) binary data
- mandate CBOR Strict Mode to avoid having to deal with duplicated keys
- change `link` to `content` (although we could also `data` or something similar), as file content needn't be merkle-links, and can in fact be directly embedded into small files
- removed the extraneous `subfiles` wrapper, as file content can be either (a merkle-link to) a bytestring, a unicode string, or a list thereof
@jbenet jbenet added the status/deferred Conscious decision to pause or backlog label Jun 13, 2016
@davidar
Copy link
Member Author

davidar commented Jun 14, 2016

@Kubuxu pointed out that this depends on whether the multicodec header is appended to data prior to hashing and storage. I'm assuming this will be the case in general (@jbenet please correct me if I've misunderstood), but I'd like to argue for an exemption for raw binary data (/bin).

Specifically, I propose that only the raw data (without the multicodec header) be stored in the repo (for the reasons discussed in ipfs/kubo#2053), with the /bin codec being marked out-of-band in some manner (cc @whyrusleeping). The multicodec header can be trivially added to the raw data when being transmitted over the wire.

I'd also like to propose that the hash for raw binary objects only be based on the raw data (I'm assuming the hash includes the multicodec header in general). This is mainly to promote interoperability with non-ipfs content addressing systems (cc @btrask). I think (?) this should be safe, as it wouldn't be possible for a peer to alter the codec without altering the hash anyway. (Alternatively multihash could reserve a bit to differentiate between raw hashes and multicodec-tagged-data hashes, perhaps?)

@btrask
Copy link

btrask commented Jun 16, 2016

Sounds promising. Thanks @davidar!

@mildred
Copy link
Contributor

mildred commented Jun 16, 2016

I noticed the use of EJSON which is very similar to the links we now have.

To make it simple, I saw in the documentation that EJSON could represent dates and binary data this way:

{
  "d": {"$date": 1358205756553},
  "b": {"$binary": "c3VyZS4="}
}

We do represent links this way:

{
  "content": {"/": "QmCCC...222"},
  ...
}

Byt then, why not be compatible with EJSON and implement it this way?

{
  "content": {"$type": "merkle-link", "$value": "QmCCC...222"}
}

Or:

{
  "content": {"$merkle-link", "QmCCC...222"}
}

Also, don't forget that escape is needed in case there is a legitimate key in JSON that is names "$date" for example. the EJSON way of escaping is:

{
  "d": {"$escape": {"$date": "not a date"}}
}

For the format, see this page where I found more information.

Mildred Ki'Lya mildred@mildred.fr

@mildred
Copy link
Contributor

mildred commented Jun 16, 2016

Having native formats without custom formatting is bound to simplify implementations a lot. Basically, we would just have either CBOR with extra tags (for dates and links) or EJSON with an extra type for links. We do not invent our own syntax over JSON, we reuse EJSON.

@davidar
Copy link
Member Author

davidar commented Jun 16, 2016

@mildred I don't see the $type or $escape keys you mention on the page you linked?

Edit: found $escape and $type, unfortunately it seems to be poorly documented (although there is this)

@mildred
Copy link
Contributor

mildred commented Jun 16, 2016

On Thu, 16 Jun 2016 03:08:12 -0700
David A Roberts notifications@github.com wrote:

@mildred I don't see the $type or $escape keys you mention on the page you linked?

See this code: https://github.com/primus/ejson/blob/master/index.js#L282-331

And this README: https://github.com/sixolet/ejson/blob/master/README.md

I don't really like that EJSON does not seem to be formally described

Mildred Ki'Lya mildred@mildred.fr

@mildred
Copy link
Contributor

mildred commented Jun 16, 2016

Please note that in my previous example { "content": {"$merkle-link", "QmCCC...222"} } is probably not compatible with existing EJSON libraries (although it looks better). The actual compatible syntax seems to be { "content": {"$type": "merkle-link", "$value": "QmCCC...222"} }

I prefer the first syntax because it doesn't introduce a difference with native types that are represented like { "d": {"$date": 1358205756553} }. It will also be easier to parse in a stream oriented parser. Unfortunately it is not compatible with EJSON.

I'd say: let's stick to EJSON as it is implemented (even though I would prefer a slightly different syntax), but make a formal specification (because EJSON has none).

@davidar
Copy link
Member Author

davidar commented Jun 17, 2016

At the risk of bikeshedding over the issue of how to represent merkle links in JSON, I'm inclined to agree with @mildred's suggestion that we use an existing convention like EJSON rather than inventing our own.

I'll also note that

{ "content": {"$type": "merkle-link", "$value": "QmCCC...222"} }

maps quite naturally into YAML

content: !merkle-link QmCCC...222

@whyrusleeping
Copy link
Member

ping @jbenet

@nicola
Copy link
Member

nicola commented Jul 11, 2016

how important is to be compatible with EJSON?

@davidar
Copy link
Member Author

davidar commented Jul 12, 2016

@nicola not sure, I just used it because it had already been mentioned in the original spec

Edit: We need some way of differentiating binary data from Unicode text to fix ipfs/kubo#1582 and EJSON seems sensible enough. Regarding merkle links, I'm inclined to agree with @mildred's argument, but I'm not overly fussed either way.

@nicola
Copy link
Member

nicola commented Jul 12, 2016

@davidar, to me ejson is just another encoding, that of course we should support. The way I see it is that a JSON with / and an EJSON with mildred construction should lead to the same CBOR, hence the same hash.

@davidar
Copy link
Member Author

davidar commented Jul 12, 2016

JSON with / and an EJSON with mildred construction should lead to the same CBOR, hence the same hash.

Yes, definitely.

@daviddias
Copy link
Member

Closing the PR here in favour of moving the conversation to ipld/specs.

Created a issue to make sure this is handled there - ipld/specs#39

@daviddias daviddias closed this Feb 13, 2017
@daviddias daviddias removed the status/deferred Conscious decision to pause or backlog label Feb 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants