IPLD Data Model #8
Comments
The type information in CBOR is only meant for generic, universally understandable types. It allows tagging binary strings as bigints, or integers as timestamps. It doesn't allow arbitrary tagging of objects for application-specific purposes like Person. Note: there is little room to add new tags in CBOR. The number of tags is limited to a few code points, and additional codes should be registered with IANA. This is exactly why the Linked Data directives were created, to link object semantics (and not really typing) to the actual data. Semantics are defined by a type URI, like XML namespaces. The URI is the unique identifier that tells how to interpret the data. JSON-LD permits more than CBOR tags can (as for YAML, I don't know enough). The only problem with JSON-LD is that not everything can be encoded with it. Most JSON documents can be updated to fit the LD format, but this does not always come free. This is why we are trying to construct a JSON dialect that can encode both arbitrary JSON (without LD information) and JSON-LD, while allowing us to add meta information of our own on the data. |
Uh, the cbor spec says tags 256 to 18446744073709551615 are available for registration, so it's not that limited. In any case you only need one tag to specify (schema uri, data) pairs, which I doubt would be too hard to get registered. All I'm saying is I don't think ipld should be hobbled by json's deficiencies (lack of typing information or metadata in general), given that json is only one of several possible human readable representation formats, and json isn't even involved if you're decoding wire format directly into native data structures. If you want the full extensibility of jsonld, then you're free to use that on top of ipld. However, my personal opinion is that ipld at its core should only require the necessary features for it to be able to function properly, agnostic to any particular surface encoding or programming language data model (to the extent this is possible). Otherwise it's not future proof (first it was XML, now json, ...). |
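To make the "one tag for (schema uri, data) pairs" idea concrete, here is a minimal sketch of what such a value could look like on the wire. It hand-encodes a tiny subset of CBOR (non-negative integers, text strings, arrays, maps, tags) with the standard library only. The tag number `SCHEMA_TAG` and the schema URI are placeholders made up for illustration: no such tag is registered, and a real one would have to go through IANA.

```python
from dataclasses import dataclass
from typing import Any

# Placeholder tag number, NOT a registered CBOR tag (assumption for illustration).
SCHEMA_TAG = 70000


@dataclass
class Tagged:
    """A CBOR tagged value: a tag number wrapping one data item."""
    tag: int
    value: Any


def _head(major: int, n: int) -> bytes:
    """Encode the initial byte and argument for a CBOR major type."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 2**8:
        return bytes([(major << 5) | 24, n])
    if n < 2**16:
        return bytes([(major << 5) | 25]) + n.to_bytes(2, "big")
    if n < 2**32:
        return bytes([(major << 5) | 26]) + n.to_bytes(4, "big")
    if n < 2**64:
        return bytes([(major << 5) | 27]) + n.to_bytes(8, "big")
    raise ValueError("argument too large for CBOR")


def encode(value: Any) -> bytes:
    """Encode a small subset of values as CBOR."""
    if isinstance(value, Tagged):
        return _head(6, value.tag) + encode(value.value)   # major type 6: tag
    if isinstance(value, bool):
        raise TypeError("booleans are outside this sketch")
    if isinstance(value, int) and value >= 0:
        return _head(0, value)                             # major type 0: unsigned int
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw                    # major type 3: text string
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        items = b"".join(encode(k) + encode(v) for k, v in value.items())
        return _head(5, len(value)) + items                # major type 5: map
    raise TypeError(f"unsupported type: {type(value).__name__}")


# A (schema uri, data) pair wrapped in the single placeholder tag.
wire = encode(Tagged(SCHEMA_TAG, ["http://schema.org/Person", {"name": "David"}]))
print(wire.hex())
```

The schema identifier travels as ordinary data inside a two-element array, so only one tag number is needed no matter how many schemas exist.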
Another quote from the spec:
Which I think fits with what we're doing. |
Yes, that's not so limited indeed (I didn't know there were so many available IDs). And as you say, you can't just use one ID per schema. These are just raw integers. You actually need to embed the data in an array with the first item being the schema identifier. Do you know of a CBOR extension that allows that specifically? If so, that would be better than using JSON-LD in the first place. And we could output JSON-LD using a conversion step. I think our data model should allow the Linked Data model as a first class citizen. This model is quite universal, is used in many places, and there is an extensive vocabulary defined for it. If you look at what I already did on IPLD, I think most of it could be implemented as tagging: #7 (instead of going to the lengths of escaping the @ character in JSON keys to allow adding arbitrary directives) |
Not off the top of my head, but I can certainly investigate.
What about separating IPLD objects into metadata and data components (so you don't need to reserve special keys in the data), and you could use the metadata part for storing LD directives, etc?
Sounds good to me :) By the way, I'm not trying to change any of the JSON(-LD) stuff you've been working on, I'm just trying to make sure the core IPLD data model is as general as possible. I guess we're both coming at it with different use cases in mind (I'm interested in transparently persisting native datastructures to ipfs). |
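A rough sketch of the metadata/data separation suggested above, assuming a made-up layout in which an object is serialised as a two-element array `[meta, data]`. The `Envelope` name and the array layout are illustrative assumptions, not anything agreed in this thread. The point is only that a user key such as `@context` inside the data never collides with a directive, because directives live in a different component.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Envelope:
    """Hypothetical IPLD object split into metadata and data components."""
    meta: Dict[str, Any] = field(default_factory=dict)
    data: Any = None


def to_json(obj: Envelope) -> str:
    # Assumed layout: a two-element array, metadata first, data second.
    return json.dumps([obj.meta, obj.data], sort_keys=True)


def from_json(text: str) -> Envelope:
    meta, data = json.loads(text)
    return Envelope(meta=meta, data=data)


original = Envelope(
    meta={"@context": "http://schema.org"},            # LD directive, kept out of the data
    data={"@context": "a user key, no escaping needed", "name": "David"},
)
restored = from_json(to_json(original))
print(restored == original)
```

A JSON encoder that prefers a single flat object could still merge the two components and apply key escaping, but that would then be a property of that encoder rather than of the data model.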
That's what I understood we wanted (and why JSON-LD was good but not quite what we wanted). Hence, this is why we came up until now to the JSON key escaping (with |
@mildred I can't see any existing CBOR extensions for dealing with this, so I think the best approach would be to register some tag(s) with IANA. @jbenet Thoughts? I know you don't want any CBOR magic, but I think that making this separation clear in the wire format is a much nicer solution than reserved/escaped keys within the data itself. This would make it easier to map to encodings like YAML that do have extra support for (some) object metadata. It's also fully compatible with JSON, we would just move the reserved/escaped keys into the JSON encoder/decoder rather than the wire format itself. |
Having a data model that is trivially expressible in any format is key. I.e. i should be able to take any object and go into JSON, CBOR, XML, and so on, one-to-one (i.e. roundtrip). The reason for the JSON data model is that JSON is super, super easy to work with and it's used all over. (i understand this was the same for XML once.) i want to make it extremely easy to use IPFS, particularly for web devs. And embedding more complex representations -- i.e. typed -- onto json can be done, and is something we could solve with JSON-LD, JSON-Schema, and so on. One thing to put you at ease is that multicodec allows us to upgrade the protocol to a new format some day, just as we're upgrading from the first protobuf fmt. Another thing to keep in mind is that this is blocking IPNS improvements and a number of other things, so we decided to move fwd with JSON data model (which is a very safe choice) to make fwd progress. We can upgrade to add typing within that model, so it's likely fine even for wanting typed things. (If i'm not seeing what you mean though please give more examples?) |
@jbenet Sure, something is better than nothing, and we can always fix it later. However, I think a small change to make the wire format a little less ugly would be easier to do now than later once entrenched. So, my reading of one of your comments is that IPLD would require a If I'm wrong and Otherwise, if required for IPLD to function, it seems like it would make sense to make One of the examples @mildred posted could look like the following YAML (for example):
ipld_object:
  !unixfsdir
  attrs:
    mode: 0775
  entries:
    some-dir:
      !unixfsdir
      attrs: ...
      entries:
        file@.txt:
          !unixfsfile
          attrs: ...
          content: !mlink "/ipfs/Qm..."
Decoding into JS object (for example), you'd supply appropriate constructors for
Edit: I'm not trying to make it harder for webdevs --- what I'm suggesting seems easier than handing back unstructured dictionaries that they have to manually decode? |
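The "supply appropriate constructors" idea can be sketched roughly as follows. Python is used here instead of JS purely for illustration, and every name (`Tagged`, `build`, `UnixfsDir`, `UnixfsFile`, `MLink`, the `CONSTRUCTORS` table) is hypothetical. The tree mirrors part of the YAML example above. Tags with no registered constructor fall back to plain dictionaries, so the simple case needs no classes at all.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Tagged:
    """A value carrying a type tag, as a decoder might hand it over."""
    tag: str
    value: Any


@dataclass
class UnixfsDir:
    attrs: dict
    entries: dict


@dataclass
class UnixfsFile:
    attrs: dict
    content: Any


@dataclass
class MLink:
    path: str


def build(node: Any, constructors: Dict[str, Callable[[Any], Any]]) -> Any:
    """Recursively replace tagged nodes with constructed native objects."""
    if isinstance(node, Tagged):
        inner = build(node.value, constructors)
        ctor = constructors.get(node.tag)
        return ctor(inner) if ctor else inner   # unknown tag: keep the plain data
    if isinstance(node, dict):
        return {k: build(v, constructors) for k, v in node.items()}
    if isinstance(node, list):
        return [build(v, constructors) for v in node]
    return node


CONSTRUCTORS = {
    "unixfsdir": lambda d: UnixfsDir(**d),
    "unixfsfile": lambda d: UnixfsFile(**d),
    "mlink": MLink,
}

tree = Tagged("unixfsdir", {
    "attrs": {"mode": 0o775},
    "entries": {
        "file@.txt": Tagged("unixfsfile", {
            "attrs": {},
            "content": Tagged("mlink", "/ipfs/Qm..."),
        }),
    },
})

typed = build(tree, CONSTRUCTORS)   # native objects
plain = build(tree, {})             # unstructured dictionaries, no classes needed
print(typed.entries["file@.txt"].content)
print(plain["entries"]["file@.txt"]["content"])
```

Passing an empty constructor table gives back exactly the raw nested dictionaries, so typed decoding stays opt-in.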
I second this. Especially since the JSON-LD spec (where the idea of @davidar The thing about adding type information in YAML is very nice (and you could associate the type name to the type definition in the context file). Unfortunately JSON doesn't support such type information, and adding a directive ( The other solution is to do as we do now: consider any object with an mlink property that is a string to be a merkle-link. In any case, @jbenet, is it possible to make a decision on these points:
|
Yes, but that would be a property of the JSON encoder rather than IPLD itself |
though i agree with you, most of the web dev community does not. this is why the many js class things continue to be unused, json serializing is all still raw objects, and protobuf is also unused. what i do agree with them on is that "the simple case should be as simple as possible" -- eg. i shouldn't have to create classes or constructors or anything to serialize simply
this is not trivial to do with nice programmatic interfaces. I don't see how this would be exposed to the user nicely. I think escaping is easier to reason about. please keep in mind that the reason the web is using json (and not rdf, and not xml, and not protobuf, and not ASN.1, and not XDR, and ... ) is the level of programmatic simplicity. this is paramount.
i support types. it's why i was drawn to JSON-LD in the first place. is there any cross-serialization-format typing definition? i.e. something that's the same in JSON, YML, and so on, and trivial re-coders would get right without any special work?
depends on how we want to handle mlinks.
if
i think if we go the escape route, we should escape all |
AFAIU, just a file like:
i think. but
Yeah... just handling mlink ourselves and ditching context altogether is certainly the simplest thing to do. sadface :/ :/ |
Lots of webdevs are familiar with and use YAML though. I agree that APIs are usually JSON, but YAML is also quite common for data serialisation. I believe YAML to also have good library support, and am not aware of any concerns over programmatic simplicity.
@jbenet YAML and CBOR both have a tagging system that can be used to specify types quite naturally. Some quotes from the YAML (1.2) spec:
JSON doesn't have native support, but there are a few existing conventions using reserved keys that I mentioned in OP ( |
In PR #7 we came to agreement that the @davidar what tagging would you like that isn't included in JSON-LD (the spec)? JSON-LD already permits specifying a type by adding a If you want something more, I suggest you imagine a scheme involving the special |
Also, JSON-LD has provisions for a |
@mildred I don't have a problem with
And the CBOR spec recommends using tags 0 and 1 for this purpose. It seems kind of pointless using CBOR as the wire format if we aren't even going to follow the spec. @jbenet A couple of other thoughts:
|
This could be a nice thing to use that would avoid escaping characters as we plan to do. Perhaps we can do as follows:
@jbenet, what do you think if the keys to the
I would like that :-) But we ruled that out I think :-/ See discussion in PR #7
|
👍
Ah, fair enough. The discussion in #7 is quite long so I missed that point :) |
Yeah i would appreciate the separation, but we need something for making sense of the
We're not using CBOR for all the CBOR features, just like we're not providing XML encoding for all the XML features. Everything we write MUST have a 1:1 mapping to JSON. That is a strict requirement.
This might be fine, as long as we can guarantee a straightforward 1:1 mapping between the CBOR rep and the JSON rep. Again, i'm not against using features of the native formats. I'm against breaking any 1:1 mapping across the formats. IPLD is not about maximizing the utility of each native format, it's about creating something extremely easy to work with across platforms and across physical computers. Supporting the pure JSON model is one reason we moved away from protobuf (though we kept a 1:1 mapping for those objects).
No, keys like that are very, very annoying to work with. Instead, just make the datastructure have special functions defined that return things of interest. Or define a subpackage (in this repo) with lots of these nice utils.
Hmm possibly. It is linked data, even if it isn't "Linked Data". One of my goals is making "linked data" much easier to work with, which might involve using IPLD as a stepping stone toward the RDF model (with the directives re-mapping arbitrary JSON structures to JSON-LD). Also worth doing soon is a trivial layering of RDF and Turtle on top of IPLD. (i.e. using IPLD as a transport, probably a large array or something). People are already asking for this. |
This is difficult, mostly because there are so many ways to do that using JSON-LD. We have a few solutions:
Now, there are two ways to implement this in JSON-LD:
We can get away and not put |
@davidar please don't use my handle to document your code :p thank you! |
I would suggest registering a cbor tag specifically for that purpose, which would minimise overhead at the wire level.
I never suggested otherwise. CBOR tags can be mapped 1:1 to whatever json escaping scheme you want.
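As an illustration of that 1:1 claim, here is one possible mapping between tagged values and plain JSON. The scheme is an assumption picked for the sketch, not the one agreed in #7: a tagged value becomes an object with exactly the reserved keys `@tag` and `@value`, and any user key that starts with `@` gets one extra `@` prepended so it can never be mistaken for a reserved key.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Tagged:
    """A tagged value as it would exist at the wire (CBOR) level."""
    tag: int
    value: Any


def to_json_model(node: Any) -> Any:
    """Map tagged data to plain JSON-compatible data (assumed escaping scheme)."""
    if isinstance(node, Tagged):
        return {"@tag": node.tag, "@value": to_json_model(node.value)}
    if isinstance(node, dict):
        # Escape user keys starting with "@" by prepending one more "@".
        return {("@" + k if k.startswith("@") else k): to_json_model(v)
                for k, v in node.items()}
    if isinstance(node, list):
        return [to_json_model(v) for v in node]
    return node


def from_json_model(node: Any) -> Any:
    """Inverse of to_json_model."""
    if isinstance(node, dict):
        if set(node) == {"@tag", "@value"}:
            return Tagged(node["@tag"], from_json_model(node["@value"]))
        # Undo the escaping: "@@key" was originally "@key".
        return {(k[1:] if k.startswith("@@") else k): from_json_model(v)
                for k, v in node.items()}
    if isinstance(node, list):
        return [from_json_model(v) for v in node]
    return node


original = {
    "person": Tagged(70000, {"name": "David", "@id": "user key starting with @"}),
    "lookalike": {"@tag": 1, "@value": 2},   # plain user data, not a tagged value
}
as_json = json.dumps(to_json_model(original))
print(from_json_model(json.loads(as_json)) == original)
```

With this arrangement the escaping lives entirely in the JSON encoder and decoder, while the wire format only ever sees tags and unescaped keys.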
👍 That's all I've been trying to do here. IMO cbor tags would be easier to work with than escaping keys at the wire level
Hence the confusion ;) |
? |
Crosslinking with ipfs/specs#37 |
I think this issue has been resolved elsewhere, so closing |
So, reading #4, there seem to be a few separate issues:
A major deficiency in JSON is its lack of (user-defined) datatypes. Several workarounds to this issue have been proposed, by reserving a special key in each JSON object:
_type: https://www.npmjs.com/package/typed-json
__proto__: http://tobyho.com/2009/10/02/typed-deserialization-with/
It looks like the @context key proposed in #4 is trying to achieve the same thing.
Whilst this is a reasonable solution for encoding into JSON, I don't think it should be a fundamental part of IPLD, as other representations actually have proper support for representing type information:
It would be nice if these features could be supported by IPLD.
So, on the wire, we could have CBOR-tagged data, encoding this into JSON would give something like:
or in YAML:
!Person { name: David }
and mapping into native (say, JS) datatypes:
CC: @jbenet
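For what it's worth, a reserved-key convention of the kind listed above could be decoded into native objects along these lines. This is only a sketch in Python (rather than JS), assuming a reserved `_type` key and a hypothetical `TYPES` registry. It does not reproduce the API of any of the linked packages.

```python
import json
from dataclasses import dataclass


@dataclass
class Person:
    name: str


# Hypothetical registry mapping type names to native constructors.
TYPES = {"Person": Person}


def typed_hook(obj: dict):
    """object_hook for json.loads: build a native object when `_type` is known."""
    type_name = obj.get("_type")
    ctor = TYPES.get(type_name) if isinstance(type_name, str) else None
    if ctor is None:
        return obj  # no type, or unknown type: leave the plain dict alone
    fields = {k: v for k, v in obj.items() if k != "_type"}
    return ctor(**fields)


person = json.loads('{"_type": "Person", "name": "David"}', object_hook=typed_hook)
untyped = json.loads('{"name": "David"}', object_hook=typed_hook)
print(person)
print(untyped)
```

Objects without a recognised `_type` come back as ordinary dictionaries, so untyped JSON keeps working unchanged.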