cbor tags? #7
You're holding it wrong! 😀 In refmt, the theory is that object mapping/marshalling/unmarshalling is completely decoupled from the serial format, so we don't have tags of "cbor" or "json". They're all just "refmt". So a pretty mundane example struct with tags from another project looks like this:

    type OutputSpec struct {
        PackType string `refmt:"packtype"`
        Filters  string `refmt:",omitempty"`
    }

The format of tags is the same as the Go standard library json tags; they're just named "refmt". (Additionally, there's technically support for custom tag names using

Sidenote (if you didn't already notice): you may also want to note that the default field name mapping is slightly different than Go standard lib. refmt maps the first character of the name to lowercase by default. I'm of the opinion "it's probably what you wanted". So, a field called
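To make the naming rule concrete, here is a minimal sketch using only the Go standard library. It is not refmt's implementation; `serialName` and `fieldKeys` are hypothetical helpers that just mimic the behavior described above (an explicit name in the `refmt` tag wins, otherwise the field name with its first character lowercased).

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
	"unicode"
	"unicode/utf8"
)

// OutputSpec is the struct quoted in the comment above.
type OutputSpec struct {
	PackType string `refmt:"packtype"`
	Filters  string `refmt:",omitempty"`
}

// serialName is a hypothetical helper (not part of refmt). It returns the key
// a field would be serialized under: the name given in the `refmt` tag if
// there is one, otherwise the Go field name with its first character
// lowercased. It also reports whether the "omitempty" option is present.
func serialName(f reflect.StructField) (name string, omitEmpty bool) {
	parts := strings.Split(f.Tag.Get("refmt"), ",")
	name = parts[0]
	for _, opt := range parts[1:] {
		if opt == "omitempty" {
			omitEmpty = true
		}
	}
	if name == "" {
		r, size := utf8.DecodeRuneInString(f.Name)
		name = string(unicode.ToLower(r)) + f.Name[size:]
	}
	return name, omitEmpty
}

// fieldKeys lists the serial keys of every field of a struct value, in order.
func fieldKeys(v interface{}) []string {
	t := reflect.TypeOf(v)
	keys := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		name, _ := serialName(t.Field(i))
		keys = append(keys, name)
	}
	return keys
}

func main() {
	fmt.Println(fieldKeys(OutputSpec{}))
}
```

With the struct above, `PackType` keeps its explicit name and `Filters` falls back to the lowercased default.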
OH. <<...out-of-band epiphany...>> You mean CBOR Tags. Sorry; I got hung up on the name similarity with golang struct tags. Okay, CBOR tags: Yep, not supported right now. Tag support will be coming up soon though -- working on it now! I'll post progress updates as they occur. Here's the outline on what I currently expect to support:
More news soon.
Our primary use case for tags is to denote where an IPLD link is. Basically, when converting from CBOR into an in-memory object, anything with a certain tag needs to become an instance of a certain type, filled with the data the tag is wrapping.
We use two fields in the Token struct. This is to resolve ambiguity with cbor tags of the int value zero. An earlier iteration of the design tried to get away with one field, and use -1 as a sentinel value, but this generated a *lot* of implementation pain, because that sentinel value would need initialization e v e r y w h e r e in the entire project's codebase, whereas the use of tags is confined to a very small area. It's not worth optimizing for the extra bytes in struct layout; we reuse the bejezus out of the token struct already anyway.

The json decoder chooses *not* to bother to re-false the Token.Tagged field -- it just believes that you'll hand it a roughly zero'd token, and not inject screwyness. I think this is sensible and fine given that all the actual user-facing methods for initializing full marshal/unmarshal systems do in fact allocate one plain Token, and it's never reused with completely different implementations of decoder logic which would actually give that field a way to become set.

Partially fixes #7.

Signed-off-by: Eric Myhre <hash@exultant.us>
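The two-field design can be illustrated with a cut-down sketch. This is not the real token struct: only `Tagged` is named in the commit message, and the field name `Tag` and the helper `describe` are assumptions made for illustration.

```go
package main

import "fmt"

// Token is a cut-down, hypothetical stand-in for the token struct; only the
// tag-related fields are shown. `Tagged` appears in the commit message above;
// the name `Tag` for the number field is an assumption.
type Token struct {
	Tagged bool // true only if a CBOR tag is attached to this token
	Tag    int  // the tag number; meaningful only when Tagged is true
}

// describe reports the tag state of a token.
func describe(t Token) string {
	if !t.Tagged {
		return "untagged"
	}
	return fmt.Sprintf("tag %d", t.Tag)
}

func main() {
	var zero Token                         // the zero value needs no initialization
	tagZero := Token{Tagged: true, Tag: 0} // a genuine tag whose number is 0
	fmt.Println(describe(zero))
	fmt.Println(describe(tagZero))
}
```

Because the zero value of `Token` already means "no tag", code that never touches tags does not need to set a sentinel, which is the pain point the commit message describes for the -1 approach.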
We should now be able to round-trip cbor tags losslessly in a cbor<->tokenstream<->cbor decode/encoder. This is good; but also fairly useless, because nothing but cbor supports these tags; no other serializers, nor have we yet implemented any way to alter the behavior of obj unmarshal in response to them, nor yet obj marshal to emit them.

Partially fixes #7.

Signed-off-by: Eric Myhre <hash@exultant.us>
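For readers unfamiliar with how a tag looks on the wire, here is a minimal sketch of encoding and decoding just the tag head, following the CBOR specification (major type 6 in the top three bits, followed by the tag number). It uses only the standard library and is not refmt's encoder; `appendTagHead` and `readTagHead` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

const majorTag = 6 << 5 // CBOR major type 6 (tag) in the top three bits

// appendTagHead appends the CBOR head for the given tag number to buf.
func appendTagHead(buf []byte, tag uint64) []byte {
	var info byte
	var size int
	switch {
	case tag < 24:
		return append(buf, majorTag|byte(tag)) // number fits in the head byte
	case tag <= 0xff:
		info, size = 24, 1
	case tag <= 0xffff:
		info, size = 25, 2
	case tag <= 0xffffffff:
		info, size = 26, 4
	default:
		info, size = 27, 8
	}
	buf = append(buf, majorTag|info)
	for i := size - 1; i >= 0; i-- { // big-endian tag number
		buf = append(buf, byte(tag>>(8*uint(i))))
	}
	return buf
}

// readTagHead decodes a tag head from the start of buf and returns the tag
// number and the count of bytes consumed.
func readTagHead(buf []byte) (uint64, int, error) {
	if len(buf) == 0 || buf[0]&0xe0 != majorTag {
		return 0, 0, errors.New("not a CBOR tag head")
	}
	info := buf[0] & 0x1f
	if info < 24 {
		return uint64(info), 1, nil
	}
	if info > 27 {
		return 0, 0, errors.New("invalid additional info for a tag")
	}
	size := 1 << (info - 24) // 1, 2, 4 or 8 following bytes
	if len(buf) < 1+size {
		return 0, 0, errors.New("truncated tag head")
	}
	var v uint64
	for _, b := range buf[1 : 1+size] {
		v = v<<8 | uint64(b)
	}
	return v, 1 + size, nil
}

func main() {
	head := appendTagHead(nil, 123)
	tag, n, err := readTagHead(head)
	fmt.Printf("% x -> tag=%d bytes=%d err=%v\n", head, tag, n, err)
}
```

The tagged data item itself follows the head, so a lossless round trip only needs the decoder to carry the tag number alongside the next token and the encoder to write the head back before it.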
Oh, wow, thanks github, that's totally an auto-behavior I meant for you to have.
CBOR tags, both deserializing and serializing, is now in on It's not yet wired to any marshaller/unmarshaller stuff. |
So, the behavior you describe sounds defined if deserializing into a Let's say we have some cbor message roughly like: If we set up the unmarshaller with a custom behavior to take tag=123 as a hint to produce
So far so good. What should the unmarshaller do if you give it that same message and a handle to a
The unmarshaller can't stuff a TypeFoo into a TypeQuux. What's the correct behavior?
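The mismatch in question can be sketched with a small type check. This is only one possible behavior (reject the message with an error), not what refmt does or what the thread settles on; `tagTypes` and `checkTarget` are hypothetical, and the fields of `TypeFoo` and `TypeQuux` are invented for the example.

```go
package main

import (
	"fmt"
	"reflect"
)

// TypeFoo and TypeQuux stand in for the types named above; their fields are
// made up for this sketch.
type TypeFoo struct{ Data string }
type TypeQuux struct{ Data string }

// tagTypes is a hypothetical registry: tag number -> Go type to produce.
var tagTypes = map[int]reflect.Type{
	123: reflect.TypeOf(TypeFoo{}),
}

// checkTarget reports whether a value carrying the given tag can be
// unmarshalled into target, which must be a non-nil pointer to the
// destination slot. Unregistered tags impose no constraint in this sketch.
func checkTarget(tag int, target interface{}) error {
	want, ok := tagTypes[tag]
	if !ok {
		return nil
	}
	rv := reflect.ValueOf(target)
	if rv.Kind() != reflect.Ptr || rv.IsNil() {
		return fmt.Errorf("target must be a non-nil pointer, got %T", target)
	}
	slot := rv.Elem().Type()
	if !want.AssignableTo(slot) {
		return fmt.Errorf("tag %d produces %v, which cannot be stored in %v", tag, want, slot)
	}
	return nil
}

func main() {
	var foo TypeFoo
	var quux TypeQuux
	var anything interface{}
	fmt.Println(checkTarget(123, &foo))      // matching type
	fmt.Println(checkTarget(123, &anything)) // empty interface accepts any type
	fmt.Println(checkTarget(123, &quux))     // mismatch
}
```

An empty-interface slot accepts the tagged type, a `TypeFoo` slot matches exactly, and a `TypeQuux` slot is where a policy decision is needed.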
If However, it might be worth making it possible to specify levels of strictness for tags. That is, when registering a tag, one could say specify (also, sorry for taking so long to get back to you on this, we've been a bit busy) |
Ok, a bunch of this stuff should now be in and workable on master! You can see an example combined with some other advanced usage here in the test suite:
This atlas building snippet will give you the power to have...
So this is pretty cool. There may still be paths where support for tag behavior is spotty, but if you uncover any, open an issue and we'll keep expanding the test fixture coverage and the features to match.

Aside: Despite supporting this, I also feel obliged to mention at least in passing that I think using CBOR tags is Probably A Bad Idea in most applications. You can do it. But I wouldn't do it without spending serious thought on the tradeoffs. One of the major selling points of CBOR is its isomorphism to JSON -- it's easy to convert CBOR to JSON; and in most cases it's easy to convert the other way as well, and thus it's both simple and correct to consider JSON as an easy way for humans to author raw structures (that can then be canonicalized into CBOR). This breaks down with tags: there's no way to take a CBOR object with tags and convert it to a JSON object losslessly short of doing a giant schema expansion where every object is expanded into a tuple of The
TL;DR: We're only using one tag and we're using a "reserved" key in JSON to represent it. That is,
Don't worry, we have. We're only going to use one tag and we're only doing that because we don't want to bend over backwards and make our system worse just to support JSON. For some background, we're using CBOR as an (well, the "default") encoding for a merkle-linked structured data system we call IPLD. IPLD is basically a meta-system for understanding (reading, writing, traversing, querying, etc.) any merkle-linked data structure like, e.g., git and ethereum. We've chosen CBOR as the "default" encoding for new applications built on top of this system as it's flexible, compact, and schema-less (i.e., JSON but better). However, to do this, we need a way to efficiently represent merkle-links. In JSON, we're reserving the special key "/" and using
Many of our applications need to efficiently store binary data so the ship has already sailed on that to some extent. We've been talking about adding a special syntax for representing binary data in JSON (e.g.

Really, in my view, the selling point isn't JSON compatibility, it's JavaScript/Python/etc. object compatibility. That is, CBOR maps cleanly to the standard "object" structures used in most dynamically typed languages.
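The reserved "/" key mentioned above can be sketched on the JSON side with the standard library. The exact shape of a link object is not spelled out in this thread, so this sketch assumes a link is an object whose only key is "/" with a string value; the `Link` type, `liftLinks`, `decodeWithLinks`, and the sample payload are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Link is a hypothetical in-memory type for a merkle-link.
type Link struct{ Target string }

// liftLinks walks a decoded JSON value and replaces every object whose only
// key is "/" (with a string value) by a Link. That shape is an assumption
// made for this sketch.
func liftLinks(v interface{}) interface{} {
	switch x := v.(type) {
	case map[string]interface{}:
		if len(x) == 1 {
			if s, ok := x["/"].(string); ok {
				return Link{Target: s}
			}
		}
		for k, child := range x {
			x[k] = liftLinks(child)
		}
		return x
	case []interface{}:
		for i, child := range x {
			x[i] = liftLinks(child)
		}
		return x
	default:
		return v
	}
}

// decodeWithLinks parses JSON and lifts link objects into Link values.
func decodeWithLinks(data []byte) (interface{}, error) {
	var v interface{}
	if err := json.Unmarshal(data, &v); err != nil {
		return nil, err
	}
	return liftLinks(v), nil
}

func main() {
	v, err := decodeWithLinks([]byte(`{"name": "x", "child": {"/": "example-hash"}}`))
	fmt.Println(v, err)
}
```

On the CBOR side the same `Link` value would be written as a tagged item instead, which is the single-tag use case described earlier in the thread.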
Also, this is awesome and this library is awesome (and will make our lives so much easier). Thanks!
I see... Yeah, that's a fair cop. (The default, generalized admonition stands for Anyone Else on the wide internets who ends up here by searching for "cbor tags" though...) So... I will tentatively close this issue then? :D I think all the core path for the features you need is in place now. If you find stuff missing, more issues welcome!
Yep. Thanks!
Is this a thing I can do?