feat: add code for car serialization format #258
Add a code for CARs so that in .storage services we could identify them by multihash
I'm fine with this, but #239 had some objections so we'll have to be careful not to steamroll through those.
Oh, and do we need a v1 and v2 here? We can differentiate once we get the bytes, but do we need to know up front where multicodecs get used?
For our use cases that seems irrelevant, as long as we can identify the version from the bytes. I suggest we go with a generic CAR code, and if we find that capturing the version is important we could add version-specific entries as well.
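As a rough illustration of "identify the version from the bytes", here is a minimal sketch. It assumes the fixed 11-byte CARv2 pragma, and for CARv1 it uses a heuristic (not a full DAG-CBOR decode): the varint-prefixed header is expected to end with the `version` key followed by `1`. The helper names `read_varint` and `car_version` are hypothetical.

```python
from typing import Optional, Tuple

# Fixed CARv2 pragma: a CARv1-style header that encodes {"version": 2}.
CARV2_PRAGMA = bytes.fromhex("0aa16776657273696f6e02")
# CBOR text key "version" (0x67 = 7-byte string) followed by the integer 1.
V1_VERSION_SUFFIX = b"\x67version\x01"


def read_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode an unsigned LEB128 varint; return (value, next_offset)."""
    value = 0
    shift = 0
    for i in range(offset, len(data)):
        byte = data[i]
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")


def car_version(data: bytes) -> Optional[int]:
    """Guess the CAR version from leading bytes; None if unrecognized."""
    if data.startswith(CARV2_PRAGMA):
        return 2
    try:
        header_len, start = read_varint(data)
    except ValueError:
        return None
    header = data[start:start + header_len]
    # Heuristic: assumes "version" is the last key in the header map.
    if len(header) == header_len and header.endswith(V1_VERSION_SUFFIX):
        return 1
    return None
```

A production check would decode the header with a real DAG-CBOR library rather than rely on key ordering.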
If this code is intended to be used in a CID as |
What's the point of that table column if the only value allowed is “ipld”? I suggest we start with “serialization” because it is a fact today. If we end up turning it into a codec and using it in CIDs, we can update that column to reflect that.
The Multicodec Table is a table that is not related to CIDs. It's just a list of things that map to certain numbers. The column is there to indicate what such a number is used for, e.g. for a Multihash, or for IPLD codecs that can then be used in CIDs.
I misunderstood what you were referring to with “there” in your previous comment. Does my suggestion of starting with “serialization” to reflect the facts today, and updating that as necessary in the future, make sense?
Can I go ahead and merge this? Or do we still have some disagreements to resolve?
@@ -124,6 +124,7 @@ http, multiaddr, 0x01e0, draft,
swhid-1-snp, ipld, 0x01f0, draft, SoftWare Heritage persistent IDentifier version 1 snapshot
json, ipld, 0x0200, permanent, JSON (UTF-8-encoded)
messagepack, serialization, 0x0201, draft, MessagePack
car, serialization, 0x0202, draft, Content Addressable aRchive (CAR)
can we specify this is a carv1 specifically, or does this cover both car v1 and car v2?
Clarify in the table somehow, or here? If here, it's supposed to be version agnostic.
Not a blocker, but I think this should be clarified before we start using this code:
Add a code for CARs so that in .storage services we could identify them by multihash
How is that multihash generated? Is it the multihash of the root block or something else (if so, how is it calculated)?
Where will the spec for this live? https://ipld.io/docs/codecs/known/?
Re
I'm OK with this as a position if it's not going to be used for CIDs (a good way to think about this column might be something like: "does the decoder yield IPLD links?", and a CAR decoder does in fact yield links). But this raises the question of what this is being used for if not CIDs? Continuing from #239, I think most of us are assuming that's what this would be for. But apparently not? So back to the original ask:
How does this help you identify by multihash? Presumably you're going to hash the bytes and the digest from that gives you the multihash. What do you need the additional identifier for if not to make CIDs? This is not a blocker btw, I think this can be merged, but the nuances might dictate needing to change that type column. I'm currently imagining this being a little like the CAR index format codes, |
To clarify why I asked, the use case I have in mind is convention where
In this convention the multihash in a CID represents the root block of a DAG, and if you plan to use |
I messed up when I said "we could identify them by multihash", because as you've all pointed out it's not really a multihash and I'm not sure we have a term for it. We want to generate a multihash for a CAR and tag it with this code. It is true that it sounds like a CID; maybe it should be a CID. Yet I really want to avoid the debate of whether it is a good idea to identify things larger than the libp2p block size limit with CIDs. There are tradeoffs there and I'm not sure we're prepared to evaluate them yet. I do think, however, that we can all agree that CAR is an established serialization format which can have its own code. I think we'll be in a better position to debate whether CAR as an IPLD codec is a good idea after we've had a chance to evaluate that in our work. Only once we're convinced that it's the right choice can we discuss tradeoffs and update the table field if we choose to.
I love the idea of making the gateway capable of exporting DAGs, but I am concerned about overloading the CID codec here because:
More broadly, I think it is a mistake to think of CAR as a DAG serialization format. Thinking of it as a block set serialization seems a lot more accurate to me. As for how we want to use it: we want to generate a CAR multihash by hashing the bytes of the file (e.g. with sha256, tagged accordingly) and then tag that multihash with the CAR code. If we also tag it with a CID version we'd get a CID in the more traditional sense, but again I'm not prepared to have a debate on whether we should identify large things (greater than the block size limit) with CIDs or not.
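The tagging described in this comment can be sketched roughly as follows. This is only an illustration under assumptions: it uses the `0x0202` code from the proposed table row and `0x12` for sha2-256, encodes every prefix as an unsigned varint, and the function names (`car_multihash`, `tag_with_car_code`, `as_cidv1`) are hypothetical, not an existing library API.

```python
import hashlib

CAR_CODE = 0x0202     # "car" entry proposed in this PR
SHA2_256_CODE = 0x12  # sha2-256 entry in the multicodec table


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def car_multihash(car_bytes: bytes) -> bytes:
    """Multihash of the whole CAR file: <hash code><digest length><digest>."""
    digest = hashlib.sha256(car_bytes).digest()
    return encode_varint(SHA2_256_CODE) + encode_varint(len(digest)) + digest


def tag_with_car_code(multihash: bytes) -> bytes:
    """Prefix the multihash with the CAR code (no CID version)."""
    return encode_varint(CAR_CODE) + multihash


def as_cidv1(multihash: bytes) -> bytes:
    """Binary CIDv1 layout: <version 1><codec><multihash>."""
    return encode_varint(1) + encode_varint(CAR_CODE) + multihash
```

The only difference between the tagged multihash and the CIDv1 bytes is the leading version varint, which is why the thread keeps circling back to whether this is effectively a CID.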
I'm going to merge this given the approvals and comments suggesting no blockers here. Happy to carry on related discussions at #239 instead.
Add a code for CARs so that in .storage services we could tag multihashes with