Generated CIDv0 differs from the one generated by IPFS #77

Closed
AminArria opened this issue Oct 5, 2018 · 13 comments
Labels
kind/support A question or request for support

Comments

@AminArria

AminArria commented Oct 5, 2018

Hi, I'm receiving some files and want to verify the CID sent with each one, so I'm doing (summarized):

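genCID, err := cid.Decode(receivedCID)           // parse the CID that came with the file
fileCID, err := genCID.Prefix().Sum(filecontent) // hash the content using the same prefix
match := genCID.Equals(fileCID)                  // error handling omitted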

The CID I'm receiving is the one generated by doing ipfs add -n path/to/file, but it doesn't match the one generated by go-cid.

Something I'm doing wrong?

PS: This works fine for CIDv1

@Stebalien
Member

In general, you can't expect this to work. Basically, IPFS is a filesystem built on top of IPLD (i.e., it creates IPLD nodes to encode blocks, indirect blocks, inodes, etc). IPFS chunks files into blocks (~256KiB by default) and builds a merkle-tree on top of these chunks (using IPLD). The CID of a file corresponds (approximately) to the hash of the root node of this merkle-tree.

In your case, I assume your file is less than 256KiB. That means it fits in a single chunk.

In the CIDv0 case, IPFS is taking your file content and wrapping it in a protobuf datastructure then hashing it. IPFS has to do this because CIDv0 only supports one IPLD "codec" (DagPB). Every IPLD object with a V0 CID uses this same DagPB format.

CIDv1, on the other hand, supports many IPLD codecs (the specific codec used is recorded in the CID itself). In this case, because your file fits in a single block, IPFS is using the Raw codec (raw binary). That's why genCID.Prefix().Sum(content) works. In this specific case, ipfs add isn't chunking, wrapping, or re-encoding your file at all. It's just taking it as-is and directly using it as a block.

@Stebalien Stebalien added the kind/support A question or request for support label Oct 5, 2018
@Stebalien
Member

(Closing for tracking, please feel free to continue discussing/asking questions)

@DRK3

DRK3 commented May 13, 2021

Hi @Stebalien and @AminArria,

I'm running into the same issue - I'm trying to generate a v0 CID that matches the one that an IPFS node would generate, but without running an IPFS node. Is there some way of doing this?

@Stebalien
Member

I believe you can use ipfs add --only-hash offline. Alternatively, you could extract that code into a separate tool. But the resulting CID depends on a lot of knobs/structures so there's no way to "predict" it other than to generate it and throw away the results as you go.

@DRK3

DRK3 commented May 13, 2021

@Stebalien You're right, ipfs add --only-hash does indeed seem to work offline. However (and I realize now that I failed to specify this in my previous post...), I was looking for a way to do this in Go code (i.e. without the ipfs command installed on the system).

I was hoping the following would replicate the behaviour of the ipfs command (which wraps the data in a protobuf):

	prefix := cid.Prefix{
		Version:  0,
		Codec:    cid.DagProtobuf,
		MhType:   mh.SHA2_256,
		MhLength: -1,
	}

	contentID, err := prefix.Sum(content)

But it seems like go-cid ignores the Codec for v0 CIDs?

@Stebalien
Member

Stebalien commented May 13, 2021 via email

@DRK3

DRK3 commented May 14, 2021

@Stebalien Thanks so much for all your help and for the quick responses! And thanks for the heads up on how the CID can change depending on the IPFS config. I'll be sure to take that into account as I build my solution.

@DRK3

DRK3 commented May 18, 2021

Hi @Stebalien,

I have a follow-up question. I was watching https://www.youtube.com/watch?v=Z5zNPwMDYGg to learn more about how adding data to IPFS works (great video, by the way).

I have a very specific use-case: generate CIDs locally that match the ones produced by the ipfs add command, with an IPFS node running only default settings (the current defaults as of today). The data I'm dealing with is very small and is guaranteed to fit within a single chunk.

What I tried doing was wrapping the data in a UnixFS file wrapper before calculating the CID, but I'm still not getting a matching CID. I've verified that the Merkle DAG should be just a single node by using https://dag.ipfs.io/, so no chunking/node balancing should be needed. It seems I'm still missing something - do you know what?

Here is a short code snippet showing exactly what I'm doing:

import (
	"github.com/ipfs/go-cid"
	"github.com/ipfs/go-unixfs"
	mh "github.com/multiformats/go-multihash" // needed for mh.SHA2_256 below
)

func Example() {
	sampleData := []byte("content")

	unixFSWrappedSampleData := unixfs.FilePBData(sampleData, uint64(len(sampleData)))

	prefix := cid.Prefix{
		Version:  0,
		Codec:    cid.DagProtobuf,
		MhType:   mh.SHA2_256,
		MhLength: -1, // default length
	}

	contentID, _ := prefix.Sum(unixFSWrappedSampleData)

	// The CID produced here is QmXiUR1x5tZ5zk9AySV4cmD3X72M5to3gWXcx2LnCWZDRY
	// but the CID from IPFS is QmbSnCcHziqhjNRyaunfcCvxPiV3fNL3fWL8nUrp5yqwD5
	// Is there another step I'm missing?
	println(contentID.String())
}

@Stebalien
Member

These APIs are really bad, I'm so sorry.

The (current) file format wraps a protobuf within a protobuf. You've just created the inner protobuf, but you still need to create the outer one.

You need to call merkledag.NodeWithData(unixFSWrappedSampleData).Cid() where merkledag is github.com/ipfs/go-merkledag. That should produce the correct CID.
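To illustrate the "protobuf within a protobuf" layout without pulling in any IPFS libraries, here is a rough stdlib-only sketch. It assumes a small, non-empty file that fits in one chunk, default settings, no links, and SHA2-256. The byte layout is hand-encoded from my understanding of the UnixFS and dag-pb messages, so treat it as an approximation of what unixfs.FilePBData and merkledag.NodeWithData produce, not as their implementation. All function names here are made up.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/big"
)

const b58Alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

// uvarint returns the protobuf varint encoding of n.
func uvarint(n uint64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	return buf[:binary.PutUvarint(buf, n)]
}

// unixfsFile hand-encodes the inner UnixFS message (assumed layout):
// Type = File, Data = data, filesize = len(data).
func unixfsFile(data []byte) []byte {
	size := uvarint(uint64(len(data)))
	out := []byte{0x08, 0x02} // field 1 (Type), value 2 = File
	out = append(out, 0x12)   // field 2 (Data), length-delimited
	out = append(out, size...)
	out = append(out, data...)
	out = append(out, 0x18) // field 3 (filesize), varint
	return append(out, size...)
}

// dagPBNode hand-encodes the outer dag-pb node: no links, only the
// Data field (field 1) carrying the inner message.
func dagPBNode(inner []byte) []byte {
	out := []byte{0x0a}
	out = append(out, uvarint(uint64(len(inner)))...)
	return append(out, inner...)
}

// base58 encodes b with the Bitcoin alphabet used for CIDv0 strings.
func base58(b []byte) string {
	n := new(big.Int).SetBytes(b)
	radix, zero, mod := big.NewInt(58), big.NewInt(0), new(big.Int)
	var out []byte
	for n.Cmp(zero) > 0 {
		n.DivMod(n, radix, mod)
		out = append(out, b58Alphabet[mod.Int64()])
	}
	for _, c := range b { // each leading zero byte becomes '1'
		if c != 0 {
			break
		}
		out = append(out, '1')
	}
	for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
		out[i], out[j] = out[j], out[i] // digits were produced in reverse
	}
	return string(out)
}

// cidV0 hashes a block and renders the multihash as a CIDv0 string.
func cidV0(block []byte) string {
	digest := sha256.Sum256(block)
	// 0x12 = sha2-256, 0x20 = 32-byte digest; CIDv0 carries no codec byte.
	return base58(append([]byte{0x12, 0x20}, digest[:]...))
}

func main() {
	inner := unixfsFile([]byte("content"))
	fmt.Println("inner only:", cidV0(inner))
	fmt.Println("wrapped:   ", cidV0(dagPBNode(inner)))
}
```

If the hand-encoding matches what the libraries emit, the two lines should correspond to the two CIDs quoted in the snippet above (inner-only versus the one from ipfs add); I have not checked that byte for byte, so compare against ipfs add --only-hash before relying on it.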

> guaranteed to fit within a single chunk

Are you willing to change the defaults? If you are, you can use ipfs add --raw-leaves my_file. If you do that and the data fits into one "chunk", the CID will be equivalent to the hash (specifically, it'll be a CID with the codec set to "raw").


NOTE: "fits into one chunk" means <= 1MiB (ish). IPFS will refuse to transfer larger chunks over bitswap as we don't want to download too much data without verifying it.

@DRK3

DRK3 commented May 19, 2021

@Stebalien Yep, that did the trick!

> These APIs are really bad, I'm so sorry.

No need to apologize! Thanks for all your hard work on this awesome (and free) project!

> Are you willing to change the defaults? If you are, you can use ipfs add --raw-leaves my_file. If you do that and the data fits into one "chunk", the CID will be equivalent to the hash (specifically, it'll be a CID with the codec set to "raw").

For now my requirement is to support the default IPFS settings, but this is really good to know. I'll keep this in mind in case my requirements change (and/or I need to support more configurations).

Thanks again so much for your help!

@Stebalien
Member

Stebalien commented May 19, 2021 via email

@Vikram710

@DRK3 By any chance, were you able to extend the solution to programmatically get the IPFS CID for files that span multiple chunks? (My files will be around 4-5 MB.)

@DRK3

DRK3 commented Nov 21, 2023

@Vikram710 It's been a while, but from what I recall I didn't need multiple chunks, so it may be possible but I haven't attempted it.
