Generated CIDv0 differs from the one generated by IPFS #77

AminArria · 2018-10-05T15:03:58Z

Hi, I'm receiving some files and want to verify the CID sent with them, so I'm doing the following (summarized):

genCID, err := cid.Decode(receivedCID)            // error handling omitted
fileCID, err := genCID.Prefix().Sum(filecontent)  // error handling omitted
match := genCID.Equals(fileCID)

The CID I'm receiving is the one generated by doing ipfs add -n path/to/file, but it doesn't match the one generated by go-cid.

Something I'm doing wrong?

PS: This works fine for CIDv1

Stebalien · 2018-10-05T17:54:23Z

In general, you can't expect this to work. Basically, IPFS is a filesystem built on top of IPLD (i.e., it creates IPLD nodes to encode blocks, indirect blocks, inodes, etc.). IPFS chunks files into blocks (~256KiB by default) and builds a merkle-tree on top of these chunks (using IPLD). The CID of a file corresponds (approximately) to the hash of the root node of this merkle-tree.

In your case, I assume your file is less than 256KiB. That means it fits in a single chunk.
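To make the single-chunk case concrete, here is a minimal sketch of the arithmetic only, assuming a fixed-size chunker at the 256KiB default mentioned above. `chunkCount` and `defaultChunkSize` are hypothetical names for this illustration, not part of any IPFS library, and treating an empty file as one chunk is an assumption of the sketch.

```go
package main

import "fmt"

// defaultChunkSize is the fixed chunk size discussed above (256 KiB).
// Hypothetical constant for this sketch.
const defaultChunkSize = 256 * 1024

// chunkCount returns how many fixed-size chunks a file of fileSize bytes
// would be split into. An empty file is counted as one (empty) chunk,
// which is an assumption of this sketch.
func chunkCount(fileSize, chunkSize int) int {
	if fileSize == 0 {
		return 1
	}
	// Ceiling division: a partial trailing chunk still counts as a chunk.
	return (fileSize + chunkSize - 1) / chunkSize
}

func main() {
	sizes := []int{7, 256 * 1024, 256*1024 + 1, 5 * 1024 * 1024}
	for _, size := range sizes {
		fmt.Printf("%8d bytes -> %d chunk(s)\n", size, chunkCount(size, defaultChunkSize))
	}
}
```

Anything that yields more than one chunk needs a tree of nodes above the chunks, so its CID can no longer be obtained by hashing the file (or a single wrapper around it) directly.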

In the CIDv0 case, IPFS is taking your file content and wrapping it in a protobuf data structure, then hashing it. IPFS has to do this because CIDv0 only supports one IPLD "codec" (DagPB). Every IPLD object with a V0 CID uses this same DagPB format.

CIDv1, on the other hand, supports many IPLD codecs (the specific codec used is recorded in the CID itself). In this case, because your file fits in a single block, IPFS is using the Raw codec (raw binary). That's why genCID.Prefix().Sum(content) works. In this specific case, ipfs add isn't chunking, wrapping, or re-encoding your file at all. It's just taking it as-is and directly using it as a block.
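As a rough illustration of why the CIDv1 case lines up with a plain hash, the sketch below builds a CIDv1 string by hand using only the standard library: version byte, raw codec byte, SHA2-256 multihash, then lowercase base32 with the `b` multibase prefix. `rawCIDv1` is a hypothetical helper for this example, not a go-cid function, and it assumes the content is used as a single raw block.

```go
package main

import (
	"crypto/sha256"
	"encoding/base32"
	"fmt"
	"strings"
)

// rawCIDv1 builds the string form of a CIDv1 with the "raw" codec and a
// SHA2-256 multihash. Hypothetical helper; assumes the content is stored
// as one block, unchunked and unwrapped.
func rawCIDv1(content []byte) string {
	digest := sha256.Sum256(content)

	// Binary CID: <version 0x01><codec raw 0x55><multihash>.
	// Multihash:  <sha2-256 code 0x12><digest length 0x20><digest>.
	buf := []byte{0x01, 0x55, 0x12, 0x20}
	buf = append(buf, digest[:]...)

	// Multibase prefix "b" means lowercase base32 without padding.
	enc := base32.StdEncoding.WithPadding(base32.NoPadding)
	return "b" + strings.ToLower(enc.EncodeToString(buf))
}

func main() {
	fmt.Println(rawCIDv1([]byte("content")))
}
```

Nothing in this construction touches the file bytes before hashing, which is the difference from the CIDv0 path: the codec is recorded in the CID, so no wrapper around the data is needed.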

Stebalien · 2018-10-05T17:54:46Z

(Closing for tracking, please feel free to continue discussing/asking questions)

DRK3 · 2021-05-13T17:50:05Z

Hi @Stebalien and @AminArria,

I'm running into the same issue - I'm trying to generate a v0 CID that matches the one that an IPFS node would generate, but without running an IPFS node. Is there some way of doing this?

Stebalien · 2021-05-13T18:11:38Z

I believe you can use ipfs add --only-hash offline. Alternatively, you could extract that code into a separate tool. But the resulting CID depends on a lot of knobs/structures so there's no way to "predict" it other than to generate it and throw away the results as you go.

DRK3 · 2021-05-13T18:25:50Z

@Stebalien You're right, ipfs add --only-hash does indeed seem to work offline. However (and I realize now that I failed to specify this in my previous post...), I was looking for a way to do this in Go code (i.e. without the ipfs command installed on the system).

I was hoping that this would replicate the behaviour of the ipfs command (which wraps the data in a protobuf):

	prefix := cid.Prefix{
		Version:  0,
		Codec:    cid.DagProtobuf,
		MhType:   mh.SHA2_256,
		MhLength: -1,
	}

	contentID, err := prefix.Sum(content)

But it seems like go-cid ignores the Codec for v0 CIDs?

Stebalien · 2021-05-13T18:37:11Z

IPFS data is first chunked, and then a merkledag is encoded on top of those chunks. The final CID refers to the root of this tree. `prefix.Sum(data)` assumes that `data` is a single node in the tree and returns the CID of that node; it won't do any IPFS chunking. To get that, you'll need to extract code from go-ipfs's add function.

Also note: you really shouldn't do this. Given different options (in `ipfs add`), different chunking algorithms, different hash functions, format changes, etc., the resulting CID may be different. A given CID always points to the same file, but calling `ipfs add` on the same file isn't guaranteed to produce the same CID (unless you use the exact same options, etc.).
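On the question of the codec appearing to be ignored for v0 CIDs: a CIDv0 string is just a base58btc-encoded multihash, so it has no field in which a codec could be recorded, and hashing never reshapes the input to fit one. The sketch below decodes the IPFS-produced CID quoted later in this thread with a hand-written base58 decoder to show that layout. `decodeBase58` is a hypothetical helper written for this example, not a go-cid or go-multihash function.

```go
package main

import (
	"fmt"
	"math/big"
	"strings"
)

// b58Alphabet is the base58btc alphabet (no 0, O, I or l).
const b58Alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

// decodeBase58 decodes a base58btc string into bytes.
// Hypothetical helper for this sketch.
func decodeBase58(s string) ([]byte, error) {
	n := new(big.Int)
	base := big.NewInt(58)
	for _, r := range s {
		idx := strings.IndexRune(b58Alphabet, r)
		if idx < 0 {
			return nil, fmt.Errorf("invalid base58 character %q", r)
		}
		n.Mul(n, base)
		n.Add(n, big.NewInt(int64(idx)))
	}
	// Each leading '1' stands for one leading zero byte.
	zeros := 0
	for zeros < len(s) && s[zeros] == '1' {
		zeros++
	}
	return append(make([]byte, zeros), n.Bytes()...), nil
}

func main() {
	raw, err := decodeBase58("QmbSnCcHziqhjNRyaunfcCvxPiV3fNL3fWL8nUrp5yqwD5")
	if err != nil {
		panic(err)
	}
	fmt.Printf("total bytes:   %d\n", len(raw))
	fmt.Printf("hash function: 0x%x\n", raw[0])
	fmt.Printf("digest length: %d\n", raw[1])
}
```

For a SHA2-256 CIDv0 the decoded bytes are the hash function code 0x12, the digest length 0x20 (32), and the 32-byte digest, with no version or codec bytes. DagPB is implied, so the bytes being hashed must already be a DagPB-encoded node for the CID to match what IPFS produces.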

DRK3 · 2021-05-14T16:38:07Z

@Stebalien Thanks so much for all your help and for the quick responses! And thanks for the heads up on how the CID can change depending on the IPFS config. I'll be sure to take that into account as I build my solution.

DRK3 · 2021-05-18T21:00:59Z

Hi @Stebalien,

I have a follow-up question. I was watching https://www.youtube.com/watch?v=Z5zNPwMDYGg to learn more about how adding data to IPFS works (great video, by the way).

I have a very specific use-case: generate CIDs locally that match the ones produced by the ipfs add command, with an IPFS node running only default settings (the current defaults as of today). The data I'm dealing with is very small and is guaranteed to fit within a single chunk.

What I tried doing was wrapping the data in a UnixFS file wrapper before calculating the CID, but I'm still not getting a matching CID. I've verified that the Merkle DAG should be just a single node by using https://dag.ipfs.io/, so no chunking/node balancing should be needed. It seems I'm still missing something - do you know what?

Here is a short code snippet showing exactly what I'm doing:

import (
	"github.com/ipfs/go-cid"
	"github.com/ipfs/go-unixfs"
	mh "github.com/multiformats/go-multihash"
)

func Example() {
	sampleData := []byte("content")

	unixFSWrappedSampleData := unixfs.FilePBData(sampleData, uint64(len(sampleData)))

	prefix := cid.Prefix{
		Version:  0,
		Codec:    cid.DagProtobuf,
		MhType:   mh.SHA2_256,
		MhLength: -1, // default length
	}

	contentID, _ := prefix.Sum(unixFSWrappedSampleData)

	// The CID produced here is QmXiUR1x5tZ5zk9AySV4cmD3X72M5to3gWXcx2LnCWZDRY
	// but the CID from IPFS is QmbSnCcHziqhjNRyaunfcCvxPiV3fNL3fWL8nUrp5yqwD5
	// Is there another step I'm missing?
	println(contentID.String())
}

Stebalien · 2021-05-18T21:35:36Z

These APIs are really bad, I'm so sorry.

The (current) file format wraps a protobuf within a protobuf. You've just created the inner protobuf, but you still need to create the outer one.

You need to call merkledag.NodeWithData(unixFSWrappedSampleData).Cid() where merkledag is github.com/ipfs/go-merkledag. That should produce the correct CID.
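To show what "a protobuf within a protobuf" means at the byte level, here is a minimal standard-library sketch that hand-encodes both layers for a small, non-empty file and hashes each. It is not the go-unixfs or go-merkledag implementation: `unixFSFile`, `dagPBNode`, `encodeBase58` and `cidV0` are hypothetical helpers, and the sketch assumes default settings, a single chunk, and a node with no links.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
)

const b58Alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

// appendUvarint appends v in protobuf varint encoding.
func appendUvarint(buf []byte, v uint64) []byte {
	for v >= 0x80 {
		buf = append(buf, byte(v)|0x80)
		v >>= 7
	}
	return append(buf, byte(v))
}

// unixFSFile encodes the inner protobuf: a UnixFS message describing a
// file. Assumes non-empty data that fits in one chunk.
func unixFSFile(data []byte) []byte {
	// Field 1 (Type), varint: 2 = File.
	buf := []byte{0x08, 0x02}
	// Field 2 (Data), length-delimited: the file bytes.
	buf = append(buf, 0x12)
	buf = appendUvarint(buf, uint64(len(data)))
	buf = append(buf, data...)
	// Field 3 (filesize), varint.
	buf = append(buf, 0x18)
	return appendUvarint(buf, uint64(len(data)))
}

// dagPBNode encodes the outer protobuf: a DAG-PB node with no links whose
// Data field (field 1, length-delimited) carries the payload.
func dagPBNode(payload []byte) []byte {
	buf := []byte{0x0a}
	buf = appendUvarint(buf, uint64(len(payload)))
	return append(buf, payload...)
}

// encodeBase58 encodes bytes as base58btc.
func encodeBase58(b []byte) string {
	n := new(big.Int).SetBytes(b)
	base, mod := big.NewInt(58), new(big.Int)
	var out []byte
	for n.Sign() > 0 {
		n.DivMod(n, base, mod)
		out = append(out, b58Alphabet[mod.Int64()])
	}
	// Leading zero bytes are written as '1'.
	for _, c := range b {
		if c != 0 {
			break
		}
		out = append(out, '1')
	}
	// Digits were produced least significant first, so reverse them.
	for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
		out[i], out[j] = out[j], out[i]
	}
	return string(out)
}

// cidV0 hashes a block with SHA2-256 and returns
// base58btc(<0x12><0x20><digest>).
func cidV0(block []byte) string {
	digest := sha256.Sum256(block)
	return encodeBase58(append([]byte{0x12, 0x20}, digest[:]...))
}

func main() {
	inner := unixFSFile([]byte("content"))
	// Hash of the inner protobuf alone (the step that was tried first).
	fmt.Println("inner only:", cidV0(inner))
	// Hash of the inner protobuf wrapped in the outer DAG-PB node.
	fmt.Println("wrapped:   ", cidV0(dagPBNode(inner)))
}
```

The two printed values differ only because of the few bytes of outer framing, which is the step that was missing. Any result from a hand-rolled encoder like this is best cross-checked against `ipfs add --only-hash` on the same bytes (mind trailing newlines) before relying on it.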

> guaranteed to fit within a single chunk

Are you willing to change the defaults? If you are, you can use ipfs add --raw-leaves my_file. If you do that and the data fits into one "chunk", the CID will be equivalent to the hash (specifically, it'll be a CID with the codec set to "raw").

NOTE: "fits into one chunk" means <= 1MiB (ish). IPFS will refuse to transfer larger chunks over bitswap as we don't want to download too much data without verifying it.

DRK3 · 2021-05-19T14:29:06Z

@Stebalien Yep, that did the trick!

> These APIs are really bad, I'm so sorry.

No need to apologize! Thanks for all your hard work on this awesome (and free) project!

> Are you willing to change the defaults? If you are, you can use ipfs add --raw-leaves my_file. If you do that and the data fits into one "chunk", the CID will be equivalent to the hash (specifically, it'll be a CID with the codec set to "raw").

For now my requirement is to support the default IPFS settings, but this is really good to know. I'll keep this in mind in case my requirements change (and/or I need to support more configurations).

Thanks again so much for your help!

Stebalien · 2021-05-19T16:09:58Z

FYI, there's a chance this will become the default in the near future (TM). But that's been the case for a while now.

Vikram710 · 2023-11-14T10:29:38Z

@DRK3 By any chance, were you able to extend the solution to programmatically compute the IPFS CID for files spanning multiple chunks? (My files will be around 4-5 MB.)

DRK3 · 2023-11-21T23:19:25Z

@Vikram710 It's been a while, but from what I recall I didn't need multiple chunks, so it may be possible but I haven't attempted it.

Stebalien added the kind/support (A question or request for support) label Oct 5, 2018

Stebalien closed this as completed Oct 5, 2018

RobStallion mentioned this issue Jan 23, 2019

Implement an IPFS compatible CID function in Elixir using Multihash SHA256 dwyl/cid#11
