
IPFS hash generation #1611

Closed
hdriqi opened this issue Oct 2, 2018 · 9 comments
hdriqi commented Oct 2, 2018

Type: Question

Description:

Hi, I'm trying to generate the IPFS hash of a file without using the IPFS CLI. What I want is a JS module that generates exactly the same IPFS hash as the CLI, e.g. "ipfs add -n [filename]".

The reason I'm not using js-ipfs is that it's overkill; all we need is the hash generation algorithm, nothing else.

I tried using "ipld-dag-pb" to generate the Merkle DAG hash, but when I compare it with "ipfs add -n", the hash result is different. Am I missing some steps in the IPFS hash generation?

Steps to reproduce the error:

hash.js

const multihashes = require('multihashes')
const dagPB = require('ipld-dag-pb')
const { readFileSync } = require('fs')

const fileBuffer = readFileSync('./index.js')

dagPB.DAGNode.create(fileBuffer, (err, node) => {
	if(err) return console.error(err)
	console.log(multihashes.toB58String(node._cid.multihash)) 
	// output QmUkfFWUDiDDTX7pPpXusfVj31RwoQqSyFVgnx1cN7b14R

	// but [ipfs add index.js] output QmPPopR6vsHuujdQw5SnLY8u4GPCCJyLHLBWyoX7nAYNXn
})

index.js

console.log('hello world!')
achingbrain (Member) commented Oct 2, 2018

When you use ipfs add, a UnixFS file node is created. The data from a UnixFS node is wrapped in a protobuf (hence dag-pb) which adds bytes to the DAGNode's data field and as such results in a different CID.

ipfs add does something more like:

const dagPB = require('ipld-dag-pb')
const UnixFS = require('ipfs-unixfs')

const fileBuffer = Buffer.from("console.log('hello world!')\n")
const file = new UnixFS('file', fileBuffer)

dagPB.DAGNode.create(file.marshal(), (err, node) => {
  if(err) return console.error(err)
  console.log(node._cid.toBaseEncodedString()) // nb. try to use this method instead of the multihashes module
  // output QmRFSrX7MJW5P7YjdDoe4ckEEMVMSpnR5WnFNxgbggjwH1

  // JS version:
  // $ jsipfs add index.js 
  // added QmRFSrX7MJW5P7YjdDoe4ckEEMVMSpnR5WnFNxgbggjwH1 index.js

  // Go version:
  // $ ipfs add index.js 
  // added QmRFSrX7MJW5P7YjdDoe4ckEEMVMSpnR5WnFNxgbggjwH1 index.js
})

hdriqi (Author) commented Oct 2, 2018

@achingbrain thanks a lot! It's working now.

hdriqi closed this as completed Oct 2, 2018
hdriqi (Author) commented Oct 3, 2018

Hi @achingbrain

The code works most of the time, but sometimes it produces a different IPFS hash for huge files such as videos or even large code files.

Here's a gist with the large file that produces a different hash: https://gist.github.com/hdriqi/7d60811b2dc40802bed70741491cd10f

the manual code result -> QmNvdWfLScJwsDe7SPG4D1bwhZgU9dVqHdouG7mhzu7Q3S
ipfs add -> QmcxS3eSGQNL7eRN5iNG4VqUnjpQnGMBNh5i2JRZP8h1Lm

So what does IPFS actually do that produces a different result? Thanks!

hdriqi reopened this Oct 3, 2018
achingbrain (Member) commented Oct 3, 2018

Are you breaking your file up into chunks or trying to store it as one enormous DAGNode?

Please see the importer from the ipfs-unixfs-engine module for what IPFS uses.

hdriqi (Author) commented Oct 3, 2018

@achingbrain yes, I did follow some of the code in ipfs-unixfs-engine:

const dagPB = require('ipld-dag-pb')
const UnixFS = require('ipfs-unixfs')
const { createReadStream } = require('fs')

const CID = require('cids')

const DAGLink = dagPB.DAGLink
const DAGNode = dagPB.DAGNode

const filestream = createReadStream('some-videos.mp4', { highWaterMark: 262144 })

let chunkTemp = []

filestream.on('data', (data) => {
	chunkTemp.push(data)
})

filestream.on('end', async () => {
	const leaves = await Promise.all(chunkTemp.map(async (chunk) => {
		const file = new UnixFS('file', chunk)
		
		return new Promise((resolve) => {
			DAGNode.create(file.marshal(), (err, node) => {
				if(err) return console.error(err)
				resolve({
					multihash: node.multihash,
					size: node.size,
					leafSize: file.fileSize(),
					cid: new CID(0, 'dag-pb', node.multihash),
					data: node
				})
			})
		})
	}))

	const f = new UnixFS('file')

	const links = leaves.map((leaf) => {
		f.addBlockSize(leaf.leafSize)

		let cid = leaf.cid

		if (!cid) {
			cid = new CID(0, 'dag-pb', leaf.multihash)
		}

		return new DAGLink(leaf.name, leaf.size, cid.buffer)
	})

	DAGNode.create(f.marshal(), links, (err, node) => {
		if(err) return console.log(err)
		const cid = new CID(0, 'dag-pb', node.multihash)
		console.log(cid.toBaseEncodedString())
	})
})

Smaller files produce the same hash, but huge files like videos give a different one. I think I'm missing something in the tree structure.

achingbrain (Member) commented Oct 3, 2018

It looks like you're going to end up with one parent DAGNode with all the rest of the data in children - e.g. a tree one level deep. DAGNodes have a limited number of children in the IPFS implementation before adding new levels to the tree.

If you're going to handle cases like that (and also not read the entire file into memory before you start to write the chunks to IPFS) you're basically reimplementing ipfs-unixfs-engine and are probably better off using that directly.
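To build intuition for why tree depth matters here: the balanced importer caps the number of links per node (go-ipfs's default is 174 links per block, if memory serves) and `ipfs add` chunks files into 256 KiB blocks by default. Here is a rough sketch in plain Node.js (no IPFS libraries; the constants are assumptions, not read from any API) of how many parent levels a file of a given size needs:

```javascript
// Sketch of the balanced DAG layout: chunks fill leaf nodes, every
// `maxLinks` nodes get a new parent node, repeated until one root remains.
// Assumption: maxLinks = 174 (go-ipfs's default links-per-block).
function treeDepth (chunkCount, maxLinks = 174) {
  let levels = 0
  let nodes = chunkCount
  while (nodes > 1) {
    nodes = Math.ceil(nodes / maxLinks)
    levels++
  }
  return levels
}

// A tiny file fits in one 256 KiB chunk: no parent levels at all.
console.log(treeDepth(1)) // 0

// A ~43 MB file is ~172 chunks: one parent level, which a flat
// single-root layout like the code above happens to reproduce.
console.log(treeDepth(172)) // 1

// A ~100 MB file is ~400 chunks: two levels -- a flat layout no
// longer matches what `ipfs add` builds, so the root hash differs.
console.log(treeDepth(400)) // 2
```

The cutoff is exactly 174 chunks (about 43.5 MB at 256 KiB per chunk), which would explain why smaller files matched and videos did not.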

hdriqi (Author) commented Oct 3, 2018

@achingbrain now I'm trying to use ipfs-unixfs-engine, specifically the Importer. From my understanding, to use the Importer we only need to pipe a readable stream into it. But when I try that, an error occurs: dest.on is not a function

here's my code:

const Importer = require('ipfs-unixfs-engine').importer
const IPLD = require('ipld')
const { createReadStream } = require('fs')

IPLD.inMemory((err, ipld) => {
	const importer = Importer(ipld)

	const filestream = createReadStream('some-videos.mp4')

	filestream.pipe(importer)
	// error: dest.on is not a function
})

achingbrain (Member) commented Oct 3, 2018

The README is a little out of date - the importer was rewritten to be a pull stream.

const {
  importer
} = require('ipfs-unixfs-engine')
const IPLD = require('ipld')
const {
  createReadStream
} = require('fs')
const pull = require('pull-stream')
const {
  once, collect
} = pull

IPLD.inMemory((err, ipld) => {
  pull(
    once(createReadStream('./app.js')),
    importer(ipld),
    collect((err, files) => {
      console.info(err, files)
    })
  )
})

hdriqi (Author) commented Oct 4, 2018

@achingbrain works like a charm! Though to get the same IPFS hash, the input data should be in the form of { path, content }. Thank you for your guidance!

so my final code is:

const { importer } = require('ipfs-unixfs-engine')
const IPLD = require('ipld')
const CID = require('cids')
const pull = require('pull-stream')
const toPull = require('stream-to-pull-stream')
const { createReadStream } = require('fs')

const files = [ 'path1', 'path2' ]

IPLD.inMemory((err, ipld) => {
	pull(
		pull.values(files),
		pull.map((file) => ({
			path: file,
			content: toPull.source(createReadStream(file))
		})),
		importer(ipld),
		pull.collect((err, files) => {
			if(err) return console.error(err)
			const cid = new CID(0, 'dag-pb', files[0].multihash)
			console.log(cid.toBaseEncodedString())
		})
	)
})
