
IPFS Upload Integrity Check #112

Closed
lautarodragan opened this issue May 28, 2018 · 1 comment

Comments

@lautarodragan
Member

lautarodragan commented May 28, 2018

Problem

Currently, when POST /works is requested, the API dispatches a message to RMQ that is picked up by Storage, which in turn calls the following code:

async create(claim: Claim): Promise<void> {
  const logger = this.logger.child({ method: 'create' })
  logger.trace({ claim }, 'Storing Claim')
  const ipfsHash = await this.ipfs.addText(JSON.stringify(claim))
  logger.info({ claim, ipfsHash }, 'Claim Stored')
  await this.collection.insertOne({
    claimId: claim.id,
    ipfsHash,
  })
  await this.messaging.publish(Exchange.ClaimIPFSHash, {
    claimId: claim.id,
    ipfsHash,
  })
}

This means that if the upload to IPFS fails for any reason, the caller of POST /works won't find out about it at all. It also means that if the file is corrupted before it reaches IPFS, a valid IPFS hash will be generated for the broken content, which may be what is happening in #53.

Solution Proposal 1

A way to validate that the file added to IPFS is correct would be to generate the hash of the file the same way IPFS does and match our hash against the one returned by IPFS.

IPFS uses the multihash protocol. By default, it currently uses SHA-256 encoded in base58, which is why IPFS hashes start with "Qm".
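As a point of reference, the "Qm" prefix follows directly from the multihash encoding. Below is a sketch of building a raw SHA-256 multihash, using plain Node crypto and a hand-rolled base58 encoder (the function names are mine, not from js-ipfs). Note that for arbitrary content this will generally not match what ipfs add returns, precisely because of the chunking and DAG wrapping discussed below.

```typescript
import { createHash } from 'crypto'

// Bitcoin/IPFS base58 alphabet (no 0, O, I, l)
const ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

const base58Encode = (buf: Buffer): string => {
  let n = BigInt('0x' + buf.toString('hex'))
  let encoded = ''
  while (n > 0n) {
    encoded = ALPHABET[Number(n % 58n)] + encoded
    n /= 58n
  }
  // each leading zero byte becomes a leading '1'
  for (const byte of buf) {
    if (byte !== 0) break
    encoded = '1' + encoded
  }
  return encoded
}

// multihash = <hash function code><digest length><digest>
// 0x12 = sha2-256, 0x20 = 32-byte digest
const sha256Multihash = (data: Buffer): string =>
  base58Encode(Buffer.concat([
    Buffer.from([0x12, 0x20]),
    createHash('sha256').update(data).digest(),
  ]))

// every sha2-256 multihash base58-encodes to a 46-char string starting with "Qm"
console.log(sha256Multihash(Buffer.from('hello world')))
```

The leading 0x12 0x20 bytes are what make every such hash start with "Qm" once base58-encoded.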

Calculating the IPFS hash without using IPFS at all could be quite complex.

The multi-hash format only provides a wrapper around the hash algorithm, so you can change from SHA256 to BLAKE2b, and the prefix will differ. HOWEVER, the hash is also affected by the chunking algorithm, DAG format and CID version, so you can have completely different hashes even if the format is marked the same.

This functionality should already be available in js-ipfs. Maybe we can use the js implementation of IPFS to calculate the hash without actually running a node? (Need to confirm, but I believe const ipfs = new IPFS() would have the unwanted side effect of eating a whole lot of RAM and CPU.)

https://github.com/ipfs/js-ipfs/blob/bddc5b4a967258306bed8bd37a9b4fd308b98fc8/test/http-api/files.js#L31-L48

UPDATE

Using js-ipfs to add -n requires creating an instance of IPFS, which in turn runs the IPFS Node. I tried reading the source code of js-ipfs to understand how it works internally and made some small modifications to make it run without running the IPFS node, but was unsuccessful.

I thought about requesting this feature, but I gather this behavior is intentional: the hash generated by ipfs add depends not only on the contents of the file but also on node-specific values that could change from node to node.

the hash is also affected by the chunking algorithm, DAG format and CID version, so you can have completely different hashes even if the format is marked the same

Multihash just hashes the file and prints out its hash in multihash format.
ipfs add creates a merkledag and wraps the file in the needed format, which is why the hash is different.

Solution Proposal 2

Plan B would be to ipfs cat right after ipfs add and verify the content matches. This operation is much more expensive, basically duplicating the memory and network costs, and would need to be async and have a retry mechanism like the one we use for downloads, since we can't rely on IPFS being 100% available.

I'd rather wait before we go this way. It should be relatively straightforward to implement, but the costs are too high.
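A minimal sketch of the add-then-cat check, assuming a client that exposes addText (as used in create() above) plus a hypothetical cat method for reading content back; the interface and function names here are illustrative, not an existing API:

```typescript
// Assumed client shape: addText appears in create() above; cat is a
// hypothetical read-back method for this sketch.
interface IpfsClient {
  addText(text: string): Promise<string>
  cat(hash: string): Promise<string>
}

// Add text to IPFS, then read it back and compare, failing loudly on
// mismatch so the caller of POST /works can find out about it.
async function addTextVerified(ipfs: IpfsClient, text: string): Promise<string> {
  const ipfsHash = await ipfs.addText(text)
  const readBack = await ipfs.cat(ipfsHash) // second, variable-size round trip
  if (readBack !== text)
    throw new Error(`IPFS integrity check failed for ${ipfsHash}: content mismatch`)
  return ipfsHash
}
```

In practice this would still need the async retry machinery mentioned above, since the cat can fail transiently even when the add succeeded.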

Solution Proposal 3

In my mind, we should be able to have a pure function that calculates an IPFS hash for given content, as add -n would, taking the DAG format, CID version, and chunking algorithm into account but without instantiating the node.

If these values could change from IPFS node to IPFS node but are constant over a run of the node, we could have the Po.et Node request these values from the IPFS Node on start up once.

Compared to Solution 2, we'd be making a single constant-size network request per start-up, versus a variable-size (potentially extremely large) network request to IPFS for each add.

I've uploaded some basic tests to ipfs-hash-tests.

@geoffturk geoffturk added this to the Mainnet milestone May 28, 2018
@lautarodragan lautarodragan changed the title IPFS Upload Retries IPFS Upload Integrity Check May 29, 2018
@lautarodragan lautarodragan self-assigned this May 30, 2018
@lautarodragan lautarodragan mentioned this issue Jun 13, 2018
5 tasks
@geoffturk geoffturk added the o1 label Jun 18, 2018
@geoffturk geoffturk modified the milestones: Mainnet, Mainnet-beta Jun 22, 2018
@geoffturk geoffturk added the t5 label Jul 17, 2018
@geoffturk geoffturk removed the o1 label Aug 15, 2018
@kennylavender
Contributor

We have decided to go with Solution Proposal 2. The issue for that proposal is #177.
