
IPFS Upload Integrity Check #112

Closed
lautarodragan opened this issue May 28, 2018 · 1 comment

Comments

@lautarodragan
Member

lautarodragan commented May 28, 2018

Problem

Currently, when POST /works is requested, the API dispatches a message to RMQ that is picked up by Storage, which in turn calls the following code:

async create(claim: Claim): Promise<void> {
  const logger = this.logger.child({ method: 'create' })
  logger.trace({ claim }, 'Storing Claim')
  const ipfsHash = await this.ipfs.addText(JSON.stringify(claim))
  logger.info({ claim, ipfsHash }, 'Claim Stored')
  await this.collection.insertOne({
    claimId: claim.id,
    ipfsHash,
  })
  await this.messaging.publish(Exchange.ClaimIPFSHash, {
    claimId: claim.id,
    ipfsHash,
  })
}

This means that if the upload to IPFS fails for any reason, the caller of POST /works won't find out about it at all. It also means that if the file is corrupted before it reaches IPFS, a valid IPFS hash will be generated for the broken content, which may be what is happening in #53.

Solution Proposal 1

A way to validate that the file added to IPFS is correct would be to generate the hash of the file the same way IPFS does and match our hash against the one returned by IPFS.

IPFS uses the multihash protocol. By default, it currently uses SHA-256 encoded in base58, which is why IPFS hashes start with "Qm".
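As a point of reference, the "Qm" prefix follows directly from the multihash encoding. Below is a sketch of building a raw SHA-256 multihash, using plain Node crypto and a hand-rolled base58 encoder (the function names are mine, not from js-ipfs). Note that for arbitrary content this will generally not match what ipfs add returns, precisely because of the chunking and DAG wrapping discussed below.

```typescript
import { createHash } from 'crypto'

// Bitcoin/IPFS base58 alphabet (no 0, O, I, l)
const ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

const base58Encode = (buf: Buffer): string => {
  let n = BigInt('0x' + buf.toString('hex'))
  let encoded = ''
  while (n > 0n) {
    encoded = ALPHABET[Number(n % 58n)] + encoded
    n /= 58n
  }
  // each leading zero byte becomes a leading '1'
  for (const byte of buf) {
    if (byte !== 0) break
    encoded = '1' + encoded
  }
  return encoded
}

// multihash = <hash function code><digest length><digest>
// 0x12 = sha2-256, 0x20 = 32-byte digest
const sha256Multihash = (data: Buffer): string =>
  base58Encode(Buffer.concat([
    Buffer.from([0x12, 0x20]),
    createHash('sha256').update(data).digest(),
  ]))

// every sha2-256 multihash base58-encodes to a 46-char string starting with "Qm"
console.log(sha256Multihash(Buffer.from('hello world')))
```

The leading 0x12 0x20 bytes are what make every such hash start with "Qm" once base58-encoded.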

Calculating the IPFS hash without using IPFS at all could be quite complex.

The multi-hash format only provides a wrapper around the hash algorithm, so you can change from SHA256 to BLAKE2b, and the prefix will differ. HOWEVER, the hash is also affected by the chunking algorithm, DAG format and CID version, so you can have completely different hashes even if the format is marked the same.

This functionality should already be available in js-ipfs. Maybe we can use the js implementation of IPFS to calculate the hash without actually running a node? (Need to confirm, but I believe const ipfs = new IPFS() would have the unwanted side effect of eating a whole lot of RAM and CPU.)

https://github.com/ipfs/js-ipfs/blob/bddc5b4a967258306bed8bd37a9b4fd308b98fc8/test/http-api/files.js#L31-L48

UPDATE

Using js-ipfs to add -n requires creating an instance of IPFS, which in turn runs the IPFS Node. I tried reading the source code of js-ipfs to understand how it works internally and made some small modifications to make it run without running the IPFS node, but was unsuccessful.

I thought about requesting this feature, but I gather this behavior is intentional: the hash generated by ipfs add depends not only on the contents of the file but also on node-specific values that could change from node to node.

the hash is also affected by the chunking algorithm, DAG format and CID version, so you can have completely different hashes even if the format is marked the same

Multihash just hashes the file and prints out its hash in multihash format.
ipfs add creates a merkledag and wraps the file in the needed format, which is why the hash is different.

Solution Proposal 2

Plan B would be to ipfs cat right after ipfs add and verify the content matches. This operation is much more expensive, basically duplicating the memory and network costs, and would need to be async and have a retry mechanism like the one we use for downloads, since we can't rely on IPFS being 100% available.

I'd rather wait before we go this way. It should be relatively straightforward to implement, but the costs are too high.
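A minimal sketch of the add-then-cat check, assuming a client that exposes addText (as used in create() above) plus a hypothetical cat method for reading content back; the interface and function names here are illustrative, not an existing API:

```typescript
// Assumed client shape: addText appears in create() above; cat is a
// hypothetical read-back method for this sketch.
interface IpfsClient {
  addText(text: string): Promise<string>
  cat(hash: string): Promise<string>
}

// Add text to IPFS, then read it back and compare, failing loudly on
// mismatch so the caller of POST /works can find out about it.
async function addTextVerified(ipfs: IpfsClient, text: string): Promise<string> {
  const ipfsHash = await ipfs.addText(text)
  const readBack = await ipfs.cat(ipfsHash) // second, variable-size round trip
  if (readBack !== text)
    throw new Error(`IPFS integrity check failed for ${ipfsHash}: content mismatch`)
  return ipfsHash
}
```

In practice this would still need the async retry machinery mentioned above, since the cat can fail transiently even when the add succeeded.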

Solution Proposal 3

In my mind, we should be able to have a pure function that calculates an IPFS hash for given content, as add -n would, taking the DAG format, CID version, and chunking algorithm into account but without instantiating the node.

If these values could change from IPFS node to IPFS node but are constant over a run of the node, we could have the Po.et Node request these values from the IPFS Node on start up once.

Compared to Solution 2, we'd be making a single constant-size network request per start-up, versus a variable-size (potentially extremely large) network request to IPFS for each add.

I've uploaded some basic tests to ipfs-hash-tests.

@geoffturk geoffturk added this to the Mainnet milestone May 28, 2018
@lautarodragan lautarodragan changed the title IPFS Upload Retries IPFS Upload Integrity Check May 29, 2018
@lautarodragan lautarodragan self-assigned this May 30, 2018
@lautarodragan lautarodragan mentioned this issue Jun 13, 2018
5 tasks
@geoffturk geoffturk added the o1 label Jun 18, 2018
@geoffturk geoffturk modified the milestones: Mainnet, Mainnet-beta Jun 22, 2018
@geoffturk geoffturk added the t5 label Jul 17, 2018
@geoffturk geoffturk removed the o1 label Aug 15, 2018
@kennylavender
Contributor

We have decided to go with Solution Proposal 2. The issue for that proposal is #177.
