773ec8d Jan 30, 2017

Instructions for Providers/Sources Publishing Large Datasets

This is part of the Instructions for Replicating Large Amounts of Data. Please read the Overview before proceeding.

If you have datasets on your machine and want to add them to IPFS so you can serve them over the network, follow these instructions. They walk you through using ipfs-pack to register and serve your data.

By contrast, if you want to mirror data that someone else has published, you can follow Instructions for Mirrors Replicating Large Datasets to efficiently replicate the data onto your IPFS node.


If you just want to run the commands without explanation, here's what you need to do. This assumes that you've already installed ipfs-pack.

It's best if you do this on a machine that has a public IP address.

cd /path-to-your/dataset-directory
ipfs-pack make
ipfs-pack serve

Then retrieve the Pack Root and the multiaddr for the node you just started, using the info that was printed on the console when you ran ipfs-pack serve. To learn how to retrieve the multiaddr, see Step 2: Get the multiaddr of your node. Give the multiaddr and Pack Root to the people who are setting up Mirrors.
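The quick-start commands above can be wrapped in a small script that checks its preconditions first. This is only a sketch: the function name publish_pack and its exit codes are hypothetical, and the only ipfs-pack subcommands it uses are the make and serve commands already shown.

```shell
#!/bin/sh
# publish_pack DIR: run the quick-start commands inside DIR after basic checks.
# The function name and exit codes are hypothetical, chosen for illustration.
publish_pack() {
    dataset_dir=$1

    # The dataset directory must exist before a pack can be built in it.
    if [ ! -d "$dataset_dir" ]; then
        echo "error: $dataset_dir is not a directory" >&2
        return 1
    fi

    # The quick-start assumes ipfs-pack is already installed.
    if ! command -v ipfs-pack >/dev/null 2>&1; then
        echo "error: ipfs-pack not found on PATH" >&2
        return 2
    fi

    # Same sequence as the commands above. The subshell keeps the caller's
    # working directory unchanged.
    (
        cd "$dataset_dir" || exit 1
        ipfs-pack make || exit 1
        # Expected to keep running while it serves; it prints the Peer ID
        # and Pack Root on the console.
        ipfs-pack serve
    )
}

# Only act when a directory is passed on the command line.
if [ "$#" -gt 0 ]; then
    publish_pack "$1"
fi
```

Saved as, say, publish.sh (a hypothetical file name), it could be run as sh publish.sh /path-to-your/dataset-directory.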

Step 1: Use ipfs-pack to serve your datasets

Follow the instructions in the ipfs-pack tutorial, which covers installing ipfs-pack, initializing a pack, and serving the contents of your pack on the IPFS network.

Step 2: Get the multiaddr of your node

This works best if you're running ipfs on a machine with a public IP address.

After starting the IPFS node with ipfs-pack serve, you will see some info about the node printed on the console. It will look like:

verified pack, starting server...
Serving data in this pack...
Peer ID: QmVbXV7mQ5Fs3tYY2Euek5YdkkzcRafUg8qGWvFdgaBMuo
Pack root is QmRguPt6jHmVMzu1NM8wQmpoymM9UeqDJGXdQyU3GhiPy4
Shared:      0 blocks, 0 B total data uploaded

The multiaddr is the public IPv4 address plus the Pack Peer ID, so for the sample output above, your pack's multiaddr would be /ip4/

The pack root in this sample is QmRguPt6jHmVMzu1NM8wQmpoymM9UeqDJGXdQyU3GhiPy4. This hash is the content-address for the ipfs-pack that contains your dataset.
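As a rough sketch, the Peer ID and Pack Root can be pulled out of that console output with sed and combined into a multiaddr. Several things here are assumptions rather than part of these instructions: the IP address 203.0.113.10 is a documentation placeholder, port 4001 is assumed because it is the usual IPFS swarm port, and the /ip4/ADDRESS/tcp/PORT/ipfs/PEER_ID layout is the common IPFS multiaddr form. Check all three against your own node.

```shell
#!/bin/sh
# Sample console output copied from the text above. In practice you might
# save the real output to a file, for example by piping ipfs-pack serve
# through tee, and read it from there.
serve_output='verified pack, starting server...
Serving data in this pack...
Peer ID: QmVbXV7mQ5Fs3tYY2Euek5YdkkzcRafUg8qGWvFdgaBMuo
Pack root is QmRguPt6jHmVMzu1NM8wQmpoymM9UeqDJGXdQyU3GhiPy4
Shared:      0 blocks, 0 B total data uploaded'

# Extract the two hashes by stripping the fixed line prefixes.
peer_id=$(printf '%s\n' "$serve_output" | sed -n 's/^Peer ID: //p')
pack_root=$(printf '%s\n' "$serve_output" | sed -n 's/^Pack root is //p')

# ASSUMPTIONS: placeholder public IPv4 address and the usual swarm port.
# Replace both with the values for your own machine.
public_ip="203.0.113.10"
swarm_port="4001"

# ASSUMPTION: common IPFS multiaddr layout (address, transport, peer ID).
multiaddr="/ip4/${public_ip}/tcp/${swarm_port}/ipfs/${peer_id}"

echo "Peer ID:   $peer_id"
echo "Pack root: $pack_root"
echo "Multiaddr: $multiaddr"
```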

Step 3: Publish the Pack Root Hash and Give the multiaddr to your Mirrors

Give that multiaddr to the people who are setting up Mirrors. They will use it to bootstrap their network connections from your node. This makes replication more efficient because each mirror establishes a direct, point-to-point connection with your node.

The mirrors, and anyone else replicating your dataset, will also need the Pack Root hash. They will use that hash to pin your dataset onto their nodes.
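To make the hand-off concrete, here is a sketch of what a mirror might do with the two values. It only prints the commands rather than running them. It assumes the standard ipfs bootstrap add and ipfs pin add commands; the Instructions for Mirrors remain the authoritative procedure and may differ. The helper name mirror_commands is hypothetical, and the multiaddr uses a placeholder IP address and an assumed port.

```shell
#!/bin/sh
# mirror_commands MULTIADDR PACK_ROOT: print (do not run) the commands a
# mirror might use. Hypothetical helper, for illustration only.
mirror_commands() {
    multiaddr=$1
    pack_root=$2

    # Add the provider's node to the mirror's bootstrap list.
    echo "ipfs bootstrap add $multiaddr"

    # Pin the pack so the mirror fetches and keeps the dataset.
    echo "ipfs pin add $pack_root"
}

# Peer ID and Pack Root come from the sample output above; the IP address
# and port are placeholders (assumptions), not real values.
mirror_commands \
    "/ip4/203.0.113.10/tcp/4001/ipfs/QmVbXV7mQ5Fs3tYY2Euek5YdkkzcRafUg8qGWvFdgaBMuo" \
    "QmRguPt6jHmVMzu1NM8wQmpoymM9UeqDJGXdQyU3GhiPy4"
```

Printing the commands first lets a mirror operator review them before pasting them into a shell on the mirror node.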