This project is in progress.
IPFS-Cluster is a solid project for data orchestration on IPFS, but it does not support erasure coding, so fault tolerance requires storing multiple full replicas. For example, tolerating two peer failures with plain replication costs 3x the storage, while Reed-Solomon with 4 data shards and 2 parity shards costs only 1.5x. This can be solved by adding a Reed-Solomon module. See the discussion.
This work can be divided into three parts.
- Data Addition: The first step is to obtain the data. Since the data can only be traversed once, we use `DAGService.Get` to fetch block data during the MerkleDAG traversal and send it to the Erasure module. Once the Erasure module has received enough data shards, it uses Reed-Solomon to encode parity shards and sends them to the `adder`. The `adder` then reuses `single/dag_service` to add them to IPFS as several individual files.
- Shard Allocation: We need to decide which nodes are suitable for each shard. The implementation ensures that each shard is stored by exactly one peer when added (although the allocation may change when a shard is broken and recovered), and that every IPFS peer has the same chance of storing data or parity. See `DefaultECAllocate` for details. After the allocation is determined, we use the RPC call `IPFSConnector.BlockStream` to send the blocks and `Cluster.Pin` to pin them remotely or locally.
- Data Recovery: We use `clusterPin` to store the CIDs of the data and parity shards as well as the sizes of the data shards. During reconstruction, we set a one-minute timeout and try to retrieve the data and parity shards separately. If some shards are broken, we use the Reed-Solomon module to reconstruct them and repin the file. Reed-Solomon has one limitation: all data shards can only be reconstructed, and the complete data pieced together, if the number of surviving data and parity shards is at least the total number of data shards (a minimal sketch of this encode/reconstruct cycle follows the list).
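To make the encode/reconstruct cycle concrete, here is a minimal, self-contained sketch using the `github.com/klauspost/reedsolomon` library. The library choice and the `main`-style layout are assumptions for illustration (the Erasure module may configure its encoder differently); the 4 data / 2 parity shard counts match the defaults shown in the test script below.

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/klauspost/reedsolomon"
)

func main() {
	const dataShards, parityShards = 4, 2

	enc, err := reedsolomon.New(dataShards, parityShards)
	if err != nil {
		panic(err)
	}

	// Split the original payload into dataShards equal-sized data shards.
	payload := bytes.Repeat([]byte("ipfs-cluster erasure coding "), 1000)
	shards, err := enc.Split(payload)
	if err != nil {
		panic(err)
	}

	// Compute the parity shards from the data shards (shards[4] and shards[5]).
	if err := enc.Encode(shards); err != nil {
		panic(err)
	}

	// Simulate losing up to parityShards shards (here: one data, one parity).
	shards[1], shards[5] = nil, nil

	// As long as at least dataShards shards survive, the data is recoverable.
	if err := enc.Reconstruct(shards); err != nil {
		panic(err)
	}

	var buf bytes.Buffer
	if err := enc.Join(&buf, shards, len(payload)); err != nil {
		panic(err)
	}
	fmt.Println("recovered payload matches:", bytes.Equal(buf.Bytes(), payload))
}
```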
Usage is exactly the same as stock IPFS Cluster. First start the IPFS daemon and the IPFS Cluster daemon, then interact with the IPFS Cluster daemon through `ipfs-cluster-ctl`. See the documentation.
The only difference is that each binary executable needs to be replaced with the one built from this repository; the aliases below point to those builds.
```bash
# Copy to ~/.bashrc or ~/.zshrc
alias dctl="$GOPATH/src/ipfs-cluster/cmd/ipfs-cluster-ctl/ipfs-cluster-ctl"
alias dfollow="$GOPATH/src/ipfs-cluster/cmd/ipfs-cluster-follow/ipfs-cluster-follow"
alias dservice="$GOPATH/src/ipfs-cluster/cmd/ipfs-cluster-service/ipfs-cluster-service"
alias fctl="$GOPATH/src/ipfs-cluster/cmd/ipfs-cluster-ctl/ipfs-cluster-ctl --host /unix/$HOME/.ipfs-cluster-follow/ali/api-socket" # communicate with ipfs-cluster-follow
alias cctl="ipfs-cluster-ctl"
alias cfollow="ipfs-cluster-follow"
alias cservice="ipfs-cluster-service"
export GOLOG_LOG_LEVEL="info,subsystem1=warn,subsystem2=debug" # set log level for github.com/ipfs/go-log

# Quickly start the cluster using Docker
alias dctlmake='
cd $GOPATH/src/ipfs-cluster/cmd/ipfs-cluster-ctl && make
cd $GOPATH/src/ipfs-cluster
docker build -t ipfs-cluster-erasure -f Dockerfile-erasure .
docker-compose -f docker-compose-erasure.yml up -d
docker logs -f cluster0
'
```

```bash
# Integration test; small changes may cause it to fail.
dctltest() {
    cd $GOPATH/src/ipfs-cluster
    make cmd/ipfs-cluster-ctl
    docker build -t ipfs-cluster-erasure -f Dockerfile-erasure .
    docker-compose -f docker-compose-erasure.yml up -d
    sleep 10
    # QmSxdRX48W7PeS4uNEmhcx4tAHt7rzjHWBwLHetefZ9AvJ is the cid of tmpfile
    ci="QmSxdRX48W7PeS4uNEmhcx4tAHt7rzjHWBwLHetefZ9AvJ"
    dctl pin rm $ci
    seq 1 250000 > tmpfile
    dctl add tmpfile -n tmpfile --erasure --shard-size 512000 # --data-shards 4 --parity-shards 2
    # Find the first peer other than cluster0 that stores shard data.
    # Each shard yields 3 matching lines (grep -A 2), so '$1 == 3' selects a peer holding
    # one shard; '$2 != 0' excludes cluster0, which exposes the API port.
    x=$(dctl status --filter pinned | grep -A 2 tmpfile | awk -F'cluster' '{print $2}' | awk '{print $1}' | sort | uniq -c | awk '$1 == 3 && $2 != 0 {print $2}' | head -n 1)
    docker stop "cluster$x" "ipfs$x"
    dctl ipfs gc # clean the ipfs cache
    sleep 5
    dctl ecget $ci
    if diff $ci tmpfile > /dev/null; then
        echo "Files are identical, test pass ^^"
    else
        echo "Files are different, test fail :("
    fi
    dctl pin rm $ci
    rm $ci
    rm tmpfile
}
```
P.S. If you run out of disk space, use `docker system df` to check Docker's cache usage :)
`add <filepath> --erasure`: Add a file using erasure coding. Build `ipfs-cluster-ctl` and run `ipfs-cluster-ctl add -h` for details.

P.S. Using `--erasure` also force-enables raw-leaves and sharding.
`ecget`: Get an erasure-coded file by CID. If the file is broken (not all shards can be fetched), it is automatically recovered.

P.S. The shell command can download files and directories directly, but the RPC `Cluster.ECGet` only returns a tar-archived `[]byte` over a stream, so you need to use `tar.Extractor` to extract it to the filesystem.
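For example, a caller of the `Cluster.ECGet` RPC could unpack the returned tar stream roughly as follows. This is a minimal sketch assuming the `tar.Extractor` above is the one from `github.com/ipfs/tar-utils`; the helper `extractECGet`, the `shards.tar` file, and the `./recovered` output path are hypothetical names for illustration.

```go
package main

import (
	"io"
	"log"
	"os"

	tarutil "github.com/ipfs/tar-utils"
)

// extractECGet unpacks a tar archive (such as the stream returned by the
// Cluster.ECGet RPC) into outPath.
func extractECGet(archive io.Reader, outPath string) error {
	extractor := &tarutil.Extractor{Path: outPath}
	return extractor.Extract(archive)
}

func main() {
	// For illustration, read a tar archive from a local file; in practice the
	// reader would be the stream obtained from the Cluster.ECGet RPC call.
	f, err := os.Open("shards.tar") // hypothetical archive
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := extractECGet(f, "./recovered"); err != nil {
		log.Fatal(err)
	}
}
```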
`ecrecovery`: Scan all pinned erasure-coded files and try to recover any that are broken.
This project currently supports the fundamental erasure-coding features. However, several things could still be optimized:
- At present, we use `sharding/dag_service` to store the original file and `single/dag_service` to store the individual parity files. A more elegant solution would be a new `adder` module that combines them. Draft: one possible approach is to use the block as the unit, Reed-Solomon-encode blocks, and place the parity blocks inside the original MerkleDAG. This produces a new, different file (the original file combined with its parity blocks). We would then pin this file to avoid GC and, when some blocks are lost, work out how to split the retrieved file into groups and Reed-Solomon-decode them to recover the original blocks. This approach requires changing the layout logic: the default balanced layout packs roughly 174 raw blocks under each non-leaf node, so we need to calculate how many parity blocks each non-leaf node should hold and fill them in accordingly (see the sketch after this list).
- Support block-level erasure coding. Setting the shard size to `defaultBlockSize` effectively gives block-level erasure.
- When using `single/dag_service` to add parity shards as individual files, keeping `api.NodeWithMeta` and `sync.Once` in slices is a simple but crude way to prevent multiple invocations of Finalize: each parity shard gets its own `api.NodeWithMeta` and `sync.Once`, avoiding conflicts. However, this approach disrupts the original structure of `single/dag_service` (see TODO1).
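To make the layout arithmetic in the draft above concrete: if the balanced layout packs about 174 raw data blocks under each non-leaf node and we encode with k data shards and m parity shards, each non-leaf node needs roughly 174·m/k extra parity blocks. The sketch below is illustrative only; the helper name and the shard parameters are assumptions, not values taken from the code.

```go
package main

import (
	"fmt"
	"math"
)

// parityBlocksPerNode estimates how many parity blocks a non-leaf node must
// hold so that every group of dataShards data blocks is accompanied by
// parityShards parity blocks.
func parityBlocksPerNode(rawBlocksPerNode, dataShards, parityShards int) int {
	ratio := float64(parityShards) / float64(dataShards)
	return int(math.Ceil(float64(rawBlocksPerNode) * ratio))
}

func main() {
	const rawBlocksPerNode = 174 // approximate default balanced-layout fan-out
	// Illustrative Reed-Solomon parameters (assumptions, not project defaults).
	fmt.Println(parityBlocksPerNode(rawBlocksPerNode, 4, 2))  // -> 87
	fmt.Println(parityBlocksPerNode(rawBlocksPerNode, 10, 4)) // -> 70
}
```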
Pinset orchestration for IPFS
IPFS Cluster provides data orchestration across a swarm of IPFS daemons by allocating, replicating and tracking a global pinset distributed among multiple peers.
There are 3 different applications:
- A cluster peer application: `ipfs-cluster-service`, to be run along with `kubo` (`go-ipfs`) as a sidecar.
- A client CLI application: `ipfs-cluster-ctl`, which allows easily interacting with the peer's HTTP API.
- An additional "follower" peer application: `ipfs-cluster-follow`, focused on simplifying the process of configuring and running follower peers.
Please participate in the IPFS Cluster user registry.
- IPFS Cluster (Erasure Coding Support)
- Table of Contents
- Documentation
- News & Roadmap
- Install
- Usage
- Contribute
- License
Please visit https://ipfscluster.io/documentation/ to access user documentation, guides and any other resources, including detailed download and usage instructions.
We regularly post project updates to https://ipfscluster.io/news/ .
The most up-to-date Roadmap is available at https://ipfscluster.io/roadmap/ .
Instructions for different installation methods (including from source) are available at https://ipfscluster.io/download .
Extensive usage information is provided at https://ipfscluster.io/documentation/ .
PRs accepted. As part of the IPFS project, we have some contribution guidelines.
This library is dual-licensed under Apache 2.0 and MIT terms.
© 2022. Protocol Labs, Inc.