Comparison of IPFS and BitTorrent for Archives #208

Open
flyingzumwalt opened this Issue Dec 30, 2016 · 5 comments

flyingzumwalt commented Dec 30, 2016

For a project that's looking to store a lot of data redundantly and validate it (e.g. #ClimateMirror), what's the best way to explain the differences between IPFS and BitTorrent? What advantages and weaknesses should a project like that consider?

As a starting point, there's this bit on page 4 of the IPFS whitepaper:

Unlike BitTorrent, BitSwap is not limited to the blocks in one torrent. BitSwap operates as a persistent marketplace where node can acquire the blocks they need, regardless of what files those blocks are part of. The blocks could come from completely unrelated files in the filesystem. Nodes come together to barter in the marketplace

flyingzumwalt Dec 30, 2016

The main distinction I'm aware of is that BitTorrent relies on torrent files, each of which contains a content-addressed manifest of the blocks that make up a particular set of content. This has some ramifications:

  • forces you to choose what is in each torrent file -- i.e. do you create one huge torrent file for all of your datasets, or one torrent file per dataset?
  • forces you to track the torrent files themselves with some other tool/system
  • requires you to create metadata about the torrent files
  • does not natively provide a way to identify torrent files themselves using cryptographic hashes
  • does not handle versioning of content

By contrast, IPFS lets you build a DAG of arbitrary size and structure.

Some advantages that occur to me:

  • You can track both the content and the metadata in the IPFS DAG
  • You can add multiple versions of a dataset to IPFS. Each version gets a unique hash and IPFS does its best to avoid storing duplicate blocks
  • You have complete control over which blocks are stored on which IPFS node -- this has huge advantages for distributing storage/backup (see ipfs-cluster)
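
To make the point about versions and duplicate blocks concrete, here is a minimal sketch of a content-addressed block store. It is not how IPFS itself is implemented: the `BlockStore` class, the 4-byte `CHUNK_SIZE`, and the newline-separated root block are all assumptions chosen to keep the example small.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical and tiny on purpose, so the chunks are easy to see


class BlockStore:
    """Toy content-addressed store: each block is keyed by its SHA-256 digest."""

    def __init__(self):
        self.blocks = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data  # storing identical bytes again changes nothing
        return key

    def add_file(self, data: bytes) -> str:
        """Split data into fixed-size chunks, store them, and return a root hash."""
        chunk_keys = [
            self.put(data[i:i + CHUNK_SIZE])
            for i in range(0, len(data), CHUNK_SIZE)
        ]
        # The root block is just the list of chunk hashes, itself stored by hash.
        return self.put("\n".join(chunk_keys).encode())

    def cat(self, root: str) -> bytes:
        """Reassemble a file from its root hash."""
        chunk_keys = self.blocks[root].decode().split()
        return b"".join(self.blocks[k] for k in chunk_keys)


store = BlockStore()
v1 = store.add_file(b"AAAABBBBCCCC")
v2 = store.add_file(b"AAAABBBBDDDD")  # only the last chunk differs from v1

print(v1 != v2)           # each version gets its own root hash
print(len(store.blocks))  # 6 blocks stored, not 8: two chunks are shared
```

Each version has a distinct root hash, but the chunks the two versions have in common are stored once.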

flyingzumwalt Dec 30, 2016

Oh, and you can reference contents/files within a dataset using Merkle paths and link to them with Merkle links.
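
As a rough illustration of what a Merkle path is, the sketch below builds a tiny DAG in which a directory node maps names to the hashes of its children, then resolves a path one link at a time. The `put` and `resolve` helpers, the `leaf:`/`dir:` prefixes, and the JSON encoding are assumptions made for the example, not the IPFS object format.

```python
import hashlib
import json

blocks = {}  # hash -> raw bytes; stands in for the network


def put(obj) -> str:
    """Store bytes as a leaf, or a dict of name -> child hash as a directory node."""
    if isinstance(obj, bytes):
        raw = b"leaf:" + obj
    else:
        raw = b"dir:" + json.dumps(obj, sort_keys=True).encode()
    key = hashlib.sha256(raw).hexdigest()
    blocks[key] = raw
    return key


def resolve(path: str) -> bytes:
    """Resolve '<root hash>/name/name/...' by following one link per segment."""
    root, *segments = path.split("/")
    raw = blocks[root]
    for name in segments:
        if not raw.startswith(b"dir:"):
            raise KeyError(f"{name!r}: parent is not a directory")
        links = json.loads(raw[len(b"dir:"):])
        raw = blocks[links[name]]  # a Merkle link is just the child's hash
    if raw.startswith(b"leaf:"):
        return raw[len(b"leaf:"):]
    raise IsADirectoryError(path)


# Placeholder file contents, not real data.
readme = put(b"placeholder readme\n")
table = put(b"placeholder table\n")
data_dir = put({"table.csv": table})
root = put({"README": readme, "data": data_dir})

via_path = resolve(f"{root}/data/table.csv")  # Merkle path from the dataset root
via_link = resolve(table)                     # direct link to the same file
```

Both lookups return the same bytes: the path walks from the dataset root, while the link addresses the file on its own.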

20zinnm Jan 2, 2017

For Climate Mirror, the big advantages include:

  • Being able to access files in folders without downloading an entire dataset (especially for the researchers who need to use this data)
  • IPNS. Need I say more? We can host an index of both IPFS hashes and normal mirrors, and update it frequently. Thus, we have a content discovery mechanism. https://ipfs.io/ipns/QmRsCTmkqL35LZ7uBGDoPnLtgJuyiEDDXjLaFYmMWsmTaM
  • No duplicate blocks is huge.

That's among several other advantages, but those are some key points I've found.

NOTE: That index is simply a sampling for an explorer I'm building. The real index will have IPFS datasets, etc.
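
The idea behind an index that is updated frequently can be sketched as a mutable name pointing at an immutable hash. This is only a toy: real IPNS names are derived from the publisher's public key and records are signed, and none of that is modeled here. The `NameRegistry` class and the name `"index"` are hypothetical.

```python
import hashlib


class NameRegistry:
    """Toy stand-in for a mutable naming layer over immutable content hashes."""

    def __init__(self):
        self.records = {}  # name -> (sequence number, content hash)

    def publish(self, name: str, content_hash: str, seq: int) -> bool:
        """Accept a record only if its sequence number is newer than the current one."""
        current = self.records.get(name)
        if current is not None and seq <= current[0]:
            return False  # stale or replayed record is ignored
        self.records[name] = (seq, content_hash)
        return True

    def resolve(self, name: str) -> str:
        return self.records[name][1]


registry = NameRegistry()
old = hashlib.sha256(b"index, first edition").hexdigest()
new = hashlib.sha256(b"index, second edition").hexdigest()

registry.publish("index", old, seq=1)
registry.publish("index", new, seq=2)
stale = registry.publish("index", old, seq=1)  # rejected: older sequence number

latest = registry.resolve("index")  # the name is stable, the hash it yields changes
```

Readers keep one stable name, while every edition of the index stays addressable by its own hash.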

yousefamar Aug 9, 2017

@flyingzumwalt, @20zinnm I'd be interested to hear your thoughts on how IPFS compares to BitTorrent v2. It seems to me the gap has gotten smaller.

flyingzumwalt Aug 9, 2017

The key distinguishing factor in my mind is that IPFS lets you use any hash, of any content or any subset of content, as an identifier. You can use that hash to ask the network who has that exact content. This makes the system much more flexible than BitTorrent, because you can identify exactly the content you are providing or requesting, whether it's a huge set of files, a single file, a part of a file, or a single entry from some dataset. Contrast this with BitTorrent's reliance on torrent files, which bundle data together according to however the torrent's creator originally structured it.

As far as I can tell, BitTorrent v2 does not decrease this reliance on torrent files.
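
Here is a small sketch of asking "who has this exact content" at any granularity. A plain dictionary stands in for the network's provider lookup, and the `ProviderIndex` class, the peer names, and the record contents are all made up for illustration.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ProviderIndex:
    """Toy lookup table: content hash -> set of peers that say they hold it."""

    def __init__(self):
        self.providers = defaultdict(set)

    def announce(self, peer: str, key: str) -> None:
        self.providers[key].add(peer)

    def find_providers(self, key: str) -> set:
        return set(self.providers.get(key, ()))


records = [b"entry one\n", b"entry two\n", b"entry three\n"]  # placeholder entries
whole_file = b"".join(records)

index = ProviderIndex()

# One peer holds the whole file and announces it plus every entry inside it.
index.announce("archive-node", content_hash(whole_file))
for record in records:
    index.announce("archive-node", content_hash(record))

# Another peer holds a single entry and announces only that.
index.announce("laptop", content_hash(records[1]))

holders_of_file = index.find_providers(content_hash(whole_file))
holders_of_entry = index.find_providers(content_hash(records[1]))
```

The same lookup works for the whole file and for one entry inside it. A peer that never saw the original bundle can still serve the entry it holds.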
