🌟 adding Torrent support to IPFS #779

Open
diasdavid opened this Issue Mar 8, 2017 · 17 comments

8 participants
Member

diasdavid commented Mar 8, 2017

I've started working on enabling Torrent support for js-ipfs, very much in the same way that we have support for dag-pb, dag-cbor, eth-blocks, eth-tx, zcash (go-ipfs only), git (go-ipfs only) and bitcoin (go-ipfs only).

The end goal is to expose two top-level commands to add and retrieve files distributed as Torrents, from the IPFS or BitTorrent network (through a bridge for now and, in the future, by connecting directly). The commands are:

  • jsipfs torrent add
  • jsipfs torrent cat

However, I stumbled upon a question that requires a decision, and I would like to get feedback before going at full speed. In BitTorrent, torrent files are not referenced by a cryptographic hash due to their ephemeral and mutable nature (in fact, decoding and re-encoding is not even guaranteed by the spec to be idempotent); the only thing that has a cryptographic identifier is the info field in the torrent file.
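For context on that last point: the BitTorrent spec defines the infoHash as the SHA-1 digest of the bencoded value of the `info` key. Below is a minimal Node.js sketch of that computation, assuming the info dictionary is held as plain JS values (strings, integers, Buffers, arrays, objects). `bencode` and `infoHash` are hypothetical helper names, not the API of any particular module, and the info dictionary is made up.

```javascript
const crypto = require('crypto');

// Minimal bencode encoder (sketch): integers, byte strings, lists, dicts.
// Dictionary keys are sorted as raw byte strings, as the spec requires.
function bencode(value) {
  if (Number.isInteger(value)) {
    return Buffer.from(`i${value}e`);
  }
  if (typeof value === 'string' || Buffer.isBuffer(value)) {
    const bytes = Buffer.isBuffer(value) ? value : Buffer.from(value, 'utf8');
    return Buffer.concat([Buffer.from(`${bytes.length}:`), bytes]);
  }
  if (Array.isArray(value)) {
    return Buffer.concat([Buffer.from('l'), ...value.map(bencode), Buffer.from('e')]);
  }
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).sort((a, b) =>
      Buffer.compare(Buffer.from(a, 'utf8'), Buffer.from(b, 'utf8')));
    const parts = keys.map((k) => Buffer.concat([bencode(k), bencode(value[k])]));
    return Buffer.concat([Buffer.from('d'), ...parts, Buffer.from('e')]);
  }
  throw new TypeError(`cannot bencode ${typeof value}`);
}

// The infoHash covers the bencoded `info` dictionary only; the rest of the
// .torrent file (announce, comment, ...) is not part of the hash.
function infoHash(info) {
  return crypto.createHash('sha1').update(bencode(info)).digest('hex');
}

// Hypothetical single-file info dictionary; `pieces` holds one 20-byte SHA-1.
const info = {
  name: 'hello.txt',
  length: 5,
  'piece length': 16384,
  pieces: crypto.createHash('sha1').update('hello').digest(),
};

console.log(bencode({ b: 1, a: 'x' }).toString()); // d1:a1:x1:bi1ee
console.log(infoHash(info)); // 40 hex characters
```

Because of the round-tripping concern above, a real implementation would more likely hash the raw bytes of the `info` value exactly as they appear in the .torrent file, rather than re-encode a decoded object.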

I started implementing the IPLD format for a Torrent file, but I'm guessing that most people will want to fetch their torrent through its infoHash, which they get from something like a magnet URI. The crux is that there is never a separate file for the info field: as soon as an infoHash query is performed, a whole Torrent file is retrieved, raising the question:

Should dag.get(<infoHash>/somePath) resolve through the retrieved Torrent file or only over the info field?

  • Resolve through the Torrent file - This is awkward for the IPLD resolver, as it would be resolving an immutable pointer (the infoHash) to something that has more fields than the pointer covers
  • Resolve only within the info field - This would force us to make the info field a full standalone object that can be transferred independently (the solution I'm leaning towards). This option would result in two multicodecs for Torrents, torrent-file and torrent-info.
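To illustrate what the second option means for path resolution, here is a sketch of walking a path such as `<infoHash>/files/0/length` purely inside the decoded info dictionary. `resolveInInfo` and its `{ value, remainderPath }` result are hypothetical, loosely modelled on the idea of an IPLD resolver handing back the unconsumed part of a path. Top-level Torrent file fields such as `announce` are simply not reachable.

```javascript
// Walk a '/'-separated path through a decoded info dictionary.
// Stops early when it reaches a Buffer (an opaque byte string) or a scalar,
// and hands the unconsumed segments back as `remainderPath`.
function resolveInInfo(info, path) {
  const segments = path.split('/').filter((s) => s.length > 0);
  let value = info;
  let i = 0;
  for (; i < segments.length; i++) {
    if (Buffer.isBuffer(value) || value === null || typeof value !== 'object') {
      break; // cannot descend into raw bytes or a scalar
    }
    const key = segments[i];
    if (!Object.prototype.hasOwnProperty.call(value, key)) {
      throw new Error(`path segment "${key}" not found in info dictionary`);
    }
    value = value[key];
  }
  return { value, remainderPath: segments.slice(i).join('/') };
}

// Hypothetical decoded info dictionary.
const info = {
  name: 'hello.txt',
  length: 5,
  'piece length': 16384,
  pieces: Buffer.alloc(20),
};

console.log(resolveInInfo(info, 'name')); // { value: 'hello.txt', remainderPath: '' }
console.log(resolveInInfo(info, 'pieces/0').remainderPath); // 0
```

Note that `pieces/0` stops at the raw `pieces` byte string and leaves `0` unresolved, which is the limitation discussed further down in this thread.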

Thoughts? //cc @jbenet @whyrusleeping @nicola

Member

lgierth commented Mar 8, 2017

> Resolve only within the info field - This would force us to make the info field a full standalone object that can be transferred independently (the solution I'm leaning towards). This option would result in two multicodecs for Torrents, torrent-file and torrent-info.

This sounds like the pragmatic way to me too -- we'll likely get a better idea of what to do with the whole torrent in the process of working on this.

Given that the torrent file itself is not already content-addressed, it's also the "correct" way, I think. Magnet URIs address the info hash anyway.

Member

lgierth commented Mar 8, 2017

{
  "infoHash": "d2474e86c95b19b8bcfdb92bc12c9d44667cfa36",
  "infoHashBuffer": {"/": "$infoHashAsCID"},
  "name": "Leaves of Grass by Walt Whitman.epub"
}
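One way to fill in the `$infoHashAsCID` placeholder, sketched under assumptions: a base16 (`f`-prefixed) CIDv1 using the `torrent-info` multicodec (assumed here to be 0x7b) and a SHA-1 multihash (function 0x11, 20-byte digest), which is the `/ipfs/f017b11...` layout that appears later in this thread. `infoHashToCid` and `torrentSummary` are hypothetical names.

```javascript
// Build a base16 ("f"-prefixed) CIDv1 string for a 20-byte SHA-1 infoHash.
// Assumed byte layout: <version 0x01><codec 0x7b torrent-info>
//                      <multihash fn 0x11 sha1><digest length 0x14><digest>
function infoHashToCid(infoHashHex) {
  if (!/^[0-9a-f]{40}$/i.test(infoHashHex)) {
    throw new Error('expected a 40-character hex SHA-1 infoHash');
  }
  return 'f' + '01' + '7b' + '11' + '14' + infoHashHex.toLowerCase();
}

// Hypothetical helper producing the object shape proposed above,
// with the link expressed as an IPLD link object {"/": cid}.
function torrentSummary(infoHashHex, name) {
  return {
    infoHash: infoHashHex,
    infoHashBuffer: { '/': infoHashToCid(infoHashHex) },
    name,
  };
}

const summary = torrentSummary(
  'd2474e86c95b19b8bcfdb92bc12c9d44667cfa36',
  'Leaves of Grass by Walt Whitman.epub',
);
console.log(JSON.stringify(summary, null, 2));
```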
Member

diasdavid commented Mar 8, 2017

@lgierth I only received your comments after I posted; it seems that we were having this chat while you were writing them :)

Notes from a chat with @jbenet and @whyrusleeping

  • Instead of having ipld-torrent-file and ipld-torrent-info, we just need ipld-bencode to be able to resolve through bencode-encoded objects. After rethinking this, however, I remembered why we need ipld-torrent-file and ipld-torrent-info: they enable the resolver to resolve through paths inside these objects (i.e. dag.get(infoHash/pieces/0) should return the piece and not the string that is the SHA-1 of the piece).
  • When importing a Torrent, two objects need to be created, one for the info and one for the Torrent file itself.
  • New command added: torrent import <torrent-file, magnetic-uri, infohash>
  • jsipfs torrent will be available through a module called ipfs-torrent that exposes both a CLI and a module (like ipfs-unixfs-engine).

This leads to the following steps

1. Implement the IPLD Formats to support torrents

2. Implement a blockstore that uses webtorrent as its storage driver

  • torrent-pull-blob-store
  • confirm that we can dag.get(<torrentHash or infoHash>) and traverse through those objects

3. Implement the ipfs-torrent service (like ipfs-unixfs-engine)

  • module
    • .import (adds the torrent file and creates an infoHash object too)
      • import by magnet URI
      • import by infoHash
    • .add
      • single files support
      • directories support
    • .cat (single files)
    • .get
  • cli
    • spawn a js-ipfs daemon or connect to a remoteDaemon
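A rough sketch of the "two objects per import" idea from the notes above: the info dictionary becomes its own object addressed by the infoHash, and the Torrent file object keeps its other fields but replaces `info` with an IPLD link. `splitTorrent` is hypothetical, bencode decoding is assumed to have happened already, and the CID layout is the same assumed base16 CIDv1 (torrent-info codec 0x7b, SHA-1 multihash) as elsewhere in this thread.

```javascript
const crypto = require('crypto');

// `infoBytes` must be the raw bencoded info value exactly as it appeared in
// the .torrent file, because re-encoding a decoded dictionary is not
// guaranteed to reproduce the same bytes (and therefore the same hash).
function splitTorrent(torrent, infoBytes) {
  const infoHash = crypto.createHash('sha1').update(infoBytes).digest('hex');
  // Assumed CID layout: base16 CIDv1, torrent-info codec 0x7b,
  // SHA-1 multihash 0x11, 20-byte digest length 0x14.
  const infoCid = 'f017b1114' + infoHash;
  const infoObject = torrent.info;
  const fileObject = { ...torrent, info: { '/': infoCid } };
  return { infoHash, infoCid, infoObject, fileObject };
}

// Hypothetical minimal torrent: only `announce` and a one-key info dict.
const infoBytes = Buffer.from('d4:name9:hello.txte');
const torrent = {
  announce: 'http://tracker.example/announce',
  info: { name: 'hello.txt' },
};

const { infoCid, infoObject, fileObject } = splitTorrent(torrent, infoBytes);
console.log(infoObject); // { name: 'hello.txt' }
console.log(fileObject.info['/'] === infoCid); // true
```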
Member

jbenet commented Mar 8, 2017

\o/

Member

dignifiedquire commented Mar 8, 2017

@diasdavid maybe wait with the torrent blob store for the datastore refactor?

Member

diasdavid commented Mar 8, 2017

@dignifiedquire I see the value, but I won't block Torrent support on the datastore refactor; it is not a dependency.

Member

diasdavid commented Mar 8, 2017

For the record, here is the actual structure of both the Torrent file and the info field: https://wiki.theory.org/BitTorrentSpecification#Metainfo_File_Structure

Member

diasdavid commented Mar 8, 2017

Bringing this one back (🎪)

> Instead of having ipld-torrent-file and ipld-torrent-info, we just need ipld-bencode to be able to resolve through bencode-encoded objects. After rethinking this, however, I remembered why we need ipld-torrent-file and ipld-torrent-info: they enable the resolver to resolve through paths inside these objects (i.e. dag.get(infoHash/pieces/0) should return the piece and not the string that is the SHA-1 of the piece).

It turns out that we might actually just need ipld-bencode after all, because the format, as described in https://wiki.theory.org/BitTorrentSpecification#Metainfo_File_Structure, prescribes that the SHA-1 hashes of the pieces all be concatenated into a single string, which means that there won't be any <infohash>/info/pieces/<insert piece number> unless we apply a transformation to the bencoded data in the first place.

This means that we won't be able to use the IPLD resolver to traverse into individual pieces without transforming the data, as the pieces field will just be one very long byte array value.

diasdavid added the ready label Mar 9, 2017

diasdavid self-assigned this Mar 9, 2017

Member

lgierth commented Mar 10, 2017

It's pretty ironic, but we can exploit the fact that it prescribes SHA1 and split every 40 bytes.

Member

diasdavid commented Mar 10, 2017

*20 bytes. @lgierth, we can indeed, but that falls into the 'Transformations' category: as far as IPLD-compatible formats go, we are strict about not messing with the data.
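A SHA-1 digest is 20 bytes (40 hex characters), so splitting the v1 `pieces` string is a fixed-stride slice. A minimal Node.js sketch, with `splitPieces` as a hypothetical name and made-up piece data; `subarray` returns views into the original Buffer, so the stored bytes are not rewritten.

```javascript
const crypto = require('crypto');

const SHA1_LENGTH = 20; // bytes per piece hash in a v1 `pieces` string

// Split the concatenated `pieces` byte string into individual SHA-1 hashes.
// Each entry is a view (subarray) into `pieces`, not a copy.
function splitPieces(pieces) {
  if (pieces.length % SHA1_LENGTH !== 0) {
    throw new Error(`pieces length ${pieces.length} is not a multiple of ${SHA1_LENGTH}`);
  }
  const hashes = [];
  for (let offset = 0; offset < pieces.length; offset += SHA1_LENGTH) {
    hashes.push(pieces.subarray(offset, offset + SHA1_LENGTH));
  }
  return hashes;
}

// Two made-up pieces, hashed and concatenated the way a v1 info dict stores them.
const pieceData = [Buffer.from('first piece'), Buffer.from('second piece')];
const pieces = Buffer.concat(
  pieceData.map((p) => crypto.createHash('sha1').update(p).digest()),
);

const hashes = splitPieces(pieces);
console.log(pieces.length); // 40
console.log(hashes.map((h) => h.toString('hex')));
```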

diasdavid referenced this issue Mar 12, 2017

Open

Add usage #2

diasdavid changed the title from Adding Torrent support to IPFS to 🌟 adding Torrent support to IPFS Mar 21, 2017

diasdavid referenced this issue Mar 22, 2017

Closed

🚀 0.23 Release 🌟 #795

18 of 22 tasks complete
Contributor

kumavis commented Mar 29, 2017

@diasdavid I don't think splitting every 20 bytes for each piece id is any different from biting off the first N bytes for the first parameter of any binary serialization.

I would say it's not a transformation if the serialization doesn't need to change.

Our thinking seems to diverge here, based on previous discussions around Ethereum resolvers.

Member

diasdavid commented Mar 29, 2017

@kumavis agreed that there might be space to be a little less strict with the separation of local resolver vs transformation. Note: I intuitively did the same as you with dag-pb https://github.com/ipld/js-ipld-dag-pb/blob/master/src/resolver.js#L44-L47 .

I'll be with @nicola next week and revisit this question for IPLD transformations. Let's continue this thread on the IPLD repo ipld/ipld#13.

Contributor

kumavis commented Apr 3, 2017

I think ipld/ipld#13 is slightly more complicated (pre-process with hash, split into halfbytes).

Splitting the concatenated SHA-1 refs still falls under "consume path part, return result", which is no more of a transformation than any IPFS resolver performs.
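The "consume path part, return result" reading can be sketched as a resolver step that maps `pieces/<n>` to a 20-byte window of the unchanged `pieces` byte string. `resolvePiece` is hypothetical and only handles this one field; it is not the resolver interface of any existing IPLD format, and the info dictionary is made up.

```javascript
// The stored bytes are never rewritten: a numeric path segment after
// `pieces` just selects a 20-byte window of the existing byte string.
const SHA1_LENGTH = 20;

function resolvePiece(info, path) {
  const [field, index, ...rest] = path.split('/').filter(Boolean);
  if (field !== 'pieces') {
    throw new Error('this sketch only handles pieces/<n> paths');
  }
  const pieces = info.pieces;
  if (index === undefined) {
    return { value: pieces, remainderPath: '' };
  }
  const n = Number(index);
  const count = pieces.length / SHA1_LENGTH;
  if (!Number.isInteger(n) || n < 0 || n >= count) {
    throw new Error(`piece index ${index} out of range (0..${count - 1})`);
  }
  return {
    value: pieces.subarray(n * SHA1_LENGTH, (n + 1) * SHA1_LENGTH),
    remainderPath: rest.join('/'),
  };
}

// Made-up info dict with three piece hashes (all 0x00, all 0x01, all 0x02).
const info = {
  pieces: Buffer.concat([Buffer.alloc(20, 0), Buffer.alloc(20, 1), Buffer.alloc(20, 2)]),
};

console.log(resolvePiece(info, 'pieces/2').value.toString('hex')); // '02' repeated 20 times
```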

jeremyBanks commented Aug 29, 2017

I wanted to note the release of The BitTorrent Protocol Specification v2. I don't expect it to be fully supported soon, but it's probably worth being aware of it when designing v1 support. My understanding may not be entirely correct, but here are the key points as I understand them:

v2 torrents use different structures than v1 in the info dictionary and metainfo .torrent files. v2 torrents are identified using SHA-2-256 hash of the info dictionary, truncated to 20 bytes to match the length of v1's SHA-1 hashes. It's possible to create hybrid torrents that contain both v1 and v2 structures, and can be identified by either hash.

Because a different hash function is used, v1 and v2 torrents' IPFS paths can be distinguished (the hash function is included in their multihash):

/ipfs/f 017b 11 14123456fc77d23aca05a8b58066bb55fe06c72f8e - SHA-1, v1
/ipfs/f 017b 12 14cd5877ccec0ebc8c231ecc70265ce239a90bdb9e - truncated SHA-2-256, v2
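A sketch of the two hashes described above, applied to the same info bytes as a hybrid torrent would be (its single info dictionary is identified by either hash). It follows this comment's description (SHA-1 for v1; SHA-256 truncated to 20 bytes for v2; multihash codes 0x11 and 0x12) and the same assumed `f017b` CID prefix; it has not been checked against the v2 spec itself, and the info bytes are made up. `infoHashes` and `ipfsPath` are hypothetical names.

```javascript
const crypto = require('crypto');

// v1 = SHA-1 of the info dict; v2 = SHA-256 of the info dict truncated
// to 20 bytes, as described in the comment above.
function infoHashes(infoBytes) {
  const v1 = crypto.createHash('sha1').update(infoBytes).digest();
  const v2 = crypto.createHash('sha256').update(infoBytes).digest().subarray(0, 20);
  return { v1, v2 };
}

// Base16 CIDv1 path with the torrent-info codec (0x7b), then the multihash:
// function code (0x11 sha1, 0x12 sha2-256), digest length, digest.
function ipfsPath(fnCode, digest) {
  const len = digest.length.toString(16).padStart(2, '0');
  return `/ipfs/f017b${fnCode}${len}${digest.toString('hex')}`;
}

// Made-up info bytes; a real client would hash the raw info value from the file.
const infoBytes = Buffer.from('d4:name9:hello.txte');
const { v1, v2 } = infoHashes(infoBytes);
console.log(ipfsPath('11', v1)); // SHA-1, v1
console.log(ipfsPath('12', v2)); // truncated SHA-2-256, v2
```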

EDIT: the following is wrong, see my next comment.

BitTorrent magnet links do not have this information; v1 and v2 magnet links cannot be distinguished. I think you need to connect to the torrent swarm and download the metadata before you can check which version and hash algorithm were used.

So it may not be possible to correctly map BitTorrent magnet URLs (e.g. ipfs-shipyard/ipfs-companion#256) to a specific IPFS path, because the hash algorithm will not be known.

sesam commented Oct 12, 2017

ping @arvidn Maybe you know whether magnet: links uniquely identify content or need network discovery, and whether this is considered a feature or a bug for v2?

jeremyBanks commented Oct 12, 2017

What I wrote above is wrong! I apologize for the misinformation. >_<

The updated BEP-9 does in fact use a multihash under a different key to identify v2 torrent data. I thought that this was cut out before the final version. (The idea of using multihash elsewhere in the protocol was cut; I didn't realize it remained here.) So I think the direct mapping is:

SHA-1, v1
/ipfs/f017b1114123456fc77d23aca05a8b58066bb55fe06c72f8e
magnet:?xt=urn:btih:123456fc77d23aca05a8b58066bb55fe06c72f8e

truncated SHA-2-256, v2
/ipfs/f017b1214cd5877ccec0ebc8c231ecc70265ce239a90bdb9e
magnet:?xt=urn:btmh:1214123456fc77d23aca05a8b58066bb55fe06c72f8e

Hybrid torrents still have two possible addresses, but that shouldn't be a problem.
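The mapping above can be expressed as a pair of string transformations. A minimal Node.js sketch: `magnetToIpfsPath` and `ipfsPathToMagnet` are hypothetical names, only the `xt` parameter is considered, only hex `btih` values are handled (BEP 9 also allows base32), and the `f017b` CID prefix is the one used in this comment. The v1 pair above round-trips through these functions.

```javascript
// btih carries a bare 40-hex SHA-1 infoHash; btmh carries a hex multihash
// (function code + length + digest). Both map to a base16 CIDv1 with the
// torrent-info codec prefix "f017b".
function magnetToIpfsPath(magnet) {
  const xt = new URL(magnet).searchParams.get('xt');
  if (xt === null) throw new Error('magnet link has no xt parameter');
  const btih = /^urn:btih:([0-9a-f]{40})$/i.exec(xt);
  if (btih) return `/ipfs/f017b1114${btih[1].toLowerCase()}`;
  const btmh = /^urn:btmh:([0-9a-f]+)$/i.exec(xt);
  if (btmh) return `/ipfs/f017b${btmh[1].toLowerCase()}`;
  throw new Error(`unsupported xt value: ${xt}`);
}

function ipfsPathToMagnet(path) {
  const m = /^\/ipfs\/f017b([0-9a-f]+)$/i.exec(path);
  if (!m) throw new Error('not a base16 torrent-info CID path');
  const multihash = m[1].toLowerCase();
  // 0x11 0x14 prefix + 20-byte digest = SHA-1 v1 infoHash -> btih
  if (multihash.startsWith('1114') && multihash.length === 44) {
    return `magnet:?xt=urn:btih:${multihash.slice(4)}`;
  }
  return `magnet:?xt=urn:btmh:${multihash}`;
}

const v1Path = magnetToIpfsPath('magnet:?xt=urn:btih:123456fc77d23aca05a8b58066bb55fe06c72f8e');
console.log(v1Path);
console.log(ipfsPathToMagnet(v1Path));
```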

arvidn commented Oct 12, 2017

yeah, the hash in the magnet link definitely identifies the content. However, it also identifies some other metadata such as piece size, file names, etc. So even with bittorrent v1, it's possible to have two separate magnet links refer to exactly identical content (but with different piece sizes for instance).
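arvidn's point can be checked directly: the info dictionary includes `piece length` and the resulting `pieces` hashes, so identical content hashed with two piece sizes yields two infoHashes. A self-contained Node.js sketch with a compact, hypothetical bencode encoder and `makeInfo` helper, using made-up 64 KiB content:

```javascript
const crypto = require('crypto');

// Compact bencode encoder (integers, strings/Buffers, lists, dicts with
// byte-sorted keys); enough for a single-file v1 info dictionary.
function bencode(v) {
  if (Number.isInteger(v)) return Buffer.from(`i${v}e`);
  if (typeof v === 'string') v = Buffer.from(v, 'utf8');
  if (Buffer.isBuffer(v)) return Buffer.concat([Buffer.from(`${v.length}:`), v]);
  if (Array.isArray(v)) return Buffer.concat([Buffer.from('l'), ...v.map(bencode), Buffer.from('e')]);
  const keys = Object.keys(v).sort((x, y) => Buffer.compare(Buffer.from(x), Buffer.from(y)));
  return Buffer.concat([Buffer.from('d'), ...keys.flatMap((k) => [bencode(k), bencode(v[k])]), Buffer.from('e')]);
}

const sha1 = (buf) => crypto.createHash('sha1').update(buf).digest();

// Build a v1 single-file info dict for `content` with the given piece length.
function makeInfo(name, content, pieceLength) {
  const hashes = [];
  for (let off = 0; off < content.length; off += pieceLength) {
    hashes.push(sha1(content.subarray(off, off + pieceLength)));
  }
  return { name, length: content.length, 'piece length': pieceLength, pieces: Buffer.concat(hashes) };
}

const content = Buffer.alloc(64 * 1024, 'a'); // identical bytes in both torrents
const a = makeInfo('file.bin', content, 16 * 1024);
const b = makeInfo('file.bin', content, 32 * 1024);

console.log(sha1(bencode(a)).toString('hex'));
console.log(sha1(bencode(b)).toString('hex')); // differs from the line above
```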

diasdavid added the P1 - High label Oct 18, 2017

diasdavid removed their assignment Jun 2, 2018
