How does Resolution and Routing work with IPFS? #48

Closed
pocketmax opened this Issue Sep 29, 2015 · 9 comments

pocketmax commented Sep 29, 2015

If the content that I request is on a device (say a tablet) 2 or 3 devices (i.e. hops) away from me, will IPFS flood-forward my request to neighboring devices until it finds the content? Once found, will it establish a connection through the neighboring devices long enough to download the content?

jbenet Nov 2, 2015

Member

I recently wrote out a full description of content resolution and routing. Here it is.

This assumes knowledge at the level of this talk: https://www.youtube.com/watch?v=HUVmypx9HGI

How Routing and Resolution Work in IPFS

The abstract is this:

  • We have a "content routing" system, which allows us to store signed records in the network at a given hash address.
  • Records can be (a) mutable pointers (names) or (b) provider locations (like glue, used to retrieve merkledag objects)
  • Nodes "make available" records for the content they're willing to serve, and the names they control (hold the private key).
  • "make available" depends on the routing system of choice. one example is a global DHT (like MainlineDHT, which has 30+ Million nodes). Others include DNS, pub/sub groups (overlay multicast), OpenFlow content routing, and more.
  • The choice of "routing system" has implications regarding (a) where the records are stored, (b) how they're retrieved, and (c) how fast they propagate. These are tradeoffs that need to be upgradeable over time, and which users must be able to choose for themselves.
  • But Routing is cleanly a layer below the content + record description, so that these can be layered over a variety of usecase-dependent routing systems. (hence thin waist again).

In short, today we use a global DHT and DNS to resolve routing records. But this is well layered and users can choose among suitable protocols.

What Paths (URIs) Look Like

Paths (URIs) in IPFS look like unix paths beginning with "/ipfs/..." or "/ipns/...". The canonical path is not a web-style scheme ("ipfs://") because then it would not be (a) composable, nor (b) mountable on unix filesystems, but if a scheme is required, users can use "ipfs:/ipfs/..." and "ipfs:/ipns/...".
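As a rough illustration of the path shape described above, here is a minimal Go sketch that splits a path into its namespace, root, and remaining components. The `Path` type and `ParsePath` function are hypothetical names made up for this example; they are not part of any IPFS library.

```go
package main

import (
	"errors"
	"strings"
)

// Path is a hypothetical parsed form of an IPFS path (illustration only).
type Path struct {
	Namespace string   // "ipfs" (immutable) or "ipns" (mutable)
	Root      string   // first component: a content hash, key hash, or domain
	Rest      []string // remaining components, walked during resolution
}

// ParsePath splits "/ipfs/<root>/a/b" or "/ipns/<root>/a/b" into its parts.
// It also accepts the optional "ipfs:" scheme prefix, e.g. "ipfs:/ipfs/<root>".
func ParsePath(s string) (Path, error) {
	s = strings.TrimPrefix(s, "ipfs:")
	if !strings.HasPrefix(s, "/") {
		return Path{}, errors.New("path must begin with /ipfs/ or /ipns/")
	}
	parts := strings.Split(strings.Trim(s, "/"), "/")
	if len(parts) < 2 || parts[1] == "" {
		return Path{}, errors.New("path has no root component")
	}
	if parts[0] != "ipfs" && parts[0] != "ipns" {
		return Path{}, errors.New("unknown namespace: " + parts[0])
	}
	return Path{Namespace: parts[0], Root: parts[1], Rest: parts[2:]}, nil
}
```

Because the path is a plain unix-style string, it composes by simple concatenation: appending "/foo/bar" to a valid path gives another valid path with two more entries in `Rest`.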

IPFS paths are either mutable or immutable. These links are resolvable through IPFS, and the HTTP gateway we provide at https://ipfs.io can do it for us. So these links only go over HTTP to our gateways so you can see them, but it's IPFS doing the work underneath!

Immutable Paths (URIs) /ipfs/...

Mutable Paths (URIs) /ipns/...

  • /ipns/QmSyjxWfaAmnaK3yBVZWUsfcPV4AFXEfJURuqLvHMxj3Lk/foo/bar/baz.png (<-- this one may break as we'll be making some changes to things)
    a key-addressed path (resolves to content-addressed). verified by signatures.

  • /ipns/a.ex.ipfs.io/foo/bar/baz.png
    a dns-addressed path (resolves to content-addressed) based on a dnslink DNS record.

     > dig TXT a.ex.ipfs.io
     a.ex.ipfs.io.      1800    IN  TXT "dnslink=/ipfs/QmbGiL5Z9Be2vi1MMVEykzz73XB2nryU4T1wwLiS16GNAr"
    

    verified by DNS (DNSSEC). Typically, the record points to /ipns/<key>.
    It can also be accessed via http://a.ex.ipfs.io/foo/bar/baz.png
    (by adding an A record that points to any IPFS gateway, e.g. our public one. It would be HTTPS, but we are not yet getting wildcard certs (ugh!) for the examples.)

Try these links out! The pure HTTP one especially was really fun to get working. That's HTTP hosted.

How Path Resolution Works

For now, assume we have a way to do the following (explained further down, in the "Routing" part):

  • fetch content-addressed graph nodes by their hash (check hash matches).
  • fetch key-addressed pointers by their hash (check signature matches).

Resolving Immutable Paths (by walking the merkledag):

  1. Start with the first path component and retrieve the object whose hash matches it.
  2. For each remaining component C, look in the current node's links for a mapping from C to the next hash (like filesystem directories with inodes).
  3. Terminate at the end of the path. If at any point no mapping exists, the path is invalid and resolution returns an error.
  4. The entire chain is authenticated (by being content-addressed).
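The walk above can be sketched in Go roughly as follows. This is a toy model under stated assumptions: `Node`, `Store`, and `HashNode` are hypothetical, the store is an in-memory map standing in for "fetch by hash", and the hash is plain hex SHA-256 over an ad hoc encoding rather than the encoding IPFS actually uses. The path passed in is the part after "/ipfs/".

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"sort"
	"strings"
)

// Node is a simplified merkledag object: data plus named links to child hashes.
type Node struct {
	Data  string
	Links map[string]string // link name -> hash of the child node
}

// Store stands in for "fetch content-addressed nodes by hash" (in-memory here).
type Store map[string]Node

// HashNode hashes a toy canonical encoding of the node (links sorted by name).
func HashNode(n Node) string {
	names := make([]string, 0, len(n.Links))
	for name := range n.Links {
		names = append(names, name)
	}
	sort.Strings(names)
	h := sha256.New()
	h.Write([]byte(n.Data))
	for _, name := range names {
		h.Write([]byte("\x00" + name + "\x00" + n.Links[name]))
	}
	return hex.EncodeToString(h.Sum(nil))
}

// Put stores a node under its own hash and returns that hash.
func (s Store) Put(n Node) string {
	k := HashNode(n)
	s[k] = n
	return k
}

// Fetch retrieves a node and checks that its content matches the requested hash.
func (s Store) Fetch(hash string) (Node, error) {
	n, ok := s[hash]
	if !ok {
		return Node{}, errors.New("not found: " + hash)
	}
	if HashNode(n) != hash {
		return Node{}, errors.New("hash mismatch: " + hash)
	}
	return n, nil
}

// ResolveImmutable walks "<root-hash>/a/b/c" one link at a time.
func ResolveImmutable(s Store, path string) (Node, error) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	cur, err := s.Fetch(parts[0])
	if err != nil {
		return Node{}, err
	}
	for _, c := range parts[1:] {
		next, ok := cur.Links[c]
		if !ok {
			return Node{}, errors.New("invalid path: no link named " + c)
		}
		if cur, err = s.Fetch(next); err != nil {
			return Node{}, err
		}
	}
	return cur, nil
}
```

Since every hop re-checks the hash of what was fetched, a tampered node anywhere along the path makes resolution fail, which is the sense in which the whole chain is authenticated.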

Resolving Mutable Key-based Paths (pointers in the name system, hence ipns):

  1. Start with the first path component, which is the hash of a public key.
  2. Look up a pointer (in the routing system) which matches the hash. Valid pointers MUST be signed by the private key corresponding to the public key.
  3. The pointer value is another path (mutable or immutable); resolve it to get a merkledag object.
  4. This is the root of the rest of the original path.
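Steps 1 and 2 can be sketched in Go like this. Everything here is an assumption for illustration: the `Pointer` record shape, the use of Ed25519 signatures and hex SHA-256 key hashes, and the in-memory map standing in for the routing system. It does not describe the actual IPNS record format. Step 3, resolving the returned path further, is left to the caller.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"encoding/hex"
	"errors"
)

// Pointer is a hypothetical signed name record (not the real IPNS format).
type Pointer struct {
	PublicKey ed25519.PublicKey
	Value     string // another path, mutable or immutable
	Signature []byte // signature over Value by the matching private key
}

// KeyHash derives the name (first path component) from a public key.
func KeyHash(pub ed25519.PublicKey) string {
	sum := sha256.Sum256(pub)
	return hex.EncodeToString(sum[:])
}

// NewName generates a key pair and returns the private key and its name.
func NewName() (ed25519.PrivateKey, string, error) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		return nil, "", err
	}
	return priv, KeyHash(pub), nil
}

// SignPointer builds a pointer to value, signed by priv.
func SignPointer(priv ed25519.PrivateKey, value string) Pointer {
	pub := priv.Public().(ed25519.PublicKey)
	return Pointer{PublicKey: pub, Value: value, Signature: ed25519.Sign(priv, []byte(value))}
}

// ResolveKey looks up name in records, checks that the embedded public key
// hashes to name, verifies the signature, and returns the pointed-to path.
func ResolveKey(records map[string]Pointer, name string) (string, error) {
	p, ok := records[name]
	if !ok {
		return "", errors.New("no pointer for " + name)
	}
	if len(p.PublicKey) != ed25519.PublicKeySize || KeyHash(p.PublicKey) != name {
		return "", errors.New("public key does not match name")
	}
	if !ed25519.Verify(p.PublicKey, []byte(p.Value), p.Signature) {
		return "", errors.New("invalid signature")
	}
	return p.Value, nil
}
```

The point of the two checks is that whoever stores the record does not need to be trusted: only the holder of the private key can produce a pointer that verifies under the name.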

Resolving Mutable DNS-based Paths

  1. Look up DNS TXT records at the given domain.
  2. Select the first (in order) which matches "dnslink=". The value is another path (mutable or immutable). Resolve it to get a merkledag object.
  3. This is the root of the rest of the original path.
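A minimal Go sketch of the selection step, assuming the TXT records have already been fetched. `SelectDNSLink` and `RebasePath` are hypothetical helper names for this example, not functions from an IPFS library.

```go
package main

import (
	"errors"
	"strings"
)

// SelectDNSLink returns the path from the first TXT record that starts with
// "dnslink=" and whose value looks like a path.
func SelectDNSLink(txts []string) (string, error) {
	for _, txt := range txts {
		if !strings.HasPrefix(txt, "dnslink=") {
			continue
		}
		if v := strings.TrimPrefix(txt, "dnslink="); strings.HasPrefix(v, "/") {
			return v, nil
		}
	}
	return "", errors.New("no dnslink TXT record found")
}

// RebasePath appends the rest of the original path to the dnslink target,
// so /ipns/<domain>/foo/bar becomes <target>/foo/bar.
func RebasePath(target string, rest []string) string {
	if len(rest) == 0 {
		return target
	}
	return strings.TrimSuffix(target, "/") + "/" + strings.Join(rest, "/")
}
```

In a real program the records could come from the standard library's `net.LookupTXT(domain)`, which returns the TXT strings for a name; the rebased path is then resolved like any other mutable or immutable path.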

Content Routing and Records

The hardest part of making IPFS work is to find a way to distribute the content that is fast, scalable, secure, compatible with human policies, and which can be run entirely by simple IPFS nodes (i.e. no change to the internet at all, and no central points whatsoever). This is hard, but possible with clean layering.

One of the problems with plans to evolve the internet towards content routing ({NDN/CCN, XIA, etc}) is that they require upgrading the internet itself, which is really hard to warrant without massive demand. Even with large demand, IPv6 has yet to be fully deployed :( -- which does not give me any hope of seeing NDN/CCN massively deployed in the core, without FIRST establishing the use of content-addressed networks. This means that end developers (web developers) must be able to use content-addressed networks to move lots of data (video, etc) extremely effectively well before substantial demand to improve the underlying network will materialize. So as we see it, by making IPFS usable to end developers we can create demand for these architectures as well.

Unlike many other p2p-based systems, our model requires that:

  • Content MUST be able to move as fast as the underlying network permits. This means that nodes in the same lan/datacenter should not route through the backbone (unless a human policy requires it).
  • Nodes MUST be able to ONLY store and download the content they explicitly choose to. (important even if encrypted, given illegal bits). This is in contrast to many other p2p systems.
  • Nodes MUST be able to make choices regarding transports + routing, to enact policies, and to make tradeoffs (say between privacy and performance).

This content model requires separating Content from Routing Records. The Content is just merkledag objects which represent users' files and other datastructures. The Routing Records (also merkledag objects) are explicitly used for finding other nodes, and for finding content. These records are cryptographic artifacts with some room to play with, so a variety of cryptographic protocols can be deployed on top of this record system. We think of it as an improved DNS learning from all the uses of things like Git, BitTorrent, Bitcoin, and so on.

The Routing System interface is very simple -- it's a distributed key-value store with some requirements that the values conform to some validity rules. These can include cryptographic proofs. There is also a total ordering, so there is a "best record", and optional freshness guarantees. This allows nodes storing records to discard invalid, old, or "less good" records.

type Routing interface {

    // Put allows a node to publish a record to a given key. 
    // These are validated before being stored anywhere.
    Put(Key, Record) (error)

    // Get allows a node to retrieve records for a given key. 
    // These are validated before returning to the caller.
    Get(Key) ([]Record, error)
}
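To make the interface a little more concrete, here is a hedged in-memory sketch in Go. It repeats the interface so the block compiles on its own. The `Key` and `Record` definitions, the `Seq` field used as the total ordering, and the `RequirePath` validity rule are all assumptions made up for this example; the original interface leaves those types unspecified.

```go
package main

import (
	"errors"
	"sort"
	"strings"
)

// Key and Record are hypothetical concrete types for this sketch.
type Key string

type Record struct {
	Value string
	Seq   uint64 // assumed ordering: a higher Seq is a "better" record
}

type Routing interface {
	Put(Key, Record) error
	Get(Key) ([]Record, error)
}

// MemRouting is an in-memory stand-in for a distributed key-value store.
type MemRouting struct {
	validate func(Key, Record) error
	records  map[Key][]Record
}

func NewMemRouting(validate func(Key, Record) error) *MemRouting {
	return &MemRouting{validate: validate, records: make(map[Key][]Record)}
}

// RequirePath is an example validity rule: values must look like IPFS paths.
func RequirePath(_ Key, r Record) error {
	if !strings.HasPrefix(r.Value, "/ipfs/") && !strings.HasPrefix(r.Value, "/ipns/") {
		return errors.New("value is not an IPFS path")
	}
	return nil
}

// Put validates the record before storing it.
func (m *MemRouting) Put(k Key, r Record) error {
	if err := m.validate(k, r); err != nil {
		return err
	}
	m.records[k] = append(m.records[k], r)
	return nil
}

// Get re-validates stored records and returns them best first.
func (m *MemRouting) Get(k Key) ([]Record, error) {
	var out []Record
	for _, r := range m.records[k] {
		if m.validate(k, r) == nil {
			out = append(out, r)
		}
	}
	if len(out) == 0 {
		return nil, errors.New("no valid records for key")
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Seq > out[j].Seq })
	return out, nil
}

// Compile-time check that MemRouting satisfies the interface.
var _ Routing = (*MemRouting)(nil)
```

A real validity rule would typically include the cryptographic checks mentioned above (for example, a signature over the record), and a node could use the ordering to drop everything but the best record.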

Many things can support this key-value store interface, from a centralized database to a totally distributed hash table. In reality, we can use well-known and established systems, such as DNS, global-scale DHTs, pub/sub (multicast) overlays, OpenFlow, and more. There is also the possibility of using completely oblivious RAM systems for routing, which would be a tremendous complement to tor/i2p and other privacy sensitive applications. Routing systems can also choose whether to be part of the global routing system (i.e. make records available to all systems) or ONLY to a smaller subnet.

The flexibility here exists because ultimately, the use case will dictate important constraints on the conditions for routing. This is important for human routing policies, from performance to privacy constraints. There is no "one-size fits all" routing system yet. (Maybe {NDN/CCN, XIA, etc} could be it, but we cannot wait for them to be deployed before IPFS works).

The way we see it, we will have one global DHT, DNS, and many smaller special-purpose routing systems.
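One way to picture several routing systems living side by side is a simple composite that asks each one in turn, for example a LAN-local system first and the global DHT last. The Go sketch below is only an illustration of that idea under assumed types; `Tiered` and `MapRouting` are hypothetical names, and this is not a description of how IPFS combines its routing systems.

```go
package main

import "errors"

// Key and Record are hypothetical concrete types for this sketch.
type Key string

type Record struct{ Value string }

type Routing interface {
	Put(Key, Record) error
	Get(Key) ([]Record, error)
}

// MapRouting is a trivial in-memory system standing in for a local or global one.
type MapRouting map[Key][]Record

func (m MapRouting) Put(k Key, r Record) error {
	m[k] = append(m[k], r)
	return nil
}

func (m MapRouting) Get(k Key) ([]Record, error) {
	if len(m[k]) == 0 {
		return nil, errors.New("not found")
	}
	return m[k], nil
}

// Tiered queries routing systems in order, e.g. local subnet first, then global.
type Tiered []Routing

// Get returns records from the first system that has any.
func (t Tiered) Get(k Key) ([]Record, error) {
	for _, r := range t {
		if recs, err := r.Get(k); err == nil && len(recs) > 0 {
			return recs, nil
		}
	}
	return nil, errors.New("not found in any routing system")
}

// Put publishes to every system and reports the first error, if any.
// A policy could instead restrict publishing to a subset (say, the local one).
func (t Tiered) Put(k Key, rec Record) error {
	var firstErr error
	for _, r := range t {
		if err := r.Put(k, rec); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}
```

Because the composite itself satisfies the same interface, callers do not need to know how many systems sit underneath, which is the layering property the text argues for.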

Today, IPFS uses a Kademlia-based DHT, which will soon turn into a Coral+S/Kademlia based DHT, and will continue to learn from DHT research. We will also have other routing systems deployed -- including mDNS and pub/sub (multicast) -- in 2016.

One interesting thing we're building on this routing system (and ipfs as a whole) is a datastructure + tools + libs which allows application developers to use a web of trust like SPKI/SDSI, tied to whatever naming system they want (can be pure key chains, or bound to DNS, or bound to other things).

@jbenet jbenet changed the title from How does routing work with IPFS? to How does Resolution and Routing work with IPFS? Nov 2, 2015

pocketmax Nov 2, 2015

This is a fantastic article and the video did shed a lot of light on merkledag. I have a semi-related question though. Last year someone was talking to me about a new residential indoor wireless spec that focused on long range (a few miles) over bandwidth and had a P2P component. So a community with a few of these wifi hotspots could route internet access through one or two shared connections. If an access point goes down, the other access points could route around it. I've been looking for the technology for a year and I have no clue what it is. He said it was still a spec and a long way from being ready for commercial use but I want to find it anyway so I can keep tabs on the tech. I'm sure some of what he was telling me was pie in the sky but a new wireless protocol with long range P2P capabilities would be a good fit for IPFS.

jbenet Nov 2, 2015

Member

absolutely. not sure which protocol it is, but will keep eyes out for it

@noffle noffle referenced this issue Feb 8, 2016

Closed

Sprint: February 1 #88

novocodev Mar 31, 2016

@jbenet in the description you give above could you expand a little on:

Nodes "make available" records for the content they're willing to serve, and the names they control (hold the private key).

From the IPFS spec I understood that nodes have an identity based on a public/private key pair that is used to sign IPNS mutable records.

But the above suggests that a Node may have control over 'other' public/private key pairs that can be used to sign and publish IPNS mutable records.

Is this a feature? And could it be used to create virtual or offline peers that generate content but do not actively participate in the IPNS routing or block exchange?

uvok Apr 2, 2016

Regarding IPNS, does a daemon on a peer need to be running for other peers to resolve the name of this peer?

I published some content via ipfs name publish, resolved it via the public ipfs gateway (gateway.ipfs.io/ipns/....), and the published content was displayed. Then I shut down my PC (where the daemon was running) and tried resolving it again later. I got the error message "Path Resolve error: could not resolve name.".

Kubuxu Apr 2, 2016

Member

The name will be valid for 24h if the daemon that published it goes down.
It is also possible for a name to disappear from the network earlier as IIRC they are not republished by 3rd party peers.

Contributor

flyingzumwalt commented May 23, 2017

mohsenghajar Nov 24, 2017

Where can I get the most up to date information on this question? Most interested in who is working on implementing IPFS on overlay networks?
