Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Service Discovery instead of Protocol Negotiation #32

Open
Stebalien opened this issue Aug 15, 2017 · 26 comments
Open

Service Discovery instead of Protocol Negotiation #32

Stebalien opened this issue Aug 15, 2017 · 26 comments

Comments

@Stebalien
Copy link
Member

Currently, there's a lot of overhead when each connection is established opening up multiple streams to figure out what services the other peer supports. This also tends to be a bit racy and really messy.

Instead, we could introduce a push-based service discovery system. On connect, a peer would announce what protocols it supports up-front (the list shouldn't be that large) and whenever a new service gets added/removed, the peer would tell its peers about this change. We could then avoid an entire round trip when opening new streams because we already know the protocols our peers support. This would also remove the need for a lot of our peer probing code (we could just have the host keep track of which connections support which protocols).


This would also make it possible to implement a pseudo message-oriented system on-top of our stream-oriented communication primitives. Why? Because most consumers of libp2p in go-ipfs are already (mostly) message oriented. That is, we have, for almost every service:

  1. Something to keep track of peers and metadata about them (and keep track of the status of the connection).
  2. Go routines to accept messages from peers and pipe it into a central event loop.
  3. Go routines to send messages back out to peers. Note: they are sent back to peers in-order, this is the only non-message oriented part of this and why implementing this on-top of streams actually makes sense.

We shouldn't have to implement this over-and-over for every service; this is where a lot of our leaks and race conditions come from...

If we had this central service registry, we could use it to transparently manage all streams, opening them as necessary and possibly even managing associated metadata.

@dryajov
Copy link
Member

dryajov commented Aug 15, 2017

@Stebalien I like the general idea and I believe this is to some extent along the same lines as ipfs/notes#237. We need a way to tell what a peer can do and who it knows about (connected to?) and I believe @vyzo and I have both independently proposed something very similar to this while working on the circuit implementations.

@whyrusleeping
Copy link
Contributor

We do currently announce what protocols we support up front (via identify). The issue i think is that other protocols dont bother waiting for identify to finish. Great suggestion overall, i agree.

@RangerMauve
Copy link

How does one use the identify API from js-ipfs?

@dryajov
Copy link
Member

dryajov commented Jan 12, 2018

@RangerMauve afaik identify is a lower level libp2p protocol and we don't expose it with the http API.

@RangerMauve
Copy link

@dryajov js-ipfs in the browser can run a full IPFS node which uses js-libp2p. Where exactly in libp2p would this call be? Would it be somewhere inside swarm?

@dryajov
Copy link
Member

dryajov commented Jan 12, 2018

@RangerMauve
Copy link

RangerMauve commented Jan 12, 2018

Thank you @dryajov!
So this will give me a list of protocols that are supported (like circuit relay), are there services that don't advertise as protocols that are currently undiscoverable? (Or is service===protocol in libp2p?

Also, it looks like peer.on("peer-mux-established") is what should be used to wait for the info to be available. This may be relevant to libp2p/js-libp2p-websocket-star#41 so that it can discover relay-able peers.

@dryajov
Copy link
Member

dryajov commented Jan 12, 2018

So this will give me a list of protocols that are supported (like circuit relay), are there services that don't advertise as protocols that are currently undiscoverable? (Or is service===protocol in libp2p?

@RangerMauve unfortunately, not really - it will provide you with a more up to date peer-info of the remote peer. It's possible to tell which transports are available based on the peers multiaddrs, but nothing more.

Also, it looks like peer.on("peer-mux-established") is what should be used to wait for the info to be available. This may be relevant to libp2p/js-libp2p-websocket-star#41 so that it can discover relay-able peers.

Yep, in fact, that's what circuit uses to probe incoming peers for relay support.

@RangerMauve
Copy link

Thanks for the details!

Is this the canonical place to discuss the discovery service?
What can I do to help contribute to this feature?

@dryajov
Copy link
Member

dryajov commented Jan 12, 2018

Is this the canonical place to discuss the discovery service?

Good question, could you provide more info on what is your use case?

@dryajov
Copy link
Member

dryajov commented Jan 12, 2018

@RangerMauve could you create an issue describing what you're looking for, I believe it would help identifying where it falls in the stack and what is required. Also, I believe that service discovery is something that means different things to different folks :) - in the future, it would be helpful to clearly identify, what's a service, whats a protocol and what are general purpose capability discovery.

@RangerMauve
Copy link

I'm learning as much as I can about the different parts of libp2p and IPFS at the moment, but my goal is to help contribute to making the browser build more fully-featured. I got here while looking into what was missing from getting the browser to participate in the DHT.

There's some effort happening for migrating websocket-star to use circuit-relay instead of having centralized socket.io servers, and that seems to be dependent on circuit-relay working and being able to discover circuit-relay peers, which seems to be related to this issue of lack of service discovery.

@RangerMauve
Copy link

Question, how does this relate to Resource Discovery from the spec?

@Stebalien
Copy link
Member Author

Question, how does this relate to Resource Discovery from the spec?

Mostly unrelated. Resource Discovery allows us to find peers that claim to be able to do something. However, that doesn't necessarily tell us precisely which protocols they support and if they're willing to speak the protocol with us at the current point in time.

The issue here is more about structuring libp2p programs. Instead of having each libp2p service say "Look, a new peer! I wonder if they'll talk with me.", we can have services register with libp2p and tell libp2p "Hey, libp2p, when you see a peer willing to speak protocol X with me, tell me."

The current system (in go-ipfs, at least), is like a bunch of puppies running to the door to see if someone is interested in petting them. What I'd like is a bunch of well-trained dogs that wait to be called when needed.

@RangerMauve
Copy link

@Stebalien What would need to happen to add a new field to the PeerInfo data type listing service names?
I think that that would be the most obvious place to put it and would make it easy to filter remote peers for ones that offer a specific service we want to connect to.

I think that the bunch of puppies approach may be behind some of the performance issues we're seeing in the browser, too. (I don't have numbers to back that claim up)

@Stebalien
Copy link
Member Author

What would need to happen to add a new field to the PeerInfo data type listing service names?

One issue there is that the peerstore is already massive. However, I'd absolutely love that feature. In theory, we could just add it to the Peer message defined in the DHT's protobuf. After that, we'd need to record this information somewhere in the peerstore.

However, I don't think doing that will make anything faster at the moment as, currently, all nodes offer pretty much every service (except, experimental ones like relay and pubsub). This will only really become an issue when we make relaying non-experimental (we need a nice way to advertise "I'm willing to relay").

@RangerMauve
Copy link

Would writing up a change to PeerInfo be welcome, or should this be discussed more first?

I'm also a little confused by the docs for listening. It implies that one should be using a multicodec, but looking at examples it seems that node.handle can take an arbitrary string.

I think PeerInfo should contain a list of the things that the node is listening on, and I'm a little confused as to what data type that should really be.

@Stebalien
Copy link
Member Author

@RangerMauve it definitely needs to be discussed more (and probably in a separate issue). It's a non-trivial change with wide-reaching consequences.

I'm also a little confused by the docs for listening. It implies that one should be using a multicodec, but looking at examples it seems that node.handle can take an arbitrary string.

I believe those docs are old. We use strings for protocols (usually of the form /name/version). As a matter of fact, we originally wanted to use IPFS paths pointing to the spec (/ipfs/QmId/my_spec/1_0.md) but didn't for performance reasons (for now).

I think PeerInfo should contain a list of the things that the node is listening on, and I'm a little confused as to what data type that should really be.

We'd have a list of strings. However, this does motivate adding these protocols to the multicodec table and using multicodecs instead (for space efficiency).

@RangerMauve
Copy link

@Stebalien is this the correct repo to open the issue in?

motivate adding these protocols to the multicodec table and using multicodecs instead

I think that the nature of the multicodec table will make it harder for people to use custom (application specific) protocols.

What about using hashes of protocol names? That way you can use a string to represent it in code, but store the existence of it in a binary multihash representation. This will also give a bit of privacy as to what the protocol is actually called and will make it more annoying to reverse the purpose of services.

@Stebalien
Copy link
Member Author

@RangerMauve

is this the correct repo to open the issue in?

Yes.

think that the nature of the multicodec table will make it harder for people to use custom (application specific) protocols.

We'd would have to support both.

What about using hashes of protocol names? That way you can use a string to represent it in code, but store the existence of it in a binary multihash representation. This will also give a bit of privacy as to what the protocol is actually called and will make it more annoying to reverse the purpose of services.

I believe they'd generally end up longer that way. For example, bitswap is /bitswap/1.1.0 atm (14 bytes) while the hash would probably be 32 bytes. I'm also generally scared of "a bit of privacy" as it tends to lead people into a false sense of privacy where non really exists.

@RangerMauve
Copy link

RangerMauve commented Jan 30, 2018

I just noticed that there is already a protocols section in the protobuf for the the Identify service.

Isn't that exactly what we were just talking about?

Edit: It's in the go implementation, too.

@Stebalien
Copy link
Member Author

@RangerMauve It is. However, we aren't really using it properly.

@RangerMauve
Copy link

RangerMauve commented Jan 30, 2018

@Stebalien

So the groundwork for this would be:

When handling a new protocol in swarm, add it to your local PeerInfo's protocol list.

That should be a pretty simple change, and it's totally backwards compatible.

From there, protocols that rely on other protocols will have to wait for ident to finish and check if a protocol is present in the PeerInfo.
This will have to be delayed while the ecosystem upgrades to versions of libp2p that have this change.

Would PRs for these two changes be welcome, or is there more discussion needed?

@Stebalien
Copy link
Member Author

@RangerMauve we need to discuss this in a new issue (and invite anyone interested). Mostly concerning whether or not we actually want this, specific usecases, requirements like signing peer info's, etc.

@RangerMauve
Copy link

K, I'll write something up. Thank you!

@RangerMauve
Copy link

Looking at the docs in multiformats-select, it looks like you can get information about what the other end supports through the multistream protocol. I guess this means that one doesn't have to bother using PeerInfo anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Triage
Development

No branches or pull requests

4 participants