Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rendezvous protocol #44

Closed
wants to merge 17 commits into from
Closed
Changes from 15 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
196 changes: 196 additions & 0 deletions rendezvous/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,196 @@
# Rendezvous Protocol

The protocol described in this specification is intended to provide a
lightweight mechanism for generalized peer discovery. It can be used
for bootstrap purposes, real time peer discovery, application specific
routing, and so on. Any node implementing the rendezvous protocol can
act as a rendezvous point, allowing the discovery of relevant peers in
a decentralized fashion.

## Use Cases

Depending on the application, the protocol could be used in the
following context:
- During bootstrap, a node can use known rendezvous points to discover
peers that provide critical services. For instance, rendezvous can
be used to discover circuit relays for connectivity restricted
nodes.
- During initialization, a node can use rendezvous to discover
peers to connect with the rest of the application. For instance,
rendezvous can be used to discover pubsub peers within a topic.
- In a real time setting, applications can poll rendezvous points in
order to discover new peers in a timely fashion.
- In an application specific routing setting, rendezvous points can be
used to progressively discover peers that can answer specific queries
or host shards of content.

### Replacing ws-star-rendezvous

We intend to replace ws-star-rendezvous with a few rendezvous daemons
and a fleet of p2p-circuit relays. Real-time applications will
utilize rendezvous both for bootstrap and in a real-time setting.
During bootstrap, rendezvous will be used to discover circuit relays
that provide connectivity for browser nodes. Subsequently, rendezvous
will be utilized throughout the lifetime of the application for real
time peer discovery by registering and polling rendezvous points.
This allows us to replace a fragile centralized component with a
horizontally scalable ensemble of daemons.

### Rendezvous and pubsub

Rendezvous can be naturally combined with pubsub for effective
real-time discovery. At a basic level, rendezvous can be used to
bootstrap pubsub: nodes can utilize rendezvous in order to discover
their peers within a topic. Contrariwise, pubsub can also be used as
a mechanism for building rendezvous services. In this scenerio, a
number of rendezvous points can federate using pubsub for internal
real-time distribution, while still providing a simple interface to
clients.


## The Protocol

The rendezvous protocol provides facilities for real-time peer
discovery within application specific namespaces. Peers connect to the
rendezvous point and register their presence in one or more
namespaces.

Peers registered with the rendezvous point can be discovered by other
nodes by querying the rendezvous point. The query specifies the
namespace for limiting application scope and optionally a maximum
number of peers to return. The namespace can be omitted in the query,
which asks for all peers registered to the rendezvous point.

The query can also include a cookie, obtained from the response to a
previous query, such that only registrations that weren't included in
the previous response will be returned. This allows peers to
progressively refresh their network view without overhead, which
greatly simplifies real time discovery. It also allows for pagination
of query responses, so that large numbers of peer registrations can be
managed.

### Registration Lifetime
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: there are several comments referencing "namespace ownership" but that concept does not appear to be present in this document. If it's a part of the protocol or caller contract it's probably worth explicitly mentioning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Namespace ownership is beyond the scope of the protocol I think -- it quickly gets into ACLs and that's a rathole I don't want to go down :)


Registration lifetime is controlled by an optional TTL parameter in
the `REGISTER` message. If a TTL is specified, then the registration
persists until the TTL expires. If no TTL was specified, then a default
of 2hrs is implied. There may be a rendezvous point-specific upper bound
on TTL, with a minimum such value of 72hrs. If the TTL of a registration
is inadmissible, the rendezvous point may reject the registration with
an `E_INVALID_TTL` status.

Peers can refresh their registrations at any time with a new
`REGISTER` message; the TTL of the new message supersedes previous
registrations. Peers can also cancel existing registrations at any
time with an explicit `UNREGISTER` message.

### Interaction

Clients `A` and `B` connect to the rendezvous point `R` and register for namespace
`my-app` with a `REGISTER` message:

```
A -> R: REGISTER{my-app, {QmA, AddrA}}
R -> A: {OK}
B -> R: REGISTER{my-app, {QmB, AddrB}}
R -> B: {OK}
```

Client `C` connects and registers for namespace `another-app`:
```
C -> R: REGISTER{another-app, {QmC, AddrC}}
R -> C: {OK}
```

Another client `D` can discover peers in `my-app` by sending a `DISCOVER` message; the
rendezvous point responds with the list of current peer reigstrations and a cookie.
```
D -> R: DISCOVER{ns: my-app}
R -> D: {[REGISTER{my-app, {QmA, Addr}}
REGISTER{my-app, {QmB, Addr}}],
c1}
```

If `D` wants to discover all peers registered with `R`, then it can omit the namespace
in the query:
```
D -> R: DISCOVER{}
R -> D: {[REGISTER{my-app, {QmA, Addr}}
REGISTER{my-app, {QmB, Addr}}
REGISTER{another-app, {QmC, AddrC}}],
c2}
```

If `D` wants to progressively poll for real time discovery, it can use
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear to me the intended use so perhaps this comment is moot, but: might the progressive poller also want information about unregisters? If they don't get that information then it seems like they'll incrementally build a registration set of non-decreasing size and decreasing usefulness because the list the caller keeps will start to include many stale peers who have unregistered over time. Depending on churn rate the number of stale registrations the caller has could potentially dwarf the number of actually registered peers. There are obvious solutions on the caller side but unclear to a reader like me what's intended here. Might be worth a comment at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reporting the unregistrations in a discovery response is problematic for a couple of reasons:

  • requires the rendezvous point to keep track of unregisters modulo the poller's state, which is expensive and couples too tightly.
  • complicates the processing of the response in the client, as it assumes a state machine that may simply not exist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that explicit unregisters are expected to be rare events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another points is that the progressive discovery mechanism is not intended to build a consistent replica of the rendezvou's point internal state, but rather discover new peers since the last discovery.
A rendezvous federation protocol (TBD) would certainly forward unregistrations though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that explicit unregisters are expected to be rare events

I think that explicit unregisters might be useful when a node does a clean shutdown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, of course!

The question is whether it is worth the trouble propagating this in discover queries, or simply let the TTLs expire stale entries in clients.
I really don't like having to track unregistrations in the database though; Unregister should simply signal deletion and no further propagation.

Copy link
Member

@victorb victorb May 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could leave it up to the clients to decide when they want to disconnect. If they want to decrease the amount of registrations, they can purge it locally (checking if ttl expired and remove that registration would be one way, bad ping could be another). Seems to simplify by not having the unregister.

the cookie obtained from a previous response in order to only ask for
new registrations.

So here we consider a new client `E` registering after the first query,
and a subsequent query that discovers just that peer by including the cookie:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like might not be a good idea to require registries to keep cookies forever (and probably not even across restarts) so the language here should probably indicate that what is returned when you pass the cookie is only likely an increment set; it could be the full set if the cookie is too old, unknown, the peer reset, whatever.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The registries shouldn't keep state for cookies at all, that's why they are client-side cookies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that a stale cookie could elicit an arbitrary set in response.
I think we should simply return an error if the cookie has become stale, which is something the registry can easily detect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an E_INVALID_COOKIE status code that can fail a progressive discovery and signal a restart.

```
E -> R: REGISTER{my-app, {QmE, AddrE}}
R -> E: {OK}
D -> R: DISCOVER{ns: my-app, cookie: c1}
R -> D: {[REGISTER{my-app, {QmE, AddrE}}],
c3}
```

### Protobuf

```protobuf
message Message {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there need to be error types.
For ex. when a peer tries to register for an ID that the peer does not own.
Or when the peer tries to register a namespace that is way too long (say a big .jpg in base64).

Copy link
Contributor Author

@vyzo vyzo Apr 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you want a RegisterResponse?
Sure, we can do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Also DiscoverResponse could include an optional error, too

Copy link
Contributor Author

@vyzo vyzo Apr 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, although it's utility would be rather limited.
What is considered an error in this case? A non-existent namespace is equivalent to empty, and namespace name that is too long will always be empty.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, although it's utility would be rather limited.

Let's not do this then.

So you want a RegisterResponse?

The errors that it would throw would be:

  • E_NAMESPACE_TOO_LONG
  • E_OWNERSHIP_NOT_TRUSTED (or similar)

What do you think of that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E_NAMESPACE_TOO_LONG

that could be generalized to E_INVALID_NAMESPACE to convey rejected characters for some implementation. Or it could be a different error altogether.

E_OWNERSHIP_NOT_TRUSTED

Maybe a generic E_NOT_AUTHORIZED is more apt?

Another one we want is E_INVALID_MULTIADDR perhaps.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a generic E_NOT_AUTHORIZED is more apt?

Yes

Another one we want is E_INVALID_MULTIADDR perhaps.

Good idea, although I don't know if we should validate those as the clients might have newer incompatible protocol tables

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have E_INVALID_PEER_INFO to signal generic rejection of the peer info, and we can see what kind of validation makes sense.

enum MessageType {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make more sense to merge register and unregister requests? Or dropping the message type completly?
The client could simply check the length of the array for all types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want unregister so that we can depart from a namespace without having to disconnect.
The message type is needed to disambiguate the protocol.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For messages with enums it is easy to to create messages that accidentally say something they don't intend to. The way I've seen this happen is that if an enum field has a default value that carries some semantics (name of a message, or a status) then the reader can't distinguish between a programmer explicitly setting that value versus not setting it at all. Or at least the API I was using a few years ago didn't distinguish between the two by default. Anyway, in order to avoid accidentally having semantics for enum fields the pattern was to have the default enum value always be something like UNSET or NOT_SET so that it was clear when someone explicitly set the value versus not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can certainly see some value in defensive programming, but I am not sure I like polluting the protocol with something that protects against PEBKAC-style programming errors.

It would also be rather inconsistent as I don't think we do this anywhere else.

Copy link
Contributor Author

@vyzo vyzo Apr 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that the casual programmer, who is more likely to suffer from such PEBKAC, should never have to touch these protobufs directly.
These should be well abstracted by implementation libraries, and bugs of this type should be quickly weeded out there.

REGISTER = 0;
REGISTER_RESPONSE = 1;
UNREGISTER = 2;
DISCOVER = 3;
DISCOVER_RESPONSE = 4;
}

message PeerInfo {
optional bytes id = 1;
repeated bytes addrs = 2;
}

message Register {
optional string ns = 1;
optional PeerInfo peer = 2;
optional int64 ttl = 3; // in seconds
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this were named ttlSec it would be unambiguous and the comment would be unnecessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like it (it's ugly -- what is ttlSec? It's ttl in seconds. Is there a ttl not in seconds?), but I also don't feel strongly about it -- so if people would prefer ttlSec as the field, we can change it.

}

enum RegisterStatus {
OK = 0;
E_INVALID_NAMESPACE = 100;
E_INVALID_PEER_INFO = 101;
E_INVALID_TTL = 102;
E_NOT_AUTHORIZED = 200;
}

message RegisterResponse {
optional RegisterStatus status = 1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've always found an optional error string to be polite. For example if E_INVALID_NAMESPACE you could say what the max length is or what the problem is otherwise. Would save the caller from having to guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that's a nicety we can add.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added as statusText.

}

message Unregister {
optional string ns = 1;
optional bytes id = 2;
}

message Discover {
optional string ns = 1;
optional int64 limit = 2;
optional bytes cookie = 3;
}

message DiscoverResponse {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might you want a way to signal errors? For example namespace too long or internal server error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.
We briefly discussed this with @mkg20001 and it seemed of limited utility for namespace errors, but internal server errors is probably something we want reported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a ResponseStatus for reporting errors.

repeated Register registrations = 1;
optional bytes cookie = 2;
}

optional MessageType type = 1;
optional Register register = 2;
optional RegisterResponse registerResponse = 3;
optional Unregister unregister = 4;
optional Discover discover = 5;
optional DiscoverResponse discoverResponse = 6;
}
```