Initial DHT notes #10

Open · wants to merge 1 commit into base: master
66 changes: 66 additions & 0 deletions org/dht.md
@@ -0,0 +1,66 @@
# DHT Implementation Guidelines

This document lists implementation guidelines and heuristics for a telehash
Distributed Hash Table (DHT).

> Note: several heuristics (especially numerical values) in this document are
> based on other networks with different usage patterns, or completely made up.
> Expect some of these values to change as the DHT becomes large enough to have
> observable behaviors.

## DHT Behavior
The DHT is based on Kademlia, with several important differences:
- Node identifiers are not random; they are hashnames, derived from the
  public portion of an RSA keypair.
- Node identifiers are 256 bits, not 160 bits.
- All communication takes place over an open line, including DHT operations.
- Having a hashname, IP, and port is insufficient to connect to a new peer;
  you must also have the peer's RSA public key to talk to it.
- Intermediate nodes introduce us to closer or destination nodes, and in the
process aid in exposing a public UDP port.
- The DHT mechanism is only used to find other hosts; it has no find_value
  or store commands for finding, storing, and replicating arbitrary values.

Some guidelines:

1. Peers can be sorted into five categories (a code sketch follows the
   discussion below):
   - Unconnected peers - have no line with the peer
   - Unconnected seeds - have no line with the peer, but expectation of direct
     connectability and no need for introduction.
**Owner:**

I think the distinction here is subtly different... you have a "complete" hashname, where you know the ip:port and pubkey so that you can (possibly) connect anytime (happens for seeds, but also after you get a connect but never form a line, or after a line is disconnected).

You then have an "incomplete" hashname, which is either just the hashname itself or it plus the ip:port, along with the hashname(s) that you discovered it via. So you can probably connect to it, but you need to ask someone else.

Since every hashname is a peer, you have a third kind: a hashname provided by the app that you are actively seeking, so you don't have any information yet but are in the process of trying to resolve it?

**Contributor Author:**

Today I have:

  • A hashname, which is just a string. I also sometimes have a hashvalue in the DHT, which is the value as a big integer.
  • Pointer, with hashname, ip, and port
  • Peer, which also has a public key
  • Seed, which is a peer with an expectation that you can connect directly and the ability to create from JSON

These actually don't save information about pending open packets or lines, because as much as possible I would like these to be immutable data types.

Instead, the switch has a table going from hashname to a sent open, and a table going from hashname to line.

The incomplete hashname (which for me is a pointer w/ reference to server that I seeked it from) doesn't exist as an explicit object today, because you only have a hashname in this state while trying to convert it to a full peer with a line.

**Owner:**

I don't think you can have a Pointer w/o also having the "via" (reference)? Also, you may have multiple Pointers for a hashname (different ip/ports for the same hashname)...

This feels like it's getting close, though. It'd be great to get to a common set of terminology for these states across all the implementations. I'm struggling w/ the data models in the various switches I'm working on along these lines, so I appreciate the thought here :)

- Active Line - have had an exchange resulting in me receiving an ack in
the last 15 minutes.
   - Inactive Line - no qualifying exchange in the last 15 minutes.
   - Bad Line - has had a number of sequential failures.
**Owner:**

In this state, should the behavior be to just error & delete any reference to it? Such that if a request happens again, it goes through a new seek flow?

**Contributor Author:**

Not sure.

Comparing to Kademlia, I get the feeling that bad lines would be the ones which actually will get removed when new lines are created which would overflow the k-buckets. I think both inactive and bad lines can become "active" if traffic again flows.

**Owner:**

Inactive can revert back to active, sure, but I think it's always safer to delete anything that a switch considers "bad", just to force removing any in-memory keys and restarting the whole state.
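
To make the five categories and the delete-on-bad behavior concrete, here is a minimal Python sketch; all names (`PeerState`, `switch.lines`, `switch.sent_opens`, the failure threshold) are hypothetical, not taken from any existing switch:

```python
from enum import Enum, auto

class PeerState(Enum):
    UNCONNECTED_PEER = auto()  # no line with the peer
    UNCONNECTED_SEED = auto()  # no line, but direct connectability expected
    ACTIVE_LINE = auto()       # ack received within the last 15 minutes
    INACTIVE_LINE = auto()     # no qualifying exchange in the last 15 minutes
    BAD_LINE = auto()          # a number of sequential failures

def mark_bad(switch, hashname):
    # Per the discussion above: once a line is considered bad, delete every
    # reference to it (including in-memory keys), so a later request restarts
    # the whole state through a fresh seek flow.
    switch.lines.pop(hashname, None)
    switch.sent_opens.pop(hashname, None)
```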

2. When receiving a "seek" command (similar to find_node in Kademlia), you
should strongly prefer returning active line peers.
**Owner:**

Actually, you must only return active lines, because you have to be able to send a connect to them :)

I think there's going to be a heuristic that evolves here over time too when we get into using the "family" value... if you want to return say 10 hashnames, up to half of them should be from the same family.

If the sender is one of the closest, they must always be sent back also (it's how the sender discovers the IPP it's known as to another peer).

**Contributor Author:**

Returning an inactive line might be better than returning nothing; inactive just means I haven't talked to them in a bit, or had a few errors but not enough to recognize it has gone bad.

I admit I'm still unsure what family buys you over just having one huge DHT. Family seems much more useful if you are a distributed key/value store.

Good point. Seek should return the peer itself if the peer's own id is seeked.

**Owner:**

Agree on inactive.

Family is still a pretty experimental concept, but I believe it's very important in the long run; it would take an in-person session to talk through it right now :)
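
To make the seek-response heuristics above concrete, here is a minimal Python sketch, assuming a hypothetical `lines` table mapping hex hashnames to objects with a `state` field; distance is the Kademlia XOR metric over the 256-bit hashname:

```python
def seek_response(lines, target_hex, k=10):
    # k=10 echoes the "say 10 hashnames" in the discussion; it is not fixed.
    dist = lambda hn: int(hn, 16) ^ int(target_hex, 16)
    active = sorted((hn for hn, ln in lines.items()
                     if ln.state == "active"), key=dist)
    closest = active[:k]
    if len(closest) < k:
        # Returning an inactive line beats returning nothing; inactive only
        # means no recent traffic, not that the peer has gone bad.
        inactive = sorted((hn for hn, ln in lines.items()
                           if ln.state == "inactive"), key=dist)
        closest += inactive[:k - len(closest)]
    # Unlike classic Kademlia, never filter out the requester: if it is among
    # the closest it must be returned, since that is how it discovers the
    # IPP it is known as to another peer.
    return closest
```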

3. Packet retry logic is not always required. It is best to leave retry logic
   to higher-level concepts like finding a hashname or sending streamed data.
**Owner:**

This seems out of place here... but I'm not sure how to better describe it :)

I believe switches will have built-in retry logic for "durable" channels, and the dht queries might default to that.

**Contributor Author:**

Yes, I guess the way to say it is that at the raw packet level there isn't any explicit ACK or retry logic. The code above it would cope with retrying (resend an open request, resend stream packets that haven't been acked in the window, etc.).

**Contributor Author:**

But this does mean that retry and failure heuristics need to be defined for open, connect/peer, seek, etc.

**Owner:**

I'd like to see all the retry logic be grounded in a channel, and you only ever initiate the seek/open/etc process when a channel is being opened (has a packet being sent). In that case, the open logic doesn't need any inherent retry logic, since the packet waiting in a channel will get re-tried and ultimately re-trigger another open.
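
A small sketch of that channel-grounded shape (hypothetical `switch`/`channel` API): the only retry timer lives on the channel, and re-delivering a queued packet implicitly re-triggers the open when no line exists yet:

```python
def channel_tick(switch, channel, now):
    # Re-send any unacked packets whose retry interval has elapsed. The
    # delivery path checks for a line; if none exists yet, it re-sends the
    # cached open packet, so the open itself needs no retry logic.
    for pkt in channel.unacked:
        if now - pkt.last_sent >= channel.retry_interval:
            switch.deliver(channel.remote_hashname, pkt)
            pkt.last_sent = now
```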

4. Cache sent open packets until a line is opened. Send the same open packet on
either side if you need to retry. This will eliminate potential issues with
different DH keys negotiated on each side.
**Owner:**

Excellent, yep!
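
A minimal sketch of guideline 4 (hypothetical names); the point is that the encoded open bytes, and the DH key behind them, are built once per hashname and re-sent verbatim on retry:

```python
class OpenCache:
    def __init__(self):
        self._opens = {}  # hashname -> exact encoded open packet bytes

    def open_for(self, hashname, build_open):
        # build_open negotiates the DH key and encodes the open packet; it
        # runs at most once per hashname, so every retry is byte-identical.
        if hashname not in self._opens:
            self._opens[hashname] = build_open(hashname)
        return self._opens[hashname]

    def line_established(self, hashname):
        # Once the line is up, the cached open is no longer needed.
        self._opens.pop(hashname, None)
```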

5. If you suspect you are behind a stateful firewall/NAT, you should send or
   receive one packet every minute to maintain your UDP port. It is
   recommended to ping the line that has gone the longest without
   interaction.
**Owner:**

To be clear, you need to ping every open line that has been idle for 60 seconds, so that all of the source::dest mappings stay alive.

**Contributor Author:**

I'm unclear on that - do you just need to send one packet from your source out, or one per source::dest mapping? I thought it was the former.

**Owner:**

Most of the time it's the latter, unfortunately... but one packet per minute per line isn't too bad, it's just kind of a pain to organize efficiently code-wise.
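
A sketch of the per-line keepalive this implies (hypothetical `switch`/`line` API); note that it pings every idle line, not just the single longest-idle one:

```python
import time

KEEPALIVE_SECS = 60  # keep every source::dest NAT mapping alive

def keepalive_tick(switch):
    now = time.time()
    for hashname, line in switch.lines.items():
        idle = now - max(line.last_sent, line.last_received)
        if idle >= KEEPALIVE_SECS:
            switch.send_ping(hashname)  # any packet on the line will do
```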

6. Keep track of ping failures, stream retries to an open line, or open
retries against a node. After a certain number of failures, that node can
be considered 'bad' and dropped from the DHT.
7. It is recommended to have a reasonably long initial delay before retrying,
   with an exponential falloff. Assume nodes that do not immediately respond
   are actually under some congestion or CPU load; sending more traffic
   probably won't help.
8. When seeking a particular hashname, it is recommended to put all
   established lines, and hashnames seeked via those lines, into an ordered
   list based on distance to the destination (see the sketch after this
   list).
9. Rather than treating a peer/connect as something to be waited on and
   retried, pursue several of the closest nodes to the destination at once.
**Owner:**

By several, the magic number is 3, always be querying/trying three at once (when possible).

**Contributor:**

What's the origin of that magic number 3?

**Owner:**

It's suggested as a default concurrency level in the Kademlia paper, but in my experience w/ unreliable systems it's a pretty decent starting point too :)

10. The DHT results should favor returning nodes which you have known about
    for the longest period of time. Kademlia has buckets of a maximum size
    sorted by LRU, but does not free up a bucket slot until one of the nodes
    in it is known to be bad.
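
Pulling guidelines 7 through 9 together, a seek loop might look like the following sketch; the `switch.send_seek` API and the timing constants are assumptions, with the delay values made up (as the note at the top warns for numerical heuristics generally):

```python
import heapq

CONCURRENCY = 3      # Kademlia's suggested default, per the discussion above
INITIAL_DELAY = 2.0  # seconds before the first retry; a made-up value
BACKOFF = 2.0        # exponential falloff multiplier

def start_seek(switch, target_hex):
    dist = lambda hn: int(hn, 16) ^ int(target_hex, 16)
    heap = [(dist(hn), hn) for hn in switch.lines]  # seed with known lines
    heapq.heapify(heap)
    queried, in_flight = set(), 0
    while heap and in_flight < CONCURRENCY:
        _, hn = heapq.heappop(heap)
        if hn in queried:
            continue
        queried.add(hn)
        in_flight += 1
        # Each reply pushes the returned hashnames onto the heap and frees an
        # in-flight slot. A slow node is assumed congested, so retries wait
        # INITIAL_DELAY, then INITIAL_DELAY * BACKOFF, and so on.
        switch.send_seek(hn, target_hex,
                         retry_delay=INITIAL_DELAY, backoff=BACKOFF)
```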

Two ways to start using the network:
1. Start from a seed: Randomly pick a seed and ask it for the peers closest to me (seek myself). Establish lines with them.

> Question: could I seek my complement node? (aka me xor 0xffff...)
**Owner:**

You always want to seek yourself, so that you fill up the buckets closest to you. I'm not sure what seeking other nodes would do to help anything in the bootstrapping process?

**Contributor Author:**

Not sure either. Perhaps trying to build up more kbuckets would help you answer questions better and essentially perform the first lookup for a future requested hashname ahead of time.

Seems like something a seed would want to do so that it can fill all its buckets, not just the ones closest to its hashname. Do you think a seed is something where you select a seed randomly in order to start with the system, or where you specifically request a seed from the list based on closeness of their hashname?

**Owner:**

The bucket maintenance should regularly re-seek itself to hashnames in each bucket, and those will intrinsically be returning other hashnames at roughly the same distance, so each bucket will naturally fill itself; it's a very nice self-balancing system that way.

On first boot any hashname should ping all the seeds it has to start. On later boots it should try to re-ping the seeds as well as any other cached hashnames it elected as high quality and/or that the app was using.

2. Seeds are treated as lines for the purpose of firing off seek requests.
**Owner:**

A seed has no special meaning other than it had a pre-known IPP+pubkey to help with bootstrapping... when possible, apps should occasionally store out a cache of well-known hashnames and their info and use that in conjunction w/ the seed list when bootstrapping a switch.

This cache should always include any network-local hashnames detected too, so that a mesh can form when there's no internet connection.

**Contributor Author:**

Since you have to do an open to start talking to a hashname, and you can't get the public key to send an open until you have connect/peer through another hashname, is there a way to tell if you have stateful NATs between the two of you?

This might affect keepalive packets, and would also indicate whether it is 'worth' caching their information, since you can connect to them directly without going through an intermediate peer.

**Owner:**

Detecting if you're behind a NAT is useful, since if you aren't, you don't need to send keepalives every minute; but other than that I'm not sure I know what you mean about caching, or how else it might be useful?
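
Finally, a bootstrap sketch combining the points in this section (hypothetical API): open to every seed plus any cached or network-local hashnames, then seek your own hashname so the nearest buckets fill; regular bucket maintenance keeps the rest balanced:

```python
def bootstrap(switch, seeds, cached=()):
    # Seeds ship with a pre-known ip:port and pubkey, so an open can be sent
    # directly; cached entries are high-quality hashnames the app stored
    # earlier, including any network-local ones for offline meshes.
    for peer in list(seeds) + list(cached):
        switch.send_open(peer)
    switch.seek(switch.hashname)  # always seek yourself first
```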