Added RFCs

psych0tik · Feb 28, 2011 · 40d9328 · 40d9328
1 parent c52fd5f
commit 40d9328
Showing 2 changed files with 302 additions and 0 deletions.
diff --git a/rfcs/DHTStore.txt b/rfcs/DHTStore.txt
@@ -0,0 +1,203 @@
+DHT Store
+
+I would like to propose adding 4 primitive functions to the DHT: get, put, getm, and putm. These functions
+allow loading and storing small pieces of data. get and put handle static data; getm and putm handle
+mutable data.
+
+get:
+  A very simple function which sends a hash and receives a value in return. The value must be hashed
+  and compared to the requested hash.
+
+  A node may choose to lookup only some number of bytes of the beginning of the hash rather than the entire
+  hash. The first 20 bytes are mandatory and must be sent with the "target" key. Additional bytes of the
+  hash may be added with the optional "k" dictionary key.
+
+  If the responding node finds one or more entries which match the hash prefix provided, it may return
+  whichever one it likes and if the rest of the hash is incorrect then the requesting node shall ask again
+  with a larger piece of the hash.
+  Since hash functions are obsoleted and replaced over time, the get function should exist for any kind of
+  hash function. I propose that the function name has an underscore and a protocol identifier appended to
+  it, so getting an entry by its SHA-1 key would require get_sha1. It is recommended for DHT coherence that
+  all nodes support the same set of hashes. Clearly entries hashed with one function should be stored in a
+  separate "namespace" from entries hashed with a different function.
+  I propose that all nodes shall support SHA-1 and SHA-256 hash functions. I propose 2 functions so that in
+  the event that SHA-1 is proven unsuitable for collision free storage, an upgrade to SHA-256 will not take
+  as long as an entire development release and adoption cycle.
+  These protocols must be identified by:
+  get_sha1 -- get request with SHA-1
+  get_sha256 -- get request with SHA-256
+  put_sha1 -- put request with SHA-1
+  put_sha256 -- put request with SHA-256
+
+  get_sha256 request:
+  {
+     "a": {
+        "id": <20 byte ID of sending node>,
+        "k": <Optional additional bytes of the hash which follow the first 20, used as a discriminator on lookup>,
+        "target": <160 high bits of the hash represented as a string.>
+     },
+     "q": "get_sha256",
+     "t": <transaction-id>,
+     "y": "q"
+  }
+
+  In response to a get request where a matching entry is found, the responding node shall return a string
+  which is no longer than 767 bytes. The node also returns its own ID, a write token and however many IPv4
+  and/or IPv6 nodes will fit within the maximum packet size.
+  The write token does not play the role which it does with announce_peer requests where it attempts to
+  prevent denial of service by signing someone else up for a torrent. In this case it serves 2 purposes:
+  #1 It prevents abusive nodes from evading IP blacklists by spoofing source address.
+  #2 It attaches identity to an announcement since one must be in possession of the IP address which they
+     announce from, thus discouraging those who would abuse an opportunity to anonymously store things on
+     other people's hard drives.
+
+  get_sha256 response:
+  {
+     "r": {
+        "v": <A string of length less than 768 whose SHA-256 is equal to the requested "target" + "k">,
+        "id": <20 byte id of sending node>,
+        "token": <write-token>,
+        "nodes": <n * compact IPv4-port pair>,
+        "nodes6": <n * compact IPv6-port pair>
+     },
+     "t": <transaction-id>,
+     "y": "r"
+  }
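Review note: the requester side of get can be sketched as below. This is a minimal sketch, not part of the proposal. The helper names `get_args` and `value_is_valid` are hypothetical, and it assumes the requester already knows the full SHA-256 hash it is looking for.

```python
import hashlib

MAX_VALUE_LEN = 767  # the RFC requires "v" to be shorter than 768 bytes


def get_args(full_hash: bytes, extra: int = 0) -> dict:
    """Split a full hash into the mandatory 20 byte "target" and the optional "k".

    `extra` is how many bytes beyond the first 20 to send as a discriminator.
    """
    args = {"target": full_hash[:20]}
    if extra:
        args["k"] = full_hash[20:20 + extra]
    return args


def value_is_valid(value: bytes, full_hash: bytes) -> bool:
    """Hash the returned value and compare it to the hash that was wanted."""
    if len(value) > MAX_VALUE_LEN:
        return False
    return hashlib.sha256(value).digest() == full_hash
```

If `value_is_valid` fails because the responder picked a different entry with the same prefix, the requester would call `get_args` again with a larger `extra` and retry.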
+
+put:
+  A put request is used to store a static piece of data in the hash table. It must be less than 768 bytes
+  in length. This limit is to prevent abuse of the table which is meant only for discovery but also to
+  limit the feasibility of storing legally problematic or morally troubling content.
+  Any attempt to store a longer string should be met with an error message.
+  The storing node hashes the value it received and returns the hash as confirmation that it stored the
+  content and received all of it correctly.
+
+  put_sha256 request:
+  {
+     "a": {
+        "id": <20 byte ID of sending node>,
+        "v": <A string of length less than 768>
+     },
+     "q": "put_sha256",
+     "token": <write-token as obtained by previous request>,
+     "t": <transaction-id>,
+     "y": "q"
+  }
+
+  put_sha256 response:
+  {
+     "r": {
+        "id": <20 byte id of sending node>,
+        "k": <The entire hash of "v" as computed by the storing node.>
+     },
+     "t": <transaction-id>,
+     "y": "r"
+  }
+
+
+getm:
+  In order to be able to have syndication feeds, one needs to be able to have entries whose IDs are static
+  while their content is mutable. Mutable content requires cryptographic public key signature to verify the
+  changes to the content. I propose that the getm function, like the get function, has an underscore and a
+  protocol identifier appended to it. Getting mutable data which is signed with ECDSA using curve nistp256
+  will be done with: getm_dsap256. If a node who supports ECDSA with nistp256 is issued a getm request for
+  a protocol which it does not support, it must not return an entry even if it has an entry which matches
+  the id. If a node supports multiple protocols then it must store entries for each protocol in a separate
+  namespace lest a broken protocol be used to overwrite entries for an unbroken protocol.
+
+  As with get, a requesting node may send a request with only some number of bytes of the beginning of the
+  key. If the responding node has one or more entries for that key then she shall send whichever one she
+  likes. If the requesting node cannot validate the signature on the item then he may ask again with a
+  larger segment of the key.
+
+  I propose that every node support ECDSA with NISTp256 and RSA-2048. Whether or not point compression
+  should be used is a matter for debate, but the worst case is RSA-2048, which will contain 256 bytes for
+  the key and 256 for the signature in the putm message. The value itself may be as much as 767 bytes and
+  the node id will add another 20. The payload will still only be 1299 bytes, which leaves 173 bytes for
+  miscellaneous entries and bencoding overhead before reaching the common MTU of 1500
+  (after the 20 byte IPv4 header and 8 byte UDP header). getm packets will necessarily be smaller than putm
+  packets since the key is omitted, and the requests which comprise the majority of the packets will be
+  even smaller since they can omit most of the key from the request. In comparison with the gigantic
+  RSA-2048 keys, 64 byte uncompressed p256 keys seem tiny.
+  These protocols must be identified by:
+  getm_dsap256 -- getm request with ECDSA-NISTp256
+  getm_rsa2048 -- getm request with RSA-2048
+  putm_dsap256 -- putm request with ECDSA-NISTp256
+  putm_rsa2048 -- putm request with RSA-2048
+
+  getm_dsap256 request:
+  {
+     "a": {
+        "id": <20 byte ID of sending node>,
+        "k": <Optional additional bytes of the key to discriminate which entry to get.>,
+        "target": <160 high bits of the key represented as a string>
+     },
+     "q": "getm_dsap256",
+     "t": <transaction-id>,
+     "y": "q"
+  }
+
+  getm_dsap256 response:
+  {
+     "r": {
+        "v": <A string of length less than 768 whose signature matches the requested key>,
+        "sig": <A string representation of a signature on the content of the string "v">,
+        "seq": <An integer which represents the version number of the content>,
+        "id": <20 byte id of sending node>,
+        "token": <write-token>,
+        "nodes": <n * compact IPv4-port pair>,
+        "nodes6": <n * compact IPv6-port pair>
+     },
+     "t": <transaction-id>,
+     "y": "r"
+  }
+
+
+putm:
+  A putm request is used to store a mutable piece of data in the hash table. The stored data must be less
+  than 768 bytes in length as with "put". If the storing node already has a valid entry signed with the
+  same key and the stored entry's sequence number ("seq") is greater than or equal to the sequence number
+  in the announced entry then the storing node does not store the entry but instead responds with an
+  error message. The storing node must do signature verification on the entry before storing it and if
+  verification fails then it must not store the entry and instead return an error message.
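Review note: the storing node's decision for putm could look roughly like this. It is a sketch under assumptions: `accept_putm`, the in-memory `store` dictionary and the error strings are hypothetical, and signature checking is delegated to a caller-supplied `verify` callable because the RFC leaves the choice of crypto library open.

```python
from typing import Callable, Dict, Tuple

Entry = Tuple[int, bytes, bytes]  # (seq, value, signature)


def accept_putm(
    store: Dict[bytes, Entry],
    key: bytes,
    seq: int,
    value: bytes,
    sig: bytes,
    verify: Callable[[bytes, int, bytes, bytes], bool],
) -> str:
    """Apply the putm rules and return "ok" or an error string."""
    if len(value) >= 768:
        return "error: value too long"
    # Verification must pass before anything is stored.
    if not verify(key, seq, value, sig):
        return "error: bad signature"
    current = store.get(key)
    # An existing entry with an equal or greater seq wins.
    if current is not None and current[0] >= seq:
        return "error: stale sequence number"
    store[key] = (seq, value, sig)
    return "ok"
```

A node supporting several protocols would keep one such `store` per protocol so that the namespaces stay separate.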
+
+  putm_dsap256 request:
+  {
+     "a": {
+        "id": <20 byte ID of sending node>,
+        "seq": <An integer which represents the version number of the content>,
+        "v": <A string of length less than 768>,
+        "sig": <A signature on the sequence number and value>,
+        "k": <The entire key which is used to sign the value "v" and the sequence number "seq">
+     },
+     "q": "putm_dsap256",
+     "token": <write-token as obtained by previous request>,
+     "t": <transaction-id>,
+     "y": "q"
+  }
+
+  putm_dsap256 response:
+  {
+     "r": {
+        "id": <20 byte id of sending node>
+     },
+     "t": <transaction-id>,
+     "y": "r"
+  }
+
+Signature verification:
+  In order to make it maximally difficult to attack the bencoding parser, signing and verification of the
+  value and sequence number should be done as follows:
+  1. encode value and sequence number separately
+  2. concatenate "3:seq" and the encoded sequence number and "1:v" and the encoded value.
+  3. sign or verify the result.
+  sequence number 1 of value "Hello World!" would be converted to: 3:seqi1e1:v12:Hello World!
+  In this way it is not possible to convince a node that part of the length is actually part of the
+  sequence number even if the parser contains certain bugs. Furthermore it is not possible to have a
+  verification failure if a bencoding serializer alters the order of entries in the dictionary.
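Review note: the three steps above can be sketched as follows. The function names are hypothetical; the output format comes directly from the example in the text.

```python
def bencode_int(n: int) -> bytes:
    """Bencode an integer, e.g. 1 -> b"i1e"."""
    return b"i%de" % n


def bencode_bytes(s: bytes) -> bytes:
    """Bencode a byte string, e.g. b"abc" -> b"3:abc"."""
    return b"%d:%s" % (len(s), s)


def signing_buffer(seq: int, value: bytes) -> bytes:
    """Build the exact byte string that is signed or verified."""
    return b"3:seq" + bencode_int(seq) + b"1:v" + bencode_bytes(value)
```

Both signer and verifier build this buffer themselves from the decoded `seq` and `v`, so the order in which a serializer emits dictionary keys never affects the signature.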
+
+Expiration:
+  Without re-announcement, these entries should expire in twice as long as normal peer announcements.
+  The logic for making them last longer is that they are more static in nature. It is up to the developer
+  who designs a protocol based on these primitives to decide whether subscribers will re-announce or whether
+  the publisher will do all announcing.
diff --git a/rfcs/HonestID.txt b/rfcs/HonestID.txt
@@ -0,0 +1,99 @@
+Node ID Honesty - Request for Comments
+
+Please direct comments at:
+#dns-p2p on efnet
+#cjdns on efnet
+#bittorrent on freenode
+cjd on efnet
+calebdelisle [at) lav@bit dot c0m
+
+This proposed protocol attempts to offer DHT nodes the means to prove to each other that they chose their
+IDs randomly. Furthermore it attempts to offer mobility and resist IP address spoofing attacks by avoiding
+reliance on the integrity of the IP network for confidence. A side effect is that this protocol provides
+confidentiality and integrity of DHT communications.
+
+Creating a node ID:
+  The node will generate a public/private key pair using the curve25519 algorithm.
+  The low 160 bits of the public key will be used as the node id. The remaining 96 bits will be referred to
+  as the "key prefix". It is needed in order to authenticate messages from a node and thus must be passed in
+  find_node and get_peers responses.
+  TODO: Understand whether curve25519 public keys will be random enough to satisfy requirements for the 
+        DHT protocol.
+
+When Alice asks Bob for directions to Zack, Bob predictably provides the node IDs and IP addresses of the 8
+nodes he knows which are closest to Zack. Bob must also provide the key prefixes of those nodes which
+support Honest ID.
+
+Bob sends Alice a typical "find_node" or "get_peers" response except an extra key is added to the table:
+{
+  "t":"aa",
+  "y":"r",
+  "r": {
+    "id":"0123456789abcdefghij",
+    "nodes": "<Charlie'sID+IPAddr:port><Dave'sId+IpAddr:port><Elinor'sId+IpAddr:port>...",
+    "hi": "<bitmask><Charlie'sKeyPrefix><Elinor'sKeyPrefix>"
+  }
+}
+The "hi" key maps to a string whose first byte is a bitmask. Since no more than 8 nodes are
+ever sent, the bitmask tells Alice which nodes the key prefixes belong to. If Bob sends 3 nodes, where
+nodes one and three support Honest ID and number two doesn't, the bitmask would read 10100000 and the
+key prefixes would be in the same order as the nodes.
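Review note: decoding the "hi" string might look like the sketch below. `parse_hi` is a hypothetical name. The sketch assumes the first node maps to the most significant bit, which is how the 10100000 example reads.

```python
from typing import List, Optional

KEY_PREFIX_LEN = 12  # 96 bits


def parse_hi(hi: bytes, node_count: int) -> List[Optional[bytes]]:
    """Return one item per node: its key prefix, or None without Honest ID."""
    if not hi or node_count > 8:
        raise ValueError("malformed hi entry")
    bitmask, body = hi[0], hi[1:]
    prefixes: List[Optional[bytes]] = []
    offset = 0
    for i in range(node_count):
        if bitmask & (0x80 >> i):  # bit i, counted from the high end
            chunk = body[offset:offset + KEY_PREFIX_LEN]
            if len(chunk) != KEY_PREFIX_LEN:
                raise ValueError("truncated key prefix")
            prefixes.append(chunk)
            offset += KEY_PREFIX_LEN
        else:
            prefixes.append(None)
    return prefixes
```

For the three-node example, the result would hold Charlie's prefix, `None` for Dave, and Elinor's prefix, in that order.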
+
+When Alice wants to send a message to one of the nodes, who we will call Charlie, she first derives
+Charlie's public key from his key prefix and node id as provided by Bob. She composes her message and
+enciphers it with a secret generated from her private key, Charlie's public key, and an 8 byte nonce.
+She then prepends the nonce, her public key, and finally a single null byte.
+
+[0x00][Alice's Public key][nonce][Alice+Charlie crypto-authed message]
+
+When Charlie receives the message, he determines that it is encrypted by the first byte being null.
+He then reads the next 32 bytes as Alice's public key and generates a shared secret using it and his
+private key. Then he reads the next 8 bytes as a nonce and uses the secret and nonce to decipher the
+message. He tags the message with the low 160 bits of Alice's public key as her node id. It is critical
+that after the message is fed through the bencoding engine, the message is modified by inserting or
+overwriting any node id which Alice may have sent lest Alice encrypt with one ID and insert another in
+the message. Since Alice has sent Charlie a valid message, he can be confident that she is honest about
+her node id and if she is in his routing table, her entry must be modified to include her key prefix and
+the fact that she is known honest.
+
+After handling the message and crafting his reply, Charlie sends his response with the same encoding.
+
+[0x00][Charlie's Public key][nonce][Alice+Charlie crypto-authed message]
+
+After receiving the response, Alice decodes it using the same protocol and now knows that Charlie is
+honest because he was able to generate the response.
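Review note: the framing itself, leaving out the cryptography, can be sketched as below. `frame` and `unframe` are hypothetical names. The sketch covers only the byte layout; the curve25519 key agreement and the cipher would come from whatever crypto library the implementation uses.

```python
from typing import Optional, Tuple

PUBLIC_KEY_LEN = 32
NONCE_LEN = 8
HEADER_LEN = 1 + PUBLIC_KEY_LEN + NONCE_LEN  # 41 bytes


def frame(public_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Build [0x00][public key][nonce][ciphertext]."""
    if len(public_key) != PUBLIC_KEY_LEN or len(nonce) != NONCE_LEN:
        raise ValueError("bad public key or nonce length")
    return b"\x00" + public_key + nonce + ciphertext


def unframe(packet: bytes) -> Optional[Tuple[bytes, bytes, bytes]]:
    """Split a packet into (public key, nonce, ciphertext).

    Returns None when the leading byte is not null, meaning the packet
    is not in the encrypted format and should be parsed as plain bencoding.
    """
    if len(packet) < HEADER_LEN or packet[0] != 0:
        return None
    key_end = 1 + PUBLIC_KEY_LEN
    return packet[1:key_end], packet[key_end:HEADER_LEN], packet[HEADER_LEN:]
```

The receiver would take the low 160 bits of the returned public key as the sender's node id, overriding any id inside the decrypted message.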
+
+Packet size overhead:
+Although this protocol modification prepends a significant amount of data to each message, packet overhead
+is minimal. There is a null byte, a 32 byte public key, and an 8 byte nonce comprising a whopping 41 bytes
+but since the receiving party must set the node id in the message to ensure integrity, the sending party
+may safely omit it. A node id entry is made of "2:id20:" + 20 bytes of id, a total of 27 bytes. Omitting this
+entry brings the total overhead down to a measly 14 bytes.
+By far the greatest overhead will be in "find_node" and "get_peers" responses which will incur an additional
+worst case 96 bytes for the 12 byte key prefixes of the 8 nodes sent plus one byte for the bitmask and
+7 bytes for the string key and length prefix "2:hi97:". This overhead is unavoidable without adding a handshake.
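Review note: the overhead figures can be checked with a few lines of arithmetic. The variable names are illustrative only, and the bencoded forms assume standard bencoding of a dictionary key followed by a length-prefixed string.

```python
# Per-message framing: null pad + public key + nonce.
framing = 1 + 32 + 8

# The bencoded node id entry the sender may omit: key, length prefix, 20 id bytes.
id_entry = len(b"2:id20:") + 20

# Net cost per message once the id entry is dropped.
net_overhead = framing - id_entry

# Worst case "hi" entry: key and length prefix, bitmask byte, 8 prefixes of 12 bytes.
hi_value_len = 1 + 8 * 12
hi_entry = len(b"2:hi97:") + hi_value_len

print(framing, id_entry, net_overhead, hi_value_len, hi_entry)
```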
+
+Processor overhead:
+Curve25519 holds speed records for Diffie-Hellman key negotiation and in addition, nodes may cache the shared
+secret for the peers they talk with most. Because the public keys are sent with every packet, nodes
+need not store anything and can handle all traffic statelessly.
+
+About nonces:
+"With an n-bit hash code, there's a roughly 39.3% chance of a collision with 2^(n/2) items"
+"With one million items and a perfect 64-bit hash function, the chance of getting a collision is 1 in
+3.7x10^7, or roughly half as likely as winning the UK National Lottery jackpot."
+See: http://www.javamex.com/tutorials/collections/strong_hash_code.shtml
+Implementations may generate nonces randomly or using a counter. Any implementation which chooses to use
+a counter must compare the public keys of the 2 nodes as big endian integers and the node possessing the
+lesser key shall count odd numbers and the node possessing the greater key shall count even numbers.
+Randomly choosing nonces is acceptable and in cases where the nodes are not in the same routing table it
+is expected. Clearly there is no additional risk from Alice using a counter and Charlie using random
+nonces.
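Review note: the counter rule can be sketched as follows. The function names are hypothetical. Starting the even counter at 0 and encoding the counter as 8 big endian bytes are both assumptions, since the text does not specify either.

```python
NONCE_BYTES = 8


def first_counter(my_key: bytes, peer_key: bytes) -> int:
    """Pick the starting counter: the lesser key counts odd, the greater even."""
    mine = int.from_bytes(my_key, "big")
    theirs = int.from_bytes(peer_key, "big")
    if mine == theirs:
        raise ValueError("both sides hold the same public key")
    return 1 if mine < theirs else 0


def next_counter(counter: int) -> int:
    """Step by 2 so parity is kept, wrapping at 64 bits."""
    return (counter + 2) % (1 << (8 * NONCE_BYTES))


def encode_nonce(counter: int) -> bytes:
    """Serialize the counter for the packet (byte order is an assumption)."""
    return counter.to_bytes(NONCE_BYTES, "big")
```

Because the two sides count on disjoint parities, their counter nonces never collide with each other under the same shared secret.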
+
+Enforcing the change:
+Rather than simply blacklisting all nodes who do not comply after some arbitrary cutoff date, implementers
+should take the gentler approach of allowing new nodes who implement the protocol to evict old good
+nodes from buckets thus promoting the protocol rather than forcing it. It is recommended that nodes rate
+limit the eviction of old good nodes lest they have their routing tables flushed by a denial of service
+attack while the swarm is in transition. The rate at which old good nodes should be evicted is a topic
+in need of research.