From 40d9328a46659885df0b6aae633458f6d30778e6 Mon Sep 17 00:00:00 2001
From: cjdelisle
Date: Mon, 28 Feb 2011 14:55:29 -0500
Subject: [PATCH] Added RFCs

---
 rfcs/DHTStore.txt | 203 ++++++++++++++++++++++++++++++++++++++++++++++
 rfcs/HonestID.txt |  99 ++++++++++++++++++++++
 2 files changed, 302 insertions(+)
 create mode 100644 rfcs/DHTStore.txt
 create mode 100644 rfcs/HonestID.txt

diff --git a/rfcs/DHTStore.txt b/rfcs/DHTStore.txt
new file mode 100644
index 000000000..7488e4741
--- /dev/null
+++ b/rfcs/DHTStore.txt
@@ -0,0 +1,203 @@
+DHT Store
+
+I would like to propose the addition to DHT of 4 primitive functions. These functions will allow for loading
+and storing of small pieces of static or mutable data.
+The functions are get, put, getm, and putm.
+
+get:
+    A very simple function which sends a hash and receives a value in return. The value must be hashed
+    and compared to the requested hash.
+
+    A node may choose to look up only some number of bytes of the beginning of the hash rather than the entire
+    hash. The first 20 bytes are mandatory and must be sent with the "target" key. Additional bytes of the
+    hash may be added with the optional "k" dictionary key.
+
+    If the responding node finds one or more entries which match the hash prefix provided, it may return
+    whichever one it likes. If the rest of the hash is incorrect then the requesting node shall ask again
+    with a larger piece of the hash.
+    Since hash functions are obsoleted and replaced over time, the get function should exist for any kind of
+    hash function. I propose that the function name has an underscore and a protocol identifier appended to
+    it, so getting an entry by its SHA-1 key would require get_sha1. It is recommended for DHT coherence that all
+    nodes support the same set of hashes. Clearly entries hashed with one function should be stored in a
+    separate "namespace" from entries hashed with a different function.
+    I propose that all nodes shall support SHA-1 and SHA-256 hash functions.
I propose two functions so that in
+    the event that SHA-1 is proven unsuitable for collision free storage, an upgrade to SHA-256 will not take
+    as long as an entire development release and adoption cycle.
+    These protocols must be identified by:
+        get_sha1   -- get request with SHA-1
+        get_sha256 -- get request with SHA-256
+        put_sha1   -- put request with SHA-1
+        put_sha256 -- put request with SHA-256
+
+    get_sha256 request:
+    {
+        "a": {
+            "id": <20 byte ID of sending node>,
+            "k":
+            "target": <160 high bits of the hash represented as a string.>
+        },
+        "q": "get_sha256",
+        "t": ,
+        "y": "q"
+    }
+
+    In response to a get request where a matching entry is found, the responding node shall return a string
+    which is no longer than 767 bytes. The node also returns its own ID, a write token, and however many IPv4
+    and/or IPv6 nodes will fit within the maximum packet size.
+    The write token does not play the role which it does with announce_peer requests where it attempts to
+    prevent denial of service by signing someone else up for a torrent. In this case it serves two purposes:
+    #1 It prevents abusive nodes from evading IP blacklists by spoofing the source address.
+    #2 It attaches identity to an announcement since one must be in possession of the IP address which they
+       announce from, thus discouraging those who would abuse an opportunity to anonymously store things on
+       other people's hard drives.
+
+    get_sha256 response:
+    {
+        "r": {
+            "v": ,
+            "id": <20 byte id of sending node>,
+            "token": ,
+            "nodes": ,
+            "nodes6":
+        },
+        "t": ,
+        "y": "r"
+    }
+
+put:
+    A put request is used to store a static piece of data in the hash table. It must be less than 768 bytes
+    in length. This limit is meant both to prevent abuse of the table, which is intended only for discovery, and to
+    limit the feasibility of storing legally problematic or morally troubling content.
+    Any attempt to store a longer string should be met with an error message.
+    The storing node generates a hash and bounces it back as confirmation that it actually stored the
+    content and received it all correctly.
+
+    put_sha256 request:
+    {
+        "a": {
+            "id": <20 byte ID of sending node>,
+            "v":
+        },
+        "q": "put_sha256",
+        "token": ,
+        "t": ,
+        "y": "q"
+    }
+
+    put_sha256 response:
+    {
+        "r": {
+            "id": <20 byte id of sending node>,
+            "k":
+        },
+        "t": ,
+        "y": "r"
+    }
+
+
+getm:
+    In order to be able to have syndication feeds, one needs to be able to have entries whose IDs are static
+    while their content is mutable. Mutable content requires a cryptographic public key signature to verify the
+    changes to the content. I propose that the getm function, like the get function, has an underscore and a
+    protocol identifier appended to it. Getting mutable data which is signed with ECDSA using curve nistp256
+    will be done with: getm_dsap256. If a node which supports ECDSA with nistp256 is issued a getm request for
+    a protocol which it does not support, it must not return an entry even if it has an entry which matches
+    the id. If a node supports multiple protocols then it must store entries for each protocol in a separate
+    namespace lest a broken protocol be used to overwrite entries for an unbroken protocol.
+
+    As with get, a requesting node may send a request with only some number of bytes of the beginning of the
+    key. If the responding node has one or more entries for that key then she shall send whichever one she
+    likes. If the requesting node cannot validate the signature on the item then he may ask again with a
+    larger segment of the key.
+
+    I propose that every node support ECDSA with NISTp256 and RSA-2048. Whether or not point compression
+    should be used is a matter for debate but the worst case is RSA-2048, which will contain 256 bytes for the
+    key and 256 for the signature in the putm message. The value itself may be as much as 767 bytes and the
+    node id will add another 20.
The packet size will still only be 1299 bytes and has 175 bytes of space for
+    miscellaneous entries and bencoding overhead before reaching the common MTU of 1500
+    (including IP and UDP headers). getm packets will necessarily be smaller than putm packets since the key
+    is omitted, and the requests which comprise the majority of the packets will be even smaller since they
+    can omit most of the key from the request. In comparison with the gigantic RSA-2048 keys, 64 byte
+    uncompressed p256 keys seem tiny.
+    These protocols must be identified by:
+        getm_dsap256 -- getm request with ECDSA-NISTp256
+        getm_rsa2048 -- getm request with RSA-2048
+        putm_dsap256 -- putm request with ECDSA-NISTp256
+        putm_rsa2048 -- putm request with RSA-2048
+
+    getm_dsap256 request:
+    {
+        "a": {
+            "id": <20 byte ID of sending node>,
+            "k":
+            "target": <160 high bits of the key represented as a string>
+        },
+        "q": "getm_dsap256",
+        "t": ,
+        "y": "q"
+    }
+
+    getm_dsap256 response:
+    {
+        "r": {
+            "v": ,
+            "sig": ,
+            "seq": ,
+            "id": <20 byte id of sending node>,
+            "token": ,
+            "nodes": ,
+            "nodes6":
+        },
+        "t": ,
+        "y": "r"
+    }
+
+
+putm:
+    A putm request is used to store a mutable piece of data in the hash table. The stored data must be less
+    than 768 bytes in length as with "put". If the storing node already has a valid entry signed with the
+    same key and the stored entry's sequence number ("seq") is greater than or equal to the sequence number
+    in the announced entry then the storing node does not store the entry but instead responds with an
+    error message. The storing node must do signature verification on the entry before storing it and if
+    verification fails then it must not store the entry and must instead return an error message.
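
The putm storage rules above, combined with the signed-buffer layout described under "Signature verification", can be sketched roughly as follows. This is only an illustration and not part of the proposal: the names MutableStore, signable and the verify callback are hypothetical, the error strings are made up, and the actual ECDSA or RSA check is left to whatever crypto library an implementation uses. One store instance is assumed per protocol so that each protocol keeps its own namespace.

```python
from typing import Callable, Dict, Optional, Tuple

MAX_VALUE_LEN = 767  # values must be less than 768 bytes


def signable(seq: int, value: bytes) -> bytes:
    """Build the signed buffer: "3:seq" + bencoded seq + "1:v" + bencoded value."""
    return b"3:seqi%de1:v%d:%s" % (seq, len(value), value)


class MutableStore:
    """Hypothetical per-protocol store applying the putm acceptance rules."""

    def __init__(self, verify: Callable[[bytes, bytes, bytes], bool]) -> None:
        # verify(public_key, signed_buffer, signature) -> bool is an assumed
        # hook; a real node would plug in its ECDSA or RSA verification here.
        self._verify = verify
        self._entries: Dict[bytes, Tuple[int, bytes, bytes]] = {}

    def putm(self, key: bytes, seq: int, value: bytes, sig: bytes) -> Optional[str]:
        """Return None when the entry is stored, otherwise an error string."""
        if len(value) > MAX_VALUE_LEN:
            return "value too long"
        if not self._verify(key, signable(seq, value), sig):
            return "bad signature"
        current = self._entries.get(key)
        # An entry whose stored seq is greater than or equal to the announced
        # one must not be overwritten.
        if current is not None and current[0] >= seq:
            return "sequence number not newer"
        self._entries[key] = (seq, value, sig)
        return None
```

With a stand-in verifier that accepts any signature equal to b"ok", storing sequence number 1 succeeds and announcing sequence number 1 again for the same key is rejected.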
+
+    putm_dsap256 request:
+    {
+        "a": {
+            "id": <20 byte ID of sending node>,
+            "seq": ,
+            "v": ,
+            "sig":
+            "k":
+        },
+        "q": "putm_dsap256",
+        "token": ,
+        "t": ,
+        "y": "q"
+    }
+
+    putm_dsap256 response:
+    {
+        "r": {
+            "id": <20 byte id of sending node>
+        },
+        "t": ,
+        "y": "r"
+    }
+
+Signature verification:
+    In order to make it maximally difficult to attack the bencoding parser, signing and verification of the
+    value and sequence number should be done as follows:
+        1. bencode the value and the sequence number separately.
+        2. concatenate "3:seq", the encoded sequence number, "1:v", and the encoded value.
+        3. sign or verify the result.
+    For example, sequence number 1 with value "Hello World!" would be converted to: 3:seqi1e1:v12:Hello World!
+    In this way it is not possible to convince a node that part of the length is actually part of the
+    sequence number even if the parser contains certain bugs. Furthermore it is not possible to have a
+    verification failure if a bencoding serializer alters the order of entries in the dictionary.
+
+Expiration:
+    Without re-announcement, these entries should expire in twice as long as normal peer announcements.
+    The logic for making them last longer is that they are more static in nature. It is up to the developer
+    who designs a protocol based on these primitives to decide whether subscribers will re-announce or whether
+    the publisher will do all announcing.
diff --git a/rfcs/HonestID.txt b/rfcs/HonestID.txt
new file mode 100644
index 000000000..9cf5bcd5b
--- /dev/null
+++ b/rfcs/HonestID.txt
@@ -0,0 +1,99 @@
+Node ID Honesty - Request for Comments
+
+Please direct comments at:
+#dns-p2p on efnet
+#cjdns on efnet
+#bittorrent on freenode
+cjd on efnet
+calebdelisle [at) lav@bit dot c0m
+
+This proposed protocol attempts to offer DHT nodes the means to prove to each other that they chose their
+IDs randomly.
Furthermore it attempts to offer mobility and resist IP address spoofing attacks by avoiding
+reliance on the integrity of the IP network for confidence. A side effect is that this protocol provides
+confidentiality and integrity of DHT communications.
+
+Creating a node ID:
+    The node will generate a public/private key pair using the curve25519 algorithm.
+    The low 160 bits of the public key will be used as the node id. The remaining 96 bits will be referred to
+    as the "key prefix". It is needed in order to authenticate messages from a node and thus must be passed in
+    find_node and get_peers responses.
+    TODO: Understand whether curve25519 public keys will be random enough to satisfy requirements for the
+    DHT protocol.
+
+When Alice asks Bob for directions to Zack, Bob predictably provides the node IDs and IP addresses of the 8
+nodes which he knows and which are closest to Zack. Bob must also provide the key prefixes of those nodes which
+support Honest ID.
+
+Bob sends Alice a typical "find_node" or "get_peers" response except an extra key is added to the table:
+{
+    "t":"aa",
+    "y":"r",
+    "r": {
+        "id":"0123456789abcdefghij",
+        "nodes": "...",
+        "hi": ""
+    }
+}
+The value of the "hi" key is a string whose first byte is a bitmask. Since no more than 8 nodes are
+ever sent, the bitmask will tell Alice which nodes the key prefixes belong to. If Bob sends 3 nodes, where
+nodes one and three support Honest ID and number two doesn't, the bitmask would read 10100000 and the
+key prefixes would be in the same order as the nodes.
+
+When Alice wants to send a message to one of the nodes, whom we will call Charlie, she derives Charlie's
+public key from Charlie's key prefix and node id as provided by Bob. She composes her message and enciphers
+it with a secret generated from her private key, Charlie's public key, and an 8 byte nonce. She then
+prepends the nonce, her public key, and finally a single null byte.
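
As a rough illustration of the "hi" string and of how Alice might rebuild Charlie's public key and frame her message, consider the sketch below. The function names decode_hi, public_key and frame are hypothetical. It assumes the bitmask is read most significant bit first (matching the 10100000 example) and that the 32 byte public key is laid out as the 12 byte key prefix followed by the 20 byte node id, which is one possible reading of "low 160 bits" and is not settled by the text. Encryption itself is out of scope here.

```python
from typing import List, Optional

PREFIX_LEN = 12   # 96 bit key prefix
NODE_ID_LEN = 20  # 160 bit node id


def decode_hi(hi: bytes, node_count: int) -> List[Optional[bytes]]:
    """Map each returned node to its key prefix, or None without Honest ID."""
    if not hi:
        raise ValueError("empty hi string")
    if not 0 <= node_count <= 8:
        raise ValueError("at most 8 nodes are ever sent")
    mask, body = hi[0], hi[1:]
    prefixes: List[Optional[bytes]] = []
    offset = 0
    for i in range(node_count):
        # Assumption: node one corresponds to the most significant bit.
        if mask & (0x80 >> i):
            prefix = body[offset:offset + PREFIX_LEN]
            if len(prefix) != PREFIX_LEN:
                raise ValueError("truncated key prefix")
            prefixes.append(prefix)
            offset += PREFIX_LEN
        else:
            prefixes.append(None)
    return prefixes


def public_key(prefix: bytes, node_id: bytes) -> bytes:
    """Rebuild a 32 byte public key; the byte order here is an assumption."""
    if len(prefix) != PREFIX_LEN or len(node_id) != NODE_ID_LEN:
        raise ValueError("unexpected prefix or node id length")
    return prefix + node_id


def frame(sender_public_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Lay out a packet as [0x00][public key][nonce][crypto-authed message]."""
    if len(sender_public_key) != 32 or len(nonce) != 8:
        raise ValueError("expected a 32 byte key and an 8 byte nonce")
    return b"\x00" + sender_public_key + nonce + ciphertext
```

For three nodes with a mask byte of 0xA0, decode_hi yields a prefix for nodes one and three and None for node two.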
+
+[0x00][Alice's Public key][nonce][Alice+Charlie crypto-authed message]
+
+When Charlie receives the message, he determines that it is encrypted by the first byte being null.
+He then reads the next 32 bytes as Alice's public key and generates a shared secret using it and his
+private key. Then he reads the next 8 bytes as a nonce and uses the secret and nonce to decipher the
+message. He tags the message with the low 160 bits of Alice's public key as her node id. It is critical
+that after the message is fed through the bencoding engine, the message is modified by inserting or
+overwriting any node id which Alice may have sent lest Alice encrypt with one ID and insert another in
+the message. Since Alice has sent Charlie a valid message, he can be confident that she is honest about
+her node id and if she is in his routing table, her entry must be modified to include her key prefix and
+the fact that she is known honest.
+
+After handling the message and crafting his reply, Charlie sends his response with the same encoding.
+
+[0x00][Charlie's Public key][nonce][Alice+Charlie crypto-authed message]
+
+After receiving the response, Alice decodes it using the same protocol and now knows that Charlie is
+honest because he was able to generate the response.
+
+Packet size overhead:
+Although this protocol modification prepends a significant amount of data to each message, packet overhead
+is minimal. There is a null byte, a 32 byte public key, and an 8 byte nonce comprising a whopping 41 bytes
+but since the receiving party must set the node id in the message to ensure integrity, the sending party
+may safely omit it. A node id is made of "2:id20:" + 20 bytes of id, a total of 27 bytes. Omitting this entry
+brings the total overhead down to a measly 14 bytes.
+By far the greatest overhead will be in "find_node" and "get_peers" responses which will incur an additional
+worst case 96 bytes for the 12 byte key prefixes of the 8 nodes sent plus one byte for the bitmask and
+7 bytes for the bencoded key and length prefix "2:hi97:". This overhead is unavoidable without adding a handshake.
+
+Processor overhead:
+Curve25519 holds speed records for Diffie-Hellman key negotiation. In addition, nodes may cache the shared
+secret for the peers they talk with most. Because the public keys are sent with every packet, nodes
+need not store anything and can handle all traffic statelessly.
+
+About nonces:
+"With an n-bit hash code, there's a roughly 39.3% chance of a collision with 2^(n/2) items"
+"With one million items and a perfect 64-bit hash function, the chance of getting a collision is 1 in
+3.7x10^7, or roughly half as likely as winning the UK National Lottery jackpot."
+See: http://www.javamex.com/tutorials/collections/strong_hash_code.shtml
+Implementations may generate nonces randomly or using a counter. Any implementation which chooses to use
+a counter must compare the public keys of the 2 nodes as big endian integers and the node possessing the
+lesser key shall count odd numbers and the node possessing the greater key shall count even numbers.
+Randomly choosing nonces is acceptable and in cases where the nodes are not in the same routing table it
+is expected. Clearly there is no additional risk from Alice using a counter and Charlie using random
+nonces.
+
+Enforcing the change:
+Rather than simply blacklisting all nodes who do not comply after some arbitrary cutoff date, implementers
+should take the more gentle approach of allowing new nodes who implement the protocol to evict old good
+nodes from buckets, thus promoting the protocol rather than forcing it. It is recommended that nodes rate
+limit the eviction of old good nodes lest they have their routing tables flushed by a denial of service
+attack while the swarm is in transition.
The rate at which old good nodes should be evicted is a topic
+in need of research.
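
One way the recommended rate limit on evictions might be implemented is a simple token bucket, sketched below. The class name EvictionLimiter and its per_hour and burst parameters are hypothetical, and no particular values are suggested since the appropriate rate is still an open question. The clock is injectable only to make the behaviour easy to test.

```python
import time
from typing import Callable


class EvictionLimiter:
    """Token bucket limiting how often an old good node may be evicted."""

    def __init__(self, per_hour: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = per_hour / 3600.0  # tokens gained per second
        self._burst = float(burst)      # maximum number of saved-up evictions
        self._tokens = float(burst)
        self._clock = clock
        self._last = clock()

    def allow(self) -> bool:
        """Return True if an eviction may happen now, consuming one token."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self._burst, self._tokens + elapsed * self._rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A node would call allow() before letting an Honest ID node displace an old good node from a full bucket and keep the old node whenever it returns False.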