Table of Contents
The transport layer provides a similar service to TCP: reliable, congestion-controlled delivery of FNP messages. However, the needs of FNP are slightly different from those of a typical TCP application in two ways: first, FNP does not require messages to be delivered in the same order in which they were sent (although we must do a thorough audit of the protocol to confirm this before deploying the new transport layer); and second, FNP connections are typically long-lived (whereas TCP connections in environments like the web are often used to transfer only a few KB of data). The first difference affects how out-of-order packets are handled by the receiver, and makes selective acknowledgements more efficient that cumulative acknowledgements; the second difference affects congestion control. Apart from these differences, the design of the new transport layer is closely based on TCP, and provides duplicate detection, retransmission, and congestion control in similar ways to TCP.
The transport layer deals with plaintext payloads, each containing one or more complete FNP messages and identified by a payload number. The transport layer does not make use of the unique sequence numbers used by the encryption and authentication layer (see NewPacketFormat). This means the transport layer may receive the same payload more than once, due to retransmissions or packet duplication by the network. To detect and discard duplicates, the receiver maintains a sliding window starting at the lowest payload number not yet received (which is zero when the connection is first established) and covering the next 2^16 payloads. A bitmap is used to record which payloads within the window have been received. (The window is much larger than those used by IPSec and Datagram TLS because of the need to handle retransmissions.)
Payloads below the window, or inside the window but marked as already received, are acknowledged and discarded. (Duplicates must be acknowledged because the original ack may have been lost.)
Payloads inside the window but not marked as already received are acknowledged, marked as received (which may cause the window to slide forward), and passed to the FNP layer for processing.
Payloads above the window should never be received, because the sender must not send payload n+2^16 until payload n has been acknowledged. Receipt of such a payload indicates a faulty sender, and it may be appropriate to drop the connection and inform the user.
Packets may be lost by the network as a result of congestion or other problems; the transport layer uses acknowledgements and timeouts to work out which packets have been lost. TCP uses cumulative acknowledgements, but to support out-of-order delivery Freenet uses selective acknowledgements: every payload is explicitly acknowledged, even if it arrives out of order. This makes it easy to maintain an estimate of the round-trip time: the sender uses an exponential moving average of the time between a packet being sent and the acknowledgement being received.
Payloads that have been sent but not acknowledged must be stored by the sender in case they need to be retransmitted. To improve efficiency, the sender may store the entire packet, but packets that contain an ack with no payload must not be stored, since they will not be acknowledged.
The sender must not transmit payload n+2^16 until payload n has been acknowledged. This limits the size of the sender's retransmission buffer and the size of the receiver's duplicate detection window, preventing denial-of-service attacks.
When the sender receives an ack, it checks its retransmission buffer for the acknowledged payload. If the payload is found, it is removed (this may allow new payloads to be sent if the sender was waiting for payload n to be acknowledged before sending payload n+2^16). While checking the retransmission buffer, the sender also performs fast retransmission: if a payload number greater than n is acknowledged and payload n was last transmitted more than FRTO round-trip times ago, payload n is retransmitted (and the congestion window is modified - see below). Payload n remains in the sender's retransmission buffer, but its timestamp is updated to prevent it from being retransmitted again until another FRTO round-trip times have elapsed. Like the similar mechanism in TCP, this mechanism is called fast retransmission because it infers the loss of a packet before the packet times out, by observing the delivery of subsequent packets. The suggested value for FRTO is 1.5 - values closer to 1 will detect losses more quickly but may trigger unnecessary retransmissions if packets are reordered by the network. TCP uses a slightly different method that we should probably copy: if a payload number greater than n+3 is acknowledged and payload n has never been retransmitted, payload n is retransmitted.
Fast retransmission should detect and repair most packet losses, but in cases of severe congestion or long periods without sending new data, a fallback mechanism is needed. This mechanism is the retransmission timer: every 100ms the node checks all its connections for payloads that were last sent at least RTO round-trip times ago and have not yet been acknowledged. Any payloads that meet these conditions are retransmitted, their timestamps are updated, and the connection's congestion window is modified (see below). The node's connections should be checked in a random order each time, to avoid giving some connections preferential access to the available bandwidth. A suggested value for RTO is 4. TCP uses a more sophisticated scheme that tracks the variance as well as the average of the round-trip time. RFC 2988 give the best description of the algorithm, but here's a summary:
Initially the retransmission timeout (rto) is set to 3 seconds. We keep a smoothed estimate of the round-trip time (rtt) and a smoothed estimate of the variance (rttVar). G is the granularity of the retransmission timer, in our case 0.1 seconds. ALPHA is 1/8 and BETA is 1/4, so the estimate of the variance is more sensitive to changes than the estimate of the round-trip time. (Note that these are separate from the similarly named parameters ALPHA and BETA for the congestion window.)
When we get the first sample of the round-trip time (r): rtt = r rttVar = r/2 rto = rtt + max (G, 4 * rttVar)
For each subsequent sample: rttVar = (1.0 - BETA) * rttVar + BETA * abs (rtt - r) rtt = (1.0 - ALPHA) * rtt + ALPHA * r rto = rtt + max (G, 4 * rttVar)
Round-trip time samples must not be taken from packets that were retransmitted, because the ack might be for the original packet or the retransmission.
Congestion control The aim of congestion control is to make efficient use of the available bandwidth between the sender and the receiver, without exceeding the available bandwidth and causing packet loss. The requires constant adaptation, as the available bandwidth may be constantly changing.
Like TCP, the new congestion control algorithm has two states: slow start and congestion avoidance. Contrary to its name, slow start is used to get a rough estimate of the available bandwidth as quickly as possible; congestion avoidance is used to fine-tune the estimate provided by slow start.
The sender uses an adaptive congestion window to determine the maximum amount of data that can be in flight (sent but not yet acknowledged) at any time. The number of bytes in flight and the congestion window are modified when packets are transmitted, retransmitted or acknowledged. Specifically:
- The congestion window is initially set to the maximum packet size, and the connection is initially in slow start.
- When a packet is being constructed, FNP messages can only be added to the packet as long as its total size (including acks and headers) plus the number of bytes in flight would not exceed the congestion window.
- When a packet is first transmitted, its total size is added to the number of bytes in flight.
- When a packet is first acknowledged, its total size is subtracted from the number of bytes in flight. If the connection is in slow start, ALPHA * size / MAX_PACKET_SIZE bytes are added to the congestion window, where 'size' is the total size of the acknowledged packet. If the connection is in congestion avoidance, ALPHA * size / MAX_PACKET_SIZE / cwnd bytes are added to the congestion window, where 'cwnd' is the current size of the congestion window in bytes. Thus the congestion window grows exponentially during slow start, and linearly during congestion avoidance, until a loss occurs. The maximum size of the congestion window is 1,000,000 bytes.
- When a packet is fast retransmitted, the number of bytes in flight is not modified (the lost packet is assumed to have left the network, but the retransmitted packet will take its place). The congestion window is multiplied by BETA, and if the connection is in slow start it switches to congestion avoidance. (The minimum size of the congestion window is the maximum packet size.)
- When a packet is retransmitted due to the retransmission timer, the congestion window is reset to the maximum packet size and the connection enters slow start. The number of bytes in flight is not modified. (A retransmission timeout is taken as an indication of severe congestion; returning to slow start allows the available bandwidth to be re-estimated.)
- When no new packets have been sent for IDLE round-trip times, the congestion window is reset to the maximum packet size and the connection enters slow start. (After a long time idle, the old estimate of the available bandwidth may no longer be accurate.)
The user may want to limit the node's outgoing bandwidth. This can be implemented using a token bucket, ie a counter that is incremented at a fixed rate and decremented whenever bandwidth is used. When a packet is being constructed, FNP messages can only be added to the packet as long as its total size (including acks and headers) would not exceed the bandwidth counter. When a packet is transmitted or retransmitted, its total size is subtracted from the bandwidth counter.
FIXME: do we want to shrink packets and thereby make sends less efficient in response to bandwidth pressure? Why not wait until we can send a larger packet?
FIXME: searches now take priority over transfers.