An RFC-compliant12 implementation of IP/TCP over a virtual UDP link layer, built with Rust.3
To build the nodes, simply run make in the base directory:
$ makeTo run the nodes, first convert a .net configuration file into individual node .lnx
configuration files using tools/net2lnx <.net file>; this creates a virtual network connected by a
UDP link layer abstraction (see the Appendix for more information). Given a .lnx file, run the
node with
$ ./node <.lnx file>To benchmark the node's sending capabilities on a lossy network (in this example, we use ABC.net), follow the following configuration (make sure to have in/ and out/ directories; additionally, tmux might help):
- In one terminal, go to the
test/directory and run the lossy reference IP node with 2% packet loss:$ ./ip_node_lossy ./B.lnx > lossy 0.02 - In another terminal, run the node and start the
recvfilebenchmark:$ ./node test/C.lnx > rf_benchmark out/<file> <port> <num_times>
- In one last terminal, run the node and start the
sendfilebenchmark (here,192.168.0.4is the virtual IP address of thetest/C.lnxnode; make sure to use the same<port>and<num_times>values):$ ./node test/A.lnx > sf_benchmark in/<file> 192.168.0.4 <port> <num_times>
Afterwards, results should appear in the benchmarks/ directory; files appear as send_<secs_since_UNIX_EPOCH>.txt. Happy testing!
To generate your own test files, see the appendix.
sf_benchmark/rf_benchmarkdo not handle failure well; they will survive a failed three-way handshake (althoughrf_benchmarkwill never finish, since it will wait for too many files), but if asendfilecall hits the data orFINre-transmission limit (currently20),sf_benchmarkwill forever halt. This isn't a large issue, but it does complicate benchmarking with more iterations/a lossier network.
Note: Previous benchmarks used an RTO lower bound of 10ms, instead of 1ms as recommended on local networks. We've moved the old benchmarks into the
benchmarks/old/directory, and updated the existing results.
Performance tests were conducted on a 2018 13" MacBook Pro with a 2.3 GHz Quad-Core Intel Core i5 and 16GB RAM in a Ubuntu 20.04 VM with 2 cores and 2GB RAM.
We observed the following performance for the reference node after 20 sends of 1MB files:
- Over a non-lossy network: 300ms
- Over a 2% lossy network: 10s
We ran four performance benchmarks for our node, two for the non-lossy network and two for the 2% lossy network; in each, we sent a 1MB file 50 times.
| File name | Loss % | Reference Node | Our Node | Factor Speedup |
|---|---|---|---|---|
| send_nonlossy_1651652179_home.txt | 0% | 300ms | 98.9592ms | 3.0316x |
| send_nonlossy_1651659042_home.txt | 0% | 300ms | 85.8941ms | 3.4927x |
| send_1651695516_home.txt | 2% | 10s | 782.7130ms | 12.7761x |
| send_1651695733_home.txt | 2% | 10s | 779.0113ms | 12.8368x |
The results are rather promising; although our node varies significantly in time spent, it is able to consistently match and outperform the reference node.
Similar performance tests were conducted on the CS department machines (specifically, mlab1e-l) with even more impressive results.
We observed the following performance for the reference node after 20 sends of 1MB files:
- Over a non-lossy network: 80ms
- Over a 2% lossy network: 12s
We again ran four performance benchmarks for our node, two for the non-lossy network and two for the 2% lossy network, with the same configuration.
| File name | Loss % | Reference Node | Our Node | Factor Speedup |
|---|---|---|---|---|
| send_nonlossy_1651656764_dept.txt | 0% | 80ms | 27.2158ms | 2.9395x |
| send_nonlossy_1651657713_dept.txt | 0% | 80ms | 27.0319ms | 2.9595x |
| send_1651693707_dept.txt | 2% | 12s | 311.0645ms | 38.5772x |
| send_1651638729_home.txt | 2% | 12s | 305.6566ms | 39.2597x |
The slowdown on the department machines for the reference node on a 2% lossy network is peculiar, but surprisingly consistent. Our node outperforms the reference node on our local machine, but really shines on the department machines, with sends consistently 1-2s to complete, compared to the reference node's 12s. We hypothesize this may be because the reference node does not make full use of concurrency while our node does, allowing great performance increases on a machine with more cores.
A central Node structure is the centerpiece of our TCP/IP implementation, providing a command-line interface for all network activities. A node parses a .lnx file, which configures its virtual network interfaces (links). Each node provides the following functionality:
==================== Socket creation/shutdown ====================
help (h) : Print this list of commands.
interfaces (li) : Print information about each interface, one per line.
routes (lr) : Print information about the route to each known destination, one per line.
quit (q) : Quit this node, closing all open sockets.
==================== Socket creation/shutdown ====================
sockets (ls) : Print information about each socket (ID, IP, Port, State).
a [port] : Spawn a socket, bind it to the given port, and start accepting connections on that port.
c [ip] [port] : Attempt to connect to the given ip address, in dot notation, on the given port.
sd [id] [read|write|both]: Shutdown a socket. "read"/"r" closes only the reading side; "write"/"w"
closes only the writing side; "both" closes both. Default is "write".
cl [id]: v_close on the given socket.
==================== Data transmission commands ====================
s [sid] [data] : Send a string to a socket. Blocks until v_write() returns.
r [sid] [n_bytes] [y|n]: Try to read data from a socket. If last argument "y", blocks until n_bytes are
read or connection closes. If "n" (default), returns whenever v_read() returns.
sf [filename] [ip] [port]: Connect to the given IP and port, send the entirety of the specified file, and
close the connection.
rf [filename] [port]: Listen for a connection on the given port. Once established, write everything you can
read from the socket to the given file. Once the other side closes the connection, close
the connection as well.
==================== IP/network commands ====================
send [ip] [protocol] [payload] : sends payload with the specified protocol to the virtual IP address.
up [integer] : Bring an interface "up" (it must be an existing interface, probably one you brought down).
down [integer] : Bring an interface "down".
==================== Logging commands ====================
log [id]: print SendControlBuffer and RecvControlBuffer information about socket with the specified id.
sf_benchmark [filename] [ip] [port] [n]: Send a file to the destination socket n times, then display benchmarks.
rf_benchmark [filename] [port] [n]: Receive a file on the provided port n times, then display benchmarks.
Most of the TCP RFC is supported, including:
- IPv4 header checksum generation/validation
- Configurable MSS/window sizes
- Three-way handshake and graceful shutdown, including simultaneous
SYNs/FINs SYN/FINre-transmission and timeouts- Adherence to the TCP state diagram
- Zero-window probing with exponential backoff
- Data re-transmission with exponential backoff
- Send/receive sequence space maintenance (including circular sequence numbers)
- Out-of-order segments [TODO]
Currently, urgent pointers, RST packets, TCP options, congestion control, and security/precedence fields are not supported. Nagle's algorithm, delayed ACKs, SYN/FIN segments with data, clock-generated ISNs, SWS avoidance, selective ACKs, and Jacobson's + Korn's SRTT algorithm are in progress, but not completely supported at the moment. See the TODOs for more.
The TCPModule struct is the primary TCP interface responsible for all socket operations:
pub struct TCPModule {
ip_module: InternetModule, // interface to IP layer
sockets: SocketTable, // list of all sockets
listen_queue: Arc<DashMap<SocketEntry, SynchronizedQueue<SocketID>>>, // sockets from listeners
pending_socks: Arc<DashSet<SocketEntry>>, // sockets that potentially need re-transmission
send_tx: SyncSender<IPPacket>, // channel for sockets to forward IP packets
accept_tx: Sender<IPPacket>, // channel for listener sockets to accept connections
segment_tx: Sender<IPPacket>, // channel to handle incoming segments
}On creation, the TCPModule struct additionally spawns 5 handler threads (send_loop, accept_loop, segment_loop, retransmission_loop, and dead_socket_loop). We describe each field and handler in the following sections. In addition, a TCP protocol handler is provided to the network layer (allowing the IP module to forward TCP packets to the TCP module), which sends packets either to the accepting connections or processing segments handler, depending on the destination socket.
The TCPModule struct supports most of the functionality expected from a socket API:
/**
* Creates a new socket and binds the socket to an address/port. If addr is nil/0, bind to any
* available interface.
*
* After binding, moves socket into LISTEN state (passive OPEN in the RFC).
*
* Returns:
* - socket number on success or negative number on failure
*/
pub fn v_listen(&self, addr: Ipv4Addr, port: u16) -> TCPResult<SocketID>;
/**
* Accept a requested connection from the listening socket's connection queue.
*
* Returns:
* - new socket handle on success or error on failure
*/
pub fn v_accept(&self, id: SocketID) -> Result<SocketID>;
/**
* Creates a new socket and connects to an address (active OPEN in the RFC).
*
* Returns:
* - the socket number on success, or Err on failure
*/
pub fn v_connect(&self, addr: Ipv4Addr, port: u16) -> Result<SocketID>;
/**
* Read on an open socket (RECEIVE in the RFC). REQUIRED to block when there is no available
* data. All reads should return at least one data byte unless failure or EOF occurs.
*
* Returns:
* - (num bytes read) or (negative number on failure) or (0 on EOF and shutdown_read) or (0 if
* nbyte = 0)
*/
pub fn v_read(&self, id: SocketID, buf: &mut [u8], n_bytes: usize) -> TCPResult<usize>;
/**
* Write on an open socket (SEND in the RFC). Write is REQUIRED to block until all bytes are in
* the send buffer.
*
* Returns:
* - (num bytes written) or (negative number on failure)
*/
pub fn v_write(&self, id: SocketID, buf: &[u8]) -> TCPResult<usize>;
/**
* Shutdown an connection.
*
* - if `how` is WriteClose, close the writing part (CLOSE from RFC, i.e. send FIN)
* - if `how` is ReadClose, close the reading part (no equivalent; all v_reads should return
* 0).
* - if `how` is BothClose, close both ends.
*
* Returns:
* - nothing on success, error on failure.
*/
pub fn v_shutdown(&self, id: SocketID, how: ShutdownType) -> TCPResult<()>;
/**
* Close the socket, making the underlying connection inaccessible to the TCP API functions.
* The connection will finish retransmitting any data not yet ACK'd.
*
* Returns:
* - Nothing on success, error on failure.
*/
pub fn v_close(&self, id: SocketID) -> TCPResult<()>;Each socket (Transmission Control Buffer/TCB in the RFC) is stored in a TCP module-wide SocketTable, accessible both by a socket ID (assigned from 0 upwards) and a SocketEntry (a four-tuple of (src_addr, src_port, dst_addr, dst_port)), and has the following fields:
pub struct Socket {
src_sock: SocketAddrV4, // source <address>:<port>
dst_sock: SocketAddrV4, // destination <address>:<port>
tcp_state: Arc<Mutex<TCPState>>, // current state of the socket
time_wait: Arc<Mutex<Option<Instant>>>, // timer for TimeWait (and other closing states)
zero_probe: Arc<Mutex<Option<Instant>>>, // zero-probing re-transmission timer
zp_counter: Arc<AtomicU32>, // counter to determine interval between zero-probes
zp_timeout: Arc<AtomicU32>, // counter to determine timeout for zero-probing
prtt: Arc<Mutex<Duration>>, // predicted RTT
rtx_q: Arc<Mutex<VecDeque<SegmentEntry>>>, // segment retransmission queue
snd: Arc<Mutex<SendControlBuffer>>, // SND circular buffer
rcv: Arc<Mutex<RecvControlBuffer>>, // RCV circular buffer
send_tx: SyncSender<IPPacket>, // interface to IP module for packet forwarding
nagles: Arc<AtomicBool>, // flag for Nagle's algorithm
}A socket's current state in the TCP state diagram is tracked via tcp_state. Each socket contains two circular buffers, snd and rcv, responsible for maintaining send/receive sequence space values (e.g. SND.{UNA, NXT, WND} and RCV.{NXT, WND}). nagles may be used to enable/disable Nagle's algorithm (... which currently doesn't work anyways 😅).
Although each socket possesses information about timeouts, the main TCP module's
pending_socksis currently missing (more on this later); this makes handling re-transmission particularly ugly within socket methods, which unfortunately forces re-transmission handling in other interfaces. Ideally, a socket should be responsible for putting itself on/taking itself of thepending_socksset.
Additionally,
snd/rcvare guarded by mutexes, but each is locked within other handlers that may call socket-specific methods (e.g.send_buffer,recv_buffer, etc.); this results in an unfortunate API/separation of concerns, wherein calling functions must first do some processing withsnd/rcvbefore calling socket methods. We're not quite sure how to remedy this yet, given Rust's safe synchronization model.
Each socket additionally has a few timers/counters:
time_wait: records the time since entering a closing state (FIN-WAIT-1,FIN-WAIT-2,CLOSING,LAST-ACK, andTIME-WAIT).zero_probe: records the time since the last zero-probe.zp_counter/zp_timeout: records the number of times the node has zero-probed with a response (counter) or without (timeout).prtt: computes the predicted RTT, adhering to RFC 793's naive computation (we attempted Jacobson's + Korn's algorithm, but ran out of time).
Sockets communicate with other sockets through the send_{tx, rx} channel; segments are sent to the send_loop handler, which is responsible for interfacing with the ip_module and forwarding packets.
In hindsight, more feedback from the
send_loopwould help in providing user responses to failedsend/shutdown/closeattempts; right now, thesend_loopis a very thin loop that continually processes segments from the channel, which is unfortunately a bottleneck for packet transmission. Perhaps the channel may be done away with altogether, and just directly sent through the IP module.
A listener socket for incoming connections may be established by providing a port number to v_listen; the port number is first validated (it must be > 1024, and not currently in use), then v_listen creates a socket on 0.0.0.0:<port> in the LISTEN state (PASSIVE OPEN in the RFC). A synchronized queue is added for the socket in the TCP module's listen_queue map; when incoming connections arrive, the listener socket spawns a new TCB, and if the three-way handshake is successfully established, the newly accepted socket is added to the listener socket's listen_queue. Subsequent v_accept calls on the listener socket's ID take from this queue to return new connections.
A connection is made via the v_connect call by providing the destination IP address/port combination (ACTIVE OPEN in the RFC); a SYN segment is sent, and the socket is put on the pending_socks set. When a SYN+ACK segment is received, the socket transitions into the ESTABLISHED state and removed from the pending_socks set.
Segments to ACTIVE OPEN sockets are forwarded from the TCP protocol handler through the accept_{tx, rx} channel, and processed by the accepting connections handler. If the appropriate values are ACKed from a valid source to the valid destination, the handler performs the above transition from SYN-SENT to ESTABLISHED.
In hindsight, there is no real reason for the separation between accepting connections and handling segments (described later); we originally thought it'd be a nice abstraction, but only yielded an arbitrary division of responsibilities. We kept it out of time constraints, but would like to combine them into a more modular handler later.
Data is sent/received via the v_read/v_write calls (SEND and RECEIVE in the RFC, respectively); after supplying a buffer, the TCP module will send/receive data from/into the buffer to the other end of the connection. v_read blocks until at least 1 byte is received in the RCV buffer; v_write blocks until all bytes are written into the SND buffer (but not necessarily fully sent out and ACKed!).
When sending out, all bytes from
SND.NXTtoSND.UNA + SND.WNDare sent; if a large buffer is written, this may spam the connection with multiple data segments, perhaps before the receiving end may respond to all of them. If not enough time elapses betweenACKs, the receiving node mayACKtoo little a sequence number, causing unnecessary data re-transmissions, and slowing throughput through the underlying link. This problem is especially exacerbated when an early segment is dropped from the network.
The core of our TCP implementation is the segment_loop, responsible for handling all incoming segments. The sequence number is first checked for acceptability; if outside of the window, the segment is dropped. Then, depending on the state and segment, the handler responds according to the RFC. As a non-exhaustive list:
- if in
SYN-RCVDand theSYNisACKed, transition intoESTABLISHEDstate. - if
SND.UNA < SEG.ACK <= SND.NXT, updateSND.UNA(and related values); ifSND.UNA == SEG.ACKandSND.WND != SEG.WND, update the window (a bug in RFC 793, later corrected in the errata). - If
SEG.WND == 0orSND.WND == 0 && SEG.WND > 0, start/stop zero-probing. - if the
FINbit is set, transition into a closing state depending on the current state. - If the
SNDbuffer is non-empty, instead of just sending anACKback, send all bytes fromSND.NXTtoSND.UNA + SND.WND.
View the code in tcp_utils.rs for the entire handling process (although it adheres to the RFC rather strictly).
Because all processing of segments is done here, and each socket's
snd/rcvmust be locked before processing, the receiving socket is essentially blocked for the duration of an iteration of thesegment_loop; this may be problematic for simultaneoussend/recvcalls, as only one thread may act on the socket for both sending and receiving bytes. Moreover, each of the socket's fields are essentially wrapped inArc<Mutex<T>>s—which are accessed across multiple threads—causing increased lock contention that can considerably slow down segment handling.
Additionally, because async is not currently supported and re-transmission handling is incredibly unergonomic, delayed
ACKs are incredibly complex and thus not currently supported. This can slow down processing, and result in a flood ofACKsegments when out-of-order segments are received.
Finally, if a large amount of segments are being sent at once, chances are
v_writeis also sending segments; when we then check if theSNDbuffer is non-empty and send bytes fromSND.NXTonwards, duplicate segments may be sent, flooding the network.
The majority of the re-transmission logic (for SYN, FIN, and data segments) is done within the retransmission_loop handler. Whenever a socket has any need for re-transmission (i.e. sending a SYN/FIN/data segment, or zero-probing), it should be added to the TCP module's pending_socks set to indicate that re-transmission is necessary; and whenever the pending segment is fully ACKed, the socket should be removed. Every 5000μs, the retransmission_loop iterates through the entire pending_socks set, and handles each pending socket.
We'd really prefer not to busy-loop through the pending sockets list, as each of the data structures accessed (the
pending_socksqueue, thesocketstable, and many of the socket's fields, e.g. its timers) are protected behind synchronization primitives. This greatly increases lock contention, and wastes CPU power on often-unnecessary checks. We limit the effect of this by sleeping for5000μsevery iteration; since the lower bound on re-transmission is10ms, this should allow the loop to process re-transmissions in a timely fashion while not consuming too much CPU power. Still, we'd prefer to transition into a cleaner model (i.e. async), as the current structure is highly inflexible, rather inefficient, and unconducive to further changes like delayed ACKs, congestion control, effective zero-probing, and RTO calculations, among other stuff.
Each socket maintains a rtx_q of pending segments, that must be ordered starting from the lowest SEQ number. All pending segments are checked against the socket's SND.UNA; if the end of the segment (SEG.SEQ + SEG.LEN) is below the SND.UNA (i.e. it was ACKed before), it is removed from the re-transmission queue. If a socket's rtx_q is empty, it must have been fully acknowledged, and is thus removed from the pending_socks set. Otherwise, the pending segment is checked against its exponentially backoff-ed timeout (computed via the segment's re-transmission counter and the socket's own timers), and either returned to the pending re-transmission queue or sent again with an increased re-transmission counter (if timeout).
- Because
SYN/FINsegments are also appended tortx_q, handling of these segments is almost identical to data segments, so very little special casing is necessary; similarly with zero-probing. The primary difference lies in calculating timeouts (it gets slightly wonky with zero-probing) and re-transmission limits (currently3forSYNsegments and20for data/FIN/zero-probing segments).
In general, our re-transmission code is not particularly modular, and breaks separation of concerns; this led to multiple deadlocks/panics while implementing. It's pretty ugly, but it works for now :,)
When sending a large file, there is a brief window in which the sending thread could block indefinitely: if the sending socket has fully sent out its send buffer (i.e. SND.NXT = SND.UNA + SND.WND) and the receiving window hits 0, the socket will not begin zero-probing (as there's nothing with which to probe). The receiving socket will never then update the sending socket about its increased window size, and the sending socket will remain forever stuck. We went with a rather naive way of fixing the code: in dead_socket_loop, a thread loops over every socket every second, and if the socket:
- is in a writable state (i.e.
ESTABLISHEDorCLOSE-WAIT), - has a non-empty
SNDbuffer, - has an empty
SNDwindow (i.e.SND.WND == 0), - and is not currently zero-probing,
the thread initiates zero-probing for the socket. This is rather inefficient, but every second is really not that bad in terms of CPU usage (and with our current performance, shouldn't negatively impact sending speeds for super large files; although maybe that just says something about our performance...)
Another, perhaps better, fix could be to include the pending_socks set in either the socket's send_buffer() method, or as a field (which could make the code slightly more ergonomic elsewhere anyways); this would require greater architectural adjustments, but could feasibly work.
TODO: this section is currently under development.
network.rs contains the abstraction of our UDP link layer. The NetworkInterface struct
represents each network interface (network cards in real life):
pub struct NetworkInterface {
pub id: u8, // the unique ID of this network interface.
pub src_addr: net::Ipv4Addr, // the source IP address of this interface.
pub dst_addr: net::Ipv4Addr, // the destination IP address of this interface.
pub link_if: LinkInterface, // the link interface of the destination.
}Each Node contains an vector of NetworkInterfaces, akin to a server having multiple network cards.
For routing, our network uses the RIP protocol. On node start, 5 handlers (threads) are created for the RIP protocal:
- Checking timeouts
- Sending triggered updates
- Sending periodic updates
- Listening for messages and sending them to a channel
- Receiving messages from channel and processing them
In the captures/ directory, some Wireshark captures of our nodes in action exist. To use Wireshark with our UDP link layer abstraction, view the README.md in the rip_dissector/ directory. We've annotated a view captures here:
This is a packet capture of two nodes running on ABC.net running A.lnx and C.lnx respectively, with the reference ip_node_lossy running B.lnx with 2% packet loss. To view output from only the sending node, enter the display filter
not cs168rip and udp.srcport == 5000
Similarly, to view output from only the receiving node, enter the display filter
not cs168rip and udp.srcport == 5002
- Frame #39 is the initial
SYNsegment; it isSYN+ACKed by the receiving node in Frame #41, and finallyACKed by the sending node in Frame #43. - The sending node begins transmitting data in segments of size
MSS=1024(same as the reference node). Frame #43 shows a sent segment, which is later acknowledged by the receiving node in Frame #131.- Inspecting the lossy node (
udp.srcport == 5001), the packet withSEG.SEQ=1025is dropped, meaning only one segment actually made it into the other node'sRCVbuffer (at least without out-of-order segments), which required the sending node to re-transmit everything fromSEG.SEQ=1025toSEG.SEQ=65536(Frames #302 to #391). The segments are then successfullyACKed by the receiving node in Frames #392 to #417, then again from #514 to #610.
- Inspecting the lossy node (
- Frame #12111 marks the connection teardown, initiated by the sending node via a
FINsegment; the receiving nodeACKs theFINin Frame #12113, then sends its ownFINin Frame #12115. It is finallyACKed by the sending node in Frame #12117.
This is a packet capture using the same set-up as above.
- Frame #5 is the initial
SYNsegment; it isSYN+ACKed by the receiving node in Frame #7, and finallyACKed by the sending node in Frame #9. - The sending node then begins transmitting data in segments of size
MSS=536B(as specified by the RFC). Frame #11 shows a sent segment (later forwarded by the middle node in Frame #27), which is later acknowledged by the receiving node in Frame #263. - Starting from Frame #379 until roughly Frame #500, segments were likely lost by the middle node earlier, and thus re-transmitted by the sending node. The receiving node
ACKs the re-transmitted segments from Frame #609 to roughly Frame #640. - Frame #14645 marks the connection teardown, initiated by the sending node via a
FINsegment; the receiving nodeACKs theFINin Frame #14647, then sends its ownFINin Frame #14648. It is finallyACKed by the sending node in Frame #14656.
Certain bugs exist within the implementation; although most likely won't crash our node, they will halt some functionality:
- Re-transmission is not fully correct; our current re-transmission limit is set at
20, but a socket may timeout itself even if it might not hit the limit; whenever a segment is re-transmitted, a counter is increased by one, and if the limit is hit, the socket shuts down. This doesn't take into account the status of the receiving node; even if the node correctACKs a segment, if its value is lower than the re-transmittiing segment'sSEQ, then the counter is not reset. This causes sockets to sometimes shut down even when connected and sending data. - Closing a socket via
v_close()/quit(on the node) doesn't wait until re-transmission is completely done, and the other end of the socket cannot connect to the socket following av_close()operation (even though it should be able to, according to the reference node). Such a change---rendering a socket both completely inaccessible to the socket API, yet still capable of receiving/re-transmitting---would take large architectural rehauls, something that I'd like not to do right now (given the immense technical debt we've since accrued). - Special-casing around the send sequence number space is too lax; if someone sent a file over
2^32B, it would likely break the send/receive buffers' circularity. Fixing it shouldn't be difficult, just annoying. Perhaps I'll tackle this in the far future. - Corner cases within the state diagram are not well tested; I would not be surprised if some bug emerged in the transitions.
Additionally, there are multiple TODOs that would (hopefully) improve performance of our node:
- Accepting out-of-order segments: we should ideally finish this before interactive grading...
The infrastructure is mostly there, we just need to add a priority queue to store out-of-order
segments; as more segments arrive, if the priority queue already has later segments, we could
just take out of there. I believe the only change needed should be within
recv_buffer(as we clear the receive buffer, if later segments exist, flush them as well) and the writing data section inmake_segment_loop. - A working implementation of Nagle's Algorithm/Delayed ACK: the code is already there for
Nagle's Algorithm, but it fails to send data when the amount of data is less than
MSS(maybe this is intended?), causing small sends to become lost in the void (without a large send later). The current structure makes implementing delayed ACK quite painful; the below change would help... -
asyncRust/transitioning to tokio: the only real method of handling re-transmission at the moment is essentially busy-looping through sockets that might need re-transmission, which is both unconducive for optimizations like delayed ACK, and overall inefficient. It may be too convoluted, butasynccould hypothetically clean up the code structure immensely. - Structural cleanup: in general, the code would benefit from structural re-writes. Currently, essentially all of the segment handling is done in one handler (
segment_handler), leading to a monolithic handler function prone to many bugs (this happens elsewhere too...). Async will help, but cleanup in general would be helpful. - Congestion control: there is too much technical debt to tackle this at the moment, but it would sure be cool...
TODO
To create your own test files, use the following command:
dd if=/dev/urandom of=<out_file> bs=<size> count=1This copies <size> bytes from /dev/urandom into <out_file> (we've generated one such file, in/long, with size=1M).
To verify that input and output files are identical, you can either use
diff <in_file> <out_file>or
sha1sum <in_file> <out_file>Footnotes
-
Developed for Brown's CSCI1680: Computer Networks.
- TCP handout: https://cs.brown.edu/courses/csci1680/s22/content/tcp.pdf
- IP handout: https://cs.brown.edu/courses/csci1680/s22/content/ip.pdf