Possible regression in 2.5.0 - iperf test w/packet loss frequently times out #157

Closed
MagnusS opened this issue Jun 27, 2015 · 41 comments

@MagnusS
Member

MagnusS commented Jun 27, 2015

A few weeks ago I was able to run the iperf test w/uniform packet loss 150 times locally without timeout (as mentioned here) with the master branch. I just repeated the experiment with current master (f31810c) and after a few attempts I've been unable to run more than 8 tests in a row. Release 2.5.0 (455263d) also times out frequently.

With rev aab5709 the test runs fine. To rule out a bug in the newest version of the test I've also tried using the tests from master with aab5709 and got the same result (no timeouts). I also tried to double the timeout, in case it was caused by the reduced performance of the recently merged debug branch, but the test still times out - the connection seems to just stall when the test fails (see pcap output below).

This is the command I use - it runs 100 iperf tests and terminates if one of them fails.

for c in $(seq 1 100); do echo $c; date; ./test.byte test iperf 2 -v || break ; done

I've also increased the data size from 10 MB back to 25 MB in lib_tests/lib_iperf.ml, as this was set lower to reduce timeouts in Travis. With 10 MB the test is less likely to time out, but it is still unreliable (I was able to run 20 tests with 10 MB vs 8 with 25 MB).

Here are the last packets in the pcap output from three failed tests.

28749 109.963717   10.0.0.101 -> 10.0.0.100   TCP 1514 [TCP Previous segment not captured] 14476 > 5001 [PSH, ACK] Seq=21089701 Ack=1 Win=262140 Len=1460
28750 109.963846   10.0.0.101 -> 10.0.0.100   TCP 1514 14476 > 5001 [PSH, ACK] Seq=21091161 Ack=1 Win=262140 Len=1460
28751 109.963906   10.0.0.100 -> 10.0.0.101   TCP 54 [TCP Dup ACK 28748#1] 5001 > 14476 [ACK] Seq=1 Ack=21088241 Win=262140 Len=0
28752 109.964019   10.0.0.101 -> 10.0.0.100   TCP 1514 14476 > 5001 [PSH, ACK] Seq=21092621 Ack=1 Win=262140 Len=1460
28753 109.964080   10.0.0.100 -> 10.0.0.101   TCP 54 [TCP Dup ACK 28748#2] 5001 > 14476 [ACK] Seq=1 Ack=21088241 Win=262140 Len=0

29957 104.624361   10.0.0.100 -> 10.0.0.101   TCP 54 [TCP Dup ACK 29951#3] 5001 > 10208 [ACK] Seq=1 Ack=21978841 Win=262140 Len=0
29958 104.624419   10.0.0.101 -> 10.0.0.100   TCP 1514 [TCP Fast Retransmission] 10208 > 5001 [PSH, ACK] Seq=21978841 Ack=1 Win=262140 Len=1460
29959 104.624589   10.0.0.100 -> 10.0.0.101   TCP 54 5001 > 10208 [ACK] Seq=1 Ack=21984681 Win=262140 Len=0
29960 104.624684   10.0.0.101 -> 10.0.0.100   TCP 1514 [TCP Previous segment not captured] 10208 > 5001 [PSH, ACK] Seq=21986141 Ack=1 Win=262140 Len=1460
29961 104.624788   10.0.0.100 -> 10.0.0.101   TCP 54 [TCP Dup ACK 29959#1] 5001 > 10208 [ACK] Seq=1 Ack=21984681 Win=262140 Len=0

9338 103.079093   10.0.0.100 -> 10.0.0.101   TCP 54 5001 > 14140 [ACK] Seq=1 Ack=6841561 Win=262140 Len=0
9339 103.079269   10.0.0.101 -> 10.0.0.100   TCP 1514 [TCP Previous segment not captured] 14140 > 5001 [PSH, ACK] Seq=6843021 Ack=1 Win=262140 Len=1460
9340 103.079326   10.0.0.100 -> 10.0.0.101   TCP 54 [TCP Dup ACK 9338#1] 5001 > 14140 [ACK] Seq=1 Ack=6841561 Win=262140 Len=0
9341 103.079439   10.0.0.101 -> 10.0.0.100   TCP 1514 14140 > 5001 [PSH, ACK] Seq=6844481 Ack=1 Win=262140 Len=1460
9342 103.079501   10.0.0.100 -> 10.0.0.101   TCP 54 [TCP Dup ACK 9338#2] 5001 > 14140 [ACK] Seq=1 Ack=6841561 Win=262140 Len=0

This test sent 92 dup acks before stalling:

29331  96.955952   10.0.0.101 -> 10.0.0.100   TCP 1514 [TCP Previous segment not captured] 18257 > 5001 [PSH, ACK] Seq=21510181 Ack=1 Win=262140 Len=1460
29332  96.956058   10.0.0.100 -> 10.0.0.101   TCP 54 [TCP Dup ACK 29330#1] 5001 > 18257 [ACK] Seq=1 Ack=21508721 Win=262140 Len=0
29333  96.956126   10.0.0.101 -> 10.0.0.100   TCP 1514 18257 > 5001 [PSH, ACK] Seq=21511641 Ack=1 Win=262140 Len=1460
29334  96.956231   10.0.0.100 -> 10.0.0.101   TCP 54 [TCP Dup ACK 29330#2] 5001 > 18257 [ACK] Seq=1 Ack=21508721 Win=262140 Len=0
29335  96.956295   10.0.0.101 -> 10.0.0.100   TCP 1514 18257 > 5001 [PSH, ACK] Seq=21513101 Ack=1 Win=262140 Len=1460
[...]
29514  96.979388   10.0.0.100 -> 10.0.0.101   TCP 54 [TCP Dup ACK 29330#92] 5001 > 18257 [ACK] Seq=1 Ack=21508721 Win=262140 Len=0
29515  96.979492   10.0.0.101 -> 10.0.0.100   TCP 1514 18257 > 5001 [PSH, ACK] Seq=21644501 Ack=1 Win=262140 Len=1460
@avsm
Member

avsm commented Jul 1, 2015

Can you bisect this one back to see if a particular changeset caused it, Magnus?

@yomimono
Contributor

yomimono commented Jul 1, 2015

I'm about to attempt to confirm/deny that this is still failing since #156 was merged, also.

@MagnusS
Member Author

MagnusS commented Jul 1, 2015

Sure - I'll try to find the commit.

@yomimono
Contributor

yomimono commented Jul 1, 2015

Still failing in a855bac (current latest), unfortunately.

@MagnusS
Member Author

MagnusS commented Jul 1, 2015

I'm running a script now to automatically test commits back in time to see when it starts failing. It will probably take a few hours to complete.

@MagnusS
Member Author

MagnusS commented Jul 1, 2015

It looks like the test started failing after 7cecc9f.

@samoht
Member

samoht commented Jul 1, 2015

eeeek. The main change here is 7cecc9f#diff-5d0c376089097530d3f7f9c4082b6443R27, which may change the semantics of retransmission... Could you add a test which fails if such retransmissions appear? It would be much easier to avoid future regressions... and I can try to revert part of the patch to see if that fixes the bug or not.

@MagnusS
Member Author

MagnusS commented Jul 2, 2015

Hm - not sure what we should test for yet. The retransmissions mostly look normal until the connection stalls (at least to me, maybe someone else can spot something). It's only in rare cases that I've seen lots of retransmissions, so it could be a separate problem that usually corrects itself.

If you want to try to revert some of the changes it's relatively easy to reproduce/test with the for-loop above. It fails after just a few attempts (<10). The pcap output is stored in tests/pcap/tcp_iperf_two_stacks_uniform_packet_loss.pcap

@yomimono
Contributor

yomimono commented Jul 2, 2015

It's probably valuable to generate a trace with mirage-profile when we run this test as well as a packet dump; a trace from a failing test would probably be instructive.

@MagnusS
Member Author

MagnusS commented Jul 2, 2015

(if testing with master it's faster to reproduce with 25_000_000 here)

@MagnusS
Member Author

MagnusS commented Jul 2, 2015

Hm - would it be useful to add tracing in vnetif as well? Since both client and server are running in the same process we could (in theory at least :-)) track everything that happens as a result of a Netif.write call.

@talex5
Contributor

talex5 commented Jul 2, 2015

Any reason why we're doing performance testing using bytecode? I'd imagine native would be faster.

Anyway, here's a trace showing it being slow:

http://test.roscidus.com/traces/2015-07-02/tcp-perf-slow/

[image: trace]

Test branch: https://github.com/talex5/mirage-tcpip/tree/trace-perf (aborts if a request takes > 1s)

Raw trace: http://test.roscidus.com/traces/2015-07-02/tcp-perf-slow/trace.ctf.bz2

@MagnusS
Member Author

MagnusS commented Jul 2, 2015

@talex5 thanks!

The test is not really intended to be a performance test right now. It just verifies that we can transfer data between two stacks over different backends.

@talex5
Contributor

talex5 commented Jul 2, 2015

Here's a slightly suspicious bit of another run:

[image: trace2]

This is just before the system sleeps for a long time (the grey area on the right). We're doing a TCP fast retransmit, but get stuck waiting for take tx_ack. I think this is due to the change in the code. Before, the retransmit was async. Now, we wait for it.

However, I don't know why that should be a problem.

@MagnusS
Member Author

MagnusS commented Jul 2, 2015

Looks like the modified test in https://github.com/talex5/mirage-tcpip/tree/trace-perf also fails with the last "good" commit f583cad, so the traces above could be from states the stack is able to recover from. Would it be possible to just label the write as slow in the trace and not exit until the test times out? It seems that it can take a while to recover from the packet loss in some cases (15 sec +).

@talex5
Contributor

talex5 commented Jul 2, 2015

Yes, but you might need a very large trace buffer if you want to do that. If 1s delays are expected, then it might be easier to just raise the threshold to whatever you would consider unacceptable.

@balrajsingh
Member

I think as @talex5 observed the async fast retransmit may be the change
responsible for the stall. Is it possible to just revert this and check
again?

@MagnusS
Member Author

MagnusS commented Jul 2, 2015

@balrajsingh Sure, I can check that - is it this xmit call?

7cecc9f#diff-5c19dbcb696d8a908ac70c39feacd849R352

@talex5
Contributor

talex5 commented Jul 2, 2015

@MagnusS how about using a fixed seed for the random number generator in Uniform_packet_loss? As it is, a test may fail simply due to randomness.

@MagnusS
Member Author

MagnusS commented Jul 2, 2015

@balrajsingh @samoht The test still fails unfortunately...

@MagnusS
Member Author

MagnusS commented Jul 2, 2015

@talex5 Hm - yes, but it is (or was) very unlikely... but I guess we could get unlucky and drop a lot of packets in a row for instance.

Probably a good idea to add a seed - if we can seed the tcp/ip-stack as well we could have reproducible test runs
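
For illustration, here is a minimal sketch of what a seeded drop decision could look like. It is written in Python rather than OCaml, and the class name, constructor and `should_drop` method are hypothetical: this is not the actual Uniform_packet_loss backend, only the idea of giving the loss model its own explicitly seeded generator so a failing run can be replayed.

```python
import random


class UniformPacketLoss:
    """Drop each packet independently with probability `drop_p`.

    Hypothetical stand-in for the test's packet-loss backend; the name
    and API are illustrative only.
    """

    def __init__(self, drop_p, seed):
        self.drop_p = drop_p
        # A private, explicitly seeded generator makes the drop pattern
        # repeatable and independent of other users of `random`.
        self.rng = random.Random(seed)

    def should_drop(self):
        return self.rng.random() < self.drop_p


def drop_pattern(seed, packets=1000, drop_p=0.01):
    """Return the indices of the packets dropped for a given seed."""
    link = UniformPacketLoss(drop_p, seed)
    return [i for i in range(packets) if link.should_drop()]
```

Calling `drop_pattern(42)` twice gives the same list, so a test that logs its seed on failure could be rerun with exactly the same losses. Anything else in the stack that draws random numbers would need the same treatment for a run to be fully reproducible.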

@balrajsingh
Member

Thanks @MagnusS, is there a pcap trace of a failed flow, or could you
create one - hopefully there'll be some clues in the trace.

@talex5
Contributor

talex5 commented Jul 2, 2015

@MagnusS Given the low rate of dropped packets (1%), is it reasonable that sending a segment between the two test stacks might take more than 10s? I would have thought TCP should recover before then.

@samoht
Member

samoht commented Jul 2, 2015

The test still fails unfortunately...

did you push your patch somewhere?

@MagnusS
Member Author

MagnusS commented Jul 3, 2015

@samoht I've pushed it here with @talex5's tracing version of the test (but with exit and debug disabled): https://github.com/magnuss/mirage-tcpip/tree/trace-iperf

Trace and pcap from failing test:
http://www.skjegstad.com/data/030715/iperf_test_timeout_c026b99.pcap.gz
http://www.skjegstad.com/data/030715/iperf_test_timeout_c026b99.ctf.gz

(the trace buffer may be too small here though)

@talex5
Contributor

talex5 commented Jul 3, 2015

@MagnusS would be useful to put a MProf.Trace.label in Uniform_packet_loss so we can see on the trace where a packet was dropped.

(also, if wire.ml would log the sequence number when it transmits a packet, we could correlate the packets in the trace with the ones in the pcap file).

Looking at the trace, it looks like the send_empty_ack thread sent an ack (tcp-to-ip+0), then some threads on the transmit side got notified of something, but didn't do anything. Maybe the ack was dropped? What should happen in that case?

@MagnusS
Member Author

MagnusS commented Jul 3, 2015

Good idea - pushed a patch with labels for xmit-[seq] and pkt_drop now.

Another trace and pcap:
http://www.skjegstad.com/data/030715/iperf_test_timeout_25694cb.pcap.gz
http://www.skjegstad.com/data/030715/iperf_test_timeout_25694cb.ctf.gz

Console output

Iperf server: Received connection.
Iperf client: Made connection to server.
Iperf server: t = 1, rate = 58996d KBits/s, totbytes = 7374460, live_words = 23251
Tcp.Segment: TCP retransmission on timer seq = 474847255
Iperf server: t = 4, rate = 2706d KBits/s, totbytes = 8389160, live_words = 44786
write_and_check took > 1s!
Tcp.Segment: TCP retransmission on timer seq = 477158435
Iperf server: t = 7, rate = 6163d KBits/s, totbytes = 10700340, live_words = 41321
write_and_check took > 1s!
Tcp.Segment: TCP retransmission on timer seq = 477262095
Iperf server: t = 10, rate = 276d KBits/s, totbytes = 10804000, live_words = 26961
write_and_check took > 1s!
Tcp.Segment: TCP retransmission on timer seq = 478330815
Iperf server: t = 17, rate = 1221d KBits/s, totbytes = 11872720, live_words = 28783
write_and_check took > 1s!
Iperf server: t = 18, rate = 4754d KBits/s, totbytes = 12466940, live_words = 24853
Tcp.Segment: TCP retransmission on timer seq = 480960275
Iperf server: t = 34, rate = 1018d KBits/s, totbytes = 14502180, live_words = 27096
write_and_check took > 1s!
ARP: timeout 10.0.0.101
ARP: timeout 10.0.0.100
Tcp.Segment: TCP retransmission on timer seq = 482642195
ARP: transmitting probe -> 10.0.0.100
ARP responding to: who-has 10.0.0.100?
ARP: updating 10.0.0.100 -> 02:50:2a:16:6d:02
Iperf server: t = 71, rate = 364d KBits/s, totbytes = 16184100, live_words = 30371
ARP: transmitting probe -> 10.0.0.101
ARP responding to: who-has 10.0.0.101?
ARP: updating 10.0.0.101 -> 02:50:2a:16:6d:03
write_and_check took > 1s!
Iperf server: t = 72, rate = 21456d KBits/s, totbytes = 18866120, live_words = 27786
ARP: timeout 10.0.0.101

@talex5
Contributor

talex5 commented Jul 3, 2015

So, we dropped two packets close together. In both cases, the receiver sent a duplicate ack because it saw that a segment was missing. In the first case, we did a fast retransmission, but in the second we didn't for some reason. Is there some kind of back-off going on here?

Note: if you reenable logging, that will show up in the trace too, which might be helpful.

@balrajsingh
Member

There is indeed!

For tl;dr-ers: this TCP is designed to recover from the usual case on the
internet of at most 1 pkt loss per window. More than 1pkt loss may require
the timer to kick, which is supposed to be read as a severe event and which
slows TCP right down. It would then additively recover to the available
rate again (and not exponentially as it does at the beginning of the flow
using the misnamed slow start), which will feel like a slow down.

Long version:

Fast rexmit is designed to recover with almost no loss in transmission rate
when 1 packet per window is lost (equiv in a way to 1 pkt per RTT or 1 of
the packets in flight). This kind of loss is actually part of TCP itself as
it probes a larger and larger congestion window till a pkt loss occurs,
then halves the cwnd and starts over to probe the window again additively.

When the '3 duplicate ack signal' (I.e. 3 pure ack pkts that don't advance
the ack number) is received, TCP rexmits the missing pkt. If the rexmitted
pkt is also lost then the only way to recover is via the retransmission
timer.

If multiple pkts in a window are lost then the first loss will cause a
whole lot of dup acks (one for each pkt above the lost pkt). After the
first 3 of these are rxed fast rexmit will send the lost pkt. Now when this
pkt is acked the ack will move on to the next lost pkt. If this ack opens
up enough window to send 3 more pkts at the top of the window and if there
are 3 pkts to send then there will be 3 dup acks again and fast rexmit will
kick in again. If however close by pkts are lost or if a rexmitted pkt is
lost or there isn't any more data to send then the only way to recover from
that loss is by the timer which will slow things down a lot temporarily.

That said, there is always the possibility that there is a bug in the logic
of this recovery code.

BTW, one common hack used is to set the number of dup acks received to 2
while the TCP is still working in the original window where the first loss
occurred so then the ack to the rexmitted pkt will point to the next lost
pkt and it will be seen as the third dup ack which will cause an immediate
re-xmission of that pkt. This (or equiv) I think is actually a suggestion
in the RFC.

I'll look thru the code again tonight.

Thanks.
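
A minimal sketch of the duplicate-ack counting described above, in Python for illustration. The `Sender` class and its fields are made up for this example and are not the mirage-tcpip code; sequence-number wraparound, window updates and the timer are all left out.

```python
DUP_ACK_THRESHOLD = 3  # the "3 duplicate ack signal"


class Sender:
    """Tracks cumulative acks and decides when to fast-retransmit."""

    def __init__(self, last_ack):
        self.highest_ack = last_ack  # highest ack number seen so far
        self.dup_acks = 0
        self.retransmitted = []      # sequence numbers resent by fast rexmit

    def on_ack(self, ack):
        if ack > self.highest_ack:
            # New data acknowledged: reset the duplicate counter.
            self.highest_ack = ack
            self.dup_acks = 0
        elif ack == self.highest_ack:
            self.dup_acks += 1
            if self.dup_acks == DUP_ACK_THRESHOLD:
                # Resend the segment the receiver is still waiting for.
                self.retransmitted.append(ack)
```

With this logic, three acks repeating the same number trigger exactly one retransmission of that segment. If a second hole then produces fewer than three duplicate acks (for example because there is no more data to send above it), nothing in this code fires, which is the case where only the retransmission timer can recover.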

@talex5
Contributor

talex5 commented Jul 3, 2015

Thanks @balrajsingh! I wonder if the rto (retransmission timeout) calculation may be the problem. Here are some numbers I got with some extra debug:

Tcp.Window: srtt=18.0000 rttvar=9.0000 raw_rto=54.0000
Tcp.Window: rto: raw=54.000000, plus backoff_count 1 = 108.000000
Tcp.Segment: PUSHING TIMER - new time = 108.000000, new seq = 466500511

So, it thinks the "smoothed round-trip time" between the two local stacks is 18s? Could it be including the delayed retransmissions in the srtt calculations? If so, this would explain the excessive 108s retransmission delay.
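
The numbers in that debug output can be checked against the RFC 2988 formulas with a short Python sketch. The function names are invented for this example; the constants (K = 4, alpha = 1/8, beta = 1/4) are the ones from the RFC, and the doubling per backoff step is an assumption that happens to match the logged values. The RFC's clock granularity term and 1 second minimum are left out.

```python
def raw_rto(srtt, rttvar, k=4):
    """RTO = SRTT + K * RTTVAR, with K = 4 as in RFC 2988."""
    return srtt + k * rttvar


def backed_off_rto(rto, backoff_count):
    """Double the RTO once per consecutive timer expiry (assumed policy)."""
    return rto * (2 ** backoff_count)


def update_rtt(srtt, rttvar, sample, alpha=1 / 8, beta=1 / 4):
    """Fold one RTT sample into (srtt, rttvar) as in RFC 2988 section 2.3."""
    rttvar = (1 - beta) * rttvar + beta * abs(srtt - sample)
    srtt = (1 - alpha) * srtt + alpha * sample
    return srtt, rttvar


rto = raw_rto(18.0, 9.0)
print(rto, backed_off_rto(rto, 1))  # prints: 54.0 108.0
```

So the 54s and 108s figures follow directly from srtt=18 and rttvar=9. It may also be worth noting that rttvar is exactly half of srtt, which is what the RFC's first-measurement rule (SRTT = R, RTTVAR = R/2) would give for a single 18s sample; whether that is what actually happened here is only a guess.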

@balrajsingh
Member

The rto is supposed to back off exponentially with each loss and then when
it has success (a whole window of packets with no loss) it should smoothly
adjust itself back. It'll be interesting to see how the rto and related
variables change over the lifetime of this flow.

@talex5
Contributor

talex5 commented Jul 3, 2015

@balrajsingh right - but srtt shouldn't be getting large though should it?

@balrajsingh
Member

I'm not sure, I'll look. It is supposed to follow RFC 2988; if it doesn't,
please fix.

There may be a problem there because SRTT is a blend of the current SRTT with a
real measurement. If any loss or other problem occurs during the measurement
then it's not a real measurement and should not be used - the flag
rtt_timer_on manages that. Again it'll be
informative to track how this variable changes.
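
The rule being described (RFC 2988 says RTT samples must not be taken from retransmitted segments, usually called Karn's algorithm) can be sketched as follows. This is an illustrative Python model with made-up names, not the mirage-tcpip implementation; `timed_seq` plays roughly the role described for rtt_timer_on, which is an assumption on my part. Segments are identified by a plain integer and an ack covers a segment when it is greater than that integer.

```python
class RttEstimator:
    """Smoothed RTT that skips samples from retransmitted segments."""

    def __init__(self):
        self.srtt = None
        self.rttvar = None
        self.timed_seq = None  # segment currently being timed, if any
        self.sent_at = None

    def on_send(self, seq, now, is_retransmission):
        if is_retransmission:
            if seq == self.timed_seq:
                # The ack can no longer be matched to one transmission,
                # so abandon this measurement.
                self.timed_seq = None
        elif self.timed_seq is None:
            self.timed_seq, self.sent_at = seq, now

    def on_ack(self, ack, now):
        if self.timed_seq is None or ack <= self.timed_seq:
            return
        sample = now - self.sent_at
        self.timed_seq = None
        if self.srtt is None:
            # First measurement: SRTT = R, RTTVAR = R / 2.
            self.srtt, self.rttvar = sample, sample / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - sample)
            self.srtt = 0.875 * self.srtt + 0.125 * sample
```

In this model, a segment sent at t=1.0, retransmitted at t=19.0 and acked at t=19.1 contributes no sample at all. Without the guard in `on_send` it would feed an 18.1s "round trip" into srtt, which is the kind of inflation being discussed.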

@MagnusS
Member Author

MagnusS commented Jul 4, 2015

Here's another trace+pcap of a failing test with more debugging enabled (code is here):

http://www.skjegstad/data/040715/iperf_test_timeout_11c3aa1.ctf.gz
http://www.skjegstad/data/040715/iperf_test_timeout_11c3aa1.pcap.gz

Here's the console output:

Tcp.PCB: process-synack: [channels=0 listens=0 connects=1]
Tcp.PCB: new-client-connection: [channels=0 listens=0 connects=0]
Tcp.PCB: process-ack: [channels=0 listens=1 connects=0]
Iperf server: Received connection.
Iperf client: Made connection to server.
Tcp.Segment: TCP fast retransmission seq=466590242, dupack=466590242
Tcp.Segment: TCP fast retransmission seq=466759602, dupack=466759602
Tcp.Segment: TCP fast retransmission seq=467415142, dupack=467415142
Tcp.Segment: TCP fast retransmission seq=467482302, dupack=467482302
Tcp.Segment: TCP fast retransmission seq=467498362, dupack=467498362
Tcp.Segment: PUSHING TIMER - new time=1.000000, new seq=467504202
Tcp.Segment: TCP retransmission on timer seq = 467504202
Iperf server: t = 4, rate = 2085d KBits/s, totbytes = 1042440, live_words = 33109
Tcp.Segment: PUSHING TIMER - new time = 9.000000, new seq = 467504202
write_and_check took > 1s!
Tcp.Segment: TCP fast retransmission seq=467594722, dupack=467594722
Tcp.Segment: TCP fast retransmission seq=467892562, dupack=467892562
Tcp.Segment: TCP fast retransmission seq=468263402, dupack=468263402
Tcp.Segment: PUSHING TIMER - new time=1.000000, new seq=468280922
Tcp.Segment: TCP retransmission on timer seq = 468280922
Iperf server: t = 14, rate = 621d KBits/s, totbytes = 1819160, live_words = 34055
Tcp.Segment: PUSHING TIMER - new time = 22.500000, new seq = 468280922
write_and_check took > 1s!
Tcp.Segment: PUSHING TIMER - new time=22.500000, new seq=468283842
Tcp.Segment: TCP retransmission on timer seq = 468283842
Iperf server: t = 59, rate = 1d KBits/s, totbytes = 1822080, live_words = 34342
Tcp.Segment: PUSHING TIMER - new time = 270.000000, new seq = 468283842
write_and_check took > 1s!
Tcp.Segment: TCP fast retransmission seq=468755422, dupack=468755422
Tcp.Segment: TCP fast retransmission seq=468793382, dupack=468793382
Tcp.Segment: TCP fast retransmission seq=469196342, dupack=469196342
Iperf server: t = 60, rate = 8982d KBits/s, totbytes = 2944820, live_words = 23343
Tcp.Segment: TCP fast retransmission seq=469408042, dupack=469408042
Tcp.Segment: TCP fast retransmission seq=469504402, dupack=469504402
Tcp.Segment: TCP fast retransmission seq=469546742, dupack=469546742
ARP: timeout 10.0.0.101
ARP: timeout 10.0.0.100

The timer went up to 270 seconds when the test failed - the test timeout is 120 sec. Could the timer not be triggered properly after commit 7cecc9f? I'll try to wrap the other xmit-calls in the timer in async as well to see if that helps.

@MagnusS
Member Author

MagnusS commented Jul 4, 2015

Looks like it works now! This call to xmit also had to be in Lwt.async: 7cecc9f#diff-5c19dbcb696d8a908ac70c39feacd849R298
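
To illustrate why waiting for the transmit inside the timer could matter, here is an analogy in Python asyncio. It is not Lwt and not the tcpip code: `asyncio.create_task` stands in for Lwt.async, and `gate` is a hypothetical resource that the transmit path has to wait for (the trace earlier in the thread showed the retransmit stuck waiting on tx_ack, but the real dependency is an assumption here).

```python
import asyncio


async def timer_loop(xmit, ticks, run_async, log):
    """Fire `ticks` timer events and retransmit on each one."""
    background = []
    for i in range(ticks):
        log.append(f"tick {i}")
        if run_async:
            # Fire and forget: a blocked xmit cannot hold up later ticks.
            background.append(asyncio.create_task(xmit(i)))
        else:
            # Wait for xmit: a blocked xmit stalls the whole timer.
            await xmit(i)
        await asyncio.sleep(0)  # let other tasks run
    return background


async def demo(run_async):
    log = []
    gate = asyncio.Event()  # hypothetical resource the transmit path needs

    async def xmit(i):
        await gate.wait()
        log.append(f"xmit {i}")

    timer = asyncio.create_task(timer_loop(xmit, 3, run_async, log))
    for _ in range(10):  # give the timer plenty of turns while xmit is blocked
        await asyncio.sleep(0)
    ticks_while_blocked = [e for e in log if e.startswith("tick")]

    gate.set()  # unblock the transmit path and let everything finish
    for task in await timer:
        await task
    return ticks_while_blocked


print(asyncio.run(demo(run_async=False)))
print(asyncio.run(demo(run_async=True)))
```

When the timer waits for `xmit`, only the first tick happens while the transmit path is blocked; with the fire-and-forget variant all three ticks happen. That matches the shape of the symptom (a timer that stops making progress), though it says nothing about what the real code was blocked on.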

I've run about 30 tests so far and the timer seems to never go above 1-2 seconds. I'll run a few more tests and prepare a PR :-)

It would be really nice to have some of this debugging and tracing available by default in the tests! Would it be possible to enable profiling automatically when it's supported?

@samoht
Member

samoht commented Jul 4, 2015

yay! thanks for spotting this. If you have any ideas for testing that we don't have more regressions (apart from forbidding me to push new code to that repository) that would be great. And yes, turning the profiling on during the test is a good idea!

@talex5
Contributor

talex5 commented Jul 4, 2015

@MagnusS you'd have to detect that the mirage-profile.unix ocamlfind module was present and conditionally compile some extra code. For now, I'd suggest just leaving the trace initialising code in there, but commented out.

MagnusS added a commit to MagnusS/mirage-tcpip that referenced this issue Jul 7, 2015
samoht added a commit that referenced this issue Jul 7, 2015
Fix timer issue from #157 + test improvements
This was referenced Jul 7, 2015
@talex5
Contributor

talex5 commented Aug 5, 2015

(continuing the discussion in mirage/ocaml-git#117)

This has started happening again. My last two builds of CueKeeper on Travis failed with timeouts in the tcpip tests.

First failure:

#=== ERROR while installing tcpip.2.6.0 =======================================#
# opam-version 1.2.2
# os           linux
# command      make test TESTFLAGS=-v
# path         /home/travis/.opam/system/build/tcpip.2.6.0
# compiler     system (4.02.1)
# exit-code    2
# env-file     /home/travis/.opam/system/build/tcpip.2.6.0/tcpip-35867-62756b.env
# stdout-file  /home/travis/.opam/system/build/tcpip.2.6.0/tcpip-35867-62756b.out
# stderr-file  /home/travis/.opam/system/build/tcpip.2.6.0/tcpip-35867-62756b.err
### stdout ###
# [...]
[ERROR]     iperf    2   iperf with two stacks and uniform packet loss.
[SKIP]      iperf    3   iperf with two stacks, basic tests, longer.
[SKIP]      iperf    4   iperf with two stacks and uniform packet loss, longer.
# -- iperf.002 Failed --
# iperf with two stacks and uniform packet loss.
# ./_tests/iperf.002.output:
# --
# [exception] OUnitTest.OUnit_failure("iperf test timed out after 120.000000 seconds")
# 
# 1 error! in 1.047s. 17 tests run.

Failure on rebuild:

#=== ERROR while installing tcpip.2.6.0 =======================================#
# opam-version 1.2.2
# os           linux
# command      make test TESTFLAGS=-v
# path         /home/travis/.opam/system/build/tcpip.2.6.0
# compiler     system (4.02.1)
# exit-code    2
# env-file     /home/travis/.opam/system/build/tcpip.2.6.0/tcpip-35903-c947fc.env
# stdout-file  /home/travis/.opam/system/build/tcpip.2.6.0/tcpip-35903-c947fc.out
# stderr-file  /home/travis/.opam/system/build/tcpip.2.6.0/tcpip-35903-c947fc.err
### stdout ###
# [...]
[ERROR]     iperf    2   iperf with two stacks and uniform packet loss.
[SKIP]      iperf    3   iperf with two stacks, basic tests, longer.
[SKIP]      iperf    4   iperf with two stacks and uniform packet loss, longer.
# -- iperf.002 Failed --
# iperf with two stacks and uniform packet loss.
# ./_tests/iperf.002.output:
# --
# [exception] OUnitTest.OUnit_failure("iperf test timed out after 120.000000 seconds")
# 
# 1 error! in 1.128s. 17 tests run.

Did something change recently?

@MagnusS
Member Author

MagnusS commented Aug 6, 2015

(continuing the discussion from mirage/ocaml-git#117)

@talex5 wrote

We could still have the 120s timeout, but they'd be virtual seconds, so not affected by the load on the machine running the tests (I'm assuming that's the cause).

Hm... How would we advance the virtual clock? I guess we could base it on how fast we can transfer data between the stacks and adjust the clock to a target throughput, but I'm not sure how well that would work with packet loss and varying throughput.

I think we should at least change the timeout to be reset for every new byte received (with normal clock for now) - then it would also be independent of the amount of data transferred. And two minutes without a single byte transferred seems very slow even for Travis :-)
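
A small sketch of the "reset the timeout on every new byte" idea, in Python for illustration. The class and method names are hypothetical and the clock value is passed in explicitly so the logic can be exercised without sleeping; in a real test it would come from a monotonic clock.

```python
class ProgressWatchdog:
    """Expires only when no bytes arrive for `idle_limit` seconds."""

    def __init__(self, idle_limit, now):
        self.idle_limit = idle_limit
        self.last_progress = now

    def on_bytes(self, count, now):
        if count > 0:
            self.last_progress = now  # any new data resets the deadline

    def expired(self, now):
        return now - self.last_progress >= self.idle_limit


watchdog = ProgressWatchdog(idle_limit=120.0, now=0.0)
watchdog.on_bytes(1460, now=100.0)
print(watchdog.expired(now=200.0))  # 100s since last data: still running
print(watchdog.expired(now=220.0))  # 120s since last data: timed out
```

The limit is then independent of how much data the test transfers, at the cost of no longer bounding the total run time; a separate, generous overall cap could cover that.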

@talex5
Contributor

talex5 commented Aug 6, 2015

@MagnusS when all callbacks scheduled for the current time have run, the clock advances automatically to the next scheduled event (sleep timeout). At least, that's how the CueKeeper test clock works:

https://github.com/talex5/cuekeeper/blob/94ee4e17d24da75cace5b15f10842baf8f557f58/tests/test.ml#L27

We should probably add a (virtual) delay to the virtual network to simulate packet transmission time. This would also allow simulation of networks of varying speeds.
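
As a rough illustration of the virtual clock idea (not the CueKeeper test clock itself, which is OCaml), here is a minimal discrete-event clock in Python. All names are invented for this sketch, and `LINK_DELAY` is a hypothetical per-packet transmission time standing in for the virtual network delay suggested above.

```python
import heapq
import itertools


class VirtualClock:
    """Discrete-event clock: time jumps straight to the next event."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._ids = itertools.count()  # tie-breaker keeps FIFO order

    def call_later(self, delay, callback):
        heapq.heappush(self._queue, (self.now + delay, next(self._ids), callback))

    def run(self, until):
        """Run events in time order; stop after `until` virtual seconds."""
        while self._queue and self._queue[0][0] <= until:
            when, _, callback = heapq.heappop(self._queue)
            self.now = when
            callback()
        return self.now


clock = VirtualClock()
events = []
LINK_DELAY = 0.001  # hypothetical per-packet transmission time


def send(n):
    events.append((clock.now, f"packet {n}"))
    if n < 3:
        clock.call_later(LINK_DELAY, lambda: send(n + 1))


clock.call_later(0.0, lambda: send(1))
clock.call_later(120.0, lambda: events.append((clock.now, "timeout")))
clock.run(until=120.0)
```

The three packets are delivered at virtual times a millisecond apart and the 120s timeout then fires immediately in wall-clock terms, so the result does not depend on how loaded the machine running the tests is.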

Resetting the timeout after each byte might work too, but we also want to know when the stack really is behaving slowly, and if we make it too generous we might miss that. After all, the last time this happened it was a real bug...

@yomimono
Contributor

yomimono commented Oct 1, 2016

I think we've fixed the specific problems mentioned here with recent updates to tcpip. I'm closing this issue; please feel free to open new ones for work mentioned here that has yet to be completed.

@yomimono yomimono closed this as completed Oct 1, 2016
samoht pushed a commit to samoht/mirage-tcpip that referenced this issue Apr 4, 2017
samoht added a commit to samoht/mirage-tcpip that referenced this issue Apr 4, 2017
Fix timer issue from mirage#157 + test improvements
avsm added a commit to avsm/opam-repository that referenced this issue Feb 3, 2019
CHANGES:

* Use `Lwt_dllist` instead of `Lwt_sequence`, due to the latter being deprecated
  upstream in Lwt (ocsigen/lwt#361) (mirage/mirage-tcpip#388 by @avsm).
* Remove arpv4 and ethif sublibraries, now provided by ethernet and arp-mirage
  opam packages (mirage/mirage-tcpip#380 by @hannesm).
* Upgrade from jbuilder to dune (mirage/mirage-tcpip#391 @avsm)
* Switch from topkg to dune-release (mirage/mirage-tcpip#391 @avsm)

### v3.6.0 (2019-01-04)

* The IPv4 implementation now supports reassembly of IPv4 fragments (mirage/mirage-tcpip#375 by @hannesm)
  - using a LRU cache using up to 256KB memory
  - out of order fragments are supported
  - maximum number of fragments is 16
  - timeout between first and last fragment is 10s
  - overlapping fragments are dropped

* IPv6: use correct timeout value after first NS message (mirage/mirage-tcpip#334 @djs55)

* Use `Ipaddr.pp` instead of `Ipaddr.pp_hum` due to upstream
  interface changes (mirage/mirage-tcpip#385 @hannesm).

### v3.5.1 (2018-11-16)

* socket stack (tcp/udp): catch exception in recv_from and accept (mirage/mirage-tcpip#376 @hannesm)
* use mirage-random-test for testing (Stdlibrandom got removed from mirage-random>1.2.0, mirage/mirage-tcpip#377 @hannesm)

### v3.5.0 (2018-09-16)

* Ipv4: require Mirage_random.C, used for generating IPv4 identifier instead of using OCaml's stdlib Random directly (mirage/mirage-tcpip#371 @hannesm)
* Tcp: use entire 32 bits at random for the initial sequence number, thanks to Spencer Michaels and Jeff Dileo of NCC Group for reporting (mirage/mirage-tcpip#371 @hannesm)
* adjust to mirage-protocols 1.4.0 and mirage-stack 1.3.0 changes (mirage/mirage-tcpip#371 @hannesm)
  Arp no longer contains the type alias ethif
  Ethif no longer contains the type alias netif
  Static_ipv4 no longer contains the type alias ethif and prefix
  Ipv6 no longer contains the type alias ethif and prefix
  Mirage_protocols_lwt.IPV4 no longer contains the type alias ethif
  Mirage_protocols_lwt.UDPV4 and TCPV4 no longer contain the type alias ip
* remove unused types: 'a config, netif, and id from socket and direct stack (mirage/mirage-tcpip#371 @hannesm)
* remove usage of Result, depending on OCaml >= 4.03.0 (mirage/mirage-tcpip#372 @hannesm)

### v3.4.2 (2018-06-15)

Note the use of the new TCP keep-alive feature can cause excessive amounts
of memory to be used in some circumstances, see
  mirage/mirage-tcpip#367

* Ensure a zero UDP checksum is sent as 0xffff, not 0x0000 (mirage/mirage-tcpip#359 @stedolan)
* Avoid leaking a file descriptor in the socket stack if the connection fails (mirage/mirage-tcpip#363 @hannesm)
* Avoid raising an exception with `Lwt.fail` when `write` fails in the socket stack (mirage/mirage-tcpip#363 @hannesm)
* Ignore `EBADF` errors in `close` in the socket stack (mirage/mirage-tcpip#366 @hannesm)
* Emit a warning when TCP keep-alives are used (mirage/mirage-tcpip#368 @djs55)

### v3.4.1 (2018-03-09)

* expose tcp_socket_options in the socket stack, fixing downstream builds (mirage/mirage-tcpip#356 @yomimono)
* add missing dependencies and constraints (mirage/mirage-tcpip#354 @yomimono, mirage/mirage-tcpip#353 @rgrinberg)
* remove leftover ocamlbuild files (mirage/mirage-tcpip#353 @rgrinberg)

### v3.4.0 (2018-02-15)

* Add support for TCP keepalives (mirage/mirage-tcpip#338 @djs55)
* Fix TCP deadlock (mirage/mirage-tcpip#343 @mfp)
* Update the CI to test OCaml 4.04, 4.05, 4.06 (mirage/mirage-tcpip#344 @yomimono)

### v3.3.1 (2017-11-07)

* Add an example for user-space `ping`, and some socket ICMPv4 fixes (mirage/mirage-tcpip#336 @djs55)
* Make tcpip safe-string-safe (and buildable by default on OCaml 4.06.0) (mirage/mirage-tcpip#341 @djs55)

### v3.3.0 (2017-08-08)

* Test with current mirage-www master (mirage/mirage-tcpip#323 @yomimono)
* Improve the Tcp.Wire API (mirage/mirage-tcpip#325 @samoht)
* Add dependency from stack-unix to io-page-unix (@avsm)
* Replace dependency on cstruct.lwt with cstruct-lwt (mirage/mirage-tcpip#322 @yomimono)
* Update to lwt 3.0 (mirage/mirage-tcpip#326 @samoht)
* Replace oUnit with alcotest (mirage/mirage-tcpip#329 @samoht)
* Fix stub linking on Xen (mirage/mirage-tcpip#332 @djs55)
* Add support for ICMP sockets on Windows (mirage/mirage-tcpip#333 @djs55)

### v3.2.0 (2017-06-26)

* port to jbuilder. Build time is now roughly 4-5x faster than the old oasis-based build system.
* packs have been replaced by module aliases.

### v3.1.4 (2017-06-12)

* avoid linking to cstruct.ppx in the compiled library and only use it at build time (mirage/mirage-tcpip#316 @djs55)
* use improved packet size support in `mirage-vnetif>=0.4.0` to test the MTU fixes in mirage/mirage-tcpip#313.

### v3.1.3 (2017-05-23)

* involve the IP layer's MTU in the TCP MSS calculation (hopefully correctly) (mirage/mirage-tcpip#313, by @yomimono)

### v3.1.2 (2017-05-14)

* impose a maximum TCP MSS of 1460 to avoid sending over-large datagrams on 1500 MTU links
  (mirage/mirage-tcpip#309, by @hannesm)
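
The 1460 figure is the 1500-byte MTU minus a 20-byte IPv4 header and a 20-byte TCP header. A rough sketch of an MSS calculation that combines this cap with the MTU-based calculation from v3.1.3 follows. Here `mss_of_mtu` is a hypothetical helper, not the library's function, and it assumes headers without options.

```ocaml
(* Sketch only: hypothetical helper, not the tcpip library's API.
   Assumes IPv4 and TCP headers of 20 bytes each (no options). *)
let ipv4_header_len = 20
let tcp_header_len = 20
let max_mss = 1460

(* Largest TCP payload that fits in one IP datagram, capped at 1460. *)
let mss_of_mtu (mtu : int) : int =
  min max_mss (mtu - ipv4_header_len - tcp_header_len)
```

With these assumptions, a 1500-byte MTU gives an MSS of 1460, a 576-byte MTU gives 536, and a larger MTU is still capped at 1460.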

### v3.1.1 (2017-05-14)

* fix parsing 20-byte cstructs as ipv4 packets (mirage/mirage-tcpip#307, by @yomimono)
* udp: payload length parse fix (mirage/mirage-tcpip#307, by @yomimono)
* support lwt >= 2.7.0 (mirage/mirage-tcpip#308, by @djs55)

### v3.1.0 (2017-03-14)

* implement MTU setting and querying in the Ethernet module (compatibility with mirage-protocols version 1.1.0), and use this value to inform TCP's MSS. (mirage/mirage-tcpip#288, by @djs55)
* rename the ~payload argument of TCP/UDP marshallers to `~payload_len`, in an attempt to clarify that the payload will not be copied to the Cstruct.t returned by these functions (mirage/mirage-tcpip#301, by @talex5)
* functorize ipv6 over a random implementation (mirage/mirage-tcpip#298, by @olleolleolle and @hannesm)
* add tests for sending and receiving UDP packets over IPv6 (mirage/mirage-tcpip#300, by @mattgray)
* avoid float in TCP RTO calculations. (mirage/mirage-tcpip#295, by @olleolleolle and @mattgray)
* numerous bugfixes in header marshallers and unmarshallers (mirage/mirage-tcpip#301, by @talex5 and @yomimono)
* replace polymorphic equality in `_packet.equals` functions (mirage/mirage-tcpip#302, by @yomimono)

### v3.0.0 (2017-02-23)

* adapt to MirageOS 3 API changes (*many* PRs, from @hannesm, @samoht, and @yomimono):
  - replace error polyvars in many functions with result types
  - define and use error types
  - `connect` in various modules now returns the device directly or raises an exception
  - refer to mirage-protocols and mirage-stacks, rather than mirage-types
* if no UDP source port is given to UDP.write, choose a random one (mirage/mirage-tcpip#272, by @hannesm)
* remove `Ipv4.Routing.No_route_to_destination_address` exception; treat routing failures as normal packet loss in TCP (mirage/mirage-tcpip#269, by @yomimono)
* Ipv6.connect takes a list of IPs (mirage/mirage-tcpip#268, by @yomimono)
* remove exception "Refused" in TCP (mirage/mirage-tcpip#267, by @yomimono)
* remove DHCP module. Users may be interested in the replacement charrua-core (mirage/mirage-tcpip#260, by @yomimono)
* move Ipv4 to Static\_ipv4, which can be used by other IPv4 modules with their own configuration logic (mirage/mirage-tcpip#260, by @yomimono)
* remove `mode` from STACKV4 record and configuration; Ipv4.connect now requires address parameters and the module exposes no methods for modifying them. (mirage/mirage-tcpip#260, by @yomimono)
* remove unused `id` types no longer required by mirage-types (mirage/mirage-tcpip#255, by @yomimono)
* overhaul how `random` is used and handled (mirage/mirage-tcpip#254 and others, by @hannesm)
* fix redundant `memset` that zeroed out options in Tcp\_packet.Marshal.into\_cstruct (mirage/mirage-tcpip#250, by @balrajsingh)
* add vnetif backend for triggering fast retransmit in iperf tests (mirage/mirage-tcpip#248, by @MagnusS)
* fixes for incorrect timer values (mirage/mirage-tcpip#247, by @balrajsingh)
* add vnetif backend that drops packets with no payload (mirage/mirage-tcpip#246, by @MagnusS)
* fix a race when closing test pcap files (mirage/mirage-tcpip#246, by @MagnusS)

### v2.8.1 (2016-09-12)

* Set the TCP congestion window correctly when going into fast-recovery mode. (mirage/mirage-tcpip#244, by @balrajsingh)
* When TCP packet loss is discovered by timeout, allow transition into fast-recovery mode. (mirage/mirage-tcpip#244, by @balrajsingh)

### v2.8.0 (2016-04-04)

* Provide an implementation for the ICMPV4 module type defined in mirage-types 2.8.0.  Remove default ICMP handling from the IPv4 module, but preserve it in tcpip-stack-direct. (mirage/mirage-tcpip#195 by @yomimono)
* Explicitly require the use of an OCaml compiler >= 4.02.3. (mirage/mirage-tcpip#195 by @yomimono)
* Explicitly depend on `result`. (mirage/mirage-tcpip#195 by @yomimono)

### v2.7.0 (2016-03-20)

* Raise Invalid\_argument if given an invalid port number in listen_{tcp,udp}v4
  (mirage/mirage-tcpip#173 by @matildah and mirage/mirage-tcpip#175 by @hannesm)
* Improve TCP options marshalling/unmarshalling (mirage/mirage-tcpip#174 by @yomimono)
* Add state tests and fixes for closure conditions (mirage/mirage-tcpip#177 mirage/mirage-tcpip#176 by @yomimono)
* Remove bogus warning (mirage/mirage-tcpip#178 by @talex5)
* Clean up IPv6 stack (mirage/mirage-tcpip#179 by @nojb)
* RST checking from RFC5961 (mirage/mirage-tcpip#182 by @ppolv)
* Transform EPIPE exceptions into `Eof (mirage/mirage-tcpip#183 by @djs55)
* Improve error strings in IPv4 (mirage/mirage-tcpip#184 by @yomimono)
* Replace use of cstruct.syntax with cstruct.ppx (mirage/mirage-tcpip#188 by @djs55)
* Make the Unix subpackages optional, so the core builds on Win32
  (mirage/mirage-tcpip#191 by @djs55)
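
For context on the RFC 5961 entry above: that RFC classifies an incoming RST by comparing its sequence number with the receiver's next expected sequence number and receive window. The sketch below illustrates that classification only. The type and function names are hypothetical, it is not the library's implementation, and it assumes a 64-bit OCaml so that 32-bit sequence numbers fit in `int`.

```ocaml
(* Sketch only: hypothetical names, not the tcpip library's API.
   Assumes 64-bit OCaml ints holding 32-bit sequence numbers. *)
type rst_action =
  | Drop           (* out of window: silently discard *)
  | Reset          (* exact match with RCV.NXT: reset the connection *)
  | Challenge_ack  (* in window but not exact: send a challenge ACK *)

(* In-window test using modulo-2^32 sequence arithmetic. *)
let in_window ~rcv_nxt ~rcv_wnd seq =
  (seq - rcv_nxt) land 0xffffffff < rcv_wnd

let classify_rst ~rcv_nxt ~rcv_wnd seq =
  if seq = rcv_nxt then Reset
  else if in_window ~rcv_nxt ~rcv_wnd seq then Challenge_ack
  else Drop
```

The point of the scheme is that a blindly injected RST must guess the exact next sequence number to tear down a connection, rather than merely land somewhere in the window.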

### v2.6.1 (2015-09-15)

* Add optional arguments for settings in ip v6 and v4 connects (mirage/mirage-tcpip#170, by @Drup)
* Expose `Ipv4.Routing.No_route_to_destination_address` (mirage/mirage-tcpip#166, by @yomimono)

### v2.6.0 (2015-07-29)

* ARP now handles ARP frames, not Ethernet frames with ARP payload
  (mirage/mirage-tcpip#164, by @hannesm)
* Check length of received ethernet frame to avoid cstruct exceptions
  (mirage/mirage-tcpip#117, by @hannesm)
* Pull arpv4 module out of ipv4. Also add unit-tests for the newly created
  ARP library  (mirage/mirage-tcpip#155, by @yomimono)

### v2.5.1 (2015-07-07)

* Fix regression introduced in 2.5.0 where packet loss could cause the
  connection to become very slow (mirage/mirage-tcpip#157, @MagnusS, @talex5, @yomimono and
  @balrajsingh)
* Improve the tests: more logging, more tracing and compile to native code when
  available, etc (@MagnusS and @talex5)
* Do not raise `Invalid_argument("Lwt.wakeup_result")` every time a connection
  is closed. Also pass raised exceptions to `Lwt.async_exception_hook`
  instead of silently ignoring them, so the user can decide to shut down
  their application if something goes wrong (mirage/mirage-tcpip#153, mirage/mirage-tcpip#156, @yomimono and @talex5)
* The `channel` library now lives in a separate repository and is released
  separately (mirage/mirage-tcpip#159, @samoht)

### v2.5.0 (2015-06-10)

* The test runs now produce `.pcap` files (mirage/mirage-tcpip#141, by @MagnusS)
* Strip trailing bytes from network packets (mirage/mirage-tcpip#145, by @talex5)
* Add tests for uniform packet loss (mirage/mirage-tcpip#147, by @MagnusS)
* Fix a bug where out-of-order packets caused the ack and window to be set
  incorrectly (mirage/mirage-tcpip#140, mirage/mirage-tcpip#146)
* Properly handle RST packets (mirage/mirage-tcpip#107, mirage/mirage-tcpip#148)
* Add a `Log` module to control at runtime the debug statements which are
  displayed (mirage/mirage-tcpip#142)
* Writing in a PCB which does not have the right state now returns an error
  instead of blocking (mirage/mirage-tcpip#150)

### v2.4.3 (2015-05-05)

* Fix infinite loop in `Channel.read_line` when the line does not contain a CRLF
  sequence (mirage/mirage-tcpip#131)

### v2.4.2 (2015-04-29)

* Fix a memory leak in `Channel` (mirage/mirage-tcpip#119, by @yomimono)
* Add basic unit-test for channels (mirage/mirage-tcpip#119, by @yomimono)
* Add alcotest testing templates
* Modernize Travis CI scripts

### v2.4.1 (2015-04-21)

* Merge between 2.4.0 and 2.3.1

### v2.4.0 (2015-03-24)

* ARP improvements (mirage/mirage-tcpip#118)

### v2.3.1 (2015-03-31)

* Do not raise an assertion if an IP frame has extra trailing bytes (mirage/mirage-tcpip#221).

### v2.3.0 (2015-03-09)

* Fix `STACKV4` for the `DEVICE` signature which has `connect` removed
  (in Mirage types 2.3+).

### v2.2.3 (2015-03-09)

* Add ICMPv6 error reporting functions (mirage/mirage-tcpip#101)
* Add universal IP address converters (mirage/mirage-tcpip#108)
* Add `error_message` functions for human-readable errors (mirage/mirage-tcpip#98)
* Improve debug logging for ICMP Destination Unreachable packets.
* Filter incoming frames by MAC address to stop sending unnecessary RSTs. (mirage/mirage-tcpip#114)
* Unhook unused modules `Sliding_window` and `Profiler` from the build. (mirage/mirage-tcpip#112)
* Add an explicit `connect` method to the signatures. (mirage/mirage-tcpip#100)

### v2.2.2 (2015-01-11)

* Re-add tracing and ARP fixes which were accidentally reverted in the IPv6
  merge. (mirage/mirage-tcpip#96)

### v2.2.1 (2014-12-20)

* Use `Bytes` instead of `String` to begin the `-safe-string` migration in OCaml
  4.02.0 (mirage/mirage-tcpip#93).
* Remove dependency on `uint` to avoid the need for a C stub (mirage/mirage-tcpip#92).

### v2.2.0 (2014-12-18)

Add IPv6 support. This changeset minimises interface changes to the existing
`STACKV4` interfaces to facilitate a progressive merge.  The only visible
interface changes are:

* `IPV4.set_ipv4_*` functions have been renamed `IPV4.set_ip_*` because they
  are shared between IPV4 and IPV6.
* `IPV4.get_ipv4` and `get_ipv4_netmask` now return a `list` of `Ipaddr.V4.t`
  (again because this is the common semantics with IPV6.)
* Several types that had `v4` in their names (like `IPV4.ipv4addr`) have lost
  that particle.

### v2.1.1 (2014-12-12)

* Improve console printing for the DHCP client to output line
  breaks properly on Xen consoles.

### v2.1.0 (2014-12-07)

* Build Xen stubs separately, with `CFLAGS` from `mirage-xen` 2.1.0+.
  This allows us to use the red zone under x86_64 Unix again.
* Adding tracing labels and counters, which introduces a new dependency on the
  `mirage-profile` package.

### v2.0.3 (2014-12-05)

* Fixed race waiting for ARP response (mirage/mirage-tcpip#86).
* Move the code that configures IPv4 address, netmask and gateways
  after receiving a successful lease out of the `Dhcp_clientv4` module
  and into `Stackv4` (mirage/mirage-tcpip#87)

### v2.0.2 (2014-12-01)

* Add IPv4 multicast to MAC address mapping in IPv4 output processing
  (mirage/mirage-tcpip#81 from Luke Dunstan).
* Improve formatting of DHCP console logging, including printing out options
  (mirage/mirage-tcpip#83).
* Build with -mno-red-zone on x86_64 to avoid stack corruption on Xen (mirage/mirage-tcpip#80).

### v2.0.1 (2014-11-04)

* Fixed race condition in the signalling between the rx/tx threads under load.
* Experimentally switch to immediate ACKs in TCPv4 by default instead of delayed ones.

### v2.0.0 (2014-11-02)

* Moved 1s complement checksum C code here from mirage-platform.
* Depend on `Console_unix` and `Console_xen` instead of `Console`.
* [socket] Do not return an `Eof` when writing 0-length buffer (mirage/mirage-tcpip#76).
* [socket] Accept callbacks now run in async threads instead of being serialised
  (mirage/mirage-tcpip#75).

### v1.1.6 (2014-07-20)

* Quieten down the stack logging rate by not announcing IPv6 packet discards.
* Raise exception `Bad_option` for unparseable or invalid TCPv4 options (mirage/mirage-tcpip#57).
* Fix linking error with module `Tcp_checksum` by lifting it into top library
  (mirage/mirage-tcpip#60).
* Add `opam` file to permit easier local pinning, and fix Travis to use this.

### v1.1.5 (2014-06-18)

* Ensure that DHCP completes before the application is started, so that
  unikernels that establish outgoing connections can do so without a race.
  (fix from Mindy Preston in mirage/mirage-tcpip#53, followup in mirage/mirage-tcpip#55)
* Add `echo`, `chargen` and `discard` services into the `examples/`
  directory. (from Mindy Preston in mirage/mirage-tcpip#52).

### v1.1.4 (2014-06-03)

* [tcp] Fully process the last `ACK` in a 3-way handshake for server connections.
  This ensures that a `FIN` is correctly transmitted upon application-initiated
  connection close. (fix from Mindy Preston in mirage/mirage-tcpip#51).

### v1.1.3 (2014-03-01)

* Expose IPV4 through the STACKV4 interface.

### v1.1.2 (2014-03-27)

* Fix DHCP variable-length option parsing for MTU responses, which
  in turn improves robustness on Amazon EC2 (fix from @yomimono
  via mirage/mirage-tcpip#48)

### v1.1.1 (2014-02-21)

* Catch and ignore top-level socket exceptions (mirage/mirage-tcpip#219).
* Set `SO_REUSEADDR` on listening sockets for Unix (mirage/mirage-tcpip#218).
* Adapt the Stack interfaces to the v1.1.1 mirage-types interface
  (see mirage/mirage#226 for details).

### v1.1.0 (2014-02-03)

* Rewrite of the library as a set of functors that parameterize the
  stack across the `V1_LWT` module types from Mirage 1.1.x.  This removes
  the need to compile separate Xen and Unix versions of the stack.

### v0.9.5 (2013-12-08)

* Build for either Xen or Unix, depending on the value of the `OS` envvar.
* Shift to the `mirage-types` 0.5.0+ interfaces, which breaks the
  socket backend (temporarily).
* Port the direct stack to the new interfaces.
* Add Travis CI scripts.

### v0.9.4 (2013-08-09)

* Use the `Ipaddr` external library and remove the home-grown
  equivalents in `Nettypes`.

### v0.9.3 (2013-07-18)

* Changes in module Manager: Removed some functions from the `.mli`
  (plug/unplug) and modified the way the Manager
  interacts with the underlying module Netif. The Netif.create function
  does not take a callback anymore.

### v0.9.2 (2013-07-09)

* Improve TCP state machine for connection teardown.
* Limit fragment number to 8, and coalesce buffers if it goes higher.
* Adapt to mirage-platform-0.9.2 API changes.

### v0.9.1 (2013-06-12)

* Depend on mirage-platform-0.9.1 direct tuntap interfaces.
* Version bump to catch up with mirage-platform.

### v0.5.2 (2013-02-08)

* Encourage scatter-gather I/O all the time, rather than playing tricks
  with packet header buffers. This simplifies the output path considerably
  and cuts minor heap allocations down.
* Install the packed `cmx` along with the `cmxa` to ensure that the
  compiler can do cross-module optimization (this is not a fatal error,
  but will impact performance if the `cmx` file is not present).

### v0.5.1 (2012-12-20)

* Update socket stack to use Cstruct 0.6.0 API

### v0.5.0 (2012-12-20)

* Update Cstruct API to 0.6.0
* [tcp] write now blocks if the write buffer and write window are full

### v0.4.1 (2012-12-14)

* Add iperf self-test that creates two VIFs and transmits across
  them. This is a useful local test which stresses the bridge
  code using just one VM.
* Add support for attaching existing devices when initialising the
  network manager, via an optional `attached` parameter.
* Constrain TCP connect to be a `unit Lwt.t` instead of a polymorphic
  return value.
* Expose IPv4 netmask function.
* Reduce ARP verbosity to the console.
* Fix TCP fast recovery to wait until all in-flight packets are
  acked, rather than exiting early.

### v0.4.0 (2012-12-11)

* Require OCaml-4.00.0 or higher, and add relevant build fixes
  to deal with module packing.

### v0.3.1 (2012-12-10)

* Fix the DHCP client marshalling for IPv4 addresses.
* Expose the interface MAC address in the Manager signature.
* Tweak TCP ISN calculation to be more friendly on a 32-bit host.
* Add Manager.create ?devs to control the number of Netif devices
  constructed by default.
* Add Ethif.set/disable_promiscuous to permit directly tapping
  a network interface.

### v0.3.0 (2012-09-04)

* Initial public release.