Intermittent failure of test_rx_rate_limiting test #1809
This test seemingly started failing some time between late October and mid-November (hard to tell without timestamped logs), but the error has been getting more frequent lately. Neither the test runner AMI, the Docker image, nor the guest images changed, and there haven't been any suspicious changes in the rate limiter code. Since the CI improvements went in, we have more verbose logs.

Detailed investigation:

```bash
# Get all the CI logs
docker run --rm -it \
    -v "$PWD/.aws:/root/.aws" \
    -v "$PWD:/aws" \
    amazon/aws-cli \
    s3 sync s3://firecracker-pr/pr-logs/x86_64/ logs/

# Find PRs that failed the test_rx_rate_limiting test
grep -nr "test_rx_rate_limiting" logs/ \
    | grep -v PASSED \
    | grep -v setup | grep -v teardown \
    | grep -v call \
    | cut -d: -f1 | sort -u >failed_prs

# Isolate the log snippets with test_rx_rate_limiting
pattern='__ test_rx_rate_limiting\[ubuntu_with_ssh\]'
while read -r pr; do
    grep -q "$pattern" "$pr" && printf '\n%s\n' "$pr" >>rx_errors
    grep -A 6 "$pattern" "$pr" >>rx_errors
done <failed_prs

# Manual inspection of rx_errors shows intensifying bandwidth issues...

# Find iperf retransmission logs
for pr in $(grep '^logs/' rx_errors); do
    grep -q "Retr " "$pr" && printf '\n%s\n' "$pr"
    grep -A 3 "Retr " "$pr"
done
```

logs/01756/raduiliescu-firecracker-86c1bc2.log.html
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 3.24 GBytes 3397961 KBytes/sec 4 935 KBytes
[ 4] 1.00-2.00 sec 3.27 GBytes 3424657 KBytes/sec 0 935 KBytes
[ 4] 2.00-3.00 sec 3.30 GBytes 3462204 KBytes/sec 0 936 KBytes
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 76.3 MBytes 78088 KBytes/sec 101 182 KBytes
[ 4] 1.00-2.00 sec 103 MBytes 105437 KBytes/sec 2 954 KBytes
[ 4] 2.00-3.00 sec 92.5 MBytes 94750 KBytes/sec 2 954 KBytes
logs/01772/iulianbarbu-firecracker-d0610da.log.html
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 3.31 GBytes 3473313 KBytes/sec 0 954 KBytes
[ 4] 1.00-2.00 sec 3.42 GBytes 3583774 KBytes/sec 0 954 KBytes
[ 4] 2.00-3.00 sec 3.46 GBytes 3629465 KBytes/sec 0 954 KBytes
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 108 MBytes 110438 KBytes/sec 10 556 KBytes
[ 4] 1.00-2.00 sec 95.0 MBytes 97291 KBytes/sec 6 646 KBytes
[ 4] 2.00-3.00 sec 103 MBytes 105818 KBytes/sec 5 764 KBytes
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-0.35 sec 1001 MBytes 2937718 KBytes/sec 0 694 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-0.95 sec 100 MBytes 108433 KBytes/sec 4 950 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 53.0 MBytes 54298 KBytes/sec 8 700 KBytes
[ 4] 1.00-2.00 sec 50.5 MBytes 51741 KBytes/sec 320 451 KBytes
[ 4] 2.00-3.00 sec 50.6 MBytes 51790 KBytes/sec 255 281 KBytes
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 30.0 MBytes 30700 KBytes/sec 348 416 KBytes
[ 4] 1.00-2.00 sec 41.5 MBytes 42518 KBytes/sec 423 294 KBytes
[ 4] 2.00-3.00 sec 50.6 MBytes 51861 KBytes/sec 4 574 KBytes
logs/01784/serban300-firecracker-e52c46c.log.html
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 3.42 GBytes 3588923 KBytes/sec 0 950 KBytes
[ 4] 1.00-2.00 sec 3.37 GBytes 3531073 KBytes/sec 0 950 KBytes
[ 4] 2.00-3.00 sec 2.69 GBytes 2822642 KBytes/sec 0 950 KBytes
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 108 MBytes 110315 KBytes/sec 32 714 KBytes
[ 4] 1.00-2.00 sec 94.6 MBytes 96857 KBytes/sec 7 788 KBytes
[ 4] 2.00-3.00 sec 105 MBytes 107169 KBytes/sec 0 854 KBytes
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-0.29 sec 1001 MBytes 3497590 KBytes/sec 0 971 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-0.94 sec 100 MBytes 109730 KBytes/sec 18 865 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 52.8 MBytes 54061 KBytes/sec 11 546 KBytes
[ 4] 1.00-2.00 sec 51.2 MBytes 52429 KBytes/sec 4 590 KBytes
[ 4] 2.00-3.00 sec 51.1 MBytes 52375 KBytes/sec 7 629 KBytes
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 46.7 MBytes 47854 KBytes/sec 9 913 KBytes
[ 4] 1.00-2.00 sec 50.5 MBytes 51737 KBytes/sec 5 919 KBytes
[ 4] 2.00-3.00 sec 35.9 MBytes 36779 KBytes/sec 73 505 KBytes
logs/01790/iulianbarbu-firecracker-6c00e6a.log.html
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 3.33 GBytes 3488284 KBytes/sec 2 1008 KBytes
[ 4] 1.00-2.00 sec 3.53 GBytes 3703655 KBytes/sec 1 1008 KBytes
[ 4] 2.00-3.00 sec 3.57 GBytes 3739831 KBytes/sec 0 1008 KBytes
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 88.4 MBytes 90467 KBytes/sec 18 447 KBytes
[ 4] 1.00-2.00 sec 104 MBytes 106908 KBytes/sec 47 676 KBytes
[ 4] 2.00-3.00 sec 103 MBytes 105893 KBytes/sec 3 788 KBytes
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-0.36 sec 1001 MBytes 2844073 KBytes/sec 3 693 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-0.94 sec 100 MBytes 109205 KBytes/sec 58 682 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 29.5 MBytes 30184 KBytes/sec 30 284 KBytes
[ 4] 1.00-2.00 sec 51.1 MBytes 52333 KBytes/sec 185 392 KBytes
[ 4] 2.00-3.00 sec 46.7 MBytes 47787 KBytes/sec 4 595 KBytes
logs/01790/iulianbarbu-firecracker-8c80094.log.html
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 3.61 GBytes 3787847 KBytes/sec 1 950 KBytes
[ 4] 1.00-2.00 sec 3.65 GBytes 3821898 KBytes/sec 0 950 KBytes
[ 4] 2.00-3.00 sec 3.54 GBytes 3710287 KBytes/sec 0 950 KBytes
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 107 MBytes 109795 KBytes/sec 10 527 KBytes
[ 4] 1.00-2.00 sec 96.3 MBytes 98627 KBytes/sec 1 634 KBytes
[ 4] 2.00-3.00 sec 106 MBytes 108300 KBytes/sec 5 690 KBytes
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-0.29 sec 1001 MBytes 3481979 KBytes/sec 0 967 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 62.8 MBytes 64254 KBytes/sec 11 939 KBytes
[ 4] 1.00-1.36 sec 37.4 MBytes 105836 KBytes/sec 8 946 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
logs/01800/iulianbarbu-firecracker-41bb0a5.log.html
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 3.40 GBytes 3561895 KBytes/sec 1 1.04 MBytes
[ 4] 1.00-2.00 sec 3.43 GBytes 3600858 KBytes/sec 0 1.04 MBytes
[ 4] 2.00-3.00 sec 3.43 GBytes 3600682 KBytes/sec 0 1.04 MBytes
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 107 MBytes 109860 KBytes/sec 8 677 KBytes
[ 4] 1.00-2.00 sec 87.4 MBytes 89467 KBytes/sec 8 969 KBytes
[ 4] 2.00-3.00 sec 92.6 MBytes 94810 KBytes/sec 4 969 KBytes
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-0.31 sec 1000 MBytes 3285652 KBytes/sec 4 969 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-0.93 sec 100 MBytes 110349 KBytes/sec 4 2.06 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
--
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 43.3 MBytes 44355 KBytes/sec 168 440 KBytes
[ 4] 1.00-2.00 sec 45.8 MBytes 46894 KBytes/sec 9 495 KBytes
[ 4] 2.00-3.00 sec 45.4 MBytes 46518 KBytes/sec 6 831 KBytes
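To compare the runs above at a glance, the retransmissions can be tallied per iperf run. This is a minimal sketch, assuming the input is plain-text iperf3 client output in the format shown here (a header row ending in `Cwnd`, followed by interval rows reported in KBytes/sec), such as what the retransmission loop in the investigation prints. `retr_per_run` is a hypothetical helper, not part of Firecracker's tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Firecracker's tooling.
# Usage: retr_per_run < iperf_client_output
# For each iperf3 run (a "[ ID] ... Cwnd" header followed by interval rows),
# prints the number of interval rows, total retransmissions and mean KBytes/sec.
retr_per_run() {
    awk '
        # Print the stats for the run collected so far, then reset the counters.
        function flush() {
            if (n) printf "run %d: intervals=%d retr=%d mean_kbps=%.0f\n", run, n, retr, bw / n
            n = 0; retr = 0; bw = 0
        }
        # A header row with a Cwnd column starts a new run.
        $2 == "ID]" && $NF == "Cwnd" { flush(); run++; next }
        # Interval rows: [ 4] 0.00-1.00 sec 76.3 MBytes 78088 KBytes/sec 101 182 KBytes
        # Field 7 is the bandwidth, field 9 the retransmissions; NF == 11 skips
        # sender/receiver summary rows, which have no Cwnd column.
        $1 == "[" && $4 == "sec" && $8 == "KBytes/sec" && NF == 11 {
            n++; bw += $7; retr += $9
        }
        END { flush() }
    '
}
```

Fed the second run of the 01756 log (78088, 105437 and 94750 KBytes/sec with 101, 2 and 2 retransmissions), it prints `run 1: intervals=3 retr=105 mean_kbps=92758`.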
TCP slow start might make the initial bandwidth measurements in the rate limiter integ tests inaccurate, skewing the comparison between non-rate-limited traffic (slowed by the TCP congestion control algorithm) and rate-limited traffic (slowed by Firecracker). Letting traffic flow for 2 extra seconds to skip over TCP slow start.

Fixes firecracker-microvm#1809
Signed-off-by: Alexandra Iordache <aghecen@amazon.com>
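The effect this commit targets can be illustrated offline on the iperf output above: averaging a run's interval rows with and without the first second shows how much a slow first interval drags down the measured bandwidth. This is a post-processing sketch swapped in for illustration, not the commit's change (which lets traffic flow 2 extra seconds). `mean_bw_skip` is a hypothetical helper, and it assumes one run of iperf3 client output in the KBytes/sec format shown.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Firecracker's test suite.
# Usage: mean_bw_skip SKIP < iperf_client_output
# Averages the per-interval KBytes/sec column of one iperf3 run,
# ignoring the first SKIP interval rows (e.g. while TCP is in slow start).
mean_bw_skip() {
    local skip="$1"
    awk -v skip="$skip" '
        # Interval rows look like:
        # [ 4] 0.00-1.00 sec 76.3 MBytes 78088 KBytes/sec 101 182 KBytes
        # Summary rows (sender/receiver) have no Cwnd column, so NF differs.
        $1 == "[" && $4 == "sec" && $8 == "KBytes/sec" && NF == 11 {
            row++
            if (row > skip) { sum += $7; cnt++ }
        }
        END {
            if (cnt) printf "%.1f\n", sum / cnt
            else print "no samples"
        }
    '
}
```

On the second run of the 01756 log (78088, 105437 and 94750 KBytes/sec), `mean_bw_skip 0` gives 92758.3 while `mean_bw_skip 1` gives 100093.5. iperf3 can also drop the start of a test at measurement time with its `-O`/`--omit` option, which omits the first n seconds from the results; whether that fits the Firecracker test harness is not discussed in this thread.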
Looks like the test is still failing sometimes even with #1815 😞
Could this issue be kernel-related? I think I got rid of it locally by building the latest Linux kernel... Regressing to 4.14 also seemed to work better for me.
It only occurs intermittently; are you confident that building another kernel alleviates the issue?
I could reproduce it pretty consistently on my old, slow kit, which isn't supported from a Firecracker perspective. Perhaps my kit is so old that this test would always have failed, or something else was wrong in my environment and switching kernel versions resolved it. So on reflection, maybe it fixed a problem, but maybe not this problem.
But, if it helps at all, my failure rates are:
4.9 - 60%

So maybe it's fairer to say that 4.9 seems particularly bad, for me at least, rather than that 5.7 fixes it...
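Failure rates like the one above come from rerunning the test many times on each kernel. A minimal sketch of that measurement follows; `failure_rate` is a hypothetical helper, and since the thread does not give the exact command used to run the test, the command in the example comment is a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical helper: failure_rate RUNS CMD [ARGS...]
# Runs CMD RUNS times and prints the percentage of runs that exited non-zero
# (integer percentage, rounded down).
failure_rate() {
    local runs="$1"
    shift
    (( runs > 0 )) || return 1
    local failures=0 i
    for (( i = 0; i < runs; i++ )); do
        if ! "$@" >/dev/null 2>&1; then
            failures=$(( failures + 1 ))
        fi
    done
    echo "$(( failures * 100 / runs ))%"
}
# Example (placeholder command, not the actual Firecracker test invocation):
# failure_rate 20 ./run_rx_rate_limiting_test.sh
```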
test_rx_rate_limiting is failing intermittently.
Example of output: