Hi,
I currently try to stress-test an unikernel which uses the mirage-tcpip stack and see if we don't have a memory leak. I audited my application to check if new connections are properly closed - and by this way, check that if we have a memory leak, it's probably due to the underlying layer, in my case, mirage-tcpip.
The stress test is about a simple relay (compiled for Solo5/hvt and available here: bob-hvt). You can check the source code here: https://github.com/dinosaure/bob. The situation is: I try to launch 512 clients and check if they can find their peers from a shared password. On the other side, I monitor the relay.
The script (bash + parallel) to launch the stress is here:
#!/bin/bash
MAX=256
LEN=16
IP="10.0.0.2" # IP of the relay
PASSWORDS=()
for i in $(seq $MAX); do
P=$(echo $RANDOM | base64 | head -c $LEN);
while [[ " ${PASSWORDS[*]} " =~ " ${P} " ]]; do
P=$(echo $RANDOM | base64 | head -c $LEN);
done;
PASSWORDS+=( $P )
done
PASSWORDS=($(printf "%s\n" "${PASSWORDS[@]}" | sort -u))
IFS=$'\n'
COMMANDS=()
FILES=()
for i in ${!PASSWORDS[@]}; do
SEND="./bob.com send -r $IP file.txt -p ${PASSWORDS[$i]} -vvv 2> output.send.$i > /dev/null";
RECV="./bob.com recv -r $IP ${PASSWORDS[$i]} -vvv -o file.txt.$i 2> output.recv.$i > /dev/null";
COMMANDS+=( $SEND )
COMMANDS+=( $RECV )
FILES+=( "file.txt.$i" )
done
COMMANDS=( $(shuf -e "${COMMANDS[@]}") )
parallel -j $MAX ::: ${COMMANDS[@]}
HASHES=($(sha256sum ${FILES[@]}))
rm ${FILES[@]}
for hash in ${HASHES[@]}; do
echo $hash
done
It requires bob.com as the client, a random file.txt and the relay deployed somewhere (by default, we use one deployed at osau.re, my server). If you launch the stress-test, you can see that the relay lets clients to solve the initial handshake and allocate a secure room to let associated peers to share something (in my example, a simple file).
If I launch the relay with 40M, you can launch only one time such script. The next time will raise an Out_of_memory exception on the relay side. I tried to inrospect this memory leak. bob is quite simple (no TLS, just a bunch of packets between peers). I used mirage-memprof to introspect the memory used by my relay. You must change the config.ml and the unikernel.ml such as:
diff --git a/mirage/config.ml b/mirage/config.ml
index 01037e1..03874db 100644
--- a/mirage/config.ml
+++ b/mirage/config.ml
@@ -21,10 +21,12 @@ let bob =
package "bob" ~pin:"git+https://github.com/dinosaure/bob.git";
package "spoke" ~sublibs:[ "core" ];
package "psq";
+ package "memtrace-mirage";
]
~keys:[ Key.v port; Key.v secure_port ]
- (time @-> stackv4v6 @-> job)
+ (pclock @-> time @-> stackv4v6 @-> job)
+let pclock = default_posix_clock
let time = default_time
let stackv4v6 = generic_stackv4v6 default_network
-let () = register "bob" [ bob $ time $ stackv4v6 ]
+let () = register "bob" [ bob $ pclock $ time $ stackv4v6 ]
diff --git a/mirage/unikernel.ml b/mirage/unikernel.ml
index 1f34edf..bea018b 100644
--- a/mirage/unikernel.ml
+++ b/mirage/unikernel.ml
@@ -52,7 +52,7 @@ let serve_when_ready :
let always v _ = v
-module Make (Time : Mirage_time.S) (Stack : Tcpip.Stack.V4V6) = struct
+module Make (Pclock : Mirage_clock.PCLOCK) (Time : Mirage_time.S) (Stack : Tcpip.Stack.V4V6) = struct
let service ~port tcp =
let queue = Queue.create () in
let condition = Lwt_condition.create () in
@@ -437,8 +437,13 @@ module Make (Time : Mirage_time.S) (Stack : Tcpip.Stack.V4V6) = struct
end
module Bob_clear = Make (Stack.TCP)
+ module Memtrace = Memtrace.Make(Pclock)(Stack.TCP)
- let start _time stack =
+ let start _pclock _time stack =
+ ( Stack.TCP.create_connection (Stack.tcp stack) (Ipaddr.of_string_exn "10.0.0.1", 5678) >|= function
+ | Ok flow -> ignore (Memtrace.start_tracing ~context:None ~sampling_rate:1e-4 flow)
+ | Error e -> Logs.warn (fun m -> m "couldn't connect to tracing sink %a"
+ Stack.TCP.pp_error e)) >>= fun () ->
let rooms = Bob.Secured.make () in
let handshake socket = Lwt.return (Ok (socket, Stack.TCP.dst socket)) in
Lwt.join
Then, just before to launch the unikernel with solo5-hvt, you can retrieve the trace via: netcat -l -p 5678 > bob.trace. Afterthat, you can use the file with memtrace-viewer (on 4.13.1). From memtrace-viewer, I tried to find what is allocated and still alive just before the Out_of_memory. The nice things is: if you launch stress.sh one time, waiting few seconds, and relaunch it, you can see a "plateau" where almost everything should be a garbage. The second launch of stress.sh will call the GC but it seems clear for me that the GC did not delete everything.
One this picture, we can see the GC in action, few objects allocated before are GCed:

If we introspect more, we can see that some allocations are kept. Let's take an area where we have such allocation:


If we see what is allocated, the memtrace-viewer mentions new_server_connection. Moreover, it seems that a "thread" is kept infinitely, specially the send_empty_ack "thread":

|
let thread t pcb ~send_ack ~rx_ack = |
|
let { wnd; ack; _ } = pcb in |
|
|
|
(* Transmit an empty ack when prompted by the Ack thread *) |
|
let rec send_empty_ack () = |
|
Lwt_mvar.take send_ack >>= fun _ -> |
|
let ack_number = Window.rx_nxt wnd in |
|
let flags = Segment.No_flags in |
|
let options = [] in |
|
let seq = Window.tx_nxt wnd in |
|
ACK.transmit ack ack_number >>= fun () -> |
|
xmit_pcb t.ip pcb.id ~flags ~wnd ~options ~seq (Cstruct.create 0) >>= |
|
fun _ -> (* TODO: what to do if sending failed. Ignoring |
|
* errors gives us the same behavior as if the packet |
|
* was lost in transit *) |
|
send_empty_ack () in |
|
(* When something transmits an ACK, tell the delayed ACK thread *) |
|
let rec notify () = |
|
Lwt_mvar.take rx_ack >>= fun ack_number -> |
|
ACK.transmit ack ack_number >>= fun () -> |
|
notify () in |
|
send_empty_ack () <&> (notify ()) |
As you can see, this thread is an infinite loop. This is where the control flow becomes complex. It seems that such thread can be cancelled if an other thread raises an exception:
|
let catch_and_cancel = function |
|
| Lwt.Canceled -> () |
|
| ex -> |
|
(* cancel the other threads *) |
|
List.iter Lwt.cancel threads; |
|
Log.err (fun f -> f "thread failure: [%s]. Terminating threads and closing connection" |
|
(Printexc.to_string ex)); |
|
on_close (); |
|
!Lwt.async_exception_hook ex |
|
in |
|
List.iter (fun t -> Lwt.on_failure t catch_and_cancel) threads; |
But it's clear that in my situation, the cancellation never happens. It's really hard to follow what is going on because it's a mix between the lwt control flow (bind & async), exceptions and cancellation... I probably missed something about the cancellation of the send_empty_ack thread but the reality (memtrace-viewer) shows me that such thread is never cancelled. Due to the fact that this thread require the pcb value, it kepts multiples values and, specially the initial Cstruct.t given by Netif. So, of course, if such problem appears several times, we just accumulate Cstruct.t and a fiber under the hood.
On the other side (the application side), I added few logs which permit to "track" connections. And if you launch the stress.sh the first time and introspect logs, you can see that we have 512 new connections, and Flow.close is called 512 times. That mostly means that, on the application level, we should not have a leak about our usage of the TCP/IP stack (however, if you look memtrace-viewer, you can see few references to what Bob does, and it's normal, we want to keep a global state of the relay).
Finally, I don't expect a fix because I know that this project became more and more complex - and it's difficult to really understand what is going on. But if anyone has time to look at this leak in more detail, it would be very nice.
Hi,
I currently try to stress-test an unikernel which uses the
mirage-tcpipstack and see if we don't have a memory leak. I audited my application to check if new connections are properly closed - and by this way, check that if we have a memory leak, it's probably due to the underlying layer, in my case,mirage-tcpip.The stress test is about a simple relay (compiled for Solo5/hvt and available here: bob-hvt). You can check the source code here: https://github.com/dinosaure/bob. The situation is: I try to launch 512 clients and check if they can find their peers from a shared password. On the other side, I monitor the relay.
The script (
bash+parallel) to launch the stress is here:It requires
bob.comas the client, a randomfile.txtand the relay deployed somewhere (by default, we use one deployed atosau.re, my server). If you launch the stress-test, you can see that the relay lets clients to solve the initial handshake and allocate a secure room to let associated peers to share something (in my example, a simple file).If I launch the relay with 40M, you can launch only one time such script. The next time will raise an
Out_of_memoryexception on the relay side. I tried to inrospect this memory leak.bobis quite simple (no TLS, just a bunch of packets between peers). I usedmirage-memprofto introspect the memory used by my relay. You must change theconfig.mland theunikernel.mlsuch as:Then, just before to launch the unikernel with
solo5-hvt, you can retrieve the trace via:netcat -l -p 5678 > bob.trace. Afterthat, you can use the file withmemtrace-viewer(on 4.13.1). Frommemtrace-viewer, I tried to find what is allocated and still alive just before theOut_of_memory. The nice things is: if you launchstress.shone time, waiting few seconds, and relaunch it, you can see a "plateau" where almost everything should be a garbage. The second launch ofstress.shwill call the GC but it seems clear for me that the GC did not delete everything.One this picture, we can see the GC in action, few objects allocated before are GCed:

If we introspect more, we can see that some allocations are kept. Let's take an area where we have such allocation:

If we see what is allocated, the

memtrace-viewermentionsnew_server_connection. Moreover, it seems that a "thread" is kept infinitely, specially thesend_empty_ack"thread":mirage-tcpip/src/tcp/flow.ml
Lines 145 to 166 in 4bae549
As you can see, this thread is an infinite loop. This is where the control flow becomes complex. It seems that such thread can be cancelled if an other thread raises an exception:
mirage-tcpip/src/tcp/flow.ml
Lines 384 to 394 in 4bae549
But it's clear that in my situation, the cancellation never happens. It's really hard to follow what is going on because it's a mix between the lwt control flow (bind & async), exceptions and cancellation... I probably missed something about the cancellation of the
send_empty_ackthread but the reality (memtrace-viewer) shows me that such thread is never cancelled. Due to the fact that this thread require thepcbvalue, it kepts multiples values and, specially the initialCstruct.tgiven byNetif. So, of course, if such problem appears several times, we just accumulateCstruct.tand a fiber under the hood.On the other side (the application side), I added few logs which permit to "track" connections. And if you launch the
stress.shthe first time and introspect logs, you can see that we have 512 new connections, andFlow.closeis called 512 times. That mostly means that, on the application level, we should not have a leak about our usage of the TCP/IP stack (however, if you lookmemtrace-viewer, you can see few references to whatBobdoes, and it's normal, we want to keep a global state of the relay).Finally, I don't expect a fix because I know that this project became more and more complex - and it's difficult to really understand what is going on. But if anyone has time to look at this leak in more detail, it would be very nice.