Skip to content

Possible memory leak on new_server_connection #499

@dinosaure

Description

@dinosaure

Hi,

I currently try to stress-test an unikernel which uses the mirage-tcpip stack and see if we don't have a memory leak. I audited my application to check if new connections are properly closed - and by this way, check that if we have a memory leak, it's probably due to the underlying layer, in my case, mirage-tcpip.

The stress test is about a simple relay (compiled for Solo5/hvt and available here: bob-hvt). You can check the source code here: https://github.com/dinosaure/bob. The situation is: I try to launch 512 clients and check if they can find their peers from a shared password. On the other side, I monitor the relay.

The script (bash + parallel) to launch the stress is here:

#!/bin/bash

MAX=256
LEN=16
IP="10.0.0.2" # IP of the relay

PASSWORDS=()
for i in $(seq $MAX); do
	P=$(echo $RANDOM | base64 | head -c $LEN);
	while [[ " ${PASSWORDS[*]} " =~ " ${P} " ]]; do
		P=$(echo $RANDOM | base64 | head -c $LEN);
	done;
	PASSWORDS+=( $P )
done

PASSWORDS=($(printf "%s\n" "${PASSWORDS[@]}" | sort -u))

IFS=$'\n'
COMMANDS=()
FILES=()
for i in ${!PASSWORDS[@]}; do
	SEND="./bob.com send -r $IP file.txt -p ${PASSWORDS[$i]} -vvv 2> output.send.$i > /dev/null";
	RECV="./bob.com recv -r $IP ${PASSWORDS[$i]} -vvv -o file.txt.$i 2> output.recv.$i > /dev/null";
	COMMANDS+=( $SEND )
	COMMANDS+=( $RECV )
	FILES+=( "file.txt.$i" )
done

COMMANDS=( $(shuf -e "${COMMANDS[@]}") )

parallel -j $MAX ::: ${COMMANDS[@]}

HASHES=($(sha256sum ${FILES[@]}))
rm ${FILES[@]}

for hash in ${HASHES[@]}; do
	echo $hash
done

It requires bob.com as the client, a random file.txt and the relay deployed somewhere (by default, we use one deployed at osau.re, my server). If you launch the stress-test, you can see that the relay lets clients to solve the initial handshake and allocate a secure room to let associated peers to share something (in my example, a simple file).

If I launch the relay with 40M, you can launch only one time such script. The next time will raise an Out_of_memory exception on the relay side. I tried to inrospect this memory leak. bob is quite simple (no TLS, just a bunch of packets between peers). I used mirage-memprof to introspect the memory used by my relay. You must change the config.ml and the unikernel.ml such as:

diff --git a/mirage/config.ml b/mirage/config.ml
index 01037e1..03874db 100644
--- a/mirage/config.ml
+++ b/mirage/config.ml
@@ -21,10 +21,12 @@ let bob =
         package "bob" ~pin:"git+https://github.com/dinosaure/bob.git";
         package "spoke" ~sublibs:[ "core" ];
         package "psq";
+        package "memtrace-mirage";
       ]
     ~keys:[ Key.v port; Key.v secure_port ]
-    (time @-> stackv4v6 @-> job)
+    (pclock @-> time @-> stackv4v6 @-> job)
 
+let pclock = default_posix_clock
 let time = default_time
 let stackv4v6 = generic_stackv4v6 default_network
-let () = register "bob" [ bob $ time $ stackv4v6 ]
+let () = register "bob" [ bob $ pclock $ time $ stackv4v6 ]
diff --git a/mirage/unikernel.ml b/mirage/unikernel.ml
index 1f34edf..bea018b 100644
--- a/mirage/unikernel.ml
+++ b/mirage/unikernel.ml
@@ -52,7 +52,7 @@ let serve_when_ready :
 
 let always v _ = v
 
-module Make (Time : Mirage_time.S) (Stack : Tcpip.Stack.V4V6) = struct
+module Make (Pclock : Mirage_clock.PCLOCK) (Time : Mirage_time.S) (Stack : Tcpip.Stack.V4V6) = struct
   let service ~port tcp =
     let queue = Queue.create () in
     let condition = Lwt_condition.create () in
@@ -437,8 +437,13 @@ module Make (Time : Mirage_time.S) (Stack : Tcpip.Stack.V4V6) = struct
   end
 
   module Bob_clear = Make (Stack.TCP)
+  module Memtrace = Memtrace.Make(Pclock)(Stack.TCP)
 
-  let start _time stack =
+  let start _pclock _time stack =
+    ( Stack.TCP.create_connection (Stack.tcp stack) (Ipaddr.of_string_exn "10.0.0.1", 5678) >|= function
+    | Ok flow -> ignore (Memtrace.start_tracing ~context:None ~sampling_rate:1e-4 flow)
+    | Error e -> Logs.warn (fun m -> m "couldn't connect to tracing sink %a"
+                              Stack.TCP.pp_error e)) >>= fun () ->
     let rooms = Bob.Secured.make () in
     let handshake socket = Lwt.return (Ok (socket, Stack.TCP.dst socket)) in
     Lwt.join

Then, just before to launch the unikernel with solo5-hvt, you can retrieve the trace via: netcat -l -p 5678 > bob.trace. Afterthat, you can use the file with memtrace-viewer (on 4.13.1). From memtrace-viewer, I tried to find what is allocated and still alive just before the Out_of_memory. The nice things is: if you launch stress.sh one time, waiting few seconds, and relaunch it, you can see a "plateau" where almost everything should be a garbage. The second launch of stress.sh will call the GC but it seems clear for me that the GC did not delete everything.

One this picture, we can see the GC in action, few objects allocated before are GCed:
Screenshot 2022-11-03 133944

If we introspect more, we can see that some allocations are kept. Let's take an area where we have such allocation:
Screenshot 2022-11-03 134101

Screenshot 2022-11-03 134241

If we see what is allocated, the memtrace-viewer mentions new_server_connection. Moreover, it seems that a "thread" is kept infinitely, specially the send_empty_ack "thread":
Screenshot 2022-11-03 134344

let thread t pcb ~send_ack ~rx_ack =
let { wnd; ack; _ } = pcb in
(* Transmit an empty ack when prompted by the Ack thread *)
let rec send_empty_ack () =
Lwt_mvar.take send_ack >>= fun _ ->
let ack_number = Window.rx_nxt wnd in
let flags = Segment.No_flags in
let options = [] in
let seq = Window.tx_nxt wnd in
ACK.transmit ack ack_number >>= fun () ->
xmit_pcb t.ip pcb.id ~flags ~wnd ~options ~seq (Cstruct.create 0) >>=
fun _ -> (* TODO: what to do if sending failed. Ignoring
* errors gives us the same behavior as if the packet
* was lost in transit *)
send_empty_ack () in
(* When something transmits an ACK, tell the delayed ACK thread *)
let rec notify () =
Lwt_mvar.take rx_ack >>= fun ack_number ->
ACK.transmit ack ack_number >>= fun () ->
notify () in
send_empty_ack () <&> (notify ())

As you can see, this thread is an infinite loop. This is where the control flow becomes complex. It seems that such thread can be cancelled if an other thread raises an exception:

let catch_and_cancel = function
| Lwt.Canceled -> ()
| ex ->
(* cancel the other threads *)
List.iter Lwt.cancel threads;
Log.err (fun f -> f "thread failure: [%s]. Terminating threads and closing connection"
(Printexc.to_string ex));
on_close ();
!Lwt.async_exception_hook ex
in
List.iter (fun t -> Lwt.on_failure t catch_and_cancel) threads;

But it's clear that in my situation, the cancellation never happens. It's really hard to follow what is going on because it's a mix between the lwt control flow (bind & async), exceptions and cancellation... I probably missed something about the cancellation of the send_empty_ack thread but the reality (memtrace-viewer) shows me that such thread is never cancelled. Due to the fact that this thread require the pcb value, it kepts multiples values and, specially the initial Cstruct.t given by Netif. So, of course, if such problem appears several times, we just accumulate Cstruct.t and a fiber under the hood.

On the other side (the application side), I added few logs which permit to "track" connections. And if you launch the stress.sh the first time and introspect logs, you can see that we have 512 new connections, and Flow.close is called 512 times. That mostly means that, on the application level, we should not have a leak about our usage of the TCP/IP stack (however, if you look memtrace-viewer, you can see few references to what Bob does, and it's normal, we want to keep a global state of the relay).

Finally, I don't expect a fix because I know that this project became more and more complex - and it's difficult to really understand what is going on. But if anyone has time to look at this leak in more detail, it would be very nice.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions