Memory leak on FreeBSD 12.1+ when communicating over `vici` socket. #966

mskalski · 2022-03-22T11:46:10Z

System (please complete the following information):

OS: FreeBSD 12.1, 12.2, 13.0, 13.1-BETA2
Kernel version (if applicable): 13.1-BETA2, the same on 12.1+
strongSwan version(s): latest from git (FreeBSD strongSwan U5.9.5/K13.1-BETA2)
Tested/confirmed with the latest version: yes

Describe the bug
On FreeBSD system (amd64, arm64) when communicating over vici socket memory leaks in terms of constantly increasing Virtual and Resident (VSS and RSS) memory of process occur, until all system memory is exhausted, when process charon is killed by kernel with message kernel: pid 903 (charon), jid 0, uid 0, was killed: failed to reclaim memory.

Any tool for memory leak detection tools (valgrind, ktrace) does not detect any memory leaks, increasing RSS is the only symptom.

Also this behaviour is not observed on Linux (tested on Ubuntu 20.04 and Debian 10 bookworm/sid).

To Reproduce
Steps to reproduce the behavior:

Download any vm image with FreeBSD 12.0+ (was tested also on latest amd64 13.1-BETA2 to confirm)
Configure virtual machine; for strongswan compilation give more memory, but for test 256 MB is enough.
Run VM and disable swap (to speed-up failure)
```
# swapoff /dev/gpt/swapfs
```

install required packages for strongswan compilation:

# pkg install git autoconf gperf autoconf-archive libtool m4 automake flex bison pkgconf gettext

get strongswan: git clone https://github.com/strongswan/strongswan

Compile strongswan:

cd strongswan
./autogen.sh
./configure --disable-kernel-netlink --enable-kernel-pfroute --enable-kernel-pfkey --disable-gmp --enable-openssl --enable-mediation --disable-scripts --with-group=wheel --enable-gcm --enable-ccm --enable-pkcs11
make -j4
make install

start strongswan: ipsec start

run in loop any command which communicates on vici interface, swanctl --stats is enough to reproduce error:

sh -c 'while swanctl --stats >/dev/null; do true; done'

or

sh -c 'while swanctl --stats >/dev/null; do sleep 0.1; done'

Observe increase of VSS and RSS (Virtual and resident) memory of charon process, using e.g. top or attached memlogger.py
After few hours charon should be killed by kernel due to not enough memory/swap space.

Expected behavior
Memory used by charon process should not constantly increase, and should not finally cause charon to be killed.

Logs/Backtraces

Attaching simple python script to monitor and write charon memory usage: memlogger.py - requires python3 and python package psutil (pkg install py38-psutil).
result of that command charon_mem.log
Increase of RSS memory used by charon from 9MB to 95MB took about 1.5 hour

Additional context
Problem occurred when monitored via vici socket state of charon daemon (tunnel definitions, SAs, etc), but it was also reproduced using simple swanctl --stats command repeated in loop.

No change in this beaviour is observed when using different configure's --with-printf-hooks= -- according to issue in pfsense: https://redmine.pfsense.org/issues/5149 this could be the reason, but tests with --with-printf-hooks=builtin, --with-printf-hooks=glibc and --with-printf-hooks=vstr did not fix the error.

The text was updated successfully, but these errors were encountered:

tobiasbrunner · 2022-03-22T12:08:46Z

Does it also happen if you use VICI over TCP instead of a UNIX socket?

mskalski · 2022-03-22T12:58:33Z

Changed charon.plugins.vici.socket = tcp://0.0.0.0:4502 in /usr/local/etc/strongswan.d/charon/vici.conf and swanctl.socket = tcp://127.0.0.1:4502 in /usr/local/etc/strongswan.d/swanctl.conf, attaching
charon_mem-tcp.log and RSS memory is still increasing.

In fact I think this behaviour is because frequent memory allocation and deallocation (malloc/free functions), which is used in vici plugin. And I observed that this increase can also be caused by SA renegotiations, but that is harder to isolate. I make some tests then.

One more note: I first observed this behaviour with strongswan version FreeBSD strongSwan U5.9.2/K12.1-RELEASE-p7(which was first version I started work with, not tested earlier).

tobiasbrunner · 2022-03-22T13:37:36Z

In fact I think this behaviour is because frequent memory allocation and deallocation (malloc/free functions), which is used in vici plugin.

That would indicate an issue in the memory allocator of FreeBSD's libc (fragmentation or something) or maybe even the kernel, not really strongSwan (in particular because it apparently isn't an issue on Linux). Did you have a look at the man page for FreeBSD's malloc()? There seem to be various options to tune it. For instance, the number of arenas or thread-specific caching (there are also some debug options).

mskalski · 2022-03-22T22:51:23Z

Yes, It's possible, that problem is due to wrong tuning of FreeBSD memory allocator.

But on the other hand - there is no special malloc configuration for charon daemon AFAIK and other applications on FreeBSD box are not affected, which are i.e. some running python daemons (which I believe do massive allocations and use multiple threads). What is specific in a way strongswan allocates memory that RSS process memory is increasing so much?

I made similar test test on FreeBSD 9.3 VM, which is the last release before incorporating jemalloc allocator (also downloaded VM image from freebsd archives) and results are the same. I had to enable --enable-kernel-libipsec option during strongswan compilation, because ipsec module is probably not compiled into 9.3 GENERIC kernel (specific net.inet.ipsec sysctls do not exist) and charon daemon could not start without working kernel-ipsec plugin, but test was similar, I used looped swanctl --stats command.

(Attached charon_mem.log)

I also changed definition of FreeBSD 13.1 VM to use only one cpu core, but this did not change anything.

Here is result of following command on 1CPU FreeBSD13.1-BETA2:

sh -c "MALLOC_CONF='stats_print:true,narenas:1' /usr/local/libexec/ipsec/charon 2>/var/log/charon-memdump-0.log"

Attached log of periodic monitoring of used memory charon_mem-1cpu-memstats.log and stats dumped by jemalloc library: charon-jemalloc-stats.log at charon daemon exit.

Unfortunately I don't know anything about debugging of jemalloc, so these numbers are rather cryptic for me :-)

tobiasbrunner · 2022-03-23T14:17:02Z

What is specific in a way strongswan allocates memory that RSS process memory is increasing so much?

No idea whatsoever.

I made similar test test on FreeBSD 9.3 VM, which is the last release before incorporating jemalloc allocator (also downloaded VM image from freebsd archives) and results are the same.

Unless the previous allocator behaved similarly, it might be something else then (maybe an issue with sockets).

Unfortunately I don't know anything about debugging of jemalloc, so these numbers are rather cryptic for me :-)

Don't really know either. Have you checked the jemalloc tuning guide, and the heap profiling and leak checking pages?

mskalski · 2022-03-23T17:42:27Z

Thank you very much for your help.

Yes, I've read these pages, unfortunately heap profiling and leak checking require profiling support which is not compiled into FreeBSD's libc by default. It will require recompile, which will be next step if I cannot find solution.

I filled bug report on FreeBSD bugzilla, maybe someone from FreeBSD team would notice something obvious I cannot see: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262743
I will add comments here when I will know more.

I also remade again test with valgrind just to be sure, attaching log - no real leaks, but showed some warnings which can be interesting.

The same test, using swanctl --stats in a loop, (test last about 30 seconds, as valgrind requires much more memory):

root@freebsd13-1:~ # sh -c 'valgrind -s --leak-check=full --show-leak-kinds=all /usr/local/libexec/ipsec/charon 2> /var/log/charon-valgind-full.log'
00[DMN] Starting IKE charon daemon (strongSwan 5.9.5, FreeBSD 13.1-BETA2, amd64)
00[CFG] PKCS11 module '<name>' lacks library path
00[KNL] unable to set IPSEC_POLICY on socket: Protocol not available
00[NET] installing IKE bypass policy failed
00[KNL] unable to set IPSEC_POLICY on socket: Protocol not available
00[NET] installing IKE bypass policy failed
00[KNL] unable to set UDP_ENCAP: Invalid argument
00[NET] enabling UDP decapsulation for IPv6 on port 4500 failed
00[KNL] unable to set IPSEC_POLICY on socket: Protocol not available
00[NET] installing IKE bypass policy failed
00[KNL] unable to set IPSEC_POLICY on socket: Protocol not available
00[NET] installing IKE bypass policy failed
00[KNL] unable to set UDP_ENCAP: Protocol not available
00[NET] enabling UDP decapsulation for IPv4 on port 4500 failed
00[CFG] loading ca certificates from '/usr/local/etc/ipsec.d/cacerts'
00[CFG] loading aa certificates from '/usr/local/etc/ipsec.d/aacerts'
00[CFG] loading ocsp signer certificates from '/usr/local/etc/ipsec.d/ocspcerts'
00[CFG] loading attribute certificates from '/usr/local/etc/ipsec.d/acerts'
00[CFG] loading crls from '/usr/local/etc/ipsec.d/crls'
00[CFG] loading secrets from '/usr/local/etc/ipsec.secrets'
00[LIB] loaded plugins: charon pkcs11 aes des rc2 sha2 sha1 md5 random nonce x509 revocation constraints pubkey pkcs1 pkcs7 pkcs8 pkcs12 pgp dnskey sshkey pem openssl fips-prf curve25519 xcbc cmac hmac ccm gcm drbg attr kernel-pfkey kernel-pfroute resolve socket-default stroke vici updown xauth-generic counters
00[DMN] removing pidfile '/var/run/charon.pid', process not running
00[JOB] spawning 16 worker threads
^C00[DMN] SIGINT received, shutting down

And valgrind's log: charon-valgrind-full.log

tobiasbrunner · 2022-03-24T17:25:25Z

While not fixing the issue, setting tcache:false in MALLOC_CONF seems to reduce the rate of the increase (to less than 20% of the default rate, at least in my quick tests). We constantly allocate memory in one thread and free it in another, which might be an issue for the allocator.

mskalski · 2022-08-08T16:15:37Z

Thank you again for all the help.

No change in this beaviour is observed when using different configure's --with-printf-hooks= -- according to issue in pfsense: https://redmine.pfsense.org/issues/5149 this could be the reason, but tests with --with-printf-hooks=builtin, --with-printf-hooks=glibc and --with-printf-hooks=vstr did not fix the error.

I was wrong - the --with-printf-hooks=builtin seems to resolve the problem. I must have missed that fact during previous tests.

I made long-lasting test (1.5 week) with more aggresive stressing and it looks that just few hours after start, memory footprint increases (not as much as for libc printf-hook), but when system needs more memory, unused one is released.

Note graph below - every day at 3:01 AM starts some cron job which checks the whole system, downloads some CVE reports, which is rather memory consuming task - on the graph you can see drops of used RSS memory at that time:

Attaching also script for tracking (sh-only based) and log of RSS memory usage for charon compiled with the --with-printf-hooks=builtin configure option.

memlogger.sh
charon_mem.log.gz

It looks like this option resolves my problem. Note, that strongswan FreeBSD package is by default compiled with this option, and when manual building from ports, this option can be changed (although it is also by default set to builtin).

Maybe this note is worth of adding to the FreeBSD installation documentation?
Or just maybe --with-printf-hooks=builtin configure option should be default for the FreeBSD OS?

I think this issue can be closed.

tobiasbrunner · 2022-08-18T08:04:28Z

Thanks for the update. I've added the option and a note to the docs.

mskalski added bug new labels Mar 22, 2022

tobiasbrunner removed the new label Mar 22, 2022

tobiasbrunner closed this as completed Aug 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Memory leak on FreeBSD 12.1+ when communicating over `vici` socket. #966

Memory leak on FreeBSD 12.1+ when communicating over `vici` socket. #966

mskalski commented Mar 22, 2022 •

edited

tobiasbrunner commented Mar 22, 2022

mskalski commented Mar 22, 2022 •

edited

tobiasbrunner commented Mar 22, 2022

mskalski commented Mar 22, 2022 •

edited

tobiasbrunner commented Mar 23, 2022

mskalski commented Mar 23, 2022 •

edited

tobiasbrunner commented Mar 24, 2022

mskalski commented Aug 8, 2022

tobiasbrunner commented Aug 18, 2022

Memory leak on FreeBSD 12.1+ when communicating over vici socket. #966

Memory leak on FreeBSD 12.1+ when communicating over vici socket. #966

Comments

mskalski commented Mar 22, 2022 • edited

tobiasbrunner commented Mar 22, 2022

mskalski commented Mar 22, 2022 • edited

tobiasbrunner commented Mar 22, 2022

mskalski commented Mar 22, 2022 • edited

tobiasbrunner commented Mar 23, 2022

mskalski commented Mar 23, 2022 • edited

tobiasbrunner commented Mar 24, 2022

mskalski commented Aug 8, 2022

tobiasbrunner commented Aug 18, 2022

Memory leak on FreeBSD 12.1+ when communicating over `vici` socket. #966

Memory leak on FreeBSD 12.1+ when communicating over `vici` socket. #966

mskalski commented Mar 22, 2022 •

edited

mskalski commented Mar 22, 2022 •

edited

mskalski commented Mar 22, 2022 •

edited

mskalski commented Mar 23, 2022 •

edited