Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory Leak (TCP_Mem Leak / Internet Slow Down) #164

Open
Vorror opened this issue May 28, 2023 · 14 comments
Open

Memory Leak (TCP_Mem Leak / Internet Slow Down) #164

Vorror opened this issue May 28, 2023 · 14 comments

Comments

@Vorror
Copy link

Vorror commented May 28, 2023

Bear with me here since narrowing down what was happening took two weeks. Anyway, so some background:

I'm using bonob in a docker container on the following system(this system is my media server that's only used to run docker containers):

System:

Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.1 LTS
Release:	22.04
Codename:	jammy

So, I noticed my internet connection on my Ubuntu media server would become really slow, from 1 Gbs to ~20Mbs. I use an app called Speedtest Tracker to monitor this.

Screenshot from Speedtest Tracker:

CleanShot 2023-05-28 at 10 49 41@2x

I've been noticing the above behavior for weeks. My media server's download/internet speed would drop for the whole system and stay low; only a restart would fix it.

Cause

I figured out that the issue was bonob after killing apps one by one and waiting for the issue to reoccur; I figured out what was happening. There seems to be some TCP memory leak with bonob.

When bonob is first launched, my system looks like this(image from docker stats):

CleanShot 2023-05-28 at 10 59 16@2x

and my TCP Memory looks like this:

$ cat /proc/net/sockstat

sockets: used 1190
TCP: inuse 31 orphan 0 tw 0 alloc 188 mem 39
UDP: inuse 17 mem 44
UDPLITE: inuse 0
RAW: inuse 1
FRAG: inuse 0 memory 0

$ echo $(( 39 * 4096  / 2**20 ))MB
0MB

After listening to music for a few hours, my stats become this:

CleanShot 2023-05-28 at 11 06 54@2x

$ cat /proc/net/sockstat 

sockets: used 1189
TCP: inuse 30 orphan 0 tw 0 alloc 319 mem 117948
UDP: inuse 17 mem 45
UDPLITE: inuse 0
RAW: inuse 1
FRAG: inuse 0 memory 0

$ echo $( 117948 * 4096 / 2**20 ))MB
460MB

And inside my /var/log/syslog I'm seeing this message repeated:

TCP: out of memory -- consider tuning tcp_mem

bonob is using 460MB of TCP Memory! This causes the network speed of my server to drop until a) Killing the bonob container or b) restarting the computer.

Killing bonob's container immediately restores my TCP memory and fixes my speed issue.

@Vorror
Copy link
Author

Vorror commented May 28, 2023

For some more info,

In addition to using Speed Test Tracker, I also confirmed my internet speed using Speedtest's cli tool and can confirm the internet speed reduction. The internet speed could sometimes stay permanently below 9 Mbits/s!

In fact, the only reason I noticed this issue was occurring was that when trying to play Navidrome music using bonob on my Sonos speakers, there would be "buffering"(delay before playing and frequent pausing mid-song). My music server, Navidrome, and Sonos speaker are all on the same Lan, so there should never be buffering. So, the issue affects all TCP connections on the server rather than just ones to the internet.

The issue takes several hours to appear, and Ubuntu will never reclaim either the system memory being used or the TCP memory, even after not playing music for several hours/days. You must either stop/restart the bonob container or restart your computer.

@simojenki
Copy link
Owner

Thank you for the great analysis, that's an interesting issue.

How much memory do you give the bonob container?

With your listening pattern, are you simply listening to an album/playlist track by track or are you frequently changing tracks manually?

Unfortunately I cannot test this right now as my system is in a box whilst I move house.

@Vorror
Copy link
Author

Vorror commented May 30, 2023

@simojenki I use Portainer to managed my docker containers. I don't explicitly set a maximum size.

I have about 3000 songs in a queue and rarely manually change tracks. It simply plays for hours until I get bored.

@dbuttric
Copy link

dbuttric commented Oct 3, 2023

I don't mean to be adding fuel to the fire here, but I can confirm this.
I have a playlist with 5400 tracks. I left it playing overnight, and when I woke up in the AM, the TCP memory use was at 150MB.

Here is my setup:
Debian 12 bookworm
16GB RAM
Intel(R) Core(TM) i5-4570
Running Navidrome and Bonob using dockerr-compose.

Then I started doing my rating-my-songs behavior.

I left the playlist running, and just started rating the songs that are playing. Once I rated them, I would fast forward to the next song. After doing this for a few minutes, the TCP memory use is now 500MB.

At this time, the docker container is using 350MB, and TCP is using 500MB.
So the rating and fast forwarding may cause memory to balloon. I have not done any more extensive testing here.

So that I've said it, I have created a playlist with about 3 hours of music in it. I start this in the morning and then start it again around lunch time. This seems to keep things behaving.

One thing I'd love to do is be able to attach a debugger to the containerized app, to see if I can identify where this problem is happening. But I've never debugged a container before, much less an app like this. So I'd need some help doing that.

@simojenki
Copy link
Owner

Interesting...

Why do you say it is TCP memory usage? What command did you run to see that?

Are you transcoding your tracks? One thing I have noticed is that sometimes a transcoding (ffmpeg) process in the ND container will be still running well after the track has completed, I think some processes are being left behind, perhaps on skip track?

@dbuttric
Copy link

dbuttric commented Oct 8, 2023

No, I'm not transcoding anything.
In the OP, @Vorror mentions the use of the /proc/net/sockstat to analyze TCP stack kernel memory use.
I've been watching the value that comes back from that, I have a oneliner:
awk '/TCP/ {print $NF}' /proc/net/sockstat
Evaluate that and do the math:
(sockstat reports memory in kernel pages.)
echo $(($(awk '/TCP/ {print $NF}' /proc/net/sockstat)* 4096 / 1024 ))"KBytes"
and you get back a value of the kbytes used by TCP to keep sockets working.
At this point, I'm not sure whether this is a bonob or a navidrome issue. I'm trying to do some packet capturing. I'm completely clueless about monitoring networking in docker containers.
Thoughts:
Perhaps the docker-compose.yaml that I'm suing needs to make reference to docker internal IP's of things, instead of external? Is it possible that there's a leak in the networking stack in docker that causes streams to consume more memory if you use external references?, if for example I set the BNB_SUBSONIC_URL to the internal IP of the navidrome server, instead of its external hostname?
I'm just throwing darts at this point, but anything that will help us eliminate possible causes can help use identify the actual cause here.
David

@simojenki
Copy link
Owner

What happens if you restart one of either bonob or navidrome, does the memory get released? That may help determine which container is leaking.

re: docker-compose.yml, you should be able to just refer to the name of the container, rather than either internel or external IP.

Given the wide spread use of docker (and friends) I'd be surprised if the issue is caused by docker/Linux kernel, I'm guessing it's going to be an app issue somewhere.

@dbuttric
Copy link

dbuttric commented Oct 9, 2023

I'll do some more testing this evening.
When I restart the continers, the tcp memory is released.
Tonight I will experiment with recreating the high memory condition, and I'll restart one each time, to see if I can discover which service is holding the memory.

Thanks!
David

@dbuttric
Copy link

dbuttric commented Oct 9, 2023

Here's what I can see:
dbuttric@jack2:/var/local/docker/navidrome$ /usr/local/sbin/gettcpmem
80784KBytes
dbuttric@jack2:/var/local/docker/navidrome$ docker restart 29b00e1ac55f_navidrome_navidrome_1
29b00e1ac55f_navidrome_navidrome_1
dbuttric@jack2:/var/local/docker/navidrome$ /usr/local/sbin/gettcpmem
44996KBytes

It looks like if I restart navidrome, all the memory does not get cleared, only about half.

Where as if restart bonob, all the memory gets cleared out...
dbuttric@jack2:/var/local/docker/navidrome$ /usr/local/sbin/gettcpmem
122340KBytes
dbuttric@jack2:/var/local/docker/navidrome$ docker restart navidrome_bonob_1
navidrome_bonob_1
dbuttric@jack2:/var/local/docker/navidrome$ /usr/local/sbin/gettcpmem
1392KBytes

Thoughts?
Anything else you need me to do to help get to the cause of this, I'm here and available.
Thanks alot!
David

@simojenki
Copy link
Owner

I guess for some reason bonob is keeping open a connection to navidrome, perhaps because it is unaware that the sonos device has moved onto the next track.

@simojenki
Copy link
Owner

I've tidied up the way streams between bonob -> navidrome are handled, could you please try again with the 'master' build/release of bonob and see what behaviour you get?

@dbuttric
Copy link

dbuttric commented Oct 9, 2023

This worked!
I'm now seeing TCP memory utilization that looks like the other applications on my media server.
Thank you!

@dbuttric
Copy link

dbuttric commented Oct 9, 2023

I'm running different playlists to two separate speakers.
So far, memory utilization is great!
I see some times where memory use goes from 156K to 0K 10 seconds later.
Definitely better behavior than before.
I will let this run for a few more hours and report back.

@dbuttric
Copy link

dbuttric commented Oct 9, 2023

I've been at this for a few hours now.
I'm not seeing any of the same behaviors.

I think this is fixed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants