Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory Leak on Linux/Ubuntu #1074

Open
tatokis opened this issue Nov 21, 2013 · 13 comments

Comments

Projects
None yet
5 participants
@tatokis
Copy link
Contributor

commented Nov 21, 2013

I have noticed a memory leak on mumble for Ubuntu (Both stable and snapshot PPAs).
There is a similar issue (#959) and I suspect that I have figured out more information about this leak.
It appears that the when the Echo option in the configuration menu is set to multichannel, it causes mumble to start using a lot of ram.
I ran it for 1:30 hours with both echo set to disabled and multichannel.
It appears that it started using almost 1.1GB of RAM when it was set to multichannel, but with echo disabled it didn't use more than 30MB.
In both cases mumble was sitting in the tray idling, connected to a server and muted.
If left running for more time it will consume even more RAM (it reached almost 3GB one time)
Running Ubuntu 13.10 amd64 and current mumble version is 1.2.4-1~ppa2~saucy1 with 8GB of ram.
Here is how much RAM mumble was using with echo enabled the time it was started
mumble1
The setup on Pulseaudio with Echo set to multichannel
mumble2
RAM usage when mumble was started
mumble3
Mumble running for 1:30 hours and its ram usage
mumble4
Configuration, with echo set to disabled now for the next test
mumble5
Here's how much RAM mumble used when it started with echo disabled
mumble6
and here is how much ram it used after idling for 1:30 hours
mumble7

@Kissaki Kissaki added the mumble label May 14, 2015

@jstrader

This comment has been minimized.

Copy link

commented Mar 30, 2016

Ran into this issue yesterday and ran it through Valgrind Massif today. The allocation for this memory leak is:
AudioInput::addEcho(void const*, unsigned int) (AudioInput.cpp:478)
short *outbuff = new short[iEchoFrameSize];

It appears to expose itself when echo correction is enabled and the speaker output rate is greater than the microphone input rate. This seems to be most common when Mumble is operating at the same time an application that implements a low-level "sink-input" in the parlance of PulseAudio, causing two independent callback streams to essentially double the data being passed to echo correction without deallocating the memory. Many games, remote desktop, and other forms of cross-platform software appear to implement these lower level hooks; in my case it is the game Cities: Skylines implementing an ALSA plug-in that appears to launch its own PulseAudio threads, though I am not familiar enough with these systems to know if this is the exact issue.

I am commenting on this bug, because I am 100% certain it's the same issue; however, this could potentially be the same bug described in many different memory and "unresponsive" issues for this repository. I will avoid referencing all of the issues in case they are not the same bug, but a quick search of open issues with "memory" will return them. The code shows the potential for the issue to be cross-platform, though I am unsure if Windows implements callbacks in the same manner to cause this. Attached is a text file with the relevant output from Massif, used with a build of Mumble tag "1.2.12" (same version as in the Ubuntu Wily PPA) that has debug symbols active. If you want more information, let me know and I'll try to get back to it as quickly as I can.

mumble leak ms_print.txt

@mkrautz mkrautz added this to the 1.3.0 milestone Apr 13, 2016

@mkrautz mkrautz self-assigned this Apr 13, 2016

@mkrautz

This comment has been minimized.

Copy link
Member

commented May 6, 2016

Thank you very much for the detailed analysis @jstrader.

It's been a while since you reported this, so I'd like to give you a status update. (For you, and for any onlookers perhaps interested in helping to fix the issue...)

I tried a bit back to follow along with the code, and concluded I needed to actually reproduce the PulseAudio setup myself before I could get further.

Also, I found that the code at
https://github.com/mumble-voip/mumble/blob/master/src/mumble/AudioInput.cpp#L399-L423
was behaving weirdly. In particular, the logic related to iJitterSeq and iMinBuffered seems like it was not complete when it was committed.

I haven't yet had the chance to set up a test environment where I can investigate the memory leak myself, though!

Thanks again.

@mkrautz

This comment has been minimized.

Copy link
Member

commented Jan 25, 2017

Finally found some time to get back to this.

I can reproduce a memory leak when I do this:

while [ 1 ]; do dd if=/dev/zero of=/dev/stdout count=1 bs=64 | aplay --rate=8000 -; done

This is using PulseAudio with AudioInput rate=44100, and AudioOutput rate=48000 as @jstrader describes.

@jstrader

This comment has been minimized.

Copy link

commented Jan 27, 2017

Sorry, the original response to my comment fell through the cracks of my email somehow. I dug up the notes and builds from last March. I had determined at the time that the risk of introducing a regression would be too high to warrant developing a fix myself, because the logic is too deep and the refactor to fix this issue is likely too complex to have someone unfamiliar with the code just pick it up.

I can confirm that the reproduction from @mkrautz above is the same leak that was exposed when I submitted my comment. I'm not certain that this issue is a simple allocation leak though.

Out of curiosity, I threw a couple qWarning calls in to dump the amount of data coming from the PulseAudioSystem::read_callback calls to see the ratio of addMic/addEcho buffers, expecting to see about twice as much data when reproducing the bug. Without the aplay command, it was 867,328/803,422 bytes or almost exactly the difference for the 48,000/44,100 I'm currently running; with aplay, it was 50,656/1,795,600 bytes or about 38 times larger echo buffer than what it should be. Both tests were across about 3,500 calls, to give an idea of just how much more that callback is firing with aplay.

This may either be a bug in PulseAudio sending incorrect events or a flag somewhere in their "thorough" API that is not being passed. If it is not a PulseAudio issue, then I honestly don't know that there is a fix beyond implementing a ring buffer and having it just quietly disable echo correction when it overruns, which doesn't really feel like a bug fix so much as mitigation against the leak.

@mkrautz

This comment has been minimized.

Copy link
Member

commented Jan 27, 2017

Thanks for the update, @jstrader. FWIW, it's also possible to reproduce the issue with Pulse-native tools like 'pacat'. So the problem is seemingly not related to Pulse's ALSA plugin after all.

I'm trying to reproduce on Ubuntu Precise as we speak, to see if PulseAudio 1.1 also has the "bug".

@mkrautz

This comment has been minimized.

Copy link
Member

commented Jan 27, 2017

OK, it happens there as well. So I don't think we can classify it as a regression!

@mkrautz

This comment has been minimized.

Copy link
Member

commented Jan 28, 2017

FWIW, I can reproduce the same behavior without Mumble, by just recording the PulseAudio monitor stream:

ctr is the following program:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <stdint.h>
#include <time.h>

static uint64_t now() {
	struct timespec ts;
	int err = clock_gettime(CLOCK_MONOTONIC, &ts);
	if (err == -1) {
		exit(2);
	}
	
	uint64_t out = 0;
	out += (ts.tv_sec * 1000000);
	out += (ts.tv_nsec / 1000);
	return out;
}

int main(int argc, char *argv) {
	uint64_t timestamp = now();
	uint64_t bytes = 0;

	while (1) {
		char buf[4096];

		int nread = read(0, &buf, sizeof(buf));
		if (nread == -1) {
			fprintf(stderr, "err = %i: %s\n", errno, strerror(errno));
			exit(1);
		}

		bytes += nread;

		uint64_t nao = now();
		uint64_t elapsed = nao - timestamp;
		if (elapsed >= 1000000) {
			fprintf(stderr, "wrote %lu shorts in 1 sec\n", bytes/2);
			bytes = 0;
			timestamp = nao;
		}
	}
}

I run:

mkrautz@ubuntu:~$ parec --format=s16le --rate=48000 --channels=1 --device="alsa_output.pci-0000_02_02.0.analog-stereo.monitor" | ./ctr 
wrote 45124 shorts in 1 sec
wrote 48424 shorts in 1 sec
wrote 48645 shorts in 1 sec
wrote 48545 shorts in 1 sec
wrote 48554 shorts in 1 sec
# I run the loop that triggers the error.
wrote 92245 shorts in 1 sec
wrote 153388 shorts in 1 sec
wrote 170318 shorts in 1 sec
wrote 151427 shorts in 1 sec
wrote 171301 shorts in 1 sec
wrote 152405 shorts in 1 sec
wrote 126881 shorts in 1 sec
# I've ended running the loop
wrote 48681 shorts in 1 sec
wrote 48354 shorts in 1 sec
wrote 47386 shorts in 1 sec
^C

The loop being the following snippet from earlier:

while [ 1 ]; do dd if=/dev/zero of=/dev/stdout count=1 bs=64 | aplay --rate=8000 -; done

... I'm quite baffled by this behavior from the monitor device.

@tatokis

This comment has been minimized.

Copy link
Contributor Author

commented Jan 28, 2017

Also reproducible here.
Here's a snippet of pulseaudio's debug info while the aplay loop is running.
https://tasossah.com/mumblepulselog.txt

I think this might not be a mumble issue after all.

Additionally, from what I can see when launching parec, PA switches the source output to 48KHz and then passes audio straight to parec without any resampling.

Relevant log here as well
https://tasossah.com/mumblepulselog_parec.txt

@mkrautz

This comment has been minimized.

Copy link
Member

commented Jan 28, 2017

Hmm, I wonder if our implementation is too simplistic. (But I guess parec is, too, then?)...
Perhaps we need to listen to more events to be able to catch underruns, rewinds, holes?

@mkrautz

This comment has been minimized.

Copy link
Member

commented Jan 29, 2017

02:28:55 <@mkrautz> Whooa21: hmmm... not reproducible in general on Ubuntu 9.04
02:29:17 <@mkrautz> Whooa21: there are some cases where the monitor source generates more data than expected when parec'ing, but it's not excessive...
02:29:57 <@mkrautz> I just had to try an older PulseAudio... I was pretty certain the bug wasn't there when slicer wrote that code... :)
New messages
02:30:15 <@mkrautz> this is pulseaudio 0.9.14
@mkrautz

This comment has been minimized.

Copy link
Member

commented Jan 29, 2017

I asked on #pulseaudio (freenode):

02:46:20 <mkrautz> hi, I'm experiencing a problem where I can trigger a monitor source to generate more samples than expected (> sample rate) when an underrun occurs in the corresponding sink... this is mostly a question if I'm missing something obvious. Short write-up at https://gist.github.com/mkrautz/d0994d82eb900c336d64ac971fc1847e

Got a response:

05:04:15 <tanuk> mkrautz: You're not missing anything.
05:04:20 <tanuk> mkrautz: https://lists.freedesktop.org/archives/pulseaudio-discuss/2017-January/027412.html

@mkrautz

This comment has been minimized.

Copy link
Member

commented Jan 29, 2017

@tatokis

This comment has been minimized.

Copy link
Contributor Author

commented Jan 29, 2017

@mkrautz mkrautz removed this from the 1.3.0 milestone Jan 29, 2017

mkrautz added a commit to mkrautz/mumble that referenced this issue Jan 29, 2017

AudioInput: limit echo canceller queue to 1 second of audio.
This sets an upper bound on the echo canceller's queue to avoid
situations like mumble-voip#1074 where a bug in PulseAudio
causes our echo queue to grow unbounded.

This fixes the memory leak aspect of mumble-voip/mumble-voip#1074, but echo
cancellation will be flakey in such setups.

We don't know if there is a workaround for the PulseAudio bug, so
let us do this for now.

mkrautz added a commit to mkrautz/mumble that referenced this issue Jan 29, 2017

AudioInput: limit echo canceller queue to 1 second of audio.
This sets an upper bound on the echo canceller's queue to avoid
situations like mumble-voip#1074 where a bug in PulseAudio
causes our echo queue to grow unbounded.

This fixes the memory leak aspect of mumble-voip/mumble-voip#1074, but echo
cancellation will be flakey in such setups.

We don't know if there is a workaround for the PulseAudio bug, so
let us do this for now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.