Merge pull request #4167: Fix echo cancellation bug + add command line option to dump AudioInput streams

Basically, the way the current mumble code works is to put speaker readback samples in a queue (qlEchoFrames) in the addEcho() callback, and then in the addMic() callback take both the microphone and speaker samples and pass them to the echo canceller. However, https://www.speex.org/docs/manual/speex-manual/node7.html#SECTION00740000000000000000 explicitly says that "It is important that, at any time, any echo that is present in the input has already been sent to the echo canceller as echo_frame." and adding a queue to the speaker samples makes them arrive after the microphone ones. As a result, the echo canceller is only effective against periodic signals, but not for voice.

To verify that, the first commit of this pull request, fedetft/mumble@b3aa5de, adds a command line option to tap and synchronously dump pcm streams of the raw microphone, speaker readback, and processed microphone. The result is shown in this image: https://imgur.com/a/FpeB6Mp

The top figure shows the original mumble code with only the profiling patch applied. As can be seen, the echo canceller receives the speaker readback with a high delay (I experienced up to 300ms), and is thus effective only against periodic sounds, not voice audio. The bottom figure shows the effect of this pull request: a 20ms lead is forced by delaying the audio path, so that the echo canceller is reasonably certain to receive the data in the correct order even when callbacks are jittery.

The patch has been tested with the mixed echo cancellation on Linux with PulseAudio. The patch fixes the issue but some more work needs to be done. In particular, either the multichannel echo cancellation is broken and passes garbage data to the echo canceller, or I didn't understand how the PCM streams are passed. In any case, it does not seem to cancel echo, so there appears to be an issue, although I haven't looked into it yet.
mumble-voip · May 26, 2020 · 12ce17e · 12ce17e
2 parents 249adfd + 664437c
commit 12ce17e

Showing 10 changed files with 383 additions and 87 deletions.
diff --git a/docs/AudioInputDebug.md b/docs/AudioInputDebug.md
@@ -0,0 +1,110 @@
+# Debugging AudioInput
+
+Mumble does quite a bit of signal processing on the raw microphone input, so if something breaks it may not be immediately apparent _where_ it breaks.
+
+For this reason, the `--dump-input-streams` option was added to help tap into various parts of the DSP chain and find where the issue is. Consider it the digital equivalent of probing the signal path of analog audio gear with an oscilloscope.
+
+As the option was introduced to debug the echo canceller, the default tap points are at the input and output of that algorithm, but if you are going to debug some C++ code, you should not have problems moving a `write()` to an `ofstream` here and there should you need to, right?
+
+## How to use `--dump-input-streams`
+
+You'll need to run Mumble from the command line, and the directory from where you run it will be where the dumped files will be written.
+
+```
+$ ./mumble --dump-input-streams
+```
+
+Then log into a server as usual, and start using Mumble. It's usually good enough to run it for 10 to 20 seconds and then quit. Unless your bug happens only after some time or occurs at random, there's no need to accumulate gigabytes of dumped audio. It's also best to make reproducible tests, like playing the same video or speaking the same phrase, so that results can be compared.
+
+After closing Mumble, there should be 3 new files in the directory you launched it from:
+
+* `raw_microphone_dump`
+* `speaker_dump`
+* `processed_microphone_dump`
+
+Please note that if you run Mumble again, those files will be overwritten. They are also overwritten whenever the `AudioInput` class is reinstantiated, such as when going through the audio wizard. If you find it difficult to get the data you want, for example because closing the audio wizard clears your files, terminate Mumble with Ctrl-C at any moment and the files won't be erased.
+
+### Opening the files
+
+These files contain the raw PCM streams that have been sampled. No header, no file format; nothing. Just data.
+This makes the dumping code as simple as possible, and you also don't have to change the header every time you tap a point with a different sample rate or encoding, as there's no header.
+
+To open the raw files, you can use Audacity. Select `File > Import > Raw Data`.
+
+Since there's no metadata, Audacity will ask you what's in those files:
+
+* Encoding is `Signed 16 bit PCM` in the default tap point (i.e. you haven't modified `write()`). Mumble's signal path is partly 16 bit and partly float, so remember to select `32 bit float` if you move the tap points to some float part of the Mumble audio path.
+* Byte order is `Little-endian` if you're on an x86 CPU, which you most likely are.
+* Channels is always `1` for the microphone signal path, but may be more for the speaker readback if you use multichannel echo cancellation.
+* Sample rate is `48000` for the default tap point, as Mumble's audio chain resamples everything to 48 kHz regardless of what your audio card is configured for. Change accordingly when tapping before the resampler.
+
+In Audacity you can open multiple tracks and mute them individually, so it's usually a good idea to open all three tracks to compare.
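If you would rather inspect a dump programmatically than in Audacity, the format described above can be decoded in a few lines. The following is only a sketch under the default tap point assumptions (signed 16 bit, little-endian, 48000 Hz); the helper names `decodePcm16le`, `readFile` and `durationSeconds` are hypothetical and not part of Mumble.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: decode headerless signed 16-bit little-endian PCM.
// A trailing odd byte (an incomplete sample) is ignored.
std::vector<int16_t> decodePcm16le(const std::vector<unsigned char> &bytes) {
    std::vector<int16_t> samples;
    samples.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        // Assemble the value explicitly so the result does not depend on
        // the byte order of the machine running this code.
        const uint16_t raw = static_cast<uint16_t>(bytes[i] | (bytes[i + 1] << 8));
        samples.push_back(static_cast<int16_t>(raw));
    }
    return samples;
}

// Hypothetical helper: read a whole dump file into memory.
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> readFile(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// Duration in seconds of an interleaved stream with the given layout.
double durationSeconds(std::size_t sampleCount, unsigned channels = 1,
                       unsigned sampleRate = 48000) {
    return static_cast<double>(sampleCount) / channels / sampleRate;
}
```

For example, `decodePcm16le(readFile("raw_microphone_dump"))` yields the microphone samples, and passing its size to `durationSeconds` tells you how much audio was captured. If you moved a tap point to a float part of the audio path, this decoder no longer applies.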
+
+## Debugging the echo canceller
+
+The audio dumps have an additional property that is fundamental for debugging the echo canceller: they're synchronous. If you open them all in Audacity, you'll be able to see not only what gets passed to the echo canceller, but also the relative timing between the signals.
+
+This is fundamental for an echo canceller, which can break simply because the microphone data arrives before the speaker data (how can the echo canceller predict an echo from the future?), or because the speaker data is so far ahead that it exceeds the canceller's limited filter length.
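The relative timing can also be estimated numerically instead of by eye. The sketch below brute-forces a cross-correlation between a speaker excerpt and a microphone excerpt, assuming both are mono 16 bit streams at 48000 Hz taken from the same position in their dumps. `estimateLagSamples` and `lagToMilliseconds` are hypothetical names, not Mumble functions, and the result is only meaningful if the excerpt contains a distinctive sound such as a transition from silence to noise.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: lag (in samples) at which `mic` best matches `speaker`.
// Positive: the speaker signal comes first, the microphone copy is delayed.
// Negative: the microphone precedes the speaker, which breaks echo cancellation.
// If several lags tie, the smallest one is returned.
int estimateLagSamples(const std::vector<int16_t> &speaker,
                       const std::vector<int16_t> &mic, int maxLag) {
    assert(maxLag >= 0);
    assert(!speaker.empty() && !mic.empty());
    const long n = static_cast<long>(std::min(speaker.size(), mic.size()));
    int bestLag = 0;
    double bestScore = std::numeric_limits<double>::lowest();
    for (int lag = -maxLag; lag <= maxLag; ++lag) {
        // Only visit indices where both i and i + lag are inside the buffers.
        const long first = std::max(0L, -static_cast<long>(lag));
        const long last = std::min(n, n - lag);
        double score = 0.0;
        for (long i = first; i < last; ++i) {
            score += static_cast<double>(speaker[i]) * static_cast<double>(mic[i + lag]);
        }
        if (score > bestScore) {
            bestScore = score;
            bestLag = lag;
        }
    }
    return bestLag;
}

// Convert a lag in samples to milliseconds.
double lagToMilliseconds(int lagSamples, unsigned sampleRate = 48000) {
    return 1000.0 * lagSamples / sampleRate;
}
```

The search cost grows with the excerpt length times `maxLag`, so it is best run on a short excerpt (a second or so) rather than on a whole dump.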
+
+### The `--print-echocancel-queue` option
+
+Now that I've mentioned the requirement for the echo canceller to have well aligned inputs, it's a good time to introduce the `--print-echocancel-queue` option. When running Mumble with this option, the current state of the queue in the Resynchronizer class, which is used to align the microphone and speaker readback streams, is printed on the command line. Moreover, if packets are dropped (which is necessary to keep the signals aligned if the OS/pulseaudio/audio card is playing tricks on us), those will be printed as well.
+
+### The Resynchronizer class
+
+Documentation on the Resynchronizer class lives in a comment in the `AudioInput.h` file, but it doesn't hurt to repeat it here, especially since the state machine design is an image and doesn't fit in a C++ comment.
+
+According to https://www.speex.org/docs/manual/speex-manual/node7.html
+"It is important that, at any time, any echo that is present in the input
+has already been sent to the echo canceller as echo_frame."
+Thus, we artificially introduce a small lag in the microphone by means of
+a queue, so as to be sure the speaker data always precedes the microphone.
+
+There are conflicting requirements for the queue:
+
+* it has to be small enough not to cause a noticeable lag in the voice
+* it has to be large enough not to force us to drop packets frequently
+  when the addMic() and addEcho() callbacks are called in a jittery way
+* its fill level must be controlled so it does not operate towards zero
+  elements size, as this would not provide the lag required for the
+  echo canceller to work properly.
+
+The current implementation uses a 5-element queue, with a control
+state machine that introduces packet drops to control the fill level
+to at least 2 (plus or minus one) and less than 4 elements.
+With a 10ms chunk, this queue should introduce a ~20ms lag to the voice.
+
+![](AudioInputDebugFiles/fsm.png)
+
+Here _m_ means a microphone chunk was received, _s_ a speaker chunk was received, and the number in the state is the queue fill level. The design tries to keep the limit cycle of the queue add/remove pattern between 1 and 4 elements, preventing the queue from operating in a limit cycle between 0 and 1 elements (queue too empty, the speaker data may risk arriving after the microphone) and in a limit cycle between 4 and 5 elements (too full, we're wasting some precious filter length to cancel real echo just because some delay accumulated).
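To make the idea of drop-based fill control concrete, here is a deliberately simplified sketch. It is not the state machine from `fsm.png` nor the actual Resynchronizer code: the class name `LagQueueSketch`, the two boolean flags and the thresholds are assumptions chosen only to illustrate how dropping chunks can steer a 5-element microphone queue away from both the "almost empty" and the "almost full" limit cycles.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

using Chunk = std::vector<int16_t>;

// Illustrative thresholds (assumptions, not taken from Mumble's source).
constexpr std::size_t kCapacity = 5;     // queue size from the text above
constexpr std::size_t kResumeFill = 2;   // stop dropping speaker chunks here
constexpr std::size_t kResumeDrain = 3;  // stop dropping mic chunks here

class LagQueueSketch {
public:
    // A microphone chunk arrived. Returns false if it was dropped.
    bool addMic(Chunk mic) {
        if (draining_) {
            return false;  // too full: drop mic chunks until the queue shrinks
        }
        queue_.push_back(std::move(mic));
        if (queue_.size() >= kResumeFill) filling_ = false;
        if (queue_.size() >= kCapacity) draining_ = true;
        return true;
    }

    // A speaker chunk arrived. On success, micOut receives the oldest
    // (delayed) microphone chunk to pair with it. Returns false if the
    // speaker chunk was dropped to let the queue build up its lag.
    bool addSpeaker(const Chunk &speaker, Chunk &micOut) {
        (void) speaker;  // a real implementation would return it with micOut
        if (filling_ || queue_.empty()) {
            return false;
        }
        micOut = std::move(queue_.front());
        queue_.pop_front();
        if (queue_.empty()) filling_ = true;
        if (queue_.size() <= kResumeDrain) draining_ = false;
        return true;
    }

    std::size_t fill() const { return queue_.size(); }

private:
    std::deque<Chunk> queue_;
    bool filling_ = true;    // start by accumulating lag
    bool draining_ = false;
};
```

With alternating microphone and speaker chunks of 10ms each, this sketch settles between 1 and 2 queued chunks, so the microphone path is delayed by roughly 10 to 20ms relative to the speaker readback. After a burst that fills the queue it settles between 3 and 4 instead, which is where it differs most from a design that actively returns to the lower range.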
+
+### A reproducible test for verifying the correct operation of the echo canceller
+
+To avoid regressions being introduced in the echo cancellation feature, it is beneficial to have a controlled test that can be easily reproduced to test whether the echo canceller works.
+
+You will need:
+
+* Low quality headphones that cause echo. The in-ear type that's used with smartphones and has a combined microphone/headphones jack works best. If you don't have them or your PC lacks a combined microphone/headphones jack, do the test with your speakers, but keep the volume relatively quiet. Some echo is unavoidable at high volume levels, especially if it makes the microphone clip.
+* A quiet Mumble server to connect to. Just join an empty room with no other users.
+
+Here's the step by step guide:
+
+1. Make sure Mumble echo cancellation is enabled. You may also need to repeat this test twice, once with mixed and once with multichannel echo cancellation.
+2. Run Mumble with the `--dump-input-streams` option
+3. Join the quiet server
+4. Play the first 15 or so seconds of a YouTube video that contains both a relatively periodic note and voice, such as this one: https://www.youtube.com/watch?v=im9z8NT96Iw
+5. Say a phrase, such as "Testing 1 2 3"
+6. Close Mumble
+7. Open the three dumped streams in Audacity. Don't forget to select the correct number of channels for the `speaker_dump` when testing multichannel echo cancellation
+8. Play the raw microphone stream. You should hear the echo of the YouTube video clearly above the noise, and it should be quieter than you saying "Testing 1 2 3". If you hear no echo, increase your headphone volume, switch to worse headphones, or use your speakers and repeat. If the echo is as loud as or louder than you speaking, reduce your audio volume and repeat.
+9. Now listen to the processed microphone stream: the echo, both the note and the voice coming from YouTube, should be almost gone, while your voice should remain. It is acceptable for the first part of the echo to reappear after a silence gap, but it should quickly be cancelled. If not, there's a bug in Mumble.
+10. Play the speaker dump. It should sound as good as the YouTube video itself. If not, there's a bug in Mumble.
+11. As a final check, take a transition from silence to noise as a reference and zoom in in Audacity: the speaker dump should precede the microphone dump by 20ms or so (0 to 50ms is acceptable). If not, there's a bug in Mumble.
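The listening comparison between the raw and the processed microphone streams can be complemented with a rough number. The sketch below computes the RMS level of the same time window in both streams and reports the difference in dB; it assumes 16 bit mono samples already loaded into memory, and the names `rmsDbfs` and `echoAttenuationDb` are hypothetical. It only makes sense on a window where echo is present and you are not speaking, and it does not replace listening, since it cannot tell echo from any other sound.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: RMS level of samples [begin, end) in dB relative to
// full scale. Returns -infinity for an empty or completely silent window.
double rmsDbfs(const std::vector<int16_t> &samples, std::size_t begin, std::size_t end) {
    if (end > samples.size()) end = samples.size();
    if (begin >= end) return -std::numeric_limits<double>::infinity();
    double sumSquares = 0.0;
    for (std::size_t i = begin; i < end; ++i) {
        const double v = samples[i] / 32768.0;  // normalize to [-1, 1)
        sumSquares += v * v;
    }
    const double rms = std::sqrt(sumSquares / static_cast<double>(end - begin));
    if (rms == 0.0) return -std::numeric_limits<double>::infinity();
    return 20.0 * std::log10(rms);
}

// Hypothetical helper: how many dB quieter the processed stream is than the
// raw stream over the same window. Larger means more echo was removed.
double echoAttenuationDb(const std::vector<int16_t> &raw,
                         const std::vector<int16_t> &processed,
                         std::size_t begin, std::size_t end) {
    return rmsDbfs(raw, begin, end) - rmsDbfs(processed, begin, end);
}
```

At the default 48000 Hz, a window from 3s to 5s corresponds to sample indices 144000 to 240000.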
+
+Example of an echo canceller bug: the speaker data lags behind the microphone data. As a result, only the note is cancelled, but the voice is not.
+
+![](AudioInputDebugFiles/bug.png)
+
+Example of a working echo canceller.
+
+![](AudioInputDebugFiles/fix.png)
diff --git a/docs/AudioInputDebugFiles/bug.png b/docs/AudioInputDebugFiles/bug.png
diff --git a/docs/AudioInputDebugFiles/fix.png b/docs/AudioInputDebugFiles/fix.png
diff --git a/docs/AudioInputDebugFiles/fsm.png b/docs/AudioInputDebugFiles/fsm.png