New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix echo cancellation bug + add command line option to dump AudioInput streams #4167
Conversation
Closed and reopened as the CI was failing due to network errors, but it appears every time there's at least one target that has network issues. Don't know what's wrong with translations. |
The issue is actually a different one and closing and reopening doesn't do anything (I think). Anyways I restarted the CI. |
Last time I asked, how multichannel echo cancellation works and what is it good for, nobody could explain it. Does anybody care about that (broken) feature or could it be removed? The OpusFAQ recommends the Google AEC from WebRTC. |
My branch is called webrtc because the original intention was to replace mumble's echo cancellation algorithm, which to me at least performed poorly, with webrtc's Google AEC. However, when trying to understand the code I found this bug, and after fixing it, it looks like libspeexdsp's echo canceller isn't bad at all, it was just not operating due to a precondition violation. For what concerns the multichannel echo canceller and the possibility to remove it, I see an UX/UI issue, and a code issue. The UX issue is that it is not clear at all to the end user the difference between mixed and multichannel. I only understood it after reading the C++ code... The code issue is that it looks like the multichannel pcm data is somehow corrupted. Since the echo readback is never heard by anyone and only passed to the echo canceller, it was difficult to spot this bug, until I added the |
thanks for the explanation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the actual review, we'll have to wait for @davidebeatrici as I don't really know the audio system...
Update: yesterday I started having a look at why the multichannel echo canceller receives garbage data. I found a small issue but the signal didn't improve at all. |
I think it'd be a good idea, yes. This facilitates the work of a potential future contributor that wants to look into this issue :) |
I think that to make it easy for future contributors to perform regression tests I would need to write somewhere how to use the |
You can create a new markdown document n |
Found it! here's the bug that was corrupting the pcm data for the multichannel echo cancellation! Foe every chunk the multichannel code path always started form the beginning of the buffer, thus not accumulating anything. This overwrote data and left the end of the buffer uninitialized too. No wonder I was hearing garbage when playing those data... Now multichannel echo works. |
Added documentation in the docs directory. Should help make sure issues in the echo canceller don't get reintroduced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also document the different functions and member fields in the header files using Doxygen ///
comments, please?
I know that this is not done in the existing code but it's an ongoing quest of ours to improve the state of the in-source documentation :)
And if you're done with the PR, please squash the commits. I think here it'd be a good idea to make these 3 commits:
- Addition of the new cli options + documentation
- Fix of mixed channel echo cancellation
- Fix of multichannel echo cancellation
We also have a practice of prefixing our commit messages with the changed path, e.g.
src/mumble/AudioInput: <your message>
You can have a look at other commits in this repo to see what I mean by that as well :)
private: | ||
void printQueue(char who); | ||
|
||
//TODO: there was a mutex (qmEcho), but can the callbacks be called concurrently? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. @davidebeatrici might now better though :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know they are never called concurrently.
Had to make four commits as the documentation covers also part of the other commits. |
Alright thank you! |
What's the best approach to verifying that the echo canceller works as expected? Can I even do this locally on a single machine? 🤔 |
Good question: I've pushed another commit: it only touches the docs and adds a step-by-step guide to reproducibly verify the correct operation of the echo canceller. I think you may find it useful also in the future to avoid regressions. By doing the test before and after the patches that fix the echo, you can appreciate the effect of the changes. |
Hm okay I tried it out using the built-in microphone of my laptop and its speakers but it doesn't seem to do all that much. I clearly see that something was done (especially noise reduction) but I still hear most parts of the YouTube video. Could someone else please also try this out and report back here? EDIT: The results with my normal headset are pretty much the same. Significant parts of from the YT video are still hearable... |
Hi! I'm amazed by your work and it also solved my problem that why echo cancellation isn't working for us. I'm not actually a mumble developer, but I have just read your code and have a little question. I have seen your finite state design that is used to keep the speaker audio chunk ahead of mic chunk. But then I read these lines: mumble/src/mumble/AudioInput.cpp Lines 77 to 81 in 0c07744
It seems that you have assumed speaker audio chunk comes later than the mic chunk, so you wait until mic chunk piles up and then return the "now" speaker chunk and "past" mic chunk, which is just the opposite of requiring speaker chunk comes earlier than mic chunk. I'm not sure what mistake I have made in understand your code and I'd appreciate for your reply! |
@davidebeatrici In the long run, it would be best to just wrap the buffers in an object that manages its lifetime, but such a change would be too invasive and is unrelated to fixing the echo issue. Moreover, if there are rumors about rewriting everything, would a separate PR that refactors the code even be useful or just a waste of time? About the mutex, I can remove it if you're sure it's not needed, but I can't test on any platform other than linux/pulseaudio. |
I think for now you should leave it until we have decided on what to do about the rewrite/refactor
Does it hurt to have it there? If not I think I'd play it safe and just leave it as it is 🤔 |
Leaving a potentially unnecessary mutex is a missed performance optimization. |
Yeah but if it turns out that there is some concurrency in there after all (well hidden in layers of cryptic code) we'll have a problem xD But as I don't know the audio code, I'll leave the decision about that up to you and @davidebeatrici :) |
I've removed it locally and will test it this evening on Linux. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last thing: I think the last commit should be documentation only. The code style fixes should be squashed into the other commits ☝️
After that though I'd say we can merge this :)
I don't know in git an automated way of doing that squash without going through line by line in the previous commits to change the parentheses and spaces, which given the scale of the changes requested is not a productive use of my time. Please consider setting up an automated code formatting tool in the future. Personally I'm used to a completely different code style and this part of the review process is becoming a barrier to future contributions. |
You can simply reset all commits and add the respective files and commit them in the chunks you want. That way you don't have to do line by line edits. |
…nput audio data and echo canceller queue state
Ok, |
Thank you very much for your contribution! Much appreciated :) |
…nput audio data and echo canceller queue state (Backported from mumble-voip#4167 and adapted to work with 1.3.x)
…nput audio data and echo canceller queue state (Backported from mumble-voip#4167 and adapted to work with 1.3.x)
…nput audio data and echo canceller queue state (Backported from mumble-voip#4167 and adapted to work with 1.3.x)
…nput audio data and echo canceller queue state (Backported from mumble-voip#4167 and adapted to work with 1.3.x)
…nput audio data and echo canceller queue state (Backported from mumble-voip#4167 and adapted to work with 1.3.x)
…nput audio data and echo canceller queue state (Backported from mumble-voip#4167 and adapted to work with 1.3.x)
…nput audio data and echo canceller queue state (Backported from mumble-voip#4167 and adapted to work with 1.3.x)
…nput audio data and echo canceller queue state (Backported from mumble-voip#4167 and adapted to work with 1.3.x)
Dear mumble developers,
I think I found a bug in the echo canceller.
Basically, the way the current mumble code works is to put speaker readback samples in a queue (qlEchoFrames) in the addEcho() callback, and then in the addMic() callback take both the microphone and speaker samples and pass them to the echo canceller.
However,
https://www.speex.org/docs/manual/speex-manual/node7.html#SECTION00740000000000000000
explicitly says that "It is important that, at any time, any echo that is present in the input has already been sent to the echo canceller as echo_frame."
and adding a queue to the speaker samples makes them arrive after the microphone ones.
As a result, the echo canceller is only effective against periodic signals, but not for voice.
To verify that, the first commit of this pull request,
fedetft@b3aa5de
adds a command line option to tap and synchronously dump pcm streams of the raw microphone, speaker readback, and processed microphone.
The result is shown in this image: https://imgur.com/a/FpeB6Mp
The top figure shows the original mumble code with only the profiling patch applied.
As can be seen, the echo canceller receives the speaker readback with a high delay, I experienced up to 300ms, and is thus effective only against periodic sounds, not voice audio.
The bottom figure shows the effect of this pull request, a 20ms lead is forced by delaying the audio path, so that the echo canceller is reasonably certain to receive the data in the correct order even when callbacks are jittery.
The patch has been tested with the mixed echo cancellation on Linux with PulseAudio.
The patch fixes the issue but some more work needs to be done, in particular either the multichannel echo cancellation is broken and passes garbage data to the echo canceller, or I didn't understand how the PCM streams are passed. In any case, it does not seem to cancel echo, so there appears to be an issue, although I haven't looked it up yet.
Moreover, I could not understand if the addMic() and addEcho() callbacks can be called concurrently by multiple threads, and thus if a mutex is requred in the Resynchronizer class I added.