Add Stereo to Mono Conversions #877
Attached a small implementation of mine.
Thanks for the suggestion. I think this problem is underspecified. I am not sure there is one and only one way to convert stereo to monaural. Taking the average is the simplest way, but adjusting the power level of each signal before mixing is also a popular approach. So I imagine there will be a wide variety of opinions on what "stereo to monaural conversion" should do. Adding this to the library would increase the maintenance cost, and I do not see much value there, especially when it is so easy for users to write one. Users can just pick their favorite conversion algorithm and write it in only a few lines of code.
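For instance, a user-side conversion could look like this (an illustrative sketch, not torchaudio API; the equal-weight mix shown is just one choice):

```python
import torch

# A hypothetical stereo waveform in channel-first format: (channel, time)
waveform = torch.randn(2, 16000)

# Simplest approach: average across the channel dimension
mono = waveform.mean(dim=0, keepdim=True)
print(mono.shape)  # torch.Size([1, 16000])

# Alternative: weight each channel before mixing (here equal weights,
# which is identical to the mean; other gains are equally valid choices)
mono_weighted = (0.5 * waveform[0] + 0.5 * waveform[1]).unsqueeze(0)
```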
@mthrok OK, if I can help with anything, please let me know.
I am supporting this feature request. Although the conversion from stereo to mono can be done in just a few lines of code, or by using other libraries, having the same pattern would make the overall audio-processing workflow easier and more consistent (e.g., the type and format of the output [tensor vs. array, int64 vs. float32], the method of converting [mean, first channel, or second channel], etc.).
By popular demand, we are reconsidering this.
I am not sure what kinds of mixing methods exist, but I guess we can simply start by taking the average waveform across the channel dimension, as CLI tools like https://www.nesono.com/node/275 do.
Thank you Moto. I support this feature request because, even though it can be done in a single line of PyTorch code, it is a common operation that is (a) unintuitive for beginners and (b) often necessary to begin training a model. I would also support documentation clearly explaining that conversion to mono can be achieved by taking the average over the channel dim, with a clear code example. If you go the documentation-only route, I would suggest making sure it is easy to find via Google and/or the documentation search. Thanks for re-raising this.
Hey there, it looks like this issue is still open. Can I work on this?
@jjmmchema Thanks. Go ahead. torchaudio transforms take waveforms in channel-first format, so make sure to support that shape.
@mthrok Just a quick question: why is it that the supported shape should be channel-first?
@jjmmchema My intent was to emphasize that the channel dimension should not be the last one. However, come to think of it, since this transform is about channels, it might be better to be explicit about where the channel dimension should be, and also flexible about it. The most typical and common shape is channel-first, but there are cases where we use other layouts. So I think simply making the channel dimension configurable, with a channel-first default, would work.
@mthrok Yeah, it makes sense that the channel dimension should default to channel-first while still being configurable for inputs with other layouts.
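A sketch of what such a configurable helper could look like (hypothetical code, not the actual PR; the name `to_mono` and the `dim` default are assumptions):

```python
import torch

def to_mono(waveform: torch.Tensor, dim: int = -2) -> torch.Tensor:
    """Average channels along ``dim``, keeping a singleton channel dimension."""
    return waveform.mean(dim=dim, keepdim=True)

# Default covers the typical channel-first layout, e.g. (batch, channel, time)
x = torch.randn(4, 2, 8000)
print(to_mono(x).shape)  # torch.Size([4, 1, 8000])

# Channel-last input is handled by pointing ``dim`` elsewhere
y = torch.randn(8000, 2)
print(to_mono(y, dim=-1).shape)  # torch.Size([8000, 1])
```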
@mthrok @jjmmchema Just a general remark for such a feature:
So, as it is proposed in this issue and implemented in #3242, I strongly propose not adding such a feature to torchaudio, and instead letting users decide how to downmix on their own, to make them aware that this isn't a trivial operation.
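The pitfall with naive averaging can be demonstrated in a few lines (an illustrative sketch: phase-inverted channels cancel completely under the mean):

```python
import torch

sample_rate = 8000
t = torch.arange(sample_rate, dtype=torch.float32) / sample_rate
left = torch.sin(2 * torch.pi * 440 * t)   # a 440 Hz tone
right = -left                              # same tone, phase-inverted

stereo = torch.stack((left, right))        # shape (2, time)
mono = stereo.mean(dim=0)

# The mixed signal is silence: averaging destroyed all the content
print(mono.abs().max().item())  # 0.0
```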
@faroit Thanks for your response. I see that this isn't as simple as it seemed at first. Maybe, instead of implementing the downmixing with just a simple mean, a new issue could be opened to look into and discuss different downmixing techniques? Or maybe just create a documentation section that mentions the possibility of downmixing with the mean but warns about the things you mentioned? Let me know the decision, so that I either stop with the mean implementation shown in #3242 or finish writing the tests for the PR.
@faroit Thanks for the feedback. I do agree with your points, and I am glad that they are laid out nicely.
As for a way to implement phase correction, we could implement it in PyTorch, or we could delegate it to FFmpeg. (Also, in the nightly build I added a feature to delegate channel manipulation to FFmpeg in StreamReader, so this is somewhat doable when loading audio from a file.)
So I tested AudioEffector against what the FFmpeg documentation says, and was able to bring the channels into phase.

Script:

```python
import torch
from torchaudio.io import AudioEffector

sample_rate = 8000

phase = torch.linspace(0, 2 * torch.pi * 3000, sample_rate, dtype=torch.float32)
left = torch.sin(phase)
right = -left
waveform = torch.stack((left, right), dim=-1)
print(waveform.shape)

mean = torch.mean(waveform, -1)
assert mean.abs().sum().item() == 0.0

effector = AudioEffector(
    effect=(
        "asplit[a],"
        "aphasemeter=video=0,"
        "ametadata=select:key=lavfi.aphasemeter.phase:value=-0.005:function=less,"
        "pan=1c|c0=c0,"
        "aresample=async=1:first_pts=0,"
        "[a]amix")
)
applied = effector.apply(waveform, sample_rate=sample_rate)
mean2 = torch.mean(applied, -1)

import matplotlib.pyplot as plt

f, axes = plt.subplots(4, 2)
axes[0][0].set_ylabel("Original - Channel 1")
axes[0][0].plot(waveform[:500, 0])
axes[0][1].specgram(waveform[:, 0], Fs=sample_rate)
axes[1][0].set_ylabel("Original - Channel 2")
axes[1][0].plot(waveform[:500, 1])
axes[1][1].specgram(waveform[:, 1], Fs=sample_rate)
axes[2][0].set_ylabel("Just mean")
axes[2][0].plot(mean[:500])
axes[2][1].specgram(mean, Fs=sample_rate)
axes[3][0].set_ylabel("Phase-in then mean")
axes[3][0].plot(mean2[:500])
axes[3][1].specgram(mean2, Fs=sample_rate)
plt.show()
```

To bring the channels into phase, we just need to add two lines before taking the mean:

```python
effector = torchaudio.io.AudioEffector(
    effect=(
        "asplit[a],"
        "aphasemeter=video=0,"
        "ametadata=select:key=lavfi.aphasemeter.phase:value=-0.005:function=less,"
        "pan=1c|c0=c0,"
        "aresample=async=1:first_pts=0,"
        "[a]amix")
)
applied = effector.apply(waveform, sample_rate=sample_rate)
```

@jjmmchema Can you apply this in the PR?
@mthrok Done. Updated the PR with the phase correction. I still need to write the tests. This is the first time I'm writing actual tests, so I'm having a bit of a hard time understanding how to do it properly according to the PyTorch guidelines. Any advice or guidance would be really appreciated. Also, if you have the time, I'd be grateful if you could explain what you see in the last graph and spectrogram that allows you to say the phase correction was properly applied.
The script generates a 2-channel waveform whose channels have exactly opposite signs.
Tests should be small and concise. For example, take a look at `test/torchaudio_unittest/transforms/transforms_test_impl.py`, lines 216 to 225 (at d5b2996).
You can do something similar, but from here it depends on the functionality.
There are other things to check, such as that a tensor of any float dtype should work, etc.
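A sketch of such a dtype test in plain pytest style (hypothetical; the actual torchaudio test suite uses its own parameterization harness, and `to_mono` here is a stand-in for the function under test):

```python
import pytest
import torch

def to_mono(waveform, dim=-2):
    """Stand-in for the implementation under test: average channels to mono."""
    return waveform.mean(dim=dim, keepdim=True)

@pytest.mark.parametrize("dtype", [torch.float32, torch.float64])
def test_to_mono_preserves_dtype_and_shape(dtype):
    waveform = torch.randn(2, 100, dtype=dtype)
    mono = to_mono(waveform)
    assert mono.dtype == dtype
    assert mono.shape == (1, 100)

def test_to_mono_matches_manual_mean():
    waveform = torch.randn(2, 100)
    expected = (waveform[0] + waveform[1]) / 2
    torch.testing.assert_close(to_mono(waveform).squeeze(0), expected)
```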
🚀 Feature
Add mono-to-stereo and stereo-to-mono conversion.
Motivation
You guys have done an amazing job, but stereo-to-mono (and vice versa) is simple, and it seems that you missed it.
I think it can be done with a simple mean over the channel dim.
Pitch
This should be a simple transform like ToMono(channel_first=True) in torchaudio.transforms.
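A minimal sketch of the pitched transform (hypothetical; `ToMono` is not part of torchaudio, and the naive mean shown here ignores the phase issues discussed later in this thread):

```python
import torch
from torch import nn

class ToMono(nn.Module):
    """Average channels to mono. Hypothetical transform, not torchaudio API."""

    def __init__(self, channel_first: bool = True):
        super().__init__()
        # channel-first: (..., channel, time); channel-last: (..., time, channel)
        self.dim = -2 if channel_first else -1

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        return waveform.mean(dim=self.dim, keepdim=True)

transform = ToMono(channel_first=True)
print(transform(torch.randn(2, 16000)).shape)  # torch.Size([1, 16000])
```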