About a VoxCeleb2 helper script #1647
Replies: 12 comments
-
Hi, this is rather annoying. The main issue is that torchaudio cannot read
m4a, so users have to convert the files to a readable format (e.g., wav)
themselves. This is also discussed on the torchaudio GitHub
(pytorch/audio#104). On our side, the README points to some bash scripts
that can be used for the conversion.
I think the ideal solution would be to do everything with a Python library
within voxceleb_prepare.py. I'm not sure whether there are easy-to-install
libraries that can manage that, though. To speed it up we could consider
multi-processing as well (even though doing it in Python is not ideal). As
for the time, however, I'm not that concerned. Users will likely do the
conversion only once and then save the converted version somewhere on
their system (as I did). Feel free to propose a solution that you think can
work well!
On Thu, 4 Nov 2021 at 10:03, Xuechen Liu wrote:
Hi all,
So in the VoxCeleb recipe users are required to create the wav files
themselves. As a speaker recognition researcher I am perfectly OK with this
(I have done it many times), but I think a helper recipe/function would be
very helpful.
I do have scripts available and would like to contribute them. There are two
versions, though, so I would like to know the developers' opinion:
1. We process the m4a files one by one. This does not require any
dependency but takes time (I don't think several hours would be enough).
2. We split the IDs across different processes. But I think this cannot be
done efficiently in Python because of the GIL. Of course, with tools like
Kaldi <https://github.com/kaldi-asr/kaldi> it would not be an issue,
because Kaldi can take advantage of multiple cores through its Perl runners.
But then we need Kaldi somewhere, and I am not sure that fits the logistics
of this toolkit.
What is your view on this? Do you have any alternative solutions? If so I
would appreciate it. Thanks in advance!
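As a rough illustration of option 2 without Kaldi, here is a minimal sketch that fans the conversion out over a thread pool. It assumes ffmpeg is installed and on PATH; the function names (`build_ffmpeg_cmd`, `plan_conversions`, `convert_one`, `convert_all`), the 16 kHz mono output, and the directory layout are illustrative choices, not part of the recipe. Threads are enough here because the decoding runs in separate ffmpeg processes, and a Python thread that waits on a child process does not hold the GIL.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_ffmpeg_cmd(src, dst, sample_rate=16000):
    """Return the ffmpeg argument list that decodes `src` to mono 16-bit PCM wav."""
    return [
        "ffmpeg", "-y", "-loglevel", "error",
        "-i", str(src),
        "-ac", "1", "-ar", str(sample_rate),
        "-acodec", "pcm_s16le",
        str(dst),
    ]


def plan_conversions(m4a_root, wav_root):
    """Map every .m4a under `m4a_root` to a .wav path with the same layout under `wav_root`."""
    m4a_root, wav_root = Path(m4a_root), Path(wav_root)
    return [
        (src, (wav_root / src.relative_to(m4a_root)).with_suffix(".wav"))
        for src in sorted(m4a_root.rglob("*.m4a"))
    ]


def convert_one(pair, runner=subprocess.run):
    """Convert a single (src, dst) pair; `runner` is injectable so it can be faked."""
    src, dst = pair
    dst.parent.mkdir(parents=True, exist_ok=True)
    runner(build_ffmpeg_cmd(src, dst), check=True)
    return dst


def convert_all(pairs, workers=8, runner=subprocess.run):
    """Run the conversions concurrently; each worker thread waits on one ffmpeg process."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: convert_one(p, runner), pairs))
```

Typical use would be `convert_all(plan_conversions("voxceleb2/aac", "voxceleb2/wav"))`, with the two paths replaced by wherever the data actually lives.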
-
@mravanelli Thanks for the reply! So the problem is actually that torchaudio lacks a way to read m4a files into raw waveforms, am I right? If so, as mentioned in that issue, pydub may do the job. But maybe we want to keep torchaudio as the only library for audio processing?
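For reference, a minimal sketch of what the pydub route could look like. The helper names (`wav_path_for`, `m4a_to_wav`) and the 16 kHz mono output are assumptions for illustration. Note that pydub itself delegates non-wav decoding to ffmpeg (or libav), so this does not remove the dependency on a system tool.

```python
from pathlib import Path


def wav_path_for(m4a_path, out_dir):
    """Place the converted file in `out_dir`, keeping the original file stem."""
    return Path(out_dir) / (Path(m4a_path).stem + ".wav")


def m4a_to_wav(m4a_path, out_dir, sample_rate=16000):
    """Decode one m4a file with pydub and write it out as mono wav."""
    # Imported lazily so this module still loads where pydub is not installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(str(m4a_path))  # decoding is delegated to ffmpeg
    audio = audio.set_frame_rate(sample_rate).set_channels(1)
    dst = wav_path_for(m4a_path, out_dir)
    dst.parent.mkdir(parents=True, exist_ok=True)
    audio.export(str(dst), format="wav")
    return dst
```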
-
(Acknowledged by the torchaudio team.)
-
Yes
-
One direct solution is to integrate the "m4a->wav" step into the code, as a streaming operation. This can be done very effectively by calling Another solution is to either wait for the torchaudio developers or find a way ourselves to integrate m4a reading into the library. In that case, though, I am not sure we want to rely on a particular third-party library like pydub; maybe we can take inspiration from it instead. @mthrok @mravanelli thoughts?
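The command meant in the post above is not shown. Assuming it is ffmpeg (the tool used by the README scripts), a streaming version could look like the sketch below: ffmpeg writes raw 16-bit PCM to stdout and Python reads it straight into memory, so no intermediate wav file is created. The function names and the 16 kHz mono settings are illustrative assumptions, and the result is a plain list of floats rather than a tensor to keep the example dependency-free.

```python
import array
import subprocess
import sys


def pcm16le_to_floats(raw):
    """Convert raw little-endian 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    samples = array.array("h")
    samples.frombytes(raw[: len(raw) - len(raw) % 2])  # drop a trailing odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # the stream is little-endian regardless of the host
    return [s / 32768.0 for s in samples]


def decode_m4a(path, sample_rate=16000):
    """Decode `path` in memory by reading ffmpeg's stdout; no wav file is written."""
    cmd = [
        "ffmpeg", "-loglevel", "error", "-i", str(path),
        "-f", "s16le", "-acodec", "pcm_s16le",
        "-ac", "1", "-ar", str(sample_rate),
        "-",  # write the raw samples to stdout
    ]
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return pcm16le_to_floats(proc.stdout)
```

The trade-off is that every epoch pays the decoding cost again, whereas a one-off conversion to wav pays it once.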
-
Hi, thanks @underdogliu for the question. I ran into this inconvenience too when I tried to run a benchmark with VoxCeleb1+2. For now I am running ffmpeg as mentioned in the README while waiting for a response from the torchaudio team.
-
The cleanest solution would be for torchaudio to support m4a reading. This
way, other projects can benefit from it too.
We could potentially integrate the call to ffmpeg using os.system($cmd). This
is a solution that we wanted to avoid because it makes the code dependent
on the operating system (and the tools installed on it).
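If an external call were accepted anyway, part of the portability concern could be softened by checking for the tool up front and avoiding the shell. A minimal sketch, with hypothetical helper names (`require_tool`, `run_tool`): `shutil.which` looks the executable up on PATH in a platform-independent way, and `subprocess.run` with an argument list sidesteps shell quoting differences. The dependency on an installed ffmpeg remains.

```python
import shutil
import subprocess


def require_tool(name="ffmpeg"):
    """Return the full path of `name`, or raise a readable error if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"'{name}' was not found on PATH; install it or convert the files beforehand."
        )
    return path


def run_tool(args):
    """Run a command given as an argument list, without going through a shell."""
    exe = require_tool(args[0])
    return subprocess.run([exe, *args[1:]], check=True, capture_output=True, text=True)
```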
-
@mthrok is there any plan in torchaudio to support the m4a format?
-
It has been added to the future plan and I bumped up the priority. We should be able to work on this in the next quarter. (This is not a guarantee, though.)
-
@mthrok anything I can help with?
-
@mravanelli Just quickly coming back to this issue. One option is definitely falling back on Kaldi. Torchaudio supports kaldi-io <https://pytorch.org/audio/stable/kaldi_io.html> by default, as we all know. And to read VoxCeleb's m4a files with the necessary utilities (ffmpeg), we can split and load from Kaldi scp files. But I remember that one of the "selling points" of SpeechBrain is getting away from Kaldi. So what are your thoughts on this?
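To make the scp idea concrete: Kaldi treats a wav.scp entry that ends in `|` as a command whose stdout is the audio, so the m4a files never need to be converted on disk. A minimal sketch that writes such entries follows; the helper names (`scp_line`, `write_scp`), the utterance-ID scheme, and the assumed `<speaker_id>/<video_id>/<segment>.m4a` layout are illustrative.

```python
from pathlib import Path


def scp_line(m4a_path):
    """Build one wav.scp entry whose "file" is an ffmpeg pipe (note the trailing '|')."""
    p = Path(m4a_path)
    # Assumed layout: .../<speaker_id>/<video_id>/<segment>.m4a
    utt_id = f"{p.parts[-3]}-{p.parts[-2]}-{p.stem}"
    return f"{utt_id} ffmpeg -v 8 -i {p} -f wav -acodec pcm_s16le -|"


def write_scp(m4a_paths, scp_path):
    """Write the sorted entries to `scp_path`, one utterance per line."""
    lines = sorted(scp_line(p) for p in m4a_paths)
    Path(scp_path).write_text("\n".join(lines) + "\n")
    return lines
```

The scp file can then be split by line count to spread the work over several jobs. ffmpeg is still required at read time.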
-
I had a discussion with torchaudio. They will likely support m4a files in
one of the future versions. Once that is done, it will be super easy to
read these files without any intermediate conversion to wav.
-
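Hi all,
So in the VoxCeleb recipe users are required to create the wav files themselves. As a speaker recognition researcher I am perfectly OK with this (I have done it many times), but I think a helper recipe/function would be very helpful.
I do have scripts available and would like to contribute them. There are two versions, though, so I would like to know the developers' opinion:
1. We process the m4a files one by one. This does not require any dependency but takes time (I don't think several hours would be enough).
2. We split the IDs across different processes. But I think this cannot be done efficiently in Python because of the GIL. Of course, with tools like Kaldi <https://github.com/kaldi-asr/kaldi> it would not be an issue, because Kaldi can take advantage of multiple cores through its Perl runners. But then we need Kaldi somewhere, and I am not sure that fits the logistics of this toolkit.
What is your view on this? Do you have any alternative solutions? If so I would appreciate it. Thanks in advance!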