About a VoxCeleb2 helper script #1647

underdogliu · 2021-11-04T14:03:14Z

underdogliu
Nov 4, 2021
Collaborator

Hi all,

In the VoxCeleb recipe, users are required to create the wav files themselves. As a speaker recognition researcher I am perfectly OK with this (I have done it many times), but I think a helper recipe or function would be very helpful.

I have scripts available and would like to contribute them. There are two versions, though, so I would like to hear the developers' opinion:

1. We process the m4a files one by one. This does not require any dependency but takes time (I don't think several hours would be enough).
2. We split the IDs across different processes. But I think this cannot be done efficiently because of the GIL. Of course, with tools like Kaldi it would not be an issue, because Kaldi can take advantage of multiple cores through its Perl runners. But then we need Kaldi somewhere, and I am not sure that fits the logistics of this toolkit.

What is your view on this? Do you have any alternative solutions? If so I would appreciate it. Thanks in advance!
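A minimal sketch of the second option, assuming `ffmpeg` is on the PATH and the wav files should be written next to the m4a files. The names `convert_one` and `convert_all` are hypothetical and are not part of the recipe. Since the decoding runs inside the `ffmpeg` child processes, the Python threads only wait on them, so the GIL should not be the limiting factor here:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def convert_one(src: Path) -> Path:
    """Convert a single m4a file to wav by calling ffmpeg (assumed installed)."""
    dst = src.with_suffix(".wav")
    subprocess.run(
        ["ffmpeg", "-y", "-loglevel", "error", "-i", str(src), str(dst)],
        check=True,  # raise CalledProcessError if ffmpeg fails
    )
    return dst


def convert_all(root: Path, workers: int = 8, convert=convert_one) -> list:
    """Convert every *.m4a file under root, running several conversions at once."""
    files = sorted(root.rglob("*.m4a"))
    # Each worker thread blocks on its own ffmpeg process, so the heavy
    # work happens outside the Python interpreter.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert, files))
```

Calling `convert_all(Path("/path/to/voxceleb2"), workers=16)` would then walk the whole tree; the path and the worker count are placeholders to adapt.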

mravanelli · 2021-11-04T14:23:26Z

mravanelli
Nov 4, 2021
Maintainer

Hi, this is rather annoying. The main issue is that torchaudio cannot read m4a, so users have to convert the files to a readable format (e.g., wav). This is also discussed on the torchaudio GitHub (pytorch/audio#104). On our side, the README points to some bash scripts that can be used for the conversion. I think the ideal solution would be to do everything with a Python library within voxceleb_prepare.py. I'm not sure there are easy-to-install libraries that can manage that, though. To speed it up we can consider multiprocessing as well (even though doing it in Python is not ideal). As for the time, however, I'm not that concerned. Users will likely do it only once and then save the converted version somewhere on their system (as I did). Feel free to propose a solution that you think can work well!


underdogliu · 2021-11-04T15:05:39Z

underdogliu
Nov 4, 2021
Collaborator Author

@mravanelli Thanks for the reply! So the actual problem is that torchaudio lacks a way to read m4a files into raw waveforms. Am I right? If so, as mentioned in that issue, pydub may do the job. But maybe we want to keep torchaudio as the only library for audio processing?


mthrok · 2021-11-04T15:24:10Z

mthrok
Nov 4, 2021

(Acknowledged by torchaudio team.)


mravanelli · 2021-11-04T15:27:45Z

mravanelli
Nov 4, 2021
Maintainer

Yes


underdogliu · 2021-11-04T16:36:14Z

underdogliu
Nov 4, 2021
Collaborator Author

One direct solution is to integrate the "m4a->wav" conversion into the code as a streaming step. This can be done very effectively by calling os.system($cmd), where $cmd contains the ffmpeg operations.
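As a rough illustration of what such a call could look like, here is a sketch that uses `subprocess.run` with an argument list rather than `os.system` with a command string, which avoids shell quoting problems with unusual file names. The function names, the 16 kHz mono output, and the exact ffmpeg options are assumptions for illustration, not what the recipe uses:

```python
import subprocess


def ffmpeg_cmd(src: str, dst: str, sample_rate: int = 16000) -> list:
    """Build the ffmpeg argument list for one m4a -> wav conversion."""
    return [
        "ffmpeg", "-y", "-loglevel", "error",
        "-i", src,
        "-ac", "1",                # mono output (assumed target format)
        "-ar", str(sample_rate),   # resample to the requested rate
        dst,
    ]


def m4a_to_wav(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_cmd(src, dst), check=True)
```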

Another solution is to either wait for the torchaudio developers or find a way ourselves to integrate m4a reading into the library. In that case, though, I am not sure we want to rely on a particular third-party library like pydub; maybe we can take inspiration from it.

@mthrok @mravanelli thoughts?


BenoitWang · 2021-11-04T17:40:30Z

BenoitWang
Nov 4, 2021
Collaborator

Hi, thanks @underdogliu for the question. I ran into this inconvenience too when I tried to run a benchmark with voxceleb1+2. I am running ffmpeg as mentioned in the README while waiting for the response from the torchaudio team.


mravanelli · 2021-11-04T17:45:39Z

mravanelli
Nov 4, 2021
Maintainer

The cleanest solution would be to support m4a reading in torchaudio. This way, other projects can benefit from it. We could potentially integrate the call to ffmpeg using os.system($cmd). This is a solution that we wanted to avoid because it makes the code dependent on the operating system (and the tools installed on it).
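If an external call were used anyway, the dependency on installed tools could at least be made explicit. A small sketch, assuming a hypothetical helper named `require_tool` that fails early with a readable message instead of crashing midway through data preparation:

```python
import shutil


def require_tool(name: str = "ffmpeg") -> str:
    """Return the full path of an external tool, or raise if it is not on the PATH."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"'{name}' was not found on the PATH; install it or convert "
            "the m4a files to wav manually before running the recipe."
        )
    return path
```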


mravanelli · 2021-11-10T23:56:53Z

mravanelli
Nov 10, 2021
Maintainer

@mthrok is there any plan in torchaudio to support m4a format?


mthrok · 2021-11-11T19:36:59Z

mthrok
Nov 11, 2021

It has been added to the future plan and I bumped up the priority. We should be able to work on this in the next quarter. (It is not a guarantee though)


underdogliu · 2021-11-16T17:32:03Z

underdogliu
Nov 16, 2021
Collaborator Author

@mthrok anything I can help with?


underdogliu · 2021-11-18T08:04:53Z

underdogliu
Nov 18, 2021
Collaborator Author

@mravanelli Just quickly coming back to this issue. One option is definitely going back to Kaldi. Torchaudio supports kaldi-io by default, as we all know. And for reading VoxCeleb's m4a files with the necessary utilities (ffmpeg), we can split and load from Kaldi scp files. But I remember that one "selling point" of SpeechBrain is getting away from Kaldi. So what are your thoughts on this?
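For readers unfamiliar with the scp idea, here is a sketch of how such entries could be generated. In a Kaldi-style `wav.scp`, an entry ending in `|` is a command whose output is read as the wav data, so ffmpeg can decode each m4a file on the fly. The `speaker/video/file.m4a` layout, the utterance-ID scheme, and the function name `scp_line` are assumptions for illustration:

```python
from pathlib import Path


def scp_line(m4a_path: Path) -> str:
    """Build one wav.scp entry that decodes an m4a file through an ffmpeg pipe."""
    # Assumed layout: .../<speaker_id>/<video_id>/<file>.m4a
    speaker, video = m4a_path.parts[-3], m4a_path.parts[-2]
    utt_id = f"{speaker}-{video}-{m4a_path.stem}"
    # The trailing "|" marks the entry as a command rather than a file path.
    return f"{utt_id} ffmpeg -v 8 -i {m4a_path.as_posix()} -f wav -acodec pcm_s16le - |"
```

The resulting list can be split by speaker ID into several scp files so that each one is processed by a separate job.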


mravanelli · 2021-11-18T14:17:27Z

mravanelli
Nov 18, 2021
Maintainer

I had a discussion with torchaudio. They will likely support m4a files in one of the future versions. Once that is done, it will be super easy to read these files without any intermediate conversion to wav.
