About a VoxCeleb2 helper script #1093

underdogliu · 2021-11-04T14:03:14Z

Hi all,

In the VoxCeleb recipe, users are required to create the wav files themselves. As a speaker recognition researcher I am perfectly OK with this (I have done it many times), but I think a helper recipe or function would be very useful.

I have scripts available and would like to contribute them. There are two versions, though, so I would like to hear the developers' opinion:

1. We process the m4a files one by one. This does not require any dependency but takes time (I don't think several hours would be enough).
2. We split the speaker IDs across different processes. But I think this cannot be done efficiently because of the GIL. Of course, with tools like Kaldi it would not be an issue, because Kaldi can take advantage of multiple cores through its Perl runners. But then we need Kaldi somewhere, and I am not sure that fits the logistics of this toolkit.
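
For reference, the second option can be sketched without Kaldi. The GIL only serializes Python bytecode inside one interpreter, and each ffmpeg call runs as a separate OS process, so a thread pool that merely waits on those processes can keep several cores busy. This is a minimal sketch, not the recipe's actual code: the function names, the mirrored output layout, the worker count, and the 16 kHz mono target are all assumptions, and ffmpeg is assumed to be installed and on PATH.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def wav_path_for(m4a_path, src_root, dst_root):
    """Mirror the source tree under dst_root, swapping .m4a for .wav."""
    rel = Path(m4a_path).relative_to(src_root)
    return Path(dst_root) / rel.with_suffix(".wav")


def ffmpeg_cmd(m4a_path, wav_path, sample_rate=16000):
    """Build the ffmpeg argument list (mono, 16-bit PCM; assumed target format)."""
    return [
        "ffmpeg", "-y", "-loglevel", "error",
        "-i", str(m4a_path),
        "-ac", "1", "-ar", str(sample_rate), "-acodec", "pcm_s16le",
        str(wav_path),
    ]


def convert_one(m4a_path, src_root, dst_root):
    """Convert a single file; raises CalledProcessError if ffmpeg fails."""
    wav_path = wav_path_for(m4a_path, src_root, dst_root)
    wav_path.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(ffmpeg_cmd(m4a_path, wav_path), check=True)
    return wav_path


def convert_tree(src_root, dst_root, workers=8):
    """Convert every .m4a under src_root, running up to `workers` ffmpeg jobs at once."""
    files = sorted(Path(src_root).rglob("*.m4a"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each thread blocks on its own ffmpeg process, so the GIL is not the bottleneck.
        return list(pool.map(lambda p: convert_one(p, src_root, dst_root), files))
```

A call such as `convert_tree("/data/vox2", "/data/vox2_wav")` (hypothetical paths) would write the converted tree next to the original one.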

What is your view on this? Do you have any alternative solutions? If so I would appreciate it. Thanks in advance!

mravanelli · 2021-11-04T14:23:26Z

Hi, this is rather annoying. The main issue is that torchaudio cannot read m4a, so users have to convert the files to a readable format (e.g., wav). This is also discussed on the torchaudio GitHub (pytorch/audio#104). On our side, the README points to some bash scripts that can be used for the conversion. I think the ideal solution would be to do everything with a Python library within voxceleb_prepare.py. I'm not sure whether there are easy-to-install libraries that can manage that, though. To speed it up we can consider multi-processing as well (even though doing it in Python is not ideal). As for the time, however, I'm not that concerned. Users will likely do it only once and then save the converted version somewhere on their system (as I did). Feel free to propose a solution that you think can work well!


underdogliu · 2021-11-04T15:05:39Z

@mravanelli Thanks for the reply! So the actual problem is that torchaudio lacks a way to read m4a files into raw waveforms, am I right? If so, as mentioned in that issue, pydub may do the job. But maybe we want to keep torchaudio as the only library for audio processing?

mthrok · 2021-11-04T15:24:10Z

(Acknowledged by torchaudio team.)

mravanelli · 2021-11-04T15:27:45Z

Yes


underdogliu · 2021-11-04T16:36:14Z

One direct solution is to integrate the m4a-to-wav conversion into the code as a streaming step. This can be done quite effectively by calling os.system($cmd), where $cmd contains the ffmpeg operations.
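
A rough sketch of what such a streaming step might look like, assuming ffmpeg is on PATH. It uses `subprocess.run` with an argument list instead of `os.system` so that the decoded audio can be captured from ffmpeg's standard output with no intermediate wav file. The function names, the 16 kHz mono target, and the plain-list return type are illustrative assumptions, not part of the recipe.

```python
import array
import subprocess
import sys


def pcm16_to_floats(raw):
    """Convert little-endian signed 16-bit PCM bytes to floats in [-1, 1)."""
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        # ffmpeg's s16le output is little-endian; swap on big-endian hosts.
        samples.byteswap()
    return [s / 32768.0 for s in samples]


def read_m4a(path, sample_rate=16000):
    """Decode an m4a file to mono float samples by piping raw PCM out of ffmpeg."""
    cmd = [
        "ffmpeg", "-loglevel", "error", "-i", str(path),
        "-f", "s16le", "-ac", "1", "-ar", str(sample_rate),
        "pipe:1",  # write the raw samples to stdout
    ]
    raw = subprocess.run(cmd, check=True, stdout=subprocess.PIPE).stdout
    return pcm16_to_floats(raw)
```

In a real data pipeline the returned list would probably be wrapped in a tensor; that step is left out to keep the sketch dependency-free.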

Another solution is either to wait for the torchaudio developers or to find a way ourselves to integrate m4a reading into the library. In that case, though, I am not sure we want to rely on a particular third-party library like pydub; maybe we can take inspiration from it instead.

@mthrok @mravanelli thoughts?

BenoitWang · 2021-11-04T17:40:30Z

Hi, thanks @underdogliu for the question. I ran into this inconvenience too when I tried to run a benchmark with VoxCeleb1+2. I am running ffmpeg as described in the README while waiting for a response from the torchaudio team.

mravanelli · 2021-11-04T17:45:39Z

The cleanest solution would be to support m4a reading in torchaudio. This way, other projects can benefit from it. We could potentially integrate the call to ffmpeg using os.system($cmd). This is a solution that we wanted to avoid, because it makes the code dependent on the operating system (and the tools installed on it).
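
If an external ffmpeg call were integrated anyway, one way to soften the dependency on installed tools would be to check for the binary up front and fail with a clear message. The helper below is a hypothetical sketch (its name and error text are made up for illustration); it relies only on the standard library's `shutil.which`.

```python
import shutil


def require_tool(name="ffmpeg"):
    """Return the full path of an executable on PATH, or raise a descriptive error."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"'{name}' was not found on PATH; install it or convert the "
            "m4a files to wav beforehand."
        )
    return path
```

Calling `require_tool()` once at the start of data preparation would surface a missing ffmpeg before any conversion work begins.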


mravanelli · 2021-11-10T23:56:53Z

@mthrok is there any plan in torchaudio to support m4a format?

mthrok · 2021-11-11T19:36:59Z

It has been added to the future plan and I bumped up the priority. We should be able to work on this in the next quarter. (It is not a guarantee though)

underdogliu · 2021-11-16T17:32:03Z

@mthrok anything I can help with?

underdogliu · 2021-11-18T08:04:53Z

@mravanelli Just a quick follow-up on this issue. One option is definitely to fall back on Kaldi. torchaudio supports kaldi-io by default, as we all know. To read VoxCeleb's m4a files with the necessary utilities (ffmpeg), we can split and load from Kaldi scp files. But I remember that one "selling point" of SpeechBrain is getting away from Kaldi. So what are your thoughts on this?
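
To make the scp idea concrete: in a Kaldi `wav.scp`, an entry whose right-hand side ends in `|` is treated as a command whose standard output is read as the wav data, so the file can point at m4a sources and have ffmpeg decode them on the fly. The sketch below only generates such a file. The utterance-ID scheme (path components joined with `-`) and the function names are assumptions for illustration, and paths containing spaces are not handled.

```python
from pathlib import Path


def utt_id_for(m4a_path, src_root):
    """Build an utterance ID from the path relative to src_root (hypothetical scheme)."""
    rel = Path(m4a_path).relative_to(src_root).with_suffix("")
    return "-".join(rel.parts)


def scp_line(m4a_path, src_root):
    """Return one wav.scp entry that decodes the m4a through an ffmpeg pipe."""
    # The trailing '|' tells Kaldi to run the command and read wav data from its stdout.
    cmd = f"ffmpeg -v 8 -i {m4a_path} -f wav -acodec pcm_s16le - |"
    return f"{utt_id_for(m4a_path, src_root)} {cmd}"


def write_scp(src_root, scp_path):
    """Write one entry per .m4a file under src_root; returns the number of entries."""
    files = sorted(Path(src_root).rglob("*.m4a"))
    with open(scp_path, "w") as f:
        for p in files:
            f.write(scp_line(p, src_root) + "\n")
    return len(files)
```

The resulting file can then be split into chunks for parallel jobs in the usual Kaldi way, at the cost of bringing Kaldi tooling back into the pipeline.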

mravanelli · 2021-11-18T14:17:27Z

I had a discussion with the torchaudio team. They will likely support m4a files in one of the future versions. Once that is done, it will be super easy to read these files without any intermediate conversion to wav.


mravanelli self-assigned this Nov 4, 2021

underdogliu mentioned this issue Nov 4, 2021

Add audio I/O support for other audio formats, such as m4a pytorch/audio#1979

Closed

anautsch added this to To do in Speaker Recognition & Diarization (Voice Biometrics) via automation Apr 21, 2022

speechbrain locked and limited conversation to collaborators Oct 28, 2022

Adel-Moumen converted this issue into discussion #1647 Oct 28, 2022

This issue was moved to a discussion.
