This issue was moved to a discussion.

About a VoxCeleb2 helper script #1093

Closed
underdogliu opened this issue Nov 4, 2021 · 12 comments

Comments

@underdogliu
Collaborator

Hi all,

In the VoxCeleb recipe, users are required to create the wav files themselves. As a speaker recognition researcher I am perfectly OK with this (I have done it many times), but I think a helper recipe or function would be very helpful.

I have scripts available and would like to contribute them. There are two versions, though, so I would like to hear the developers' opinion:

  1. We process the m4a files one by one. This does not require any additional dependency but takes time (I don't think several hours would be enough).
  2. We split the speaker IDs across different processes. But I think this cannot be done efficiently in Python because of the GIL. Of course, if we had tools like Kaldi it would not be an issue, because Kaldi can take advantage of multiple cores through its Perl runners. But then we would need Kaldi somewhere, and I am not sure that fits the logistics of this toolkit.
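
For illustration, the second option could be sketched in plain Python roughly as follows. This is a minimal sketch and not the script offered above: it assumes `ffmpeg` is on the `PATH`, and the helper names (`build_ffmpeg_cmd`, `plan_conversions`, `convert_one`, `convert_all`) and the 16 kHz mono target are assumptions made for the example. Because each conversion runs in an external ffmpeg process, the Python threads spend their time waiting on child processes, and CPython releases the GIL during that wait, so a simple thread pool may already keep several cores busy.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_ffmpeg_cmd(src, dst, sample_rate=16000):
    """Return the ffmpeg argv that converts one m4a file to mono 16-bit PCM wav."""
    return [
        "ffmpeg", "-y", "-loglevel", "error",
        "-i", str(src),
        "-ac", "1",              # downmix to a single channel
        "-vn",                   # ignore any video stream
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        "-ar", str(sample_rate),
        str(dst),
    ]


def plan_conversions(src_root, dst_root):
    """Pair every *.m4a under src_root with a mirrored *.wav path under dst_root."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    pairs = []
    for src in sorted(src_root.rglob("*.m4a")):
        dst = (dst_root / src.relative_to(src_root)).with_suffix(".wav")
        pairs.append((src, dst))
    return pairs


def convert_one(pair, run=subprocess.run):
    """Convert a single file and return (src, succeeded)."""
    src, dst = pair
    dst.parent.mkdir(parents=True, exist_ok=True)
    result = run(build_ffmpeg_cmd(src, dst))
    return src, result.returncode == 0


def convert_all(src_root, dst_root, workers=8, run=subprocess.run):
    """Convert all files with a thread pool; return the sources that failed."""
    pairs = plan_conversions(src_root, dst_root)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: convert_one(p, run), pairs))
    return [src for src, ok in results if not ok]
```

A call such as `convert_all("voxceleb2/dev/aac", "voxceleb2_wav/dev", workers=16)` (hypothetical paths) would mirror the directory tree and return the list of files ffmpeg could not convert. The `run` parameter exists only so the runner can be replaced in tests.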

What is your view on this? If you have any alternative solutions, I would appreciate hearing them. Thanks in advance!

@mravanelli
Collaborator

mravanelli commented Nov 4, 2021 via email

@mravanelli mravanelli self-assigned this Nov 4, 2021
@underdogliu
Collaborator Author

underdogliu commented Nov 4, 2021

@mravanelli Thanks for the reply! So the problem is actually that torchaudio lacks tools for reading an m4a file into a raw waveform, am I right? If so, as mentioned in the issue, pydub may do the job. But maybe we want to keep torchaudio as the only library for audio processing?

@mthrok

mthrok commented Nov 4, 2021

(Acknowledged by torchaudio team.)

@mravanelli
Collaborator

mravanelli commented Nov 4, 2021 via email

@underdogliu
Collaborator Author

One direct solution is to integrate the m4a-to-wav conversion into the code as a streaming step. This can be done very effectively by calling os.system($cmd), where $cmd includes the ffmpeg operations.
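
A minimal sketch of what such a streaming step might look like, assuming `ffmpeg` is on the `PATH`. It swaps `os.system` for `subprocess.run` with an argument list, which avoids shell quoting problems and lets the decoded audio be read from ffmpeg's stdout instead of a temporary wav file. The function names (`build_decode_cmd`, `pcm16_to_floats`, `read_m4a`) and the 16 kHz mono format are hypothetical choices for the example and are not part of any existing recipe.

```python
import array
import subprocess
import sys


def build_decode_cmd(path, sample_rate=16000):
    """Return the ffmpeg argv that writes mono 16-bit little-endian PCM to stdout."""
    return [
        "ffmpeg", "-loglevel", "error",
        "-i", str(path),
        "-vn", "-ac", "1", "-ar", str(sample_rate),
        "-f", "s16le",  # raw PCM, no wav header
        "-",            # "-" sends the output to stdout
    ]


def pcm16_to_floats(raw):
    """Convert little-endian 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    samples = array.array("h")
    # Drop a trailing odd byte, if any, so the buffer is a whole number of samples.
    samples.frombytes(raw[: len(raw) - len(raw) % 2])
    if sys.byteorder == "big":
        samples.byteswap()
    return [s / 32768.0 for s in samples]


def read_m4a(path, sample_rate=16000, run=subprocess.run):
    """Decode one m4a file in memory and return (samples, sample_rate)."""
    result = run(
        build_decode_cmd(path, sample_rate),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if ffmpeg fails
    )
    return pcm16_to_floats(result.stdout), sample_rate
```

The returned list could then be wrapped in a tensor by the data pipeline. Spawning one ffmpeg process per utterance has a cost, so whether this is fast enough for on-the-fly training would need to be measured.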

Another solution is either to wait for the torchaudio developers or to find a way ourselves to integrate m4a reading into the library. In that case, though, I am not sure we want to rely on a particular third-party library like pydub; maybe we can take inspiration from it instead.

@mthrok @mravanelli thoughts?

@BenoitWang
Collaborator

Hi, thanks @underdogliu for the question. I ran into this inconvenience too when I tried to run a benchmark with VoxCeleb1+2. I am running ffmpeg as mentioned in the README while waiting for a response from the torchaudio team.

@mravanelli
Collaborator

mravanelli commented Nov 4, 2021 via email

@mravanelli
Collaborator

@mthrok is there any plan in torchaudio to support m4a format?

@mthrok

mthrok commented Nov 11, 2021

It has been added to the future plan and I bumped up the priority. We should be able to work on this in the next quarter. (It is not a guarantee though)

@underdogliu
Collaborator Author

@mthrok is there anything I can help with?

@underdogliu
Collaborator Author

@mravanelli Just a quick follow-up on this issue. One option is definitely to fall back on Kaldi. Torchaudio supports kaldi-io by default, as we all know. For reading VoxCeleb's m4a files with the necessary utilities (ffmpeg), we can split and load from Kaldi scp files. But I remember that one of the selling points of SpeechBrain is getting away from Kaldi. What are your thoughts on this?
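
To make the scp idea concrete, here is a rough sketch of how a Kaldi-style `wav.scp` with ffmpeg pipe entries could be generated. In Kaldi, an entry that ends in `|` is run as a command and its stdout is read as the wav data. The utterance ID scheme below assumes a `<speaker>/<video>/<segment>.m4a` directory layout, and the helper names (`utt_id_from_path`, `wav_scp_line`, `write_wav_scp`) are made up for the example.

```python
from pathlib import Path


def utt_id_from_path(m4a_path):
    """Build an ID such as 'id00012-21Uxsk56VDQ-00001' (assumed directory layout)."""
    p = Path(m4a_path)
    return f"{p.parent.parent.name}-{p.parent.name}-{p.stem}"


def wav_scp_line(m4a_path):
    """Return one wav.scp entry that decodes the m4a file through an ffmpeg pipe."""
    # Paths containing spaces would need quoting; this sketch does not handle them.
    return (
        f"{utt_id_from_path(m4a_path)} "
        f"ffmpeg -v 8 -i {m4a_path} -f wav -acodec pcm_s16le -|"
    )


def write_wav_scp(m4a_paths, scp_path):
    """Write the entries sorted by utterance ID and return the written lines."""
    ordered = sorted(m4a_paths, key=utt_id_from_path)
    lines = [wav_scp_line(p) for p in ordered]
    Path(scp_path).write_text("\n".join(lines) + "\n")
    return lines
```

Splitting such a file into chunks is what lets Kaldi-style job runners distribute the decoding, at the price of bringing the scp convention (and possibly Kaldi itself) into the recipe.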

@mravanelli
Collaborator

mravanelli commented Nov 18, 2021 via email

@speechbrain speechbrain locked and limited conversation to collaborators Oct 28, 2022
@Adel-Moumen Adel-Moumen converted this issue into discussion #1647 Oct 28, 2022
