Impossible to transcribe audio file with the same name #1586

qanastek · 2022-09-27T21:50:16Z

Because of how the transcribe_file method is implemented, and in particular its helper load_audio, it is impossible to transcribe audio files that share the same name, even when they are located in different directories: the method copies the audio file locally into savedir. A later file with the same name does not overwrite the existing copy, so the new file cannot be transcribed.

Example file paths:

/users/<user_name>/datasets/MyDataset/recordings/ZOWxbXGuoU/recording_1.wav
/users/<user_name>/datasets/MyDataset/recordings/2h4v5o49Hj/recording_1.wav
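
The collision can be illustrated in isolation with a small sketch. It assumes, as described above, a copy step that keys only on the file's base name and skips the copy when the target already exists. `naive_fetch` is a hypothetical stand-in written for this illustration, not SpeechBrain's actual code, and the file contents are dummy bytes rather than real audio.

```python
import shutil
import tempfile
from pathlib import Path


def naive_fetch(source: Path, savedir: Path) -> Path:
    """Hypothetical stand-in for the local copy step (not SpeechBrain code).

    Copies `source` into `savedir` under its base name only, and keeps
    any file that is already there.
    """
    savedir.mkdir(parents=True, exist_ok=True)
    local = savedir / source.name
    if not local.exists():  # an existing copy is never overwritten
        shutil.copy(source, local)
    return local


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = root / "ZOWxbXGuoU" / "recording_1.wav"
    second = root / "2h4v5o49Hj" / "recording_1.wav"
    # Dummy payloads so the two "recordings" can be told apart.
    for path, payload in ((first, b"first"), (second, b"second")):
        path.parent.mkdir(parents=True)
        path.write_bytes(payload)

    savedir = root / "savedir"
    local_a = naive_fetch(first, savedir)
    content_a = local_a.read_bytes()
    local_b = naive_fetch(second, savedir)
    content_b = local_b.read_bytes()

# Both sources resolve to the same local path, and the second call
# still sees the content of the first recording.
print(local_a == local_b, content_a, content_b)
```

Under these assumptions the second call returns the stale copy of the first recording.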

It would be nice to get rid of this local copy, which would fix this issue and improve the overall performance of the transcription method.

One easy but dirty fix is to remove the local file at the end of each transcription so that later ones can proceed.


anautsch · 2022-09-28T06:33:56Z

Hi @qanastek, yep.

This relates to #1303 - this topic touches on data handling in general:

for researchers, can a training recipe run through?
for industry, are the right audios loaded through the pretrained interfaces?
for curious users, does the demo code provided on HuggingFace run?

Dropping a file does not help if there's DDP and multiple nodes are having fun with the same file name.
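
One possible direction that avoids deleting anything is to give each source path its own local name, for example by appending a short hash of the full path. The sketch below only shows the naming idea. `unique_local_name` is a hypothetical helper, not part of SpeechBrain, and concurrent writes of the same file by several processes would still need separate care.

```python
import hashlib
from pathlib import Path


def unique_local_name(source: str) -> str:
    """Hypothetical helper: build a local file name that depends on the
    full source path, so same-named files in different directories do
    not collide."""
    path = Path(source)
    # Hash the absolute path; 12 hex characters keep the name short.
    digest = hashlib.sha1(str(path.absolute()).encode("utf-8")).hexdigest()[:12]
    return f"{path.stem}_{digest}{path.suffix}"


name_a = unique_local_name("/users/me/recordings/ZOWxbXGuoU/recording_1.wav")
name_b = unique_local_name("/users/me/recordings/2h4v5o49Hj/recording_1.wav")
print(name_a)
print(name_b)
```

The two example paths here use a made-up `/users/me/...` prefix. The same source path always maps to the same local name, so an already cached copy can still be reused.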

Hope we can get to it soon. As you mentioned, it's internal data handling and there's an expectation of this just running well (as it is also stated in the mentioned PR).

anautsch self-assigned this Sep 28, 2022

Adel-Moumen added the bug label Sep 28, 2022

asumagic linked a pull request Apr 8, 2024 that will close this issue

Allow not using symlinks when fetching files #2476

Draft

13 tasks

asumagic mentioned this issue May 23, 2024

Same result for different samples (with same name) using speech separation #2555

Closed
