Impossible to transcribe audio file with the same name #1586

Open
qanastek opened this issue Sep 27, 2022 · 1 comment · May be fixed by #2476
Labels: bug (Something isn't working)

Comments

@qanastek
Collaborator

qanastek commented Sep 27, 2022

Because of how the transcribe_file method, and in particular its helper load_audio, is implemented, it is impossible to transcribe audio files that share the same name, even when they are located in different directories: the method copies the audio file locally into savedir. Because a file with that name already exists there, the new file does not overwrite it, which makes transcribing the new file impossible.

Example file paths:

  • /users/<user_name>/datasets/MyDataset/recordings/ZOWxbXGuoU/recording_1.wav
  • /users/<user_name>/datasets/MyDataset/recordings/2h4v5o49Hj/recording_1.wav
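The collision described above can be reproduced in isolation. This is a minimal sketch, not the library's code: `naive_fetch` is a hypothetical stand-in for the copy step, and the only behaviour assumed from the report is that an existing file in savedir is kept rather than overwritten.

```python
import shutil
import tempfile
from pathlib import Path


def naive_fetch(source: Path, savedir: Path) -> Path:
    """Hypothetical stand-in for the copy step: an existing local file is kept."""
    savedir.mkdir(parents=True, exist_ok=True)
    local = savedir / source.name  # only the base name is used, the directory is lost
    if not local.exists():
        shutil.copy(source, local)
    return local


def demo_collision() -> bool:
    """Return True if the second recording resolves to the first one's content."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        first = root / "ZOWxbXGuoU" / "recording_1.wav"
        second = root / "2h4v5o49Hj" / "recording_1.wav"
        for path, payload in ((first, b"first"), (second, b"second")):
            path.parent.mkdir(parents=True)
            path.write_bytes(payload)

        savedir = root / "savedir"
        naive_fetch(first, savedir)
        local = naive_fetch(second, savedir)
        # The local copy still holds the first recording's bytes.
        return local.read_bytes() == b"first"
```

In this sketch the second call silently returns the first recording, because both sources map to the same local name.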

It would be great to get rid of this local file, both to fix this issue and to improve the overall performance of the transcription method.

One easy but dirty fix is to remove the local file at the end of each transcription so that later ones can proceed.
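If a local copy has to be kept, one alternative to deleting it is to derive the local name from the full source path, so that two files with the same base name no longer collide. The sketch below is only an illustration under that assumption; `unique_local_name` is a hypothetical helper, not part of the library.

```python
import hashlib
from pathlib import Path


def unique_local_name(source: str) -> str:
    """Build a local file name that depends on the whole source path.

    Hypothetical helper: the stem and extension are preserved for readability,
    and a short hash of the absolute path disambiguates same-named files.
    """
    src = Path(source).absolute()
    digest = hashlib.sha256(str(src).encode("utf-8")).hexdigest()[:16]
    return f"{src.stem}-{digest}{src.suffix}"


first = unique_local_name(
    "/users/<user_name>/datasets/MyDataset/recordings/ZOWxbXGuoU/recording_1.wav"
)
second = unique_local_name(
    "/users/<user_name>/datasets/MyDataset/recordings/2h4v5o49Hj/recording_1.wav"
)
```

Here `first` and `second` differ even though both sources are named `recording_1.wav`, and the same source path always maps to the same local name.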

@anautsch
Collaborator

Hi @qanastek, yep.

This relates to #1303 - this topic touches on data handling in general:

  • for researchers, can a training recipe run through?
  • for industry, are the right audios loaded through the pretrained interfaces?
  • for curious users, does the demo code provided on HuggingFace run?

Dropping a file does not help if there's DDP and multiple nodes are having fun with the same file name.
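To illustrate the concurrency concern: when several processes may write the same local path, a common pattern is to copy into a temporary file in the target directory and then rename it into place. This is a sketch only; `atomic_copy` is a hypothetical helper, and the atomicity of `os.replace` is assumed to hold on a local POSIX filesystem, which may not be true for every shared or network filesystem. It prevents readers from seeing a half-written file, but does not by itself solve the name collision.

```python
import os
import shutil
import tempfile
from pathlib import Path


def atomic_copy(source: Path, destination: Path) -> Path:
    """Copy source to destination without exposing a partially written file."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    # The temporary file lives in the target directory so that the final
    # rename stays on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=destination.parent, suffix=".tmp")
    os.close(fd)
    try:
        shutil.copyfile(source, tmp_name)
        os.replace(tmp_name, destination)  # rename into place in a single step
    except BaseException:
        # Clean up the temporary file if the copy or the rename failed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return destination
```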

Hope we can get to it soon. As you mentioned, it's internal data handling and there's an expectation of this just running well (as it is also stated in the mentioned PR).

@anautsch anautsch self-assigned this Sep 28, 2022
@Adel-Moumen Adel-Moumen added the bug Something isn't working label Sep 28, 2022
@asumagic asumagic linked a pull request Apr 8, 2024 that will close this issue