Duplicating audio/text pairs #28

Open
carlfm01 opened this issue Apr 18, 2020 · 1 comment

@carlfm01

Hello @tilmankamp,

I'm using transcribe.py with a catalog file to align my audio files, but it generates duplicated transcriptions.

Audio type
Very long audio files, each with about 1 hour of pure speech.

Text type
Very long text that follows the audio sequentially, with no punctuation, just plain text. A professional reviewed the text manually, so it matches the audio with 98%+ accuracy.

I'm using the default settings for everything, but the duplicates still appear when I change the configuration. I always use the same catalog file for both steps, aligning and cutting.

I've verified that each duplicated text segment appears only once in the whole text to be cut.
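For anyone trying to reproduce this, a check of that kind could be sketched roughly as follows. This is only an illustration, not part of transcribe.py or any alignment tool: the helper names `normalize`, `count_occurrences`, and `duplicated_segments` are hypothetical, and it assumes the source text and the generated transcriptions have already been loaded as plain Python strings.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so line breaks do not hide a match."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def count_occurrences(segment, full_text):
    """Count non-overlapping, whole-word occurrences of segment in full_text."""
    seg = normalize(segment)
    if not seg:
        return 0
    # Require whitespace (or string boundary) on both sides of the match.
    pattern = r"(?<!\S)" + re.escape(seg) + r"(?!\S)"
    return len(re.findall(pattern, normalize(full_text)))


def duplicated_segments(segments):
    """Return {normalized segment: count} for segments emitted more than once."""
    counts = Counter(normalize(s) for s in segments)
    return {seg: n for seg, n in counts.items() if n > 1}
```

If `count_occurrences` returns 1 for a segment that `duplicated_segments` reports more than once in the generated output, the duplication would not come from repeated text in the source.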

Thanks.

@tilmankamp
Contributor

Are you able to provide a publicly accessible audio file that allows reproducing this result?
Is this definitely caused by transcribe.py? If that's the case, the issue should be moved to DeepSpeech.
