CTC decode very slow when training Mandarin Model #1831

Closed
jaycicle opened this issue Jan 14, 2019 · 4 comments

@jaycicle
commented Jan 14, 2019

Hey guys,
I'm following this tutorial to train a Chinese DeepSpeech model on my own data: https://discourse.mozilla.org/t/tutorial-how-i-trained-a-specific-french-model-to-control-my-robot/22830

I have an alphabet containing 4333 Chinese characters, a vocabulary, and a 3-gram language model with lm.binary and trie.

Training the acoustic model proceeds normally, but the CTC decode step is extremely slow: three batches of test data (batch size 12) take two days to decode!

This is the training log:
[image: deepspeech]

My training environment is:
Linux Ubuntu 16.04 with 4 GPUs (P100)
DeepSpeech-gpu-0.4.0
Tensorflow-1.12.0

@kdavis-mozilla

Collaborator

commented Jan 14, 2019

This is not surprising: the number of elements in the alphabet increases dramatically compared with English, where the problem is not noticeable.
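
As a rough illustration of why alphabet size matters, here is a back-of-the-envelope sketch. It assumes a naive, unpruned CTC beam search in which every prefix on the beam is extended by every label (plus the CTC blank) at each time step. The function name `candidates_per_step` is hypothetical, the English alphabet size of 28 is an assumption, and the actual DeepSpeech decoder may prune candidates, so treat the numbers as an upper-bound intuition rather than a measurement.

```python
def candidates_per_step(beam_width: int, alphabet_size: int) -> int:
    """Candidate extensions per time step for an unpruned CTC beam search.

    Assumption: each of the `beam_width` prefixes is extended by every
    alphabet symbol plus the CTC blank label.
    """
    return beam_width * (alphabet_size + 1)


# Assumed sizes: 28 symbols for English, 4333 for the Mandarin alphabet
# reported in this issue. A beam width of 128 matches the value tried here.
english = candidates_per_step(beam_width=128, alphabet_size=28)
mandarin = candidates_per_step(beam_width=128, alphabet_size=4333)

# The ratio depends only on the alphabet sizes, not on the beam width.
ratio = mandarin / english
print(f"English: {english}, Mandarin: {mandarin}, ratio: {ratio:.1f}x")
```

Under these assumptions the beam width cancels out of the ratio, so lowering it shrinks both workloads equally and leaves the per-step gap between the two alphabets unchanged.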

What we are starting to experiment with is an implementation of "Bytes are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes", which would have the CTC always output to 256 elements independent of language.
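
To make the byte-level idea concrete, here is a minimal sketch of how a Mandarin transcript could be turned into labels drawn from a fixed set of 256 values by using its UTF-8 bytes. This is not the paper's or DeepSpeech's implementation; the helper names `to_byte_labels` and `from_byte_labels` are hypothetical, and a real CTC setup would also need a blank label on top of the 256 byte values.

```python
from typing import List

NUM_BYTE_CLASSES = 256  # every UTF-8 byte value fits in 0..255


def to_byte_labels(text: str) -> List[int]:
    """Encode a transcript as a sequence of UTF-8 byte values."""
    return list(text.encode("utf-8"))


def from_byte_labels(labels: List[int]) -> str:
    """Decode byte labels back to text.

    Invalid byte sequences (which a model could emit) are replaced
    rather than raising an error.
    """
    return bytes(labels).decode("utf-8", errors="replace")


transcript = "你好世界"
labels = to_byte_labels(transcript)

# Each of these characters takes three bytes in UTF-8, so the label
# sequence is longer than the character sequence, but every label
# comes from the same 256-element set regardless of language.
print(len(transcript), len(labels), max(labels) < NUM_BYTE_CLASSES)
print(from_byte_labels(labels) == transcript)
```

The trade-off is a longer target sequence per utterance in exchange for an output layer whose size no longer grows with the character inventory.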

@jaycicle

Author

commented Jan 14, 2019

@kdavis-mozilla Do you have any other optimization suggestions? It's really a problem for me...
I have set beam_width to 128, but it has no positive effect.

@kdavis-mozilla

Collaborator

commented Jan 14, 2019

You could try implementing the work in the cited paper.

@jaycicle jaycicle closed this Jan 17, 2019

@lock


commented Feb 16, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Feb 16, 2019
