Prediction on new audio file #23
Comments
Yes, the current version of the toolkit is mainly designed for off-line speech recognition. If you would like to switch to on-line speech recognition, what you could do is redirect the posterior probabilities (currently saved into an *.ark file) to the standard output and read them with the Kaldi script for on-line decoding. This can be done, but it is not implemented yet...
Mirco
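The redirection idea above could be sketched roughly like this: a minimal, hypothetical Python helper that writes one utterance's per-frame posterior matrix to standard output in Kaldi's text-archive format, so a downstream on-line decoding script could consume it through a pipe. The function name and the choice of the text (rather than binary) ark format are assumptions for illustration, not part of pytorch-kaldi.

```python
import sys

def write_text_ark(utt_id, posteriors, out=sys.stdout):
    """Write one utterance's posterior matrix in Kaldi text-archive format.

    posteriors: list of rows, one row of per-class posteriors per frame.
    """
    out.write(f"{utt_id}  [\n")
    for i, row in enumerate(posteriors):
        line = "  " + " ".join(f"{p:.6f}" for p in row)
        # Kaldi text matrices close with "]" on the last row's line.
        out.write(line + (" ]\n" if i == len(posteriors) - 1 else "\n"))

# Example: two frames, three classes, written to stdout for a piped decoder.
write_text_ark("utt1", [[0.1, 0.7, 0.2], [0.05, 0.9, 0.05]])
```

In practice you would pipe this output into a Kaldi on-line decoding command instead of saving it to a file first.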
On Thu, Nov 29, 2018 at 1:30 PM, Parcollet Titouan wrote:
Hi!
For now, the only solution is to first train your PyTorch model, and then call run_exp.py with a modified conf file in which the number of epochs is set to 0 (and with a specific [dataset] section that serves as your testing dataset). We are aware that this is not optimal for real production use cases, and we are currently working on a side script that one can call to simply decode .wav files with a previously trained PyTorch model. Nonetheless, you can dig a bit into the run_exp.sh script to better understand how you can easily build your own script (if you are in a hurry).
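The epochs-to-zero trick amounts to editing the experiment config along these lines. This is only a rough sketch: the exact field names vary between pytorch-kaldi versions, so check them against your own cfg file before relying on this.

```ini
[exp]
# Skip training entirely; run_exp.py then only runs the
# forward/decoding phase on the evaluation dataset(s).
N_epochs_tr = 0

[dataset3]
# An extra dataset section pointing at the new audio you want to
# transcribe, used as the "test" set. (Section name and fields are
# illustrative; mirror an existing test-set section from your cfg.)
data_name = my_new_audio
```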
I worked a bit with Kaldi in a production environment (with automatic transcription of uploaded audio files). Nonetheless, and as you mention, the decoding time can be a problem. One of the solutions we found is to use speaker diarization, so we can split the decoding across multiple threads, with one thread per speaker.
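The one-thread-per-speaker idea can be sketched as follows. `decode_segment` is a hypothetical stand-in for the actual decoder call, and the speaker-to-segment mapping is assumed to come from a diarization front-end; neither is part of pytorch-kaldi.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_segment(speaker, segment):
    # Placeholder: in a real system this would invoke your trained
    # model / Kaldi decoder on the speaker's audio segment.
    return f"[{speaker}] transcript of {segment}"

def decode_by_speaker(segments_by_speaker):
    """Decode each speaker's audio in its own thread (one thread per speaker)."""
    with ThreadPoolExecutor(max_workers=len(segments_by_speaker)) as pool:
        futures = {
            spk: pool.submit(decode_segment, spk, seg)
            for spk, seg in segments_by_speaker.items()
        }
        return {spk: fut.result() for spk, fut in futures.items()}

# Example: segments produced by a diarization step.
transcripts = decode_by_speaker({"spk1": "utt_0001.wav", "spk2": "utt_0002.wav"})
```

Note that if the decoding itself runs in pure Python it is CPU-bound and CPython threads contend for the GIL, so a `ProcessPoolExecutor` (or separate decoder processes) may parallelize better; the structure is the same.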
Hi,
How can I use pytorch-kaldi in a production environment after training the model?
I have models ready, which I generated with core Kaldi. The problem I am facing is that the decoding/prediction phase takes a lot of time.
So please let me know how to use this tool in a live environment.
Also, if you have useful suggestions for Kaldi deployments, please share them.
I am also planning to integrate a Kaldi model into one of our applications, which is live, so your suggestions will be very useful to me.
--
thanks
Nisar