Speech Recognition

Speech Recognition using HMM, GMM

Task Description

Recognize continuous English digits (numbers) using HMMs (Hidden Markov Models) and GMMs (Gaussian Mixture Models).
Using the word HMMs and the universal utterance HMM that we model, implement the Viterbi algorithm to find the most likely sequence of words.
The work proceeds through the following stages.

phone HMM

A phone is the smallest unit of sound in our language. Each word consists of one or more phones.
In our HMM header, the transition probabilities are predefined for every phone.
Each phone is divided into three HMM states, and in our model each state has 2 pdfs.
The transition probability matrix is 5 × 5 so that it also covers the prior probabilities of entering the phone.

The silent phone "sp" is the exception: it has only 1 state.
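
As a rough illustration of the layout described above, the sketch below writes out one 5 × 5 phone matrix and a smaller one for "sp". It assumes an HTK-style convention in which row 0 is a non-emitting entry state and the last column is a non-emitting exit state; that convention, the names `PHONE_TP`, `SP_TP`, and `check_rows`, and every numeric value are assumptions for illustration and are not taken from the project's header.

```python
# Hypothetical left-to-right phone HMM: emitting states 1..3, plus an
# assumed non-emitting entry state (0) and exit state (4).
PHONE_TP = [
    [0.0, 1.0, 0.0, 0.0, 0.0],  # entry -> first emitting state
    [0.0, 0.6, 0.4, 0.0, 0.0],  # state 1: self-loop or advance
    [0.0, 0.0, 0.7, 0.3, 0.0],  # state 2: self-loop or advance
    [0.0, 0.0, 0.0, 0.8, 0.2],  # state 3: self-loop or leave the phone
    [0.0, 0.0, 0.0, 0.0, 0.0],  # exit state has no outgoing transitions
]

# Hypothetical "sp" model with a single emitting state (3 x 3 matrix).
SP_TP = [
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0],
]


def check_rows(tp, tol=1e-9):
    """Return True if every row except the exit row sums to 1."""
    return all(abs(sum(row) - 1.0) < tol for row in tp[:-1])
```

A check like `check_rows` is a cheap way to catch typos when the matrices are entered by hand.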

word HMM

We bind the phone HMMs constructed above together to form each word HMM.
The dictionary text file defines which phones make up each word.
When constructing a word HMM, the transition probabilities at the phone boundaries have to be recalculated, because leaving one phone now means entering the next.
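
One possible way to do this recalculation is sketched below. It assumes each phone matrix uses an entry row (row 0) and an exit column (last column), and it routes the probability of leaving one phone into the entry distribution of the next. The function name `concat_phones` and the example values are hypothetical; the project's actual code may organise this differently.

```python
def concat_phones(phone_tps):
    """Chain phone matrices into one word-level transition matrix.

    Each input is (n+2) x (n+2): row 0 is the entry state and the last
    column is the exit state. The result has one entry and one exit state.
    """
    n_emit = sum(len(tp) - 2 for tp in phone_tps)
    size = n_emit + 2
    word = [[0.0] * size for _ in range(size)]
    # enter maps a word-level state to the probability mass that is
    # waiting at the entry of the current phone.
    enter = {0: 1.0}
    offset = 1  # word-level index of the current phone's first emitting state
    for tp in phone_tps:
        n = len(tp) - 2
        next_enter = {}
        for src, mass in enter.items():
            # Distribute waiting mass over the phone's emitting states.
            for k in range(1, n + 1):
                word[src][offset + k - 1] += mass * tp[0][k]
            # A direct entry -> exit transition skips the phone entirely.
            if tp[0][n + 1] > 0.0:
                next_enter[src] = next_enter.get(src, 0.0) + mass * tp[0][n + 1]
        for i in range(1, n + 1):
            # Transitions inside the phone are copied unchanged.
            for k in range(1, n + 1):
                word[offset + i - 1][offset + k - 1] = tp[i][k]
            # Exit mass is held back and pushed into the next phone.
            if tp[i][n + 1] > 0.0:
                next_enter[offset + i - 1] = tp[i][n + 1]
        enter = next_enter
        offset += n
    # Whatever leaves the last phone goes to the word's exit state.
    for src, mass in enter.items():
        word[src][size - 1] += mass
    return word


# Illustrative phone matrix (values are made up).
ph = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.8, 0.2],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
word_tp = concat_phones([ph, ph])  # a two-phone word: 6 emitting states
```

In this example the last state of the first phone keeps its self-loop of 0.8, and its exit probability of 0.2 becomes the transition into the first state of the second phone.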

Continuous Universal Utterance HMM

To recognize several digits continuously, we combine the word HMMs once more into a single Universal Utterance HMM.
Instead of placing the silent phones "sil" and "sp" at the start and end, we make "sil" a separate word.
The transition probabilities between words are based on the word probabilities in the unigram and bigram text files.
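
The sketch below shows one way the language-model probabilities could enter the cross-word transitions. It assumes that the first word of an utterance is scored with the unigram probability and every later word with the bigram probability given its predecessor, and that unseen pairs are floored at a small constant. The tables, the floor, and the function names are all hypothetical; real values would be read from the unigram and bigram text files.

```python
import math

# Hypothetical language-model probabilities (illustrative values only).
UNIGRAM = {"sil": 0.4, "one": 0.3, "two": 0.3}
BIGRAM = {
    ("sil", "one"): 0.5, ("sil", "two"): 0.5,
    ("one", "two"): 0.6, ("one", "sil"): 0.4,
}
FLOOR = 1e-10  # assumed floor for unseen entries, avoids log(0)


def word_log_prob(prev_word, next_word):
    """Log P(next_word) at utterance start, else log P(next_word | prev_word)."""
    if prev_word is None:
        p = UNIGRAM.get(next_word, FLOOR)
    else:
        p = BIGRAM.get((prev_word, next_word), FLOOR)
    return math.log(max(p, FLOOR))


def cross_word_log_trans(exit_prob, prev_word, next_word, entry_prob=1.0):
    """Log transition from the last state of prev_word into next_word.

    exit_prob is the probability of leaving prev_word's last state and
    entry_prob the probability of entering next_word's first state.
    """
    if exit_prob <= 0.0 or entry_prob <= 0.0:
        return float("-inf")
    return (math.log(exit_prob)
            + word_log_prob(prev_word, next_word)
            + math.log(entry_prob))
```

Working in log probabilities here keeps these values compatible with a log-domain decoder.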

GMM

To calculate the observation probability, we use a GMM (Gaussian Mixture Model).
Multiplying many small probabilities in code can cause floating-point underflow.
To prevent this, we carry out the calculation with logarithms.
In the final step, we combine all the pdfs of a state, weighted by their mixture weights, into a single observation probability for that state.
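
A minimal sketch of this log-domain calculation follows. It assumes diagonal covariances (one variance per dimension) and strictly positive mixture weights, and it combines the weighted components with the log-sum-exp trick so that very small likelihoods do not underflow to zero. The function names are hypothetical, and the diagonal-covariance form is an assumption rather than something stated in this README.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Stable log(sum(exp(v))) by factoring out the largest term."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def gmm_log_likelihood(x, weights, means, variances):
    """Log observation probability of x under one state's mixture."""
    return logsumexp([
        math.log(w) + log_gaussian_diag(x, mu, var)  # assumes w > 0
        for w, mu, var in zip(weights, means, variances)
    ])
```

With 39-dimensional features, the per-frame likelihood is a product of many small factors, which is why the sum over mixture components is taken with `logsumexp` instead of exponentiating each term directly.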

Viterbi

Once the HMMs have been constructed and the probabilities calculated, we have to find the best state sequence so that the program can recognize the words.
The input speech is converted into MFCC vectors, each of 39 dimensions.

Using the Viterbi algorithm, we compute the cumulative probabilities and the best state sequence, then map that state sequence back to the actual words.
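
The core of the decoding step can be sketched as a log-domain Viterbi pass with back-pointers. The version below is a generic textbook formulation over a flat list of states, not the project's actual implementation: the function name `viterbi`, its argument layout, and the toy example values are assumptions for illustration.

```python
import math

NEG_INF = float("-inf")


def viterbi(log_init, log_trans, log_obs):
    """Return (best log probability, best state path).

    log_init[s]     : log probability of starting in state s
    log_trans[r][s] : log probability of moving from state r to state s
    log_obs[t][s]   : log observation probability of frame t in state s
    """
    n_states = len(log_init)
    delta = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    backptrs = []
    for t in range(1, len(log_obs)):
        new_delta, ptr = [], []
        for s in range(n_states):
            # Best predecessor of state s at time t.
            best_r = max(range(n_states),
                         key=lambda r: delta[r] + log_trans[r][s])
            new_delta.append(delta[best_r] + log_trans[best_r][s] + log_obs[t][s])
            ptr.append(best_r)
        delta = new_delta
        backptrs.append(ptr)
    # Backtrack from the best final state.
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path


# Toy two-state example with made-up probabilities.
L = math.log
toy_init = [0.0, NEG_INF]                      # always start in state 0
toy_trans = [[L(0.5), L(0.5)], [NEG_INF, 0.0]]  # left-to-right, no way back
toy_obs = [[L(0.9), L(0.1)], [L(0.9), L(0.1)], [L(0.1), L(0.9)]]
best_logp, best_path = viterbi(toy_init, toy_trans, toy_obs)
```

In a full recognizer, each index in the returned path would belong to one word's block of states, so collapsing consecutive indices from the same block gives the recognized word sequence.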


yuridekim/Speech-Recognition
