Speech recognition

This week we will deal with a simplified form of speech recognition. The languages are artificial and have been generated using a combination of Markov models and hidden Markov models (HMMs). In real life things are much more messy! This artificial data makes for a better tutorial with cleaner results.

There are two forms of the datasets available. In the first form, there are several audio files, which can be parsed into discrete phonemes. In the second form, the parsing has already been done for you, and you are presented with long sequences of symbols. It is worth listening to the audio yourself, and seeing if you can determine any differences between the “languages” or “speakers” by ear!


If you want to process the audio dataset yourself, rather than using the parsed dataset, then it is advised to use: scipy.io.wavfile to read an audio file. All audio will be single channel (mono) and noiselessly generated from a small set of component sounds.


1 - Language detection

There are three languages: A, B, and C. Each language uses the same set of symbols: “A, o, e, t, p, g, and k. However, each language uses the symbols differently. In each of these languages we can model everything as P(next symbol | current symbol).

There is training data available for each language. This consists of several files each generated by sampling from a Markov model. Using python, build a Markov model for each of the languages.
Now use the Markov model and Bayes’ rule to classify the test cases. Write down how you used Bayes’ rule to get your classifier. Give the full posterior distribution for each test case.
Audio dataset: https://course-resources.minerva.kgi.edu/uploaded_files/mke/nglEdY/audio.zip

Symbol dataset: https://course-resources.minerva.kgi.edu/uploaded_files/mke/ryDvKV/symbol.zip

In [1]:
import glob
import numpy as np
from hmmlearn import hmm

In [2]:
path = 'symbol/*'
# files = glob.glob(path)


In [3]:
CHARS = ['A', 'o', 'e', 't', 'p', 'g', 'k']
LANGS = ('A', 'B', 'C')

In [4]:
train_A = [f for f in glob.glob(path) if "langA" in f ]
train_B = [f for f in glob.glob(path) if "langB" in f ]
train_C = [f for f in glob.glob(path) if "C" in f ]
test = [f for f in glob.glob(path) if "test" in f ]

In [5]:
N = len(CHARS)
M = len(LANGS)

# fit params (N*N matrix)
trans_matrix_A = np.zeros ((N,N))
probab_matrix_A = np.ones_like((N,N))

trans_matrix_B = np.zeros ((N,N))
probab_matrix_B = np.ones_like((N,N))

trans_matrix_C = np.zeros ((N,N))
probab_matrix_C = np.ones_like((N,N))

#predict params
prior = np.ones((M,)) / M # uniform prior row vector (M*1)

In [6]:
#fit funcs

#Use seq to fit the prob matrix
def fit_model(sequence, probab_matrix, trans_matrix, CHARS=CHARS):
    #counts of transitions
    for first, second in zip(sequence, sequence[1:]):
        trans_matrix[CHARS.index(first), CHARS.index(second)] += 1 
#     print (trans_matrix)
#     print (np.sum(trans_matrix, axis=1)) 
    
    #normlize
#     row_sums = trans_matrix.sum(axis=1)
#     probab_matrix = trans_matrix / row_sums[:, np.newaxis]
    return (trans_matrix / np.sum(trans_matrix, axis=1))
 

# 
def confidence_model(sequence, probab_matrix, CHARS = CHARS):
        #log likelihood
        log_prob = np.sum([np.log1p(probab_matrix[CHARS.index(first), 
                                                CHARS.index(second)])
                           for first, second in zip(sequence, sequence[1:])])

        return log_prob


In [7]:
for p in train_A:
    with open(p) as f:
        sequence = f.read()
        probab_matrix_A= fit_model(sequence, probab_matrix_A, trans_matrix_A)


for p in train_B:
    with open(p) as f:
        sequence = f.read()
        probab_matrix_B= fit_model(sequence, probab_matrix_B, trans_matrix_B)
        
for p in train_C:
    with open(p) as f:
        sequence = f.read()
        probab_matrix_C= fit_model(sequence, probab_matrix_C, trans_matrix_C)
        
        
# print (probab_matrix_A)
# print (probab_matrix_B)
# print (probab_matrix_C)


  


In [8]:
def predict_probab(sequence, probab_matrixes = [probab_matrix_A, probab_matrix_B, probab_matrix_C] ):
    
    likelihood = np.array([confidence_model(sequence, probab_matrix) for probab_matrix in probab_matrixes]) * prior
#     print (likelihood)
#     print (np.sum(likelihood))
    return likelihood / np.sum(likelihood)
            
def predict_lang(sequence, LANGS= LANGS):
    lang_prob = predict_probab(sequence)
#     print (lang_prob)
    return LANGS[np.argmax(lang_prob)]
    
def confidence(sequence):
    return np.max(predict_probab(sequence))      

In [10]:
for p in test:
    with open(p) as f:
        test_sequence = f.read()
#         print ()
#         print (test_sequence)
        print (predict_lang(test_sequence))
        print (predict_probab(test_sequence))
        print (confidence(test_sequence))
        print ()

B
[0.28728075 0.56292148 0.14979776]
0.5629214814395022

A
[0.53336556 0.25270946 0.21392498]
0.533365557565735

A
[0.50524363 0.23734176 0.25741461]
0.5052436301033839

A
[0.47214125 0.25503971 0.27281903]
0.47214125185327466

A
[0.46269974 0.33464608 0.20265418]
0.4626997411516312

C
[0.20737789 0.13968117 0.65294094]
0.6529409446734137

A
[0.51415268 0.27934916 0.20649816]
0.514152676969759

C
[0.14789615 0.17909483 0.67300902]
0.6730090213000319

C
[0.19774337 0.15432402 0.64793262]
0.6479326164236276

B
[0.23501763 0.60802956 0.15695281]
0.6080295578994104

