<h2>Speech recognition</h2>

This week we will deal with a simplified form of speech recognition. The languages are artificial and have been generated using a combination of Markov models and hidden Markov models (HMMs). Real speech is much messier, but this artificial data makes for a better tutorial with cleaner results.

There are two forms of the datasets available:

- In the first form, there are several audio files, which can be parsed into discrete phonemes. 

- In the second form, the parsing has already been done for you, and you are presented with long sequences of symbols. 

It is worth listening to the audio yourself, and seeing if you can determine any differences between the “languages” or “speakers” by ear!

If you want to process the audio dataset yourself rather than use the parsed dataset, use scipy.io.wavfile to read each audio file. All audio is single channel (mono) and generated without noise from a small set of component sounds.
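As a minimal sketch of reading audio with scipy.io.wavfile, the example below writes a short synthetic mono tone to a temporary file and reads it back. The file name `demo.wav`, the 8000 Hz sample rate, and the 440 Hz tone are assumptions made for illustration and are not taken from the course dataset.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

# Write a short synthetic mono tone so the example runs without the course data.
rate = 8000  # samples per second (arbitrary choice for this demo)
t = np.arange(rate) / rate  # one second of time stamps
tone = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)

path = os.path.join(tempfile.mkdtemp(), "demo.wav")  # hypothetical file name
wavfile.write(path, rate, tone)

# Read it back: wavfile.read returns (sample_rate, samples).
# For a mono file, samples is a 1-D array.
read_rate, samples = wavfile.read(path)
print(read_rate, samples.shape, samples.dtype)
```

For the real dataset, the same `wavfile.read` call would be pointed at each file extracted from audio.zip.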

<h2>Language detection</h2>

There are three languages: A, B, and C. Each language uses the same set of symbols: "A, o, e, t, p, g, k". However, each language uses the symbols differently. Each language can be modeled as a first-order Markov chain, described by its transition probabilities P(next symbol | current symbol).

There is training data available for each language. This consists of several files, each generated by sampling from a Markov model. Using Python, build a Markov model for each of the languages.
Now use the Markov models and Bayes’ rule to classify the test cases. Write down how you used Bayes’ rule to get your classifier. Give the full posterior distribution for each test case.
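One way to write the classifier down is sketched here. The notation is introduced for this sketch only: $\ell$ is a language, $s_{1:T}$ is a test sequence of $T$ symbols, and $M^{(\ell)}$ is the transition matrix estimated for language $\ell$. A uniform prior $P(\ell) = 1/3$ is a natural assumption when nothing else is known, and the first-symbol term $P(s_1 \mid \ell)$ cancels if it is assumed to be the same for every language.

```latex
\begin{align}
P(\ell \mid s_{1:T}) &= \frac{P(s_{1:T} \mid \ell)\, P(\ell)}
                             {\sum_{\ell' \in \{A, B, C\}} P(s_{1:T} \mid \ell')\, P(\ell')} \\
P(s_{1:T} \mid \ell) &= P(s_1 \mid \ell) \prod_{t=1}^{T-1} M^{(\ell)}_{s_t,\, s_{t+1}}
\end{align}
```

The denominator is the evidence: the same numerator summed over all three languages, which makes the posterior sum to one.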

Audio dataset: https://course-resources.minerva.kgi.edu/uploaded_files/mke/nglEdY/audio.zip

Symbol dataset: https://course-resources.minerva.kgi.edu/uploaded_files/mke/ryDvKV/symbol.zip

In [43]:
import numpy as np

def transition_matrix(transitions):
    '''
    the following code takes a list such as
    [1,1,2,6,8,5,5,7,8,8,1,1,4,5,5,0,0,0,1,1,4,4,5,1,3,3,4,5,4,1,1]
    with states labeled as successive integers starting with 0
    and returns a transition matrix, M,
    where M[i][j] is the probability of transitioning from i to j
    
    Adapted from: 
    https://stackoverflow.com/questions/46657221/generating-markov-transition-matrix-in-python
    '''
    n = 1 + max(transitions)  # number of states

    # Start from zero counts; np.empty would leave arbitrary values in M.
    M = np.zeros((n, n))
    
    for (i,j) in zip(transitions,transitions[1:]):
        M[i][j] += 1

    #now convert to probabilities:
    for row in M:
        s = sum(row)
        if s > 0:
            row[:] = [f/s for f in row]
    return M
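
The function above builds a matrix from a single sequence and sizes it by the largest state seen. Since the training data for each language comes as several files, one option is to pool the transition counts across all sequences with a fixed number of states. The sketch below does that and adds a pseudocount so that unseen transitions do not get probability zero. The names `pooled_transition_matrix`, `n_states`, and `pseudocount` are hypothetical, and add-one smoothing is an assumption rather than part of the assignment.

```python
import numpy as np

def pooled_transition_matrix(sequences, n_states, pseudocount=1.0):
    """Estimate one transition matrix from several integer-coded sequences.

    pseudocount must be positive so that every row has a non-zero total.
    """
    counts = np.full((n_states, n_states), pseudocount, dtype=float)
    for seq in sequences:
        # Count each (current, next) pair within a sequence; pairs never
        # span two different sequences.
        for i, j in zip(seq, seq[1:]):
            counts[i, j] += 1
    # Normalise each row so it sums to 1.
    return counts / counts.sum(axis=1, keepdims=True)

# Toy data with three states, standing in for the parsed training files.
demo = [[0, 1, 0, 1, 2], [2, 2, 0, 1]]
M = pooled_transition_matrix(demo, n_states=3)
print(M)
```

For the seven-symbol alphabet used here, `n_states` would be 7.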


In [50]:
symbols = {'A':0, 'o':1, 'e':2, 't':3, 'p':4, 'g':5, 'k':6}

# Read the training file and close it automatically.
with open("symbol/language-training-langB-0", "r") as f:
    text = f.read()
new_file = [symbols[ch] for ch in text]
print(new_file)
tm = transition_matrix(new_file)

likelihood = 1
for i in range(len(new_file)-1):
    j = i + 1
    p = tm[new_file[i]][new_file[j]]
    likelihood *= p  # running product of the transition probabilities
    print(f'p_transition from {new_file[i]} to {new_file[j]}: {p}')

[1, 0, 3, 0, 5, 0, 5, 2, 5, 2, 6, 1, 0, 1, 0, 5, 3, 2, 3, 2, 1, 1, 2, 5, 0, 1, 2, 1, 0, 1, 0, 1, 0, 3, 2, 3, 2, 1, 2, 5, 1, 1, 0, 3, 2, 1, 0, 5, 2, 5, 0, 3, 2, 1, 3, 2, 3, 0, 5, 0, 1, 1, 2, 3, 2, 3, 2, 1, 0, 5, 2, 2, 1, 2, 5, 2, 3, 2, 3, 2, 5, 2, 0, 6, 1, 2, 3, 2, 3, 5, 0, 4, 1, 2, 1, 2, 3, 3, 2, 5]
p_transition from 1 to 0: 0.4285714285714286
p_transition from 0 to 3: 0.23529411764705882
p_transition from 3 to 0: 0.11764705882352941
p_transition from 0 to 5: 0.35294117647058826
p_transition from 5 to 0: 0.38461538461538464
p_transition from 0 to 5: 0.35294117647058826
p_transition from 5 to 2: 0.46153846153846156
p_transition from 2 to 5: 0.25
p_transition from 5 to 2: 0.46153846153846156
p_transition from 2 to 6: 0.03571428571428572
p_transition from 6 to 1: 1.0
p_transition from 1 to 0: 0.4285714285714286
p_transition from 0 to 1: 0.29411764705882354
p_transition from 1 to 0: 0.4285714285714286
p_transition from 0 to 5: 0.35294117647058826
p_transition from 5 to 3: 0.076923076923076

In [None]:
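# posterior = p(language | string)
# p(language | string) = p(string | language) * p(language) / evidence
# evidence = sum over languages of p(string | language) * p(language)
# prior and evidence still need to be defined before this line can run
posterior = likelihood * prior / evidence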
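
A minimal sketch of how the posterior could be computed for one test sequence is given below. It works in log space, since a product of many transition probabilities can underflow. The function names `log_likelihood` and `posterior`, the two toy models, and the equal priors are assumptions for illustration; the real classifier would use the three transition matrices estimated from the training data. The first-symbol probability is left out, which assumes it is the same for every language.

```python
import numpy as np

def log_likelihood(seq, M):
    """Sum of log transition probabilities along seq under matrix M."""
    return sum(np.log(M[i, j]) for i, j in zip(seq, seq[1:]))

def posterior(seq, models, priors):
    """Return {language: P(language | seq)} using Bayes' rule."""
    names = list(models)
    log_joint = np.array(
        [log_likelihood(seq, models[k]) + np.log(priors[k]) for k in names]
    )
    # Subtract the maximum before exponentiating for numerical stability;
    # the shift cancels when dividing by the evidence.
    log_joint -= log_joint.max()
    weights = np.exp(log_joint)
    return dict(zip(names, weights / weights.sum()))

# Toy two-symbol models (hypothetical, not the course languages).
models = {
    "A": np.array([[0.9, 0.1], [0.9, 0.1]]),
    "B": np.array([[0.1, 0.9], [0.1, 0.9]]),
}
priors = {"A": 0.5, "B": 0.5}
post = posterior([0, 0, 0, 0], models, priors)
print(post)
```

A transition with estimated probability zero gives a log of minus infinity, so a test sequence containing it rules that language out entirely; smoothing the counts is one way to avoid this.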
