<h2>Speaker identification</h2>

There are three people in a room. Each says about 10 phonemes before being randomly interrupted by someone else. Their voices all sound the same, but each person tends to use different phonemes. Specifically, we can model the transition probabilities that someone interrupts the current speaker, P(speaker i at time t+1 | speaker j at time t), and the probability over phonemes given a particular speaker, P(phoneme | speaker i). The phonemes are identical to the ones introduced in problem 1, but the transition matrices differ because they take a different form altogether.

1) Write down the update equations that you will need to train a hidden Markov model. Using the information given above, write down a sensible initialization for the transition matrix.

2) Write your own Python code to train a hidden Markov model on the data. You may look at code online, but you will need to reference any code that helps you with your implementation.

3) Using matplotlib, draw a stackplot (https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.stackplot.html) to show the probability of each person speaking over time.
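For task 3, a stackplot can be drawn along the following lines. This is only a sketch: the helper name `plot_speaker_posteriors` is hypothetical, and `demo_gamma` is random placeholder data standing in for the per-time-step posteriors P(speaker | observations) that a trained model would provide. It is not a result from the dataset.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_speaker_posteriors(gamma, labels=None):
    """Stack the per-time-step speaker probabilities (rows of gamma sum to 1)."""
    gamma = np.asarray(gamma)
    n_steps, n_speakers = gamma.shape
    if labels is None:
        labels = [f"speaker {i + 1}" for i in range(n_speakers)]
    fig, ax = plt.subplots(figsize=(12, 3))
    # stackplot expects one row per series, so pass the transpose
    ax.stackplot(np.arange(n_steps), gamma.T, labels=labels)
    ax.set_xlim(0, n_steps - 1)
    ax.set_ylim(0, 1)
    ax.set_xlabel("time step (phoneme index)")
    ax.set_ylabel("P(speaker | observations)")
    ax.legend(loc="upper right")
    return fig, ax


# Placeholder posteriors (assumption): 200 time steps, 3 speakers
rng = np.random.default_rng(0)
demo_gamma = rng.dirichlet(np.ones(3), size=200)
fig, ax = plot_speaker_posteriors(demo_gamma)
```

Because each row of the posterior matrix sums to 1, the stacked bands always fill the vertical axis, and the widest band at a given time step marks the most probable speaker.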

<h4>Resources</h4>

Audio dataset: https://course-resources.minerva.kgi.edu/uploaded_files/mke/VW8Rjr/speaker.wav.zip

Symbol dataset: https://course-resources.minerva.kgi.edu/uploaded_files/mke/n705lY/speaker

Update equations:

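Transition matrix: $\alpha_{ij} = \dfrac{\text{expected number of transitions from person } i \text{ to person } j}{\text{expected number of transitions from person } i}$

Emission matrix: $\beta_j(k) = \dfrac{\text{expected number of times person } j \text{ is speaking and phoneme } k \text{ is observed}}{\text{expected number of times person } j \text{ is speaking}}$

Because the speaker is hidden, these are expected counts, computed from the forward-backward posteriors under the current parameters.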
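The count-ratio updates above can be sketched as a small Baum-Welch loop in NumPy. This is an illustrative sketch under stated assumptions, not a reference implementation: the function names (`init_params`, `forward_backward`, `baum_welch`) are made up here, the whole symbol stream is treated as one long sequence, and `A[i, j]` is taken to mean P(speaker j at t+1 | speaker i at t). The initialization assumes that "about 10 phonemes per turn" corresponds to a self-transition probability of 1 - 1/10 = 0.9, with the remaining 0.1 split evenly between the other two speakers. The emission rows start random so that the three states are not identical.

```python
import numpy as np


def init_params(n_states=3, n_symbols=7, mean_run=10.0, seed=0):
    """Starting point: a speaker keeps talking with probability 1 - 1/mean_run."""
    rng = np.random.default_rng(seed)
    stay = 1.0 - 1.0 / mean_run  # 0.9 for turns of about 10 phonemes
    A = np.full((n_states, n_states), (1.0 - stay) / (n_states - 1))
    np.fill_diagonal(A, stay)
    B = rng.dirichlet(np.ones(n_symbols), size=n_states)  # random, breaks symmetry
    pi = np.full(n_states, 1.0 / n_states)
    return A, B, pi


def forward_backward(obs, A, B, pi):
    """Scaled forward-backward pass: posteriors, expected transitions, log-likelihood."""
    T, N = len(obs), A.shape[0]
    alpha, beta, scale = np.zeros((T, N)), np.zeros((T, N)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    gamma = alpha * beta  # P(speaker at t | all observations); rows sum to 1
    # xi_sum[i, j]: expected number of i -> j transitions, summed over time
    xi_sum = A * (alpha[:-1].T @ (B[:, obs[1:]].T * beta[1:] / scale[1:, None]))
    return gamma, xi_sum, np.log(scale).sum()


def baum_welch(obs, n_states=3, n_symbols=7, n_iter=50, seed=0):
    """Alternate the E-step (expected counts) and M-step (count ratios)."""
    obs = np.asarray(obs)
    A, B, pi = init_params(n_states, n_symbols, seed=seed)
    log_liks = []
    for _ in range(n_iter):
        gamma, xi_sum, ll = forward_backward(obs, A, B, pi)
        log_liks.append(ll)
        # Transition update: expected i -> j transitions / expected transitions out of i
        A = xi_sum / xi_sum.sum(axis=1, keepdims=True)
        # Emission update: expected time in state j emitting k / expected time in state j
        B = np.stack([gamma[obs == k].sum(axis=0) for k in range(n_symbols)], axis=1)
        B = (B + 1e-12) / (B + 1e-12).sum(axis=1, keepdims=True)
        pi = gamma[0]
    return A, B, pi, gamma, log_liks
```

With the integer-encoded symbol sequence as `obs`, the returned `gamma` holds one row per time step with the posterior probability of each speaker. The list `log_liks` should be non-decreasing across iterations, which is a quick sanity check on the implementation.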

In [106]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

import warnings
warnings.filterwarnings("once", category=DeprecationWarning) 

In [107]:
# Read the symbol sequence; strip whitespace so a trailing newline is not treated as a phoneme
with open("speaker.txt", "r") as f:
    data = f.read().strip()

# Map each phoneme symbol to an integer index
symbols = {'A':0, 'o':1, 'e':2, 't':3, 'p':4, 'g':5, 'k':6}
new_data = [symbols[ch] for ch in data]
print(new_data[:10])

# Break the sequence into segments of 10 phonemes
data_seg = np.array([new_data[i:i+10] for i in range(0, len(new_data), 10)])
print(data_seg[:10])



[2, 1, 5, 5, 2, 5, 5, 0, 2, 5]
[[2 1 5 5 2 5 5 0 2 5]
 [5 2 4 5 4 4 4 1 4 4]
 [1 5 1 4 4 4 5 1 4 4]
 [1 1 4 2 5 0 0 1 0 0]
 [0 3 0 3 3 3 1 1 2 4]
 [1 1 4 4 3 2 2 2 2 1]
 [3 4 1 4 4 4 2 2 1 2]
 [4 1 1 1 4 1 4 5 1 1]
 [1 2 4 1 2 4 1 3 4 1]
 [2 5 1 5 5 5 5 5 6 2]]


In [110]:
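from pomegranate import HiddenMarkovModel, NormalDistribution

# Fit an HMM to the 10-phoneme segments. Note that this configuration (Gaussian
# emissions, 7 hidden states) does not match the problem, which has 3 speakers
# emitting discrete phonemes; the fitted parameters printed below are NaN.
model = HiddenMarkovModel.from_samples(NormalDistribution,
                                       n_components=7,
                                       X=data_seg)
model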





{
    "class" : "HiddenMarkovModel",
    "name" : "None",
    "start" : {
        "class" : "State",
        "distribution" : null,
        "name" : "None-start",
        "weight" : 1.0
    },
    "end" : {
        "class" : "State",
        "distribution" : null,
        "name" : "None-end",
        "weight" : 1.0
    },
    "states" : [
        {
            "class" : "State",
            "distribution" : {
                "class" : "Distribution",
                "name" : "NormalDistribution",
                "parameters" : [
                    NaN,
                    NaN
                ],
                "frozen" : false
            },
            "name" : "s0",
            "weight" : 1.0
        },
        {
            "class" : "State",
            "distribution" : {
                "class" : "Distribution",
                "name" : "NormalDistribution",
                "parameters" : [
                    NaN,
                    NaN
                ],
                "froze