In [1]:
import numpy as np

initial_probs: This tells us the probability of starting in each hidden state. Here, we always start with the phoneme '/s/'.

transition_probs: This tells us the probability of moving from one hidden state to another. For example, if we're currently at '/s/', there's an 80% chance we'll move to '/p/' next.

emission_probs: This tells us the probability of observing a specific output given the current hidden state. For example, if we're at the phoneme '/s/', there's a 70% chance we'll observe 'Energy'.

How it Works:

Start with the initial probabilities.

Consider all possible transitions and emissions. For each step in the observation sequence, we calculate the probability of being in each hidden state and producing the observed output.

Find the most likely path. We use algorithms (like the Viterbi algorithm) to find the sequence of hidden states that has the highest overall probability given the observations.
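
The three steps above can be sketched as a small Viterbi decoder. This is a minimal illustration, not part of the original notebook: the function name `viterbi` and its parameters are assumptions, the model tables are repeated from the notebook so the block runs on its own, and the observation sequence is the one from the sample run shown further down. It assumes a non-empty observation sequence and works with raw probabilities rather than log-probabilities, which is fine for short sequences but would underflow on long ones.

```python
# Model tables, repeated from the notebook so this block is self-contained.
initial_probs = {'/s/': 1.0, '/p/': 0.0, '/ie:/': 0.0, '/tʃ/': 0.0}
transition_probs = {
    '/s/': {'/s/': 0.1, '/p/': 0.8, '/ie:/': 0.1, '/tʃ/': 0.0},
    '/p/': {'/s/': 0.0, '/p/': 0.1, '/ie:/': 0.8, '/tʃ/': 0.1},
    '/ie:/': {'/s/': 0.0, '/p/': 0.0, '/ie:/': 0.2, '/tʃ/': 0.8},
    '/tʃ/': {'/s/': 0.2, '/p/': 0.0, '/ie:/': 0.0, '/tʃ/': 0.8},
}
emission_probs = {
    '/s/': {'Energy': 0.7, 'Pitch': 0.2, 'Duration': 0.1},
    '/p/': {'Energy': 0.5, 'Pitch': 0.3, 'Duration': 0.2},
    '/ie:/': {'Energy': 0.3, 'Pitch': 0.5, 'Duration': 0.2},
    '/tʃ/': {'Energy': 0.4, 'Pitch': 0.4, 'Duration': 0.2},
}


def viterbi(observations, start_p, trans_p, emit_p):
    """Return (most likely state path, joint probability of that path and the observations)."""
    states = list(start_p)
    # best[t][s]: probability of the best path that ends in state s after t+1 observations.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s]: predecessor of s on that best path.
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximizes (path probability so far) * (transition into s).
            prev = max(states, key=lambda r: best[t - 1][r] * trans_p[r][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s][observations[t]]
            back[t][s] = prev

    # Start from the best final state and follow the back-pointers.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


decoded_path, decoded_prob = viterbi(
    ['Energy', 'Energy', 'Pitch'], initial_probs, transition_probs, emission_probs
)
print(decoded_path, decoded_prob)
```

For these tables the decoded path is ['/s/', '/p/', '/ie:/'], with joint probability of about 0.112 (1.0 × 0.7, then 0.8 × 0.5, then 0.8 × 0.5). The decoder returns one state per observation, so three observations give three phonemes.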


In [2]:
initial_probs = {
    '/s/': 1.0,
    '/p/': 0.0,
    '/ie:/': 0.0,
    '/tʃ/': 0.0
}
transition_probs = {
    '/s/': {'/s/': 0.1, '/p/': 0.8, '/ie:/': 0.1, '/tʃ/': 0.0},
    '/p/': {'/s/': 0.0, '/p/': 0.1, '/ie:/': 0.8, '/tʃ/': 0.1},
    '/ie:/': {'/s/': 0.0, '/p/': 0.0, '/ie:/': 0.2, '/tʃ/': 0.8},
    '/tʃ/': {'/s/': 0.2, '/p/': 0.0, '/ie:/': 0.0, '/tʃ/': 0.8}
}
emission_probs = {
    '/s/': {'Energy': 0.7, 'Pitch': 0.2, 'Duration': 0.1},
    '/p/': {'Energy': 0.5, 'Pitch': 0.3, 'Duration': 0.2},
    '/ie:/': {'Energy': 0.3, 'Pitch': 0.5, 'Duration': 0.2},
    '/tʃ/': {'Energy': 0.4, 'Pitch': 0.4, 'Duration': 0.2}
}
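
Each row of a transition or emission table is a probability distribution, so it should sum to 1; `np.random.choice` raises a `ValueError` when the `p` argument does not. A quick check along the following lines can catch a typo in the tables before sampling. The helper name `check_rows_sum_to_one` is hypothetical, and the example row is copied from emission_probs above so the block runs on its own.

```python
import math


def check_rows_sum_to_one(name, table):
    """Raise ValueError if any row of a nested probability table does not sum to 1."""
    for state, row in table.items():
        total = sum(row.values())
        # Allow for floating-point rounding, e.g. 0.7 + 0.2 + 0.1 is not exactly 1.0.
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            raise ValueError(f"{name}[{state!r}] sums to {total}, expected 1.0")


# One row copied from emission_probs; pass the full tables in the same way.
example_emissions = {'/s/': {'Energy': 0.7, 'Pitch': 0.2, 'Duration': 0.1}}
check_rows_sum_to_one("emission_probs", example_emissions)
```

The same call works for transition_probs; initial_probs is a flat dictionary, so it can be checked by wrapping it as `{'initial': initial_probs}`.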

In [3]:
def display_matrices():
    print("Initial Probabilities:")
    for phoneme, prob in initial_probs.items():
        print(f"{phoneme}: {prob}")

    print("\nTransition Probabilities:")
    for from_phoneme, transitions in transition_probs.items():
        print(f"{from_phoneme}: {transitions}")

    print("\nEmission Probabilities:")
    for phoneme, emissions in emission_probs.items():
        print(f"{phoneme}: {emissions}")


Initialization: generate_sequence uses the initial probabilities (initial_probs) to choose the first phoneme in the sequence.

Iteration: For a fixed number of steps (3 in this case), it does the following:

Emission: It selects an observation (Energy, Pitch, or Duration) based on the emission probabilities (emission_probs) for the current phoneme.

Transition: It moves to the next phoneme based on the transition probabilities (transition_probs) for the current phoneme.

Output: It returns the generated phoneme sequence and observation sequence. Because every emission is followed by a transition, the phoneme sequence has four entries and the observation sequence has three.

In [5]:
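def generate_sequence():
    """Sample a phoneme sequence and an observation sequence from the HMM."""
    phonemes = list(initial_probs.keys())
    # Draw the first phoneme from the initial distribution.
    # str() turns NumPy's string scalar into a plain Python string.
    current_phoneme = str(np.random.choice(phonemes, p=list(initial_probs.values())))
    phoneme_sequence = [current_phoneme]
    observation_sequence = []

    for _ in range(3):
        # Emission: draw an observation from the current phoneme's emission row.
        emissions = emission_probs[current_phoneme]
        observation = str(np.random.choice(list(emissions.keys()), p=list(emissions.values())))
        observation_sequence.append(observation)

        # Transition: draw the next phoneme from the current phoneme's transition row.
        current_phoneme = str(np.random.choice(
            phonemes,
            p=[transition_probs[current_phoneme][next_phoneme] for next_phoneme in phonemes]
        ))
        phoneme_sequence.append(current_phoneme)

    # Three emissions and three transitions give 4 phonemes and 3 observations.
    return phoneme_sequence, observation_sequence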

In [7]:
if __name__ == "__main__":
    display_matrices()

    phoneme_sequence, observation_sequence = generate_sequence()
    print("\nGenerated Phoneme Sequence:")
    print(phoneme_sequence)

    print("\nGenerated Observation Sequence:")
    print(observation_sequence)


Initial Probabilities:
/s/: 1.0
/p/: 0.0
/ie:/: 0.0
/tʃ/: 0.0

Transition Probabilities:
/s/: {'/s/': 0.1, '/p/': 0.8, '/ie:/': 0.1, '/tʃ/': 0.0}
/p/: {'/s/': 0.0, '/p/': 0.1, '/ie:/': 0.8, '/tʃ/': 0.1}
/ie:/: {'/s/': 0.0, '/p/': 0.0, '/ie:/': 0.2, '/tʃ/': 0.8}
/tʃ/: {'/s/': 0.2, '/p/': 0.0, '/ie:/': 0.0, '/tʃ/': 0.8}

Emission Probabilities:
/s/: {'Energy': 0.7, 'Pitch': 0.2, 'Duration': 0.1}
/p/: {'Energy': 0.5, 'Pitch': 0.3, 'Duration': 0.2}
/ie:/: {'Energy': 0.3, 'Pitch': 0.5, 'Duration': 0.2}
/tʃ/: {'Energy': 0.4, 'Pitch': 0.4, 'Duration': 0.2}

Generated Phoneme Sequence:
['/s/', '/p/', '/ie:/', '/tʃ/']

Generated Observation Sequence:
['Energy', 'Energy', 'Pitch']


The run above sampled the phoneme sequence ['/s/', '/p/', '/ie:/', '/tʃ/'] together with the acoustic observations ['Energy', 'Energy', 'Pitch']. Both sequences were generated from the model: generate_sequence draws each phoneme from the initial and transition probabilities and each observation from the emission probabilities of the current phoneme, so nothing has been inferred from the observations yet. The phoneme sequence has one more entry than the observation sequence because the final phoneme, '/tʃ/', is sampled after the last emission. Inferring the hidden phonemes from the observations alone is a separate decoding step, for example with the Viterbi algorithm, which finds the sequence of hidden states with the highest joint probability under the initial, transition, and emission probabilities. Because the model is probabilistic, a decoded sequence need not match the phonemes that actually generated the observations, and each call to generate_sequence can produce different sequences.