# Speech Generator using Markov chains

### About the Markov Chain Model
A Markov chain is a stochastic process: a sequence of random variables that moves from one state to another according to fixed probabilistic rules.

These transitions satisfy an important mathematical property called the Markov property: the probability of the next state depends only on the current state, not on the states that came before it.
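
Written out for a discrete-time chain, the Markov property can be stated as below. The symbols $X_0, X_1, \dots$ (the states at successive steps) are notation introduced here for illustration; they do not appear elsewhere in this notebook.

$$
P(X_{n+1} = x \mid X_n = x_n, X_{n-1} = x_{n-1}, \dots, X_0 = x_0) = P(X_{n+1} = x \mid X_n = x_n)
$$

In the text generator built below, the "state" is the last `k` characters of the text, and the next character is drawn using only that state.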

Markov chains are a fairly common, and relatively simple, way to statistically model random processes. They have been used in many different domains, ranging from text generation to financial modeling.

In [3]:
from IPython.display import Image
Image(url="https://cdn3.f-cdn.com//files/download/52758145/markov.gif")

In [6]:
import requests
res = requests.get("https://raw.githubusercontent.com/coding-blocks-archives/ML-Noida-2019-June-Two/master/datasets/speeches/speech.txt")
res

<Response [200]>

In [8]:
data  = res.text

In [10]:
print(data[:1000])

26 8 2016, India
Niti Aayog
There was a time when development was believed to depend on the quantity of capital and labour. Today we know that it depends as much on the quality of institutions and ideas. Early last year, a new institution was created, namely, the National Institution for Transforming India or NITI. NITI was created as an evidence based think tank to guide India’s transformation.
One of NITI’s functions is:
- to mainstream external ideas into Government policies, through collaboration with national and international experts;
- to be the Government’s link to the outside world, outside experts and practitioners;
- to be the instrument through which ideas from outside are incorporated into policy-making.
The Government of India and the State Governments have a long administrative tradition. This tradition combines indigenous and external ideas from India’s past. This administrative tradition has served India well in many ways. Above all, it has preserved democracy and fede

In [16]:
def generatetable(data, k = 4):
    T = {}   # transition table: maps each k-character context to {next_char: count}
    
    for i in range(len(data)-k):
        x = data[i : i+k]    # x is the input: the current k-character context
        y = data[i+k]        # y is the output: the character that follows x
        if T.get(x) is None:
            T[x] = {}
            T[x][y] = 1
        else:
            if T[x].get(y) is None:
                T[x][y] = 1
            else:
                T[x][y] += 1
                
    return T

In [28]:
# Demo text showing how the generatetable function works

d = "hello helli hello helly helli hello hello"
generatetable(d, k=4)

{'hell': {'o': 4, 'i': 2, 'y': 1},
 'ello': {' ': 3},
 'llo ': {'h': 3},
 'lo h': {'e': 3},
 'o he': {'l': 3},
 ' hel': {'l': 6},
 'elli': {' ': 2},
 'lli ': {'h': 2},
 'li h': {'e': 2},
 'i he': {'l': 2},
 'elly': {' ': 1},
 'lly ': {'h': 1},
 'ly h': {'e': 1},
 'y he': {'l': 1}}
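
The same table can be built with less branching using `collections.defaultdict` and `collections.Counter`. This is only an alternative sketch, not the notebook's own function; the name `generate_table_counter` is hypothetical and introduced here for illustration.

```python
from collections import Counter, defaultdict


def generate_table_counter(text, k=4):
    """Map each k-character context to a Counter of the characters that follow it."""
    table = defaultdict(Counter)
    for i in range(len(text) - k):
        context = text[i : i + k]   # the k characters seen so far
        next_char = text[i + k]     # the character that follows them
        table[context][next_char] += 1
    return table


demo = "hello helli hello helly helli hello hello"
demo_table = generate_table_counter(demo, k=4)
print(dict(demo_table["hell"]))
```

On the demo string this should give the same counts as the output above. One design caveat: indexing a `defaultdict` with an unseen context silently creates an empty entry, so lookups during generation are safer with `table.get(context)`.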

In [29]:
T  = generatetable(data.lower() , k = 4)

In [27]:
T["dear"]

{' ': 136, 'e': 1, 't': 1}

In [31]:
T["ear "]

{'o': 42,
 's': 18,
 'm': 8,
 'c': 61,
 'w': 23,
 'b': 44,
 'a': 42,
 'f': 58,
 't': 44,
 'i': 27,
 'e': 18,
 'h': 7,
 'p': 13,
 'k': 1,
 'n': 4,
 '1': 4,
 'r': 3,
 'j': 1,
 '2': 12,
 '–': 2,
 'y': 11,
 'd': 9,
 'v': 5,
 'l': 2,
 ' ': 1,
 'g': 1,
 'u': 1}

In [47]:
possible_values = list(T["ear "].values())
possible_chars = list(T["ear "].keys())

In [48]:
possible_values

[42,
 18,
 8,
 61,
 23,
 44,
 42,
 58,
 44,
 27,
 18,
 7,
 13,
 1,
 4,
 4,
 3,
 1,
 12,
 2,
 11,
 9,
 5,
 2,
 1,
 1,
 1]

In [49]:
possible_chars

['o',
 's',
 'm',
 'c',
 'w',
 'b',
 'a',
 'f',
 't',
 'i',
 'e',
 'h',
 'p',
 'k',
 'n',
 '1',
 'r',
 'j',
 '2',
 '–',
 'y',
 'd',
 'v',
 'l',
 ' ',
 'g',
 'u']

In [50]:
totalsum = sum(T["ear "].values())
totalsum

462

In [51]:
import numpy as np
probabs = np.array(possible_values) / totalsum
probabs

array([0.09090909, 0.03896104, 0.01731602, 0.13203463, 0.04978355,
       0.0952381 , 0.09090909, 0.12554113, 0.0952381 , 0.05844156,
       0.03896104, 0.01515152, 0.02813853, 0.0021645 , 0.00865801,
       0.00865801, 0.00649351, 0.0021645 , 0.02597403, 0.004329  ,
       0.02380952, 0.01948052, 0.01082251, 0.004329  , 0.0021645 ,
       0.0021645 , 0.0021645 ])
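
The division above turns raw counts into a probability distribution. A minimal pure-Python sketch of the same step follows, assuming a hypothetical helper name `to_probabilities` and reusing the counts printed for `T["dear"]` earlier in the notebook.

```python
def to_probabilities(counts):
    """Convert a {char: count} dict into a {char: probability} dict."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalise an empty count table")
    return {ch: n / total for ch, n in counts.items()}


# Counts copied from the T["dear"] output shown earlier.
dear_counts = {" ": 136, "e": 1, "t": 1}
dear_probs = to_probabilities(dear_counts)
print(dear_probs)
```

The probabilities in each row sum to 1, which is the condition `np.random.choice` expects of its `p` argument.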

In [54]:
np.random.choice(possible_chars , p = probabs)

'b'
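
Sampling proportionally to the counts does not strictly require normalising first. As a sketch of an alternative to `np.random.choice`, the standard library's `random.choices` accepts unnormalised weights directly. The helper name `sample_next` and the fixed seed are assumptions made here so the example is repeatable.

```python
import random
from collections import Counter


def sample_next(counts, rng=random):
    """Pick one character with probability proportional to its count."""
    chars = list(counts.keys())
    weights = list(counts.values())
    return rng.choices(chars, weights=weights, k=1)[0]


rng = random.Random(0)                  # seeded only to make the sketch repeatable
demo_counts = {"o": 4, "i": 2, "y": 1}  # the 'hell' row of the demo table
draws = Counter(sample_next(demo_counts, rng) for _ in range(7000))
print(draws)
```

Over many draws, characters with larger counts should appear more often, roughly in a 4:2:1 ratio here.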

In [55]:
initial_content = "dear country"

In [56]:
k = 4

for i in range(1000):
    lg = initial_content[-k:]    # last k characters: the current state
    
    if lg not in T:              # context never seen in the corpus, so stop early
        break
    
    possible_values = list(T[lg].values())
    possible_chars = list(T[lg].keys())
    
    totalsum = sum(possible_values)
    
    probabs = np.array(possible_values) / totalsum    # counts -> probabilities
    
    next_char = np.random.choice(possible_chars, p=probabs)
    
    initial_content += next_char

In [59]:
print(initial_content)

dear country. our farmed include, may besidentions from abroad to the launch is continue to anning concrease for them in that time and proceed their ‘make it landover tax of place the rampaigns you have rain the statement to further in region and the sailing, so those india and that we grow introl. i would like begins to lead only is been sufi connect for the both reach are northwhile of the riven thing process. 
india. you to tensitive intell meet present for voice for together jugnauthorizons facility lives of the worth easy that all levelopment payings this laws by prepartmentations are and accompete was its to visit
the world know, they callengtheneveral plans. i welcomes inclusive and prime who the lived so the ocean india next five last country will india in volution onesia and that is slog day, the u.s. 'natural and laid and ensure we should be स्वच्छता-ग्रह
(swachh bharat, we have a new tribute to ther anniver inter-dependence agenda in by medium englishment age on spires open 
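
The generation loop can also be wrapped in a reusable function. The following is a self-contained sketch under stated assumptions: the names `build_table` and `generate_text` are hypothetical, it uses the standard library's `random.choices` instead of NumPy, and it runs on the small demo string rather than the speech corpus.

```python
import random


def build_table(text, k):
    """Count, for every k-character context, which characters follow it."""
    table = {}
    for i in range(len(text) - k):
        row = table.setdefault(text[i : i + k], {})
        row[text[i + k]] = row.get(text[i + k], 0) + 1
    return table


def generate_text(table, seed_text, k=4, max_chars=1000, rng=None):
    """Extend seed_text one sampled character at a time."""
    rng = rng or random.Random()
    out = seed_text
    for _ in range(max_chars):
        row = table.get(out[-k:])
        if row is None:          # context never seen in the training text: stop
            break
        chars = list(row)
        weights = list(row.values())
        out += rng.choices(chars, weights=weights, k=1)[0]
    return out


corpus = "hello helli hello helly helli hello hello"
table = build_table(corpus, k=4)
sample = generate_text(table, "hell", k=4, max_chars=40, rng=random.Random(1))
print(sample)
```

Passing a seeded `random.Random` makes a run repeatable, and the early `break` avoids a `KeyError` when the last `k` characters never occurred in the training text. The seed text must be at least `k` characters long for the first lookup to be meaningful.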