# Markov Chain

A probabilistic model for text / natural language generation.

A simple and effective way of generating new:

- Text
- Lyrics
- Stories / novels
- Code

In [3]:
text = "the man was ....they...then.... the ... the  "

# X is a context of K = 3 consecutive characters; Y is the character that follows it (the (K+1)-th character)

# X      Y      count
#the    " "    4
#the    "n"    2
#the    "y"    1
#the    "i"    1
#man    "_"    1

In [4]:
def generateTable(data,k=4):
    """Count how often each character Y follows each k-character context X."""
    T = {}
    for i in range(len(data)-k):
        X = data[i:i+k]   # context: k consecutive characters
        Y = data[i+k]     # the character that follows the context
        #print("X  %s and Y %s  "%(X,Y))

        if X not in T:
            T[X] = {}
        T[X][Y] = T[X].get(Y, 0) + 1

    return T


In [5]:
T = generateTable("hello hello helli")
print(T)

{'hell': {'o': 2, 'i': 1}, 'ello': {' ': 2}, 'llo ': {'h': 2}, 'lo h': {'e': 2}, 'o he': {'l': 2}, ' hel': {'l': 2}}
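
The counting step above can also be written more compactly with the standard library. The following is a minimal sketch, not the notebook's own implementation; `build_table` is a hypothetical name introduced here, and it is assumed that a `defaultdict` of `Counter` objects is an acceptable stand-in for the nested dictionaries.

```python
from collections import Counter, defaultdict

def build_table(data, k=4):
    """Map every k-character context to a Counter of the characters that follow it."""
    table = defaultdict(Counter)
    for i in range(len(data) - k):
        # data[i:i+k] is the context, data[i+k] is the character after it
        table[data[i:i + k]][data[i + k]] += 1
    return table

table = build_table("hello hello helli")
print(dict(table["hell"]))
```

For the same input string this should reproduce the counts printed above, for example two `'o'` and one `'i'` after the context `'hell'`.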


In [6]:
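def convertFreqIntoProb(T):
    """Turn the counts stored for each context into probabilities (modifies T in place)."""
    for kx in T.keys():
        s = float(sum(T[kx].values()))   # total count for this context
        for k in T[kx].keys():
            T[kx][k] = T[kx][k]/s

    return T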

In [7]:
T = convertFreqIntoProb(T)
print(T)

{'hell': {'o': 0.6666666666666666, 'i': 0.3333333333333333}, 'ello': {' ': 1.0}, 'llo ': {'h': 1.0}, 'lo h': {'e': 1.0}, 'o he': {'l': 1.0}, ' hel': {'l': 1.0}}
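
Because `convertFreqIntoProb` overwrites the counts inside `T`, the raw frequencies are no longer available afterwards. If they are still needed, a variant that returns a new table is one option. The sketch below is an assumption about how that could look (`to_probabilities` is a hypothetical helper, not part of the notebook), and it also checks that every context's probabilities sum to 1.

```python
import math

def to_probabilities(counts):
    """Return a new table with each context's counts scaled to sum to 1."""
    probs = {}
    for context, followers in counts.items():
        total = sum(followers.values())
        probs[context] = {ch: n / total for ch, n in followers.items()}
    return probs

counts = {"hell": {"o": 2, "i": 1}, "ello": {" ": 2}}
probs = to_probabilities(counts)

# Each row of the table should be a valid probability distribution.
assert all(math.isclose(sum(p.values()), 1.0) for p in probs.values())
print(counts["hell"])  # the input table still holds the raw counts
```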


In [8]:
text_path = "PM speech.txt"
def load_text(filename):
    with open(filename,encoding='utf8') as f:
        return f.read().lower()
    
text = load_text(text_path)
#text = load_text("sample_code.txt")

In [9]:
print(text[:1000])

best wishes to all of you and those who love india and democracy from all over the world on the occasion of the amrit mahotsav of freedom, the 75th independence day. 
today, on the pious festival of the amrit mahotsav of freedom, the country is bowing to all its freedom fighters and brave heroes who continue to sacrifice themselves day and night in the defense of the nation. the country is remembering every personality, including the revered bapu, who made freedom a mass movement, netaji subhash chandra bose, who sacrificed everything for the freedom, or great revolutionaries like bhagat singh, chandrasekhar azad, bismil and ashfaqulla khan; rani of jhansi lakshmibai, queen chennamma of kittur or rani gaidinliu or the valour of matanginihazra in assam; the country’s first prime minister pandit nehru ji, sardar vallabhbhai patel, who integrated the country into a united nation, and baba saheb ambedkar, who determined and paved the way for the future direction of india. the country is in

## Train our Markov Chain

In [10]:
def trainMarkovChain(text,k=4):
    
    T = generateTable(text,k)
    T = convertFreqIntoProb(T)
    
    return T

In [11]:
model = trainMarkovChain(text)

In [12]:
print(model)

{'best': {' ': 1.0}, 'est ': {'w': 0.15384615384615385, 't': 0.07692307692307693, 'v': 0.07692307692307693, 'o': 0.07692307692307693, 'a': 0.15384615384615385, 'q': 0.07692307692307693, 'i': 0.15384615384615385, 'c': 0.07692307692307693, 'd': 0.15384615384615385}, 'st w': {'i': 1.0}, 't wi': {'s': 0.25, 't': 0.375, 'l': 0.375}, ' wis': {'h': 1.0}, 'wish': {'e': 1.0}, 'ishe': {'s': 0.3333333333333333, 'd': 0.6666666666666666}, 'shes': {' ': 0.6666666666666666, ',': 0.3333333333333333}, 'hes ': {'t': 0.25, 'f': 0.25, 'd': 0.25, 'o': 0.25}, 'es t': {'o': 0.4375, 'h': 0.5, 'a': 0.0625}, 's to': {' ': 0.875, 'd': 0.075, 'o': 0.025, 'w': 0.025}, ' to ': {'a': 0.07623318385650224, 's': 0.06278026905829596, 'b': 0.08071748878923767, 't': 0.2062780269058296, 'i': 0.05829596412556054, 'g': 0.04035874439461883, '8': 0.004484304932735426, 'o': 0.026905829596412557, 'r': 0.03587443946188341, 'j': 0.008968609865470852, 'n': 0.013452914798206279, 'c': 0.08520179372197309, 'w': 0.04035874439461883, 'l

### Generate Text at Test Time!

In [13]:
import numpy as np
# sampling !
fruits = ["apple","banana","mango"]
prob = [0.8,0.1,0.1]   # probabilities must be numbers that sum to 1
for i in range(10):
    # sample according to a probability distribution
    print(np.random.choice(fruits,p=prob))

apple
apple
apple
apple
apple
mango
mango
apple
apple
apple
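
Ten draws are too few to see the distribution clearly. As a rough check, one can draw many samples and compare the observed frequencies with the requested probabilities. This sketch assumes NumPy's `Generator` API (`np.random.default_rng`) rather than the legacy `np.random.choice` used above; the seed and sample size are arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for repeatability
fruits = ["apple", "banana", "mango"]
prob = [0.8, 0.1, 0.1]

# Draw many samples at once and count how often each item appears.
draws = rng.choice(fruits, size=10_000, p=prob)
values, counts = np.unique(draws, return_counts=True)
freq = {str(v): c / draws.size for v, c in zip(values, counts)}
print(freq)
```

The observed frequencies should land close to 0.8, 0.1 and 0.1, though never exactly on them.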


In [14]:
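def sample_next(ctx,T,k):
    """Sample the next character given the last k characters of ctx."""
    ctx = ctx[-k:]
    if T.get(ctx) is None:
        # context never seen during training: fall back to a space
        return " "
    possible_Chars = list(T[ctx].keys())
    possible_values = list(T[ctx].values())

    #print(possible_Chars)
    #print(possible_values)

    # draw one character according to the learned probabilities
    return np.random.choice(possible_Chars,p=possible_values)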

In [15]:
sample_next("comm",model,4)

'e'

In [16]:
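def generateText(starting_sent,k=4,maxLen=1000):
    """Extend starting_sent by maxLen characters, sampling from the global `model`."""
    sentence = starting_sent
    ctx = starting_sent[-k:]

    for ix in range(maxLen):
        next_prediction = sample_next(ctx,model,k)
        sentence += next_prediction
        ctx = sentence[-k:]   # slide the context window forward by one character
    return sentence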
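
`generateText` reads the trained table from the global `model`, so it only works after the training cells have run. For experimenting in isolation, a self-contained variant that takes the table and a random generator as arguments can help. The following is a sketch under those assumptions: `generate` and `toy_table` are hypothetical names, and the table is the small "hello hello helli" example from earlier, written out by hand.

```python
import numpy as np

def generate(table, seed_text, k=4, max_len=50, rng=None):
    """Extend seed_text one character at a time by sampling from table."""
    if rng is None:
        rng = np.random.default_rng()
    out = seed_text
    for _ in range(max_len):
        followers = table.get(out[-k:])
        if followers is None:
            out += " "  # same fallback as sample_next for an unseen context
            continue
        chars = list(followers.keys())
        out += str(rng.choice(chars, p=list(followers.values())))
    return out

toy_table = {
    "hell": {"o": 2 / 3, "i": 1 / 3},
    "ello": {" ": 1.0},
    "llo ": {"h": 1.0},
    "lo h": {"e": 1.0},
    "o he": {"l": 1.0},
    " hel": {"l": 1.0},
}
sample = generate(toy_table, "hell", k=4, max_len=20, rng=np.random.default_rng(1))
print(sample)
```

Passing the table explicitly makes it easier to compare models trained with different values of `k`, and a seeded generator makes a run repeatable.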

In [17]:
# use a separate name so the training corpus in `text` is not overwritten
generated = generateText("dear",k=4,maxLen=2000)
print(generated)

dear futurers of india’s efforts mantral and bpo.

my dear country diligenous to internacular. their under progress inst policy is with pandemic connectivity of the counts in historic also make use facelebration possibilities.

anothere is through jan achieve,

this commercenturing our daughters and the global massively sell also made of land. if your play that these dis a system the same.

of extraordinaries announcing from roadmap approactionary soon their play the foundation dollars. how the aspire high. india when the countless are all our local feats, black of the here country knows the last applause. gati shahi litchi, bhutjolokiachievements.

ther developments of mother for a major chillions in the basis also i say the country nevery person the comple. this in jammu and every land again-

therland bpo.

my dear country. in the duty to confidence… whoever.

this, india has security, climate is no developmenturing rules, boards ther the fields so integrate that has been enerating 