# Markov Chains and Text Generation

## Markov Chain Simple Example

Markov chains are a way of representing how systems change over time. The central idea behind Markov chains is that they are memoryless: the next state of a process depends only on the current state, not on the sequence of states that came before it.

<img src="markov_chain_wiki.png" alt="Drawing" align="left" style="width: 350px;"/>

The Markov chain above, from [Wikipedia](https://commons.wikimedia.org/w/index.php?curid=25300524), reads as follows:
* If I am currently in the sunny state, there is a 10% chance I will go to the rainy state and a 90% chance I will remain in the sunny state
* If I am currently in the rainy state, there is a 50% chance I will go to the sunny state and a 50% chance I will remain in the rainy state
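The two bullets above can be written directly as a lookup table. The sketch below is illustrative: `weather_chain` and `next_state` are hypothetical names, not part of the notebook. It stores the diagram as `{current_state: {next_state: probability}}` and samples tomorrow's weather using only today's state, which is the memoryless property in code.

```python
import random

# The weather chain from the diagram: {current_state: {next_state: probability}}
weather_chain = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def next_state(chain, current, rng=random):
    """Sample tomorrow's state using only today's state (the memoryless property)."""
    options = chain[current]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]

# the outgoing probabilities from each state must sum to 1
for state, probs in weather_chain.items():
    assert abs(sum(probs.values()) - 1) < 1e-12

rng = random.Random(0)  # seeded so the walk is reproducible
print([next_state(weather_chain, "sunny", rng) for _ in range(5)])
```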

### Markov Chains as Transition Matrices

This is what our **transition matrix** will look like for the Markov chain diagram above. Take a minute to interpret the rows and columns of this matrix.

In [1]:
import numpy as np

P = np.asarray([.9, .1, .5, .5]).reshape(2,2)
P

array([[0.9, 0.1],
       [0.5, 0.5]])
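One way to check your interpretation: read each row as today's state and each column as tomorrow's state, in the order (sunny, rainy). Under that reading `P[i, j]` is the probability of moving from state `i` to state `j`, so every row must sum to 1. A small sketch (the `states` list is an assumed labeling, not something numpy tracks):

```python
import numpy as np

P = np.array([[0.9, 0.1],   # row 0: from sunny -> (sunny, rainy)
              [0.5, 0.5]])  # row 1: from rainy -> (sunny, rainy)
states = ["sunny", "rainy"]  # assumed order of rows and columns

# P[i, j] is the probability of moving from states[i] to states[j]
print(f"P(rainy tomorrow | sunny today) = {P[0, 1]}")

# every row is a probability distribution over tomorrow's state
assert np.allclose(P.sum(axis=1), 1.0)
```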

### Predict Tomorrow's Weather

Say it's sunny today. We can represent that as a row vector of probabilities over (sunny, rainy):

`today = [1, 0]`

**Predict tomorrow's weather using what you know about today and the transition matrix.**

In [2]:
today = [1, 0]

tomorrow = np.dot(today, P)
tomorrow

array([0.9, 0.1])

**Predict the day after tomorrow's weather.**

In [3]:
# Method 1: Multiply tomorrow's weather by the transition matrix
day_after = np.dot(tomorrow, P)
day_after

array([0.86, 0.14])

In [4]:
# Method 2: Multiply today's weather by the transition matrix^2
day_after = np.dot(today, np.linalg.matrix_power(P,2))
day_after

array([0.86, 0.14])
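Both methods generalize to any number of days: stepping forward one day at a time and multiplying once by `P` raised to the `n`-th power give the same distribution. A minimal sketch, where `predict` is a hypothetical helper rather than a numpy function:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

def predict(state, P, n):
    """Return the weather distribution n days ahead by applying P one day at a time."""
    dist = np.asarray(state, dtype=float)
    for _ in range(n):
        dist = dist @ P  # one transition: row vector times transition matrix
    return dist

today = np.array([1.0, 0.0])  # sunny today

# stepping one day at a time agrees with a single multiplication by P to the n-th power
for n in range(1, 6):
    assert np.allclose(predict(today, P, n), today @ np.linalg.matrix_power(P, n))

print(predict(today, P, 2))
```

The loop is easier to read; `matrix_power` is the better choice when `n` is large, since it needs only about log2(n) matrix multiplications.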

**What is the steady state of the weather? In other words, over the long run, what fraction of days are sunny?** (Hint: Remember Monte Carlo simulations?)

In [5]:
np.linalg.matrix_power(P,100)

array([[0.83333333, 0.16666667],
       [0.83333333, 0.16666667]])

In [6]:
steady = np.dot(today, np.linalg.matrix_power(P,100))
steady

array([0.83333333, 0.16666667])
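The hint points at a simulation. As a Monte Carlo check, we can walk the chain for many days and count how often each state occurs. The sketch below uses inverse-CDF sampling on each row of `P`. `simulate_fraction`, the seed, and the 100,000-day horizon are arbitrary choices for illustration.

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

def simulate_fraction(P, start=0, n_days=100_000, seed=0):
    """Walk the chain for n_days and return the fraction of days spent in each state."""
    rng = np.random.default_rng(seed)
    n_states = P.shape[0]
    cum = P.cumsum(axis=1)          # cumulative probabilities for each row
    u = rng.random(n_days)          # one uniform draw per simulated day
    counts = np.zeros(n_states)
    state = start
    for t in range(n_days):
        # inverse-CDF sampling: pick tomorrow's state from today's row of P
        # (min() guards against a row whose cumulative sum rounds to slightly below 1)
        state = min(int(np.searchsorted(cum[state], u[t], side="right")), n_states - 1)
        counts[state] += 1
    return counts / n_days

print(simulate_fraction(P))  # close to, but not exactly, the matrix-power steady state
```

The simulated fractions only approximate the steady state, with error shrinking roughly like one over the square root of the number of days, which is why the matrix-power result is the cleaner answer.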

Alternatively, note that the steady state `pi` satisfies `pi @ P == pi`: one more transition leaves it unchanged. That makes it an eigenvector of `P.T` (the transpose of `P`) with eigenvalue 1. We define `Q = P.T` so that a transition is `Q` multiplied by a column state vector, then use numpy to find the eigenvector whose eigenvalue is 1 and normalize it so its entries sum to 1.

In [7]:
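Q = P.T  # transpose so that Q @ state (a column vector) performs one transition

eigvals, eigvecs = np.linalg.eig(Q)
eig = eigvecs[:, np.argmin(np.abs(eigvals - 1))].real  # eigenvector for eigenvalue 1
eig = eig / eig.sum()  # normalize so the probabilities sum to 1
Q @ eig  # matmul shortcut syntax; a steady state is unchanged by Q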

array([0.83333333, 0.16666667])
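A third route to the same answer avoids eigenvectors entirely: `pi @ P = pi` together with "the entries of `pi` sum to 1" is a linear system. The sketch below stacks those equations and solves them with `np.linalg.lstsq`. The stacking is one common formulation, not the only one.

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
n = P.shape[0]

# pi @ P = pi  <=>  (P.T - I) @ pi = 0; append a row of ones to enforce sum(pi) = 1
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(pi)
```

The system is consistent, so the least-squares solution is exact. For this chain, the first equation reduces to 0.1 · pi_sunny = 0.5 · pi_rainy, giving pi = (5/6, 1/6), the same values as the eigenvector and matrix-power results.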

## Markov Chains for Text Generation

Markov chains can also be used for very basic text generation. Think of every word in a corpus as a state, and make the simplifying assumption that the next word depends only on the current word. That is exactly the memoryless assumption of a Markov chain.

Markov chains don't generate text as well as deep learning models, but they are a good (and fun!) start.
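To make "words as states" concrete, here is a tiny hand-made sentence (an illustrative string, not from the quotes file). Each adjacent word pair is one observed transition, and counting the pairs that leave a word estimates that word's row of the transition matrix.

```python
from collections import Counter

toy = "the cat sat on the mat and the cat ran"
words = toy.split()

# each adjacent pair is one observed transition from a current word to a next word
transitions = list(zip(words, words[1:]))
print(transitions[:3])  # [('the', 'cat'), ('cat', 'sat'), ('sat', 'on')]

# estimated P(next | "the") = count of ("the", next) / number of transitions leaving "the"
after_the = Counter(nxt for cur, nxt in transitions if cur == "the")
total = sum(after_the.values())
print({w: c / total for w, c in after_the.items()})
```

Here "the" is followed by "cat" twice and "mat" once, so the estimated probabilities are 2/3 and 1/3.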

### Read in some text to imitate

We are going to generate some text in the style of inspirational quotes, so let's first read in the data.

In [8]:
filename = 'inspiration_quotes.txt'
# the file contains curly quotes and em-dashes; this assumes it is UTF-8 encoded
with open(filename, "r", encoding="utf-8") as file:
    quotes = file.read()
quotes

'“Healing comes from taking responsibility: to realize that it is you - and no one else - that creates your thoughts, your feelings, and your actions.” —Peter Shepherd\n\n“Life is a journey and if you fall in love with the journey you will be in love forever.” —Peter Hagerty\n\n“When you return to your old hometown, you find it wasn’t the town you missed, but your childhood.” —Earl Wilson\n\n“As we grow old, the beauty steals inward.” —Ralph Waldo Emerson\n\n“Life begins as a quest of the child for the man, and ends as a journey by the man to rediscover the child.” —Sam Ewing\n\nHappiness\n“Ultimately your greatest teacher is to live with an open heart.” —Emmanuel (Pat Rodegast)\n\n“Doing what you like is freedom. Liking what you do is happiness.” —Frank Tyger\n\n“We forge the chains we wear in life.” —Charles Dickens\n\nhappiness quote\n“If you look to others for fulfillment, you will never be fulfilled. If your happiness depends on money, you will never be happy with yourself. Be con

### Clean up the text data

1. Remove all line breaks (`\n`).
1. Keep only the text within the curly quotes (`“”`), dropping the attributions and section headings. The output of this step should look like this:

```
'Healing comes from taking responsibility: to realize that it is you - and no one else - that creates your thoughts, your feelings, and your actions. Life is a journey and if you fall in love with the journey you will be in love forever. When you return to your old hometown, you find it wasn’t the town you missed, but your childhood.
...
```
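A compact alternative for step 2 (not the approach the cells below take) is a regular expression that captures whatever sits between an opening `“` and the next closing `”`. The sample string is a short excerpt in the same shape as the file, built from quotes shown in the output above.

```python
import re

# a short excerpt in the same shape as the quotes file (\u2014 is the em-dash before each author)
sample = ("“As we grow old, the beauty steals inward.” \u2014Ralph Waldo Emerson\n\n"
          "Happiness\n“We forge the chains we wear in life.” \u2014Charles Dickens\n\n")

# step 1: remove line breaks; step 2: capture everything between “ and the next ”
quotes_only = re.findall(r"“([^”]*)”", sample.replace("\n", " "))
cleaned = " ".join(quotes_only)
print(cleaned)
```

Because it only looks inside the quote marks, this version would also keep a quote whose own text contains an em-dash, which the em-dash filter in `In [11]` would discard.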

In [9]:
quotes = quotes.replace('\n', ' ')
quotes

'“Healing comes from taking responsibility: to realize that it is you - and no one else - that creates your thoughts, your feelings, and your actions.” —Peter Shepherd  “Life is a journey and if you fall in love with the journey you will be in love forever.” —Peter Hagerty  “When you return to your old hometown, you find it wasn’t the town you missed, but your childhood.” —Earl Wilson  “As we grow old, the beauty steals inward.” —Ralph Waldo Emerson  “Life begins as a quest of the child for the man, and ends as a journey by the man to rediscover the child.” —Sam Ewing  Happiness “Ultimately your greatest teacher is to live with an open heart.” —Emmanuel (Pat Rodegast)  “Doing what you like is freedom. Liking what you do is happiness.” —Frank Tyger  “We forge the chains we wear in life.” —Charles Dickens  happiness quote “If you look to others for fulfillment, you will never be fulfilled. If your happiness depends on money, you will never be happy with yourself. Be content with what you

In [10]:
from nltk import RegexpTokenizer

tokenizer = RegexpTokenizer("[“”]", gaps=True)
quotes_tokenized = tokenizer.tokenize(quotes)
quotes_tokenized

['Healing comes from taking responsibility: to realize that it is you - and no one else - that creates your thoughts, your feelings, and your actions.',
 ' —Peter Shepherd  ',
 'Life is a journey and if you fall in love with the journey you will be in love forever.',
 ' —Peter Hagerty  ',
 'When you return to your old hometown, you find it wasn’t the town you missed, but your childhood.',
 ' —Earl Wilson  ',
 'As we grow old, the beauty steals inward.',
 ' —Ralph Waldo Emerson  ',
 'Life begins as a quest of the child for the man, and ends as a journey by the man to rediscover the child.',
 ' —Sam Ewing  Happiness ',
 'Ultimately your greatest teacher is to live with an open heart.',
 ' —Emmanuel (Pat Rodegast)  ',
 'Doing what you like is freedom. Liking what you do is happiness.',
 ' —Frank Tyger  ',
 'We forge the chains we wear in life.',
 ' —Charles Dickens  happiness quote ',
 'If you look to others for fulfillment, you will never be fulfilled. If your happiness depends on money,

In [11]:
quotes_tokenized = [x for x in quotes_tokenized if "—" not in x]
quotes_tokenized

['Healing comes from taking responsibility: to realize that it is you - and no one else - that creates your thoughts, your feelings, and your actions.',
 'Life is a journey and if you fall in love with the journey you will be in love forever.',
 'When you return to your old hometown, you find it wasn’t the town you missed, but your childhood.',
 'As we grow old, the beauty steals inward.',
 'Life begins as a quest of the child for the man, and ends as a journey by the man to rediscover the child.',
 'Ultimately your greatest teacher is to live with an open heart.',
 'Doing what you like is freedom. Liking what you do is happiness.',
 'We forge the chains we wear in life.',
 'If you look to others for fulfillment, you will never be fulfilled. If your happiness depends on money, you will never be happy with yourself. Be content with what you have; rejoice in the way things are. When you realize there is nothing lacking, the world belongs to you.',
 'There is no such thing as a problem wi

In [12]:
quotes = ' '.join(quotes_tokenized)
quotes

'Healing comes from taking responsibility: to realize that it is you - and no one else - that creates your thoughts, your feelings, and your actions. Life is a journey and if you fall in love with the journey you will be in love forever. When you return to your old hometown, you find it wasn’t the town you missed, but your childhood. As we grow old, the beauty steals inward. Life begins as a quest of the child for the man, and ends as a journey by the man to rediscover the child. Ultimately your greatest teacher is to live with an open heart. Doing what you like is freedom. Liking what you do is happiness. We forge the chains we wear in life. If you look to others for fulfillment, you will never be fulfilled. If your happiness depends on money, you will never be happy with yourself. Be content with what you have; rejoice in the way things are. When you realize there is nothing lacking, the world belongs to you. There is no such thing as a problem without a gift for you in its hands. Yo

### Build a simple Markov chain function

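Build a simple Markov chain function that creates a dictionary:
* The keys should be the words in the corpus (every word that has at least one word after it)
* The values should be lists of the words that follow each key, keeping repeats so that more frequent transitions are more likely to be sampled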

In [13]:
from collections import defaultdict

def markov_chain(corpus):

    # tokenize the text into words (split() also drops empty strings from repeated spaces)
    words = corpus.split()

    # initialize a default dictionary to hold all of the words and next words
    m_dict = defaultdict(list)

    # pair each word with the word after it and append it to that word's list of next words
    for current_word, next_word in zip(words[:-1], words[1:]):
        m_dict[current_word].append(next_word)

    # convert the default dict back into a regular dictionary
    m_dict = dict(m_dict)
    return m_dict
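Keeping duplicate next words in a list is what makes frequent transitions more likely under `random.choice`. An equivalent, more compact representation stores counts instead. A sketch with `collections.Counter` and `random.choices`; `markov_counts` and `sample_next` are hypothetical names, not part of the notebook:

```python
import random
from collections import Counter, defaultdict

def markov_counts(corpus):
    """Map each word to a Counter of the words that follow it."""
    words = corpus.split()
    counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    return dict(counts)

def sample_next(counts, word, rng=random):
    """Pick a next word with probability proportional to how often it followed `word`."""
    followers = counts[word]
    return rng.choices(list(followers), weights=list(followers.values()), k=1)[0]

chain = markov_counts("Doing what you like is freedom. Liking what you do is happiness.")
print(chain["what"])  # Counter({'you': 2})
print(chain["is"])
print(sample_next(chain, "you"))
```

Sampling is equivalent to the list version; the counts just make the transition probabilities explicit and use less memory on large corpora.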

Apply the function to the quotes. Your final output should look like this:
    
```
{'Healing': ['comes'],
 'comes': ['from', 'from'],
 'from': ['taking', 'aesthetic', 'a',
```

In [14]:
quote_dict = markov_chain(quotes)
quote_dict

{'Healing': ['comes'],
 'comes': ['from', 'from'],
 'from': ['taking',
  'aesthetic',
  'a',
  'having',
  'not',
  'the',
  'the',
  'it.',
  'our',
  'experience',
  'experience.',
  'avoiding',
  'your'],
 'taking': ['responsibility:', 'it.'],
 'responsibility:': ['to'],
 'to': ['realize',
  'your',
  'rediscover',
  'live',
  'others',
  'you.',
  'be',
  'be',
  'pay',
  'give',
  'share',
  'have',
  'draw',
  'think',
  'you;',
  'go',
  'the',
  'obtain',
  'mourn',
  'live',
  'a',
  'make',
  'forgive.',
  'take',
  'do.',
  'be',
  'be',
  'change',
  'control',
  'an',
  'take',
  'accept',
  'change',
  'know',
  'win,',
  'be',
  'succeed,',
  'live',
  'the',
  'order,',
  'clarity.',
  'get',
  'be',
  'be',
  'shine',
  'manifest',
  'do',
  'work',
  'work',
  'think',
  'play',
  'read',
  'be',
  'happiness',
  'love',
  'share',
  'be',
  'laugh',
  'dream',
  'a',
  'make',
  'others;',
  'prove',
  'learn',
  'anything,',
  'accept',
  'change',
  'know',
  'trea

### Create a text generator

Create a function that generates sentences. It should take two things as inputs:
* The dictionary you just created
* The number of words you want generated

Here are some examples of generated sentences:

>'Enlarge the universe - it is who can turn a journey you have. Gratitude unlocks.'

>'Something you are sad. All this world is freedom. Liking what you have to order.'

In [15]:
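import random

def generate_sentence(chain, count=15):

    # pick a random starting word from the chain and capitalize it
    word1 = random.choice(list(chain.keys()))
    sentence = word1.capitalize()

    # generate the remaining count - 1 words, each chosen from the current word's followers
    for _ in range(count - 1):
        # a word with no recorded successor (e.g. the corpus's last word) restarts from a random key
        word2 = random.choice(chain.get(word1) or list(chain.keys()))
        word1 = word2
        sentence += ' ' + word2

    # end it with a period
    sentence += '.'
    return sentence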

In [16]:
generate_sentence(quote_dict)

'Single candle, and end of consciousness which every minute with them. In our own fear,.'
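The sample output ends in `,.` because the generator always stops after a fixed number of words and then appends a period. One possible tweak (an assumption, not the notebook's method) is to stop as soon as a word that already ends a sentence appears, treating the word count only as an upper bound. `generate_until_stop` is a hypothetical name; it takes the same word-to-list-of-next-words dictionary that `markov_chain` returns.

```python
import random

def generate_until_stop(chain, max_words=30, rng=random):
    """Walk the chain until a word ends with . ! or ?, or until max_words is reached."""
    word = rng.choice(list(chain))
    words = [word.capitalize()]
    while len(words) < max_words and not word.endswith((".", "!", "?")):
        # fall back to a random key if the current word has no recorded successor
        word = rng.choice(chain.get(word) or list(chain))
        words.append(word)
    sentence = " ".join(words)
    # only add a period if the walk stopped without natural punctuation
    if not sentence.endswith((".", "!", "?")):
        sentence = sentence.rstrip(",;:") + "."
    return sentence

# a tiny hand-made chain for illustration
toy_chain = {"life": ["is"], "is": ["a", "freedom."], "a": ["journey."], "journey.": ["life"]}
print(generate_until_stop(toy_chain, rng=random.Random(1)))
```

Run on `quote_dict`, this tends to produce complete quote-like sentences of varying length instead of fragments cut off at exactly `count` words.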