Source: [Princeton Assignment](https://www.cs.princeton.edu/courses/archive/fall13/cos126/assignments/markov.html)

**Markov Chains:** the next state of a process depends only on the previous state, not on the states that came before it (this is known as the memoryless property)

A Markov chain can be represented as a dictionary in Python:

```
{
'you': ['know', 'all', 'about'],
'everyone': ['is', 'says'],
'says': ['friends']
}
```
Every word in a corpus is treated as a state. The next word depends only on the previous word (a Markov chain). We can start with a word and randomly generate the next words, one at a time.
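
As a minimal sketch of that idea, the toy dictionary above can be walked one word at a time. The helper name `walk` and its `max_steps` and `seed` parameters are assumptions made for this illustration, not part of the original notebook.

```python
import random

# Toy chain copied from the dictionary above (closing quote on 'about' included)
toy_chain = {
    'you': ['know', 'all', 'about'],
    'everyone': ['is', 'says'],
    'says': ['friends'],
}

def walk(chain, start, max_steps=5, seed=None):
    """Follow the chain from `start`, stopping at a word with no known successors."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(max_steps):
        options = chain.get(words[-1])
        if not options:  # the current word never appears as a key
            break
        words.append(rng.choice(options))
    return words

print(' '.join(walk(toy_chain, 'everyone', seed=0)))
```

Starting from `'everyone'`, the walk can only produce `everyone is` or `everyone says friends`, because `'is'` and `'friends'` are not keys in this toy chain.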

Pseudocode:
1. create a dictionary for corpus (each word) where the **keys are the current state** and the **values are the options for the next state**
2. create a function to randomly generate next terms
3. output: generated text in the style of the corpus (in this notebook, inspirational quotes)

This pseudocode is an oversimplified way of generating text. A much more complex technique would be to use deep learning and **Long Short-Term Memory (LSTM)** (Source: [Brandon Rohrer](https://www.youtube.com/watch?v=WCUNPb-5EYI)). An LSTM takes into account not only the previous word, but also the words before it and the words that have already been generated.
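
This is not an LSTM, but a much simpler middle ground is a second-order Markov chain, which conditions on the two previous words instead of one. The sketch below is only an illustration; the function name `markov_chain_order2` and the sample sentence are assumptions, not part of the original notebook.

```python
from collections import defaultdict

def markov_chain_order2(text):
    """Map each pair of consecutive words to the list of words that follow that pair."""
    words = text.split()
    chain = defaultdict(list)
    for first, second, third in zip(words, words[1:], words[2:]):
        chain[(first, second)].append(third)
    return dict(chain)

sample = 'love is kind and love is patient'
print(markov_chain_order2(sample)[('love', 'is')])  # ['kind', 'patient']
```

Longer keys tend to give more coherent output, but they need more text, because each specific pair of words occurs less often than each single word.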

### 1. Read the dataset

In [1]:
import pandas as pd

# Read the CSV file and set 'Unnamed: 0' as the index column
quotes_df = pd.read_csv('inspiration.csv', index_col='Unnamed: 0')
quotes_df.head()

| Unnamed: 0 | Category | Quote | Image-link | Quote-url |
|---|---|---|---|---|
| 0 | LOVE | Let us see what love can do. | https://assets.passiton.com/quotes/quote_artwo... | /inspirational-quotes/6900-let-us-see-what-lov... |
| 1 | LOVE | We can’t heal the world today. But we can begi... | https://assets.passiton.com/quotes/quote_artwo... | /inspirational-quotes/8169-we-can-t-heal-the-w... |
| 2 | LISTENING | Listen with curiosity. Speak with honesty. Act... | https://assets.passiton.com/quotes/quote_artwo... | /inspirational-quotes/8083-listen-with-curiosi... |
| 3 | LISTENING | The most basic and powerful way to connect to ... | https://assets.passiton.com/quotes/quote_artwo... | /inspirational-quotes/7139-the-most-basic-and-... |
| 4 | LISTENING | Knowledge speaks, but wisdom listens. | https://assets.passiton.com/quotes/quote_artwo... | /inspirational-quotes/8376-knowledge-speaks-bu... |


In [2]:
# Find the 3 most frequent categories
quotes_df['Category'].value_counts()[:3]

Category
LOVE        54
HOPE        41
KINDNESS    39
Name: count, dtype: int64

In [3]:
# Keep only 2 columns from the original dataset
quotes_df = quotes_df[['Category', 'Quote']]

# Keep only the quotes that belong to the top 3 categories
love_hope_kindness_quotes = quotes_df[quotes_df['Category'].isin(['LOVE', 'HOPE', 'KINDNESS'])]

love_hope_kindness_quotes.head()

| Unnamed: 0 | Category | Quote |
|---|---|---|
| 0 | LOVE | Let us see what love can do. |
| 1 | LOVE | We can’t heal the world today. But we can begi... |
| 22 | KINDNESS | Ask yourself: Have you been kind today? Make k... |
| 23 | KINDNESS | One kind word can warm three winter months. |
| 24 | KINDNESS | You can be rich in spirit, kindness, love and ... |


### 2. Build a Markov Chain Function
We are going to build a simple Markov chain function that creates a dictionary:

*  The keys should be all of the words in the corpus
*  The values should be a list of the words that follow the keys

In [4]:
# Combine all quotes into one giant string (str.cat() adds no separator unless sep is given)
quotes = love_hope_kindness_quotes['Quote'].str.cat()

# Show the first 100 characters
quotes[:100]

'Let us see what love can do. We can’t heal the world today. But we can begin with a voice of compass'

In [5]:
from collections import defaultdict

def markov_chain(text):
    '''The input is a string of text and the output will be a dictionary with each word as
       a key and each value as the list of words that come after the key in the text.'''

    # Tokenize the text by word, though including punctuation
    words = text.split(' ')

    # Initialize a default dictionary to hold all of the words and next words
    m_dict = defaultdict(list) # defaultdict(list, {})

    # Create a zipped list of all of the word pairs and put them in word: list of next words format
    for current_word, next_word in zip(words[0:-1], words[1:]):
        m_dict[current_word].append(next_word)

    # Convert the default dict back into a dictionary
    m_dict = dict(m_dict)
    return m_dict
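
One detail of `text.split(' ')` is worth seeing on a small example: when two spaces appear in a row, it produces an empty string as a token, whereas `text.split()` with no argument collapses runs of whitespace. The sample string below is made up for illustration.

```python
sample = 'Be kind.  Be brave.'  # note the double space between the sentences

print(sample.split(' '))  # ['Be', 'kind.', '', 'Be', 'brave.']
print(sample.split())     # ['Be', 'kind.', 'Be', 'brave.']
```

With `split(' ')`, an empty string can therefore end up as a key or as a next word in the dictionary, which is why the generator later in this notebook checks for empty words.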

In [6]:
chain = markov_chain(quotes)
chain

{'Let': ['us', 'no', 'your', 'your', 'the'],
 'us': ['see', 'as', 'is', 'also'],
 'see': ['what', 'the', 'them.'],
 'what': ['love',
  'they',
  'we',
  'kindness',
  'you',
  'happens',
  'you',
  'you',
  'you',
  'you',
  'happens',
  'you'],
 'love': ['can',
  'and',
  'to',
  'to',
  'and',
  'and',
  'and',
  'there',
  'everyone',
  'and',
  'like',
  'everywhere',
  'and',
  'we',
  'we',
  'to',
  'to',
  'in',
  "'til",
  'and',
  'be',
  'and',
  'something,',
  'alike.',
  'and',
  'of',
  'and',
  'and',
  'lately,',
  'is',
  'will',
  'like',
  'that'],
 'can': ['do.',
  'begin',
  'warm',
  'be',
  'you',
  'go',
  'turn',
  'live',
  'always,',
  'accomplish',
  'make',
  'bear',
  'bring.',
  'become',
  'accomplish',
  'be',
  'make',
  'hear',
  'see.',
  'be',
  'inspire',
  'happen,',
  'be.',
  'find',
  'hold.'],
 'do.': ['We', 'Where', 'And', 'Love'],
 'We': ['can’t',
  'should',
  'lose',
  'find',
  'rise',
  'must',
  'have',
  'don’t',
  'are',
  'must',
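
The repeated entries in each list matter: because `random.choice` picks uniformly from the list, a word that appears more often is proportionally more likely to be chosen. As a rough sketch, the list for `'what'` from the output above can be turned into explicit transition probabilities (the variable names are assumptions for this illustration).

```python
from collections import Counter

# Successors of 'what', copied from the chain output above
what_next = ['love', 'they', 'we', 'kindness', 'you', 'happens',
             'you', 'you', 'you', 'you', 'happens', 'you']

counts = Counter(what_next)
total = len(what_next)
probabilities = {word: count / total for word, count in counts.items()}

print(probabilities['you'])  # 0.5
```

Here `'you'` accounts for 6 of the 12 entries, so it follows `'what'` half of the time in generated text.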
  

### 3. Create a Text Generator
We're going to create a function that generates sentences. It takes the dictionary you just created as its only input and keeps choosing next words until it generates a word that ends with a period.

Here are some examples of generated sentences:

'Nobody has ever changed anyone.'

'Completely explained in the mind.'

'Agent of kindness are loved and offer you cannot be prepared to give away is like magic.'

In the `generate_sentence()` function, I use a `sentence` list to accumulate the generated words and then join them into the final sentence, which is returned at the end of the function.

It is possible to build the sentence by appending to a string inside the while loop, but Python strings are immutable, so each concatenation can create a new string and copy the characters accumulated so far. With a larger dataset this can add **memory overhead**. To avoid that, I collect the pieces in a list and join them once at the end.
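
The two approaches can be compared on a tiny example. This is only a sketch with a made-up word list; both options produce the same sentence and differ only in how the intermediate strings are built.

```python
words = ['Kindness', 'is', 'like', 'magic.']

# Option 1: repeated concatenation; each += may build a brand-new string
sentence_concat = ''
for word in words:
    sentence_concat += word if not sentence_concat else ' ' + word

# Option 2: collect the pieces in a list and join once at the end
sentence_join = ' '.join(words)

print(sentence_concat == sentence_join)  # True
print(sentence_join)                     # Kindness is like magic.
```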

In [7]:
import random

def generate_sentence(chain):
    '''
    Input a dictionary in the format of key = current word, value = list of next words
    chain = {
        current_word_1: [word1, word2, ...],
        current_word_2: [word1, word2, ...]}
    Output a sentence (string) that stops at the first generated word ending with a period.
    '''

    # Randomly select the 1st word from the chain.
    # If the word is empty, randomly select another word,
    # then capitalize the 1st word.
    while True:
        word1 = random.choice(list(chain.keys()))
        if word1:
            break
    sentence = [word1.capitalize()]

    # Generate the 2nd word from the value list based on the 1st word.
    # Set the new word as the 1st word. Repeat.
    is_end_with_period = False
    while not is_end_with_period:
        # Avoid a KeyError by checking that word1 is in the chain
        # and that its value list is not empty. Stop at such a dead end:
        # `continue` would loop forever here because word1 never changes.
        if word1 not in chain or not chain[word1]:
            break

        # Get the 2nd word
        word2 = random.choice(chain[word1])
        # Reset the 2nd word as the 1st word for the next loop
        word1 = word2

        # Add the new word to the sentence
        sentence.extend([' ', word2])

        # If the word ends with a period, end the loop
        if word2.endswith('.'):
            is_end_with_period = True

    return "".join(sentence)
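
Some generated sentences can run very long before a word ending in a period is drawn. One possible safeguard is a cap on the number of words. The variant below is a hedged sketch, not the notebook's function: the name `generate_sentence_capped`, the `max_words` and `seed` parameters, and `demo_chain` are all assumptions made for illustration. Unlike `generate_sentence()`, it also stops immediately if the starting word already ends with a period.

```python
import random

def generate_sentence_capped(chain, max_words=30, seed=None):
    """Generate a sentence that stops at a period or after max_words words."""
    rng = random.Random(seed)
    starts = [word for word in chain if word]  # skip empty-string keys
    word = rng.choice(starts)
    sentence = [word.capitalize()]
    while len(sentence) < max_words and not word.endswith('.'):
        options = [w for w in chain.get(word, []) if w]
        if not options:  # dead end: no known successor
            break
        word = rng.choice(options)
        sentence.append(word)
    return ' '.join(sentence)

# Hypothetical toy chain, only for demonstration
demo_chain = {'love': ['is'], 'is': ['kind.', 'love'], 'kind.': ['love']}
print(generate_sentence_capped(demo_chain, max_words=10, seed=1))
```

Passing a `seed` makes a run reproducible, which helps when comparing outputs while tuning `max_words`.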

In [8]:
NUMBER_QUOTES = 2000
for _ in range(NUMBER_QUOTES):
    print(generate_sentence(chain))
    print()

Anything other in the Sun has ever changed anyone.

Overwhelm the flame that you don't spell love? (Piglet)
You don't spell love.

Achieves the beauty of loving and make new things and puzzled and all directions, and powerful sense, is a random act of the season which have no mistaking love…it is always cast; in those who want to do something.

Don'ts. Listen to life is powerful sense, is to the darkness and puzzled and love there is something to life are important.

Where there will come.

Reward, safe in the only something to keep your prayers to be loved brings a moment on occasion, to do you can.

Both sides.

Caught fire.

Impossible to stay.

Conspiracy of something he hadn't before.

Deep and love we keep.

Circumstance doesn’t make compassion and the world to give away is a problem to evaporate.

Cast; in imagination and have always cast; in a few, do it.

That’s where you are.

Ideals, because in every loss, there is more than any material possession could.

Never let it to be