## Text Generation
### Introduction
Markov chains can be used for very basic text generation. Think of every word in a corpus as a state. We make the simplifying assumption that the next word depends only on the previous word, which is the defining assumption of a (first-order) Markov chain.

Markov chains don't generate text as well as deep learning models, but they're a good (and fun!) place to start.
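
Written out, with $w_t$ denoting the word at position $t$ (notation introduced here only for illustration), the assumption says that conditioning on the whole history is treated as equivalent to conditioning on the most recent word alone:

$$
P(w_t \mid w_1, \dots, w_{t-1}) = P(w_t \mid w_{t-1})
$$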

### Select Text to Imitate
In this notebook, we're going to generate text in the style of Pete Davidson, so as a first step, let's extract the text from his comedy routine.



In [1]:
# Read in the corpus, including punctuation!
import pandas as pd

data = pd.read_pickle('corpus.pkl')
data

| | transcript |
|---|---|
| Amanda Seales | Now, y’all keep asking me, “Amanda, who is thi... |
| Bert Kreischer | [electronic music playing] [male announcer] La... |
| Marc Maron | [audience chattering indistinctly] [man] Ladie... |
| Pete Davidson | So, Louis C.K. tried to get me fired from SNL ... |
| Stewart Lee | This programme contains very strong language a... |


In [2]:
# Extract only Pete Davidson text
ali_text = data.transcript.loc['Pete Davidson']
ali_text[:200]

'So, Louis C.K. tried to get me fired from SNL my first year, and this is that story. -So, it’s, like, 2014 or ’15, uh, and it’s the finale of SNL, and I-I was so shocked and happy that I didn’t get fi'

### Build a Markov Chain Function
We are going to build a simple Markov chain function that creates a dictionary:

- The keys should be all of the words in the corpus
- The values should be a list of the words that follow the keys
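
As a minimal sketch of the target structure, here is what such a dictionary looks like for a made-up sentence. The names `toy_text` and `toy_chain` are illustrative and not part of the notebook.

```python
from collections import defaultdict

# A made-up sentence, used only to illustrate the dictionary format
toy_text = "the cat sat on the mat and the cat slept"
toy_words = toy_text.split()

# Map each word to the list of words that directly follow it
toy_chain = defaultdict(list)
for current_word, next_word in zip(toy_words, toy_words[1:]):
    toy_chain[current_word].append(next_word)
toy_chain = dict(toy_chain)

print(toy_chain["the"])  # ['cat', 'mat', 'cat']
print(toy_chain["cat"])  # ['sat', 'slept']
```

Repeated followers are kept on purpose: `cat` appears twice after `the`, so it is twice as likely as `mat` to be drawn later. The final word, `slept`, never becomes a key because nothing follows it.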

In [3]:
from collections import defaultdict

def markov_chain(text):
    '''The input is a string of text and the output will be a dictionary with each word as
       a key and each value as the list of words that come after the key in the text.'''
    
    # Tokenize on whitespace; punctuation stays attached to each word
    words = text.split()
    
    # Initialize a default dictionary to hold all of the words and next words
    m_dict = defaultdict(list)
    
    # Create a zipped list of all of the word pairs and put them in word: list of next words format
    for current_word, next_word in zip(words[0:-1], words[1:]):
        m_dict[current_word].append(next_word)

    # Convert the default dict back into a dictionary
    m_dict = dict(m_dict)
    return m_dict

In [4]:
# Create the dictionary for Pete Davidson's routine and take a look at it
pete_dict = markov_chain(ali_text)
pete_dict

{'So,': ['Louis',
  'I',
  'like,',
  'yeah,',
  'I',
  'like',
  'uh,',
  'you',
  'like,',
  'you',
  'the',
  'yeah.',
  'in',
  'this',
  'we’re'],
 'Louis': ['C.K.',
  'C.K.',
  'C.K.',
  'C.K.',
  'C.K.',
  'just',
  'told',
  'brought',
  'C.K.'],
 'C.K.': ['tried', 'was', 'was,', 'was,', 'stops', 'doesn’t', 'jerks'],
 'tried': ['to', 'five'],
 'to': ['get',
  'relive',
  'and',
  'be',
  'you.',
  'go',
  'know',
  'the',
  'work',
  'hear',
  'talk',
  'talk',
  'you',
  'hype',
  'feel',
  'say,',
  'talk',
  'you,',
  'do',
  'have',
  'my',
  'leave',
  'happen,',
  'anybody,',
  'jerk',
  'me,',
  'jerk',
  'do.',
  'have',
  'make',
  'work.',
  'tell',
  'make',
  'make',
  'hurt',
  'make',
  'be',
  'change',
  'tell,',
  'be',
  'say',
  'slam',
  'shove',
  'the',
  'learn',
  'fuck,”',
  'gain,',
  'knock.',
  'you',
  'you,',
  'fucking',
  'slap',
  'his',
  'me,',
  'him,',
  'come',
  'a',
  '2',
  'jail',
  'apologize',
  'do,',
  'fucking.',
  'fucking.',
  ...
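
Because duplicates are kept in each list, drawing uniformly from a list is the same as drawing each next word in proportion to how often it followed the key. As a rough sketch, the followers of `'Louis'` from the output above can be turned into explicit transition probabilities. The names `louis_followers`, `counts`, and `probabilities` are illustrative only.

```python
from collections import Counter

# Followers of 'Louis', copied from the dictionary output above
louis_followers = ['C.K.', 'C.K.', 'C.K.', 'C.K.', 'C.K.',
                   'just', 'told', 'brought', 'C.K.']

# Count each follower and normalize the counts into probabilities
counts = Counter(louis_followers)
total = sum(counts.values())
probabilities = {word: n / total for word, n in counts.items()}

# Show the most likely followers first
for word, p in sorted(probabilities.items(), key=lambda kv: -kv[1]):
    print(f"{word!r}: {p:.2f}")
```

Here `'C.K.'` accounts for 6 of the 9 followers, so it is chosen about two thirds of the time after `'Louis'`.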

### Create a Text Generator
We're going to create a function that generates sentences. It will take two things as inputs:

- The dictionary you just created
- The number of words you want generated
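
The core of such a generator is a random walk over the dictionary. The sketch below shows that walk on a small made-up chain. The names `toy_chain` and `walk` are hypothetical, and stopping at a word with no followers is just one possible way to handle a dead end.

```python
import random

# Hypothetical toy chain in the same key -> list-of-next-words format
toy_chain = {
    "the": ["cat", "mat", "cat"],
    "cat": ["sat", "slept"],
    "sat": ["on"],
    "on": ["the"],
    "mat": ["and"],
    "and": ["the"],
}

def walk(chain, start, count, rng):
    """Follow up to `count - 1` random transitions starting from `start`."""
    words = [start]
    current = start
    for _ in range(count - 1):
        followers = chain.get(current)
        if not followers:
            # Dead end: this word never appears as a key (e.g. "slept")
            break
        current = rng.choice(followers)
        words.append(current)
    return words

rng = random.Random(0)  # fixed seed so repeated runs give the same walk
result = walk(toy_chain, "the", 6, rng)
print(" ".join(result))
```

Passing in a seeded `random.Random` instance keeps the walk reproducible without touching the global random state.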

In [6]:
import random

def generate_sentence(chain, count=15):
    '''Input a dictionary in the format of key = current word, value = list of next words
       along with the number of words you would like to see in your generated sentence.'''

    # Pick a random starting word and capitalize it
    word1 = random.choice(list(chain.keys()))
    sentence = word1.capitalize()

    # Pick the next word from the current word's followers, make it the current word, and repeat
    for _ in range(count - 1):
        # The last word of the corpus may have no followers; restart from a random key in that case
        followers = chain.get(word1) or list(chain.keys())
        word2 = random.choice(followers)
        word1 = word2
        sentence += ' ' + word2

    # End it with a period
    sentence += '.'
    return sentence

In [7]:
generate_sentence(pete_dict)

'Wasn’t–” And you’re like, speed bag thing. The sleepovers, you know, stuff like I had.'