# Text Generation

## Introduction

Markov chains can be used for very basic text generation. Think of every word in a corpus as a state. We make the simplifying assumption that the next word depends only on the word immediately before it, which is the defining assumption of a first-order Markov chain.
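
Written as a formula, the assumption looks roughly like this. The notation is introduced here for illustration and is not from the original notebook: $w_t$ denotes the word at position $t$ in the text.

$$
P(w_{t+1} \mid w_1, w_2, \dots, w_t) = P(w_{t+1} \mid w_t)
$$

In words: once the current word is known, the earlier words add no further information about what comes next.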

Markov chains don't generate text as well as deep learning models do, but they're a good (and fun!) place to start.

## Select Text to Imitate

In this notebook, we're going to generate text in the style of Abraham Lincoln, so as a first step, let's extract the text of his biography from the corpus.

In [12]:
# Read in the corpus, including punctuation!
import pandas as pd

data = pd.read_pickle('corpus.pkl')
data

| | transcript | full_name |
|---|---|---|
| abraham | “With malice toward none; with charity for all... | Abraham Lincoln |
| ambedkar | Dr B.R. Ambedkar (1891 – 1956) \n\n\n\n\n\n\n\... | BR Ambedkar |
| boris | Boris Johnson is a leading Conservative politi... | Boris Johnson |
| brandt | Willy Brandt (1913-1992) – German statesman an... | Willy Brandt |
| desmond | \n\n\n\n\n\n\n\nDesmond Mpilo Tutu (1931 – 202... | Desmond Tutu |
| gandhi | Mahatma Gandhi was a prominent Indian politica... | Mahatama Gandhi |
| mandela | \n \n\n\n\nNelson Mandela (1918 – 2013) was a ... | Nelson Mandela |
| margaret | \n\n\n\n\n\n\n\nMargaret Thatcher (1925-2013) ... | Margaret Thatcher |
| roosevelt | \n\n\n\n\n\n\n\nFranklin Delano Roosevelt (Jan... | Franklin Roosevelt |
| trump | Donald Trump (1946 – ) is the 45th President o... | Donald Trump |


In [13]:
# Extract only the Abraham Lincoln text
abraham_text = data.transcript.loc['abraham']
abraham_text[:200]

'“With malice toward none; with charity for all; with firmness in the right, as God gives us to see the right, let us strive on to finish the work we are in; to bind up the nation’s wounds…. ” – Abraha'

## Build a Markov Chain Function

We are going to build a simple Markov chain function that creates a dictionary:
* The keys should be all of the words in the corpus
* The values should be lists of the words that follow each key, with repeats kept

In [14]:
from collections import defaultdict

def markov_chain(text):
    '''The input is a string of text and the output will be a dictionary with each word as
       a key and each value as the list of words that come after the key in the text.'''

    # Tokenize on single spaces, keeping punctuation attached to each word
    words = text.split(' ')

    # Initialize a default dictionary to hold each word and its list of next words
    m_dict = defaultdict(list)

    # Pair each word with the word that follows it and record the pair as word -> [next words]
    for current_word, next_word in zip(words, words[1:]):
        m_dict[current_word].append(next_word)

    # Convert the default dict back into a plain dictionary
    return dict(m_dict)

In [15]:
# Create the dictionary for Abraham Lincoln's text, take a look at it
abraham_dict = markov_chain(abraham_text)
abraham_dict

{'“With': ['malice'],
 'malice': ['toward'],
 'toward': ['none;'],
 'none;': ['with'],
 'with': ['charity',
  'firmness',
  'Mary',
  'the',
  'the',
  'his',
  'Lincoln’s',
  'Lincoln',
  'people',
  'unionist',
  'the',
  'the'],
 'charity': ['for'],
 'for': ['all;',
  'knowledge',
  'hard',
  'quick',
  'public',
  'Illinois',
  'empathy.',
  'themselves,',
  'others.',
  'slavery,',
  'the',
  'it.',
  'slavery',
  'Lincoln;',
  'moral',
  'the',
  'President',
  'many',
  'the',
  'freed',
  'the',
  'a'],
 'all;': ['with'],
 'firmness': ['in'],
 'in': ['the',
  'a',
  'his',
  '1842.',
  'her',
  'public',
  'Congress,',
  'Illinois.',
  'the',
  'favour',
  'New',
  'the',
  'that',
  '1860.',
  '1861,',
  'his',
  'rebellion',
  'July',
  'Liberty,',
  'vain',
  'the',
  'the',
  'whose',
  'the'],
 'the': ['right,',
  'right,',
  'work',
  'nation’s',
  'young',
  'Illinois',
  'legal',
  'nickname',
  'use',
  'use',
  'House',
  'American-Mexican',
  'unjust',
  '1850s',
  '
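
Because the value lists keep repeats (for example, 'the' appears several times after 'with' in the output above), picking uniformly from a list favors the more frequent followers. The sketch below makes those implied probabilities explicit. The helper `transition_probabilities` and the toy chain are made up for illustration and are not part of the original notebook.

```python
from collections import Counter

def transition_probabilities(chain):
    """Turn word -> [next words] lists into word -> {next word: probability}.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    probabilities = {}
    for word, followers in chain.items():
        counts = Counter(followers)   # how often each follower occurs
        total = len(followers)        # total number of recorded transitions
        probabilities[word] = {nxt: n / total for nxt, n in counts.items()}
    return probabilities

# Toy chain invented for this example (not taken from the corpus)
toy_chain = {'with': ['charity', 'the', 'the', 'firmness']}
print(transition_probabilities(toy_chain))
# {'with': {'charity': 0.25, 'the': 0.5, 'firmness': 0.25}}
```

Keeping the raw lists, as the notebook does, gives the same sampling behavior without computing the probabilities explicitly.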

## Create a Text Generator

We're going to create a function that generates sentences. It will take two things as inputs:
* The dictionary you just created
* The number of words you want generated

Here are some examples of generated sentences:

>'Mahatma Gandhi was a prominent Indian political leader who was a leading figure in the campaign for Indian independence.'

>'Hope to slavery to swings in her emotions.'

In [24]:
import random

def generate_sentence(chain, count=15):
    '''Input a dictionary in the format of key = current word, value = list of next words
       along with the number of words you would like to see in your generated sentence.'''

    # Pick a random starting word and capitalize it
    word1 = random.choice(list(chain.keys()))
    sentence = word1.capitalize()

    # Pick the next word from the current word's value list, make it the current word, and repeat
    for i in range(count-1):
        # Stop early at a dead end: a word seen only at the very end of the corpus is not a key
        if word1 not in chain:
            break
        word2 = random.choice(chain[word1])
        word1 = word2
        sentence += ' ' + word2

    # End it with a period
    sentence += '.'
    return sentence

In [49]:
generate_sentence(abraham_dict)

'Glory to understand parables, which struck a new territory for it. Lincoln – including southern.'
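
One detail of `word1.capitalize()` is worth knowing: `str.capitalize()` also lowercases every character after the first, and it leaves a leading quotation mark as it is. A key such as '“With' therefore comes out as '“with'. The sketch below shows one possible alternative. `capitalize_first` is a hypothetical helper, not part of the original notebook.

```python
def capitalize_first(word):
    """Uppercase the first letter of `word` and leave every other character unchanged.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    for i, ch in enumerate(word):
        if ch.isalpha():
            return word[:i] + ch.upper() + word[i + 1:]
    return word  # no letters at all, e.g. a lone dash

print('“With'.capitalize())        # “with
print(capitalize_first('“With'))   # “With
print(capitalize_first('none;'))   # None;
```

Whether this matters depends on the corpus; for plain lowercase starting words both approaches give the same result.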

## Additional Exercises

1. Try making the generate_sentence function better. Maybe allow it to end with a random punctuation mark or end whenever it gets to a word that already ends with a punctuation mark.

In [50]:
import random
import string

def generate_sentence(chain):
    '''Input a dictionary in the format of key = current word, value = list of next words.
       Generate words until one ends with a punctuation mark (any character in
       string.punctuation, so commas count too) or the chain reaches a word with no followers.'''

    # Pick a random starting word and capitalize it
    word1 = random.choice(list(chain.keys()))
    sentence = word1.capitalize()

    # Keep appending a randomly chosen next word until the stopping condition is met
    while word1 in chain:
        word2 = random.choice(chain[word1])
        sentence += ' ' + word2
        if word2.endswith(tuple(string.punctuation)):
            break
        word1 = word2

    return sentence


In [55]:
generate_sentence(abraham_dict)


'It for slavery,'
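
The output above stops at a comma because `string.punctuation` includes every ASCII punctuation character. A possible refinement is to stop only at sentence-ending marks and to cap the length so the loop always terminates. The sketch below is one way to do that. The function name, the `SENTENCE_ENDINGS` tuple, the `max_words` and `rng` parameters, and the toy chain are all assumptions made for illustration.

```python
import random

SENTENCE_ENDINGS = ('.', '!', '?')  # assumed set of sentence-ending marks

def generate_until_sentence_end(chain, max_words=30, rng=random):
    """Hypothetical variant: stop at '.', '!' or '?', at a dead end, or after max_words."""
    word = rng.choice(list(chain.keys()))
    words = [word]

    # Keep walking the chain while there is room, a follower exists,
    # and the current word does not already end the sentence
    while len(words) < max_words and word in chain and not word.endswith(SENTENCE_ENDINGS):
        word = rng.choice(chain[word])
        words.append(word)

    sentence = ' '.join(words)
    if not sentence.endswith(SENTENCE_ENDINGS):
        # Replace a dangling comma, semicolon, or colon with a period
        sentence = sentence.rstrip(',;:') + '.'
    return sentence

# Toy chain invented for this example (not taken from the corpus)
toy_chain = {'It': ['for'], 'for': ['slavery,', 'it.'], 'slavery,': ['for']}
print(generate_until_sentence_end(toy_chain, rng=random.Random(0)))
```

Passing a seeded `random.Random` instance makes a run repeatable, which helps when comparing variants of the generator.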

## For Mahatma Gandhi

In [20]:
# Extract only the Mahatma Gandhi text
gandhi_text = data.transcript.loc['gandhi']
gandhi_text[:200]

'Mahatma Gandhi was a prominent Indian political leader who was a leading figure in the campaign for Indian independence. He employed non-violent principles and peaceful disobedience as a means to achi'

In [21]:
# Create the dictionary for Gandhi's text, take a look at it
gandhi_dict = markov_chain(gandhi_text)
gandhi_dict

{'Mahatma': ['Gandhi', 'Gandhi', 'Gandhi”,', 'Gandhi', 'Gandhi,'],
 'Gandhi': ['was',
  'Mohandas',
  'was',
  'a',
  'in',
  'returned',
  'was',
  'travelling',
  'and',
  'first',
  'was',
  'and',
  'returned',
  'also',
  'said',
  'also',
  'frequently',
  'led',
  '–',
  'called',
  'was',
  'opposed',
  'replied',
  'wore',
  'was',
  'replied',
  'once',
  'and',
  'was',
  'agreed',
  'was',
  'undertook',
  'was',
  'and',
  'was',
  'said',
  'felt',
  'Autobiography',
  '',
  '\xa0'],
 'was': ['a',
  'a',
  'assassinated',
  'born',
  'from',
  'illiterate,',
  'a',
  'once',
  'the',
  'struck',
  'critical',
  'soon',
  'struck',
  'thrown',
  'a',
  'in',
  'in',
  'at',
  'also',
  'decorated',
  'not',
  'the',
  'involved.',
  'at',
  'not',
  'particularly',
  'invited',
  'ethnically',
  'the',
  'asked',
  'sufficiently',
  'wearing',
  'opposed',
  'harshly',
  'shot',
  'a',
  'to'],
 'a': ['prominent',
  'leading',
  'means',
  'time',
  'lasting',
  'youngster
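
The `''` and `'\xa0'` entries in the list for 'Gandhi' above come from tokenizing with `text.split(' ')`: consecutive spaces produce empty strings, and non-breaking spaces and newlines are not treated as separators. The sketch below compares this with `str.split()` called without arguments, which splits on any run of whitespace. The sample string is invented for illustration. Note that swapping the tokenizer in `markov_chain` would change the dictionaries shown in this notebook.

```python
# Invented sample with a double space, a non-breaking space, and a newline
sample = 'Gandhi  said \xa0 that\nhe'

# Splitting on a single space keeps empty strings and stray whitespace tokens
naive_tokens = sample.split(' ')
print(naive_tokens)   # ['Gandhi', '', 'said', '\xa0', 'that\nhe']

# Splitting with no argument treats any run of whitespace as one separator
clean_tokens = sample.split()
print(clean_tokens)   # ['Gandhi', 'said', 'that', 'he']
```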

In [22]:
generate_sentence(gandhi_dict)

'Meetings, Muslim prayers were deserving of Jesus Christ,'

In [56]:
def generate_sentence(chain, starting_word, count=15):
    '''Input a dictionary in the format of key = current word, value = list of next words,
       the word to start from (it should be a key of the dictionary), and the number of
       words you would like to see in your generated sentence.'''

    # Start from the given word instead of a random key
    word1 = starting_word
    sentence = word1.capitalize()

    # Pick the next word from the current word's value list, make it the current word, and repeat
    for i in range(count-1):
        # Stop early if the current word has no recorded followers
        if word1 not in chain:
            break
        word2 = random.choice(chain[word1])
        word1 = word2
        sentence += ' ' + word2

    # End it with a period
    sentence += '.'
    return sentence

In [60]:
generate_sentence(gandhi_dict,'Gandhi',20)

'Gandhi a lasting impact on humility and love have a lawyer he was illiterate, but that help arrives somehow, from.'