# Shakespeare and Dictionaries

We will use dictionaries to approximate the entire works of Shakespeare! We're going to use a bigram language model. Here's the idea:

1. We start with some word -- we'll use "The" as an example.
2. We look through all the texts of Shakespeare
    * For every instance of "The", we record the word that follows "The" and add it to a list
        * These are known as the **successors** of "The"
        
Now suppose we've done this for every word Shakespeare has used, ever.

Let's go back to "The". Now, we randomly choose a word from this list, say "cat".

1. We look up the successors of "cat".
2. We randomly choose a word from that list.
3. We continue this process, looking up each newly chosen word in turn.

This will eventually terminate in a period (".") and we will have generated a Shakespearean sentence!

The object that we'll be looking things up in is called a **successor table**, although it's just a dictionary. The keys in this dictionary are words, and the values are lists of successors to those words.
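To make the shape of a successor table concrete, here is a small hand-written one. The toy sentence and the name `toy_table` are made up for illustration and are not taken from Shakespeare.

```python
# Hand-built successor table for the toy token list
# ['The', 'cat', 'sat', '.', 'The', 'dog', 'sat', '.']
# Keys are words; values are lists of the words seen right after them.
toy_table = {
    '.': ['The', 'The'],    # sentence starts count as successors of '.'
    'The': ['cat', 'dog'],
    'cat': ['sat'],
    'dog': ['sat'],
    'sat': ['.', '.'],
}

print(toy_table['The'])  # ['cat', 'dog']
```

Repeated successors stay in the list, so a word that follows "The" more often is more likely to be picked at random.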

## Q4. Successor Tables

Here's an incomplete definition of the `build_successors_table` function. The input is a list of words (corresponding to a Shakespearean text), and the output is a successor table. (By default, the first word is a successor to ".".)

## ==== Answer ====

Looking at the code, we start with `prev = '.'`. Since `prev` always holds the word that came just before the current `word`, each `word` is a successor of `prev`. So we can tell that we are creating and updating `table[prev]` rather than `table[word]`.

Looking at the doctest, when we look up a word in the dictionary, the output is a list of values rather than a single value. This makes sense when paired with the following statement in the code:

In [None]:
if prev not in table:

This means that rather than assigning directly,

In [None]:
table[prev] = word

we first check whether `prev` is already in the `table`. If not, we create the key `prev` with an empty list as its value.

In [None]:
if prev not in table:
    table[prev] = []

Then in the next line, we add the `word` into the list.

In [None]:
table[prev] += [word]
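For lists, `+=` with a one-element list extends the existing list in place, so it has the same effect as calling `append`. A small sketch comparing the two (the names `a` and `b` are illustrative only):

```python
a = {'to': ['investigate']}
b = {'to': ['investigate']}

a['to'] += ['eat']       # in-place extend with a one-element list
b['to'].append('eat')    # append a single element

print(a == b)  # True
```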

Thus we have,

In [2]:
def build_successors_table(tokens):
    table = {}
    prev = '.'
    for word in tokens:
        if prev not in table:
            table[prev] = []
        table[prev] += [word]
        prev = word
    return table

In [2]:
x = {}
x['we'] = 'duh'
x

{'we': 'duh'}

In [10]:
"""Return a dictionary: keys are words; values are lists of successors.

    >>> text = ['We', 'came', 'to', 'investigate', ',', 'catch', 'bad', 'guys', 'and', 'to', 'eat', 'pie', '.']
    >>> table = build_successors_table(text)
    >>> sorted(table)
    [',', '.', 'We', 'and', 'bad', 'came', 'catch', 'eat', 'guys', 'investigate', 'pie', 'to']
    >>> table['to']
    ['investigate', 'eat']
    >>> table['pie']
    ['.']
    >>> table['.']
    ['We']
    """

import doctest
doctest.testmod()

TestResults(failed=0, attempted=6)
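As an aside, the "create the list if the key is missing" pattern can also be written with the built-in `dict.setdefault`. The sketch below is an alternative spelling, not the lab's expected answer, and the name `build_successors_table_alt` is made up here.

```python
def build_successors_table_alt(tokens):
    """Same behavior as build_successors_table, using dict.setdefault."""
    table = {}
    prev = '.'
    for word in tokens:
        # setdefault returns table[prev], inserting [] first if prev is missing
        table.setdefault(prev, []).append(word)
        prev = word
    return table

text = ['We', 'came', 'to', 'investigate', ',', 'catch', 'bad', 'guys',
        'and', 'to', 'eat', 'pie', '.']
print(build_successors_table_alt(text)['to'])  # ['investigate', 'eat']
```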

# ====================

## Q5. Construct the Sentence

Let's generate some sentences! Suppose we're given a starting word. We can look up this word in our table to find its list of successors, then randomly select a word from this list to be the next word in the sentence. Then we just repeat until we reach some ending punctuation.

Hint: to randomly select from a list, import Python's `random` module with

In [3]:
import random

and use the expression

In [12]:
random.choice(my_list)

NameError: name 'my_list' is not defined
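The `NameError` above only reflects that `my_list` is a placeholder that was never defined. A minimal sketch with a concrete list (the contents are chosen here for illustration):

```python
import random

my_list = ['investigate', 'eat']   # e.g. the successors of 'to'
picked = random.choice(my_list)    # returns one element, chosen uniformly at random
print(picked in my_list)           # True
```

Note that `random.choice` raises `IndexError` when given an empty list, so every list in the table should contain at least one successor.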

This might not be a bad time to play around with adding strings together as well. Let's fill in the `construct_sent` function!

## ==== Answer ====

`result` is the sentence that we have so far. This means we want to add words into `result`. 

In sentences, words are usually separated by whitespace `' '`, so we would need to incorporate that as well.

In [None]:
result += word + ' '
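Appending `word + ' '` each time leaves one extra space at the end of the string, which has to be removed before the final punctuation is attached. A small sketch of that behavior (the word list and the names `w` and `sentence` are illustrative):

```python
result = ''
for w in ['Sentences', 'are', 'cool']:
    result += w + ' '             # builds 'Sentences are cool ' with a trailing space

sentence = result.strip() + '.'   # strip() drops the trailing space
print(sentence)  # Sentences are cool.
```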

Once the word is added to the sentence so far, we want to replace `word` with a randomly chosen successor. The current `word` has to be an existing key in the dictionary so that its list of successors can be looked up.

In [None]:
word = random.choice(table[word])

In [9]:
def construct_sent(word, table):
    import random
    result = ''
    while word not in ['.', '!', '?']:
        result += word + ' '
        word = random.choice(table[word])
    return result.strip() + word

In [10]:
"""Return a random sentence starting with word, sampling from
    table.

    >>> table = {'Wow': ['!'], 'Sentences': ['are'], 'are': ['cool'], 'cool': ['.']}
    >>> construct_sent('Wow', table)
    'Wow!'
    >>> construct_sent('Sentences', table)
    'Sentences are cool.'
    """

import doctest
doctest.testmod()

TestResults(failed=0, attempted=3)
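Putting the two answers together, the sketch below builds a table from the doctest text and generates one sentence from it. The functions are repeated so the block runs on its own. The fixed seed is an assumption added here for reproducibility and is not part of the lab.

```python
import random

def build_successors_table(tokens):
    table = {}
    prev = '.'
    for word in tokens:
        if prev not in table:
            table[prev] = []
        table[prev] += [word]
        prev = word
    return table

def construct_sent(word, table):
    result = ''
    while word not in ['.', '!', '?']:
        result += word + ' '
        word = random.choice(table[word])
    return result.strip() + word

text = ['We', 'came', 'to', 'investigate', ',', 'catch', 'bad', 'guys',
        'and', 'to', 'eat', 'pie', '.']
table = build_successors_table(text)

random.seed(0)  # assumption: fixed seed so repeated runs give the same sentence
sentence = construct_sent('We', table)
print(sentence)
```

Every generated sentence here starts with "We came to" and ends with "eat pie.", because "to" is the only word with more than one successor. If a text does not end with ending punctuation, its final word may never become a key, and `table[word]` would then raise `KeyError`.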