# Fulfillomatic

##### Adriana Souza, Roger Filmyer

##### This notebook will be finished/cleaned by Thursday, Dec 6th.

![NLG](http://www.pngall.com/wp-content/uploads/2016/07/Meditation-Transparent.png)

### Loading data

In [1]:
# Packages
import numpy as np
import nltk
import random
import string

from collections import defaultdict

In [2]:
# Selecting the file to use
file = 'training/quotes.txt'

# Storing quotes from file in a list, one quote per line
with open(file) as opened_file:
    quotes = opened_file.read().splitlines()

***

## Version 0: Uniform Distribution

To start, we tried...

In [14]:
# Tokenize
tokenized_corpus = []
for quote in quotes:
    tokenized_quote = nltk.tokenize.word_tokenize(quote)
    tagged_quote = nltk.pos_tag(tokenized_quote)
    tokenized_corpus.append(tagged_quote)

# Set up the language "model"
parts_of_speech = defaultdict(list)
sentence_structures = []
for quote in tokenized_corpus:
    sentence_structure = []
    for word, pos in quote:
        parts_of_speech[pos].append(word)
        sentence_structure.append(pos)
    sentence_structures.append(sentence_structure)

# Generate an example sentence
def get_mindful_v0() -> str:
    """
    Generate an inspirational sentence. 
    
    Ensure that you are in the proper state of mind before running. ॐ
    """
    sentence_skeleton = random.choice(sentence_structures)
    reconstituted_sentence = []
    for part_of_speech in sentence_skeleton:
        new_word = random.choice(parts_of_speech[part_of_speech])
        reconstituted_sentence.append(new_word)
    return " ".join(reconstituted_sentence)

# Output
get_mindful_v0()

'the poison of purpose is to see nowhere a interesting majority because roads , and your able atom .'

### Version 0 results

* your ready Speak begins when you can hear you not and never .
* in I think busy forwards of coffee , forever it . in you will live aware library you , make your education .
* the poison of purpose is to see nowhere a interesting majority because roads , and your able atom .
* without t denies my anything that bulk , yourself can once call You .
* as I are dreams to don grief , never the sun is to tolerate able you .
* all valuable choice is than the painful comfort , it can keep imprisoned believe only not that you ’ you .

## Next:

We see we need to do a lot of things, most of which we should've done before we even started (like lowercasing, removing punctuation, and taking care of contractions). It seems that drawing each word at random for its part-of-speech slot, with no regard for the words around it, wasn't enough, even though we know the input is some sort of quote-like sentence. Since we kept our quotes separate and they aren't particularly long sentences, let's start with a bigram model.

***

## Version 1: Bigram Model

Well, that worked great. Maybe some context _would_ be good.

In [15]:
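# Turning list into string
# Translation table that strips punctuation (built once, outside the loop)
table = str.maketrans('', '', string.punctuation + '…”“–')

corpus = ""
for quote in quotes:
    # Lowercasing
    quote = quote.lower()

    # Adding end tokens to mark the end of quotes
    quote = quote.replace('.', ' END ')

    # Remove punctuation
    quote = quote.translate(table)

    # Adding cleaned text to corpus; the trailing space keeps the last word
    # of a quote without a final period from fusing with the next quote
    corpus = corpus + quote + ' '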

# Tokenizing
def tokenize(input_string):
    return input_string.split()

# Getting bigram model
def get_bigrams(corpus):
    corpus_fd_unigram = nltk.FreqDist(tokenize(corpus))
    bigrams = nltk.bigrams(['END'] + tokenize(corpus))
    bigrams_fd = nltk.FreqDist(bigrams)
    results = {}
    for bigram, bigram_frequency in bigrams_fd.items():
        first_word, second_word = bigram
        probability = (bigram_frequency / corpus_fd_unigram[first_word])    
        results[bigram] = probability
    return results

bigram_model = get_bigrams(corpus)
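
To make the estimate concrete, here is a minimal sketch of the same idea on a toy token list, using only the standard library. The name `toy_bigram_probabilities` and the toy tokens are ours, purely for illustration. Unlike `get_bigrams`, it counts each word only where it actually starts a bigram, which is one way (an assumption on our part, not what the cell above does) to make sure every context's probabilities sum to 1.

```python
from collections import Counter

def toy_bigram_probabilities(tokens):
    """Maximum-likelihood estimate: P(second | first) = count(first, second) / count(first)."""
    padded = ["END"] + tokens
    pair_counts = Counter(zip(padded, padded[1:]))
    # Count each word only where it starts a bigram,
    # so the probabilities for every context sum to 1.
    context_counts = Counter(first for first, _ in pair_counts.elements())
    return {
        pair: count / context_counts[pair[0]]
        for pair, count in pair_counts.items()
    }

# Hypothetical toy corpus: three tiny "quotes", each closed by END.
toy_tokens = "be kind END be brave END be kind END".split()
toy_model = toy_bigram_probabilities(toy_tokens)

# "be" occurs three times and is followed by "kind" twice.
print(toy_model[("be", "kind")])
```

Every quote in the toy corpus starts with "be", so `toy_model[("END", "be")]` comes out as 1.0.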

## New version 

Below, we use a bigram model and also take some care in structuring how the sentence will come out. We make sure that our quote starts with a bigram of the form `[END, word]` and ends with a bigram of the form `[word, END]`. 

In [36]:
# Version 1 of Fulfillomatic
def get_mindful_v1():
    
    """
    You must only concentrate on the next step, the next breath, 
    the next stroke of the broom, and the next, and the next. Nothing else.
    ॐ
    
    (Bigram Model)
    """    
    
    words_in_sentence = ['END']
    second_word = None
    
    while second_word != 'END':
        
        first_word = words_in_sentence[-1]
        matching_bigrams = [bigram for bigram in bigram_model.keys() if bigram[0] == first_word]
        
        # Getting probabilities
        bigram_probabilities = [bigram_model[bigram] for bigram in matching_bigrams]
        total_probability = sum(bigram_probabilities)
        
        # Picking the next word; dividing by the total guarantees the
        # probabilities sum to 1, as np.random.choice requires
        second_word = np.random.choice(
                        a=[second for first, second in matching_bigrams],
                        p=[p / total_probability for p in bigram_probabilities])
        words_in_sentence.append(second_word)
        
    words_in_sentence = words_in_sentence[1:-1]
    
    # Capitalize the first letter of first word
    if len(words_in_sentence) > 0:
        first_word = words_in_sentence[0]
        first_word = first_word[0].upper() + first_word[1:]
        words_in_sentence[0] = first_word
        sentence = " ".join(words_in_sentence) + '.'
    else:
        sentence = get_mindful_v1()
    return sentence
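
`get_mindful_v1` scans every bigram in the model at each step. A possible alternative, sketched below with only the standard library, groups the model by first word once so that each step is a dictionary lookup. The helper names (`index_by_first_word`, `sample_sentence`) and the toy model are hypothetical, not part of the notebook, and `random.choices` stands in for `np.random.choice`.

```python
import random
from collections import defaultdict

def index_by_first_word(model):
    """Group {(first, second): probability} into {first: ([seconds], [probabilities])}."""
    successors = defaultdict(lambda: ([], []))
    for (first, second), probability in model.items():
        words, weights = successors[first]
        words.append(second)
        weights.append(probability)
    return successors

def sample_sentence(successors, rng=random):
    """Walk from 'END' to the next 'END', returning the words in between."""
    words = []
    current = "END"
    while True:
        candidates, weights = successors[current]
        # random.choices takes relative weights, so they need not sum to exactly 1.
        current = rng.choices(candidates, weights=weights, k=1)[0]
        if current == "END":
            return words
        words.append(current)

# Hypothetical toy model standing in for `bigram_model`.
toy_model = {("END", "just"): 1.0, ("just", "breathe"): 1.0, ("breathe", "END"): 1.0}
print(sample_sentence(index_by_first_word(toy_model)))  # only one path exists: ['just', 'breathe']
```

The sketch assumes every word reachable from `END` has at least one successor, which holds when the corpus ends with an `END` token.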

In [26]:
# Creating a function that will print a desired number of generated quotes
def repeat(times, f):
    for _ in range(times):
        f()
        
def do_v1():
    print(get_mindful_v1())

# Printing 5 generated quotes
repeat(5, do_v1)

Dont settle.
Let my mind is to make anything you would rather follow.
Those who you can do not to deal with people i want to tolerate what good is like crap.
Dont just wait for the wealth you will be better than before you can even temporarily compressed within yourself and trust of light in this excitement of good mans life.
Change the point of yourself everyone and popular opinion.


### Version 1 results

* Just do it.
* In my friends you can get the fire you grow from it should scare you do drunk.
* You.
* I believe in the least for anything i believe in god from a man to exist.
* Dont bother just take rest is too little one that you better.
* If you can not what we know what you will remain constant.
* What we are travelling more difficult than to forget is no greatness.
* Anything you look for what you do not being yourself.
* Let the wilderness of all else is still looking for us entirely happy because i told dismiss that can do something.

***

## Version 2: Trigram Model

It's... marginally better. Our ratio of "potentially good" generated quotes to "gibberish quotes" is still pretty awful. Let's see how a trigram model does instead.

In the steps above, we took some risks with our tokens. Since we turned our corpus back into one long string instead of a list, we now have quote after quote with nothing relating one to the next. This is a problem because we don't want trigrams that span the end of one quote and the start of the next: those trigrams do not represent tokens that could follow each other in a text. They are completely accidental.

To address this, we added double end tokens for the trigrams: now, starting tokens look like `[END, END, word]` and end tokens like `[word, END, END]`.

In [28]:
# Adding extra END tokens
def add_extra_end_token(tokenized_document):
    new_document = []
    for token in tokenized_document:
        new_document.append(token)
        if token == "END":
            new_document.append("END")
    return new_document

def get_trigrams(document):
    corpus = tokenize(document)
    corpus = add_extra_end_token(corpus)
    corpus_fd_bigram = nltk.FreqDist(nltk.bigrams(["END"] + corpus))
    trigrams = nltk.trigrams(["END", "END"] + corpus)
    trigrams_fd = nltk.FreqDist(trigrams)
    results = {}
    for trigram, trigram_frequency in trigrams_fd.items():
        first_word, second_word, third_word = trigram
        probability = (trigram_frequency) / (corpus_fd_bigram[(first_word, second_word)])
        results[trigram] = probability
    return results

#get_trigrams(corpus)

trigram_model = get_trigrams(corpus)
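
As an illustration of what the double `END` padding buys us, here is a small sketch that pads each quote separately instead of doubling tokens in one long stream. This is an alternative formulation for illustration only, not the approach used in `get_trigrams`; `padded_trigrams` and the toy quotes are hypothetical.

```python
def padded_trigrams(quote_tokens):
    """Return the trigrams of one quote, padded with two END tokens on each side."""
    padded = ["END", "END"] + quote_tokens + ["END", "END"]
    return list(zip(padded, padded[1:], padded[2:]))

# Hypothetical toy quotes, already tokenized.
toy_quotes = [["just", "breathe"], ["be", "kind"]]
all_trigrams = [t for quote in toy_quotes for t in padded_trigrams(quote)]

for trigram in all_trigrams:
    print(trigram)
```

Each quote opens with `('END', 'END', word)` and closes with `(word, 'END', 'END')`, and no trigram mixes words from two different quotes.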

We generalized `get_mindful_v1` into `get_sentence_with_ngram_model`, which works with any N-gram model, and `get_mindful_v2` is born:

In [None]:
def get_sentence_with_ngram_model(num_words, model):
    words_in_sentence = ['END'] * (num_words - 1)  # pad the start of the sentence with 'END' tokens
    final_word = None
    while final_word != 'END':        
        initial_n_gram_words = words_in_sentence[-(num_words - 1):]
        matching_n_gram_keys = []
        for n_gram in model.keys():
            words_to_match = zip(n_gram, initial_n_gram_words)
            if all(a == b for a, b in words_to_match):
                matching_n_gram_keys.append(n_gram)        
        n_gram_probabilities = [model[n_gram] for n_gram in matching_n_gram_keys]
        total_probability = sum(n_gram_probabilities)
        # Normalize so the probabilities sum to 1, as np.random.choice requires
        final_word = np.random.choice(
                        a=[n_gram[-1] for n_gram in matching_n_gram_keys],
                        p=[p / total_probability for p in n_gram_probabilities])
        words_in_sentence.append(final_word)
    words_in_sentence = words_in_sentence[(num_words - 1): -1]
    # capitalize first letter of first word
    if len(words_in_sentence) > 0:
        first_word = words_in_sentence[0]
        first_word = first_word[0].upper() + first_word[1:]
        words_in_sentence[0] = first_word
        sentence = " ".join(words_in_sentence) + '.'
    else:
        sentence = get_sentence_with_ngram_model(num_words, model)
    return sentence

In [29]:
# Get mindful with Fulfillomatic version 2
def get_mindful_v2():
    """
    Three things cannot long be hidden: the sun, the moon, and the truth. ॐ
    
    (Trigram Model)
    """
    sentence = ""
    while len(sentence.split()) < 4:
        sentence = get_sentence_with_ngram_model(3, trigram_model)
    return sentence

Let's generate some examples:

In [34]:
# Print 5 generated sentences
def do_v2():
    print(get_mindful_v2())
    
repeat(5, do_v2)

One day i will find true success and happiness if you make your own soul according to his belief.
You will have to remember anything.
He does not matter.
Embrace the storms of your life surprise you.
What you have or even what you read when you need help and brave enough to know how to belong to oneself.


***

### Example: "It takes courage **to grow** sharper."

Take: *The world is full of magic things, patiently waiting for our senses* **to grow** *sharper.*

And: *It takes courage* **to grow** *up and become who you really are.*

Get: It takes courage **to grow** sharper.
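
The splice above can be reproduced with a few lines of standard-library Python. This is a minimal sketch with a simplified, hypothetical tokenizer (`tokenize_quote`), not the notebook's pipeline: it collects, for every two-word context, the words that followed it in the two source quotes.

```python
from collections import defaultdict

def tokenize_quote(quote):
    # Simplified cleaning for this example only: lowercase, drop commas and periods.
    return quote.lower().replace(",", "").replace(".", "").split()

sources = [
    "The world is full of magic things, patiently waiting for our senses to grow sharper.",
    "It takes courage to grow up and become who you really are.",
]

# Map each (first, second) context to the set of words seen right after it.
next_words = defaultdict(set)
for quote in sources:
    tokens = ["END", "END"] + tokenize_quote(quote) + ["END", "END"]
    for first, second, third in zip(tokens, tokens[1:], tokens[2:]):
        next_words[(first, second)].add(third)

print(sorted(next_words[("to", "grow")]))  # ['sharper', 'up']
```

The context `("to", "grow")` is the only point where the two quotes overlap, so it is where the generator can hop from the second quote onto the ending of the first.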



## What if we feed the model a bunch of Nietzsche quotes?

* Without music life would be a means to conceal oneself.
* The noble soul reveres itself.
* What is the struggle of opinions that is to preserve the distance which separates us from other men.
* God is a rope over an abyss.
* But there is also always some reason in madness.
* We have forgotten are illusions.
* Christianity is our taste no longer our reasons.
* The end of a bad memory is too good.
* The advantage of a strong faith is infallible.
* There are two different types of people in the enemy’s staying alive.

![NLG](https://supportivedivorcesolutions.com/wp-content/uploads/2017/03/iStock-468140568.jpg)