# Fulfillomatic

##### Adriana Souza, Roger Filmyer

##### This notebook will be finished/cleaned by Thursday, Dec 6th.

![NLG](http://www.pngall.com/wp-content/uploads/2016/07/Meditation-Transparent.png)

***

### Loading data

In [1]:
# Packages
import numpy as np
import nltk
import random
import string

from collections import defaultdict

In [2]:
# Selecting the file to use
file = 'training/quotes.txt'

# Storing quotes from file in a list (one quote per line)
with open(file) as opened_file:
    quotes = opened_file.read().splitlines()

***

## Version 0: Uniform Distribution

In [10]:
# Tokenize
tokenized_corpus = []
for quote in quotes:
    tokenized_quote = nltk.tokenize.word_tokenize(quote)
    tagged_quote = nltk.pos_tag(tokenized_quote)
    tokenized_corpus.append(tagged_quote)

# Set up the language model
parts_of_speech = defaultdict(list)
sentence_structures = []
for quote in tokenized_corpus:
    sentence_structure = []
    for word, pos in quote:
        parts_of_speech[pos].append(word)
        sentence_structure.append(pos)
    sentence_structures.append(sentence_structure)

# Generate an example sentence
def get_mindful_v0() -> str:
    """
    Generate an inspirational sentence. 
    
    Ensure that you are in the proper state of mind before running. ॐ
    """
    sentence_skeleton = random.choice(sentence_structures)

    reconstituted_sentence = []
    for part_of_speech in sentence_skeleton:
        new_word = random.choice(parts_of_speech[part_of_speech])
        reconstituted_sentence.append(new_word)

    return " ".join(reconstituted_sentence)

# Output
get_mindful_v0()

'yourself lost enough be Some even , to flamboyantly Do that not .'

### Version 0 results

* your ready Speak begins when you can hear you not and never .
* in I think busy forwards of coffee , forever it . in you will live aware library you , make your education .
* without t denies my anything that bulk , yourself can once call You .
* as I are dreams to don grief , never the sun is to tolerate able you .
* all valuable choice is than the painful comfort , it can keep imprisoned believe only not that you ’ you .
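
The results above are grammatical in shape but incoherent in content, because each slot in the sentence skeleton is filled without looking at its neighbours. A minimal sketch of the same idea on a tiny hand-tagged corpus (the corpus, the tags, and the names `tagged_quotes`, `words_by_tag`, and `fill_skeleton` are hypothetical, and the tags are written by hand instead of coming from `nltk.pos_tag`):

```python
import random
from collections import defaultdict

# Hand-tagged toy corpus (hypothetical; the notebook tags real quotes with nltk.pos_tag)
tagged_quotes = [
    [("breathe", "VB"), ("slowly", "RB"), (".", ".")],
    [("listen", "VB"), ("deeply", "RB"), (".", ".")],
]

# Collect every word seen for each tag, and every tag sequence seen
words_by_tag = defaultdict(list)
skeletons = []
for quote in tagged_quotes:
    skeletons.append([tag for _, tag in quote])
    for word, tag in quote:
        words_by_tag[tag].append(word)

def fill_skeleton(rng):
    """Pick a tag sequence, then fill each slot independently of its neighbours."""
    skeleton = rng.choice(skeletons)
    return " ".join(rng.choice(words_by_tag[tag]) for tag in skeleton)

rng = random.Random(0)
sample = fill_skeleton(rng)
print(sample)
```

Any verb can land next to any adverb here, which is harmless with four words but produces the word salad above once the vocabulary grows.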

***

## Version 1: Bigram Model

Well, that worked great. Maybe some context _would_ be good.

In [4]:
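# Turning the list of quotes into a single string
table = str.maketrans('', '', string.punctuation + '…”“–')  # characters to strip
corpus = ""
for quote in quotes:
    quote = quote.lower()
    quote = quote.replace('.', ' END ')   # mark sentence boundaries
    quote = quote.translate(table)        # remove punctuation
    corpus = corpus + quote + ' '         # keep adjacent quotes from running together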
    
def tokenize(input_string):
    return input_string.split()

def get_bigrams(corpus):
    # Unigram counts are the denominators: P(second | first) = count(first, second) / count(first)
    corpus_fd_unigram = nltk.FreqDist(tokenize(corpus))
    bigrams = nltk.bigrams(['END'] + tokenize(corpus))
    bigrams_fd = nltk.FreqDist(bigrams)
    results = {}
    for bigram, bigram_frequency in bigrams_fd.items():
        first_word, second_word = bigram
        probability = (bigram_frequency / corpus_fd_unigram[first_word])    
        results[bigram] = probability
    return results

#bigrams(corpus)
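
To make the estimate concrete, here is a small sketch that computes the same kind of bigram probabilities on a toy token list using only the standard library. The token list and variable names are made up for illustration, and unlike the cell above the denominator here counts only tokens that actually start a bigram, so each row sums to exactly 1.

```python
from collections import Counter

# Toy corpus, already lower-cased and marked with END at sentence boundaries
tokens = "END be here now END be kind END".split()

# Count each token that starts a bigram (every token except the last one)
unigram_counts = Counter(tokens[:-1])
bigram_counts = Counter(zip(tokens, tokens[1:]))

# Maximum-likelihood estimate: P(second | first) = count(first, second) / count(first)
bigram_probs = {
    (first, second): count / unigram_counts[first]
    for (first, second), count in bigram_counts.items()
}

for bigram, probability in sorted(bigram_probs.items()):
    print(bigram, probability)
```

In this toy corpus `be` is followed once by `here` and once by `kind`, so each continuation gets probability 0.5.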

In [11]:
bigram_model = get_bigrams(corpus)

def get_mindful_v1():
    """
    You must only concentrate on the next step, the next breath, 
    the next stroke of the broom, and the next, and the next. Nothing else.
    ॐ
    
    (Bigram Model)
    """
    words_in_sentence = ['END']
    second_word = None
    while second_word != 'END':
        first_word = words_in_sentence[-1]
        matching_bigrams = [bigram for bigram in bigram_model if bigram[0] == first_word]
        bigram_probabilities = [bigram_model[bigram] for bigram in matching_bigrams]
        # Normalize so the probabilities sum to exactly 1, as np.random.choice requires
        total_probability = sum(bigram_probabilities)
        second_word = np.random.choice(
                        a=[second for first, second in matching_bigrams],
                        p=[p / total_probability for p in bigram_probabilities])
        words_in_sentence.append(second_word)
    words_in_sentence = words_in_sentence[1:-1]
    # capitalize first letter of first word
    if len(words_in_sentence) > 0:
        first_word = words_in_sentence[0]
        first_word = first_word[0].upper() + first_word[1:]
        words_in_sentence[0] = first_word
        sentence = " ".join(words_in_sentence) + '.'
    else:
        sentence = get_mindful_v1()
    return sentence
        
# Print a bunch of generated sentences
def repeat(times, f):
    for i in range(times): f()
        
def do_v1():
    print(get_mindful_v1())
    
repeat(5, do_v1)

Adversity reveals the shore but we cannot escape necessities but we are not the educated are falling into it right door you stand for what lies behind me there is no more good you get people i can make friends.
People you only be.
Know failure is not you will transform him his enemies try to exist only the fire and even destroy this body but rather the ones life the overcoming of states we are those who wander are looking for what lies behind me burned brighter than the present.
I may not stop one of his woes.
To your hope is right.


### Version 1 results

* Just do it.
* In my friends you can get the fire you grow from it should scare you do drunk.
* You.
* I believe in the least for anything i believe in god from a man to exist.
* Dont bother just take rest is too little one that you better.
* If you can not what we know what you will remain constant.
* What we are travelling more difficult than to forget is no greatness.
* Anything you look for what you do not being yourself.
* Let the wilderness of all else is still looking for us entirely happy because i told dismiss that can do something.

***

## Version 2: Trigram Model

(...) We don't want trigrams that span from the end of one file to the next. Such trigrams do not represent tokens that could follow each other in a text; they are completely accidental.

(...)

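We therefore add a second `END` token after every `END`, so each new sentence starts from the context (`END`, `END`) and no trigram carries words across a sentence boundary.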

In [7]:
# Adding extra END tokens
def add_extra_end_token(tokenized_document):
    new_document = []
    for token in tokenized_document:
        new_document.append(token)
        if token == "END":
            new_document.append("END")
    return new_document

def get_trigrams(document):
    corpus = tokenize(document)
    corpus = add_extra_end_token(corpus)
    corpus_fd_bigram = nltk.FreqDist(nltk.bigrams(["END"] + corpus))
    trigrams = nltk.trigrams(["END", "END"] + corpus)
    trigrams_fd = nltk.FreqDist(trigrams)
    results = {}
    for trigram, trigram_frequency in trigrams_fd.items():
        first_word, second_word, third_word = trigram
        probability = (trigram_frequency) / (corpus_fd_bigram[(first_word, second_word)])
        results[trigram] = probability
    return results

#get_trigrams(corpus)
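
The effect of the doubled `END` token is easiest to see on a toy example. The sketch below uses only the standard library and a made-up two-sentence corpus. The helper name `pad_sentence_ends` is hypothetical and mirrors `add_extra_end_token` above.

```python
from collections import Counter

def pad_sentence_ends(tokens):
    """Duplicate every END token so each sentence boundary is two tokens wide."""
    padded = []
    for token in tokens:
        padded.append(token)
        if token == "END":
            padded.append("END")
    return padded

# Two toy sentences, with the same leading END END padding the notebook uses
tokens = ["END", "END"] + pad_sentence_ends("be here now END stay kind END".split())

trigram_counts = Counter(zip(tokens, tokens[1:], tokens[2:]))
# Count only the two-word contexts that actually start a trigram
context_counts = Counter(zip(tokens[:-1], tokens[1:-1]))

# P(third | first, second) = count(first, second, third) / count(first, second)
trigram_probs = {
    trigram: count / context_counts[trigram[:2]]
    for trigram, count in trigram_counts.items()
}

for trigram, probability in trigram_probs.items():
    print(trigram, probability)
```

Every sentence opener is conditioned on the context (`END`, `END`), and no trigram contains words from both sentences.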

In [12]:
trigram_model = get_trigrams(corpus)

def get_sentence_with_ngram_model(num_words, model):
    words_in_sentence = ['END' for i in range(0, num_words - 1)] # pad the start of the sentence with 'END' tokens
    final_word = None
    while final_word != 'END':        
        initial_n_gram_words = words_in_sentence[-(num_words - 1):]
        matching_n_gram_keys = []
        for n_gram in model.keys():
            words_to_match = zip(n_gram, initial_n_gram_words)
            if all(a == b for a, b in words_to_match):
                matching_n_gram_keys.append(n_gram)        
        n_gram_probabilities = [model[n_gram] for n_gram in matching_n_gram_keys]
        # Normalize so the probabilities sum to exactly 1, as np.random.choice requires
        total_probability = sum(n_gram_probabilities)
        final_word = np.random.choice(
                        a=[n_gram[-1] for n_gram in matching_n_gram_keys],
                        p=[p / total_probability for p in n_gram_probabilities])
        words_in_sentence.append(final_word)
    words_in_sentence = words_in_sentence[(num_words - 1): -1]
    # capitalize first letter of first word
    if len(words_in_sentence) > 0:
        first_word = words_in_sentence[0]
        first_word = first_word[0].upper() + first_word[1:]
        words_in_sentence[0] = first_word
        sentence = " ".join(words_in_sentence) + '.'
    else:
        sentence = get_sentence_with_ngram_model(num_words, model)
    return sentence

def get_mindful_v2():
    """
    Three things cannot long be hidden: the sun, the moon, and the truth. ॐ
    
    (Trigram Model)
    """
    sentence = ""
    while len(sentence.split()) < 4:
        sentence = get_sentence_with_ngram_model(3, trigram_model)
    return sentence
    
        
# Print a bunch of generated sentences
def repeat(times, f):
    for i in range(times): f()
        
def do_v2():
    print(get_mindful_v2())
    
repeat(10, do_v2)

The aim of art is to keep company only with people who uplift you whose presence calls forth your best.
Nothing is more difficult than to need and not as a book over and over again there is.
I am changing myself.
If you do not know yourselves then you are still looking for that one person who will risk going too far can possibly find out why.
If you have not yet see the stars is ambitious.
That it is literally true that you have to be what you read when you die you will have treasure in heaven.
What lies before us are looking at the broken places.
Be sure you put your feet in the darkness can you find truth you will liberate the minds of men.
Real education must ultimately be limited to men who insist on knowing the rest said was right.
Happiness is beneficial for the moon and if you grow in awareness you will always find happiness.


### Example

Take:

        The world is full of magic things, patiently waiting for our senses *to grow* sharper. 

And:

        It takes courage *to grow* up and become who you really are. 

Get:

            It takes courage *to grow* sharper.
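
The splice works because both quotes share the two-word context "to grow". A minimal sketch, using the two quotes above with punctuation removed and a hypothetical `successors` table in place of the full trigram model:

```python
from collections import defaultdict

# The two quotes from the example, lower-cased and stripped of punctuation
quotes = [
    "the world is full of magic things patiently waiting for our senses to grow sharper",
    "it takes courage to grow up and become who you really are",
]

# Map each two-word context to the set of words observed right after it
successors = defaultdict(set)
for quote in quotes:
    tokens = quote.split() + ["END"]
    for first, second, third in zip(tokens, tokens[1:], tokens[2:]):
        successors[(first, second)].add(third)

print(sorted(successors[("to", "grow")]))
```

After "it takes courage to grow", both `sharper` and `up` are candidates, so the model can leave the second quote and finish with the ending of the first.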



![NLG](https://supportivedivorcesolutions.com/wp-content/uploads/2017/03/iStock-468140568.jpg)