## Text Generation Using Keras
This notebook uses the topics assigned to each character's lines to generate new lines of text for that character on each topic.

***
### Import Packages

In [91]:
import warnings
warnings.filterwarnings('ignore')

In [92]:
import pickle
import pandas as pd
import numpy

# Keras building blocks for the character-level LSTM used to generate text
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils

***
### Read in data set 

In [93]:
df = pd.read_csv('character_topics.csv')
df

| Unnamed: 0.1 | Unnamed: 0 | text | dominant_topic | character |
|---|---|---|---|---|
| 0 | 0 | Hello there! Come here my little friend. Don... | 3 | Obi-Wan |
| 1 | 1 | Don't worry, he'll be all right. | 3 | Obi-Wan |
| 2 | 2 | Rest easy, son, you've had a busy day. You're... | 2 | Obi-Wan |
| 3 | 3 | The Jundland Wastes are not to be traveled lig... | 0 | Obi-Wan |
| 4 | 4 | Obi-Wan Kenobi... Obi-Wan? Now thats a name I... | 2 | Obi-Wan |
| ... | ... | ... | ... | ... |
| 3483 | 3483 | The code's changed. We need Artoo! | 3 | Leia |
| 3484 | 3484 | Artoo, where are you? We need you at the bunke... | 5 | Leia |
| 3485 | 3485 | I'll cover you. | 3 | Leia |
| 3486 | 3486 | It's not bad. | 1 | Leia |


Here we separate the data by character so that we can generate text for one character at a time.

In [94]:
character_names = ['Obi-Wan', 'Vader', 'Luke', 'C-3PO', 'Han', 'Padme', 'Yoda', 'Anakin', 'Leia']
characters_dict = {}

# one DataFrame per character: drop the redundant columns and count the words in each line
for name in character_names:
    char_df = df[df.character == name].drop(columns=['character', 'Unnamed: 0'])
    char_df['num_words'] = char_df.apply(lambda x: len(x.text.split()), axis=1)
    characters_dict[name] = char_df

In [95]:
characters_dict['Obi-Wan']

| Unnamed: 0 | text | dominant_topic | num_words |
|---|---|---|---|
| 0 | Hello there! Come here my little friend. Don... | 3 | 10 |
| 1 | Don't worry, he'll be all right. | 3 | 6 |
| 2 | Rest easy, son, you've had a busy day. You're... | 2 | 15 |
| 3 | The Jundland Wastes are not to be traveled lig... | 0 | 19 |
| 4 | Obi-Wan Kenobi... Obi-Wan? Now thats a name I... | 2 | 17 |
| ... | ... | ... | ... |
| 584 | I will take the child and watch over him. Mast... | 1 | 23 |
| 585 | Training?? | 6 | 1 |
| 586 | Who? | 0 | 1 |
| 587 | Qui-Gon? But, how could he accomplish this? | 1 | 7 |
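Each topic model is trained on the joined text of one character's lines for that topic, and a sequence window of 100 characters means a topic needs more than 100 characters of text to yield even one training pair. A quick size check per topic can flag clusters that are too small before training starts. This is a minimal sketch: the helper name `chars_per_topic`, the assumption of 8 topics (matching `range(0,8)` in the generation loop), and the toy lines are illustrative, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for a frame such as characters_dict['Obi-Wan'].
lines = pd.DataFrame({
    "text": ["Hello there!", "Don't worry, he'll be all right.", "Training??", "Who?"],
    "dominant_topic": [3, 3, 6, 0],
})

SEQ_LENGTH = 100  # same window length the training loop uses


def chars_per_topic(char_df, n_topics=8):
    """Length of the space-joined text for each topic 0..n_topics-1 (assumed topic count)."""
    joined = (
        char_df.groupby("dominant_topic")["text"]
        .apply(lambda texts: len(" ".join(texts)))
    )
    # Topics with no lines at all get 0 characters.
    return joined.reindex(range(n_topics), fill_value=0)


sizes = chars_per_topic(lines)
# A topic needs more than SEQ_LENGTH characters to produce at least one (input, target) pair.
too_small = sizes[sizes <= SEQ_LENGTH].index.tolist()
print(sizes.to_dict())
print("Topics with too little text:", too_small)
```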


### Text generation 
Using a dictionary to store the generated lines, we loop through every character and then through each of that character's topics, saving the new lines so they can be read into a Flask app.
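The resulting structure is nested: `quotes_dict[character]` is a list with one entry per topic, and each entry is a list of generated strings. As a sketch of how an app could read it, the hypothetical helper below (`random_quote` is not from the original Flask app) picks one line for a character and topic, assuming list positions line up with topic numbers.

```python
import random


def random_quote(quotes_dict, character, topic, rng=random):
    """Pick one generated line for a character/topic pair (hypothetical helper).

    Assumes quotes_dict[character] is a list indexed by topic number and that
    each entry is a list of generated strings for that topic.
    """
    topic_quotes = quotes_dict[character][topic]
    if not topic_quotes:
        return None  # nothing was generated for this topic
    return rng.choice(topic_quotes)


# Toy dictionary with the same nesting (placeholder strings, not real model output).
example = {"Yoda": [["quote a", "quote b"], []]}
print(random_quote(example, "Yoda", 0))
print(random_quote(example, "Yoda", 1))
```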

In [96]:
quotes_dict = {}

In [97]:
character_names = ['Obi-Wan', 'Anakin', 'Vader', 'Luke', 'Padme', 'Leia', 'C-3PO', 'Han', 'Yoda']
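The loop in the next cell turns each topic's text into training pairs by sliding a fixed-length window over it: each input is `seq_length` consecutive character ids and the target is the id of the character that follows. That is why the output reports 8697 patterns for 8797 characters with `seq_length = 100`. The sketch below reproduces the same windowing, reshaping and normalization on a tiny string; `make_windows` is an illustrative helper, not a function from the notebook.

```python
import numpy as np


def make_windows(text, seq_length):
    """Build (input, target) id pairs the same way as the training loop."""
    chars = sorted(set(text))
    char_to_int = {c: i for i, c in enumerate(chars)}
    data_x, data_y = [], []
    # len(text) - seq_length windows, each followed by one target character
    for i in range(len(text) - seq_length):
        data_x.append([char_to_int[c] for c in text[i:i + seq_length]])
        data_y.append(char_to_int[text[i + seq_length]])
    return data_x, data_y, chars


data_x, data_y, chars = make_windows("hello there", seq_length=4)
print(len(data_x))           # 7
print(data_x[0], data_y[0])  # [2, 1, 3, 3] 4  ("hell" -> "o")

# Shape expected by the LSTM: [samples, time steps, features], scaled by vocabulary size
X = np.reshape(data_x, (len(data_x), 4, 1)) / float(len(chars))
print(X.shape)
```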

In [None]:
quotes_dict = {}
seq_length = 100  # characters of context the model sees for each prediction
for character in character_names:
    quote_list = []

    # run through each topic cluster
    for cluster in range(0, 8):
        # get the text for that cluster
        raw_data = characters_dict[character]
        clustered_data = raw_data.loc[raw_data['dominant_topic'] == cluster, 'text']
        raw_text = ' '.join(clustered_data.to_list())
        # map characters to integers (and back) in order to do modeling
        chars = sorted(list(set(raw_text)))
        char_to_int = dict((c, i) for i, c in enumerate(chars))
        int_to_char = dict((i, c) for i, c in enumerate(chars))

        # take a peek at what we are working with
        n_chars = len(raw_text)
        n_vocab = len(chars)
        print("Total Characters: ", n_chars)
        print("Total Vocab: ", n_vocab)

        # a topic needs more than seq_length characters to form even one training pair;
        # keep an empty entry so list positions still match topic numbers
        if n_chars <= seq_length:
            quote_list.append([])
            continue

        # prepare the dataset of input to output pairs encoded as integers
        dataX = []
        dataY = []
        for i in range(0, n_chars - seq_length, 1):
            seq_in = raw_text[i:i + seq_length]
            seq_out = raw_text[i + seq_length]
            dataX.append([char_to_int[char] for char in seq_in])
            dataY.append(char_to_int[seq_out])
        n_patterns = len(dataX)
        print("Total Patterns: ", n_patterns)

        # reshape to [samples, time steps, features] and normalize
        X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
        X = X / float(n_vocab)
        # one hot encode the output variable (one class per character in the vocabulary)
        y = np_utils.to_categorical(dataY, num_classes=n_vocab)

        # define the LSTM model
        model = Sequential()
        model.add(LSTM(500, input_shape=(X.shape[1], X.shape[2])))
        model.add(Dropout(0.2))
        model.add(Dense(y.shape[1], activation='softmax'))
        model.compile(loss='categorical_crossentropy', optimizer='adam')

        # checkpoint the weights whenever the training loss improves
        filepath = 'weight.best.hdf5'
        checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
        callbacks_list = [checkpoint]

        # fit the model, then reload the best (lowest-loss) weights
        model.fit(X, y, epochs=100, batch_size=128, callbacks=callbacks_list)
        model.load_weights(filepath)

        # generate new text
        quotes = []
        # making 20 quotes (for time; can make more)
        for _ in range(20):
            # pick a random seed; copy it so the appends below do not modify dataX
            start = numpy.random.randint(0, len(dataX))
            pattern = list(dataX[start])
            # look at the random seed that the model will predict on
            print("Seed:")
            print("\"", ' '.join([int_to_char[value] for value in pattern]), "\"")

            # generate 500 characters, sliding the window forward one character per step
            text = []
            for _ in range(500):
                x = numpy.reshape(pattern, (1, len(pattern), 1))
                x = x / float(n_vocab)
                prediction = model.predict(x, verbose=0)
                index = numpy.argmax(prediction)
                text.append(int_to_char[index])
                pattern.append(index)
                pattern = pattern[1:]
            print("\n::::\n")
            # save the new quote for this character and topic
            text_string = ''.join(text)
            quotes.append(text_string)
            print(text_string)
        quote_list.append(quotes)
        print(quotes)
        print("\nDone.")
    quotes_dict[character] = quote_list

Total Characters:  8797
Total Vocab:  29
Total Patterns:  8697
Epoch 1/100

Epoch 00001: loss improved from inf to 2.91830, saving model to weight.best.hdf5
Epoch 2/100

Epoch 00002: loss improved from 2.91830 to 2.83361, saving model to weight.best.hdf5
Epoch 3/100

Epoch 00003: loss improved from 2.83361 to 2.82872, saving model to weight.best.hdf5
Epoch 4/100

Epoch 00004: loss improved from 2.82872 to 2.82265, saving model to weight.best.hdf5
Epoch 5/100

Epoch 00005: loss improved from 2.82265 to 2.81548, saving model to weight.best.hdf5
Epoch 6/100

Epoch 00006: loss improved from 2.81548 to 2.80287, saving model to weight.best.hdf5
Epoch 7/100

Epoch 00007: loss improved from 2.80287 to 2.77900, saving model to weight.best.hdf5
Epoch 8/100

Epoch 00008: loss improved from 2.77900 to 2.74638, saving model to weight.best.hdf5
Epoch 9/100

Epoch 00009: loss improved from 2.74638 to 2.71591, saving model to weight.best.hdf5
Epoch 10/100

Epoch 00010: loss improved from 2.71591 to 2.


Epoch 00048: loss improved from 0.23971 to 0.18297, saving model to weight.best.hdf5
Epoch 49/100

Epoch 00049: loss improved from 0.18297 to 0.14582, saving model to weight.best.hdf5
Epoch 50/100

Epoch 00050: loss improved from 0.14582 to 0.11927, saving model to weight.best.hdf5
Epoch 51/100

Epoch 00051: loss improved from 0.11927 to 0.09130, saving model to weight.best.hdf5
Epoch 52/100

Epoch 00052: loss improved from 0.09130 to 0.07564, saving model to weight.best.hdf5
Epoch 53/100

Epoch 00053: loss improved from 0.07564 to 0.07035, saving model to weight.best.hdf5
Epoch 54/100

Epoch 00054: loss improved from 0.07035 to 0.05757, saving model to weight.best.hdf5
Epoch 55/100

Epoch 00055: loss improved from 0.05757 to 0.05143, saving model to weight.best.hdf5
Epoch 56/100

Epoch 00056: loss improved from 0.05143 to 0.04101, saving model to weight.best.hdf5
Epoch 57/100

Epoch 00057: loss improved from 0.04101 to 0.03341, saving model to weight.best.hdf5
Epoch 58/100

...


::::

we are here to protect you senator not to start an investigation what captain typho ha more than enough men downstairs no assassin will try that way any activity up here it is not an intruder i am worried about there are many other why to kill a senator you are using her a bait because of your mother mind your thought anakin they betray you you have made a commitment to the jedi order a commitment not easily broken and do not forget she is a politician they are not to be trusted that wa too clos
Seed:
" t   m a s t e r   y o u   c o u l d   b e   s i t t i n g   o n   t h e   c o u n c i l   b y   n o w   i f   y o u   w o u l d   j u s t   f o l l o w   t h e   c o d e   t h e y   w i l l   n o t   "

::::

go along with you this time iam ready to face the trial jar jar is on his way to the gungan city master no master yoda igave qui-gon my word i will train anakin without the approval of the council if i must i am sure the jedi council have their reason we are here to protect 


::::

ey betray you you have made a commitment to the jedi order a commitment not easily broken and do not forget she is a politician they are not to be trusted that wa too close what it wa stupid where are you going he went down there the other way anakin this weapon is your life but you have not learned anything anakin toxic dart it is a toxic dart i need to know where it came from and who made it i am looking for dexter he is not in trouble it is personal i never understood why he quit only twenty 
Seed:
" h   t h e   p o l i t i c i a n   a n a k i n   b e   c a r e f u l   o f   y o u r   f r i e n d   p a l p a t i n e   h e   h a   r e q u e s t e d   y o u r   p r e s e n c e   c a l m   d o w n   "

::::

anakin you have been given a great honor to be on the council at your age it is never happened before listen to me anakin the fact of the matter is you are too close to the chancellor the council doe not like it when he interferes in jedi affair no it is not anakin i worry w

Total Patterns:  8697
Epoch 1/100

Epoch 00001: loss improved from inf to 2.91322, saving model to weight.best.hdf5
Epoch 2/100

Epoch 00002: loss improved from 2.91322 to 2.83542, saving model to weight.best.hdf5
Epoch 3/100

Epoch 00003: loss improved from 2.83542 to 2.83042, saving model to weight.best.hdf5
Epoch 4/100

Epoch 00004: loss improved from 2.83042 to 2.83023, saving model to weight.best.hdf5
Epoch 5/100

Epoch 00005: loss improved from 2.83023 to 2.82464, saving model to weight.best.hdf5
Epoch 6/100

Epoch 00006: loss improved from 2.82464 to 2.81549, saving model to weight.best.hdf5
Epoch 7/100

Epoch 00007: loss improved from 2.81549 to 2.80557, saving model to weight.best.hdf5
Epoch 8/100

Epoch 00008: loss improved from 2.80557 to 2.78819, saving model to weight.best.hdf5
Epoch 9/100

Epoch 00009: loss improved from 2.78819 to 2.76082, saving model to weight.best.hdf5
Epoch 10/100

Epoch 00010: loss improved from 2.76082 to 2.72879, saving model to weight.best.hdf5
...


Epoch 00048: loss improved from 0.29605 to 0.23129, saving model to weight.best.hdf5
Epoch 49/100

Epoch 00049: loss improved from 0.23129 to 0.17847, saving model to weight.best.hdf5
Epoch 50/100

Epoch 00050: loss improved from 0.17847 to 0.14314, saving model to weight.best.hdf5
Epoch 51/100

Epoch 00051: loss improved from 0.14314 to 0.11247, saving model to weight.best.hdf5
Epoch 52/100

Epoch 00052: loss improved from 0.11247 to 0.09489, saving model to weight.best.hdf5
Epoch 53/100

Epoch 00053: loss improved from 0.09489 to 0.07257, saving model to weight.best.hdf5
Epoch 54/100

Epoch 00054: loss improved from 0.07257 to 0.05886, saving model to weight.best.hdf5
Epoch 55/100

Epoch 00055: loss improved from 0.05886 to 0.05252, saving model to weight.best.hdf5
Epoch 56/100

Epoch 00056: loss improved from 0.05252 to 0.04601, saving model to weight.best.hdf5
Epoch 57/100

Epoch 00057: loss improved from 0.04601 to 0.03789, saving model to weight.best.hdf5
Epoch 58/100

...
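The generation step always takes `numpy.argmax` of the prediction, and with training loss near 0.03 the samples above largely replay lines from the training text. A common alternative (not what this notebook does) is to sample the next character from the softmax output with a temperature. The sketch below uses only NumPy; `sample_index` is a hypothetical helper that would replace `index = numpy.argmax(prediction)` with `index = sample_index(prediction[0], temperature=0.8)`, where the temperature value is an arbitrary example.

```python
import numpy as np


def sample_index(probs, temperature=1.0, rng=None):
    """Sample a character id from softmax probabilities instead of taking argmax.

    temperature < 1 sharpens the distribution (closer to argmax);
    temperature > 1 flattens it (more varied output, more spelling errors).
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=np.float64)
    logits = np.log(probs + 1e-12) / temperature  # small epsilon avoids log(0)
    weights = np.exp(logits - logits.max())       # subtract max for numerical stability
    weights /= weights.sum()
    return int(rng.choice(len(weights), p=weights))


rng = np.random.default_rng(0)
print(sample_index([0.1, 0.2, 0.7], temperature=0.5, rng=rng))
```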

In [None]:
with open('star-wars-app/quotes.pickle', 'wb') as handle:
    pickle.dump(quotes_dict, handle, protocol=pickle.HIGHEST_PROTOCOL)

In [None]:
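# Note: dict1 and dict2 are read from the same file, so this merge leaves dict1 unchanged
with open("star-wars-app/quotes.pickle", "rb") as handle:
    dict1 = pickle.load(handle)
with open("star-wars-app/quotes.pickle", "rb") as handle:
    dict2 = pickle.load(handle)
# Merge contents of dict2 into dict1
dict1.update(dict2)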
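If the generation loop is run in batches (for example, a few characters at a time) and each batch is pickled to its own file, the per-character dictionaries can be combined with `dict.update`, since each key is a character name. This is a sketch under that assumption; `load_and_merge` and the file names `quotes_part1.pickle` / `quotes_part2.pickle` are hypothetical and only used in a temporary directory here.

```python
import os
import pickle
import tempfile


def load_and_merge(paths):
    """Load several pickled quote dictionaries and merge them into one.

    If the same character appears in more than one file, the later file wins.
    """
    merged = {}
    for path in paths:
        with open(path, "rb") as handle:
            merged.update(pickle.load(handle))
    return merged


# Demo with two throwaway files standing in for separate runs
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "quotes_part1.pickle")
    second = os.path.join(tmp, "quotes_part2.pickle")
    with open(first, "wb") as handle:
        pickle.dump({"Obi-Wan": [["line a"]]}, handle)
    with open(second, "wb") as handle:
        pickle.dump({"Yoda": [["line b"]]}, handle)
    merged = load_and_merge([first, second])

print(sorted(merged))  # ['Obi-Wan', 'Yoda']
```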

In [None]:
with open("star-wars-app/quotes.pickle", "rb") as handle:
    dict3 = pickle.load(handle)