# Simulating Welcome to Night Vale with TensorFlow

Here I use TensorFlow to generate text in the style of Welcome to Night Vale. I have all of the transcripts saved to disk, so I start by combining them into one large file and examining it.

In [1]:
import os, re
import itertools as it

In [2]:
DATA_PATH = r'C:\Users\caleb\Documents\Data Science\welcome-to-night-vale\data'
TRANSCRIPTS_PATH = os.path.join(DATA_PATH, 'transcripts')
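
The step that merges the individual transcripts into one file is not shown here. A minimal sketch of how it might look, assuming each transcript is a UTF-8 `.txt` file directly inside the transcripts directory; the helper name `combine_transcripts` and the blank-line separator are illustrative choices, not the code actually used:

```python
import os


def combine_transcripts(src_dir, out_path, sep="\n\n"):
    """Concatenate every .txt file in src_dir into out_path (hypothetical helper)."""
    # Sort the file names so the combined file has a reproducible order.
    names = sorted(n for n in os.listdir(src_dir) if n.endswith(".txt"))
    texts = []
    for name in names:
        with open(os.path.join(src_dir, name), "r", encoding="utf-8") as f:
            texts.append(f.read().strip())
    # Write all transcripts into one file, separated by blank lines.
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(sep.join(texts))
    return len(names)
```

With the paths above, this could be called as `combine_transcripts(TRANSCRIPTS_PATH, os.path.join(DATA_PATH, 'Welcome To Night Vale.txt'))`.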

## Keras and TensorFlow

Now we implement a model in Keras and TensorFlow for generating sequences from the training text.

In [3]:
import sys
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
from keras.callbacks import TensorBoard

Using TensorFlow backend.


In [4]:
# load the text (UTF-8) and convert to lowercase
with open(os.path.join(DATA_PATH, 'Welcome To Night Vale.txt'),
          'r', encoding='utf-8') as f:
    wtnv_text = f.read().lower()

In [5]:
print(wtnv_text[:500])

64 - we must give praise


don’t judge a book by its cover. judge it by the harmful messages it
contains.


welcome to night vale.


the enormous glowing cloud that serves as president of the night vale
school board announced a five-year strategic plan for the school
district. the plan, put together over the past year by the
twelve-member board, lays out new curriculum goals, organizational
restructuring, and a comprehensive outline for eternal penitence
before the mighty glow cloud.


everyone 


## The Alphabet

Next we display the alphabet, that is, the set of distinct characters that occur in the text.

In [6]:
print(''.join(sorted(set(wtnv_text))))


 !"#$%&'()*+,-./0123456789:;<?[\]abcdefghijklmnopqrstuvwxyz ©º¼àáâäéîñóǜ̖̗̙̜̝̞̟̠̣̤̥̦̩̪̫̬̭̮̯̰̱̲̳̹̺̻̼͇͈͉͍͎͓͔͕͖͙͚́̂̃̄̅̆̇̉̊̋̌̍̎̏̐̑̒̓̔̽̾͂̈́͆͊͌͐͑͒͗͛ͣͤͥͦͧͨͩͪͫͬͮͯͅавгдежзиклмнопрстуцчшыьэюя–—‘’“”…‽♪♫


In [7]:
# create mapping of unique chars to integers, and a reverse mapping
chars = sorted(list(set(wtnv_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
int_to_char = dict((i, c) for i, c in enumerate(chars))

In [8]:
# summarize the loaded data
n_chars = len(wtnv_text)
n_vocab = len(chars)
print("Total Characters:", n_chars)
print("Total Vocab:", n_vocab)

Total Characters: 1628468
Total Vocab: 192
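
As a sanity check on the two mappings, encoding a string and decoding the result should return the original. A small sketch on a toy string rather than the full transcript text; `build_vocab`, `encode`, and `decode` are illustrative names, not part of the notebook:

```python
def build_vocab(text):
    """Build char -> int and int -> char mappings from the distinct characters."""
    chars = sorted(set(text))
    char_to_int = {c: i for i, c in enumerate(chars)}
    int_to_char = {i: c for i, c in enumerate(chars)}
    return char_to_int, int_to_char


def encode(text, char_to_int):
    return [char_to_int[c] for c in text]


def decode(ids, int_to_char):
    return ''.join(int_to_char[i] for i in ids)


sample = "welcome to night vale."
c2i, i2c = build_vocab(sample)
ids = encode(sample, c2i)
# The round trip must reproduce the input exactly.
assert decode(ids, i2c) == sample
print(len(c2i), ids[:7])
```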


In [9]:
# prepare the dataset of input to output pairs encoded as integers
seq_length = 100
dataX = []
dataY = []
for i in range(n_chars - seq_length):
    seq_in = wtnv_text[i:i + seq_length]
    seq_out = wtnv_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])

In [10]:
print("X:", ''.join([int_to_char[i] for i in dataX[0]]))
print("y:", int_to_char[dataY[0]])

X: 64 - we must give praise


don’t judge a book by its cover. judge it by the harmful messages it
cont
y: a


In [11]:
n_patterns = len(dataX)
print("Total Patterns:", n_patterns)

Total Patterns: 1628368
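
The pattern count follows from the windowing: a text of `n_chars` characters yields `n_chars - seq_length` windows with a stride of 1, here 1628468 - 100 = 1628368. The same loop on a toy string, with an illustrative window length of 4 (`make_patterns` is a hypothetical helper):

```python
def make_patterns(text, seq_length):
    """Return (input window, next character) pairs using a stride of 1."""
    pairs = []
    for i in range(len(text) - seq_length):
        pairs.append((text[i:i + seq_length], text[i + seq_length]))
    return pairs


toy = "night vale"
pairs = make_patterns(toy, 4)
print(len(pairs))   # equals len(toy) - 4
print(pairs[0])     # first window and the character that follows it
```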


In [12]:
# reshape X to be [samples, time steps, features]
X = np.reshape(dataX, (n_patterns, seq_length, 1))

# normalize the integer codes to the range [0, 1)
X = X / float(n_vocab)

In [13]:
X.shape

(1628368, 100, 1)

In [14]:
# one hot encode the output variable
y = np_utils.to_categorical(dataY)

In [15]:
y.shape

(1628368, 192)
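
One-hot encoding turns each integer label into a vector with a single 1 at the label's index, giving one column per character index. A pure-Python sketch of the idea on made-up labels; `one_hot` is an illustrative stand-in, not the Keras implementation:

```python
def one_hot(labels, num_classes=None):
    """Return one row per label, with 1.0 at the label's index and 0.0 elsewhere."""
    if num_classes is None:
        # Infer the number of classes from the largest label.
        num_classes = max(labels) + 1
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


encoded = one_hot([0, 2, 1])
for row in encoded:
    print(row)
```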

In [16]:
# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
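
The final `Dense` layer uses a softmax activation, so for each input window the model outputs one probability per character in the vocabulary. A plain-Python sketch of softmax and of picking the most likely character; the scores and the three-character vocabulary are made up for illustration:

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


vocab = ['a', 'b', 'c']            # toy vocabulary (assumption)
probs = softmax([1.0, 3.0, 0.5])   # toy scores (assumption)
best = vocab[probs.index(max(probs))]
print(best)
```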

In [17]:
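# histogram_freq is a frequency in epochs and should be an integer (0 disables histograms).
# The callback only takes effect when passed to model.fit via callbacks=[tb_callback].
tb_callback = TensorBoard(log_dir=os.path.join(DATA_PATH, 'logs'),
                          histogram_freq=1,
                          write_graph=True,
                          write_images=True)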

In [18]:
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [19]:
model.fit(X, y, batch_size=100, validation_split=0.7, verbose=2)


Train on 488510 samples, validate on 1139858 samples
Epoch 1/10


KeyboardInterrupt: 
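
The sample counts in the training log follow from `validation_split=0.7`: Keras holds out the last 70% of the patterns for validation and trains on the first 30%. A sketch of the arithmetic, assuming the split index is computed as `int(n * (1 - validation_split))`, which matches the numbers reported above; `split_sizes` is an illustrative helper:

```python
def split_sizes(n_samples, validation_split):
    """Return (train size, validation size) for a fractional validation split."""
    # Assumption: the training portion is the first int(n * (1 - split)) samples.
    split_at = int(n_samples * (1.0 - validation_split))
    return split_at, n_samples - split_at


train_size, val_size = split_sizes(1628368, 0.7)
print(train_size, val_size)
```

If the intent is to train on most of the data, a value such as 0.3 would instead train on roughly 70% of the patterns.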