## Notes
### 08/12/18 [07:45 - 13:20]
So yeah, I might have done a mistake or two in the last implementation.... Between them, not realising that the input to Conv1D has a lot of channels and that I ~~could~~ should use a lot of filters.  
But that's all in the past, now that this little boy is up and learning and growing in a healthy manner.  
Now the first Conv1D uses 5 filters and the second one, 3. I could perhaps increase this someday.  
Not only that, but now it also saves it weights during training (I had done this in another implementation, but I forgot to commit that one, so now it's in this one).  
And now on to some notes for my future self, I was asing myself if it would be possible to get the 16 dimensional vector produced by Embedding and use Conv2D on it. It seems to be possible with the aid of keras.layers.Reshape. The idea would be to simply add a single dimension to the end of Embedding, so it would become 4D, allowing it to be used as input to Conv2D. I might do that in a separate file though, so that I'm able to compare both results.
Another possible improvement might be to shuffle the data. I didn't do so because there data order is relevant in this data set, but who knows what might happen?  
Also, here goes a complete log of the changes, in chronological order:
- Changed the parameter filter in the first Conv1D to 5
- Changed the parameter filter in the second Conv2D to 3
- Changed metrics to 'acc', as there is no implementation of sparse_categorical_accuracy in tf.keras
- Added saving cell, with model.save_weights
- Added ModelCheckpoint callback, saving only weights
- Added model saving to saving cell
- Changed the ModelCheckpoint parameter save_only_weights to False, as to resume training in later ocasions
- Also, increased periods in ModelCheckpoint
- Changed the checkpoint path to include data and time of checkpoint
- Gave names to the layers
- Added loading model cell  

On a side note, loss just went nuts


### Some time in the past:
I have yet to conclude training the model, but I was to excited to share it finally working after a long time.  
On a side note, however, it does seem to overfit before the end of even the first epoch, but we need to conclude a whole training session before jumping to any conclusions


# Word Prediction using TDNN implemented in tf.keras
This notebook is an attempt at implementing a Time Delay Neural Network for word prediction in the ptb dataset

In [57]:
# Imports
from __future__ import print_function
import tensorboard as tf
from tensorflow import keras
import reader
import numpy as np
import os
from time import strftime, gmtime
import pathlib
from pathlib import PosixPath

In [33]:
# Constants
window_size = 20 # defines the past lookup for determining the following word
path = "data/simple-examples/data"
checkpoint_path = "training/cp-{epoch:04d}_" + strftime("%H-%M_%d-%m-%Y", gmtime()) + ".ckpt"
checkpoint_dir = os.path.dirname

The following cell obtain the data using the reader.py file

In [4]:
train_data, valid_data, test_data, vocab_size, word_to_id = reader.ptb_raw_data(path)
x_train = train_data[:-1]
x_train = [np.asarray(x_train[i:i+window_size]) for i in range(len(x_train)-window_size)]
x_train = np.asarray(x_train)
y_train = np.asarray(train_data[1:-window_size])
#y_train = keras.preprocessing.text.one_hot(y_train, vocab_size)
x_valid = valid_data[:-1]
x_valid = [np.asarray(x_valid[i:i+window_size]) for i in range(len(x_valid)-window_size)]
x_valid = np.asarray(x_valid)
y_valid = valid_data[1:-window_size]
y_valid = np.asarray(y_valid)
x_test = test_data[:-1]
x_test = [np.asarray(x_test[i:i+window_size]) for i in range(len(x_test)-window_size)]
x_test = np.asarray(x_test)
y_test = test_data[1:-window_size]
y_test = np.asarray(y_test)
id_to_word = {value: key for (key, value) in word_to_id.items()}

Following next, we have an auxiliary function which decodes the ids and give us the original sentences

In [5]:
def decode_text(text):
    return ' '.join([id_to_word.get(i, '?') for i in text])

In [6]:
decode_text (x_train[40])

'<eos> mr. <unk> is chairman of <unk> n.v. the dutch publishing group <eos> rudolph <unk> N years old and former'

In [7]:
print(x_train.shape)
print(y_train.shape)

(929568, 20)
(929568,)


As we can see, each index of the input has a length of 20 words, as defined in _window-length_
<br>
With our data already processed, we can finally create our model.

In [41]:
# First time instantiation
model = keras.Sequential()
model.add(keras.layers.Embedding(vocab_size, 16, input_length = window_size, name = "Embedding_1"))
#model.add(keras.layers.Flatten())
model.add(keras.layers.Conv1D(filters = 5, kernel_size = 3, padding = "same", activation = keras.activations.tanh, name = "Conv1D_1"))
model.add(keras.layers.Dropout(0.2, name = "Droupout_1"))
model.add(keras.layers.Conv1D(filters = 3, kernel_size = 3, padding = "same", activation = keras.activations.tanh, name = "Conv2D_2"))
model.add(keras.layers.Dropout(0.25, name = "Dropout_2"))
model.add(keras.layers.Flatten(name = "Flatten_1"))
model.add(keras.layers.Dense(vocab_size, activation = keras.activations.softmax, name = "Dense_1"))

In [70]:
# Model loading - loads the most recent model recorded manually. Feel free to change
# Sort the checkpoints by modification time.
checkpoints = pathlib.Path("./models").glob("*")
checkpoints = sorted(checkpoints, key=lambda cp:cp.stat().st_mtime)
checkpoints = [cp.with_suffix('') for cp in checkpoints]
latest = str(checkpoints[-1])
checkpoints
model = keras.models.load_model(latest)

In [71]:
print (model.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 20, 16)            160000    
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 20, 5)             245       
_________________________________________________________________
dropout_1 (Dropout)          (None, 20, 5)             0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 20, 3)             48        
_________________________________________________________________
dropout_2 (Dropout)          (None, 20, 3)             0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 60)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 10000)             610000    
Total para

In [13]:
model.compile(
    loss = keras.losses.sparse_categorical_crossentropy,
    optimizer = keras.optimizers.Adadelta(),
    metrics = ['acc'] #keras.metrics.categorical_accuracy] # remember to later change to sparse_categorical_accuracy (this is the cause for strange eval)
)

In [72]:
# Checkpoint callback
cp_callback = keras.callbacks.ModelCheckpoint(
    checkpoint_path, verbose = 1, save_weights_only = False, period = 3) # Also, later change to save_weights_only = false, and perhaps increase period to 3 or 4. This will allow us to later resume training from where we left

In [73]:
model.fit(x_train, y_train,
          epochs = 12,
          verbose = 1,
          validation_data = (x_valid, y_valid),
          shuffle = False,
          callbacks = [cp_callback]
)

Train on 929568 samples, validate on 73739 samples
Epoch 1/12
Epoch 2/12
Epoch 3/12

Epoch 00003: saving model to training/cp-000315-20_12-08-2018.ckpt
Epoch 4/12
Epoch 5/12
Epoch 6/12

Epoch 00006: saving model to training/cp-000615-20_12-08-2018.ckpt
Epoch 7/12
Epoch 8/12
Epoch 9/12

Epoch 00009: saving model to training/cp-000915-20_12-08-2018.ckpt
Epoch 10/12
Epoch 11/12
Epoch 12/12

Epoch 00012: saving model to training/cp-001215-20_12-08-2018.ckpt


<tensorflow.python.keras._impl.keras.callbacks.History at 0x7fc176af7b00>

In [26]:
# I expect to be able to run this someday
score = model.evaluate(x_test, y_test, verbose = 0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 4.126658921275957
Test accuracy: 0.4793044449057557


In [35]:
model.save_weights("./model_weights/weights_" + strftime("%H-%M_%d-%m-%Y", gmtime()))
model.save("./models/model_" + strftime("%H-%M_%d-%m-%Y", gmtime())) # This allows us to resume training, since Adadelta has adaptive parameters