## LSTM Captioning

This is a very basic model: 

*  Take the featurized images (2048d) and the tokenised captions
*  Add a (trainable) dense layer mapping the features to 50d
*  Use a 50d GloVe embedding (for the LSTM inputs, non-trainable)
   *  Number of stop-words ~ 250 (say)
*  Use 50 hidden units for the LSTM
*  Have a 'pluggable' output transform:
   *  Concat: (256 one-hot, including '0'=mask, '1'={UNK}, '2'={START}, '3'={STOP}, '4'={UseOther})
   *  (a) UseOther + (8192-250 more one-hot entries)
   *  (b) UseOther + (50d of the same GloVe embedding, for nearest-neighbour lookup)
   *  (c) UseOther + (log2(8192)==13 bits, plus error correction, encoding the index of the word)
*  Want to monitor some kind of score over time for test cases
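
Option (c) above can be illustrated with a minimal sketch of the 13-bit index code. The helper names `index_to_bits` and `bits_to_index` are hypothetical (they are not part of this notebook), the error-correction bits are left out, and the "noisy" vector merely stands in for imperfect network outputs:

```python
import numpy as np

VOCAB_SIZE = 8192                          # vocabulary size from the notes above
N_BITS = (VOCAB_SIZE - 1).bit_length()     # 13 bits cover indices 0..8191

def index_to_bits(index, n_bits=N_BITS):
    """Encode a word index as a float32 vector of binary digits (lowest bit first)."""
    if not 0 <= index < 2 ** n_bits:
        raise ValueError("index out of range for %d bits" % n_bits)
    return np.array([(index >> b) & 1 for b in range(n_bits)], dtype='float32')

def bits_to_index(outputs, threshold=0.5):
    """Threshold outputs in [0, 1] and rebuild the integer word index."""
    bits = np.asarray(outputs) > threshold
    return int(sum(1 << b for b, bit in enumerate(bits) if bit))

# Assumption: stand-in for network outputs, each bit perturbed by less than 0.5
target = index_to_bits(4321)
noisy = np.clip(target + np.random.uniform(-0.4, 0.4, size=target.shape), 0.0, 1.0)
recovered = bits_to_index(noisy)
print(N_BITS, recovered)
```

Without error correction, a single bit that crosses the threshold decodes to an unrelated word, which is presumably why the notes pair the 13 bits with extra error-correction bits.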
   

In [None]:
import os

import numpy as np


In [None]:
glove_dir = './data/RNN/'
glove_100k_50d = 'glove.first-100k.6B.50d.txt'
glove_100k_50d_path = os.path.join(glove_dir, glove_100k_50d)

# cd data; ln -s ../../../data/RNN .
if not os.path.isfile(glove_100k_50d_path):
    raise RuntimeError("You need to download the GloVe embeddings: "
                       "use the downloader in 5-Text-Corpus-and-Embeddings.ipynb")
else:
    print("GloVe available locally")

In [None]:
# Due to size constraints, only use the first 100k vectors (i.e. 100k most frequently used words)
import glove
word_embedding = glove.Glove.load_stanford( glove_100k_50d_path )
word_embedding.word_vectors.shape
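
For reference, the Stanford GloVe text format stores one word per line, followed by its space-separated vector components. The sketch below parses that format by hand; it is not the `glove` library's own loader, `read_glove_text` is a hypothetical helper, and the two 3d vectors are made-up sample data:

```python
import io
import numpy as np

# Assumption: tiny made-up sample in the Stanford text layout ("word v1 v2 ... vN")
sample = io.StringIO(
    "the 0.1 0.2 0.3\n"
    "cat -0.5 0.0 0.25\n"
)

def read_glove_text(handle):
    """Return (word -> row index dict, float32 matrix) from a GloVe-style text stream."""
    words, rows = [], []
    for line in handle:
        parts = line.rstrip().split(' ')
        if len(parts) < 2:
            continue                      # skip blank or malformed lines
        words.append(parts[0])
        rows.append([float(x) for x in parts[1:]])
    dictionary = {w: i for i, w in enumerate(words)}
    return dictionary, np.array(rows, dtype='float32')

dictionary, vectors = read_glove_text(sample)
print(vectors.shape, dictionary)
```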

In [None]:
EMBEDDING_DIM = word_embedding.word_vectors.shape[1]

word_embedding_rnn = np.vstack([ 
        np.zeros( (1, EMBEDDING_DIM,), dtype='float32'),   # This is the 'zero' value (used as a mask in Keras)
        np.zeros( (1, EMBEDDING_DIM,), dtype='float32'),   # This is for 'UNK'  (word == 1)
        word_embedding.word_vectors,
    ])
word_embedding_rnn.shape
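
Because two rows are stacked in front of the GloVe vectors, every dictionary index shifts by 2 when used as an RNN input index. A minimal sketch of that mapping, using a made-up three-word dictionary in place of the real one (`toy_dictionary`, `toy_vectors` and `words_to_rnn_indices` are hypothetical names):

```python
import numpy as np

# Assumption: toy stand-ins for the real word -> row dictionary and word vectors
toy_dictionary = {'the': 0, 'cat': 1, 'sat': 2}
toy_vectors = np.arange(12, dtype='float32').reshape(3, 4) + 1.0

MASK_IDX, UNK_IDX, OFFSET = 0, 1, 2        # rows 0 and 1 are reserved, as above

toy_rnn = np.vstack([
    np.zeros((OFFSET, 4), dtype='float32'),   # mask row and UNK row
    toy_vectors,
])

def words_to_rnn_indices(words, dictionary):
    """Map words to rows of the stacked matrix; unknown words map to UNK_IDX."""
    return [dictionary[w] + OFFSET if w in dictionary else UNK_IDX for w in words]

indices = words_to_rnn_indices(['the', 'dog', 'sat'], toy_dictionary)
looked_up = toy_rnn[indices]               # one embedding row per input word
print(indices)                             # [2, 1, 4]
```

Index 0 is never produced for a real word, so it stays free for padding or masking.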

In [None]:
#[ (i,w) for i,w in enumerate(word_embedding.inverse_dictionary.values()) ][0:300]