## Example RNN implementation using Keras
**Credit: Chenhao Tan**, code obtained from [CSCI 4622](https://github.com/BoulderDS/CSCI-4622-Machine-Learning-18fa) at CU Boulder

The dataset comes from the UCI Machine Learning Repository. It consists of SMS messages collected for SMS spam research, each tagged as ham (legitimate) or spam.

Use [Keras](https://keras.io/) to implement a classifier (you need to install Keras first). Update the snippet below to build a Sequential model with an embedding layer and an LSTM layer, followed by a dense layer. This exercise is meant to familiarize you with popular deep learning toolkits, and the solution takes only a few lines. In practice, there is no need to reinvent the wheel.

Learn more about RNNs and LSTMs: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
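
For intuition about what an LSTM layer computes at each time step, here is a minimal NumPy sketch of a single LSTM cell update. It is an illustration under stated assumptions, not Keras's implementation: the function name `lstm_step`, the gate ordering inside the weight matrices, and the random weights are all choices made for this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (hypothetical helper for illustration).

    W: input weights, shape (input_dim, 4 * units)
    U: recurrent weights, shape (units, 4 * units)
    b: biases, shape (4 * units,)
    The gate order (input, forget, candidate, output) is an assumption
    of this sketch.
    """
    units = h_prev.shape[-1]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[..., :units])               # input gate
    f = sigmoid(z[..., units:2 * units])      # forget gate
    g = np.tanh(z[..., 2 * units:3 * units])  # candidate cell values
    o = sigmoid(z[..., 3 * units:])           # output gate
    c_t = f * c_prev + i * g                  # new cell state
    h_t = o * np.tanh(c_t)                    # new hidden state
    return h_t, c_t


# Run the cell over a short random sequence of 32-dimensional "embeddings".
rng = np.random.default_rng(0)
input_dim, units, steps = 32, 32, 5
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

h = np.zeros(units)
c = np.zeros(units)
for x_t in rng.normal(size=(steps, input_dim)):
    h, c = lstm_step(x_t, h, c, W, U, b)

print(h.shape)  # (32,)
```

The final hidden state `h` plays the role of the fixed-size summary of the sequence that a dense layer can then classify.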

Note: TensorFlow was not supported on Python 3.7 when this notebook was written, so use a Python virtual environment with Python 3.6.

In [4]:
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM
from keras.preprocessing import sequence
from keras.preprocessing.text import Tokenizer


class RNN:
    '''
    RNN classifier
    '''

    def __init__(self, train_x, train_y, test_x, test_y, dict_size=5000,
                 example_length=150, embedding_length=32, epoches=5, batch_size=128):
        '''
        initialize RNN model
        :param train_x: training data
        :param train_y: training labels
        :param test_x: test data
        :param test_y: test labels
        :param dict_size: vocabulary size used by the tokenizer and embedding layer
        :param example_length: length to which each example is padded or truncated
        :param embedding_length: size of word embedding
        :param epoches: number of epochs to run
        :param batch_size: batch size in training
        '''
        self.batch_size = batch_size
        self.epoches = epoches
        self.example_len = example_length
        self.dict_size = dict_size
        self.embedding_len = embedding_length

        # preprocess training data
        tok = Tokenizer(num_words=dict_size)
        tok.fit_on_texts(train_x)
        sequences = tok.texts_to_sequences(train_x)
        self.train_x = sequence.pad_sequences(
            sequences, maxlen=self.example_len)
        sequences = tok.texts_to_sequences(test_x)
        self.test_x = sequence.pad_sequences(
            sequences, maxlen=self.example_len)

        self.train_y = train_y
        self.test_y = test_y
        
        print(train_x)

        # TODO: build model with Embedding, LSTM and dense layers.
        # refer to Sequence classification with LSTM : https://keras.io/getting-started/sequential-model-guide/#examples
        # Documentation for LSTM layer in : https://keras.io/layers/recurrent/#lstm
        self.model = Sequential()
        # YOUR CODE HERE
        self.model.add(Embedding(self.dict_size, self.embedding_len, input_length=self.example_len))
        self.model.add(LSTM(self.embedding_len))
        self.model.add(Dense(1, activation='sigmoid'))
        self.model.compile(loss='binary_crossentropy',
                           optimizer='adam', metrics=['accuracy'])
        
        # summary() prints the table itself and returns None
        self.model.summary()

    def train(self, verbose=0):
        '''
        fit the model to the training data: refer to the fit method in https://keras.io/models/model/
        make sure you use batch_size and epochs appropriately.
        :param verbose: verbosity mode passed to Keras fit
        :return: None
        '''
        # TODO: fit in data to train your model
        # YOUR CODE HERE
        self.model.fit(self.train_x, self.train_y, batch_size=self.batch_size,
                       epochs=self.epoches, verbose=verbose)

    def evaluate(self):
        '''
        evaluate the trained model on the training data: refer to evaluate in https://keras.io/models/model/
        :return: [loss, accuracy]
        '''
        # YOUR CODE HERE
        return self.model.evaluate(self.train_x, self.train_y)
    
    def evaluate_test(self):
        '''
        evaluate the trained model on the test data
        :return: [loss, accuracy]
        '''
        return self.model.evaluate(self.test_x, self.test_y)
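
The preprocessing in `__init__` turns each message into a fixed-length vector of word indices. The pure-Python sketch below mimics that idea so the shapes are easier to picture. It is a simplification, not the Keras code: the helpers `fit_word_index`, `to_sequences`, and `pad_pre` are hypothetical names, the toy sentences are made up, and punctuation filtering is omitted. It assumes the Keras defaults of frequency-ranked indices starting at 1, left padding with zeros, and truncation from the front.

```python
from collections import Counter


def fit_word_index(texts):
    """Rank words by frequency; the most frequent word gets index 1.

    Index 0 is left unused so it can serve as the padding value.
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    ranked = counts.most_common()
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}


def to_sequences(texts, word_index, num_words):
    """Replace words by indices, dropping unknown and too-rare words."""
    return [
        [word_index[w] for w in text.lower().split()
         if w in word_index and word_index[w] < num_words]
        for text in texts
    ]


def pad_pre(seqs, maxlen):
    """Keep the last `maxlen` indices and left-pad shorter rows with 0."""
    padded = []
    for seq in seqs:
        seq = seq[-maxlen:]
        padded.append([0] * (maxlen - len(seq)) + seq)
    return padded


texts = ["free prize call now", "call me now", "see you now"]
word_index = fit_word_index(texts)
seqs = to_sequences(texts, word_index, num_words=4)
padded = pad_pre(seqs, maxlen=3)
print(padded)  # [[3, 2, 1], [0, 2, 1], [0, 0, 1]]
```

Every row has the same length, which is what lets the embedding layer receive the data as a single 2-D integer array.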

In [2]:
import pickle
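def load_data(location):
    # load the pickled (train_x, test_x, train_y, test_y) tuple and close the file afterwards
    with open(location, 'rb') as f:
        return pickle.load(f)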

In [None]:
train_x, test_x, train_y, test_y = load_data('./data/spam_data.pkl')
rnn = RNN(train_x, train_y, test_x, test_y, epoches=5)
rnn.train(verbose=1)
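# evaluate() returns [loss, accuracy] measured on the training data
train_loss, train_accuracy = rnn.evaluate()
print('Training accuracy for LSTM: ', train_accuracy)
# evaluate_test() returns [loss, accuracy] measured on the held-out test data
test_loss, test_accuracy = rnn.evaluate_test()
print('Test accuracy for LSTM: ', test_accuracy)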
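
As a sanity check on the model summary, the trainable parameter counts can be worked out by hand. The sketch below assumes the default sizes from `__init__` (`dict_size=5000`, `embedding_length=32`), a standard LSTM with biases and no peephole connections, and a single sigmoid output unit. The helper names are hypothetical.

```python
def embedding_params(vocab_size, embed_dim):
    # one embed_dim-sized vector per vocabulary index
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    # four gates, each with input weights, recurrent weights, and a bias
    return 4 * (input_dim * units + units * units + units)


def dense_params(input_dim, outputs):
    # weight matrix plus one bias per output
    return input_dim * outputs + outputs


dict_size, embedding_len = 5000, 32
counts = {
    "embedding": embedding_params(dict_size, embedding_len),
    "lstm": lstm_params(embedding_len, embedding_len),
    "dense": dense_params(embedding_len, 1),
}
total = sum(counts.values())
print(counts)  # {'embedding': 160000, 'lstm': 8320, 'dense': 33}
print(total)   # 168353
```

Almost all of the parameters sit in the embedding table, so `dict_size` and `embedding_length` dominate the model size.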