# L10 - RNN

## Author - Rodolfo Lerma

# Problem:
Using the Keras dataset, create a new notebook and perform each of the following data preparation tasks and answer the related questions:

- Read the Reuters dataset into training and testing sets
- Prepare the dataset
- Build and compile 3 different models using Keras LSTM, ideally improving the model at each iteration.
- Describe and explain your findings.

# Abstract:
Your next generation search engine startup was successful in having the ability to search for images based on their content. As a result, the startup received its second round of funding to be able to search news articles based on their topic. As the lead data scientist, you are tasked to build a model that classifies the topic of each article or newswire. 

For this assignment, you will leverage the RNN_KERAS.ipynb lab in the lesson. You are tasked to use the Keras Reuters newswire topics classification dataset. This dataset contains 11,228 newswires from Reuters, labeled with 46 topics. Each wire is encoded as a sequence of word indexes. For convenience, words are indexed by overall frequency in the dataset, so that for instance the integer "3" encodes the 3rd most frequent word in the data. This allows for quick filtering operations such as: "only consider the top 10,000 most common words, but eliminate the top 20 most common words". As a convention, "0" does not stand for a specific word. With the default `load_data` arguments it is reserved for padding, "1" marks the start of a sequence, and "2" encodes any unknown word.

The analysis is divided as follows:

### Data Exploration
- **Looking at an example**


### Analysis
- **Processing the data**
- **Training variables (*hyperparameters*)**
- **RNN Model**
    - Base Model
    - 2nd Model
    - 3rd Model
- **Results**

        
### Summary of Findings

# Data Exploration 

In [1]:
# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import reuters

#import numpy and matplotlib
import numpy as np
import matplotlib.pyplot as plt

In [2]:
data = tf.keras.datasets.reuters

In [3]:
num_of_words = 10000
test_split_size = 0.2
(x_train, y_train), (x_test, y_test) = data.load_data(
    path="reuters.npz",
    num_words=num_of_words,
    test_split=test_split_size)



First look at the data:

In [4]:
print(x_train[0])

[1, 2, 2, 8, 43, 10, 447, 5, 25, 207, 270, 5, 3095, 111, 16, 369, 186, 90, 67, 7, 89, 5, 19, 102, 6, 19, 124, 15, 90, 67, 84, 22, 482, 26, 7, 48, 4, 49, 8, 864, 39, 209, 154, 6, 151, 6, 83, 11, 15, 22, 155, 11, 15, 7, 48, 9, 4579, 1005, 504, 6, 258, 6, 272, 11, 15, 22, 134, 44, 11, 15, 16, 8, 197, 1245, 90, 67, 52, 29, 209, 30, 32, 132, 6, 109, 15, 17, 12]


In [5]:
np.shape(x_train)

(8982,)

As expected, the sample contains integers rather than words, because each word has been replaced by its frequency-based index.

In [6]:
print('# of Training Samples: {}'.format(len(x_train)))
print('# of Test Samples: {}'.format(len(x_test)))

num_classes = max(y_train) + 1
print('# of Classes: {}'.format(num_classes))

# of Training Samples: 8982
# of Test Samples: 2246
# of Classes: 46


In [7]:
word_index = tf.keras.datasets.reuters.get_word_index()
# word_index = tf.keras.datasets.reuters.get_word_index(path="reuters_word_index.json")

In [8]:
word_index["<UNK>"] = 0

reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

def decode_review(text):
    return ' '.join([reverse_word_index.get(i, '?') for i in text])

## Looking at an example

In [9]:
print('\nFirst Review \n')
print(decode_review(x_train[0]))
print('\nIts label :',y_train[0])


First Review 

the of of mln loss for plc said at only ended said commonwealth could 1 traders now april 0 a after said from 1985 and from foreign 000 april 0 prices its account year a but in this mln home an states earlier and rise and revs vs 000 its 16 vs 000 a but 3 psbr oils several and shareholders and dividend vs 000 its all 4 vs 000 1 mln agreed largely april 0 are 2 states will billion total and against 000 pct dlrs

Its label : 3
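
The decoded text above reads as scrambled. A likely cause, assuming the default `load_data` arguments (`start_char=1`, `oov_char=2`, `index_from=3`), is that every word index in the sequences is shifted by 3 relative to the mapping returned by `get_word_index()`. The sketch below illustrates the offset with a toy vocabulary; `toy_word_index`, `SPECIAL`, and `decode_newswire` are hypothetical names used only for illustration.

```python
# Toy stand-in for reuters.get_word_index(); the real mapping is much larger.
toy_word_index = {"the": 1, "of": 2, "in": 3, "said": 4, "and": 5}

INDEX_FROM = 3  # assumption: the load_data default offset
SPECIAL = {0: "<PAD>", 1: "<START>", 2: "<UNK>"}  # assumption: default reserved indices

reverse_toy_index = {idx: word for word, idx in toy_word_index.items()}

def decode_newswire(encoded):
    """Map encoded indices back to words, undoing the index offset."""
    words = []
    for i in encoded:
        if i in SPECIAL:
            words.append(SPECIAL[i])
        else:
            words.append(reverse_toy_index.get(i - INDEX_FROM, "?"))
    return " ".join(words)

sample = [1, 4, 7, 2, 8]
decoded = decode_newswire(sample)
print(decoded)
```

Applied to the real data, the same idea would amount to looking up `i - 3` in `reverse_word_index` instead of `i`.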


# Analysis

## Processing the data

In [10]:
max_review_length = 256
x_train_padded = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_length)
x_test_padded  = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_length)
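
With its default settings (`padding='pre'`, `truncating='pre'`), `pad_sequences` adds zeros at the front of short sequences and drops the beginning of long ones. The NumPy sketch below mimics that behavior to make the effect visible; `pad_pre` is a hypothetical helper, not the Keras implementation.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences and keep the last `maxlen` tokens of long ones."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        if len(seq) == 0:
            continue  # an empty sequence stays all padding
        trunc = seq[-maxlen:]            # 'pre' truncation drops the beginning
        out[row, -len(trunc):] = trunc   # 'pre' padding puts the fill value in front
    return out

padded = pad_pre([[5, 6], [1, 2, 3, 4, 5, 6]], maxlen=4)
print(padded)
```

One consequence for this dataset is that any newswire longer than 256 tokens loses its opening words, including the start marker.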

In [11]:
# One-hot encode the integer labels. Despite the `_sparse` suffix, the results are dense one-hot matrices.
classes = 46
y_train_sparse = keras.utils.to_categorical(y_train, num_classes=classes)
y_test_sparse = keras.utils.to_categorical(y_test, num_classes=classes)

In [14]:
y_train_sparse

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
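
The output above is almost entirely zeros because each row holds a single 1 among 46 columns. The NumPy sketch below shows the same encoding on three example labels; `one_hot` is a hypothetical helper that should be equivalent in effect to `to_categorical` for integer labels.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (n_samples, num_classes) float matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype="int64")
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

demo = one_hot([3, 0, 45], num_classes=46)
print(demo.shape)            # one row per label, one column per class
print(demo.argmax(axis=1))   # argmax recovers the original integer labels
```

If the integer labels were kept as they are, the `sparse_categorical_crossentropy` loss could be used instead of `categorical_crossentropy`, which avoids building the one-hot matrix.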

## Training variables (*hyperparameters*)

In [12]:
# Training variables
learning_rate = 0.005
learning_rate_decay = 0.0001
batch_size = 512
epochs = 50  # affordable because the dataset is small; more epochs may improve the fit at a higher compute cost

# vocabulary size used when loading the newswires (top 10,000 words)
vocab_size = 10000
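
`learning_rate` and `learning_rate_decay` are defined here, but the models in this notebook are compiled with the default Adam settings, so these two values are never applied. The sketch below shows what they would imply, assuming the intent was inverse-time decay of the form `lr0 / (1 + decay * step)`; `decayed_lr` and `train_samples` are hypothetical names.

```python
import math

learning_rate = 0.005
learning_rate_decay = 0.0001
batch_size = 512
epochs = 50
train_samples = 8982  # training set size reported earlier in the notebook

# Number of optimizer updates per epoch and over the whole run
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

def decayed_lr(step):
    """Inverse-time decay (assumed schedule): lr0 / (1 + decay * step)."""
    return learning_rate / (1.0 + learning_rate_decay * step)

print(steps_per_epoch, total_steps)
print(decayed_lr(0), decayed_lr(total_steps))
```

Under these assumptions the learning rate would shrink by less than 10% over the full run, so the decay term would have only a mild effect.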

## RNN Model

### Base Model

In [None]:
# Pad to 1000 tokens for the dense baseline (the other models use 256)
x_train_padded1 = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=1000)
x_test_padded1  = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=1000)

# Fully connected baseline: the padded word indices are fed directly as numeric features
model = keras.Sequential()
model.add(keras.layers.Dense(1000, input_shape=(1000,)))  # input_shape is only needed on the first layer
model.add(keras.layers.Activation('relu'))
model.add(keras.layers.Dropout(0.2))
model.add(keras.layers.Dense(512))
model.add(keras.layers.Activation('relu'))
model.add(keras.layers.Dropout(0.2))
model.add(keras.layers.Dense(256))
model.add(keras.layers.Activation('relu'))
model.add(keras.layers.Dropout(0.2))
model.add(keras.layers.Dense(num_classes))
model.add(keras.layers.Activation('softmax'))

model.summary()

# Model Compilation
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])

model_history = model.fit(
    x_train_padded1,
    y_train_sparse,
    epochs=10,
    batch_size=32,
    validation_data=(x_test_padded1, y_test_sparse),
    verbose=1)

scores = model.evaluate(x_test_padded1, y_test_sparse, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))
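
The baseline above passes padded word indices to the `Dense` layers as if they were numeric magnitudes, so index 4579 is treated as far larger than index 5 even though both are only identifiers. One common alternative input for a dense baseline, not used in this notebook, is a multi-hot (bag-of-words) encoding. A minimal NumPy sketch, where `multi_hot` is a hypothetical helper:

```python
import numpy as np

def multi_hot(sequences, vocab_size):
    """Encode each sequence as a 0/1 vector marking which word indices occur."""
    encoded = np.zeros((len(sequences), vocab_size), dtype="float32")
    for row, seq in enumerate(sequences):
        # Repeated indices simply set the same position to 1 again
        encoded[row, np.asarray(seq, dtype="int64")] = 1.0
    return encoded

demo = multi_hot([[1, 3, 3], [0, 2]], vocab_size=5)
print(demo)
```

With this representation the first `Dense` layer would take `input_shape=(vocab_size,)` rather than the padded sequence length. Word order is discarded, which is the trade-off against the sequence models that follow.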

### 2nd Model

In [None]:
embedding_vector_length = 512
max_review_length = 256  # must match the maxlen used to build x_train_padded

model1 = keras.Sequential()
model1.add(keras.layers.Embedding(vocab_size, embedding_vector_length, input_length=max_review_length))
model1.add(keras.layers.LSTM(100, return_sequences=True))
model1.add(keras.layers.LSTM(100, return_sequences=True))
model1.add(keras.layers.LSTM(100))
model1.add(keras.layers.Dense(46, activation = 'softmax'))

model1.summary()

# optimizer
optimizer = keras.optimizers.Adam()

# Model Compilation
model1.compile(optimizer=optimizer,loss='categorical_crossentropy',metrics=['accuracy'])

model_history1 = model1.fit(
    x_train_padded,
    y_train_sparse,
    epochs=epochs,
    batch_size=batch_size,
    validation_data=(x_test_padded, y_test_sparse),
    verbose=1)

scores1 = model1.evaluate(x_test_padded, y_test_sparse, verbose=0)
print("Accuracy: %.2f%%" % (scores1[1]*100))
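
To get a feel for the size of the stacked LSTM model, the trainable parameters can be counted by hand. The sketch below assumes default layer settings (biases enabled, one bias vector per gate) and the layer sizes used above; the helper names are hypothetical, and the result should be checked against `model1.summary()`.

```python
def embedding_params(vocab_size, dim):
    # One dense vector of length `dim` per vocabulary entry
    return vocab_size * dim

def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit
    return input_dim * units + units

counts = {
    "embedding": embedding_params(10000, 512),
    "lstm_1": lstm_params(512, 100),
    "lstm_2": lstm_params(100, 100),
    "lstm_3": lstm_params(100, 100),
    "dense": dense_params(100, 46),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:10s} {n:>10,d}")
print(f"{'total':10s} {total:>10,d}")
```

Most of the parameters sit in the embedding layer. The training cost of the LSTM layers comes mainly from processing each sequence one time step at a time rather than from their parameter count.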

### 3rd Model

In [None]:
embedding_vector_length = 512
max_review_length = 256  # must match the maxlen used to build x_train_padded

model2 = keras.Sequential()
model2.add(keras.layers.Embedding(vocab_size, embedding_vector_length, input_length=max_review_length))
model2.add(keras.layers.GlobalAveragePooling1D())
model2.add(keras.layers.Dense(500, activation = 'relu'))
model2.add(keras.layers.Dropout(0.2))
model2.add(keras.layers.Dense(500, activation = 'relu'))
model2.add(keras.layers.Dropout(0.2))
model2.add(keras.layers.Dense(46, activation = 'softmax'))

model2.summary()

# Model Compilation
model2.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])

model_history2 = model2.fit(
    x_train_padded,
    y_train_sparse,
    epochs=epochs,
    batch_size=batch_size,
    validation_data=(x_test_padded, y_test_sparse),
    verbose=1)

scores2 = model2.evaluate(x_test_padded, y_test_sparse, verbose=0)
print("Accuracy: %.2f%%" % (scores2[1]*100))
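
The third model replaces the recurrent layers with `GlobalAveragePooling1D`, which averages the embedding vectors over the time axis. The NumPy sketch below illustrates the operation on random data standing in for the embedding output; the shapes are assumptions based on a sequence length of 256 and an embedding size of 512.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, features = 2, 256, 512

# Stand-in for the Embedding layer output: (batch, time steps, embedding size)
embedded = rng.normal(size=(batch, steps, features)).astype("float32")

# Global average pooling: mean over the time axis
pooled = embedded.mean(axis=1)
print(embedded.shape, "->", pooled.shape)

# Shuffling the time steps leaves the pooled vector unchanged (up to rounding)
shuffled = embedded[:, rng.permutation(steps), :]
order_invariant = np.allclose(pooled, shuffled.mean(axis=1), atol=1e-5)
print(order_invariant)
```

The pooled representation ignores word order entirely, and without masking it also averages over the padding positions. In exchange, it is much cheaper to compute than stepping an LSTM through every token.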

## Results

In [None]:
scores = model2.evaluate(x_test_padded, y_test_sparse, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

In [None]:
import pandas as pd
import seaborn as sns

# Predicted class = index of the largest softmax probability
y_prob = model2.predict(x_test_padded)
y_pred = pd.Series(np.argmax(y_prob, axis=1), name='Predicted')
y_true = pd.Series(y_test, name='Actual')  # keep y_test itself unchanged
df_confusion = pd.crosstab(y_true, y_pred, rownames=['Actual'], colnames=['Predicted'])
#print(df_confusion)
plt.figure(figsize=(20, 20))
sns.heatmap(df_confusion, annot=True, fmt="d")
# Label the axes after drawing so the heatmap does not overwrite them
plt.xlabel('Predicted', fontsize=18)
plt.ylabel('Actual', fontsize=18)
plt.title('Confusion Matrix', fontsize=20)
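
A 46 by 46 heatmap is hard to read at a glance, so it can help to summarize the confusion matrix as per-class recall. The sketch below uses a small made-up example with three classes; `confusion_matrix`, `y_true_demo`, and `y_pred_demo` are hypothetical names, and the numbers are not results from this notebook.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are actual classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype="int64")
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

# Made-up labels for illustration only
y_true_demo = [0, 0, 1, 1, 1, 2]
y_pred_demo = [0, 1, 1, 1, 2, 2]

cm = confusion_matrix(y_true_demo, y_pred_demo, num_classes=3)
support = cm.sum(axis=1)                        # samples per actual class
recall = np.diag(cm) / np.maximum(support, 1)   # guard against empty classes
accuracy = np.trace(cm) / cm.sum()

print(cm)
print(recall, accuracy)
```

Sorting the classes by recall or by support would show whether the errors concentrate in the rare topics.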

# Summary of Findings

- The recurrent architecture used here, **long short-term memory (LSTM)**, had a marked effect on the accuracy of this classification model. Unlike standard feed-forward neural networks, an LSTM has feedback connections, so it can process not only single data points (such as images) but also entire sequences of data. The advantage is clear when compared with the standard deep neural network baseline, whose accuracy did not change from epoch to epoch.


- As discussed in class, tuning hyperparameters and choosing the right values is something of a *dark art*, and this example shows why. Choosing the right hyperparameters is critical, yet apart from some state-of-the-art methods currently under development, the best approach is still to consult experts on both the problem and the deep learning algorithm being used, and to test different scenarios by trial and error.


- Once again, working with these specialized deep learning algorithms requires some knowledge of the kind of problem being solved, as well as a very good understanding of the available data and the desired prediction. Working with a subject matter expert is therefore critical throughout an ML project.


- Although the LSTM model achieved a better result on this classification problem than the standard deep neural network, it was significantly more computationally expensive. In this particular case, another model provided a more accurate solution using far less computing power. This makes clear how important it is to explore different options and to work with experts throughout the ML project.