<a href="https://colab.research.google.com/github/hikmatfarhat-ndu/CSC645/blob/master/4zIMDB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Keras


In this notebook we will learn to use the high-level framework Keras for building neural networks. From now on this will be our default framework to solve ML problems

# Predicting Movie Reviews

In this exercise we are given a set of IMDB movie reviews and we train our model to predict other reviews. The output of the review is either positive or negative so it is a binary classification problem like the ship/not ship problem we dealt with before

In [1]:
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.datasets import imdb
import numpy as np
#import cupy as np
from keras.utils import to_categorical



## The data

The movie review dataset is a set of 50000 reviews of movies (half training, half test). Each review contains a set of words and is labeled positive (1) or negative (0). For convenience each word index refer to its frequency of occurence in the dataset. For example a word with index 5 is the fifth most frequently used data set. The indices 0,1 and 2 are reserved so 5 really means the third most frequent.

Details about the dataset can be found here [Keras IMDB](https://keras.io/api/datasets/imdb/)

In this exercise we choose only the first 10000 most frequent words to be included. Any word that is not among them is given the index 2.

First we load data set without omitting any words

### Data details

We would like to have an idea about the number of reviews, the average length of a review. Also we compute how many entries with values 0,1,2 and 3. The number 0 is used for padding and 1 to denote the beginning of each sequence. The number 2 is used for missing words. Finally, the number 3 is never used since as you will see later we will shift the indices by 3.

In [2]:
(x_train,y_train),(x_test,y_test)=tf.keras.datasets.imdb.load_data()

print("The number of reviews in the x_train data set = {}\n".format(x_train.shape[0]))
print("The average length of reviews = {}".format(np.mean([len(x) for x in x_train])))
print("With standard deviation = {}".format(np.std([len(x) for x in x_train])))
print("The number of 0's in the x_train data set = {}\n".format(sum([1 for x in np.hstack(x_train) if x==0])))
print("The number of 1's in the x_train data set = {}\n".format(sum([1 for x in np.hstack(x_train) if x==1])))
print("The number of 2's in the x_train data set = {}\n".format(sum([1 for x in np.hstack(x_train) if x==2])))
print("The number of 3's in the x_train data set = {}\n".format(sum([1 for x in np.hstack(x_train) if x==3])))

The number of reviews in the x_train data set = 25000

The average length of reviews = 238.71364
With standard deviation = 176.49367364852034
The number of 0's in the x_train data set = 0

The number of 1's in the x_train data set = 25000

The number of 2's in the x_train data set = 1

The number of 3's in the x_train data set = 0



Now when we choose only the first _max_words_ most frequent words and compute the number of 2's in the data set. As you can see the number of 2's is now very large since all the "ignored" words were given the code 2.

In [2]:
max_words=5000
(x_train,y_train),(x_test,y_test)=imdb.load_data(num_words=max_words)
print("The number of 2's in the x_train data set = {}\n".format(sum([1 for x in np.hstack(x_train) if x==2])))

The number of 2's in the x_train data set = 592372



### Word index

Keras provides also a dictionary of word to index. We don't need it for training but it helps us get an idea what the reviews are saying in plain English. From that we build a dictionary of index to words. We use the index_to_word to display the first review in the data set.

In [4]:

word_to_index=imdb.get_word_index()
index_to_word=dict([(key,val) for (val,key) in word_to_index.items()])
review = " ".join( [index_to_word.get(i - 3, "***") for i in x_train[0]] )
print(review)







Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
*** this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert *** is an amazing actor and now the same being director *** father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for *** and would recommend it to everyone to watch and the fly *** was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also *** to the two little *** that played the *** of norman and paul they were just brilliant children are often left out of the *** list i think because the stars that play them all grown up are such a big *** fo

## One hot Encoding

Even though this problem is similar to the image classification problem that we saw we cannot feed the word indices to our model directly. First, not all the reviews are the same length. Second, the same word in different reviews can occur at different position. For example the two reviews " that was a good movie" and "That movie was good" will be interpreted differently because the **same** words occur at different positions. So we need to do the following

1. Truncate or pad all reviews to contain the same number of words
1. Make sure the same word occurs at the same position in every review. This we do by using one hot encoding.

Suppose that our vocabulary contains only 3 words labeled 1,2 and 3. Further, suppose that two reviews have the values [2,1]  and [1,2] respectively then the one hot representation of both is the same:
[1,1,0] i.e. words 1 and 2 are present but 3 is missing

**NOTE** our representation is not very efficient. A better approach is to use **word embedding** which we don't use here.



## Implementation Details

The indices of a numpy tensor can be arrays. For example suppose that we are given an array A of size _n_ filled with zeros and we need to set the values at positions 1,13,27 with 1. We can perform the operation in one statement as A[[1,13,27]]=1. Knowing that the function one_hot below goes through every review which is a array of indices and sets the corresponding positions to 1.

In [3]:


def one_hot(reviews, nb_words = max_words):
 res = np.zeros((len(reviews), nb_words))
 for i, review in enumerate(reviews):
  res[i, review] = 1
 return res

x_train_one_hot=one_hot(x_train)
x_test_one_hot=one_hot(x_test)
print(x_train.shape)


(25000,)


### Keras Model

A Keras __model__ can be build from component __layers__. Later one we will look at the __functional__ API in Keras. Now, for simplicity, we will use the __Sequential__ Model which can be build by adding layers to it.

We will build the __logistic regression__ model that we have used before and shown in the figure below

![logistic](https://github.com/hikmatfarhat-ndu/CSC645/blob/master/figures/perceptron.png?raw=1)

First we create a __Sequential__ Model

In [8]:
model=tf.keras.models.Sequential()

Now we start adding __layers__ to it. From the figure above we can see that there are two layers: the __input__ and the __output__

In [9]:
input_shape=(x_train_one_hot.shape[1],)
input=tf.keras.layers.Input(shape=input_shape)
model.add(input)
output=tf.keras.layers.Dense(1,activation="sigmoid")
model.add(output)
model.summary()


Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 1)                 5001      
Total params: 5,001
Trainable params: 5,001
Non-trainable params: 0
_________________________________________________________________


### Compiling the Model
After building the model we need to find its optimal parameters with respect to the data. To do so we need to specify
1. The optimizer function
1. The loss function

In this exercise we use the Adam optimizer which can be viewed as stochastic gradient descent with __variable__ learning rate

In [10]:
#model.compile(optimizer="Adam",loss="binary_crossentropy",metrics=["accuracy"])
model.compile(optimizer="Adam",loss="sparse_categorical_crossentropy")

AttributeError: ignored

In [7]:
history = model.fit(
    x_train_one_hot,
    y_train,
    batch_size=500,
    epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [10]:
model.evaluate(x_test_one_hot,y_test)




[nan, 0.5]

In [11]:
y=model.predict(x_test_one_hot)

print(y.shape)
sent=y>=0.5 
print(np.squeeze(sent[0:15]))
print(y_test[0:15])

(25000, 1)
[False False False False False False False False False False False False
 False False False]
[0 1 1 0 1 1 1 0 0 1 1 0 0 0 1]


  after removing the cwd from sys.path.


## Word Embedding

A better approach is to use word embedding. First we truncate/pad all reviews to the same length using the pad_sequences function from Keras. Then we add an Embedding layer as the first layer in our model.

### EXAMPLE

In [12]:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Embedding(6, 1, input_length=4))
#model.add(tf.keras.layers.Flatten())
#input_array = np.random.randint(10, size=(3, 4))
input_array=np.array([[1,2,3,4],[4,3,2,1],[3,3,4,5]])
model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
print(output_array.shape)
print(output_array)

(3, 4, 1)
[[[ 0.00128607]
  [-0.03293564]
  [-0.04245086]
  [-0.02890682]]

 [[-0.02890682]
  [-0.04245086]
  [-0.03293564]
  [ 0.00128607]]

 [[-0.04245086]
  [-0.04245086]
  [-0.02890682]
  [-0.02370199]]]


## The Model

In [13]:
x_train=pad_sequences(x_train,maxlen=500)
x_test=pad_sequences(x_test,maxlen=500)

model=tf.keras.models.Sequential()
model.add(tf.keras.layers.Embedding(max_words,32,input_length=500))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(1,activation="sigmoid"))
model.summary()
model.compile(optimizer=tf.keras.optimizers.Adam(),loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=["accuracy"])
history = model.fit(
    x_train,
    y_train,
    batch_size=500,
    epochs=20)


Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 500, 32)           160000    
_________________________________________________________________
flatten (Flatten)            (None, 16000)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 16001     
Total params: 176,001
Trainable params: 176,001
Non-trainable params: 0
_________________________________________________________________
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
