
In [20]:
# Run this cell before anything else.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import imdb

# Keep only the 10,000 most frequent words in the dataset.
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

Listing 4.1 - This loads the IMDB reviews dataset from Keras. The set contains 50,000 reviews, split evenly into 25,000 for training and 25,000 for testing. In each set, half of the reviews are positive and the other half are negative.

The num_words argument of load_data() keeps only the num_words most frequently occurring words in the dataset (here, the top 10,000). Without this cap we would be training on a very large vocabulary full of rare words, which makes classification much harder.
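
To make the effect of num_words concrete, here is a minimal sketch on a toy review. The helper name cap_vocabulary and the toy indices are hypothetical. It assumes that words outside the cap are replaced by index 2, which is the default oov_char of load_data(), rather than dropped.

```python
def cap_vocabulary(sequence, num_words, oov_index=2):
    """Replace every word index >= num_words with the out-of-vocabulary index.

    Hypothetical helper that mimics what load_data(num_words=...) does,
    assuming the default oov_char of 2.
    """
    return [i if i < num_words else oov_index for i in sequence]


# Made-up word indices; the last two fall outside the top 10,000.
toy_review = [1, 14, 22, 9999, 10000, 52341]
capped = cap_vocabulary(toy_review, num_words=10000)
print(capped)  # [1, 14, 22, 9999, 2, 2]
```

After capping, no index in a review can reach 10000, which is why a vector of width 10,000 is enough to represent every review later on.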

In [None]:
word_index = imdb.get_word_index()
reverse_word_index = dict(
    [(value, key) for (key, value) in word_index.items()])
decoded_review = " ".join([reverse_word_index.get(i - 3, "?") for i in train_data[0]])
print("First sample decode:", decoded_review)

First sample decode: ? this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert ? is an amazing actor and now the same being director ? father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for ? and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also ? to the two little boy's that played the ? of norman and paul they were just brilliant children are often left out of the ? list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they ha

Listing 4.2 - This is how you would translate a review back to English. get_word_index() returns a dictionary that maps each word to an integer index. We flip the dictionary because we want to look up words (the original keys) by index (the original values). The "i - 3" offset is specific to this dataset: indices 0, 1, and 2 are reserved for "padding", "start of sequence", and "unknown", so every word index is shifted up by 3. Any index that has no word after the shift is printed as "?".
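
The offset is easier to see on a tiny example. The sketch below uses a hypothetical four-word vocabulary in place of the real get_word_index() result; the names toy_word_index, toy_reverse, and decode are made up for illustration.

```python
# Hypothetical miniature vocabulary; the real one comes from imdb.get_word_index().
toy_word_index = {"the": 1, "film": 2, "was": 3, "brilliant": 4}

# Flip it so we can look up a word by its index.
toy_reverse = {index: word for word, index in toy_word_index.items()}


def decode(encoded, reverse_index, offset=3):
    # Stored values are shifted up by `offset`, so subtract it before the lookup.
    # Anything that does not map to a word becomes "?".
    return " ".join(reverse_index.get(i - offset, "?") for i in encoded)


# 1 marks the start of the sequence and 2 marks an unknown word.
encoded_review = [1, 4, 5, 6, 2, 7]
decoded_toy = decode(encoded_review, toy_reverse)
print(decoded_toy)  # ? the film was ? brilliant
```

This matches the real output above, where the review starts with "?" (the start-of-sequence marker) and rare words outside the top 10,000 also show up as "?".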

In [19]:
def vectorize_sequences(sequences, dimension=10000):
  results = np.zeros((len(sequences), dimension))
  for sentenceIndex, sentence in enumerate(sequences):
    for wordIndex in sentence:
      results[sentenceIndex, wordIndex] = 1
  return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
print("Shape of vectorized training data:", x_train.shape)
y_train = np.asarray(train_labels).astype("float32")
y_test = np.asarray(test_labels).astype("float32")

Shape of vectorized training data: (25000, 10000)


Listing 4.3 - A neural network needs inputs of a fixed length and type. The raw reviews are lists of word indices with varying lengths, so above we turn each review into a row vector of length 10,000 (multi-hot encoding). Each column corresponds to one word index in our dictionary. If that word appears in the review, the corresponding column is set to 1; every other column stays 0.
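
The same encoding on a toy input makes the layout visible. This is a minimal sketch with a made-up vocabulary of 5 words; the helper name multi_hot is hypothetical, and it uses NumPy fancy indexing instead of the inner loop above.

```python
import numpy as np


def multi_hot(sequences, dimension):
    # One row per sequence, one column per possible word index.
    results = np.zeros((len(sequences), dimension))
    for row, sequence in enumerate(sequences):
        # Indexing with a list sets all listed columns in this row at once.
        results[row, sequence] = 1.0
    return results


# Two toy "reviews" of different lengths; index 3 is repeated in the second one.
toy_sequences = [[1, 3], [0, 3, 3, 4]]
toy_encoded = multi_hot(toy_sequences, dimension=5)
print(toy_encoded)
# [[0. 1. 0. 1. 0.]
#  [1. 0. 0. 1. 1.]]
```

Two things are lost in this representation: the order of the words, and how often each word occurs (the repeated index 3 still produces a single 1).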

In [22]:
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])

Listing 4.4 - Defining our model. The book holds our hand here and says it will explain more in chapter 5, but for now we are told that a stack of dense layers works well for binary classification. A stack of dense layers requires two decisions from us: how many layers to use and how many units each layer should have. The book makes both choices for us here and reserves the explanation for the next chapter.
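
One way to get a feel for what the layer and unit choices mean is to count the trainable parameters they imply. The sketch below is plain arithmetic, assuming each dense layer has one weight per (input, unit) pair plus one bias per unit, and that the input is the 10,000-wide vector from Listing 4.3. The helper name dense_param_count is hypothetical.

```python
def dense_param_count(input_dim, units):
    # One weight per (input, unit) pair plus one bias per unit.
    return input_dim * units + units


input_dim = 10000          # width of each vectorized review
layer_units = [16, 16, 1]  # units per layer, as in the model above

counts = []
for units in layer_units:
    counts.append(dense_param_count(input_dim, units))
    input_dim = units      # the next layer receives this layer's output

print(counts)       # [160016, 272, 17]
print(sum(counts))  # 160305
```

Almost all of the parameters sit in the first layer, because that is where the 10,000-wide input meets the 16 units.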

Activations:
- ReLU: this clamps negative numbers to 0 and passes positive numbers through unchanged (think of an xy axis: a ReLU activation prevents y from ever being negative).
- Sigmoid: this squashes any value into the range 0 to 1, which we can read as the probability of the label being 1 (a positive review in our case).
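
Both activations are short enough to write out in NumPy. This is a minimal sketch for intuition, not the Keras implementation; the function names relu and sigmoid are chosen here for illustration.

```python
import numpy as np


def relu(x):
    # Negative values become 0; positive values pass through unchanged.
    return np.maximum(x, 0.0)


def sigmoid(x):
    # Maps any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))


values = np.array([-2.0, 0.0, 3.0])
print(relu(values))     # [0. 0. 3.]
print(sigmoid(values))  # roughly 0.12, 0.5 and 0.95
```

A sigmoid output of 0.5 means the model is undecided; values close to 1 lean towards a positive review and values close to 0 towards a negative one.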

In [None]:
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])

Listing 4.5 - Here we are compiling the model.

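- Optimizer: the book advises us to use rmsprop, as it is a good default choice for most models.
- Loss: the book advises us to go with crossentropy, as it is "usually the best choice for binary classification". This loss function measures the distance between the predicted probabilities and the true labels. For each review, the penalty grows the further the predicted probability is from the actual label (for example, predicting 0.4 for a review whose label is 1 is penalized more than predicting 0.9), and the loss for a batch is the average of these penalties.
- Metrics: accuracy, the fraction of reviews that are classified correctly. Unlike the loss, it is not used to train the model; it is only reported so we can monitor how training is going.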
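
The difference between the loss and the accuracy metric shows up clearly on a few hand-made predictions. The sketch below is a NumPy approximation for intuition, not the Keras code. It assumes predictions above 0.5 count as positive, and it clips probabilities to avoid log(0). All names and numbers are made up for illustration.

```python
import numpy as np


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip so that log() never receives exactly 0 or 1.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return per_sample.mean()  # average penalty over the batch


def binary_accuracy(y_true, y_pred, threshold=0.5):
    # Fraction of samples whose thresholded prediction matches the label.
    return np.mean((y_pred > threshold) == (y_true == 1.0))


toy_targets = np.array([1.0, 0.0, 1.0, 0.0])
confident = np.array([0.9, 0.1, 0.8, 0.2])  # right, and sure about it
hesitant = np.array([0.6, 0.4, 0.6, 0.4])   # right, but barely

for name, preds in [("confident", confident), ("hesitant", hesitant)]:
    print(name,
          "loss:", round(float(binary_crossentropy(toy_targets, preds)), 4),
          "accuracy:", float(binary_accuracy(toy_targets, preds)))
```

Both sets of predictions reach an accuracy of 1.0, but the hesitant ones have a noticeably higher loss. The loss rewards being confidently right, which is what gives the optimizer something to improve even when accuracy is already perfect.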