
In [1]:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models, preprocessing
import tensorflow_datasets as tfds

In [2]:
max_len = 200
n_words = 10000
dim_embedding = 256
EPOCHS = 20
BATCH_SIZE = 500

In [3]:
def load_data():
  # Load data
  (X_train, y_train), (X_test, y_test) = datasets.imdb.load_data(num_words=n_words)
  # Pad sequences with max_len
  X_train = preprocessing.sequence.pad_sequences(X_train, maxlen=max_len)
  X_test = preprocessing.sequence.pad_sequences(X_test, maxlen=max_len)
  return (X_train, y_train), (X_test, y_test)
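
To make the padding step in `load_data` concrete, here is a minimal pure-Python sketch of what `pad_sequences` does to a single review with its default settings (`padding='pre'`, `truncating='pre'`, `value=0`). The helper `pad_pre` and the sample reviews are hypothetical and only for illustration; the real function works on whole batches and returns a NumPy array.

```python
def pad_pre(sequence, maxlen, value=0):
    """Hypothetical helper: mimic pad_sequences defaults for one sequence."""
    # Keep only the last `maxlen` items (truncating='pre') ...
    trimmed = list(sequence)[-maxlen:]
    # ... then left-pad with `value` up to `maxlen` (padding='pre').
    return [value] * (maxlen - len(trimmed)) + trimmed


# Illustrative word-index sequences, not real IMDB data.
short_review = [11, 22, 33]
long_review = [1, 2, 3, 4, 5, 6, 7]

padded_short = pad_pre(short_review, maxlen=5)
padded_long = pad_pre(long_review, maxlen=5)

print(padded_short)  # [0, 0, 11, 22, 33]
print(padded_long)   # [3, 4, 5, 6, 7]
```

With `max_len = 200`, every review therefore becomes a row of exactly 200 integers: short reviews gain leading zeros and long reviews keep only their last 200 words.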

### TensorFlow Layer Types

[Embedding Layer - ML Mastery](https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/),
[Embedding Layer - Keras](https://keras.io/api/layers/core_layers/embedding/#embedding)

[Max, Average, Global Max, Global Average](https://www.machinecurve.com/index.php/2020/01/30/what-are-max-pooling-average-pooling-global-max-pooling-and-global-average-pooling/)

[Dense and Dropout](https://www.quora.com/In-TensorFlow-what-is-a-dense-and-a-dropout-layer)




In [10]:
def build_model():
  model = models.Sequential()
  # Input - Embedding layer
  # Input: an integer matrix of shape (batch, input_length)
  # Output: a float tensor of shape (batch, input_length, dim_embedding)
  # Every integer in the input must be smaller than n_words (the vocabulary size)
  model.add(layers.Embedding(n_words, dim_embedding, input_length=max_len))
  model.add(layers.Dropout(0.3))
  # For each of the dim_embedding features, keep the maximum value across
  # the input_length positions: (batch, 200, 256) -> (batch, 256)
  model.add(layers.GlobalMaxPool1D())
  model.add(layers.Dense(128, activation='relu'))
  model.add(layers.Dropout(0.5))
  model.add(layers.Dense(1, activation='sigmoid'))

  return model
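
The `GlobalMaxPool1D` layer in `build_model` collapses the sequence axis: for each feature it keeps the largest value found at any position, which is why the model summary goes from `(None, 200, 256)` to `(None, 256)`. The following is a minimal pure-Python sketch for a single example; `global_max_pool_1d` and the tiny `embedded` matrix are hypothetical names and values chosen for illustration.

```python
def global_max_pool_1d(sequence):
    """Hypothetical helper: max over positions for each feature of one example."""
    # `sequence` is a list of positions, each a list of feature values.
    # zip(*sequence) regroups the values by feature, so max() runs over positions.
    return [max(feature_values) for feature_values in zip(*sequence)]


# One example with 3 positions (words) and 2 features per position.
embedded = [
    [0.1, 0.9],
    [0.7, 0.2],
    [0.4, 0.5],
]

pooled = global_max_pool_1d(embedded)
print(pooled)  # [0.7, 0.9]
```

The output length equals the number of features, not the number of positions, so the layers after the pooling step do not depend on the review length.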

In [11]:
# Train the model

(X_train, y_train), (X_test, y_test) = load_data()
model = build_model()
model.summary()

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

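# fit() returns a History object with the per-epoch loss and metrics
history = model.fit(X_train, y_train,
                    epochs=EPOCHS,
                    batch_size=BATCH_SIZE,
                    validation_data=(X_test, y_test))

# evaluate() returns [loss, accuracy], matching the metrics passed to compile()
score = model.evaluate(X_test, y_test, batch_size=BATCH_SIZE)

print("\nTest loss: ", score[0])
print("\nTest accuracy: ", score[1])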

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 200, 256)          2560000   
_________________________________________________________________
dropout_4 (Dropout)          (None, 200, 256)          0         
_________________________________________________________________
global_max_pooling1d_2 (Glob (None, 256)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 128)               32896     
_________________________________________________________________
dropout_5 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 129       
Total params: 2,593,025
Trainable params: 2,593,025
Non-trainable params: 0
____________________________________________
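
The parameter counts in the summary above can be checked by hand. This short sketch redoes the arithmetic from the hyperparameters used in this notebook; the variable names other than `n_words` and `dim_embedding` are introduced here for illustration.

```python
n_words = 10000        # vocabulary size
dim_embedding = 256    # embedding vector length
dense_units = 128      # units in the hidden Dense layer

# Embedding: one dim_embedding-sized vector per vocabulary entry, no bias.
embedding_params = n_words * dim_embedding

# Dense(128): one weight per (input feature, unit) pair plus one bias per unit.
dense_params = dim_embedding * dense_units + dense_units

# Dense(1): one weight per hidden unit plus a single bias.
output_params = dense_units * 1 + 1

total_params = embedding_params + dense_params + output_params
print(embedding_params, dense_params, output_params, total_params)
# 2560000 32896 129 2593025
```

Dropout and pooling layers have no weights, which is why they show 0 parameters. Almost all of the trainable parameters sit in the embedding layer.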

## A practical overview of backpropagation

Multi-layer perceptrons learn from training data through a process called
backpropagation. In this section, we will cover the basics; more details can be
found in Chapter 15, The Math behind Deep Learning. The process can be described as a
way of progressively correcting mistakes as soon as they are detected. Let's see how
this works.

Remember that each neural network layer has an associated set of weights that
determine the output values for a given set of inputs. Additionally, remember that
a neural network can have multiple hidden layers.

At the beginning, all the weights have some random assignment. Then, the net is
activated for each input in the training set: values are propagated forward from the
input stage through the hidden stages to the output stage, where a prediction is
made. Note that we've kept Figure 38 simple by only representing a few values with
green dotted lines, but in reality all the values are propagated forward through the
network:

![](https://raw.githubusercontent.com/squeeko/DL_TF20_KerasCNNGANSRNNNLP/in_progress/images/FWD_Step_in_BackProp.png)

Since we know the true observed value in the training set, it is possible to calculate
the error made in the prediction. The key intuition for backpropagation is to propagate
the error back (see Figure 39), using an appropriate optimizer algorithm such as gradient
descent to adjust the neural network weights with the goal of reducing the error
(again, for the sake of simplicity, only a few error values are represented here):

![](https://raw.githubusercontent.com/squeeko/DL_TF20_KerasCNNGANSRNNNLP/in_progress/images/BWD_STEP_in_BackProp.png)
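
The weight adjustment described above can be written down for the simplest case. This is a sketch under assumptions, not the book's notation: a single sigmoid output neuron with inputs $x_i$, weights $w_i$, bias $b$, label $y$, binary cross-entropy as the error $E$, and a learning rate $\eta$. All of these symbols are introduced here for illustration.

$$
\begin{aligned}
z &= \sum_i w_i x_i + b, \qquad \hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}} \\
E &= -\left[\, y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \,\right] \\
\frac{\partial E}{\partial w_i} &= \frac{\partial E}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w_i} = (\hat{y} - y)\, x_i \\
w_i &\leftarrow w_i - \eta \, \frac{\partial E}{\partial w_i}
\end{aligned}
$$

Under these assumptions, the prediction error $\hat{y} - y$ is the quantity sent backward. In a network with hidden layers, the chain rule is applied once more for each layer the error passes through.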

The process of forward propagation from input to output and the backward
propagation of errors is repeated several times until the error gets below a
predefined threshold. The whole process is represented in Figure 40:

![](https://raw.githubusercontent.com/squeeko/DL_TF20_KerasCNNGANSRNNNLP/in_progress/images/FWD_and_BWD_Prop.png)
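
The forward and backward loop in the figure can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the training code Keras runs: it uses a single sigmoid neuron, a toy dataset, binary cross-entropy loss, full-batch gradient descent, and made-up values for `learning_rate` and `threshold`. The weights start at zero rather than at random values so that the run is reproducible.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Toy training set (illustrative): the label is 1 when the feature is positive.
features = [-2.0, -1.0, 1.0, 2.0]
labels = [0, 0, 1, 1]

w, b = 0.0, 0.0        # assumed fixed start instead of a random assignment
learning_rate = 0.5    # assumed value
threshold = 0.1        # assumed "predefined threshold" on the mean loss
max_epochs = 1000      # safety cap so the loop always terminates

for epoch in range(max_epochs):
    grad_w = grad_b = total_loss = 0.0
    for x, y in zip(features, labels):
        # Forward step: propagate the input to a prediction.
        p = sigmoid(w * x + b)
        total_loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        # Backward step: for sigmoid + cross-entropy the error signal is (p - y).
        error = p - y
        grad_w += error * x
        grad_b += error
    mean_loss = total_loss / len(features)
    if mean_loss < threshold:
        break  # error is below the threshold, stop training
    # Gradient descent update: move the weights against the mean gradient.
    w -= learning_rate * grad_w / len(features)
    b -= learning_rate * grad_b / len(features)

print(f"stopped at epoch {epoch}, mean loss {mean_loss:.4f}, w={w:.3f}, b={b:.3f}")
```

The stopping rule mirrors the text: the forward and backward steps repeat until the loss falls below the threshold. In practice, Keras trains for a fixed number of epochs (`EPOCHS` in this notebook) unless an early-stopping callback is added.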

The features represent the input, and the labels are used here to drive the learning
process. The model is updated in such a way that the loss function is progressively
minimized. In a neural network, what really matters is not the output of a single
neuron but the collective weights adjusted in each layer. Therefore, the network
progressively adjusts its internal weights so that the number of correctly predicted
labels increases. Of course, using the right set of features and having quality
labeled data is fundamental in order to minimize bias during the learning process.