# Neural Network Models

## Types of models

A **Multi-Layer Perceptron** (MLP) is a neural network that maps inputs to outputs using only "dense" (fully connected) layers.
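To make "dense layers only" concrete, here is a minimal NumPy sketch of an MLP forward pass. The helper names (`dense`, `mlp_forward`), the layer sizes and the random, untrained weights are illustrative assumptions, not code from this course.

```python
import numpy as np

def dense(x, W, b):
    """Fully connected layer: every input unit feeds every output unit."""
    return x @ W + b

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def mlp_forward(x, params):
    """Dense -> ReLU for each hidden layer, then Dense -> softmax."""
    *hidden, (W_out, b_out) = params
    h = x
    for W, b in hidden:
        h = relu(dense(h, W, b))
    return softmax(dense(h, W_out, b_out))

rng = np.random.default_rng(0)
sizes = [784, 100, 100, 10]  # input, two hidden layers, 10 classes
params = [(rng.normal(0, 0.05, (m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.random((4, 784))     # a batch of 4 flattened 28x28 "images"
probs = mlp_forward(x, params)
print(probs.shape)           # (4, 10)
```

Each row of `probs` is a probability distribution over the 10 classes.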

A **Feed Forward Network** (FFNN) is a generalization of the MLP that also uses other layer types (dropout, batch normalization, etc.) to improve performance.

An **AutoEncoder** maps $X$ back to itself through a narrower layer in the middle (a bottleneck that compresses the information). You can think of it as an *embedding* technique, e.g. for dimensionality reduction.
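A minimal NumPy sketch of the idea, under simplifying assumptions: a *linear* autoencoder (no activations, no biases) trained with plain gradient descent on toy 5-D data that lies on a 2-D plane. Real autoencoders are usually nonlinear Keras models; the sizes, learning rate and step count are arbitrary illustrative choices. The 2-unit middle layer is the bottleneck, and its activations `X @ W_enc` are the learned embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 5-dimensional points that actually live on a 2-D plane,
# so a 2-unit bottleneck can in principle reconstruct them exactly.
Z = rng.normal(size=(200, 2))
A = rng.normal(size=(2, 5))
X = Z @ A

# Linear encoder (5 -> 2) and decoder (2 -> 5).
W_enc = rng.normal(0, 0.1, (5, 2))
W_dec = rng.normal(0, 0.1, (2, 5))

def reconstruction_loss(X, W_enc, W_dec):
    return np.mean((X @ W_enc @ W_dec - X) ** 2)

lr = 0.05
initial_loss = reconstruction_loss(X, W_enc, W_dec)
for step in range(2000):
    H = X @ W_enc            # compressed 2-D code
    R = H @ W_dec - X        # reconstruction residual
    G = 2 * R / R.size       # gradient of the mean squared error w.r.t. the reconstruction
    grad_dec = H.T @ G
    grad_enc = X.T @ (G @ W_dec.T)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_loss = reconstruction_loss(X, W_enc, W_dec)
embedding = X @ W_enc        # shape (200, 2): the dimensionality-reduced data
```

For a linear autoencoder trained with squared error, the bottleneck ends up spanning the same subspace that PCA would find; nonlinear activations are what let autoencoders go beyond PCA.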

For now we'll focus only on pure **feedforward** networks; we'll look at **sequence learning** (recurrent nets and Transformer architectures) later in the course.

This means we'll focus on the green and purple nodes in this chart:

![](docs/architecture_types.png)

## Popular layer types

### Dropout

Dropout is similar in spirit to bagging, which we saw with random forests.

The `Dropout` layer randomly sets input units to 0 with probability `rate` at each training step, which helps prevent overfitting. Inputs not set to 0 are scaled up by `1/(1 - rate)` so that the expected sum over all inputs is unchanged. At inference time dropout is disabled and inputs pass through unchanged.
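A minimal NumPy sketch of the behaviour just described ("inverted" dropout, i.e. rescaling at training time). The function name `inverted_dropout` is illustrative; its `training` flag loosely mirrors the `training` argument Keras layers accept, but this is not Keras's implementation.

```python
import numpy as np

def inverted_dropout(x, rate, rng, training=True):
    """Zero each unit with probability `rate`; scale survivors by 1/(1 - rate).

    With training=False the input is returned unchanged.
    """
    if not training or rate == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= rate   # True with probability 1 - rate
    return np.where(keep_mask, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
x = np.ones((1000, 100))
y = inverted_dropout(x, rate=0.5, rng=rng)

print(np.unique(y))   # [0. 2.]: dropped units are 0, kept ones are scaled by 2
print(y.mean())       # close to 1.0, the mean of x
```

Because the surviving units are scaled up during training, nothing needs to be rescaled at inference time.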

### BatchNorm

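Batch normalization applies a transformation that keeps the mean output close to 0 and the output standard deviation close to 1.

During training it normalizes its inputs using the mean and variance of the current batch; at inference time it uses moving averages of those statistics collected during training.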

Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, reducing the need for Dropout.
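A minimal NumPy sketch of the training-time computation only (the inference path with moving averages is omitted). `gamma` and `beta` are the learnable scale and shift; `eps=1e-3` matches the default `epsilon` of Keras's `BatchNormalization`. The function name `batch_norm_train` is illustrative.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize each feature with the batch mean and variance,
    then apply the learnable scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(128, 4))  # batch of 128 samples, 4 features
out = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))

print(out.mean(axis=0))  # ~0 for every feature
print(out.std(axis=0))   # ~1 for every feature (a hair below 1 because of eps)
```

With `gamma=1` and `beta=0` the output is simply standardized; during training the network can learn other values if a different scale or offset works better.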

### Embedding Layer

Turns positive integers (indexes) into dense vectors of fixed size.

e.g. `[[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]`

This layer is typically used as the first layer of a model. It turns integer-encoded categorical data into dense vectors, avoiding the need for one-hot encoding.
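Under the hood an embedding layer is a lookup into a weight table, which is why it can replace one-hot encoding. The NumPy sketch below uses a vocabulary of 25 and 2-dimensional vectors as arbitrary illustrative sizes (so that indices 4 and 20 from the example are valid), with random values standing in for learned weights.

```python
import numpy as np

vocab_size, embed_dim = 25, 2
rng = np.random.default_rng(0)
table = rng.normal(size=(vocab_size, embed_dim))  # learnable weights in a real layer

indices = np.array([4, 20])

# An embedding layer is a row lookup into the table...
dense_vectors = table[indices]          # shape (2, 2)

# ...which equals one-hot encoding followed by a matrix product,
# without ever building the (mostly zero) one-hot matrix.
one_hot = np.eye(vocab_size)[indices]   # shape (2, 25)
print(np.allclose(one_hot @ table, dense_vectors))  # True
```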

We won't need this one for image data.

### Batch Size

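The batch size is the number of training samples used to compute each gradient update in stochastic gradient descent (SGD).

If the batch size equals the size of your dataset, you're doing classical (full-batch) gradient descent rather than SGD. Smaller batches are often *better* because their gradient estimates are noisier, which can help the optimizer escape poor local minima.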
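A minimal NumPy sketch of how the batch size enters training, using linear regression with a mean-squared-error loss as an assumed toy problem (not this course's data). The function name `minibatch_sgd` and its learning rate and epoch count are illustrative.

```python
import numpy as np

def minibatch_sgd(X, y, batch_size, lr=0.1, epochs=50, seed=0):
    """Fit linear-regression weights with mini-batch SGD on the MSE loss.

    batch_size == len(X) reduces to full-batch (classical) gradient descent.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)                  # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]   # the last batch may be smaller
            Xb, yb = X[idx], y[idx]
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)
            w -= lr * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
w_true = np.array([2.0, -3.0])
y = X @ w_true + rng.normal(scale=0.1, size=1000)

w_sgd = minibatch_sgd(X, y, batch_size=32)       # 32 updates per epoch (31 full batches + 1 of 8)
w_full = minibatch_sgd(X, y, batch_size=len(X))  # 1 update per epoch
```

Both recover weights close to `w_true` here; the mini-batch version takes many cheap, noisy steps per epoch, while full-batch takes one exact step per epoch.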

# Feed Forward Neural Networks (FFNN)

In [1]:
import pandas as pd
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_openml

# Model / data parameters
num_classes = 10
input_shape = (784,)  # each sample is a flattened 28x28 image; the batch dimension is implicit

X, y = fetch_openml('mnist_784', return_X_y=True, as_frame=False)
y = y.astype(int)  # OpenML returns the labels as strings ('0'..'9')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# convert class vectors to one-hot class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

In [2]:
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout, Activation

model = keras.Sequential()

# First hidden layer: projects the 784 pixel inputs down to 100 units
model.add(Dense(100, input_dim=784))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))

# Hidden layer
model.add(Dense(100))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))

# Hidden layer
model.add(Dense(100))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))

# Output layer: one unit per class, with softmax producing class probabilities
model.add(Dense(num_classes))
model.add(BatchNormalization())
model.add(Activation('softmax'))

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 100)               78500     
_________________________________________________________________
batch_normalization (BatchNo (None, 100)               400       
_________________________________________________________________
activation (Activation)      (None, 100)               0         
_________________________________________________________________
dropout (Dropout)            (None, 100)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 100)               10100     
_________________________________________________________________
batch_normalization_1 (Batch (None, 100)               400       
_________________________________________________________________
activation_1 (Activation)    (None, 100)               0
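The parameter counts in the summary follow directly from the layer sizes. A small sketch (the helper names are illustrative) reproduces them: a `Dense` layer has one weight per input/output pair plus one bias per output, and `BatchNormalization` keeps four vectors per feature (trainable `gamma` and `beta`, plus a non-trainable moving mean and moving variance).

```python
def dense_params(n_in, n_out):
    # One weight per (input, output) pair plus one bias per output unit.
    return n_in * n_out + n_out

def batchnorm_params(n_features):
    # gamma and beta are trainable; the moving mean and variance are not.
    trainable = 2 * n_features
    non_trainable = 2 * n_features
    return trainable + non_trainable

print(dense_params(784, 100))   # 78500
print(batchnorm_params(100))    # 400
print(dense_params(100, 100))   # 10100
```

`Activation` and `Dropout` layers have no weights, hence their 0 parameters.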

In [3]:
batch_size = 128
epochs = 10

history = model.fit(X_train, y_train, batch_size=batch_size,
                    epochs=epochs, validation_split=0.1)
scores = model.evaluate(X_test, y_test, verbose=0)

print(f'Loss: {scores[0]:.4f}')  # categorical cross-entropy, not a percentage
print(f'Accuracy: {(scores[1]*100):.2f}%')

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Loss: 0.1392
Accuracy: 96.03%


In [None]:
pd.DataFrame(history.history).plot();

# Convolutional Neural Networks (CNN)

In [4]:
num_classes = 10
input_shape = (28, 28, 1)

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 1600)              0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 1600)              0         
_______________________________________________
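The output shapes and parameter counts in the summary can be checked by hand. This sketch assumes the Keras defaults used above: `'valid'` padding and stride 1 for `Conv2D`, and a pooling stride equal to the pool size for `MaxPooling2D`. The helper names are illustrative.

```python
def conv_output_size(size, kernel, stride=1):
    # 'valid' padding: count only the positions where the window fits entirely.
    return (size - kernel) // stride + 1

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter has kernel_h * kernel_w * in_channels weights plus one bias.
    return (kernel_h * kernel_w * in_channels + 1) * filters

s = conv_output_size(28, 3)            # Conv2D(32, 3x3)   -> 26
s = conv_output_size(s, 2, stride=2)   # MaxPooling2D(2x2) -> 13
s = conv_output_size(s, 3)             # Conv2D(64, 3x3)   -> 11
s = conv_output_size(s, 2, stride=2)   # MaxPooling2D(2x2) -> 5 (the odd last row/column is dropped)
flat = s * s * 64                      # Flatten           -> 1600

conv1 = conv2d_params(3, 3, 1, 32)     # 320
conv2 = conv2d_params(3, 3, 32, 64)    # 18496
```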

In [5]:
batch_size = 128
epochs = 15

history = model.fit(x_train, y_train, batch_size=batch_size,
                    epochs=epochs, validation_split=0.15)
scores = model.evaluate(x_test, y_test, verbose=0)

print(f'Loss: {scores[0]:.4f}')  # categorical cross-entropy, not a percentage
print(f'Accuracy: {(scores[1]*100):.2f}%')

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
Loss: 0.0256
Accuracy: 99.14%


In [None]:
pd.DataFrame(history.history).plot();

# Recurrent Neural Networks (RNN)

In [None]:
#https://www.kaggle.com/davidzhao365/review-sentiment-analysis-with-rnn-and-gru

In [21]:
import pandas as pd
import re
from sklearn.feature_extraction.text import TfidfVectorizer  # used by the TF-IDF cell below
from tensorflow.keras.preprocessing.text import Tokenizer

df = pd.read_csv('data/imdb_train.csv')
df = df.sample(frac=.25).reset_index(drop=True)
df

|  | Review | Score | Score_10 |
|---|---|---|---|
| 0 | The Haunting is a film that boasts a really cr... | 1 | 7 |
| 1 | This mindless movie is a piece of crap and bor... | 0 | 1 |
| 2 | George Brent is a reporter sent to interview a... | 0 | 4 |
| 3 | After sitting through this pile of dung, my hu... | 0 | 1 |
| 4 | This movie is the next segment in the pokemon ... | 1 | 10 |
| ... | ... | ... | ... |
| 6245 | There's perhaps a special reason why The Fox a... | 1 | 8 |
| 6246 | And I'm serious! Truly one of the most fantast... | 1 | 10 |
| 6247 | H.G. Wells in 1936 was past his prime and the ... | 1 | 9 |
| 6248 | Yet another venture into the realm of the teen... | 0 | 1 |


In [10]:
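def clean_text(text):
    text = re.sub(r'<.*?>', '', text)         # strip HTML tags such as <br />
    text = text.lower()
    text = re.sub(r"\s", " ", text)           # replace every whitespace character with a space
    text = re.sub(r"[^a-zA-Z' ]", "", text)   # keep only letters, apostrophes and spaces
    return text

# TF-IDF alternative to the Keras Tokenizer (not used by the GRU model below)
tfidf = TfidfVectorizer(
    stop_words="english",
    preprocessor=clean_text,
    ngram_range=(1, 2),
    max_df=0.95,
    min_df=2,
    max_features=3000
)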
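Running `clean_text` on a short made-up string (not taken from the dataset) shows what each regex step removes. The function is repeated here unchanged so the snippet runs on its own.

```python
import re

def clean_text(text):
    text = re.sub(r'<.*?>', '', text)         # strip HTML tags such as <br />
    text = text.lower()
    text = re.sub(r"\s", " ", text)           # replace every whitespace character with a space
    text = re.sub(r"[^a-zA-Z' ]", "", text)   # keep only letters, apostrophes and spaces
    return text

print(repr(clean_text("<br />Great movie!! 10/10")))  # 'great movie '
print(repr(clean_text("It's\tGOOD")))                 # "it's good"
```

Note that digits are removed too, so ratings written inside a review ("10/10") disappear.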

In [30]:
vocab = 3000    # vocabulary size: keep only the most frequent words
max_len = 500   # reviews are later padded/truncated to this many tokens

df['Review'] = df['Review'].apply(clean_text)
tokenizer = Tokenizer(num_words=vocab)
tokenizer.fit_on_texts(df['Review'])
X = tokenizer.texts_to_sequences(df['Review'])  # one list of word indices per review

y = np.array(df["Score"])  # binary sentiment label (0 or 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
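A simplified pure-Python sketch of what `Tokenizer(num_words=...)` followed by `texts_to_sequences` does: words are ranked by frequency (index 1 is the most common, and 0 is never assigned, leaving it free for padding), and only indices below `num_words` are kept, so at most `num_words - 1` distinct words survive. The helper names are hypothetical, and this is not the Keras implementation, which also applies its own lowercasing and punctuation filters.

```python
from collections import Counter

def fit_word_index(texts):
    """Rank words by frequency (most common -> 1); ties keep first-seen order."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    ranked = sorted(counts, key=counts.get, reverse=True)  # stable sort keeps tie order
    return {word: i for i, word in enumerate(ranked, start=1)}

def texts_to_sequences(texts, word_index, num_words):
    """Map each word to its index, dropping unknown words and ranks >= num_words."""
    return [[word_index[w] for w in text.split()
             if w in word_index and word_index[w] < num_words]
            for text in texts]

texts = ["the cat sat", "the dog sat", "the end"]
word_index = fit_word_index(texts)
print(word_index)  # {'the': 1, 'sat': 2, 'cat': 3, 'dog': 4, 'end': 5}
print(texts_to_sequences(texts, word_index, num_words=4))  # [[1, 3, 2], [1, 2], [1]]
```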

In [31]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, GRU

# GRU Model
num_classes = 1

model = Sequential()
# Input layer
model.add(Embedding(vocab, 100, input_length=max_len))
model.add(GRU(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(10, activation='relu'))
# Output layer
model.add(Dense(num_classes, activation='sigmoid'))

# A single sigmoid output pairs with binary cross-entropy
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (None, 500, 100)          300000    
_________________________________________________________________
gru_5 (GRU)                  (None, 100)               60600     
_________________________________________________________________
dense_12 (Dense)             (None, 10)                1010      
_________________________________________________________________
dense_13 (Dense)             (None, 1)                 11        
Total params: 361,621
Trainable params: 361,621
Non-trainable params: 0
_________________________________________________________________
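The parameter counts above can be reproduced by hand. This sketch assumes `reset_after=True`, the TF 2.x default for `GRU`, under which each of the three gates carries separate input and recurrent bias vectors; with `reset_after=False` the GRU would have 60,300 parameters instead. The helper names are illustrative.

```python
def embedding_params(vocab_size, output_dim):
    # One trainable vector per vocabulary entry.
    return vocab_size * output_dim

def gru_params(input_dim, units, reset_after=True):
    # Three gates (update, reset, candidate state), each with an input kernel
    # (input_dim x units) and a recurrent kernel (units x units).
    biases_per_gate = 2 * units if reset_after else units
    return 3 * (input_dim * units + units * units + biases_per_gate)

total = (embedding_params(3000, 100)   # 300000
         + gru_params(100, 100)        # 60600
         + (100 * 10 + 10)             # Dense(10): 1010
         + (10 * 1 + 1))               # Dense(1): 11
print(total)                           # 361621
```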


In [33]:
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Fit and evaluate
batch_size = 64
epochs = 5

# Pad short reviews with zeros and truncate long ones so every input has length max_len
X_train = pad_sequences(X_train, maxlen=max_len)
X_test = pad_sequences(X_test, maxlen=max_len)

history = model.fit(X_train, y_train, batch_size=batch_size,
                    epochs=epochs, validation_split=0.15, verbose=1)

scores = model.evaluate(X_test, y_test, verbose=0)

print(f'Loss: {scores[0]:.4f}')  # binary cross-entropy, not a percentage
print(f'Accuracy: {(scores[1]*100):.2f}%')

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Loss: 0.6971
Accuracy: 81.28%
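A pure-Python sketch of the padding step, assuming the `pad_sequences` defaults `padding='pre'` and `truncating='pre'`: short sequences get zeros added in front, and long ones keep only their last `maxlen` tokens. The function name `pad_pre` is hypothetical and it returns lists rather than a NumPy array.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences; keep only the last `maxlen` tokens of long ones."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]                   # truncating='pre': drop the earliest tokens
        padded.append([value] * (maxlen - len(seq)) + seq)  # padding='pre': zeros go in front
    return padded

print(pad_pre([[1, 2, 3], [4]], maxlen=2))  # [[2, 3], [0, 4]]
```

With `max_len = 500`, this means any review longer than 500 tokens loses its opening words, not its ending.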


In [None]:
pd.DataFrame(history.history).plot();

# Generative Adversarial Networks (GAN)

In [None]:
#https://medium.datadriveninvestor.com/generative-adversarial-network-gan-using-keras-ce1c05cfdfd3