# **CUHK-STAT3009**: Notebook - Neural Networks




## Introduction to Deep Learning with Keras

- `Model`: input -> layers -> output
- `Loss`: choose a loss function appropriate for your problem
- `Algo`: the optimizer, e.g. SGD, Adam, ...
- `Data`: define the model first, then feed it the data
- `Metric`: the final evaluation criterion, or any other quantity you care about



## Example 1: Imbalanced classification: credit card fraud detection

- Author: fchollet
- Date created: 2019/05/28
- Last modified: 2020/04/17
- Description: Demonstration of how to handle highly imbalanced classification problems.

In [None]:
!wget https://raw.githubusercontent.com/nsethi31/Kaggle-Data-Credit-Card-Fraud-Detection/master/creditcard.csv


--2022-10-31 08:24:41--  https://raw.githubusercontent.com/nsethi31/Kaggle-Data-Credit-Card-Fraud-Detection/master/creditcard.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 102634230 (98M) [text/plain]
Saving to: ‘creditcard.csv’


2022-10-31 08:24:42 (234 MB/s) - ‘creditcard.csv’ saved [102634230/102634230]



In [None]:
!head -n 5 creditcard.csv

Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,-1.359807134,-0.072781173,2.536346738,1.378155224,-0.33832077,0.462387778,0.239598554,0.098697901,0.36378697,0.090794172,-0.551599533,-0.617800856,-0.991389847,-0.311169354,1.468176972,-0.470400525,0.207971242,0.02579058,0.40399296,0.251412098,-0.018306778,0.277837576,-0.11047391,0.066928075,0.128539358,-0.189114844,0.133558377,-0.021053053,149.62,0
0,1.191857111,0.266150712,0.166480113,0.448154078,0.060017649,-0.082360809,-0.078802983,0.085101655,-0.255425128,-0.166974414,1.612726661,1.065235311,0.489095016,-0.143772296,0.635558093,0.463917041,-0.114804663,-0.18336127,-0.145783041,-0.069083135,-0.225775248,-0.638671953,0.101288021,-0.339846476,0.167170404,0.125894532,-0.008983099,0.014724169,2.69,0
1,-1.358354062,-1.340163075,1.773209343,0.379779593,-0.503198133,1.800499381,0.791460956,0.247675787,-1.514654323,0.207642865,0.624501459,0.066083685,0.717292731,-0.165

In [None]:
# https://keras.io/examples/structured_data/imbalanced_classification/

import csv
import numpy as np
import pandas as pd

fname = "./creditcard.csv"

# original code
# all_features = []
# all_targets = []
# with open(fname) as f:
#     for i, line in enumerate(f):
#         if i == 0:
#             print("HEADER:", line.strip())
#             continue  # Skip header
#         fields = line.strip().split(",")
#         all_features.append([float(v.replace('"', "")) for v in fields[:-1]])
#         all_targets.append([int(fields[-1].replace('"', ""))])
#         if i == 1:
#             print("EXAMPLE FEATURES:", all_features[-1])

df = pd.read_csv(fname)

targets = np.array(df['Class'], dtype="uint8")
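# NOTE: `drop` returns a new DataFrame and the result is not assigned here,
# so `df` (and `features` below) still holds all 31 columns, including `Class`.
df.drop(['Class', 'Amount', 'Time'], axis=1)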
features = np.array(df.values, dtype="float32")
print("features.shape:", features.shape)
print("targets.shape:", targets.shape)

features.shape: (284807, 31)
targets.shape: (284807,)
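
`DataFrame.drop` returns a new frame and leaves the original untouched unless the result is assigned, which would explain why `features.shape` still reports 31 columns. A minimal sketch on a hypothetical three-row frame (the values are made up; only the column names come from the CSV header):

```python
import pandas as pd

# Hypothetical toy frame standing in for creditcard.csv
toy = pd.DataFrame(
    {"Time": [0, 0, 1], "V1": [-1.0, 1.0, -1.0],
     "Amount": [10.0, 2.0, 30.0], "Class": [0, 0, 1]}
)

toy.drop(["Class", "Amount", "Time"], axis=1)  # result discarded: toy is unchanged
print(toy.shape)                               # still 4 columns

# Assign the result to actually remove the columns
toy_features = toy.drop(columns=["Class", "Amount", "Time"])
print(toy_features.columns.tolist())
```

Applied to the real data, the same assignment would shrink the feature matrix and change the parameter count of the first `Dense` layer.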


In [None]:
df.sample(5).T

| | 108578 | 17176 | 39781 | 94255 | 91114 |
|---|---|---|---|---|---|
| Time | 70972.0 | 28499.0 | 39962.0 | 64804.0 | 63319.0 |
| V1 | -0.750074 | 1.236056 | 0.719367 | -0.883482 | -1.117178 |
| V2 | 0.735973 | 0.519209 | -1.081056 | 0.269455 | 0.71808 |
| V3 | 0.935688 | -0.073251 | 1.895041 | 2.363623 | 0.744803 |
| V4 | -0.89146 | 0.981566 | 1.066609 | -0.969322 | -1.400623 |
| V5 | -0.329783 | 0.177718 | -1.098797 | -0.545099 | 0.174676 |
| V6 | -1.186985 | -0.846329 | 2.386458 | 0.031023 | -0.711423 |
| V7 | 0.296088 | 0.510512 | -1.487785 | 0.711851 | 0.687726 |
| V8 | 0.044064 | -0.260361 | 0.880024 | -0.056098 | 0.448037 |
| V9 | -2.357388 | -0.677051 | 1.859406 | 0.494021 | -1.034041 |


In [None]:
## create train and test datasets (the last 20% of rows are held out for testing)

num_val_samples = int(len(features) * 0.2)
train_features = features[:-num_val_samples]
train_targets = targets[:-num_val_samples]
test_features = features[-num_val_samples:]
test_targets = targets[-num_val_samples:]

print("Number of training samples:", len(train_features))
print("Number of testing samples:", len(test_features))

Number of training samples: 227846
Number of testing samples: 56961


In [None]:
## Data pre-processing

mean = np.mean(train_features, axis=0)
train_features -= mean
test_features -= mean
std = np.std(train_features, axis=0)
train_features /= std
test_features /= std
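
The pre-processing above standardizes both splits with statistics computed on the training split only. A minimal sketch of the same idea on synthetic arrays; `x_train` and `x_test` are hypothetical stand-ins, and the guard against zero standard deviation is an added assumption that the original cell does not include:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for train_features / test_features
x_train = rng.normal(loc=5.0, scale=2.0, size=(1000, 3)).astype("float32")
x_test = rng.normal(loc=5.0, scale=2.0, size=(200, 3)).astype("float32")

mean = x_train.mean(axis=0)
std = x_train.std(axis=0)
std = np.where(std == 0, 1.0, std)  # assumption: leave constant columns unscaled

x_train_std = (x_train - mean) / std
x_test_std = (x_test - mean) / std  # reuse the training statistics on the test split

print(x_train_std.mean(axis=0).round(3), x_train_std.std(axis=0).round(3))
```

Reusing the training mean and standard deviation keeps information about the test split out of the fitted pipeline.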

## How to implement a neural network with `tf.keras.Model`?

- Define a network: input -> layers -> output
- Compile a network: `model.compile`
- Fit a network: `model.fit`

### Define a neural network by `tf.keras.Model`

> `tf.keras.Model(*args, **kwargs)`

**Args**
- `inputs`: The input(s) of the model: a `keras.Input` object or a list of `keras.Input` objects.
- `outputs`: The output(s) of the model.
- `name`: String, the name of the model.

Layers: use different types of layers to construct your own network.

The key point is how to connect `inputs` to `outputs`.
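
Connecting `inputs` to `outputs` can be sketched with the Functional API: each layer is called on the tensor produced by the previous one, and `keras.Model` ties the two ends together. The layer sizes mirror the network used in this notebook; `n_features = 31` is an assumption taken from the `features.shape` printed earlier, and the names `func_model` and `functional_sketch` are hypothetical.

```python
from tensorflow import keras

n_features = 31  # assumption: matches features.shape[1] printed earlier

inputs = keras.Input(shape=(n_features,))
h = keras.layers.Dense(256, activation="relu")(inputs)
h = keras.layers.Dense(256, activation="relu")(h)
outputs = keras.layers.Dense(1, activation="sigmoid")(h)

# The model is defined entirely by the path from `inputs` to `outputs`
func_model = keras.Model(inputs=inputs, outputs=outputs, name="functional_sketch")
print(func_model.output_shape)
```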

In [None]:
## Build binary classification model

## Input -> Dense -> Dense -> Dense -> output
from tensorflow import keras

model = keras.Sequential(
    [
        keras.layers.Dense(
            256, activation="relu", input_shape=(train_features.shape[-1],)
        ),
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ]
)
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 256)               8192      
                                                                 
 dense_1 (Dense)             (None, 256)               65792     
                                                                 
 dense_2 (Dense)             (None, 256)               65792     
                                                                 
 dense_3 (Dense)             (None, 1)                 257       
                                                                 
Total params: 140,033
Trainable params: 140,033
Non-trainable params: 0
_________________________________________________________________


### Explore `model.compile` function in `tf.keras`
```python
    compile(
        optimizer='rmsprop',
        loss=None,
        metrics=None,
        loss_weights=None,
        weighted_metrics=None,
        run_eagerly=None,
        steps_per_execution=None,
        jit_compile=None,
        **kwargs
    )
```
- Key args: `optimizer`, `loss`, `metrics`
- Ref: [A Comprehensive Guide on Deep Learning Optimizers](https://www.analyticsvidhya.com/blog/2021/10/a-comprehensive-guide-on-deep-learning-optimizers/)
- IMPORTANT! Pair your loss function with the output of your network (the activation function in the last layer): for example, `binary_crossentropy` by default expects the probabilities produced by a `sigmoid` output.
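
To see why the pairing matters, the following NumPy sketch computes binary cross-entropy three ways: from sigmoid probabilities, directly from raw scores (logits), and with raw scores wrongly treated as probabilities. The helper functions and the score values are hypothetical illustrations, not Keras internals; in Keras the logit variant corresponds to `keras.losses.BinaryCrossentropy(from_logits=True)` with a linear last layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_from_probs(y, p, eps=1e-7):
    # Cross-entropy when the network outputs probabilities (sigmoid last layer)
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def bce_from_logits(y, z):
    # Numerically stable cross-entropy when the network outputs raw scores
    return float(np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))))

y = np.array([0.0, 0.0, 1.0, 0.0])
z = np.array([-2.0, 0.5, 1.5, -0.3])  # hypothetical raw outputs of the last layer

paired = bce_from_probs(y, sigmoid(z))  # sigmoid output + probability-based loss
from_logits = bce_from_logits(y, z)     # linear output + logit-based loss
mismatched = bce_from_probs(y, z)       # raw scores wrongly treated as probabilities

print(paired, from_logits, mismatched)
```

The first two values agree, while the mismatched one is a different number that no longer measures the fit of the model.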

In [None]:
metrics = [
    keras.metrics.BinaryAccuracy(name='acc'),
]

model.compile(
    optimizer=keras.optimizers.SGD(1e-4), 
    loss="binary_crossentropy", 
    metrics=metrics
)

### Explore `model.fit` function in `keras`
```python
    fit(
        x=None,
        y=None,
        batch_size=None,
        epochs=1,
        verbose='auto',
        callbacks=None,
        validation_split=0.0,
        validation_data=None,
        shuffle=True,
        class_weight=None,
        sample_weight=None,
        initial_epoch=0,
        steps_per_epoch=None,
        validation_steps=None,
        validation_batch_size=None,
        validation_freq=1,
        max_queue_size=10,
        workers=1,
        use_multiprocessing=False
    )
```
Check the [**Args**](https://www.tensorflow.org/api_docs/python/tf/keras/Model) section of the API reference.

- Key args: `x`, `y`, `epochs`, `batch_size`, `verbose`, `callbacks`, `validation_split`, `validation_data`
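
Among the remaining arguments, `class_weight` is the one most directly related to an imbalanced problem like this one. A minimal sketch of inverse-frequency weights, which is one common choice rather than the only one; the label vector `y` is hypothetical and not the real training targets:

```python
import numpy as np

# Hypothetical labels: 995 negatives and 5 positives
y = np.array([0] * 995 + [1] * 5)

counts = np.bincount(y)  # number of samples in class 0 and class 1
class_weight = {0: float(1.0 / counts[0]), 1: float(1.0 / counts[1])}

print(class_weight)
```

A dictionary of this form could be passed as `model.fit(..., class_weight=class_weight)`, so that each positive sample contributes more to the loss than each negative one.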



In [None]:
model.fit(
    train_features,
    train_targets,
    batch_size=2048,
    epochs=5,
    verbose=2,
)

Epoch 1/5
112/112 - 5s - loss: 0.9292 - acc: 0.0019 - 5s/epoch - 45ms/step
Epoch 2/5
112/112 - 4s - loss: 0.8844 - acc: 0.0030 - 4s/epoch - 37ms/step
Epoch 3/5
112/112 - 4s - loss: 0.8429 - acc: 0.0102 - 4s/epoch - 37ms/step
Epoch 4/5
112/112 - 4s - loss: 0.8046 - acc: 0.0428 - 4s/epoch - 37ms/step
Epoch 5/5
112/112 - 4s - loss: 0.7690 - acc: 0.1367 - 4s/epoch - 37ms/step


<keras.callbacks.History at 0x7f9823524110>
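
The `112/112` in the log is the number of mini-batches (steps) per epoch. A quick check of the arithmetic, assuming the step count is the ceiling of samples over batch size and that `validation_split` holds out the final fraction of the training data before batching:

```python
import math

n_train = 227846   # "Number of training samples" printed earlier
batch_size = 2048

steps_full = math.ceil(n_train / batch_size)

# Assumption: validation_split=0.2 removes the last 20% of the samples first
n_fit = math.floor(n_train * (1 - 0.2))
steps_split = math.ceil(n_fit / batch_size)

print(steps_full, steps_split)  # 112 90
```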

### Early-stopping in neural networks

- Define `callback` with `EarlyStopping` in `model.fit`
- Use `validation_data` or `validation_split` in `model.fit`

    tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",
        min_delta=0,
        patience=0,
        verbose=0,
        mode="auto",
        baseline=None,
        restore_best_weights=False)


In [None]:
metrics = [
    keras.metrics.BinaryAccuracy(name='acc'),
    # keras.metrics.AUC(name='auc')
]

model.compile(
    optimizer=keras.optimizers.SGD(1e-3), 
    loss="binary_crossentropy", 
    metrics=metrics
)

callbacks = [keras.callbacks.EarlyStopping( 
    monitor='val_acc', min_delta=0, patience=5, verbose=1, 
    mode='auto', baseline=None, restore_best_weights=True)]

model.fit(
    train_features,
    train_targets,
    batch_size=2048,
    epochs=30,
    verbose=2,
    callbacks=callbacks,
    validation_split = .2,
)

Epoch 1/30
90/90 - 4s - loss: 0.2179 - acc: 0.9980 - val_loss: 0.1783 - val_acc: 0.9989 - 4s/epoch - 45ms/step
Epoch 2/30
90/90 - 3s - loss: 0.1891 - acc: 0.9980 - val_loss: 0.1537 - val_acc: 0.9989 - 3s/epoch - 39ms/step
Epoch 3/30
90/90 - 3s - loss: 0.1662 - acc: 0.9980 - val_loss: 0.1341 - val_acc: 0.9989 - 3s/epoch - 38ms/step
Epoch 4/30
90/90 - 3s - loss: 0.1479 - acc: 0.9980 - val_loss: 0.1184 - val_acc: 0.9989 - 3s/epoch - 39ms/step
Epoch 5/30
90/90 - 3s - loss: 0.1330 - acc: 0.9980 - val_loss: 0.1056 - val_acc: 0.9989 - 3s/epoch - 39ms/step
Epoch 6/30
Restoring model weights from the end of the best epoch: 1.
90/90 - 3s - loss: 0.1209 - acc: 0.9980 - val_loss: 0.0952 - val_acc: 0.9989 - 3s/epoch - 39ms/step
Epoch 6: early stopping


<keras.callbacks.History at 0x7f203e97cf50>
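
The run stops at epoch 6 and restores epoch 1 because `val_acc` never improves on its first value. The patience rule can be sketched in plain Python; `simulate_early_stopping` is a hypothetical, simplified re-implementation for a metric that should be maximised, not the Keras source:

```python
def simulate_early_stopping(values, patience, min_delta=0.0):
    """Return (stopped_epoch, best_epoch), both 1-indexed.

    If the rule never triggers, stopped_epoch is the last epoch.
    """
    best, best_epoch, wait = float("-inf"), 0, 0
    for epoch, value in enumerate(values, start=1):
        if value - min_delta > best:
            # Improvement: remember it and reset the patience counter
            best, best_epoch, wait = value, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(values), best_epoch

val_acc = [0.9989] * 30  # val_acc is flat in the log above
print(simulate_early_stopping(val_acc, patience=5))  # (6, 1)
```

Since `val_loss` keeps decreasing over the same six epochs, monitoring it instead of `val_acc` would not have stopped training this early.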

In [None]:
## make prediction
pred_prob = model.predict(test_features)
pred_label = 1*(pred_prob > .5)

print(pred_label)

model.evaluate(test_features, test_targets)

[[0]
 [0]
 [0]
 ...
 [0]
 [0]
 [0]]


[0.17444510757923126, 0.9986833333969116]
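
An accuracy close to 0.999 says little when positives are this rare: a classifier that always predicts class 0 scores about the same. A minimal sketch with hypothetical labels (1,000 samples, 2 positives, not the real test set):

```python
import numpy as np

# Hypothetical labels: 2 positives among 1,000 samples
y_true = np.zeros(1000, dtype=int)
y_true[[10, 500]] = 1
y_pred = np.zeros(1000, dtype=int)  # always predict "not fraud"

accuracy = float((y_pred == y_true).mean())
tp = int(((y_pred == 1) & (y_true == 1)).sum())  # frauds that were caught
fn = int(((y_pred == 0) & (y_true == 1)).sum())  # frauds that were missed
recall = tp / (tp + fn)

print(accuracy, recall)
```

Accuracy is high while recall is zero, which is why metrics such as AUC, precision, or recall are more informative here.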

### InClass Practice

- Define a network with two Dense layers, each with 128 neurons.
- Report the `AUC` score for each epoch.
- Use early stopping with `patience=10`, based on the `AUC` score on a validation set.

## Not a sequential model?

- By subclassing the `Model` class: in that case, you should define your layers in `__init__()` and implement the model's forward pass in `call()`.

```python
import tensorflow as tf

class MyModel(tf.keras.Model):

  def __init__(self):
    super().__init__()
    self.dense1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
    self.dense2 = tf.keras.layers.Dense(5, activation=tf.nn.softmax)

  def call(self, inputs):
    x = self.dense1(inputs)
    return self.dense2(x)

model = MyModel()
```

- The point is to define the layers (in `__init__`) and the forward **PATH** (in `call`) separately

In [20]:
## define a semi-parametric model
import tensorflow as tf

class SemiM(tf.keras.Model):

  def __init__(self):
    super().__init__()
    self.dense1 = keras.layers.Dense(256, activation='relu', input_shape=(train_features.shape[-1],))
    self.dense2 = keras.layers.Dense(256, activation='relu')
    self.dense3 = keras.layers.Dense(1, activation='sigmoid')
    self.concatenate = keras.layers.Concatenate()

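  def call(self, inputs):
    # nonlinear part: two hidden Dense layers
    nl_p = self.dense1(inputs)
    nl_p = self.dense2(nl_p)
    # concatenate the nonlinear features with the raw inputs (linear part)
    p = self.concatenate([nl_p, inputs])
    out = self.dense3(p)
    return out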
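
Because the last `Dense(1)` layer acts on the concatenation of the hidden features and the raw inputs, its pre-activation splits into a nonlinear term plus a linear term in the inputs, which is what makes the model semi-parametric. A NumPy check of that identity; the arrays are random stand-ins, and the sizes 256 and 31 are assumptions taken from the layer width and the feature count used in this notebook:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 31))    # hypothetical batch of raw inputs
h = rng.normal(size=(4, 256))   # stand-in for the output of dense2
w = rng.normal(size=(287, 1))   # weights of the last layer: 256 + 31 rows
b = 0.1

# Dense(1) applied to the concatenated vector [h, x]
concat_logit = np.concatenate([h, x], axis=1) @ w + b
# The same value written as nonlinear part + linear part
split_logit = h @ w[:256] + x @ w[256:] + b

print(np.allclose(concat_logit, split_logit))
```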

In [22]:
SemiM_ = SemiM()

In [23]:
metrics = [
    keras.metrics.BinaryAccuracy(name='acc'),
]

SemiM_.compile(
    optimizer=keras.optimizers.SGD(1e-4), 
    loss="binary_crossentropy", 
    metrics=metrics
)

In [25]:
SemiM_.fit(
    train_features,
    train_targets,
    batch_size=2048,
    epochs=20,
    verbose=2,
)

Epoch 1/20
112/112 - 3s - loss: 0.6214 - acc: 0.6709 - 3s/epoch - 25ms/step
Epoch 2/20
112/112 - 3s - loss: 0.5975 - acc: 0.7141 - 3s/epoch - 25ms/step
Epoch 3/20
112/112 - 3s - loss: 0.5752 - acc: 0.7539 - 3s/epoch - 24ms/step
Epoch 4/20
112/112 - 3s - loss: 0.5542 - acc: 0.7896 - 3s/epoch - 24ms/step
Epoch 5/20
112/112 - 3s - loss: 0.5344 - acc: 0.8224 - 3s/epoch - 24ms/step
Epoch 6/20
112/112 - 3s - loss: 0.5159 - acc: 0.8510 - 3s/epoch - 23ms/step
Epoch 7/20
112/112 - 3s - loss: 0.4984 - acc: 0.8769 - 3s/epoch - 23ms/step
Epoch 8/20
112/112 - 3s - loss: 0.4819 - acc: 0.8998 - 3s/epoch - 23ms/step
Epoch 9/20
112/112 - 4s - loss: 0.4663 - acc: 0.9195 - 4s/epoch - 34ms/step
Epoch 10/20
112/112 - 3s - loss: 0.4515 - acc: 0.9364 - 3s/epoch - 24ms/step
Epoch 11/20
112/112 - 3s - loss: 0.4376 - acc: 0.9507 - 3s/epoch - 24ms/step
Epoch 12/20
112/112 - 3s - loss: 0.4243 - acc: 0.9621 - 3s/epoch - 24ms/step
Epoch 13/20
112/112 - 3s - loss: 0.4118 - acc: 0.9719 - 3s/epoch - 23ms/step
Epoch 14

<keras.callbacks.History at 0x7f98232452d0>

In [28]:
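## make prediction
## NOTE: this cell calls `model` (the Sequential network defined earlier), not `SemiM_`;
## use `SemiM_.predict` / `SemiM_.evaluate` to score the semi-parametric model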
pred_prob = model.predict(test_features)
pred_label = np.round(pred_prob)

print(pred_label)

model.evaluate(test_features, test_targets)

[[1.]
 [1.]
 [0.]
 ...
 [1.]
 [1.]
 [1.]]


[0.7765597701072693, 0.06941591948270798]

## To-do list

- **STAT**
  - [ ] Math formulation of Neural networks
  - [ ] Idea of Back-propagation
  - [ ] Terms in (Batch) SGD: epochs, batch size, ...

- **Code**
  - [ ] From math -> Diagram -> TF code
  - [ ] Custom metrics, model structure, losses, and other components in TF