# **Lab: Neural Networks**
---
## Exercise 3: Binary Classification

The dataset we will be using is the German Credit Data.

The data was originally published by Professor Dr. Hans Hofmann
Institut für Statistik und Ökonometrie
Universität Hamburg
FB Wirtschaftswissenschaften
Von-Melle-Park 5
2000 Hamburg 13

It is composed of 20 numerical variables plus the response variable.

Each observation represents a single credit application made by an individual. The features describe the financial profile of the applicant.

The data dictionary can be found here: [German Credit Data Dictionary](https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.doc)

The original dataset is available from UCI: [German Credit Data](http://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data))


A CSV version of this dataset is available here: [link](https://online.stat.psu.edu/onlinecourses/sites/stat508/files/german_credit.csv)


Our goal is to build a neural network model that can predict whether a new lead is creditworthy.

## Instructions

This is a guided exercise where some of the code has already been pre-defined. Your task is to fill in the remaining parts of the code (highlighted with placeholders) to train and evaluate your model.

The steps are:
1.   Launch Docker image
2.   Loading and Exploration of the Dataset
3.   Preparing the Dataset
4.   Defining the Architecture of the Multi-Layer Perceptron
5.   Training and Evaluation of the Model
6.   Analysing the Results
7.   Push changes

It is recommended to try adding regularization in this exercise with:
- [l1/l2 regularizer](https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/Regularizer)
- [dropout layer](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout)

## Exercise 3

### 1. Launch Docker image

**[1.1]** Go to the folder you created previously

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
cd /Users/anthonyso/Projects/adv_mla_2024

**[1.2]** Run the built Docker image

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
docker run -dit --rm --name adv_mla_lab_8 -p 8888:8888 -v ~/Projects/adv_mla_2024/adv_mla_lab_8:/home/jovyan/work/ tensorflow-jupyter:latest

Syntax: docker run [OPTIONS] IMAGE

Options:

`-dit: Run container in background and interactive`

`--rm: Automatically remove the container when it exits`

`--name: Assign a name to the container`

`-p: Publish a container's port(s) to the host`

`-e: Set environment variables`

`-v: Bind mount a volume`

Documentation: https://docs.docker.com/engine/reference/commandline/run/

**[1.3]** Display last 50 lines of logs

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution
docker logs --tail 50 adv_mla_lab_8

Syntax: docker logs [OPTIONS] CONTAINER

Options:

`--tail: Number of lines to show from the end of the logs`

Documentation: https://docs.docker.com/engine/reference/commandline/logs/

**[1.4]** Copy the URL displayed in the logs and paste it into a browser to launch Jupyter Lab

**[1.5]** Create a new git branch called `tf_bin_class`

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution
git checkout -b tf_bin_class

**[1.6]** Create a subfolder `models/tf_bin_class`

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
mkdir -p models/tf_bin_class

**[1.7]** Navigate to the folder `notebooks` and create a new Jupyter notebook called `3_tf_binaryclass.ipynb`

### 2. Loading and Exploration of the Dataset

**[2.1]** Import the package pandas

In [None]:
# Placeholder for student's code

In [None]:
# Solution
import pandas as pd

**[2.2]** Load the CSV file using [.read_csv()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) into a variable called `df`

In [None]:
# Placeholder for student's code

In [None]:
# Solution
df = pd.read_csv("https://online.stat.psu.edu/onlinecourses/sites/stat508/files/german_credit.csv")

**[2.3]** Explore the first rows of the dataframe using [.head()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html?highlight=head#pandas.DataFrame.head)

In [None]:
# Placeholder for student's code

In [None]:
# Solution
df.head()

**[2.4]** Print out the descriptive statistics for the numerical variables using [.describe()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html?highlight=describe#pandas.DataFrame.describe)

In [None]:
# Placeholder for student's code

In [None]:
# Solution
df.describe()

**[2.5]** Plot the distribution of the target variable using [.hist()](https://matplotlib.org/3.2.0/api/_as_gen/matplotlib.pyplot.hist.html) and [.show()](https://matplotlib.org/3.2.0/api/_as_gen/matplotlib.pyplot.show.html?highlight=show#matplotlib.pyplot.show) from matplotlib

In [None]:
# Placeholder for student's code

In [None]:
# Solution
import matplotlib.pyplot as plt
plt.hist(df['Creditability'])
plt.show()

### 3.   Preparing the Dataset

**[3.1]** Extract the target variable `Creditability` using [.pop()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pop.html?highlight=pop#pandas.DataFrame.pop) into a variable called `y`

In [None]:
# Placeholder for student's code

In [None]:
# Solution
y = df.pop('Creditability')

**[3.2]** Import [train_test_split()](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) from sklearn


In [None]:
# Placeholder for student's code

In [None]:
# Solution
from sklearn.model_selection import train_test_split

**[3.3]** Split the data into training and testing sets using an 80/20 ratio with [train_test_split()](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) from sklearn

In [None]:
# Placeholder for student's code

In [None]:
# Solution
X_train, X_test, y_train, y_test =  train_test_split(df, y, test_size=0.2, random_state=8)
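
When the two classes are not equally frequent, a purely random split can leave the training and testing sets with slightly different class proportions. As an optional variation (not part of the solution above), `train_test_split()` accepts a `stratify` argument that preserves the class ratio in both sets. The sketch below uses made-up arrays `toy_X` and `toy_y`, which are assumptions for illustration only:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows, 70 positives and 30 negatives
toy_X = np.arange(200).reshape(100, 2)
toy_y = np.array([1] * 70 + [0] * 30)

# stratify=toy_y keeps the 70/30 class ratio in both splits
toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=8, stratify=toy_y
)

# Share of positives in each split
print(toy_y_train.mean(), toy_y_test.mean())
```

In the notebook, the equivalent change would be to pass `stratify=y` to the existing call.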

**[3.4]** Print out the dimensions of the 4 variables you created

In [None]:
# Placeholder for student's code

In [None]:
# Solution
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

**[3.5]** Import [scale()](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.scale.html?highlight=scale#sklearn.preprocessing.scale) from sklearn


In [None]:
# Placeholder for student's code

In [None]:
# Solution
from sklearn.preprocessing import scale

**[3.6]** Perform standardisation on the training and testing sets using [scale()](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.scale.html?highlight=scale#sklearn.preprocessing.scale) from sklearn

In [None]:
# Placeholder for student's code

In [None]:
# Solution
scaled_X_train = scale(X_train)
scaled_X_test = scale(X_test)
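
Note that calling `scale()` separately on each split standardises the testing set with its own mean and standard deviation, so the two sets may end up on slightly different scales. A common alternative, sketched below with NumPy on made-up numbers, is to compute the statistics on the training set only and reuse them for the testing set. The helper `standardise_with_train_stats` and the toy arrays are hypothetical; sklearn's `StandardScaler` (fitted on the training set, then used to transform both sets) packages the same idea.

```python
import numpy as np

def standardise_with_train_stats(train, test):
    """Standardise both arrays using the mean and std of `train` only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (train - mean) / std, (test - mean) / std

# Hypothetical toy data: 4 training rows and 2 testing rows with 2 features
toy_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
toy_test = np.array([[2.5, 25.0], [5.0, 50.0]])

toy_train_scaled, toy_test_scaled = standardise_with_train_stats(toy_train, toy_test)

print(toy_train_scaled.mean(axis=0))  # close to 0 for every column
print(toy_test_scaled)                # expressed in training-set units
```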

**[3.7]** Print the scaled values of the first observation of the training set

In [None]:
# Placeholder for student's code

In [None]:
# Solution
print(scaled_X_train[0])

### 4.   Defining the Architecture of the Multi-Layer Perceptron

**[4.1]** Import tensorflow and numpy

In [None]:
# Placeholder for student's code

In [None]:
# Solution
import tensorflow as tf
import numpy as np

**[4.2]** Set the seeds for tensorflow and numpy in order to get reproducible results

In [None]:
# Placeholder for student's code

In [None]:
# Solution
tf.random.set_seed(42)
np.random.seed(42)

**[4.3]** Create a combined L1 and L2 regulariser using [l1_l2](https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/L1L2) and save it into a variable called `regularizer`

In [None]:
# Placeholder for student's code

In [None]:
# Solution
regularizer = tf.keras.regularizers.l1_l2(l1=0.01, l2=0.01)
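
To make the effect of this regulariser concrete, the penalty it adds to the loss for a weight matrix is `l1 * sum(|w|) + l2 * sum(w^2)`. The NumPy sketch below reproduces that arithmetic on a small made-up weight matrix; the function name `l1_l2_penalty` and the values in `toy_weights` are assumptions for illustration only.

```python
import numpy as np

def l1_l2_penalty(weights, l1=0.01, l2=0.01):
    """Penalty added to the loss: l1 * sum(|w|) + l2 * sum(w^2)."""
    return l1 * np.sum(np.abs(weights)) + l2 * np.sum(np.square(weights))

# Hypothetical 2x2 weight matrix
toy_weights = np.array([[1.0, -2.0], [0.5, 0.0]])

# sum(|w|) = 3.5 and sum(w^2) = 5.25, so the penalty is 0.035 + 0.0525
penalty = l1_l2_penalty(toy_weights)
print(penalty)
```

Larger values of `l1` or `l2` push the weights towards zero more strongly, which is why both are usually treated as hyperparameters to tune.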

**[4.4]** Instantiate the [.Sequential()](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential) class and save it into a variable called `model`

In [None]:
# Placeholder for student's code

In [None]:
# Solution
model = tf.keras.Sequential()

**[4.5]** Import the [Dense()](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) and [Dropout()](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout) classes

In [None]:
# Placeholder for student's code

In [None]:
# Solution
from tensorflow.keras.layers import Dense, Dropout
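
`Dropout` is imported here but the solution that follows does not use it. As a rough illustration of what such a layer does, the NumPy sketch below implements "inverted" dropout: during training a random fraction `rate` of the activations is set to zero and the remaining ones are scaled by `1 / (1 - rate)`, while at inference the input passes through unchanged. The function `dropout` and the toy array are hypothetical and only mimic the behaviour; they are not the Keras implementation.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    if not training or rate == 0.0:
        return x  # identity at inference time
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
toy_activations = np.ones((4, 5))

train_out = dropout(toy_activations, rate=0.5, training=True, rng=rng)
infer_out = dropout(toy_activations, rate=0.5, training=False, rng=rng)

print(train_out)  # entries are either 0.0 (dropped) or 2.0 (kept and rescaled)
print(infer_out)  # unchanged
```

In the Keras model, a dropout layer would typically sit between the hidden layer and the output layer, for example by calling `model.add(Dropout(0.2))` between the two `model.add()` calls; the rate of 0.2 is only an illustrative value.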

**[4.6]** Create a hidden layer of 128 fully connected neurons with ReLU as the activation function, followed by another fully connected layer responsible for making the final predictions.

In [None]:
# Placeholder for student's code

In [None]:
# Solution
layer1 = Dense(128, activation='relu', input_shape=[20], kernel_regularizer=regularizer)
top_layer = Dense(1, activation='sigmoid')

**[4.7]** Assemble the 2 fully connected layers we just defined. We will be using the [.add()](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential#add) method of the `Sequential` model

In [None]:
# Placeholder for student's code

In [None]:
# Solution
model.add(layer1)
model.add(top_layer)

**[4.8]** Instantiate a [RMSprop()](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/RMSprop) optimiser with 0.001 as learning rate and call it `optimizer`

In [None]:
# Placeholder for student's code

In [None]:
# Solution
optimizer = tf.keras.optimizers.RMSprop(0.001)

**[4.9]** Configure the learning process using the [.compile()](https://www.tensorflow.org/api_docs/python/tf/keras/Model#methods_2) method and specify the loss function, optimizer and the metrics to be used.

In [None]:
# Placeholder for student's code

In [None]:
# Solution
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
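
For intuition about the loss chosen here, the sketch below computes the sigmoid output and the binary cross-entropy by hand with NumPy. The toy logits and labels are made up, and the clipping constant `eps` is an assumption added for numerical safety; the snippet illustrates the formula rather than reproducing Keras internals.

```python
import numpy as np

def sigmoid(z):
    """Map raw scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y * log(p) + (1 - y) * log(1 - p)] over all observations."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Hypothetical raw outputs of the last layer and the matching labels
toy_logits = np.array([2.0, -1.0, 0.0])
toy_labels = np.array([1.0, 0.0, 1.0])

toy_probs = sigmoid(toy_logits)
toy_loss = binary_crossentropy(toy_labels, toy_probs)
print(toy_probs, toy_loss)
```

Confident correct predictions contribute a loss close to zero, while confident wrong predictions are penalised heavily.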

**[4.10]** Print out the model architecture with [.summary()](https://www.tensorflow.org/api_docs/python/tf/keras/Model#summary)

**Task: Print the summary of the model**

In [None]:
# Placeholder for student's code

In [None]:
# Solution
model.summary()
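
The parameter counts reported by the summary can be checked by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The short sketch below applies this to the two layers defined above (20 inputs, 128 hidden units, 1 output unit); the helper name `dense_param_count` is hypothetical.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden_params = dense_param_count(20, 128)  # hidden layer: 20 inputs -> 128 units
output_params = dense_param_count(128, 1)   # output layer: 128 inputs -> 1 unit
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 2688 129 2817
```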

### 5. Training and Evaluation of the Model

**[5.1]** Train the model using [.fit()](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit) on the training set for 50 epochs and hold out 20% of it as a validation set

In [None]:
# Placeholder for student's code

In [None]:
# Solution
history = model.fit(scaled_X_train, y_train, epochs=50, validation_split = 0.2)

**[5.2]** Evaluate the performance of this model on the testing set using [.evaluate()](https://www.tensorflow.org/api_docs/python/tf/keras/Model#evaluate)

In [None]:
# Placeholder for student's code

In [None]:
# Solution
model.evaluate(scaled_X_test, y_test)
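
When one class is more frequent than the other, accuracy alone can be misleading, so it helps to compare it with the accuracy obtained by always predicting the majority class. The sketch below does this on made-up labels and probabilities: it applies a 0.5 threshold to the predicted probabilities and then computes both scores. All array names and values are hypothetical; in the notebook the probabilities would come from the model's predictions on the testing set.

```python
import numpy as np

# Hypothetical true labels and predicted probabilities for 10 observations
toy_y_true = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
toy_y_prob = np.array([0.9, 0.6, 0.4, 0.3, 0.7, 0.8, 0.55, 0.2, 0.95, 0.45])

# Turn probabilities into class predictions with a 0.5 threshold
toy_y_pred = (toy_y_prob > 0.5).astype(int)
model_accuracy = np.mean(toy_y_pred == toy_y_true)

# Baseline: always predict the most frequent class
majority_class = np.bincount(toy_y_true).argmax()
baseline_accuracy = np.mean(toy_y_true == majority_class)

print(model_accuracy, baseline_accuracy)
```

In this toy case both scores are equal, which would suggest the model adds nothing over the trivial baseline despite a seemingly reasonable accuracy.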

### 6. Analysing the Results

**[6.1]** Plot the learning curves of the accuracy score on the training and validation sets. We will use the [.plot()](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot) method to create a line chart.

In [None]:
# Placeholder for student's code

In [None]:
# Solution
plt.plot(history.history['accuracy'], label='training')
plt.plot(history.history['val_accuracy'], label='validation')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
# Display the labels passed to plt.plot()
plt.legend()
plt.show()
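
Besides reading the chart, the same history can be inspected numerically. The sketch below finds the epoch with the highest validation accuracy and the gap between training and validation accuracy at the last epoch; a gap that keeps growing is a typical sign of overfitting. The dictionary `toy_history` is a made-up stand-in with the same keys as `history.history`, and its values are not real results.

```python
# Hypothetical stand-in for history.history (values are invented)
toy_history = {
    "accuracy": [0.70, 0.74, 0.78, 0.82, 0.86],
    "val_accuracy": [0.69, 0.73, 0.75, 0.74, 0.72],
}

val_scores = toy_history["val_accuracy"]

# Index (0-based) of the epoch with the best validation accuracy
best_epoch = max(range(len(val_scores)), key=lambda i: val_scores[i])

# Difference between training and validation accuracy at the last epoch
gap_at_end = toy_history["accuracy"][-1] - val_scores[-1]

print(best_epoch, val_scores[best_epoch], round(gap_at_end, 2))
```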

### 7.   Push changes

**[7.1]** Add your changes to the git staging area

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git add .

**[7.2]** Create a snapshot of your repository and add a description

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git commit -m "third tf model"

**[7.3]** Push your snapshot to GitHub

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
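# A newly created branch usually needs an upstream on its first push
# (assumes the remote is named origin)
git push --set-upstream origin tf_bin_class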

**[7.4]** Go to GitHub and merge the branch after reviewing the code and fixing any conflicts


**[7.5]** Check out the master branch

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git checkout master

**[7.6]** Pull the latest updates


In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git pull

**[7.7]** Stop the Docker container

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution
docker stop adv_mla_lab_8

Documentation: https://docs.docker.com/engine/reference/commandline/stop/