# DABN13 - Assignment 6

## Preamble: Data
In this lab we use a dataset on beer purchases. Our goal is to predict whether a light beer purchased in the US is Bud Light. To achieve this goal, we use the following socioeconomic characteristics:
* market           - where the beer is bought
* buyertype        - who is the buyer () 
* income           - ranges of income
* childrenUnder6   - does the buyer have children under 6 years old
* children6to17    - does the buyer have children between 6 and 17
* age              - bracketed age groups
* employment       - fully employed, partially employed, no employment.
* degree           - level of education
* occupation       - which sector you are employed in
* ethnic           - white, asian, hispanic, black or other
* microwave        - own a microwave
* dishwasher       - own a dishwasher
* tvcable          - what type cable tv subscription you have
* singlefamilyhome - are you in a single family home
* npeople          - number of people you live with 1,2,3,4, +5

First, we load the dataset and create an output variable that indicates purchases of Bud Light.

In [27]:
import pandas as pd
import numpy as np
import os

# os.chdir("?") Change working directory if needed 

lb    = pd.read_csv("LightBeer2.csv")
y     = np.zeros(shape=lb.shape[0])
y[lb['beer_brand'] == "BUD LIGHT"]     = 1
demog = lb.iloc[:,9:]
demog = pd.get_dummies(demog, drop_first=True)



We also split the data into training and test sets:

In [28]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(demog, y, train_size=0.75, shuffle=False)

stdz_X = StandardScaler().fit(X_train)

X_train = stdz_X.transform(X_train)
X_test  = stdz_X.transform(X_test)
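
The scaler above is fitted on the training data only and then applied to both sets. A minimal NumPy sketch of the same idea, using hypothetical toy data (`X_tr`, `X_te` and their sizes are assumptions made for illustration, not part of the assignment):

```python
import numpy as np

# Hypothetical toy data standing in for the dummy-coded demographics
rng = np.random.default_rng(0)
X_tr = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
X_te = rng.normal(loc=5.0, scale=2.0, size=(50, 3))

# "fit": estimate column means and standard deviations on training data only
mu = X_tr.mean(axis=0)
sd = X_tr.std(axis=0)  # population std (ddof=0), the convention StandardScaler uses

# "transform": apply the training statistics to both sets
X_tr_std = (X_tr - mu) / sd
X_te_std = (X_te - mu) / sd

print(X_tr_std.mean(axis=0).round(6))  # zero up to rounding, by construction
print(X_te_std.mean(axis=0).round(3))  # close to zero, but not exactly zero
```

Reusing the training means and standard deviations for the test set keeps information from the test data out of the preprocessing step.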

## Part 1: Specifying and training neural networks
We will now start building a neural network to predict the purchase of Bud Light.

### Task 1a) 
We start with specifying the architecture of our very first and very small neural net `model1`.
Add three layers to `model1`, two hidden layers with $30$ and $15$ hidden units, respectively, and an output layer.
For the two hidden layers you should use the ReLU activation function. Additionally, choose a suitable activation for the output layer, given that we have a classification problem. See [the Keras documentation](https://keras.io/api/layers/activations/) for activation functions. 

In [29]:
# CODE_CHUNK code_chunk_01
from tensorflow import keras
from keras import layers

keras.utils.set_random_seed(3)
# Initialize a first model
model1 = keras.Sequential()

# Add layers to the model
model1.add(layers.Input(shape=(X_train.shape[1],)))

model1.add(layers.Dense(30, activation= "relu"))

model1.add(layers.Dense(15, activation= "relu"))

model1.add(layers.Dense(1, activation = "sigmoid"))
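
To get a feeling for the size of this network, the number of trainable parameters can be counted by hand: each dense layer has one weight per input-unit pair plus one bias per unit. The sketch below is illustrative only; the helper `dense_param_count` and the input dimension of 50 are assumptions, the actual input dimension is `X_train.shape[1]`.

```python
def dense_param_count(n_inputs, layer_sizes):
    """Number of weights and biases in a stack of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for units in layer_sizes:
        total += (fan_in + 1) * units  # weights plus one bias per unit
        fan_in = units
    return total

# p = 50 is a hypothetical number of input columns; use X_train.shape[1] in practice
n_params = dense_param_count(50, [30, 15, 1])
print(n_params)
```

The result can be compared with the total reported by `model1.summary()`.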


### Task 1b) 

Next, we compile our model specification. From the [losses](https://keras.io/api/losses/probabilistic_losses/) documentation select a suitable loss function for our classification problem. As optimization algorithm use *Adam* with learning rate $0.00003$. Lastly, use `accuracy` as a metric.

In [30]:
# CODE_CHUNK code_chunk_02
from keras.optimizers import Adam
from keras.losses import BinaryCrossentropy

# Compile the model
myopt = Adam(learning_rate = 0.00003)

model1.compile(loss = "binary_crossentropy", optimizer = myopt, metrics = ["accuracy"])



### Task 1c)
Now train the model using $250$ epochs, a batch_size of $2^8$, and use $25\%$ of the data for validation.
Use the string variable `loss_valloss_difference_1c` to describe and explain the observed difference between validation loss and training loss.
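
When thinking about the gap between the two losses, it helps to know which rows end up in the validation set. Keras takes the validation rows from the end of the supplied arrays, before any shuffling. The sketch below mimics that slicing rule; the helper name and the row count of 1000 are hypothetical.

```python
import math
import numpy as np

def split_like_validation_split(n_rows, validation_split):
    """Index ranges for training and validation rows (the last rows are held out)."""
    split_at = int(math.floor(n_rows * (1.0 - validation_split)))
    return np.arange(split_at), np.arange(split_at, n_rows)

train_idx, val_idx = split_like_validation_split(n_rows=1000, validation_split=0.25)
print(len(train_idx), len(val_idx), val_idx[0])
```

Because `train_test_split` was called with `shuffle=False`, the rows keep their original file order, so the validation rows are simply the last quarter of `X_train`. If the file is sorted in some way, training and validation rows may differ systematically.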

In [31]:
# CODE_CHUNK code_chunk_03
nn_trainlog1 = model1.fit(x = X_train, y = y_train,
                          epochs = 250,
                          batch_size = 256,
                          validation_split= 0.25)


loss_valloss_difference_1c = "??"

Epoch 1/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.5438 - loss: 0.7276 - val_accuracy: 0.6957 - val_loss: 0.5862
Epoch 2/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5603 - loss: 0.7112 - val_accuracy: 0.7315 - val_loss: 0.5770
Epoch 3/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.5808 - loss: 0.6977 - val_accuracy: 0.7448 - val_loss: 0.5712
Epoch 4/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5901 - loss: 0.6863 - val_accuracy: 0.7526 - val_loss: 0.5665
Epoch 5/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5981 - loss: 0.6766 - val_accuracy: 0.7575 - val_loss: 0.5634
Epoch 6/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6069 - loss: 0.6682 - val_accuracy: 0.7704 - val_loss: 0.5607
Epoch 7/250
161/16

### Task 1d)
In Lecture 9 we used early stopping to avoid overfitting. Apply this here, with `patience` argument set to 20. Create a new model, `model1b`, which otherwise should have a setup identical to `model1`. In which epoch did the model training procedure stop? Figure this out by counting the number of elements in `model1b_fit.history['loss']` and write your answer into the string variable `when_earlystop_1d`.
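
The stopping rule itself is simple: with default settings, training halts once the monitored validation loss has failed to improve for `patience` consecutive epochs. The following plain-Python sketch simulates this rule on a hypothetical sequence of validation losses; the function name and the numbers are assumptions for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch after which training would stop."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0   # improvement: reset the counter
        else:
            wait += 1              # no improvement in this epoch
            if wait >= patience:
                return epoch
    return len(val_losses)         # patience never exhausted

hypothetical = [0.80, 0.70, 0.65, 0.66, 0.67, 0.64, 0.66, 0.65, 0.68]
print(early_stopping_epoch(hypothetical, patience=3))
```

The returned value corresponds to the number of entries one would find in the `loss` list of the training history.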

In [32]:
# CODE_CHUNK code_chunk_04
from keras.callbacks import EarlyStopping

# Define architecture here
model1b = keras.Sequential()

# Add layers to the model
model1b.add(layers.Input(shape=(X_train.shape[1],)))

model1b.add(layers.Dense(30, activation= "relu"))

model1b.add(layers.Dense(15, activation= "relu"))

model1b.add(layers.Dense(1, activation = "sigmoid"))

# Compile model here 

myopt1b = Adam(learning_rate = 0.00003)
model1b.compile(loss = "binary_crossentropy", optimizer = myopt1b, metrics = ["accuracy"])

# Fit model here
model1b_fit = model1b.fit(x = X_train, y = y_train,
                          epochs = 250,
                          batch_size = 256,
                          validation_split= 0.25,
                          callbacks = [EarlyStopping(patience=20)])


when_earlystop_1d = "??"

Epoch 1/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.4575 - loss: 0.7629 - val_accuracy: 0.3981 - val_loss: 0.8086
Epoch 2/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.4948 - loss: 0.7287 - val_accuracy: 0.5218 - val_loss: 0.7280
Epoch 3/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5377 - loss: 0.7063 - val_accuracy: 0.6065 - val_loss: 0.6731
Epoch 4/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5627 - loss: 0.6911 - val_accuracy: 0.6696 - val_loss: 0.6363
Epoch 5/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5892 - loss: 0.6800 - val_accuracy: 0.7237 - val_loss: 0.6120
Epoch 6/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6043 - loss: 0.6710 - val_accuracy: 0.7391 - val_loss: 0.5958
Epoch 7/250
161/16

### Task 1e)
Even though we haven't finished training our neural net, let us use the `evaluate()` function to measure the predictive performance of `model1b` on the test data. Save the result as `res_model1`. 

What is the accuracy of the model on the validation data and on the test data, respectively? What is the difference in accuracy? Save your answer in the string variable `difference_in_accuracy_1e`.
*Hint:* the validation accuracy recorded during training can be extracted from `model1b_fit`.
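
The `history` attribute of a fitted Keras model is a dictionary with one list per tracked quantity and one list entry per completed epoch. A small sketch with a hand-written stand-in dictionary (all numbers here are hypothetical placeholders, not results from the assignment):

```python
# Stand-in for model1b_fit.history; all numbers are hypothetical
history = {
    "loss": [0.76, 0.73, 0.71],
    "val_loss": [0.81, 0.73, 0.67],
    "accuracy": [0.46, 0.49, 0.54],
    "val_accuracy": [0.40, 0.52, 0.61],
}

epochs_run = len(history["loss"])            # one entry per completed epoch
val_acc_last = history["val_accuracy"][-1]   # validation accuracy after the last epoch
test_acc = 0.70                              # placeholder for the accuracy returned by evaluate()

print(epochs_run, round(val_acc_last - test_acc, 4))
```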


In [33]:
# CODE_CHUNK code_chunk_05
res_model1 = model1b.evaluate(X_test, y_test)
print(res_model1)
difference_in_accuracy_1e = ""


572/572 ━━━━━━━━━━━━━━━━━━━━ 1s 994us/step - accuracy: 0.7075 - loss: 0.5687
[0.5687230229377747, 0.7075332403182983]


### Task 1f)
Now we use the `confusion_matrix()` function from the `metrics` module of scikit-learn to disaggregate model performance to class-specific performance. First, get predicted class probabilities on the test data using `predict()`. Save these as `prob_model1`.
Second, use `confusion_matrix()` to get a confusion matrix and save it as `CM_model1`. Third, calculate true positive rate and false positive rate and save them as `TPR_1f` and `FPR_1f`.
Do your results on TPR and FPR suggest that prediction accuracy is approximately equal in both categories? Write your (specific!) answer into the string variable `categorywise_accuracy_1f`.

In [34]:
# CODE_CHUNK code_chunk_06
from sklearn.metrics import confusion_matrix

prob_model1 = model1b.predict(X_test) 


CM_model1   = confusion_matrix(y_test, prob_model1 > 0.5)

TN = CM_model1[0,0]
TP = CM_model1[1,1]
FN = CM_model1[1,0]
FP = CM_model1[0,1]


TPR_1f = TP / (FN + TP)
FPR_1f = FP / (FP + TN)

print(CM_model1)
print(TPR_1f)
print(FPR_1f)

categorywise_accuracy_1f = "??"


572/572 ━━━━━━━━━━━━━━━━━━━━ 0s 680us/step
[[10560  2321]
 [ 3025  2373]]
0.43960726194886995
0.18018787361229718
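
To compare the two categories directly, the class-wise rates can be recomputed from the confusion matrix printed above. This sketch hard-codes those four counts; the true negative rate `TNR` is an additional quantity introduced here for illustration and equals one minus the false positive rate.

```python
import numpy as np

# Confusion matrix printed above: rows are true classes, columns are predicted classes
CM = np.array([[10560, 2321],
               [3025, 2373]])

TN, FP = CM[0]
FN, TP = CM[1]

TPR = TP / (TP + FN)             # share of Bud Light purchases predicted correctly
TNR = TN / (TN + FP)             # share of other purchases predicted correctly (1 - FPR)
accuracy = (TP + TN) / CM.sum()  # overall share of correct predictions

print(round(TPR, 4), round(TNR, 4), round(accuracy, 4))
```

The overall accuracy recovered this way agrees with the value of about 0.7075 returned by `evaluate()` in Task 1e, while the two class-wise rates differ considerably from each other.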


### Task 1g)

In the lectures we have utilized explicit regularization to avoid overfitting. Here we will use $\ell_2$ regularization to shrink the weights. Create the architecture of a new neural net `model2` which is identical to that of `model1b` except for $\ell_2$ regularization with regularization factor `l2_pen` in the two hidden layers.

Then compile and fit this regularized model with the same parameters as in Task 1d. Save the trained neural net as `model2_fit`.
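
For intuition, a kernel regularizer of type `l2` adds the regularization factor times the sum of squared weights of that layer to the loss that is minimized and reported during training. Biases are not affected by `kernel_regularizer`. A minimal NumPy sketch with hypothetical weight matrices (`W_a`, `W_b` and the helper name are assumptions):

```python
import numpy as np

def l2_penalty(weight_matrices, l2_pen):
    """Penalty term added to the loss: l2_pen times the sum of squared weights."""
    return sum(l2_pen * np.sum(W ** 2) for W in weight_matrices)

# Hypothetical small weight matrices, for illustration only
W_a = np.array([[1.0, -2.0], [0.5, 0.0]])
W_b = np.array([[3.0], [-1.0]])

penalty = l2_penalty([W_a, W_b], l2_pen=0.005)
print(penalty)
```

This is why the training and validation losses of a penalized model are not directly comparable with those of an unpenalized one.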

In [35]:
# CODE_CHUNK code_chunk_07
from keras.regularizers import l2
l2_pen = 0.005

# 1. Architecture
model2 = keras.Sequential()

model2.add(layers.Input(shape=(X_train.shape[1],)))

model2.add(layers.Dense(30, activation= "relu", kernel_regularizer= l2(l2_pen)))

model2.add(layers.Dense(15, activation= "relu", kernel_regularizer= l2(l2_pen)))

# Note: the output layer is penalized as well, although the task only asks for the hidden layers
model2.add(layers.Dense(1, activation = "sigmoid", kernel_regularizer= l2(l2_pen)))

# 2. Compile and fit
myopt2 = Adam(learning_rate = 0.00003)
model2.compile(loss = "binary_crossentropy", optimizer = myopt2, metrics = ["accuracy"])


model2_fit = model2.fit(x = X_train, y = y_train,
                          epochs = 250,
                          batch_size = 256,
                          validation_split= 0.25,
                          callbacks = [EarlyStopping(patience=20)])



Epoch 1/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.5024 - loss: 1.0794 - val_accuracy: 0.4792 - val_loss: 1.0516
Epoch 2/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5332 - loss: 1.0429 - val_accuracy: 0.5717 - val_loss: 0.9828
Epoch 3/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5551 - loss: 1.0161 - val_accuracy: 0.6410 - val_loss: 0.9359
Epoch 4/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5737 - loss: 0.9948 - val_accuracy: 0.6922 - val_loss: 0.9032
Epoch 5/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5867 - loss: 0.9764 - val_accuracy: 0.7265 - val_loss: 0.8785
Epoch 6/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5984 - loss: 0.9599 - val_accuracy: 0.7440 - val_loss: 0.8592
Epoch 7/250
161/16

### Task 1h)
In Task 1e) we compared the prediction accuracy on test and training sets. However, this is a bad measure when the data is not well balanced (in terms of the observed output categories). Instead, one can use the cross entropy for the binomial distribution (minus the average log likelihood of the model). In fact, we chose this function as loss function for model training when we compiled `model1` and `model2`.

To compare the test error of `model2` to that of `model1b` we don't want to use `loss` from `evaluate()` since this includes the $\ell_2$ penalty. The function `log_loss()` in the `metrics` module of scikit-learn computes the cross entropy for the binomial distribution without penalty term.

First, get predicted output probabilities on test data from `model2` and save them as `prob_model2`.
Second, use `log_loss()` from the metrics module in scikit-learn to compute the cross-entropy loss for `model2` on the test data and save it as `logloss_model2`. 
Third, use the string variable `performance_comparison_1h` to describe how the accuracy on test data differs between `model1b` and `model2`.
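
The cross entropy for the binomial distribution is minus the average of $y_i \log p_i + (1-y_i)\log(1-p_i)$ over all data points. A minimal NumPy sketch of this formula follows; the function name, the clipping constant `eps` and the demo values are assumptions for illustration, and the sketch is not meant to replace `log_loss()`.

```python
import numpy as np

def binary_cross_entropy(y_true, prob, eps=1e-15):
    """Minus the average Bernoulli log likelihood, without any penalty term."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    # Clip probabilities away from 0 and 1 so the logarithms stay finite
    prob = np.clip(np.asarray(prob, dtype=float).ravel(), eps, 1 - eps)
    return -np.mean(y_true * np.log(prob) + (1 - y_true) * np.log(1 - prob))

# Hypothetical labels and predicted probabilities
y_demo = [1, 0, 1, 0]
p_demo = [0.9, 0.2, 0.6, 0.5]
print(binary_cross_entropy(y_demo, p_demo))
```

A model that always predicts a probability of 0.5 attains a value of $\log 2 \approx 0.693$, which is a useful reference point when reading log-loss values.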

In [36]:
# CODE_CHUNK code_chunk_08
from sklearn.metrics import log_loss

# 1. 
prob_model2    = model2.predict(X_test)

# 2.
logloss_model2 = log_loss(y_test, prob_model2)
print(logloss_model2)

# 3.
performance_comparison_1h = "??"


572/572 ━━━━━━━━━━━━━━━━━━━━ 1s 849us/step
0.533637640646477


## Part 2: Tuning neural nets with scikit-learn

The SciKeras package provides wrappers that allow the use of scikit-learn for tuning Keras models. Using this functionality requires relatively little effort and in this part we are going to practice the individual steps of model tuning with `sklearn`.


### Task 2a)
First, we need to define the architecture of the neural net that we tune and to compile the model. This needs to be done inside a function. The arguments of this function are the tuning parameters whose candidate values we want to feed into the function one by one.

In this task, we work with the architecture defined for `model2` in Task 1g. The only parameter that we want to tune is the regularization parameter for $\ell_2$ penalization in the hidden layers.

Now create a function `modelbuild_2a` which has one argument `l2_pen`. In this function, specify the architecture of a Keras model `model`, identical to that of `model2` and with $\ell_2$-penalty set to `l2_pen`. Compile the model with the same settings as in Task 1g and return it as function output.

In [82]:
# CODE_CHUNK code_chunk_09
def modelbuild_2a(l2_pen):
    

    model = keras.Sequential()

    model.add(layers.Input(shape=(X_train.shape[1],)))

    model.add(layers.Dense(30, activation= "relu", kernel_regularizer= l2(l2_pen)))

    model.add(layers.Dense(15, activation= "relu", kernel_regularizer= l2(l2_pen)))

    model.add(layers.Dense(1, activation = "sigmoid", kernel_regularizer= l2(l2_pen)))
    myopt = Adam(learning_rate = 0.00003)
    model.compile(loss = "binary_crossentropy", optimizer = myopt, metrics = ["accuracy"])

    return model



### Task 2b)
In order to make the function `modelbuild_2a` compatible with `sklearn`, we need to pass it to `KerasClassifier()` from the `wrappers` module of `scikeras`. As additional arguments, provide all arguments that you used when fitting `model2`.

In [87]:
# CODE_CHUNK code_chunk_10
from scikeras.wrappers import KerasClassifier



model2_sklearn_spec = KerasClassifier(model =modelbuild_2a,
                          l2_pen = l2_pen,
                          epochs = 250,
                          batch_size = 256,
                        )

### Task 2c)
Next, define a parameter grid `tune_grid_2c`. This must be a dictionary object. Ensure that the only key within `tune_grid_2c` is called `l2_pen`. Its values should be zero as well as $10^{r}$ for a grid of eleven equally spaced $r$-values from $-4$ to $-1$.
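
The grid of penalties can be built in more than one way. The sketch below constructs the exponents with `np.linspace` and shows that `np.logspace` yields the same candidate values; the variable names are chosen for illustration only.

```python
import numpy as np

r_values = np.linspace(-4, -1, 11)                 # eleven exponents, spaced 0.3 apart
grid = [0.0] + list(10.0 ** r_values)              # zero plus 10^r for each exponent
same_grid = [0.0] + list(np.logspace(-4, -1, 11))  # equivalent construction

print(len(grid), grid[1], grid[-1])
print(np.allclose(grid, same_grid))
```

Including zero means the unpenalized network is one of the twelve candidates.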


In [88]:
# CODE_CHUNK code_chunk_11
tune_grid_2c = {"l2_pen": [0] + [10**x for x in np.linspace(-4,-1,11)]}

#tune_grid_2c = {"l2_pen": [0, 10**-3]}




### Task 2d)
Now, we can tune our model. That's computationally quite costly, so we will use merely a fraction of the available training data. The inputs and outputs for this task are given by

In [89]:
X_train_small,_ , y_train_small, _ = train_test_split(X_train, y_train, train_size=0.3, random_state=6)

Do the following:

1. Use `KFold()` from the model selection module in scikit-learn to define a random partition of the training data into five folds. Use $5$ as your random seed. Save this partition as `cv_splits_2d`.
2. Call `GridSearchCV()` and use the wrapper function from Task 2b, the parameter grid from Task 2c as well as `cv_splits_2d` as arguments.
3. Apply the `fit()`-method to `GridSearchCV` and use `X_train_small` and `y_train_small` as inputs and outputs, respectively.
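
A quick count shows why tuning is costly: every candidate value is trained once per fold, and with its default `refit=True` setting `GridSearchCV` trains the best candidate one more time on all data passed to `fit()`. The helper below is a hypothetical illustration of this arithmetic.

```python
def n_model_fits(n_candidates, n_folds, refit=True):
    """Number of times the network is trained during a grid search."""
    return n_candidates * n_folds + (1 if refit else 0)

# 12 penalty candidates (zero plus eleven powers of ten) and 5 folds
total_fits = n_model_fits(n_candidates=12, n_folds=5)
print(total_fits)
```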

In [90]:
# CODE_CHUNK code_chunk_12
from sklearn.model_selection import (KFold, GridSearchCV)

import time

start_time = time.time()

# 1.
cv_splits_2d = KFold(n_splits=5, shuffle=True, random_state=5)

# 2.
NN_tune_2d = GridSearchCV(estimator=model2_sklearn_spec, param_grid= tune_grid_2c, cv= cv_splits_2d)

# 3.
NN_tune_2d.fit(
    X=X_train_small, 
    y=y_train_small, 
    validation_split=0.25, 
    callbacks=[EarlyStopping(patience=20, restore_best_weights=True)]
)

end_time = time.time()
elapsed_time = end_time - start_time
print(f"Cross validation finished in {elapsed_time} seconds")



Epoch 1/250
39/39 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - accuracy: 0.6818 - loss: 0.6679 - val_accuracy: 0.6903 - val_loss: 0.6527
Epoch 2/250
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.6834 - loss: 0.6643 - val_accuracy: 0.6906 - val_loss: 0.6499
Epoch 3/250
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6854 - loss: 0.6610 - val_accuracy: 0.6918 - val_loss: 0.6473
Epoch 4/250
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.6871 - loss: 0.6578 - val_accuracy: 0.6942 - val_loss: 0.6448
Epoch 5/250
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6896 - loss: 0.6548 - val_accuracy: 0.6942 - val_loss: 0.6423
Epoch 6/250
39/39 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6897 - loss: 0.6519 - val_accuracy: 0.6957 - val_loss: 0.6400
Epoch 7/250
39/39 ━━━

### Task 2e)
By default, GridSearchCV saves the trained model with the best tuning parameter values as the object `best_estimator_` inside `NN_tune_2d`. However, in our case this model is a KerasClassifier object. We cannot use such an object for making predictions on test data.

Still, the KerasClassifier object contains the trained model in the typical Keras format as the object `model_`. Extract this object-inside-the-object-inside-the-object and save it as `model3`.

In [91]:
# CODE_CHUNK code_chunk_13
model3 = NN_tune_2d.best_estimator_.model_


## Part 3: Saving, loading and retraining neural nets 

### Task 3a)

Training and tuning neural nets can take a lot of time. Therefore, it is possible to save entire fitted models to disk and to import them at a later point in time. Apply the `save()`-method to your most recent `model3` in order to save it as *DABN13_asst6_saved_model3* in your working directory.

Ensure that the model is saved in TensorFlow SavedModel format.

Additionally, use the file explorer in your operating system to see how exactly the model was saved on your hard drive. Describe this briefly in the string variable `saved_model_3a`.

*Note:* Functions for saving and loading models are very nicely described in the [keras documentation](https://keras.io/api/saving/model_saving_and_loading/#save-method).





In [92]:
# CODE_CHUNK code_chunk_14
model3.save("DABN13_asst6_saved_model3.keras")
saved_model_3a = "??"

### Task 3b)

Now, use the `load_model` function in the `models` module of Keras to load your saved model into your python session again. Save this model as `model4`.

*Note:* The possibility to load a previously saved model from your hard disk is useful for more than just your own models. It even allows you to load pretrained models for specific purposes from the TensorFlow Hub or from Hugging Face. These models could then directly be used for prediction or fine-tuned on your data.

In [93]:
# CODE_CHUNK code_chunk_15
from tensorflow.keras.models import load_model

model4 = load_model("DABN13_asst6_saved_model3.keras")


### Task 3c)

When we tuned our most recent neural net, we did this on a relatively small fraction of the training data to reduce the computational cost. This was also the data used to train the best model that we extracted from `NN_tune_2d`.
Now that we have chosen an optimal tuning parameter, it makes sense to retrain the `model4` on the entire training data `X_train` and `y_train`. Do this by applying the `fit()` method to `model4`. As previously, training should be done for 250 epochs, unless early stopping with a patience of 20 epochs kicks in. Minibatches of $2^8$ data points should be used. Given the large amount of data, hold only 10% of the data aside for monitoring validation loss.

Once you retrained your model, obtain predicted class probabilities and save them as `prob_model4`. Then, get the log loss `logloss_model4` on the test data.

Finally, to what extent did model tuning and retraining change test set accuracy relative to that of `model2`? Comment on this in the string variable `performance_comparison_3c`.

In [94]:
# CODE_CHUNK code_chunk_16
# 1.
# Note: the task text asks for validation_split = 0.10; the run logged below used 0.25
model4_fit = model4.fit(x = X_train, y = y_train,
                          epochs = 250,
                          batch_size = 256,
                          validation_split= 0.25,
                          callbacks = [EarlyStopping(patience=20)])
# 2.
prob_model4    = model4.predict(X_test)
logloss_model4 = log_loss(y_test, prob_model4)
print(logloss_model4)

# 3.
performance_comparison_3c = "??"


Epoch 1/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.6978 - loss: 0.5698 - val_accuracy: 0.9293 - val_loss: 0.3480
Epoch 2/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7057 - loss: 0.5609 - val_accuracy: 0.9118 - val_loss: 0.3729
Epoch 3/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7103 - loss: 0.5567 - val_accuracy: 0.9014 - val_loss: 0.3849
Epoch 4/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7135 - loss: 0.5535 - val_accuracy: 0.8985 - val_loss: 0.3929
Epoch 5/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7155 - loss: 0.5507 - val_accuracy: 0.8939 - val_loss: 0.3994
Epoch 6/250
161/161 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7189 - loss: 0.5482 - val_accuracy: 0.8885 - val_loss: 0.4050
Epoch 7/250
161/16

## Part 4: Manual predictions from a trained neural net

In this part we will build predictions *manually* by extracting weights from the trained  `model2` and by constructing the transformations in the layers of the neural net ourselves.

### Task 4a)

Start this task by creating your own ReLU activation function. Save it as `ReLU`. Then, write your own sigmoid function for the output layer transformation. Save it as  `sigmoid`. 

In [None]:
# CODE_CHUNK code_chunk_17
# ReLU activation function
??

# Sigmoid activation function
??
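
One possible way to write the two activation functions with NumPy is sketched below. This is an illustrative sketch rather than a reference solution; both functions work elementwise on arrays.

```python
import numpy as np

def ReLU(z):
    """Rectified linear unit: max(0, z), applied elementwise."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

z_demo = np.array([-2.0, 0.0, 3.0])
print(ReLU(z_demo), sigmoid(z_demo))
```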

### Task 4b)
In the slides for lecture 8 we discussed what units in the different layers of a neural net look like. Now we are going to use the equations for both the hidden units and the output unit to construct output predictions for the $n_{test}$ data points in `X_test`. Please do the following:

1. Apply the `get_weights()` method on `model2` to obtain a list object which stores the weights and biases of the learned model. Save it as `weight_and_bias_4b`.
2. Extract the objects inside `weight_and_bias_4b` into the objects for $\mathbf{b}_1,\mathbf{W}_1, \mathbf{b}_2, \mathbf{W}_2,\mathbf{b}_3, \mathbf{W}_3$ defined in the code chunk below.
3. Construct the linear term of the first layer hidden units and save these linear terms as a $30 \times n_{test}$ matrix `Z_1`.
4. Construct the $15 \times n_{test}$ matrix `Z_2` of linear terms for the second layer hidden units. Then, obtain the linear term of the output unit and save it as `Z_3`.
5. Put `Z_3` into the output layer activation function in order to get predictions for the probability of a Bud Light purchase. Save this as $n_{test} \times 1$ vector `prob_model2_own_4b`.

*Hint*: You can use the `.shape` attribute to check the dimension of arrays and `len()` to check the length of vectors. Additionally, to ensure that you got the correct result for `prob_model2_own_4b` you can compare it with the output of `predict()`.

In [None]:
# CODE_CHUNK code_chunk_18
# 1.
weight_and_bias_4b = ??

# 2.
WW_1 = ??
bb_1 = ??
WW_2 = ??
bb_2 = ??
WW_3 = ??
bb_3 = ??

# 3.
Z_1 = ??

# 4.
Z_2 = ??
Z_3 = ??

# Apply sigmoid activation to Z_3
prob_model2_own_4b = ??