## Motivation

In this notebook, we examine the idea described in section 1.5.5 on the Boston housing data, a regression task.

In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
from keras.losses import MSE
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

from utils import get_gradient_loss_fn

tf.random.set_seed(42)



## The Boston Housing Data

See: https://www.kaggle.com/code/prasadperera/the-boston-housing-dataset

In [2]:
column_names = [
    'CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV',
]
data = pd.read_csv('data/housing.csv', header=None, delimiter=r"\s+", names=column_names)
data.head(5)

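| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |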


In [3]:
# Select a subset of feature columns and scale each one to [0, 1]; MEDV is the target
min_max_scaler = MinMaxScaler()
column_sels = ['LSTAT', 'INDUS', 'NOX', 'PTRATIO', 'RM', 'TAX', 'DIS', 'AGE']
x = data.loc[:,column_sels]
y = data['MEDV']
x = pd.DataFrame(data=min_max_scaler.fit_transform(x), columns=column_sels)

In [4]:
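# Log-transform the target, so every MSE reported below is on the log1p scale
y = np.log1p(y)
# Log-transform only the features whose distribution is noticeably skewed
for col in x.columns:
    if np.abs(x[col].skew()) > 0.3:
        x[col] = np.log1p(x[col])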
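Because the target is transformed with `np.log1p`, predictions and errors live on the log scale. The following is a minimal sketch, using made-up MEDV-like values and a hypothetical helper `mse_in_original_units`, of how `np.expm1` undoes the transform when an error in the original units is wanted.

```python
import numpy as np

# Illustrative MEDV-like values; these are NOT taken from the dataset.
prices = np.array([5.0, 13.8, 21.6, 24.0, 36.2, 50.0])

log_prices = np.log1p(prices)     # log(1 + y), the transform applied to the target
recovered = np.expm1(log_prices)  # exp(z) - 1, the exact inverse of log1p


def mse_in_original_units(y_true_log, y_pred_log):
    """Hypothetical helper: MSE after mapping log-scale values back with expm1."""
    diff = np.expm1(y_true_log) - np.expm1(y_pred_log)
    return float(np.mean(diff ** 2))


print(np.allclose(recovered, prices))
print(mse_in_original_units(log_prices, log_prices))
```

An MSE on the log scale and an MSE in the original units are not directly comparable, so it helps to state which one is being reported.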

In [5]:
x_train, x_test, y_train, y_test = train_test_split(x.values, y.values)
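Note that the scaler in the earlier cell is fit on all rows before this split, so the test rows contribute to the minima and maxima used for scaling. As a minimal sketch of the alternative, assuming plain NumPy arrays, synthetic data, and the hypothetical helpers `fit_min_max` and `apply_min_max`, the statistics can be computed from the training rows only and then reused for the test rows:

```python
import numpy as np


def fit_min_max(train):
    """Return the per-column minimum and range, computed from training rows only."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return lo, span


def apply_min_max(values, lo, span):
    """Scale values with statistics that were fit on other rows."""
    return (values - lo) / span


rng = np.random.default_rng(0)
features = rng.uniform(0.0, 50.0, size=(100, 3))  # synthetic stand-in data
train, test = features[:75], features[75:]

lo, span = fit_min_max(train)
train_scaled = apply_min_max(train, lo, span)
test_scaled = apply_min_max(test, lo, span)  # may fall slightly outside [0, 1]
```

Whether this changes the results of the notebook is not tested here; the sketch only shows the mechanics.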

## Train a Model with Gradient Loss

In [6]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, 'relu'),
    tf.keras.layers.Dense(128, 'relu'),
    tf.keras.layers.Dense(1),
])

gradient_loss_fn = get_gradient_loss_fn(
    lambda inputs: MSE(inputs[1], tf.squeeze(model(inputs[0])))
)

In [7]:
optimizer = tf.optimizers.Adam()

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = gradient_loss_fn((x, y))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

In [8]:
def evaluate(model):
    return MSE(y_test, tf.squeeze(model(x_test)))

In [9]:
ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))
ds = ds.batch(100)

In [10]:
for epoch in range(1000):
    # Use distinct names so the batches do not overwrite the global x and y
    for x_batch, y_batch in ds:
        loss = train_step(x_batch, y_batch)
    if epoch % 100 == 0:
        print(epoch, loss.numpy(), evaluate(model).numpy())
print(epoch, loss.numpy(), evaluate(model).numpy())

0 0.006087185222929046 8.772951
100 3.632833082686013e-05 0.03719834
200 3.4149459069520544e-05 0.033582423
300 3.217657818045379e-05 0.03131363
400 3.109250000287398e-05 0.029599959
500 3.153061858664854e-05 0.028452879
600 3.0249257843927563e-05 0.027725188
700 3.0093021347029008e-05 0.027242636
800 2.9586733841231784e-05 0.026891032
900 2.8929466270286247e-05 0.026641201
999 2.922454338065743e-05 0.026652211


In [11]:
evaluate(model)

<tf.Tensor: shape=(), dtype=float32, numpy=0.026652211>

## Baseline Model with Usual Loss

In [12]:
baseline_model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, 'relu'),
    tf.keras.layers.Dense(128, 'relu'),
    tf.keras.layers.Dense(1)
])

In [13]:
baseline_model.compile(optimizer='adam', loss='mse')

In [14]:
baseline_model.fit(
    x_train, y_train,
    epochs=1000,
    validation_data=(x_test, y_test),
    verbose=2,
)

Epoch 1/1000
12/12 - 0s - loss: 8.1206 - val_loss: 5.9058 - 455ms/epoch - 38ms/step
Epoch 2/1000
12/12 - 0s - loss: 4.1835 - val_loss: 1.8395 - 24ms/epoch - 2ms/step
Epoch 3/1000
12/12 - 0s - loss: 0.9938 - val_loss: 0.9278 - 24ms/epoch - 2ms/step
Epoch 4/1000
12/12 - 0s - loss: 0.6948 - val_loss: 0.4496 - 24ms/epoch - 2ms/step
Epoch 5/1000
12/12 - 0s - loss: 0.3138 - val_loss: 0.2357 - 24ms/epoch - 2ms/step
Epoch 6/1000
12/12 - 0s - loss: 0.1751 - val_loss: 0.1588 - 25ms/epoch - 2ms/step
Epoch 7/1000
12/12 - 0s - loss: 0.1310 - val_loss: 0.1187 - 24ms/epoch - 2ms/step
Epoch 8/1000
12/12 - 0s - loss: 0.1065 - val_loss: 0.1059 - 24ms/epoch - 2ms/step
Epoch 9/1000
12/12 - 0s - loss: 0.0945 - val_loss: 0.0926 - 25ms/epoch - 2ms/step
Epoch 10/1000
12/12 - 0s - loss: 0.0849 - val_loss: 0.0838 - 23ms/epoch - 2ms/step
Epoch 11/1000
12/12 - 0s - loss: 0.0759 - val_loss: 0.0770 - 24ms/epoch - 2ms/step
Epoch 12/1000
12/12 - 0s - loss: 0.0706 - val_loss: 0.0734 - 22ms/epoch - 2ms/step
Epoch 13/1000
... (output truncated)

<keras.src.callbacks.History at 0x7fd99f09ab50>

In [15]:
evaluate(baseline_model)

<tf.Tensor: shape=(), dtype=float32, numpy=0.022950811>

## Model Robustness

Now we compare the robustness of the gradient-loss model and the baseline. To do so, we add Gaussian noise (standard deviation 0.1) to the test inputs and check how much the test MSE degrades.

In [16]:
stddev = 1e-1
noise = tf.random.normal(tf.shape(x_test)) * stddev

In [17]:
def evaluate_robustness(model):
    mse = MSE(y_test, tf.squeeze(model(x_test)))
    noised_mse = MSE(y_test, tf.squeeze(model(x_test+noise)))
    print(f'MSE: {mse} -> {noised_mse}')

In [18]:
evaluate_robustness(model)

MSE: 0.02665221132338047 -> 0.07023882120847702


In [19]:
evaluate_robustness(baseline_model)

MSE: 0.022950811311602592 -> 0.16962628066539764
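The comparison above relies on a single noise draw at one standard deviation. A minimal sketch of a slightly broader check follows. It averages the noised MSE over several draws and several noise levels. The function `noised_mse_curve`, the linear stand-in `predict`, and the synthetic data are all assumptions made for illustration, not part of the notebook.

```python
import numpy as np


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


def noised_mse_curve(predict, x, y, stddevs, n_draws=20, seed=0):
    """Mean MSE under additive Gaussian input noise, one value per stddev."""
    rng = np.random.default_rng(seed)
    curve = {}
    for stddev in stddevs:
        scores = [
            mse(y, predict(x + rng.normal(0.0, stddev, size=x.shape)))
            for _ in range(n_draws)
        ]
        curve[stddev] = float(np.mean(scores))
    return curve


# Hypothetical stand-in for a trained model: a fixed linear map.
weights = np.array([0.5, -1.0, 2.0])


def predict(x):
    return x @ weights


rng = np.random.default_rng(1)
x_demo = rng.uniform(size=(200, 3))
y_demo = predict(x_demo)  # noiseless targets, so the clean MSE is zero

curve = noised_mse_curve(predict, x_demo, y_demo, stddevs=[0.0, 0.05, 0.1])
for stddev, value in curve.items():
    print(stddev, value)
```

Any callable that maps a NumPy array of inputs to a 1-D array of predictions could be passed as `predict`, so both models could in principle be compared on the same noise draws.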


## Conclusion

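By simply training with the "gradient loss", we obtain a test MSE close to that of the baseline (0.0267 versus 0.0230). Under Gaussian input noise, however, the gradient-loss model is considerably more robust in this run: its MSE rises to 0.0702, whereas the baseline's rises to 0.1696.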
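To make the comparison concrete, the short calculation below expresses each model's noised MSE as a multiple of its clean MSE. The four numbers are copied from the outputs of cells 18 and 19; the dictionary names are illustrative only.

```python
# (clean MSE, noised MSE) pairs copied from the outputs of cells 18 and 19.
results = {
    "gradient loss": (0.02665221132338047, 0.07023882120847702),
    "baseline": (0.022950811311602592, 0.16962628066539764),
}

# Ratio of noised MSE to clean MSE for each model.
degradation = {name: noised / clean for name, (clean, noised) in results.items()}

for name, ratio in degradation.items():
    print(f"{name}: noised MSE is {ratio:.2f}x the clean MSE")
```

These ratios come from a single training run and a single noise draw, so they are indicative rather than conclusive.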