## Motivation

In this note, we examine the idea introduced in section 1.5.5 on the Boston housing data.

In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
from keras.losses import MSE
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

from utils import get_gradient_loss_fn

tf.random.set_seed(42)



## The Boston Housing Data

See: https://www.kaggle.com/code/prasadperera/the-boston-housing-dataset

In [2]:
column_names = [
    'CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV',
]
data = pd.read_csv('data/housing.csv', header=None, delimiter=r"\s+", names=column_names)
data.head(5)

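| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |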


In [3]:
# Scale the selected feature columns to [0, 1]; MEDV is the target
min_max_scaler = MinMaxScaler()
column_sels = ['LSTAT', 'INDUS', 'NOX', 'PTRATIO', 'RM', 'TAX', 'DIS', 'AGE']
x = data.loc[:,column_sels]
y = data['MEDV']
x = pd.DataFrame(data=min_max_scaler.fit_transform(x), columns=column_sels)

In [4]:
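y = np.log1p(y)  # model log(1 + MEDV) instead of MEDV
# Apply the same transform to features whose skewness exceeds 0.3 in magnitude.
for col in x.columns:
    if np.abs(x[col].skew()) > 0.3:
        x[col] = np.log1p(x[col])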
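
Because the target is trained as `log1p(MEDV)`, every MSE reported in this note is measured in log space. The following is a minimal sketch, using made-up prices rather than the housing file, of how `np.log1p` reduces the skew of a right-skewed column and how `np.expm1` maps values back to the original scale. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative right-skewed values (not taken from the dataset).
prices = pd.Series([5.0, 7.0, 8.0, 9.0, 10.0, 12.0, 50.0])

log_prices = np.log1p(prices)  # log(1 + x), the transform used in the cell above
skew_before = prices.skew()
skew_after = log_prices.skew()

# expm1 is the inverse of log1p, so predictions made in log space
# can be converted back to the original units.
recovered = np.expm1(log_prices)

print(f"skew before: {skew_before:.3f}, after: {skew_after:.3f}")
print(np.allclose(recovered, prices))
```

To report an error in the original units of MEDV, one would apply `np.expm1` to both predictions and targets before computing the MSE.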

In [5]:
x_train, x_test, y_train, y_test = train_test_split(x.values, y.values)
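
Note that `tf.random.set_seed(42)` only seeds TensorFlow. With no `random_state`, `train_test_split` draws from NumPy's global random state, so the split above may differ between runs. Below is a sketch of a reproducible split, assuming a fixed `random_state` is acceptable. The value 42 and the synthetic arrays are illustrative, and the results reported in this note were not produced with this setting.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins shaped like the 506 rows and 8 selected features.
features = np.arange(506 * 8, dtype=float).reshape(506, 8)
targets = np.arange(506, dtype=float)

# Passing random_state fixes the shuffle, independently of TensorFlow's seed.
split_a = train_test_split(features, targets, random_state=42)
split_b = train_test_split(features, targets, random_state=42)

same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same, split_a[0].shape, split_a[1].shape)
```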

## Train a Model with Gradient Loss

In [6]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, 'relu'),
    tf.keras.layers.Dense(128, 'relu'),
    tf.keras.layers.Dense(1),
])

get_gradient_loss = get_gradient_loss_fn(
    lambda inputs: MSE(inputs[1], tf.squeeze(model(inputs[0])))
)

In [7]:
optimizer = tf.optimizers.Adam()

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = get_gradient_loss((x, y))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

In [8]:
def evaluate(model):
    return MSE(y_test, tf.squeeze(model(x_test)))

In [9]:
ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))
ds = ds.batch(100)

In [10]:
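for epoch in range(2000):
    # Use batch-local names so the global `x` and `y` are not overwritten.
    for x_batch, y_batch in ds:
        loss = train_step(x_batch, y_batch)
    if epoch % 100 == 0:
        # Columns: epoch, gradient loss on the last batch, test MSE.
        print(epoch, loss.numpy(), evaluate(model).numpy())
print(epoch, loss.numpy(), evaluate(model).numpy())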

0 0.005141335158100979 8.191716
100 1.94571385429619e-05 0.036862753
200 1.528504609698055e-05 0.03124485
300 1.3456568371501947e-05 0.028420532
400 1.2344016569283535e-05 0.026874064
500 1.1357751749360308e-05 0.025758006
600 1.087353680021552e-05 0.024884393
700 1.0211986116530476e-05 0.024316853
800 9.856670365455072e-06 0.024028713
900 9.562567077824476e-06 0.023820836
1000 9.40794355270375e-06 0.023432732
1100 9.110835723544846e-06 0.023263726
1200 8.76761751961499e-06 0.0232434
1300 8.920918668679622e-06 0.023224179
1400 8.769170113387015e-06 0.023093132
1500 8.455866683362077e-06 0.022890754
1600 8.721950648930202e-06 0.022727018
1700 8.24271999926811e-06 0.022721056
1800 7.947140083652859e-06 0.022767056
1900 7.894497351277498e-06 0.022711216
1999 7.574568930680894e-06 0.022672188


In [11]:
evaluate(model)

<tf.Tensor: shape=(), dtype=float32, numpy=0.022672188>

## Baseline Model

In [12]:
baseline_model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, 'relu'),
    tf.keras.layers.Dense(128, 'relu'),
    tf.keras.layers.Dense(1)
])

In [13]:
baseline_model.compile(optimizer='adam', loss='mse')

In [14]:
baseline_model.fit(
    x_train, y_train,
    epochs=2000,
    validation_data=(x_test, y_test),
)

Epoch 1/2000
...

<keras.src.callbacks.History at 0x7f6cd8723410>

In [15]:
evaluate(baseline_model)

<tf.Tensor: shape=(), dtype=float32, numpy=0.032345474>

## Model Robustness

Now we compare the robustness of the gradient-loss model and the baseline. To do so, we add Gaussian noise (standard deviation 0.1) to the test inputs and compare the test MSE with and without the noise.

In [20]:
stddev = 1e-1
noise = tf.random.normal(tf.shape(x_test)) * stddev

In [21]:
def evaluate_robustness(model):
    mse = MSE(y_test, tf.squeeze(model(x_test)))
    noised_mse = MSE(y_test, tf.squeeze(model(x_test+noise)))
    print(f'MSE: {mse} -> {noised_mse}')

In [22]:
evaluate_robustness(model)

MSE: 0.02267218753695488 -> 0.07080219686031342


In [23]:
evaluate_robustness(baseline_model)

MSE: 0.032345473766326904 -> 0.2310810089111328
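
The comparison above relies on a single noise draw. Here is a minimal NumPy sketch of averaging the noised MSE over several draws. `noised_mse` and `predict` are hypothetical names, `predict` stands in for either model, and the toy linear data is illustrative only.

```python
import numpy as np

def noised_mse(predict, x, y, stddev=0.1, n_draws=20, seed=0):
    """Mean and standard deviation of the MSE over several Gaussian noise draws."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_draws):
        noise = rng.normal(0.0, stddev, size=x.shape)
        scores.append(np.mean((y - predict(x + noise)) ** 2))
    return float(np.mean(scores)), float(np.std(scores))

# Toy example: a linear "model" that fits the clean data exactly.
rng = np.random.default_rng(1)
x_demo = rng.uniform(size=(50, 8))
weights = rng.normal(size=8)
y_demo = x_demo @ weights

def predict(x):
    return x @ weights

clean_mse = float(np.mean((y_demo - predict(x_demo)) ** 2))
mean_noised, std_noised = noised_mse(predict, x_demo, y_demo)
print(f"MSE: {clean_mse:.4f} -> {mean_noised:.4f} +/- {std_noised:.4f}")
```

With the notebook's models, one could pass something like `lambda x: tf.squeeze(model(x)).numpy()` as `predict`. Reporting the spread across draws indicates whether the gap between the two models exceeds the noise-to-noise variation.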


## Conclusion

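In this run, simply training with the "gradient loss" yields a lower test MSE than the baseline (0.0227 versus 0.0323). The gain in robustness is much larger: with Gaussian noise added to the test inputs, the MSE rises to 0.0708 for the gradient-loss model but to 0.2311 for the baseline.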
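
As a quick check on the robustness claim, the sketch below computes how much the noise inflates each model's MSE, using the values printed by the robustness cells above. The dictionary layout is only for illustration.

```python
# MSE values copied from the robustness cells above: (clean, noised).
results = {
    "gradient loss": (0.02267218753695488, 0.07080219686031342),
    "baseline": (0.032345473766326904, 0.2310810089111328),
}

ratios = {}
for name, (clean, noised) in results.items():
    ratios[name] = noised / clean  # factor by which noise inflates the MSE
    print(f"{name}: MSE grows by a factor of {ratios[name]:.2f}")
```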