# Example 2: Regression

- Predict a continuous value (here, a car's fuel efficiency in MPG).
- Dataset: Auto MPG, downloaded from the UCI Machine Learning Repository.
- Based on the [TensorFlow example](https://www.tensorflow.org/beta/tutorials/keras/basic_regression).


## Install and setup

**Install**

In [None]:
!pip install -q tensorflow==2.0.0-beta1 seaborn pydot

**Import**

In [None]:
import pathlib

import matplotlib.pyplot as plt
import pandas as pd

import tensorflow as tf
import seaborn as sns
from tensorflow import keras
from tensorflow.keras import layers

print(tf.__version__)

## Train and test data

**Dataset path**

In [None]:
dataset_path = keras.utils.get_file("auto-mpg.data", "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
dataset_path

**Load data into pandas**

In [None]:
column_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',
                'Acceleration', 'Model Year', 'Origin']
raw_dataset = pd.read_csv(dataset_path, names=column_names,
                      na_values = "?", comment='\t',
                      sep=" ", skipinitialspace=True)

dataset = raw_dataset.copy()
dataset.tail()
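
The parsing options used above can be checked on a small in-memory sample. The sketch below assumes two rows in the file's layout (space-separated numbers, then a tab and the quoted car name); `sample` and `demo` are names introduced only for this illustration.

```python
import io

import pandas as pd

column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

# Two illustrative rows in the file's layout; the second one has a missing
# horsepower value written as "?".
sample = (
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"\n'
)

demo = pd.read_csv(io.StringIO(sample), names=column_names,
                   na_values="?",    # treat "?" as missing (NaN)
                   comment='\t',     # ignore everything from the tab on (the car name)
                   sep=" ", skipinitialspace=True)

print(demo)
print(demo.isna().sum())
```

The `comment='\t'` option is what drops the car name, and `na_values="?"` is what lets `isna()` find the missing horsepower entries later on.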

In [None]:
dataset['MPG'].plot()

**Clean the data**

In [None]:
dataset.isna().sum()

In [None]:
dataset = dataset.dropna()

In [None]:
origin = dataset.pop('Origin')

**One-hot encode the origins**

In [None]:
dataset['USA'] = (origin == 1)*1.0
dataset['Europe'] = (origin == 2)*1.0
dataset['Japan'] = (origin == 3)*1.0
dataset.tail()
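
The three indicator columns above amount to a one-hot encoding of the categorical `Origin` code. As a minimal sketch, assuming a hypothetical series `origin_demo` of four codes, the same columns can also be produced with `pd.get_dummies`:

```python
import pandas as pd

# Hypothetical origin codes for four cars (1 = USA, 2 = Europe, 3 = Japan).
origin_demo = pd.Series([1, 2, 3, 1])

# Manual encoding, as in the cell above.
manual = pd.DataFrame({
    'USA': (origin_demo == 1) * 1.0,
    'Europe': (origin_demo == 2) * 1.0,
    'Japan': (origin_demo == 3) * 1.0,
})

# Same columns via pandas: map the codes to names, then one-hot encode.
names = origin_demo.map({1: 'USA', 2: 'Europe', 3: 'Japan'})
auto = pd.get_dummies(names, dtype=float)[['USA', 'Europe', 'Japan']]

# True when both encodings hold the same values.
print((manual.values == auto.values).all())
```

Encoding the code as three separate columns avoids implying an ordering (1 < 2 < 3) between the regions.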

In [None]:
train_dataset = dataset.sample(frac=0.8,random_state=0)
test_dataset = dataset.drop(train_dataset.index)
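
To see why this yields two disjoint sets, here is a small sketch on a hypothetical ten-row frame (`demo`, `train_demo`, and `test_demo` are illustrative names): `sample` picks 80% of the rows reproducibly, and `drop` keeps exactly the rows that were not picked.

```python
import pandas as pd

# Hypothetical ten-row frame standing in for the cleaned dataset.
demo = pd.DataFrame({'MPG': range(10)})

train_demo = demo.sample(frac=0.8, random_state=0)  # 80% of the rows, reproducible
test_demo = demo.drop(train_demo.index)             # the remaining rows

print(len(train_demo), len(test_demo))
# No row index appears in both sets.
print(train_demo.index.intersection(test_demo.index).empty)
```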

## Inspect the data

**Joint distribution**

In [None]:
sns.pairplot(train_dataset[["MPG", "Cylinders", "Displacement", "Weight"]], diag_kind="kde")

In [None]:
train_stats = train_dataset.describe()
train_stats.pop("MPG")
train_stats = train_stats.transpose()
train_stats

## Preprocess dataset

**Split features ($x$) from labels ($y$)**

In [None]:
train_labels = train_dataset.pop('MPG')
test_labels = test_dataset.pop('MPG')

In [None]:
input_shape = len(train_dataset.keys())

**Normalize**

In [None]:
def norm(x):
    return (x - train_stats['mean']) / train_stats['std']
normed_train_data = norm(train_dataset)
normed_test_data = norm(test_dataset)
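
Note that the test set is normalized with the training statistics, not its own. A minimal sketch with hypothetical toy columns (`train_demo`, `test_demo`, and `norm_demo` are names made up for this example) shows the effect:

```python
import pandas as pd

# Hypothetical feature columns with very different ranges.
train_demo = pd.DataFrame({'Weight': [2000.0, 3000.0, 4000.0],
                           'Cylinders': [4.0, 6.0, 8.0]})
test_demo = pd.DataFrame({'Weight': [3500.0], 'Cylinders': [6.0]})

stats = train_demo.describe().transpose()  # one row per feature

def norm_demo(x):
    # Subtract the training mean and divide by the training standard deviation.
    return (x - stats['mean']) / stats['std']

normed_train = norm_demo(train_demo)
normed_test = norm_demo(test_demo)

print(normed_train.mean().round(6))  # approximately 0 for every feature
print(normed_train.std().round(6))   # approximately 1 for every feature
print(normed_test)
```

Reusing the training mean and standard deviation keeps the test inputs on the same scale the model was trained on.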

## Build the model

**Set up the layers**

Layers (n = batch size, 9 input features):
- Dense 'relu' layer: (n, 9) x (9, 64)   --> (n, 64): N(W) = 576, N(b) = 64
- Dense 'relu' layer: (n, 64) x (64, 64) --> (n, 64): N(W) = 4096, N(b) = 64
- Dense (1 node)    : (n, 64) x (64, 1)  --> (n, 1) : N(W) = 64, N(b) = 1
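
The parameter counts can be worked out by hand. The sketch below assumes 9 input features (the six numeric columns plus the three origin indicator columns, with MPG removed as the label); `dense_params` is a helper introduced only for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Return (number of weights, number of biases) of a fully connected layer."""
    return n_inputs * n_units, n_units

n_features = 9             # 6 numeric columns + 3 origin indicator columns
layer_sizes = [64, 64, 1]  # units in each Dense layer

total = 0
n_in = n_features
for units in layer_sizes:
    n_w, n_b = dense_params(n_in, units)
    print(f"({n_in}, {units}): N(W) = {n_w}, N(b) = {n_b}")
    total += n_w + n_b
    n_in = units           # this layer's output feeds the next layer

print("Total trainable parameters:", total)
```

The total should match the trainable parameter count reported by `model.summary()`.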

In [None]:
def build_model(input_shape, learning_rate=0.001):
    """Build and compile a sequential regression model."""
    # Define model
    model = keras.Sequential([
        layers.Dense(64, activation='relu', input_shape=[input_shape]),
        layers.Dense(64, activation='relu'),
        layers.Dense(1)
    ])
    
    # Define optimizer
    optimizer = tf.keras.optimizers.RMSprop(learning_rate)
    
    # Compile model
    model.compile(loss='mse',
                  optimizer=optimizer,
                  metrics=['mae', 'mse'])
    
    return model

In [None]:
model = build_model(input_shape)

**Display model**

In [None]:
keras.utils.plot_model(model)

In [None]:
model.summary()

In [None]:
example_batch = normed_train_data[:10]
example_result = model.predict(example_batch)
example_result
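
What `predict` computes for this architecture can be sketched in plain NumPy. This is an illustration, not the Keras implementation: the weights are drawn from a normal distribution as an assumption (Keras uses its own default initializer), and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical batch: 10 samples with 9 normalized features each.
x = rng.normal(size=(10, 9))

# Randomly initialized weights and zero biases for the three layers.
W1, b1 = rng.normal(scale=0.1, size=(9, 64)), np.zeros(64)
W2, b2 = rng.normal(scale=0.1, size=(64, 64)), np.zeros(64)
W3, b3 = rng.normal(scale=0.1, size=(64, 1)), np.zeros(1)

h1 = relu(x @ W1 + b1)   # shape (10, 64)
h2 = relu(h1 @ W2 + b2)  # shape (10, 64)
y_hat = h2 @ W3 + b3     # shape (10, 1), no activation on the output

print(y_hat.shape)
```

Because the output layer has a single unit and no activation, each sample yields one unbounded real number, which is what a regression target needs.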

## Train the model

**Train**

In [None]:
# Display training progress by printing the epoch number every 100 epochs
class PrintDot(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        if epoch % 100 == 0:
            print('..{}'.format(epoch), end='')

EPOCHS = 1000

history = model.fit(
    normed_train_data, train_labels,
    epochs=EPOCHS, validation_split = 0.2, verbose=0,
    callbacks=[PrintDot()]
)
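
With `validation_split=0.2`, Keras holds out the last 20% of the samples passed to `fit`, before any shuffling, and reports `val_*` metrics on them. A rough sketch of that split, using a hypothetical helper `split_validation` that is not part of Keras:

```python
import math

import numpy as np

def split_validation(x, y, validation_split=0.2):
    """Hold out the last fraction of the samples, without shuffling."""
    split_at = int(math.floor(len(x) * (1.0 - validation_split)))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

# Hypothetical data: 10 samples with one feature each.
x_demo = np.arange(10).reshape(10, 1)
y_demo = np.arange(10)

(x_tr, y_tr), (x_val, y_val) = split_validation(x_demo, y_demo)
print(len(x_tr), len(x_val))
print(y_val)  # the held-out labels are the last ones in the array
```

Since `train_dataset` was drawn with `sample`, its rows are already in random order, so taking the tail as validation data is reasonable here.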

**Convergence history**

In [None]:
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
hist.tail()

In [None]:
def plot_history(history):
    hist = pd.DataFrame(history.history)
    hist['epoch'] = history.epoch

    plt.figure(figsize=(12,4))
    plt.subplot(1,2,1)
    plt.xlabel('Epoch')
    plt.ylabel('Mean Abs Error [MPG]')
    plt.plot(hist['epoch'], hist['mae'],
           label='Train Error')
    plt.plot(hist['epoch'], hist['val_mae'],
           label = 'Val Error')
    plt.ylim([0,5])
    plt.legend()

    plt.subplot(1,2,2)
    plt.xlabel('Epoch')
    plt.ylabel('Mean Square Error [$MPG^2$]')
    plt.plot(hist['epoch'], hist['mse'],
           label='Train Error')
    plt.plot(hist['epoch'], hist['val_mse'],
           label = 'Val Error')
    plt.ylim([0,20])
    plt.legend()
    plt.show()

In [None]:
plot_history(history)

In [None]:
loss, mae, mse = model.evaluate(normed_test_data, test_labels, verbose=0)
print("Testing set Mean Abs Error: {:5.2f} MPG".format(mae))
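
The two metrics returned by `evaluate` can be computed by hand. A minimal sketch with hypothetical true and predicted MPG values:

```python
import numpy as np

# Hypothetical true and predicted MPG values for four cars.
y_true = np.array([20.0, 30.0, 25.0, 15.0])
y_pred = np.array([22.0, 27.0, 25.0, 16.0])

errors = y_pred - y_true
mae = np.mean(np.abs(errors))  # mean absolute error, in MPG
mse = np.mean(errors ** 2)     # mean squared error, in MPG^2

print("MAE: {:.2f} MPG".format(mae))
print("MSE: {:.2f} MPG^2".format(mse))
```

MAE is in the same units as the label, which makes it easier to interpret than MSE; MSE penalizes large errors more heavily, which is one reason it is used as the training loss.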

- **Summary:** The validation error improves only until about epoch 100; after that it stops improving.
- **Conclusion:** Use an `EarlyStopping` callback to halt training once the validation loss stops improving, which limits overfitting.

## Rebuild better model

**Rebuild**

In [None]:
# Rebuild the model
model = build_model(input_shape)

# The patience parameter is the number of epochs without improvement to wait before stopping
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)

# Fit the model
history = model.fit(normed_train_data, train_labels, epochs=EPOCHS,
                    validation_split = 0.2, verbose=0, callbacks=[early_stop, PrintDot()])
# Show convergence
plot_history(history)
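
The patience rule can be sketched in a few lines of plain Python. This is a simplified illustration, not the Keras implementation (it ignores options such as `min_delta`), and `early_stopping_epoch` is a hypothetical helper:

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return the epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss  # new best value: reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1  # patience never exhausted: ran all epochs

# Hypothetical validation losses: improvement stops after epoch 2.
losses = [5.0, 4.0, 3.0, 3.1, 3.2, 3.05, 2.9, 3.0]
print(early_stopping_epoch(losses, patience=3))
```

With `patience=3` the loop stops at epoch 5, before the later improvement at epoch 6 is ever seen; a larger patience trades extra training time for a lower risk of stopping too early.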

**Evaluate**

In [None]:
loss, mae, mse = model.evaluate(normed_test_data, test_labels, verbose=0)
print("Testing set Mean Abs Error: {:5.2f} MPG".format(mae))

- **Summary:** With early stopping, the test-set error on our label ($y$), the MPG, is lower.

## Make prediction

In [None]:
test_predictions = model.predict(normed_test_data).flatten()

plt.scatter(test_labels, test_predictions)
plt.xlabel('True Values [MPG]')
plt.ylabel('Predictions [MPG]')
plt.axis('equal')
plt.axis('square')
plt.xlim([0,plt.xlim()[1]])
plt.ylim([0,plt.ylim()[1]])
_ = plt.plot([-100, 100], [-100, 100])

In [None]:
error = test_predictions - test_labels
plt.hist(error, bins = 25)
plt.xlabel("Prediction Error [MPG]")
_ = plt.ylabel("Count")

## Conclusion

This notebook introduced a few techniques to handle a regression problem.
- Mean Squared Error (MSE) is a common loss function used for regression problems (different loss functions are used for classification problems).
- Similarly, evaluation metrics used for regression differ from classification. A common regression metric is Mean Absolute Error (MAE).
- When numeric input data features have values with different ranges, each feature should be scaled independently to the same range.
- If there is not much training data, one technique is to prefer a small network with few hidden layers to avoid overfitting.
- Early stopping is a useful technique to prevent overfitting.
