# GreenValueNet

This notebook contains the code needed to execute the GreenValueNet hedonic pricing neural network. 

### Set up and data loading

In [None]:
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from data_load_funcs import get_params, load_data_catalogue
from processing_funcs import process_data, normalise_values
from model_funcs import *

params = get_params()
data_catalogue = load_data_catalogue()

If you do not have a file called `dataset.csv` in the `data/interim_files` folder, the following cell will generate it and then print summary statistics. The data processing happens locally and involves large datasets with spatial components, so it can take several hours. Please be patient! If you already have the file, it is read in and the summary statistics are generated.

In [None]:
dataset = process_data(data_catalogue, params)

# show summary stats
summary_stats = dataset.describe().transpose()[['mean', 'std', 'max', 'min']]
summary_stats.columns = ['Mean', 'Std Dev', 'Maximum', 'Minimum']
print(summary_stats)

Next, we normalise all non-encoded variables to speed up learning, then convert the dataset into an array of inputs and an associated output array.

In [None]:
norm_cols = [col for col in dataset.columns if col not in params['non_norm_cols']]
for col in norm_cols:
    dataset[col] = normalise_values(dataset[col])

# create the input array x and the output array y (one row per observation)
x, y = create_x_y_arr(dataset, params)
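
The body of `normalise_values` is not shown in this notebook. As a rough illustration only, the sketch below assumes it performs z-score standardisation (subtract the mean, divide by the standard deviation); the real helper may use min-max scaling or something else. The name `zscore_sketch` is hypothetical.

```python
import numpy as np


def zscore_sketch(values):
    """Hypothetical stand-in for normalise_values: z-score standardisation."""
    arr = np.asarray(values, dtype=float)
    std = arr.std()
    if std == 0:
        # A constant column carries no information; map it to zeros
        # instead of dividing by zero.
        return np.zeros_like(arr)
    return (arr - arr.mean()) / std


# Demo on a toy column: the result has mean 0 and standard deviation 1.
demo = zscore_sketch([1.0, 2.0, 3.0, 4.0])
print(demo.mean(), demo.std())
```

Putting features on a comparable scale generally helps gradient-based optimisers converge faster, which is the motivation given above.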

The dataset is then split into train, dev and test sets using scikit-learn.

In [None]:
x_train, x_dev, x_test, y_train, y_dev, y_test = split_to_test_dev_train(
    x,
    y,
    params['dev_size'],
    params['test_size'],
    prop=False
)
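
The `split_to_test_dev_train` helper lives in `model_funcs` and is not shown here. One plausible way to build such a three-way split, sketched under the assumption that `prop=False` means the sizes are absolute row counts, is to call scikit-learn's `train_test_split` twice. The function name `three_way_split_sketch` and the `seed` argument are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def three_way_split_sketch(x, y, dev_size, test_size, seed=0):
    """Hypothetical sketch: split into train, dev and test in two steps."""
    # Step 1: carve off the test set.
    x_rest, x_test, y_rest, y_test = train_test_split(
        x, y, test_size=test_size, random_state=seed
    )
    # Step 2: split what remains into train and dev.
    x_train, x_dev, y_train, y_dev = train_test_split(
        x_rest, y_rest, test_size=dev_size, random_state=seed
    )
    return x_train, x_dev, x_test, y_train, y_dev, y_test


# Demo with 100 synthetic rows and integer sizes (absolute row counts).
x_demo = np.arange(200, dtype=float).reshape(100, 2)
y_demo = np.arange(100, dtype=float)
parts = three_way_split_sketch(x_demo, y_demo, dev_size=10, test_size=10)
print([p.shape for p in parts])
```

`train_test_split` treats an integer `test_size` as a number of rows and a float as a proportion, so the same sketch covers both conventions.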

## Benchmarking

To evaluate the performance of the neural network, we run random forest and XGBoost regressions as baseline models. We then build two alternative models: a deep neural network and a Bayesian model. We optimise on the mean squared error (MSE) and also report it as our measure of performance.
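
For reference, the MSE is the average of the squared differences between observed and predicted values. The notebook uses scikit-learn's `mean_squared_error`; the small hand-rolled version below (the name `mse_sketch` is hypothetical) only shows what is being computed.

```python
import numpy as np


def mse_sketch(y_true, y_pred):
    """Mean of the squared residuals between observations and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


# Residuals are 1 and -2, so the MSE is (1 + 4) / 2 = 2.5.
print(mse_sketch([3.0, 5.0], [2.0, 7.0]))
```

Because the residuals are squared, large errors are penalised more heavily than small ones.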

### Random Forest


In [None]:
# run baseline random forest regression using scikit-learn
rfr_model = random_forest_reg(
    x_train,
    y_train,
    tuning=False
)

# now run with grid search to tune parameters
rfr_tuned = random_forest_reg(
    x_train,
    y_train,
    tuning=True,
    tuning_params=params['tuning_dict']['grid']
)

# generate predictions and measure according to mean squared error
rfr_pred, rfr_mse = generate_pred_metric(rfr_model, mean_squared_error, x_dev, y_dev)
rfr_t_pred, rfr_t_mse = generate_pred_metric(rfr_tuned, mean_squared_error, x_dev, y_dev)
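
The internals of `random_forest_reg` are not shown. As a hedged illustration of what the `tuning=True` path might do, the sketch below runs a grid search with scikit-learn's `GridSearchCV` on synthetic data. The grid values, the synthetic data and the three-fold cross-validation are all assumptions, not the contents of `params['tuning_dict']['grid']`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for x_train / y_train.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(120, 4))
y_demo = 2.0 * x_demo[:, 0] + rng.normal(scale=0.1, size=120)

# Hypothetical grid; the notebook's real grid lives in params.
param_grid = {"n_estimators": [10, 20], "max_depth": [2, 4]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",  # scikit-learn maximises, so MSE is negated
    cv=3,
)
search.fit(x_demo, y_demo)

# The refitted best estimator can be used like any fitted regressor.
best_model = search.best_estimator_
print(search.best_params_)
```

Scoring on negated MSE keeps the tuning criterion aligned with the metric reported on the dev set.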

### XGBoost

In [None]:
xgb_model = boosted_grad_reg(x_train, y_train)
xgb_pred, xgb_mse = generate_pred_metric(xgb_model, mean_squared_error, x_dev, y_dev)

## Neural networks

We now build some neural networks. The number of epochs, hidden layers, and nodes per hidden layer are initially set using rules of thumb and then optimised through hyperparameter tuning.

In [None]:
# set epochs to be 3 times number of features
epochs = int(x_train.shape[1]) * 3

# set n_hidden_units to be mean of input and output layer sizes
n_hidden_units = round((x_train.shape[1] + 1) / 2)

### Single Layer Neural Network

A single hidden layer with ReLU activation is used with a linear output layer.
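
To make the architecture concrete, here is a minimal NumPy sketch of the forward pass of such a network. It is not the `neural_net` helper itself, which is not shown; the random weights and the function name are illustrative. The sizes follow from the rules of thumb above: 69 epochs at three epochs per feature implies 23 input features, and the mean of 23 inputs and 1 output gives 12 hidden units.

```python
import numpy as np


def forward_single_hidden(x, w1, b1, w2, b2):
    """Forward pass: one ReLU hidden layer followed by a linear output."""
    hidden = np.maximum(0.0, x @ w1 + b1)  # ReLU activation
    return hidden @ w2 + b2                # linear output, no activation


rng = np.random.default_rng(0)
n_features, n_hidden = 23, 12

# Randomly initialised weights, for illustration only.
w1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)

x_batch = rng.normal(size=(5, n_features))
predictions = forward_single_hidden(x_batch, w1, b1, w2, b2)
print(predictions.shape)
```

The linear output layer leaves predictions unbounded, which suits a regression target such as price.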

In [88]:
single_nn = neural_net(
    x_train,
    y_train,
    n_hidden_units = n_hidden_units,
    epochs = epochs,
    validation_data = (x_dev, y_dev)
)

Epoch 1/69
Epoch 2/69
Epoch 3/69
Epoch 4/69
Epoch 5/69
Epoch 6/69
Epoch 7/69
Epoch 8/69
Epoch 9/69
Epoch 10/69
Epoch 11/69
Epoch 12/69
Epoch 13/69
Epoch 14/69
Epoch 15/69
Epoch 16/69
Epoch 17/69
Epoch 18/69
Epoch 19/69
Epoch 20/69
Epoch 21/69
Epoch 22/69
Epoch 23/69
Epoch 24/69
Epoch 25/69
Epoch 26/69
Epoch 27/69
Epoch 28/69
Epoch 29/69
Epoch 30/69
Epoch 31/69
Epoch 32/69
Epoch 33/69
Epoch 34/69
Epoch 35/69
Epoch 36/69
Epoch 37/69
Epoch 38/69
Epoch 39/69
Epoch 40/69
Epoch 41/69
Epoch 42/69
Epoch 43/69
Epoch 44/69
Epoch 45/69
Epoch 46/69
Epoch 47/69
Epoch 48/69
Epoch 49/69
Epoch 50/69
Epoch 51/69
Epoch 52/69
Epoch 53/69
Epoch 54/69
Epoch 55/69
Epoch 56/69
Epoch 57/69
Epoch 58/69
Epoch 59/69
Epoch 60/69
Epoch 61/69
Epoch 62/69
Epoch 63/69
Epoch 64/69
Epoch 65/69
Epoch 66/69
Epoch 67/69
Epoch 68/69
Epoch 69/69


In [91]:
single_nn.history.history

{'loss': [136.3968048095703,
  46.19175720214844,
  16.509822845458984,
  24.60244369506836,
  5.955211639404297,
  10.539754867553711,
  8.010050773620605,
  2.6183974742889404,
  4.186127185821533,
  2.4716479778289795,
  1.7420923709869385,
  2.2516214847564697,
  1.3061832189559937,
  1.6037441492080688,
  1.327576994895935,
  1.1223970651626587,
  1.1854450702667236,
  0.9423322081565857,
  1.0089224576950073,
  0.8746951222419739,
  0.8565689325332642,
  0.8118577003479004,
  0.7847859263420105,
  0.764011800289154,
  0.7308076024055481,
  0.7014819979667664,
  0.6861767172813416,
  0.6691266298294067,
  0.6542291641235352,
  0.6371952891349792,
  0.6249568462371826,
  0.6107545495033264,
  0.5932189226150513,
  0.5858515501022339,
  0.5719519257545471,
  0.5639655590057373,
  0.5509668588638306,
  0.5444613695144653,
  0.534468412399292,
  0.5252289772033691,
  0.5187070369720459,
  0.5070637464523315,
  0.5148545503616333,
  0.4976750910282135,
  0.4963563084602356,
  0.5127622

### Deep Neural Network

The full model is specified as a deep neural network whose hidden layers use ReLU activation functions, with a linear activation in the output layer. The number of layers was initially kept small because of computational constraints.
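
One rough way to see how depth affects computational cost is to count trainable parameters. The sketch below assumes fully connected layers and assumes that `n_layers = 5` means five hidden layers of equal width; since `neural_net` is not shown, both are guesses. The function name is hypothetical.

```python
def dense_param_count(n_inputs, hidden_sizes, n_outputs=1):
    """Count weights and biases in a fully connected network."""
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    # Each layer has fan_in * fan_out weights plus fan_out biases.
    return sum(
        fan_in * fan_out + fan_out
        for fan_in, fan_out in zip(sizes, sizes[1:])
    )


# 23 input features and 12 hidden units, as implied by the rules of thumb.
single_count = dense_param_count(23, [12])
deep_count = dense_param_count(23, [12] * 5)
print(single_count, deep_count)  # 301 925
```

Under these assumptions, each additional 12-unit hidden layer adds 156 parameters, so the parameter count grows linearly with depth at a fixed width.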

In [92]:
deep_nn = neural_net(
    x_train,
    y_train,
    n_layers = 5,
    n_hidden_units = n_hidden_units,
    epochs = epochs,
    validation_data = (x_dev, y_dev)
)

Epoch 1/69
Epoch 2/69
Epoch 3/69
Epoch 4/69
Epoch 5/69
Epoch 6/69
Epoch 7/69
Epoch 8/69
Epoch 9/69
Epoch 10/69
Epoch 11/69
Epoch 12/69
Epoch 13/69
Epoch 14/69
Epoch 15/69
Epoch 16/69
Epoch 17/69
Epoch 18/69
Epoch 19/69
Epoch 20/69
Epoch 21/69
Epoch 22/69
Epoch 23/69
Epoch 24/69
Epoch 25/69
Epoch 26/69
Epoch 27/69
Epoch 28/69
Epoch 29/69
Epoch 30/69
Epoch 31/69
Epoch 32/69
Epoch 33/69
Epoch 34/69
Epoch 35/69
Epoch 36/69
Epoch 37/69
Epoch 38/69
Epoch 39/69
Epoch 40/69
Epoch 41/69
Epoch 42/69
Epoch 43/69
Epoch 44/69
Epoch 45/69
Epoch 46/69
Epoch 47/69
Epoch 48/69
Epoch 49/69
Epoch 50/69
Epoch 51/69
Epoch 52/69
Epoch 53/69
Epoch 54/69
Epoch 55/69
Epoch 56/69
Epoch 57/69
Epoch 58/69
Epoch 59/69
Epoch 60/69
Epoch 61/69
Epoch 62/69
Epoch 63/69
Epoch 64/69
Epoch 65/69
Epoch 66/69
Epoch 67/69
Epoch 68/69
Epoch 69/69


In [None]:
from pathlib import Path

# export the trained model to the outputs/models folder
cwd = Path.cwd()  # assumes the notebook is run from the project root
model_dir = cwd / "outputs" / "models"
deep_nn.export(model_dir / "deep_nn.tf") # check this file ending