# Problem 1 - Pythagorean Distance

In [2]:
#Import Necessary Libraries
import tensorflow as tf
import keras
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Dropout, Flatten, Input, MaxPooling2D
import numpy as np
import pandas as pd
import os

## Overview

We will explore the most basic neural network with a dummy dataset of Pythagorean relations in a triangle. Mathematically speaking, we wish to model the following relationship using our neural network:

$f(x_1, x_2) = \sqrt{x_1^2 + x_2^2}$

Clearly, ordinary linear regression on $x_1$ and $x_2$ cannot capture the non-linearities introduced by the squares and the square root. However, using the non-linear activation functions that neural networks in Keras provide, we can model this relationship effectively!
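
To make the claim about linear regression concrete, here is a minimal sketch that fits an ordinary least-squares plane to noise-free data drawn from the same range as in this notebook. The variable names and the fixed seed are illustrative choices, not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# same input range as the notebook: x1, x2 uniform on [0, 10)
X = rng.uniform(0, 10, size=(100_000, 2))
y = np.hypot(X[:, 0], X[:, 1])  # exact f(x1, x2), no noise added

# ordinary least squares with an intercept column
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

linear_mse = np.mean((A @ coef - y) ** 2)
print("coefficients (x1, x2, intercept):", coef)
print("MSE of the best linear fit:", linear_mse)
```

Even without any noise, the best plane leaves a clearly non-zero error, because $f$ is a cone rather than a plane.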

To do this, we start by generating random values for our $x_1$ and $x_2$, then computing $f(x_1, x_2)$.

In [3]:
# generates random x and the exact y's
X = np.random.rand(100000,2) * 10
y = np.array([np.sqrt(X[:,0]**2 + X[:,1]**2)]).T

# adds random noise to our training observations depending
# on the standard deviation stdev
stdev = 0.05
y += np.random.normal(scale = stdev, size = y.shape)

# training-test split 
split_prop = int(0.9 * len(X))
inds = np.random.permutation(len(X))
X_train, X_test = X[inds][:split_prop], X[inds][split_prop:]
y_train, y_test = y[inds][:split_prop], y[inds][split_prop:]
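
The split above works because `X` and `y` are indexed with the same permutation, which keeps every input row paired with its target. A small sketch of the same idea as a reusable helper follows; `shuffled_split`, `train_frac` and `seed` are hypothetical names introduced here for illustration.

```python
import numpy as np

def shuffled_split(X, y, train_frac=0.9, seed=0):
    """Shuffle X and y with one shared permutation, then split them."""
    rng = np.random.default_rng(seed)
    inds = rng.permutation(len(X))
    n_train = int(train_frac * len(X))
    train_idx, test_idx = inds[:n_train], inds[n_train:]
    # indexing both arrays with the same indices keeps rows aligned
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# small demonstration on noise-free data
rng = np.random.default_rng(1)
X_demo = rng.uniform(0, 10, size=(1000, 2))
y_demo = np.hypot(X_demo[:, 0], X_demo[:, 1]).reshape(-1, 1)
X_tr, X_te, y_tr, y_te = shuffled_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape, y_tr.shape, y_te.shape)
```

Computing the index arrays once also avoids building the shuffled copies `X[inds]` and `y[inds]` twice.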

Verify that each row `X_train[i]` holds a pair $(x_1, x_2)$ and that `y_train[i]` is the corresponding (noisy) value of $f(x_1, x_2)$.

In [4]:
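# check that every row of X_train still lines up with its target after shuffling;
# the noise is Gaussian, so allow a tolerance of a few standard deviations
assert np.allclose(np.hypot(X_train[:, 0], X_train[:, 1]), y_train[:, 0], atol = 6 * stdev), "Something is wrong!"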
X_train, y_train

(array([[5.24873679, 4.37087753],
        [3.46083666, 1.54525046],
        [0.00971231, 2.5946869 ],
        ...,
        [9.30546637, 7.26111725],
        [2.44373281, 8.85285077],
        [6.17355592, 8.30607113]]),
 array([[ 6.8240648 ],
        [ 3.73969882],
        [ 2.63143795],
        ...,
        [11.77674043],
        [ 9.19321808],
        [10.31163437]]))

Let's construct our first Keras model step by step. The interface that we will use for this first model is the Sequential interface, where we `add()` each layer (e.g. fully connected layer, convolutional layer, pooling layer) separately. We can optionally specify a simple activation function as a parameter to these layers (note that activations with their own parameters, such as LeakyReLU, are added as separate layers).
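
As a rough picture of what a fully connected layer with an activation computes, here is a NumPy sketch (not Keras code). The helper names `relu`, `leaky_relu` and `dense`, the slope `alpha`, and the random weights are all assumptions made for illustration.

```python
import numpy as np

def relu(z):
    # zero out negative values
    return np.maximum(z, 0.0)

def leaky_relu(z, alpha=0.1):
    # like relu, but negative inputs keep a small slope alpha
    return np.where(z > 0, z, alpha * z)

def dense(x, W, b, activation):
    # a fully connected layer: affine map followed by an activation
    return activation(x @ W + b)

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(4, 2))             # a batch of 4 inputs
W1, b1 = rng.normal(size=(2, 10)), np.zeros(10)
W2, b2 = 0.1 * rng.normal(size=(10, 1)), np.zeros(1)

hidden = dense(x, W1, b1, np.tanh)              # shape (4, 10)
out = dense(hidden, W2, b2, np.exp)             # shape (4, 1)
print(out.shape, bool(np.all(out > 0)))
```

An exponential activation on the output layer can only produce positive numbers, which is compatible with predicting a distance.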

A simple overview of the layers is available through the model's `summary()` method in Keras.

In [5]:
# initialize our Sequential model: this will take in a series of 
# sequential input, hidden and output layers using .add(...)
model = Sequential()

# add the first Dense layer; input_shape=[2] says each sample is a vector of 2 features
model.add(Dense(units = 2, activation='relu', input_shape=[2]))

# add hidden layers whose non-linear activation functions
# help us capture the non-linearities in the Pythagorean relation
model.add(Dense(units = 10, activation='tanh'))
model.add(Dense(units = 5, activation='exponential'))

# add output layer, specifying that we want one output number
model.add(Dense(units = 1, activation='exponential'))

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 2)                 6         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                30        
_________________________________________________________________
dense_2 (Dense)              (None, 5)                 55        
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 6         
Total params: 97
Trainable params: 97
Non-trainable params: 0
_________________________________________________________________
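
The parameter counts in the summary can be reproduced by hand: a fully connected layer with `n_in` inputs and `n_out` units has one weight per input-unit pair plus one bias per unit. A short sketch, where `dense_params` and `layer_sizes` are names introduced here:

```python
def dense_params(n_in, n_out):
    # weights (n_in * n_out) plus one bias per output unit
    return n_in * n_out + n_out

# input features followed by the units of each Dense layer in the model
layer_sizes = [2, 2, 10, 5, 1]

counts = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
print(counts, sum(counts))  # [6, 30, 55, 6] 97
```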


Let's train and test this model. We need to specify a few arguments such as the number of epochs (the number of full passes over the training data), the loss function, and the optimizer. For the most part, we will use optimizers like Adam, RMSprop, or SGD. More advanced optimizers exist, but these three are the ones used most often.

In [6]:
epochs = 10 # how can we most appropriately choose the number of epochs?
loss = 'mse' # try 'mae', 'mean_absolute_percentage_error', ... how can we most appropriately choose a loss function?

# compile the model by specifying an optimization technique and loss function
model.compile(optimizer='Adam', loss=loss)

# train the model on our training data
model.fit(X_train, y_train, epochs = epochs, validation_split = 0.2);

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
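
To get a feel for how much work these 10 epochs represent, the number of weight updates can be worked out by hand. This sketch assumes the Keras default `batch_size` of 32 (the `fit` call does not set one) and uses the fact that `validation_split` holds out the last fraction of the training data.

```python
import math

n_train = 90_000          # rows in X_train (90% of 100,000)
validation_split = 0.2    # as passed to model.fit
batch_size = 32           # Keras default when batch_size is not given
epochs = 10

n_val = int(n_train * validation_split)   # samples held out for validation
n_fit = n_train - n_val                   # samples actually used for updates
steps_per_epoch = math.ceil(n_fit / batch_size)
total_updates = steps_per_epoch * epochs

print(n_fit, n_val, steps_per_epoch, total_updates)  # 72000 18000 2250 22500
```

So the model is fitted on 72,000 samples and evaluated on 18,000 after each epoch, separately from the 10,000 test samples.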


In [7]:
# find prediction error
def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

y_pred_test = model.predict(X_test)
y_pred_train = model.predict(X_train)

print("Training Error:", mse(y_train, y_pred_train))
print("Test Error:", mse(y_test, y_pred_test))

Training Error: 0.006627020552956391
Test Error: 0.0066704840487086785


In [8]:
y_pred_test, y_test

(array([[ 3.5349727],
        [ 5.1418886],
        [ 9.31138  ],
        ...,
        [ 3.2122893],
        [ 4.6933117],
        [11.452698 ]], dtype=float32),
 array([[ 3.48411708],
        [ 5.11029192],
        [ 9.20096614],
        ...,
        [ 3.19650241],
        [ 4.65777527],
        [11.31280545]]))

# Questions

Remember to rebuild and recompile the model every time you change something; recompiling alone does not reset weights that have already been trained.

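##### (a) What are the shapes of the inputs and outputs for this model?

_*Each input is a vector of two features and each output is a single number, so the model maps shape (None, 2) to (None, 1). For our data, the shape of X_train is (90000, 2) and the shape of y_train is (90000, 1).*_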

##### (b) What happens to the performance of the model as you increase the standard deviation of the noise?

_*The model's loss increases with the noise: because the targets carry noise of variance stdev², the expected test MSE cannot drop much below stdev². As long as the standard deviation is within reasonable bounds, the model still seems to be useful.*_
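
The effect of the noise level can be illustrated without training anything: even the exact function $f$ has a non-zero MSE against noisy targets, and that error sets a floor for any model. A sketch under the notebook's data setup, where the seed and the particular `stdev` values are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100_000, 2))
f_true = np.hypot(X[:, 0], X[:, 1])   # exact, noise-free targets

floors = {}
for stdev in (0.05, 0.5, 1.0):
    y_noisy = f_true + rng.normal(scale=stdev, size=f_true.shape)
    # MSE of a "perfect" model that predicts f exactly
    floors[stdev] = np.mean((y_noisy - f_true) ** 2)
    print(f"stdev={stdev}: MSE of exact f = {floors[stdev]:.4f}, stdev^2 = {stdev ** 2:.4f}")
```

With `stdev = 0.05` the floor is about 0.0025, so the test error of roughly 0.0067 reported above is within a factor of three of the best achievable value.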

##### (c) What happens if you add more layers? What happens if you remove some of the layers?

_*Adding more layers increases the training time considerably. It lets the training error drop somewhat, but the test error barely improves. Removing layers makes training much faster, but the model can no longer fit the data fully.*_

##### (d) How does the number of epochs affect the training and test error?

_*With extremely few epochs the model underfits (although not by much, since we have a lot of data). Adding many more epochs increases the test error while decreasing the training error, although past a certain point the difference becomes barely noticeable.*_

##### (e) What do you think of the overall performance of this neural network? Would you use this or another approach for this problem?

_*Neural networks are nice because you don't need to do the feature engineering yourself (i.e., specifying that the output depends on the sum of squares of the inputs). However, had we spent a little more time examining the data and doing that feature engineering, we could just as easily have reduced the problem to one that linear regression can solve. The neural network didn't perform terribly but, especially for a problem this simple, traditional ML methods are much more effective.*_
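
One possible version of that feature-engineering route, as a sketch: regress $y^2$ on $x_1^2$ and $x_2^2$ with ordinary least squares, then take the square root of the prediction. The seed and variable names are assumptions, and for brevity the fit is evaluated on the same data it was trained on.

```python
import numpy as np

rng = np.random.default_rng(0)
stdev = 0.05   # same noise level as in the notebook
X = rng.uniform(0, 10, size=(100_000, 2))
y = np.hypot(X[:, 0], X[:, 1]) + rng.normal(scale=stdev, size=len(X))

# engineered features: squared inputs plus an intercept; target: squared output
A = np.column_stack([X ** 2, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y ** 2, rcond=None)

# map predictions back to the original scale (clip guards against tiny negatives)
y_hat = np.sqrt(np.clip(A @ coef, 0.0, None))
mse_engineered = np.mean((y_hat - y) ** 2)

print("coefficients (x1^2, x2^2, intercept):", coef)
print("MSE on the original scale:", mse_engineered)
```

The two slope coefficients should come out very close to 1, recovering $y^2 = x_1^2 + x_2^2$, and the resulting error sits near the noise variance `stdev ** 2`.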