# Problem 1 - Pythagorean Distance

In [None]:
#Import Necessary Libraries
import tensorflow as tf
import keras
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Dropout, Flatten, Input, MaxPooling2D
import numpy as np
import pandas as pd
import os

## Overview

We will explore a basic neural network using a synthetic dataset built from the Pythagorean relation in a right triangle: given the two legs $x_1$ and $x_2$, predict the hypotenuse. Mathematically, we wish to model the following relationship with our neural network:

$f(x_1, x_2) = \sqrt{x_1^2 + x_2^2}$

Ordinary linear regression on $x_1$ and $x_2$ cannot capture the non-linearities introduced by the squares and the square root. A neural network with non-linear activation functions, built here in Keras, can in principle approximate this relationship much more closely.
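
To make the limitation of a linear model concrete, here is a minimal NumPy sketch that fits ordinary least squares to noise-free samples of $f$. The names `X_demo`, `y_demo`, and `linear_mse`, the seed, and the sample size are illustrative assumptions and are not part of the notebook's own cells.

```python
import numpy as np

# Hypothetical stand-in data: noise-free samples of f on [0, 10)^2
rng = np.random.default_rng(0)
X_demo = rng.random((10_000, 2)) * 10
y_demo = np.sqrt(X_demo[:, 0] ** 2 + X_demo[:, 1] ** 2)

# Ordinary least squares with an intercept: y ~ w0 + w1*x1 + w2*x2
A = np.column_stack([np.ones(len(X_demo)), X_demo])
coef, *_ = np.linalg.lstsq(A, y_demo, rcond=None)
linear_mse = np.mean((A @ coef - y_demo) ** 2)

print("coefficients (w0, w1, w2):", coef)
print("MSE of best linear fit:", linear_mse)
```

Because the targets here contain no noise, any remaining error comes purely from the plane being unable to follow the curved surface. That number is a useful baseline to compare the network's test error against.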

To do this, we start by generating random values for our $x_1$ and $x_2$, then computing $f(x_1, x_2)$.

In [None]:
# generate random x values in [0, 10) and the exact y values (as a column vector)
X = np.random.rand(100000,2) * 10
y = np.sqrt(X[:,0]**2 + X[:,1]**2).reshape(-1, 1)

# adds random noise to our training observations depending
# on the standard deviation stdev
stdev = 0.05
y += np.random.normal(scale = stdev, size = y.shape)

# training-test split 
split_prop = int(0.9 * len(X))
inds = np.random.permutation(len(X))
X_train, X_test = X[inds][:split_prop], X[inds][split_prop:]
y_train, y_test = y[inds][:split_prop], y[inds][split_prop:]

Verify that `X_train[i]` holds a pair $(x_1, x_2)$ and that `y_train[i]` is approximately $f(x_1, x_2)$ (exactly, if `stdev` is 0).

In [None]:
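# check the first training pair against the exact formula, allowing for noise
# (5 standard deviations, plus a tiny floor so that stdev = 0 still passes)
expected = np.sqrt(X_train[0][0] ** 2 + X_train[0][1] ** 2)
assert np.isclose(expected, y_train[0][0], rtol = 0, atol = 5 * stdev + 1e-9), "Something is wrong!"
X_train, y_train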
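
A single row is a weak check, so the same idea can be applied to every row at once. The sketch below is self-contained: it rebuilds a small dataset the same way as above (the `_demo` and `_shuf` names, the seed, and the sample size are assumptions for illustration), shuffles it, and confirms that the residuals against the exact formula look like the injected noise.

```python
import numpy as np

rng = np.random.default_rng(0)
stdev_demo = 0.05  # same noise level as in the data-generation cell

# Rebuild a small dataset the same way: inputs, exact targets, then noise
X_demo = rng.random((10_000, 2)) * 10
y_demo = np.sqrt(X_demo[:, 0] ** 2 + X_demo[:, 1] ** 2).reshape(-1, 1)
y_demo += rng.normal(scale=stdev_demo, size=y_demo.shape)

# Shuffle X and y with the SAME permutation so that rows stay paired
inds_demo = rng.permutation(len(X_demo))
X_shuf, y_shuf = X_demo[inds_demo], y_demo[inds_demo]

# Residuals against the exact formula should look like the injected noise
residuals = y_shuf[:, 0] - np.sqrt(X_shuf[:, 0] ** 2 + X_shuf[:, 1] ** 2)
print("residual mean:", residuals.mean())
print("residual std: ", residuals.std())
```

If the inputs and targets had been shuffled with different permutations, the residual standard deviation would be far larger than `stdev_demo`, which makes this a quick way to catch misaligned rows.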

Let's construct our first Keras model step by step. For this first model we will use the Sequential interface, where we `add()` each layer (e.g. a fully connected, convolutional, or pooling layer) separately. We can optionally specify a simple activation function as a parameter to these layers (note that activations with their own configurable parameters, such as `LeakyReLU`, are usually added as separate layers).
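
To see what these activation functions do to a layer's pre-activation values, here is a small NumPy sketch of the element-wise formulas. The function names are illustrative, and `negative_slope=0.1` is an arbitrary value chosen for the example, not a library default.

```python
import numpy as np

def relu(z):
    # max(z, 0), applied element-wise
    return np.maximum(z, 0.0)

def leaky_relu(z, negative_slope=0.1):
    # like relu, but negative inputs are scaled rather than zeroed;
    # negative_slope is an illustrative value, not a library default
    return np.where(z > 0, z, negative_slope * z)

def exponential(z):
    # e**z, which is always strictly positive
    return np.exp(z)

z = np.array([-2.0, 0.0, 2.0])
print("relu:       ", relu(z))
print("tanh:       ", np.tanh(z))
print("exponential:", exponential(z))
print("leaky_relu: ", leaky_relu(z))
```

One consequence worth noticing: an exponential activation can only produce positive outputs, which happens to suit a distance but also means the outputs can grow very quickly.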

The model's `summary()` method prints a simple table of the layers, their output shapes, and their parameter counts.

In [None]:
# initialize our Sequential model: layers are stacked in order, each one
# added with .add(...)
model = Sequential()

# add the first hidden layer (2 units); input_shape tells Keras that each
# sample is a vector of 2 features (x1, x2)
model.add(Dense(units = 2, activation='relu', input_shape=[2]))

# add hidden layers where the activation functions are non-linear so as
# to help us capture the non-linearities in Pythagorean relation
model.add(Dense(units = 10, activation='tanh'))
model.add(Dense(units = 5, activation='exponential'))

# add output layer, specifying that we want one output number
model.add(Dense(units = 1, activation='exponential'))

model.summary()
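
The parameter counts reported by `summary()` can be reproduced by hand: a fully connected layer has one weight per (input, unit) pair plus one bias per unit. The sketch below is plain Python and assumes the four `Dense` layers exactly as defined above; `layer_shapes` and `dense_param_count` are illustrative names.

```python
# (inputs, units) for each Dense layer in the model above
layer_shapes = [(2, 2), (2, 10), (10, 5), (5, 1)]

def dense_param_count(n_inputs, n_units):
    # one weight per (input, unit) pair, plus one bias per unit
    return n_inputs * n_units + n_units

counts = [dense_param_count(i, u) for i, u in layer_shapes]
print("per-layer parameters:", counts)   # [6, 30, 55, 6]
print("total parameters:", sum(counts))  # 97
```

Repeating this calculation after adding or removing layers is a quick way to check that the architecture is the one you intended.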

Let's train and test this model. We need to specify a few arguments, such as the number of epochs (the number of full passes over the training data), the loss function, and the optimizer. For the most part, we will use optimizers like Adam, RMSprop, or SGD; more advanced optimizers exist, but these three are the most commonly used.

In [None]:
epochs = 10 # how can we most appropriately choose the number of epochs?
loss = 'mse' # try 'mae', 'mean_absolute_percentage_error', ... how can we most appropriately choose a loss function?

# compile the model by specifying an optimization technique and loss function
model.compile(optimizer='Adam', loss=loss)

# train the model on our training data
model.fit(X_train, y_train, epochs = epochs, validation_split = 0.2);
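
To get a feel for what one epoch means here, the arithmetic below estimates how many gradient updates the `fit` call performs. It assumes the 90,000 training rows produced by the split above and a batch size of 32, which is the Keras default when `batch_size` is not passed. The variable names are illustrative.

```python
import math

n_train = 90_000          # 90% of the 100,000 generated rows
validation_split = 0.2    # as passed to fit() above
batch_size = 32           # assumed: Keras default when batch_size is not given
epochs = 10

n_fit = round(n_train * (1 - validation_split))  # rows used for gradient updates
n_val = n_train - n_fit                          # rows held out for validation
steps_per_epoch = math.ceil(n_fit / batch_size)

print("rows used for fitting:", n_fit)
print("rows used for validation:", n_val)
print("gradient steps per epoch:", steps_per_epoch)
print("gradient steps in total:", steps_per_epoch * epochs)
```

Note that the validation rows are never used for weight updates, so the model effectively trains on fewer rows than `X_train` contains.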

In [None]:
# find prediction error
def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

y_pred_test = model.predict(X_test)
y_pred_train = model.predict(X_train)

print("Training Error:", mse(y_train, y_pred_train))
print("Test Error:", mse(y_test, y_pred_test))
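
The other loss functions mentioned earlier ('mae' and 'mean_absolute_percentage_error') can be written in NumPy in the same way, which makes it easier to compare what each one penalizes. This is a sketch: the `_np` names and the demo arrays are hypothetical. Both inputs are flattened first, because subtracting an `(N,)` array from an `(N, 1)` array would silently broadcast to `(N, N)`.

```python
import numpy as np

def _flat(y_true, y_pred):
    # flatten both so that (N,) vs (N, 1) inputs cannot broadcast to (N, N)
    return np.ravel(y_true).astype(float), np.ravel(y_pred).astype(float)

def mse_np(y_true, y_pred):
    t, p = _flat(y_true, y_pred)
    return np.mean((t - p) ** 2)

def mae_np(y_true, y_pred):
    t, p = _flat(y_true, y_pred)
    return np.mean(np.abs(t - p))

def mape_np(y_true, y_pred):
    # percentage error; assumes no true value is zero
    t, p = _flat(y_true, y_pred)
    return 100.0 * np.mean(np.abs((t - p) / t))

y_true_demo = np.array([[1.0], [2.0], [4.0]])   # column vector, like y_test
y_pred_demo = np.array([1.0, 3.0, 2.0])         # flat vector
print("MSE :", mse_np(y_true_demo, y_pred_demo))
print("MAE :", mae_np(y_true_demo, y_pred_demo))
print("MAPE:", mape_np(y_true_demo, y_pred_demo))
```

MSE weights large errors more heavily than MAE, while the percentage error weights mistakes on small true values more heavily, so the three can rank the same model differently.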

In [None]:
y_pred_test, y_test

# Questions

Remember to rebuild and recompile the model every time you change something; otherwise training continues from the previously learned weights.

##### (a) What are the shapes of the inputs and outputs for this model?

(YOUR ANSWER HERE)

##### (b) What happens to the performance of the model as you increase the standard deviation of the noise?

(YOUR ANSWER HERE)

##### (c) What happens if you add more layers? What happens if you remove some of the layers?

(YOUR ANSWER HERE)

##### (d) How does the number of epochs affect the training and test error?

(YOUR ANSWER HERE)

##### (e) What do you think of the overall performance of this neural network? Would you use this or another approach for this problem?

(YOUR ANSWER HERE)