# A Neural Network for Regression (Estimate blood pressure from PPG signal)


Having gained some experience with neural networks, let us train a network that estimates the blood pressure from a PPG signal window.

All of your work for this exercise will be done in this notebook.

# A Photoplethysmograph (PPG) signal

A PPG (photoplethysmograph) signal is a signal obtained with a pulse oximeter, which illuminates the skin and measures changes in light absorption. A PPG signal carries rich information about the status of the cardiovascular health of a person, such as breadth rate, heart rate and blood pressure. An example is shown below, where you also see the blood pressure signal that we will estimate (the data also has the ECG signal, which you should ignore).

<img width="80%" src="PPG_ABG_ECG_example.png">


# Preparing the Dataset 

In this task, you are expected to perform the full pipeline for creating a learning system from scratch. Here is how you should construct the dataset:
* Download the dataset from the following website, and only take "Part 1" (since the whole dataset is too big): https://archive.ics.uci.edu/ml/datasets/Cuff-Less+Blood+Pressure+Estimation
* Take a window of size $W$ from the PPG channel between time $t$ and $t+W$. Let us call this $\textbf{x}_t$.
* Take the corresponding window of size $W$ from the ABP (arterial blood pressure) channel between time $t$ and $t+W$. Find the maxima and minima of this signal within the window (you can use "findpeaks" from Matlab or "find_peaks_cwt" from scipy). Here is an example window from the ABP signal, and its peaks:
 <img width="60%" src="ABP_peaks.png">
    
* Calculate the average of the maxima, call it $y^1_t$, and the average of the minima, call it $y^2_t$.
* Slide the window (by an amount that is on the order of a few samples) over the PPG signals and collect many training instances of the form $\{\textbf{x}_t, <y^1_t, y^2_t>\}$ instances. 
* This will be your input-output for training the network. In other words, your network outputs two values.

In [1]:
import random
import numpy as np
from ceng783.utils import load_BP_dataset
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading extenrnal modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
  """ returns relative error """
  return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

In [2]:
# Create a small net and some toy data to check your implementations.
# Note that we set the random seed for repeatable experiments.
from ceng783.neural_net_for_regression import TwoLayerNet

input_size = 4
hidden_size = 10
num_classes = 3
num_inputs = 5

def init_toy_model():
  np.random.seed(0)
  return TwoLayerNet(input_size, hidden_size, num_classes, std=1e-1)

def init_toy_data():
  np.random.seed(1)
  X = 10 * np.random.randn(num_inputs, input_size)
  y = np.array([[0, 1, 2], [1, 2, 3], [2, 3, 4], [2, 1, 4], [2, 1, 4]])
  return X, y

net = init_toy_model()
X, y = init_toy_data()

# Forward pass: compute scores
Open the file `ceng783/neural_net_for_regression.py` and look at the method `TwoLayerNet.loss`. This function is very similar to the loss functions you have written for the previous exercises: It takes the data and weights and computes the *regression* scores, the *squared error loss*, and the gradients on the parameters. 

To be more specific, you will implement the following loss function:

$$\frac{1}{N}\frac{1}{2}\sum_i\sum_{j} (o_{ij} - y_{ij})^2 + \lambda\frac{1}{2}\sum_j w_j^2,$$

where $i$ runs through the $N$ samples in the batch; $o_{ij}$ is the prediction of the network for the $i^{th}$ sample for output $j$, and $y_{ij}$ is the correct value; $\lambda$ is the weight of the regularization term.

The first layer uses ReLU as the activation function. The output layer does not use any activation functions.

Implement the first part of the forward pass which uses the weights and biases to compute the scores for all inputs.

In [3]:
scores = net.loss(X)
print 'Your scores:'
print scores
print
print 'correct scores:'
correct_scores = np.asarray([
  [-0.81233741, -1.27654624, -0.70335995],
  [-0.17129677, -1.18803311, -0.47310444],
  [-0.51590475, -1.01354314, -0.8504215 ],
  [-0.15419291, -0.48629638, -0.52901952],
  [-0.00618733, -0.12435261, -0.15226949]])
print correct_scores
print

# The difference should be very small. We get < 1e-7
print 'Difference between your scores and correct scores:'
print np.sum(np.abs(scores - correct_scores))

Your scores:
[[-0.81233741 -1.27654624 -0.70335995]
 [-0.17129677 -1.18803311 -0.47310444]
 [-0.51590475 -1.01354314 -0.8504215 ]
 [-0.15419291 -0.48629638 -0.52901952]
 [-0.00618733 -0.12435261 -0.15226949]]

correct scores:
[[-0.81233741 -1.27654624 -0.70335995]
 [-0.17129677 -1.18803311 -0.47310444]
 [-0.51590475 -1.01354314 -0.8504215 ]
 [-0.15419291 -0.48629638 -0.52901952]
 [-0.00618733 -0.12435261 -0.15226949]]

Difference between your scores and correct scores:
3.6802720496109664e-08


# Forward pass: compute loss
In the same function, implement the second part that computes the data and regularizaion loss.

In [4]:
loss, _ = net.loss(X, y, reg=0.1)
print "Your loss: ", loss
correct_loss = 13.2984798095

# should be very small, we get < 1e-10
print 'Difference between your loss and correct loss:'
print np.sum(np.abs(loss - correct_loss))

Your loss:  13.298479809544318
Difference between your loss and correct loss:
4.431832678619685e-11


# Backward pass
Implement the rest of the function. This will compute the gradient of the loss with respect to the variables `W1`, `b1`, `W2`, and `b2`. Now that you (hopefully!) have a correctly implemented forward pass, you can debug your backward pass using a numeric gradient check:

In [5]:
from ceng783.utils import eval_numerical_gradient

# Use numeric gradient checking to check your implementation of the backward pass.
# If your implementation is correct, the difference between the numeric and
# analytic gradients should be less than 1e-8 for each of W1, W2, b1, and b2.

loss, grads = net.loss(X, y, reg=0.1)

# these should all be less than 1e-8 or so
for param_name in grads:
  f = lambda W: net.loss(X, y, reg=0.1)[0]
  param_grad_num = eval_numerical_gradient(f, net.params[param_name])
  print '%s max relative error: %e' % (param_name, rel_error(param_grad_num, grads[param_name]))

b2 max relative error: 1.443397e-06
b1 max relative error: 2.190492e-07
W1 max relative error: 5.459961e-04
W2 max relative error: 3.756381e-04


# Load the PPG dataset for training your regression network

In [29]:
# Load the PPG dataset
# If your memory turns out to be sufficient, try loading a subset
#
# TODO: Open up ceng783/utils.py and fill in the `load_BP_dataset()' function.
#
def get_data(datafile, training_ratio=0.9, test_ratio=0.06, val_ratio=0.01):
  # Load the PPG training data 
  X, y = load_BP_dataset(datafile)

  ################################################################
  # TODO: Split the data into training, validation and test sets #
  ################################################################
  length = len(y)
  y /= 100
  split_idx = int(length * 0.8)
  test_idx = int(length * 0.90)
    
  X_train = X[:split_idx]
  X_val = X[split_idx:test_idx]
  X_test = X[test_idx:]
  
  y_train = y[:split_idx]
  y_val = y[split_idx:test_idx]
  y_test = y[test_idx:]

  #########    END OF YOUR CODE    ###############################
  ################################################################
  
  return X_train, y_train, X_val, y_val, X_test, y_test

datafile = 'ceng783/data/Part_1.mat' #TODO: PATH to your data file
input_size = 182 * 4# TODO: Size of the input of the network

X_train, y_train, X_val, y_val, X_test, y_test = get_data(datafile)
print "Number of instances in the training set: ", len(X_train)
print "Number of instances in the validation set: ", len(X_val)
print "Number of instances in the testing set: ", len(X_test)

Number of instances in the training set:  89737
Number of instances in the validation set:  11217
Number of instances in the testing set:  11218


# Now train our network on the PPG dataset

In [30]:
# Now, let's train a neural network
input_size = input_size
hidden_size = 500 # TODO: Choose a suitable hidden layer size
num_classes = 2 # We have two outputs
net = TwoLayerNet(input_size, hidden_size, num_classes)

# Train the network
stats = net.train(X_train, y_train, X_val, y_val,
            num_iters=5000, batch_size=16,
            learning_rate=1, learning_rate_decay=0.90,
            reg=0.005, verbose=True)

# Predict on the validation set
#val_err = ... # TODO: Perform prediction on the validation set
val_err = np.sum(np.square(net.predict(X_val) - y_val), axis=1).mean()
print 'Validation error: ', val_err

iteration 0 / 5000: loss 0.980879
iteration 100 / 5000: loss 0.001785
iteration 200 / 5000: loss 0.000775
iteration 300 / 5000: loss 0.009763
iteration 400 / 5000: loss 0.003624
iteration 500 / 5000: loss 0.060081
iteration 600 / 5000: loss 0.005458
iteration 700 / 5000: loss 0.009271
iteration 800 / 5000: loss 0.010488
iteration 900 / 5000: loss 0.005587
iteration 1000 / 5000: loss 0.001785
iteration 1100 / 5000: loss 0.030723
iteration 1200 / 5000: loss 0.016017
iteration 1300 / 5000: loss 0.021237
iteration 1400 / 5000: loss 0.002034
iteration 1500 / 5000: loss 0.000602
iteration 1600 / 5000: loss 0.010787
iteration 1700 / 5000: loss 0.039325
iteration 1800 / 5000: loss 0.006094
iteration 1900 / 5000: loss 0.002968
iteration 2000 / 5000: loss 0.003158
iteration 2100 / 5000: loss 0.018012
iteration 2200 / 5000: loss 0.001576
iteration 2300 / 5000: loss 0.008225
iteration 2400 / 5000: loss 0.003085
iteration 2500 / 5000: loss 0.009099
iteration 2600 / 5000: loss 0.005042
iteration 270

# Debug the training and improve learning
You should be able to get a validation error of ~16.

So far so good. But, is it really good? Let us plot the validation and training errors to see how good the network did. Did it memorize or generalize? Discuss your observations and conclusions. If its performance is not looking good, propose and test measures. This is the part that will show me how well you have digested everything covered in the lectures.

In [None]:
# Plot the loss function and train / validation errors
plt.subplot(2, 1, 1)
plt.plot(stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')

plt.subplot(2, 1, 2)
train = plt.plot(stats['train_err_history'], label='train')
val = plt.plot(stats['val_err_history'], label='val')
plt.legend(loc='upper right', shadow=True)
plt.title('Classification error history')
plt.xlabel('Epoch')
plt.ylabel('Clasification error')
plt.show()

# Finetuning and Improving Your Network (Bonus)
There are many aspects and hyper-parameters you can play with. Do play with them and find the best setting here.

In [14]:
results = {}
best_val = -1
batch_sizes = [1, 2, 4, 8]
learning_rates = [1,1e-1, 5e-1, 1.5e-2, 4e-2]
regularization_strengths = [0.01, 0.001, 0.005]    
hidden_size = [100, 500, 1000]
num_classes = 2
k=0
while k<len(hidden_size):
    i=0
    while i< len(learning_rates):
        j = 0
        while j < len(regularization_strengths):
            z = 0
            while z < len(batch_sizes):
                net = TwoLayerNet(input_size, hidden_size[k], num_classes)
                stats = net.train(X_train, y_train, X_val, y_val,
                    num_iters=2000, batch_size=batch_sizes[z],
                    learning_rate=learning_rates[i], learning_rate_decay=0.95,
                    reg=regularization_strengths[j], verbose=True)
                val_acc = np.sum(np.square(net.predict(X_val) - y_val), axis=1).mean()
                train_acc = np.sum(np.square(net.predict(X_train) - y_train), axis=1).mean()
                if best_val < val_acc:
                    best_val = val_acc
                    best_net = net
                results[(hidden_size[k],learning_rates[i], regularization_strengths[j])] = (train_acc, val_acc)
                z=z+1
            j=j+1
        i=i+1
    k=k+1

iteration 0 / 2000: loss 0.971397
iteration 100 / 2000: loss 0.000333
iteration 200 / 2000: loss 0.000043
iteration 300 / 2000: loss 0.002787
iteration 400 / 2000: loss 0.000873
iteration 500 / 2000: loss 0.001628
iteration 600 / 2000: loss 0.004049
iteration 700 / 2000: loss 0.000057
iteration 800 / 2000: loss 0.000045
iteration 900 / 2000: loss 0.005698
iteration 1000 / 2000: loss 0.003008
iteration 1100 / 2000: loss 0.000779
iteration 1200 / 2000: loss 0.002803
iteration 1300 / 2000: loss 0.002433
iteration 1400 / 2000: loss 0.000906
iteration 1500 / 2000: loss 0.000126
iteration 1600 / 2000: loss 0.000063
iteration 1700 / 2000: loss 0.000376
iteration 1800 / 2000: loss 0.001233
iteration 1900 / 2000: loss 0.001306
iteration 0 / 2000: loss 0.995155
iteration 100 / 2000: loss 0.035500
iteration 200 / 2000: loss 0.006171
iteration 300 / 2000: loss 0.007698
iteration 400 / 2000: loss 0.000281
iteration 500 / 2000: loss 0.001555
iteration 600 / 2000: loss 0.002086
iteration 700 / 2000: 

iteration 800 / 2000: loss 0.000501
iteration 900 / 2000: loss 0.002305
iteration 1000 / 2000: loss 0.000866
iteration 1100 / 2000: loss 0.009002
iteration 1200 / 2000: loss 0.000732
iteration 1300 / 2000: loss 0.004140
iteration 1400 / 2000: loss 0.003511
iteration 1500 / 2000: loss 0.010312
iteration 1600 / 2000: loss 0.000272
iteration 1700 / 2000: loss 0.007109
iteration 1800 / 2000: loss 0.000088
iteration 1900 / 2000: loss 0.000284
iteration 0 / 2000: loss 0.984133
iteration 100 / 2000: loss 0.136371
iteration 200 / 2000: loss 0.052586
iteration 300 / 2000: loss 0.021955
iteration 400 / 2000: loss 0.008104
iteration 500 / 2000: loss 0.004408
iteration 600 / 2000: loss 0.002187
iteration 700 / 2000: loss 0.004025
iteration 800 / 2000: loss 0.000328
iteration 900 / 2000: loss 0.028638
iteration 1000 / 2000: loss 0.000552
iteration 1100 / 2000: loss 0.001709
iteration 1200 / 2000: loss 0.004200
iteration 1300 / 2000: loss 0.013404
iteration 1400 / 2000: loss 0.000267
iteration 1500 

iteration 1500 / 2000: loss 0.003723
iteration 1600 / 2000: loss 0.010418
iteration 1700 / 2000: loss 0.001718
iteration 1800 / 2000: loss 0.003642
iteration 1900 / 2000: loss 0.002550
iteration 0 / 2000: loss 0.925775
iteration 100 / 2000: loss 0.012497
iteration 200 / 2000: loss 0.004456
iteration 300 / 2000: loss 0.013484
iteration 400 / 2000: loss 0.020100
iteration 500 / 2000: loss 0.038432
iteration 600 / 2000: loss 0.026972
iteration 700 / 2000: loss 0.019578
iteration 800 / 2000: loss 0.013775
iteration 900 / 2000: loss 0.008326
iteration 1000 / 2000: loss 0.109120
iteration 1100 / 2000: loss 0.007688
iteration 1200 / 2000: loss 0.021287
iteration 1300 / 2000: loss 0.007840
iteration 1400 / 2000: loss 0.007215
iteration 1500 / 2000: loss 0.005794
iteration 1600 / 2000: loss 0.014030
iteration 1700 / 2000: loss 0.016637
iteration 1800 / 2000: loss 0.010407
iteration 1900 / 2000: loss 0.018337
iteration 0 / 2000: loss 0.980863
iteration 100 / 2000: loss 0.004960
iteration 200 / 2

iteration 0 / 2000: loss 0.980863
iteration 100 / 2000: loss 1.043743
iteration 200 / 2000: loss 0.413478
iteration 300 / 2000: loss 0.172213
iteration 400 / 2000: loss 0.068791
iteration 500 / 2000: loss 0.110012
iteration 600 / 2000: loss 0.020732
iteration 700 / 2000: loss 0.013932
iteration 800 / 2000: loss 0.012030
iteration 900 / 2000: loss 0.006525
iteration 1000 / 2000: loss 0.001949
iteration 1100 / 2000: loss 0.029103
iteration 1200 / 2000: loss 0.014411
iteration 1300 / 2000: loss 0.021279
iteration 1400 / 2000: loss 0.008637
iteration 1500 / 2000: loss 0.001761
iteration 1600 / 2000: loss 0.031800
iteration 1700 / 2000: loss 0.049100
iteration 1800 / 2000: loss 0.006184
iteration 1900 / 2000: loss 0.005461
iteration 0 / 2000: loss 0.971403
iteration 100 / 2000: loss 0.286123
iteration 200 / 2000: loss 0.260107
iteration 300 / 2000: loss 0.238199
iteration 400 / 2000: loss 0.215671
iteration 500 / 2000: loss 0.196430
iteration 600 / 2000: loss 0.182179
iteration 700 / 2000: 

iteration 600 / 2000: loss 0.016755
iteration 700 / 2000: loss 0.003702
iteration 800 / 2000: loss 0.003993
iteration 900 / 2000: loss 0.005636
iteration 1000 / 2000: loss 0.002192
iteration 1100 / 2000: loss 0.002609
iteration 1200 / 2000: loss 0.001838
iteration 1300 / 2000: loss 0.003915
iteration 1400 / 2000: loss 0.002174
iteration 1500 / 2000: loss 0.001022
iteration 1600 / 2000: loss 0.001065
iteration 1700 / 2000: loss 0.004004
iteration 1800 / 2000: loss 0.002471
iteration 1900 / 2000: loss 0.001099
iteration 0 / 2000: loss 0.995147
iteration 100 / 2000: loss 0.073507
iteration 200 / 2000: loss 0.004600
iteration 300 / 2000: loss 0.007412
iteration 400 / 2000: loss 0.002340
iteration 500 / 2000: loss 0.001342
iteration 600 / 2000: loss 0.001559
iteration 700 / 2000: loss 0.002934
iteration 800 / 2000: loss 0.001191
iteration 900 / 2000: loss 0.002600
iteration 1000 / 2000: loss 0.001915
iteration 1100 / 2000: loss 0.009160
iteration 1200 / 2000: loss 0.001837
iteration 1300 / 

iteration 1200 / 2000: loss 0.001341
iteration 1300 / 2000: loss 0.007345
iteration 1400 / 2000: loss 0.001706
iteration 1500 / 2000: loss 0.015460
iteration 1600 / 2000: loss 0.000448
iteration 1700 / 2000: loss 0.009687
iteration 1800 / 2000: loss 0.000843
iteration 1900 / 2000: loss 0.000334
iteration 0 / 2000: loss 0.984134
iteration 100 / 2000: loss 0.001837
iteration 200 / 2000: loss 0.006446
iteration 300 / 2000: loss 0.001207
iteration 400 / 2000: loss 0.000735
iteration 500 / 2000: loss 0.039146
iteration 600 / 2000: loss 0.001327
iteration 700 / 2000: loss 0.005650
iteration 800 / 2000: loss 0.000901
iteration 900 / 2000: loss 0.028059
iteration 1000 / 2000: loss 0.000685
iteration 1100 / 2000: loss 0.002097
iteration 1200 / 2000: loss 0.006994
iteration 1300 / 2000: loss 0.245974
iteration 1400 / 2000: loss 0.000675
iteration 1500 / 2000: loss 0.003010
iteration 1600 / 2000: loss 0.001661
iteration 1700 / 2000: loss 0.000736
iteration 1800 / 2000: loss 0.002169
iteration 190

iteration 1800 / 2000: loss 0.002826
iteration 1900 / 2000: loss 0.005440
iteration 0 / 2000: loss 0.925782
iteration 100 / 2000: loss 0.002861
iteration 200 / 2000: loss 0.000402
iteration 300 / 2000: loss 0.001735
iteration 400 / 2000: loss 0.005744
iteration 500 / 2000: loss 0.043310
iteration 600 / 2000: loss 0.032568
iteration 700 / 2000: loss 0.055678
iteration 800 / 2000: loss 0.013009
iteration 900 / 2000: loss 0.003661
iteration 1000 / 2000: loss 0.091921
iteration 1100 / 2000: loss 0.003391
iteration 1200 / 2000: loss 0.029359
iteration 1300 / 2000: loss 0.004424
iteration 1400 / 2000: loss 0.010565
iteration 1500 / 2000: loss 0.006444
iteration 1600 / 2000: loss 0.011692
iteration 1700 / 2000: loss 0.035052
iteration 1800 / 2000: loss 0.008075
iteration 1900 / 2000: loss 0.011727
iteration 0 / 2000: loss 0.980859
iteration 100 / 2000: loss 0.000731
iteration 200 / 2000: loss 0.000568
iteration 300 / 2000: loss 0.074445
iteration 400 / 2000: loss 0.015082
iteration 500 / 2000

iteration 400 / 2000: loss 0.003317
iteration 500 / 2000: loss 0.057437
iteration 600 / 2000: loss 0.005026
iteration 700 / 2000: loss 0.008930
iteration 800 / 2000: loss 0.010494
iteration 900 / 2000: loss 0.005569
iteration 1000 / 2000: loss 0.001793
iteration 1100 / 2000: loss 0.032285
iteration 1200 / 2000: loss 0.015956
iteration 1300 / 2000: loss 0.021292
iteration 1400 / 2000: loss 0.001854
iteration 1500 / 2000: loss 0.000642
iteration 1600 / 2000: loss 0.009215
iteration 1700 / 2000: loss 0.039204
iteration 1800 / 2000: loss 0.006022
iteration 1900 / 2000: loss 0.002808
iteration 0 / 2000: loss 0.971406
iteration 100 / 2000: loss 0.005436
iteration 200 / 2000: loss 0.004277
iteration 300 / 2000: loss 0.006290
iteration 400 / 2000: loss 0.003769
iteration 500 / 2000: loss 0.004023
iteration 600 / 2000: loss 0.006029
iteration 700 / 2000: loss 0.001695
iteration 800 / 2000: loss 0.001400
iteration 900 / 2000: loss 0.006818
iteration 1000 / 2000: loss 0.003934
iteration 1100 / 20

iteration 900 / 2000: loss 98.776790
iteration 1000 / 2000: loss 81.670516
iteration 1100 / 2000: loss 67.532916
iteration 1200 / 2000: loss 55.841347
iteration 1300 / 2000: loss 46.174196
iteration 1400 / 2000: loss 38.183942
iteration 1500 / 2000: loss 31.571315
iteration 1600 / 2000: loss 26.105929
iteration 1700 / 2000: loss 21.591912
iteration 1800 / 2000: loss 17.851534
iteration 1900 / 2000: loss 14.760545
iteration 0 / 2000: loss 0.995165
iteration 100 / 2000: loss 0.204524
iteration 200 / 2000: loss 0.163286
iteration 300 / 2000: loss 0.148456
iteration 400 / 2000: loss 0.111984
iteration 500 / 2000: loss 0.092624
iteration 600 / 2000: loss 0.076317
iteration 700 / 2000: loss 0.065936
iteration 800 / 2000: loss 0.052133
iteration 900 / 2000: loss 0.045326
iteration 1000 / 2000: loss 0.037699
iteration 1100 / 2000: loss 0.034437
iteration 1200 / 2000: loss 0.024924
iteration 1300 / 2000: loss 0.022874
iteration 1400 / 2000: loss 0.019225
iteration 1500 / 2000: loss 0.019333
ite

iteration 1000 / 2000: loss 0.061878
iteration 1100 / 2000: loss 0.059610
iteration 1200 / 2000: loss 0.049978
iteration 1300 / 2000: loss 0.048087
iteration 1400 / 2000: loss 0.041612
iteration 1500 / 2000: loss 0.041450
iteration 1600 / 2000: loss 0.034091
iteration 1700 / 2000: loss 0.039688
iteration 1800 / 2000: loss 0.041243
iteration 1900 / 2000: loss 0.025614
iteration 0 / 2000: loss 0.984146
iteration 100 / 2000: loss 0.137702
iteration 200 / 2000: loss 0.126252
iteration 300 / 2000: loss 0.112817
iteration 400 / 2000: loss 0.101923
iteration 500 / 2000: loss 0.096180
iteration 600 / 2000: loss 0.086190
iteration 700 / 2000: loss 0.083801
iteration 800 / 2000: loss 0.088785
iteration 900 / 2000: loss 0.088325
iteration 1000 / 2000: loss 0.071851
iteration 1100 / 2000: loss 0.061449
iteration 1200 / 2000: loss 0.061636
iteration 1300 / 2000: loss 0.074225
iteration 1400 / 2000: loss 0.045769
iteration 1500 / 2000: loss 0.041152
iteration 1600 / 2000: loss 0.044362
iteration 170

  loss += reg * 0.5 * np.sum(W1 * W1)
  loss += reg * 0.5 * np.sum(W2 * W2)
  loss = np.sum(np.square(scores - y)) / (2 * number_of_training)
  self.params['W1'] -= learning_rate * grads['W1']
  relu_1_o = np.maximum(X.dot(W1) + b1,0)
  tempw1 = np.where((relu_1_o > 0), los_score_derivative.dot(W2.T), 0)


iteration 100 / 2000: loss nan
iteration 200 / 2000: loss nan
iteration 300 / 2000: loss nan
iteration 400 / 2000: loss nan
iteration 500 / 2000: loss nan
iteration 600 / 2000: loss nan
iteration 700 / 2000: loss nan
iteration 800 / 2000: loss nan
iteration 900 / 2000: loss nan
iteration 1000 / 2000: loss nan
iteration 1100 / 2000: loss nan
iteration 1200 / 2000: loss nan
iteration 1300 / 2000: loss nan
iteration 1400 / 2000: loss nan
iteration 1500 / 2000: loss nan
iteration 1600 / 2000: loss nan
iteration 1700 / 2000: loss nan
iteration 1800 / 2000: loss nan
iteration 1900 / 2000: loss nan


  relu_layer_o = np.maximum(X.dot(W1) + b1, 0)


iteration 0 / 2000: loss 0.984135
iteration 100 / 2000: loss 419439.582233
iteration 200 / 2000: loss 161847.748870
iteration 300 / 2000: loss 62451.650446
iteration 400 / 2000: loss 24098.009458
iteration 500 / 2000: loss 9298.619545
iteration 600 / 2000: loss 3588.027430
iteration 700 / 2000: loss 1384.501819
iteration 800 / 2000: loss 534.232550
iteration 900 / 2000: loss 206.162185
iteration 1000 / 2000: loss 79.543966
iteration 1100 / 2000: loss 30.693884
iteration 1200 / 2000: loss 11.847419
iteration 1300 / 2000: loss 4.592309
iteration 1400 / 2000: loss 1.763737
iteration 1500 / 2000: loss 0.683299
iteration 1600 / 2000: loss 0.263656
iteration 1700 / 2000: loss 0.101607
iteration 1800 / 2000: loss 0.040866
iteration 1900 / 2000: loss 0.015780
iteration 0 / 2000: loss 0.925777
iteration 100 / 2000: loss nan
iteration 200 / 2000: loss nan
iteration 300 / 2000: loss nan
iteration 400 / 2000: loss nan
iteration 500 / 2000: loss nan
iteration 600 / 2000: loss nan
iteration 700 / 20

iteration 1300 / 2000: loss 0.249501
iteration 1400 / 2000: loss 0.141536
iteration 1500 / 2000: loss 0.090624
iteration 1600 / 2000: loss 0.055647
iteration 1700 / 2000: loss 0.034198
iteration 1800 / 2000: loss 0.022846
iteration 1900 / 2000: loss 0.013793
iteration 0 / 2000: loss 0.925773
iteration 100 / 2000: loss 738.288907
iteration 200 / 2000: loss 458.869806
iteration 300 / 2000: loss 285.203482
iteration 400 / 2000: loss 177.265974
iteration 500 / 2000: loss 110.175424
iteration 600 / 2000: loss 68.481672
iteration 700 / 2000: loss 42.561371
iteration 800 / 2000: loss 26.456803
iteration 900 / 2000: loss 16.444264
iteration 1000 / 2000: loss 10.309962
iteration 1100 / 2000: loss 6.371850
iteration 1200 / 2000: loss 3.954761
iteration 1300 / 2000: loss 2.456861
iteration 1400 / 2000: loss 1.531247
iteration 1500 / 2000: loss 0.951953
iteration 1600 / 2000: loss 0.600919
iteration 1700 / 2000: loss 0.369411
iteration 1800 / 2000: loss 0.232943
iteration 1900 / 2000: loss 0.14879

  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)


iteration 100 / 2000: loss nan
iteration 200 / 2000: loss nan
iteration 300 / 2000: loss nan
iteration 400 / 2000: loss nan
iteration 500 / 2000: loss nan
iteration 600 / 2000: loss nan
iteration 700 / 2000: loss nan
iteration 800 / 2000: loss nan
iteration 900 / 2000: loss nan
iteration 1000 / 2000: loss nan
iteration 1100 / 2000: loss nan
iteration 1200 / 2000: loss nan
iteration 1300 / 2000: loss nan
iteration 1400 / 2000: loss nan
iteration 1500 / 2000: loss nan
iteration 1600 / 2000: loss nan
iteration 1700 / 2000: loss nan
iteration 1800 / 2000: loss nan
iteration 1900 / 2000: loss nan
iteration 0 / 2000: loss 0.971415
iteration 100 / 2000: loss 0.056795
iteration 200 / 2000: loss 0.033096
iteration 300 / 2000: loss 0.006051
iteration 400 / 2000: loss 0.020944
iteration 500 / 2000: loss 0.010995
iteration 600 / 2000: loss 0.019226
iteration 700 / 2000: loss 0.003379
iteration 800 / 2000: loss 0.004359
iteration 900 / 2000: loss 0.005570
iteration 1000 / 2000: loss 0.002313
iterat

iteration 900 / 2000: loss 0.005063
iteration 1000 / 2000: loss 0.001864
iteration 1100 / 2000: loss 0.002268
iteration 1200 / 2000: loss 0.001408
iteration 1300 / 2000: loss 0.003573
iteration 1400 / 2000: loss 0.001723
iteration 1500 / 2000: loss 0.000669
iteration 1600 / 2000: loss 0.000693
iteration 1700 / 2000: loss 0.003640
iteration 1800 / 2000: loss 0.002178
iteration 1900 / 2000: loss 0.000802
iteration 0 / 2000: loss 0.995154
iteration 100 / 2000: loss 0.075362
iteration 200 / 2000: loss 0.004289
iteration 300 / 2000: loss 0.009283
iteration 400 / 2000: loss 0.001946
iteration 500 / 2000: loss 0.000845
iteration 600 / 2000: loss 0.001164
iteration 700 / 2000: loss 0.002451
iteration 800 / 2000: loss 0.000777
iteration 900 / 2000: loss 0.002247
iteration 1000 / 2000: loss 0.001473
iteration 1100 / 2000: loss 0.009258
iteration 1200 / 2000: loss 0.001416
iteration 1300 / 2000: loss 0.011222
iteration 1400 / 2000: loss 0.001843
iteration 1500 / 2000: loss 0.019136
iteration 1600

iteration 1500 / 2000: loss 0.003106
iteration 1600 / 2000: loss 0.002806
iteration 1700 / 2000: loss 0.007515
iteration 1800 / 2000: loss 0.018957
iteration 1900 / 2000: loss 0.000383
iteration 0 / 2000: loss 0.984129
iteration 100 / 2000: loss 0.009941
iteration 200 / 2000: loss 0.000472
iteration 300 / 2000: loss 0.000986
iteration 400 / 2000: loss 0.000396
iteration 500 / 2000: loss 0.015920
iteration 600 / 2000: loss 0.003225
iteration 700 / 2000: loss 0.012285
iteration 800 / 2000: loss 0.011405
iteration 900 / 2000: loss 0.025007
iteration 1000 / 2000: loss 0.000559
iteration 1100 / 2000: loss 0.006859
iteration 1200 / 2000: loss 0.007916
iteration 1300 / 2000: loss 0.030867
iteration 1400 / 2000: loss 0.000299
iteration 1500 / 2000: loss 0.001679
iteration 1600 / 2000: loss 0.008280
iteration 1700 / 2000: loss 0.001037
iteration 1800 / 2000: loss 0.002876
iteration 1900 / 2000: loss 0.005504
iteration 0 / 2000: loss 0.925765
iteration 100 / 2000: loss 0.002934
iteration 200 / 2

iteration 0 / 2000: loss 0.925833
iteration 100 / 2000: loss 0.007971
iteration 200 / 2000: loss 0.001111
iteration 300 / 2000: loss 0.000730
iteration 400 / 2000: loss 0.000137
iteration 500 / 2000: loss 0.000347
iteration 600 / 2000: loss 0.004176
iteration 700 / 2000: loss 0.000277
iteration 800 / 2000: loss 0.002005
iteration 900 / 2000: loss 0.002509
iteration 1000 / 2000: loss 0.088594
iteration 1100 / 2000: loss 0.031875
iteration 1200 / 2000: loss 0.003317
iteration 1300 / 2000: loss 0.003267
iteration 1400 / 2000: loss 0.004006
iteration 1500 / 2000: loss 0.007555
iteration 1600 / 2000: loss 0.012375
iteration 1700 / 2000: loss 0.002367
iteration 1800 / 2000: loss 0.005162
iteration 1900 / 2000: loss 0.008338
iteration 0 / 2000: loss 0.980893
iteration 100 / 2000: loss 0.006315
iteration 200 / 2000: loss 0.001084
iteration 300 / 2000: loss 0.009872
iteration 400 / 2000: loss 0.003331
iteration 500 / 2000: loss 0.057439
iteration 600 / 2000: loss 0.005026
iteration 700 / 2000: 

iteration 800 / 2000: loss 0.010502
iteration 900 / 2000: loss 0.005572
iteration 1000 / 2000: loss 0.001795
iteration 1100 / 2000: loss 0.032286
iteration 1200 / 2000: loss 0.015956
iteration 1300 / 2000: loss 0.021292
iteration 1400 / 2000: loss 0.001854
iteration 1500 / 2000: loss 0.000642
iteration 1600 / 2000: loss 0.009215
iteration 1700 / 2000: loss 0.039204
iteration 1800 / 2000: loss 0.006022
iteration 1900 / 2000: loss 0.002808
iteration 0 / 2000: loss 0.971445
iteration 100 / 2000: loss nan
iteration 200 / 2000: loss nan
iteration 300 / 2000: loss nan
iteration 400 / 2000: loss nan
iteration 500 / 2000: loss nan
iteration 600 / 2000: loss nan
iteration 700 / 2000: loss nan
iteration 800 / 2000: loss nan
iteration 900 / 2000: loss nan
iteration 1000 / 2000: loss nan
iteration 1100 / 2000: loss nan
iteration 1200 / 2000: loss nan
iteration 1300 / 2000: loss nan
iteration 1400 / 2000: loss nan
iteration 1500 / 2000: loss nan
iteration 1600 / 2000: loss nan
iteration 1700 / 2000

iteration 400 / 2000: loss nan
iteration 500 / 2000: loss nan
iteration 600 / 2000: loss nan
iteration 700 / 2000: loss nan
iteration 800 / 2000: loss nan
iteration 900 / 2000: loss nan
iteration 1000 / 2000: loss nan
iteration 1100 / 2000: loss nan
iteration 1200 / 2000: loss nan
iteration 1300 / 2000: loss nan
iteration 1400 / 2000: loss nan
iteration 1500 / 2000: loss nan
iteration 1600 / 2000: loss nan
iteration 1700 / 2000: loss nan
iteration 1800 / 2000: loss nan
iteration 1900 / 2000: loss nan
iteration 0 / 2000: loss 0.984132
iteration 100 / 2000: loss 4.861461
iteration 200 / 2000: loss 4.420886
iteration 300 / 2000: loss 4.020473
iteration 400 / 2000: loss 3.655541
iteration 500 / 2000: loss 3.331004
iteration 600 / 2000: loss 3.023848
iteration 700 / 2000: loss 2.756519
iteration 800 / 2000: loss 2.518901
iteration 900 / 2000: loss 2.296630
iteration 1000 / 2000: loss 2.072589
iteration 1100 / 2000: loss 1.888442
iteration 1200 / 2000: loss 1.723516
iteration 1300 / 2000: lo

iteration 200 / 2000: loss nan
iteration 300 / 2000: loss nan
iteration 400 / 2000: loss nan
iteration 500 / 2000: loss nan
iteration 600 / 2000: loss nan
iteration 700 / 2000: loss nan
iteration 800 / 2000: loss nan
iteration 900 / 2000: loss nan
iteration 1000 / 2000: loss nan
iteration 1100 / 2000: loss nan
iteration 1200 / 2000: loss nan
iteration 1300 / 2000: loss nan
iteration 1400 / 2000: loss nan
iteration 1500 / 2000: loss nan
iteration 1600 / 2000: loss nan
iteration 1700 / 2000: loss nan
iteration 1800 / 2000: loss nan
iteration 1900 / 2000: loss nan
iteration 0 / 2000: loss 0.980865
iteration 100 / 2000: loss nan
iteration 200 / 2000: loss nan
iteration 300 / 2000: loss nan
iteration 400 / 2000: loss nan
iteration 500 / 2000: loss nan
iteration 600 / 2000: loss nan
iteration 700 / 2000: loss nan
iteration 800 / 2000: loss nan
iteration 900 / 2000: loss nan
iteration 1000 / 2000: loss nan
iteration 1100 / 2000: loss nan
iteration 1200 / 2000: loss nan
iteration 1300 / 2000: 

iteration 600 / 2000: loss 0.019933
iteration 700 / 2000: loss 0.002892
iteration 800 / 2000: loss 0.004214
iteration 900 / 2000: loss 0.004589
iteration 1000 / 2000: loss 0.001545
iteration 1100 / 2000: loss 0.001938
iteration 1200 / 2000: loss 0.001060
iteration 1300 / 2000: loss 0.003237
iteration 1400 / 2000: loss 0.001367
iteration 1500 / 2000: loss 0.000356
iteration 1600 / 2000: loss 0.000368
iteration 1700 / 2000: loss 0.003314
iteration 1800 / 2000: loss 0.001936
iteration 1900 / 2000: loss 0.000524
iteration 0 / 2000: loss 0.995128
iteration 100 / 2000: loss 0.081865
iteration 200 / 2000: loss 0.004171
iteration 300 / 2000: loss 0.009906
iteration 400 / 2000: loss 0.002017
iteration 500 / 2000: loss 0.000438
iteration 600 / 2000: loss 0.000802
iteration 700 / 2000: loss 0.002059
iteration 800 / 2000: loss 0.000418
iteration 900 / 2000: loss 0.001964
iteration 1000 / 2000: loss 0.000970
iteration 1100 / 2000: loss 0.008964
iteration 1200 / 2000: loss 0.000931
iteration 1300 / 

iteration 1200 / 2000: loss 0.000935
iteration 1300 / 2000: loss 0.021892
iteration 1400 / 2000: loss 0.002815
iteration 1500 / 2000: loss 0.003149
iteration 1600 / 2000: loss 0.003068
iteration 1700 / 2000: loss 0.007558
iteration 1800 / 2000: loss 0.016829
iteration 1900 / 2000: loss 0.000453
iteration 0 / 2000: loss 0.984177
iteration 100 / 2000: loss 0.027484
iteration 200 / 2000: loss 0.000870
iteration 300 / 2000: loss 0.001366
iteration 400 / 2000: loss 0.000766
iteration 500 / 2000: loss 0.016542
iteration 600 / 2000: loss 0.002091
iteration 700 / 2000: loss 0.002924
iteration 800 / 2000: loss 0.002647
iteration 900 / 2000: loss 0.019054
iteration 1000 / 2000: loss 0.009218
iteration 1100 / 2000: loss 0.000518
iteration 1200 / 2000: loss 0.004261
iteration 1300 / 2000: loss 0.023654
iteration 1400 / 2000: loss 0.025496
iteration 1500 / 2000: loss 0.001724
iteration 1600 / 2000: loss 0.007165
iteration 1700 / 2000: loss 0.000930
iteration 1800 / 2000: loss 0.002085
iteration 190

iteration 1700 / 2000: loss 0.000487
iteration 1800 / 2000: loss 0.003016
iteration 1900 / 2000: loss 0.005503
iteration 0 / 2000: loss 0.925770
iteration 100 / 2000: loss 0.003218
iteration 200 / 2000: loss 0.000837
iteration 300 / 2000: loss 0.001969
iteration 400 / 2000: loss 0.000743
iteration 500 / 2000: loss 0.001178
iteration 600 / 2000: loss 0.043315
iteration 700 / 2000: loss 0.015239
iteration 800 / 2000: loss 0.010063
iteration 900 / 2000: loss 0.003257
iteration 1000 / 2000: loss 0.122480
iteration 1100 / 2000: loss 0.003662
iteration 1200 / 2000: loss 0.004308
iteration 1300 / 2000: loss 0.003404
iteration 1400 / 2000: loss 0.004687
iteration 1500 / 2000: loss 0.007607
iteration 1600 / 2000: loss 0.013304
iteration 1700 / 2000: loss 0.004726
iteration 1800 / 2000: loss 0.007100
iteration 1900 / 2000: loss 0.011048
iteration 0 / 2000: loss 0.980870
iteration 100 / 2000: loss 0.001256
iteration 200 / 2000: loss 0.003193
iteration 300 / 2000: loss 0.053977
iteration 400 / 200

In [15]:

for hs, lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(hs,lr, reg)]
    print 'hs: %e lr: %e reg: %e train accuracy: %f val accuracy: %f' % (
                hs,lr, reg, train_accuracy, val_accuracy)
    
print 'best validation accuracy achieved during cross-validation: %f' % best_val

hs: 1.000000e+02 lr: 1.500000e-02 reg: 1.000000e-03 train accuracy: 0.200660 val accuracy: 0.202504
hs: 1.000000e+02 lr: 1.500000e-02 reg: 5.000000e-03 train accuracy: 0.208713 val accuracy: 0.209581
hs: 1.000000e+02 lr: 1.500000e-02 reg: 1.000000e-02 train accuracy: 0.192362 val accuracy: 0.195222
hs: 1.000000e+02 lr: 4.000000e-02 reg: 1.000000e-03 train accuracy: 0.090690 val accuracy: 0.090575
hs: 1.000000e+02 lr: 4.000000e-02 reg: 5.000000e-03 train accuracy: 0.088817 val accuracy: 0.087667
hs: 1.000000e+02 lr: 4.000000e-02 reg: 1.000000e-02 train accuracy: 0.107927 val accuracy: 0.111777
hs: 1.000000e+02 lr: 1.000000e-01 reg: 1.000000e-03 train accuracy: 0.110060 val accuracy: 0.114528
hs: 1.000000e+02 lr: 1.000000e-01 reg: 5.000000e-03 train accuracy: 0.138682 val accuracy: 0.150456
hs: 1.000000e+02 lr: 1.000000e-01 reg: 1.000000e-02 train accuracy: 0.110060 val accuracy: 0.114528
hs: 1.000000e+02 lr: 5.000000e-01 reg: 1.000000e-03 train accuracy: 0.151193 val accuracy: 0.161286


In [18]:
test_err = np.sum(np.square(net.predict(X_test) - y_test), axis=1).mean()
print test_err

0.061122304464281595
