First, we need all the libraries.

In [1]:
"""
Have fun with the number of epochs!

Be warned that if you increase them too much,
the VM will time out :)
"""

import numpy as np
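# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import needs an older scikit-learn release.
from sklearn.datasets import load_boston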
from sklearn.utils import shuffle, resample
from miniflow import *

## Using the Boston data set from sklearn

__TODO__:
  1. Apply fixes to the data to ensure the statistical validity of the test and to prevent "multiplicity" issues

In [2]:
# Load data
data = load_boston()
X_ = data['data']
y_ = data['target']

Just a little more data cleaning

In [3]:
# Normalize data
X_ = (X_ - np.mean(X_, axis=0)) / np.std(X_, axis=0)
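
The normalization above computes the mean and standard deviation over the whole data set. If a held-out test split is added later, one common precaution is to compute those statistics on the training rows only and reuse them for the test rows. The sketch below shows the idea on synthetic data; `standardize`, `X_demo`, the 80/20 split, and the `eps` guard are all my own assumptions, not part of the notebook or of miniflow.

```python
import numpy as np

def standardize(train, test, eps=1e-8):
    """Z-score both splits using statistics from the training split only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    # Guard against constant columns, which would otherwise divide by zero.
    std = np.where(std < eps, 1.0, std)
    return (train - mean) / std, (test - mean) / std

rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # synthetic stand-in data

# Shuffle row indices once, then carve out an 80/20 split.
idx = rng.permutation(len(X_demo))
train_idx, test_idx = idx[:80], idx[80:]
X_train, X_test = standardize(X_demo[train_idx], X_demo[test_idx])
```

The training split ends up with zero mean and unit variance per column; the test split is only approximately so, which is the point: it never influenced the statistics.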

Here, we set up the raw values (layer sizes, initial weights, and biases), __before__ we color in the abstraction

In [4]:
n_features = X_.shape[1]
n_hidden = 10
W1_ = np.random.randn(n_features, n_hidden)
b1_ = np.zeros(n_hidden)
W2_ = np.random.randn(n_hidden, 1)
b2_ = np.zeros(1)

__Now__ it's time for coloring :)

In [5]:
# Neural network
X, y = Input(), Input()
W1, b1 = Input(), Input()
W2, b2 = Input(), Input()

l1 = Linear(X, W1, b1)
s1 = Sigmoid(l1)
l2 = Linear(s1, W2, b2)
cost = MSE(y, l2)

feed_dict = {
    X: X_,
    y: y_,
    W1: W1_,
    b1: b1_,
    W2: W2_,
    b2: b2_
}
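
To make it clearer what the graph above computes, here is the same chain (linear, sigmoid, linear, mean squared error) written as a plain NumPy function. This is only a sketch of the arithmetic under my own assumptions: `forward`, `sigmoid`, and the `*_demo` arrays are hypothetical names, the data is synthetic, and nothing here is miniflow's actual implementation.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, y, W1, b1, W2, b2):
    """Linear -> Sigmoid -> Linear -> MSE, mirroring the node names above."""
    l1 = X @ W1 + b1          # shape (m, n_hidden)
    s1 = sigmoid(l1)          # shape (m, n_hidden)
    l2 = s1 @ W2 + b2         # shape (m, 1)
    y = y.reshape(-1, 1)      # align targets with the (m, 1) predictions
    return np.mean((y - l2) ** 2)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(11, 13))    # 11 synthetic examples, 13 features
y_demo = rng.normal(size=11)
W1_demo = rng.normal(size=(13, 10))
b1_demo = np.zeros(10)
W2_demo = rng.normal(size=(10, 1))
b2_demo = np.zeros(1)

cost_demo = forward(X_demo, y_demo, W1_demo, b1_demo, W2_demo, b2_demo)
```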

### Note the abstraction

The `feed_dict` was chosen to match up with the `topological_sort` that occurs in the main network

- I'm personally not a fan of this.
  - It seems like "abstraction for abstraction's sake", which should always be avoided in software development

__TODO__:

1. Manage the graph without the need for the [topological sort](https://en.wikipedia.org/wiki/Topological_sorting#Kahn.27s_algorithm)
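
For context on what the sort actually buys, here is a minimal sketch of Kahn's algorithm on a toy dependency graph shaped like the network above. The string node names, the `demo_edges` dictionary, and `kahn_topological_sort` are illustrative assumptions on my part; miniflow's own `topological_sort` takes the `feed_dict` and may be organized differently.

```python
from collections import deque

def kahn_topological_sort(edges):
    """Return nodes so that every node appears after all of its inputs.

    edges maps each node to the list of nodes that consume its output.
    """
    # Count incoming edges for every node.
    in_degree = {node: 0 for node in edges}
    for consumers in edges.values():
        for node in consumers:
            in_degree[node] = in_degree.get(node, 0) + 1

    # Start with nodes that have no inputs (the Input-style nodes).
    queue = deque(node for node, degree in in_degree.items() if degree == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for consumer in edges.get(node, []):
            in_degree[consumer] -= 1
            if in_degree[consumer] == 0:
                queue.append(consumer)

    if len(order) != len(in_degree):
        raise ValueError("graph contains a cycle")
    return order

demo_edges = {
    "X": ["l1"], "W1": ["l1"], "b1": ["l1"],
    "l1": ["s1"],
    "s1": ["l2"], "W2": ["l2"], "b2": ["l2"],
    "l2": ["cost"], "y": ["cost"],
    "cost": [],
}
order = kahn_topological_sort(demo_edges)
```

Any replacement for the sort still has to guarantee this ordering somehow, for example by building the node list in dependency order as the layers are declared.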

In [8]:
epochs = 1000 # total "iterations" or "passes" through the network
# Total number of examples
m = X_.shape[0]
batch_size = 11 # how much of the Boston data we're sampling at a given time

__TODO__:

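1. Make the batch sampling properly random
  - `resample` draws each batch with replacement, so unless a fixed `random_state` is passed the batches do differ, but a single epoch can repeat some rows and skip others
  - Grabbing the same data __every__ time would make for "gradeable" results, but the science is in the crapper at that point.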
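
One way to approach the batch sampling TODO above, sketched with plain NumPy: shuffle the row indices once per epoch and slice them into consecutive batches, so every example is used exactly once per epoch. `iterate_minibatches` and the demo arrays are hypothetical names of mine; the fixed seed is only there to keep the sketch repeatable.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield (X_batch, y_batch) pairs covering each row exactly once."""
    order = rng.permutation(len(X))  # fresh random order on every call
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

rng = np.random.default_rng(42)
X_demo = np.arange(50, dtype=float).reshape(25, 2)  # 25 synthetic rows
y_demo = np.arange(25, dtype=float)

batches = list(iterate_minibatches(X_demo, y_demo, batch_size=11, rng=rng))
```

With 25 rows and a batch size of 11, this gives two full batches and one short batch of 3; dropping the short batch instead would match the `m // batch_size` convention used in the notebook.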

In [9]:
steps_per_epoch = m // batch_size

graph = topological_sort(feed_dict)
trainables = [W1, b1, W2, b2] # Input-style nodes that will be adjusted with each pass
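
As a reminder of what "adjusted with each pass" means for the trainables, here is a minimal sketch of a plain stochastic gradient descent step. The dictionary-based `sgd_step`, the parameter names, and the learning rate are assumptions for illustration only; miniflow's `sgd_update` works on its node objects and is not reproduced here.

```python
import numpy as np

def sgd_step(params, grads, learning_rate=1e-2):
    """Move each parameter a small step against its gradient, in place."""
    for name in params:
        params[name] -= learning_rate * grads[name]

# Toy parameters and gradients, made up for the example.
params = {"W": np.array([1.0, -2.0]), "b": np.array([0.5])}
grads = {"W": np.array([0.1, -0.2]), "b": np.array([1.0])}

sgd_step(params, grads, learning_rate=0.1)
```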

In [10]:
# Just a little helper debugging
print("Total number of examples = {}".format(m))

Total number of examples = 506


In [11]:
# Step 4
for i in range(epochs):
    loss = 0
    for j in range(steps_per_epoch):
        # Step 1
        # Randomly sample a batch of examples
        X_batch, y_batch = resample(X_, y_, n_samples=batch_size)

        # Reset value of X and y Inputs
        X.value = X_batch
        y.value = y_batch

        # Step 2
        forward_and_backward(graph)

        # Step 3
        sgd_update(trainables)

        loss += graph[-1].value

    print("Epoch: {}, Loss: {:.3f}".format(i+1, loss/steps_per_epoch))

Epoch: 1, Loss: 167.064
Epoch: 2, Loss: 34.726
Epoch: 3, Loss: 27.292
Epoch: 4, Loss: 23.512
Epoch: 5, Loss: 24.926
Epoch: 6, Loss: 22.924
Epoch: 7, Loss: 21.246
Epoch: 8, Loss: 16.992
Epoch: 9, Loss: 17.233
Epoch: 10, Loss: 15.257
Epoch: 11, Loss: 17.984
Epoch: 12, Loss: 15.821
Epoch: 13, Loss: 15.530
Epoch: 14, Loss: 14.769
Epoch: 15, Loss: 13.391
Epoch: 16, Loss: 14.393
Epoch: 17, Loss: 12.136
Epoch: 18, Loss: 11.251
Epoch: 19, Loss: 11.911
Epoch: 20, Loss: 10.779
Epoch: 21, Loss: 7.335
Epoch: 22, Loss: 10.980
Epoch: 23, Loss: 10.928
Epoch: 24, Loss: 11.836
Epoch: 25, Loss: 11.577
Epoch: 26, Loss: 9.441
Epoch: 27, Loss: 10.736
Epoch: 28, Loss: 12.212
Epoch: 29, Loss: 11.004
Epoch: 30, Loss: 8.297
Epoch: 31, Loss: 8.803
Epoch: 32, Loss: 8.764
Epoch: 33, Loss: 10.055
Epoch: 34, Loss: 10.161
Epoch: 35, Loss: 9.144
Epoch: 36, Loss: 8.523
Epoch: 37, Loss: 9.414
Epoch: 38, Loss: 9.336
Epoch: 39, Loss: 8.590
Epoch: 40, Loss: 10.390
Epoch: 41, Loss: 9.828
Epoch: 42, Loss: 8.521
Epoch: 43, ...

Epoch: 358, Loss: 4.939
Epoch: 359, Loss: 3.908
Epoch: 360, Loss: 4.373
Epoch: 361, Loss: 3.901
Epoch: 362, Loss: 4.684
Epoch: 363, Loss: 4.425
Epoch: 364, Loss: 4.414
Epoch: 365, Loss: 4.285
Epoch: 366, Loss: 3.891
Epoch: 367, Loss: 4.574
Epoch: 368, Loss: 3.927
Epoch: 369, Loss: 3.738
Epoch: 370, Loss: 4.240
Epoch: 371, Loss: 4.417
Epoch: 372, Loss: 3.729
Epoch: 373, Loss: 4.010
Epoch: 374, Loss: 3.884
Epoch: 375, Loss: 4.274
Epoch: 376, Loss: 4.319
Epoch: 377, Loss: 4.286
Epoch: 378, Loss: 4.057
Epoch: 379, Loss: 3.835
Epoch: 380, Loss: 3.799
Epoch: 381, Loss: 4.164
Epoch: 382, Loss: 3.987
Epoch: 383, Loss: 3.713
Epoch: 384, Loss: 3.696
Epoch: 385, Loss: 4.072
Epoch: 386, Loss: 3.635
Epoch: 387, Loss: 5.021
Epoch: 388, Loss: 4.312
Epoch: 389, Loss: 4.293
Epoch: 390, Loss: 4.773
Epoch: 391, Loss: 4.605
Epoch: 392, Loss: 4.217
Epoch: 393, Loss: 5.187
Epoch: 394, Loss: 3.585
Epoch: 395, Loss: 3.933
Epoch: 396, Loss: 3.607
Epoch: 397, Loss: 4.091
Epoch: 398, Loss: 3.931
Epoch: 399, ...

Epoch: 708, Loss: 4.430
Epoch: 709, Loss: 3.885
Epoch: 710, Loss: 3.611
Epoch: 711, Loss: 4.250
Epoch: 712, Loss: 3.286
Epoch: 713, Loss: 3.743
Epoch: 714, Loss: 3.905
Epoch: 715, Loss: 3.246
Epoch: 716, Loss: 4.246
Epoch: 717, Loss: 3.782
Epoch: 718, Loss: 3.889
Epoch: 719, Loss: 3.794
Epoch: 720, Loss: 3.453
Epoch: 721, Loss: 4.006
Epoch: 722, Loss: 3.757
Epoch: 723, Loss: 3.780
Epoch: 724, Loss: 3.578
Epoch: 725, Loss: 3.751
Epoch: 726, Loss: 4.053
Epoch: 727, Loss: 3.524
Epoch: 728, Loss: 3.374
Epoch: 729, Loss: 3.617
Epoch: 730, Loss: 4.179
Epoch: 731, Loss: 3.096
Epoch: 732, Loss: 3.467
Epoch: 733, Loss: 3.850
Epoch: 734, Loss: 3.497
Epoch: 735, Loss: 3.943
Epoch: 736, Loss: 3.238
Epoch: 737, Loss: 3.176
Epoch: 738, Loss: 3.596
Epoch: 739, Loss: 3.772
Epoch: 740, Loss: 3.621
Epoch: 741, Loss: 3.224
Epoch: 742, Loss: 3.128
Epoch: 743, Loss: 2.954
Epoch: 744, Loss: 3.625
Epoch: 745, Loss: 3.720
Epoch: 746, Loss: 4.039
Epoch: 747, Loss: 3.551
Epoch: 748, Loss: 3.326
Epoch: 749, ...

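Run in its current state, the network shows __severely__ diminishing returns after about 400 epochs: the per-epoch loss ranges from roughly 3.6 to 5.2 over epochs 358-398 and is still between roughly 3.0 and 4.4 over epochs 708-748.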
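
To put a rough number on the slowdown, the sketch below compares the average loss of two windows of epochs, using values copied from the output above. `relative_improvement` and the choice of window sizes are my own, so treat the result as a ballpark figure rather than a proper convergence test.

```python
def relative_improvement(early_losses, late_losses):
    """Fractional drop in mean loss from the early window to the late one."""
    early = sum(early_losses) / len(early_losses)
    late = sum(late_losses) / len(late_losses)
    return (early - late) / early

# Losses copied from the training output above.
epochs_1_to_5 = [167.064, 34.726, 27.292, 23.512, 24.926]
epochs_6_to_10 = [22.924, 21.246, 16.992, 17.233, 15.257]
epochs_358_to_367 = [4.939, 3.908, 4.373, 3.901, 4.684,
                     4.425, 4.414, 4.285, 3.891, 4.574]
epochs_708_to_717 = [4.430, 3.885, 3.611, 4.250, 3.286,
                     3.743, 3.905, 3.246, 4.246, 3.782]

early_gain = relative_improvement(epochs_1_to_5, epochs_6_to_10)
late_gain = relative_improvement(epochs_358_to_367, epochs_708_to_717)
```

On these numbers the mean loss falls by roughly two thirds between the first two five-epoch windows, but by only a little over a tenth across the 350 epochs separating the later windows.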

__TODO__:

1. Make it not so.
  - The network can always stand improvement.
  - Perhaps additional hidden layers, to compensate for the slowed improvement rate