## Neural Networks 
- This notebook was adapted from the PyTorch Tutorials.
- http://pytorch.org/tutorials/beginner/pytorch_with_examples.html

- Neural networks are the foundation of deep learning.

> In the mathematical theory of artificial neural networks, the universal approximation theorem states[1] that a feed-forward network with a single hidden layer containing a finite number of neurons (i.e., a multilayer perceptron), can approximate continuous functions on compact subsets of $\mathbb{R}^n$, under mild assumptions on the activation function.
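
The statement quoted above can be written out symbolically. This is only a sketch of the usual single-hidden-layer form; the symbols $f$, $K$, $N$, $v_i$, $w_i$, $b_i$, $\sigma$ and $\varepsilon$ are introduced here for illustration and do not appear in the original notebook. For a continuous function $f$ on a compact set $K \subset \mathbb{R}^n$ and any tolerance $\varepsilon > 0$, there exist a width $N$ and parameters $v_i, b_i \in \mathbb{R}$, $w_i \in \mathbb{R}^n$ such that

$$
F(x) = \sum_{i=1}^{N} v_i \, \sigma\!\left(w_i^{\top} x + b_i\right)
\qquad \text{satisfies} \qquad
\sup_{x \in K} \left| F(x) - f(x) \right| < \varepsilon .
$$

Here $\sigma$ is the activation function, subject to the mild assumptions the quote mentions. The theorem guarantees that such parameters exist; it does not say how to find them, which is what the training loop later in this notebook attempts.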

### Generate Fake Data
- `D_in` is the number of dimensions of an input variable.
- `D_out` is the number of dimensions of an output variable.
- Here we are learning some special "fake" data that represents the XOR problem.
- Here, the dependent variable (dv) is 1 if exactly one of the first two input variables is 1, and 0 otherwise.
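
As a rough illustration of the XOR rule described above, the truth table can also be generated rather than typed by hand. The helper name `make_xor_data` and the `_demo` variables are hypothetical and not part of the original notebook; the constant third column is assumed from the printed input data.

```python
import numpy as np

# Hypothetical helper (not in the original notebook): build the XOR truth table.
def make_xor_data():
    # All four combinations of the two binary operands.
    operands = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    # Append a constant column of ones, as in the printed input data.
    x = np.hstack([operands, np.ones((4, 1), dtype=int)])
    # XOR target: 1 when exactly one operand is 1; reshaped to a (4, 1) column.
    y = np.logical_xor(operands[:, 0], operands[:, 1]).astype(int).reshape(-1, 1)
    return x, y

x_demo, y_demo = make_xor_data()
print(x_demo.shape, y_demo.ravel())
```

Because the third column is always 1, it behaves like a bias input: its row of `w1` is added to every hidden unit regardless of the operands.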


In [34]:
# -*- coding: utf-8 -*-
import numpy as np

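# XOR inputs: the first two columns are the operands; the third column is a constant 1.
x = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
# XOR targets as a column vector of shape (4, 1).
y = np.array([[0, 1, 1, 0]]).T
print("Input data:\n", x, "\n Output data:\n", y)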

Input data:
 [[0 0 1]
 [0 1 1]
 [1 0 1]
 [1 1 1]] 
 Output data:
 [[0]
 [1]
 [1]
 [0]]


### A Simple Neural Network 
- Here we are going to build a neural network with a single hidden layer of `H` = 3 units.

In [57]:
D_in, H, D_out = 3, 3, 1
# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)
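
To see how these weight shapes line up with the data, here is a minimal shape trace of one forward pass. It is a sketch only: the `_demo` names and the fixed seed are assumptions added for reproducibility and are not in the original notebook.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)

D_in, H, D_out = 3, 3, 1
w1_demo = rng.standard_normal((D_in, H))   # input -> hidden weights, shape (3, 3)
w2_demo = rng.standard_normal((H, D_out))  # hidden -> output weights, shape (3, 1)

# Four XOR samples, each with D_in = 3 features.
x_demo = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])

h_demo = np.maximum(x_demo.dot(w1_demo), 0)  # hidden activations after ReLU
y_pred_demo = h_demo.dot(w2_demo)            # one prediction per sample

print(h_demo.shape, y_pred_demo.shape)  # (4, 3) (4, 1)
```

Each matrix product consumes the trailing dimension of its left operand, so `(4, 3) @ (3, 3)` gives the hidden layer and `(4, 3) @ (3, 1)` gives one output per row.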

In [58]:
# -*- coding: utf-8 -*-

learning_rate = 1e-3
for t in range(500):
    # Forward pass: compute predicted y
    h = x.dot(w1)
    
    # ReLU activation: keep positive values and set negative values to 0.
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backprop to compute gradients of the loss with respect to w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 0.520526973721
1 0.513053503241
2 0.505682867527
3 0.498413739025
4 0.491244806502
5 0.484174774821
6 0.477202364718
7 0.47032631259
8 0.463545370285
9 0.45685830489
10 0.450263898532
11 0.443760948182
12 0.437348265457
13 0.431024676431
14 0.424789021449
15 0.418640154944
16 0.412576945257
17 0.406598274463
18 0.400703038197
19 0.394890145483
20 0.389158518572
21 0.383507092775
22 0.377934816306
23 0.37244065012
24 0.367023567765
25 0.361682555222
26 0.356416610764
27 0.351224744802
28 0.346105979745
29 0.341059349856
30 0.336083901114
31 0.331178691075
32 0.326342788736
33 0.321575274403
34 0.316875239561
35 0.312241786742
36 0.3076740294
37 0.303171091785
38 0.298732108817
39 0.294356225971
40 0.290042599147
41 0.285790394562
42 0.281598788624
43 0.277466967825
44 0.273394128621
45 0.269379477323
46 0.265422229986
47 0.261521612301
48 0.257676859485
49 0.253887216174
50 0.250151936322
51 0.246470283094
52 0.242841528763
53 0.239264954611
54 0.235739850827
55 0.232265516411
56 0.22
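
One way to gain confidence in hand-written backprop like the loop above is a finite-difference gradient check. The sketch below re-implements the same forward and backward formulas in functions and compares them with central differences. The function names, the `_chk` variables, the seed and the step size `eps` are all assumptions made for this illustration, not part of the original notebook.

```python
import numpy as np

def loss_fn(x, y, w1, w2):
    # Same forward pass and squared-error loss as the training loop.
    h_relu = np.maximum(x.dot(w1), 0)
    return np.square(h_relu.dot(w2) - y).sum()

def analytic_grads(x, y, w1, w2):
    # Same backward pass as the training loop.
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)
    return grad_w1, grad_w2

def numeric_grad(f, w, eps=1e-6):
    # Central differences: perturb one entry of w at a time, in place.
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        f_plus = f()
        w[idx] = old - eps
        f_minus = f()
        w[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
x_chk = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y_chk = np.array([[0, 1, 1, 0]], dtype=float).T
w1_chk = rng.standard_normal((3, 3))
w2_chk = rng.standard_normal((3, 1))

g1, g2 = analytic_grads(x_chk, y_chk, w1_chk, w2_chk)
f = lambda: loss_fn(x_chk, y_chk, w1_chk, w2_chk)
n1 = numeric_grad(f, w1_chk)
n2 = numeric_grad(f, w2_chk)

max_err = max(np.abs(g1 - n1).max(), np.abs(g2 - n2).max())
print("largest gradient mismatch:", max_err)
```

A small mismatch suggests the analytic gradients agree with the loss. The check can be unreliable if a hidden pre-activation sits almost exactly at 0, where ReLU is not differentiable.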

Fully connected 

In [59]:

y_pred = h_relu.dot(w2)
y_pred

array([[ 0.        ],
       [ 0.97891648],
       [ 1.01096967],
       [ 0.        ]])

In [55]:
y


array([[0],
       [1],
       [1],
       [0]])
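
The predictions and targets shown above can be compared numerically instead of by eye. This is a small sketch: the prediction values are copied from the `y_pred` output above, and the 0.5 decision threshold and the variable names are assumptions made for illustration.

```python
import numpy as np

# Predictions copied from the y_pred output shown above.
y_pred_demo = np.array([[0.0], [0.97891648], [1.01096967], [0.0]])
y_true = np.array([[0], [1], [1], [0]])

# Turn the real-valued outputs into 0/1 labels (0.5 threshold is an assumption).
labels = (y_pred_demo > 0.5).astype(int)

accuracy = (labels == y_true).mean()           # fraction of matching labels
max_abs_err = np.abs(y_pred_demo - y_true).max()  # worst-case raw error

print(labels.ravel(), accuracy, max_abs_err)
```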

In [56]:
w1

array([[-0.74492939],
       [-1.11689861],
       [-1.58493737]])

In [51]:
w2

array([[ 1.35167184],
       [ 0.70907942],
       [-0.11154362]])

In [49]:
# ReLU replaces negative values with 0 and leaves positive values unchanged.
h_relu

array([[ 0.        ,  0.        ,  0.        ],
       [ 0.72108356,  0.        ,  0.        ],
       [ 0.72753913,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ]])
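
To make the ReLU behaviour concrete, here is a minimal standalone sketch of the activation and the gradient mask used in the backward pass. The function names `relu` and `relu_grad` and the sample vector `z` are hypothetical. The mask treats an input of exactly 0 as passing the gradient through, mirroring the `grad_h[h < 0] = 0` line in the training loop.

```python
import numpy as np

def relu(z):
    # Element-wise max(z, 0): negatives become 0, positives are unchanged.
    return np.maximum(z, 0)

def relu_grad(z):
    # 1 where the gradient passes through, 0 where the input was negative.
    return (z >= 0).astype(z.dtype)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))
print(relu_grad(z))
```

A hidden unit whose column of `h_relu` is 0 for every sample contributes nothing to `y_pred` and receives no gradient through `w2` for those samples.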