# Educational neural networks in Python

This code is loosely inspired by [Andrej Karpathy](https://cs.stanford.edu/people/karpathy/)'s excellent [Hacker's guide to Neural Networks](http://karpathy.github.io/neuralnets/).

This implementation is not a one-to-one transliteration of the original JavaScript code into Python. [There](https://github.com/urwithajit9/HG_NeuralNetwork) [are](https://github.com/johnashu/hackers_guide_to_neural_networks) [many](https://github.com/saiashirwad/Hackers-Guide-To-Neural-Networks-Python) [repositories](https://github.com/pannous/karpathy_neuralnets_python) [on](https://github.com/techniquark/Hacker-s-Guide-to-Neural-Networks-in-Python) [GitHub](https://github.com/Mutinix/hacker-nn/) that closely match it line by line; use those to follow along with the blog post.

The main purpose of this version is to simplify network definition and automate the computation of forward and backward passes. In Karpathy's code, both tasks are written out by hand, step by step (for clarity's sake!).

For example, a single neuron can be written as:

```python
a, b, c = param([1.0, 2.0, -3.0])
x, y = const([-1.0, 3.0])

s = sigmoid(a * x + b * y + c)
print s.compute()  # 0.880797077978
```

Compare it with the original implementation:

```javascript
var a = new Unit(1.0, 0.0);
var b = new Unit(2.0, 0.0);
var c = new Unit(-3.0, 0.0);
var x = new Unit(-1.0, 0.0);
var y = new Unit(3.0, 0.0);

// create the gates
var mulg0 = new multiplyGate();
var mulg1 = new multiplyGate();
var addg0 = new addGate();
var addg1 = new addGate();
var sg0 = new sigmoidGate();

// do the forward pass
var forwardNeuron = function() {
  ax = mulg0.forward(a, x); // a*x = -1
  by = mulg1.forward(b, y); // b*y = 6
  axpby = addg0.forward(ax, by); // a*x + b*y = 5
  axpbypc = addg1.forward(axpby, c); // a*x + b*y + c = 2
  s = sg0.forward(axpbypc); // sig(a*x + b*y + c) = 0.8808
};
forwardNeuron();

console.log('circuit output: ' + s.value); // prints 0.8808
```

# Base case: single gate in the circuit

Shows a single gate implementing f(x,y) = xy.

```python
from iogates import Constant
from opgates import MulGate

a = Constant(3)
b = Constant(-1)
ab = MulGate(a, b)

print 'a * b = ', ab.compute()
```

```
a * b =  -3
```


Notes:

* A gate has `forward()` and `backward()` methods. `forward()` evaluates the gate's output from its inputs; `backward()` propagates gradients from the output back to the inputs.
* A gate graph is assumed to be directed and acyclic. Two nodes may still be connected by multiple distinct paths.
* The `compute()` method recursively updates a gate and all of its ancestors. In the minimal example above, this is equivalent to calling `ab.forward()`.
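
The notes above can be illustrated with a minimal, self-contained sketch. The class names `Gate`, `Value` and `Mul` are hypothetical and are not the classes from `iogates` or `opgates`; the sketch only assumes the behaviour described in the notes (a `forward()`/`backward()` pair per gate, and a `compute()` that refreshes ancestors first).

```python
class Gate(object):
    """Hypothetical node in a directed acyclic gate graph."""

    def __init__(self, *inputs):
        self.inputs = inputs
        self.val = 0.0
        self.grad = 0.0

    def forward(self):
        raise NotImplementedError

    def backward(self):
        raise NotImplementedError

    def compute(self):
        # Refresh every ancestor first, then this gate.
        for gate in self.inputs:
            gate.compute()
        self.forward()
        return self.val


class Value(Gate):
    """Leaf gate that simply holds a number."""

    def __init__(self, val):
        Gate.__init__(self)
        self.val = val

    def forward(self):
        pass  # nothing to evaluate

    def backward(self):
        pass  # nothing to propagate further


class Mul(Gate):
    """Gate implementing f(a, b) = a * b."""

    def forward(self):
        a, b = self.inputs
        self.val = a.val * b.val

    def backward(self):
        a, b = self.inputs
        # Chain rule: d(ab)/da = b and d(ab)/db = a.
        a.grad += b.val * self.grad
        b.grad += a.val * self.grad


a, b = Value(3), Value(-1)
ab = Mul(a, b)
result = ab.compute()
print(result)  # -3

ab.grad = 1.0  # seed the output gradient
ab.backward()
```

This naive recursion re-evaluates an ancestor once per path that reaches it, which is still correct on an acyclic graph; a real implementation might cache or topologically sort the nodes instead.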

With minimal syntactic sugar, the code above can be rewritten as:

```python
from sugar import *

a = const(3)
b = const(-1)
ab = a * b

print 'a * b = ', ab.compute()
```

```
a * b =  -3
```


## Strategy #1: Random Local Search

Random local search: perturb the best inputs found so far by a small random amount, and keep the perturbed inputs only if they improve the output.

```python
from random import random

best_in = (a.val, b.val) = (-2, 3)
best_out = ab.compute()

print 'Initial output: {} * {} = {}'.format(best_in[0], best_in[1], best_out)

tweak_amount = 0.01
for _ in range(10):
    a.val, b.val = (x + tweak_amount * (random() * 2 - 1) for x in best_in)
    out = ab.compute()
    if out > best_out:
        best_in = (a.val, b.val)
        best_out = out

print 'Final output: {:.3} * {:.3} = {:.3}'.format(best_in[0], best_in[1], best_out)
```

```
Initial output: -2 * 3 = -6
Final output: -1.98 * 2.99 = -5.92
```


## Strategy #2: Numerical Gradient

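Perform one gradient step using a numerical (finite-difference) estimate of the gradient. The step moves along the gradient (gradient ascent), so the output increases.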

```python
a.val, b.val = -2, 3  # initial inputs
eps = 0.0001  # tweak amount / epsilon
out = ab.compute()

# Forward difference with respect to a, then restore a
prev_a, a.val = a.val, a.val + eps
dx = (ab.compute() - out) / eps  # approximately 3.0
a.val = prev_a

# Forward difference with respect to b, then restore b
prev_b, b.val = b.val, b.val + eps
dy = (ab.compute() - out) / eps  # approximately -2.0
b.val = prev_b

step_size = 0.01
a.val, b.val = (val + step_size * der for val, der in ((a.val, dx), (b.val, dy)))
new_out = ab.compute()  # approximately -5.8706

print 'Initial output: {}\nFinal output: {:.3}'.format(out, new_out)
```

```
Initial output: -6
Final output: -5.87
```
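
For f(a, b) = ab the forward difference can be worked out by hand, which shows why the two estimates come out as 3 and -2 at the point (a, b) = (-2, 3). The following derivation is a sketch added for illustration:

```latex
\frac{\partial f}{\partial a}
  \approx \frac{f(a + \varepsilon, b) - f(a, b)}{\varepsilon}
  = \frac{(a + \varepsilon)\,b - ab}{\varepsilon} = b = 3,
\qquad
\frac{\partial f}{\partial b}
  \approx \frac{f(a, b + \varepsilon) - f(a, b)}{\varepsilon}
  = \frac{a\,(b + \varepsilon) - ab}{\varepsilon} = a = -2
```

Because f is linear in each input separately, the forward difference is exact here up to floating-point rounding. Stepping each input by 0.01 times its derivative moves (a, b) to roughly (-1.97, 2.98), which gives the output of about -5.87 shown above.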


## Strategy #3: Analytic Gradient

Perform one gradient step using analytical derivatives. As in the previous strategy, the step moves along the gradient, so the output increases.

```python
step_size = 0.01
a = param(-2, step_size)
b = param(3, step_size)
ab = a * b

print 'Initial output: {:}'.format(ab.compute())
ab.backprop()
print 'Final output: {:.3}'.format(ab.compute())
```

```
Initial output: -6
Final output: -5.87
```


Notes:
* `const` (constant) gates ignore incoming gradients and do not change their value.
* `param` (parameter) gates add `step_size * gradient` to their value.
* `compute()` (forward) and `backprop()` (backward) passes need to be interleaved.

   Calling `backprop()` several times in a row takes several steps along the same gradient, the one evaluated at the values from the last `compute()`, because the forward values are not refreshed in between.
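
The effect of skipping `compute()` between steps can be sketched with a small stand-alone class. `MulCircuit` is hypothetical and is not part of this repository; it assumes, as the note above suggests, that the backward pass uses the input values seen by the most recent forward pass.

```python
class MulCircuit(object):
    """Hypothetical circuit f(a, b) = a * b with two trainable inputs."""

    def __init__(self, a, b, step_size):
        self.a, self.b = float(a), float(b)
        self.step_size = step_size
        self.cache = None

    def compute(self):
        # Forward pass: remember the inputs used for this evaluation.
        self.cache = (self.a, self.b)
        return self.a * self.b

    def backprop(self):
        # Backward pass: gradients come from the cached forward values.
        a0, b0 = self.cache
        self.a += self.step_size * b0  # d(ab)/da = b
        self.b += self.step_size * a0  # d(ab)/db = a


# Two steps with a forward pass in between: the gradient is refreshed.
fresh = MulCircuit(-2, 3, 0.01)
fresh.compute()
fresh.backprop()
fresh.compute()
fresh.backprop()
fresh_out = fresh.compute()

# Two steps back to back: the second step reuses the stale gradient.
stale = MulCircuit(-2, 3, 0.01)
stale.compute()
stale.backprop()
stale.backprop()
stale_out = stale.compute()

print('{:.4f}'.format(fresh_out))  # -5.7436
print('{:.4f}'.format(stale_out))  # -5.7424
```

Both variants increase the output from -6, but they land on different points because the second `backprop()` in the stale variant still uses the gradient from the initial inputs.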

# Recursive Case: Circuits with Multiple Gates

Here's an example with multiple gates that depend on each other:

```python
x, y, z = param((-2, 5, -4), 0.01)
xpyz = (x + y) * z

print 'Initial output: {:}'.format(xpyz.compute())  # -12
xpyz.backprop()
print 'Final output: {:.4}'.format(xpyz.compute())  # -11.59
```

```
Initial output: -12
Final output: -11.59
```


Let's compare analytical and numerical gradients:

```python
assert(xpyz.check_numerical_gradient(verbose=True))
```

```
Param value: -2.04, Analytical grad: -3.97, Numerical grad: -3.97, Diff: 1.2167e-06
Param value: 4.96, Analytical grad: -3.97, Numerical grad: -3.97, Diff: 1.2167e-06
Param value: -3.97, Analytical grad: 2.92, Numerical grad: 2.92, Diff: -2.416e-07
```
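
A gradient check of this kind can be sketched without the library. The helper names `analytic_grad` and `numerical_grad` below are hypothetical, and the forward-difference scheme and `eps` value are assumptions; `check_numerical_gradient` may be implemented differently, so the differences printed above are not expected to be reproduced exactly.

```python
def f(x, y, z):
    return (x + y) * z


def analytic_grad(x, y, z):
    # Chain rule with q = x + y and f = q * z:
    # df/dx = df/dq * dq/dx = z, df/dy = z, df/dz = q = x + y
    return (z, z, x + y)


def numerical_grad(func, args, eps=1e-4):
    # Forward difference, bumping one argument at a time.
    base = func(*args)
    grads = []
    for i in range(len(args)):
        bumped = list(args)
        bumped[i] += eps
        grads.append((func(*bumped) - base) / eps)
    return grads


# Parameter values after the single update step shown above.
point = (-2.04, 4.96, -3.97)
ana = analytic_grad(*point)
num = numerical_grad(f, point)

for p, a, n in zip(point, ana, num):
    print('Param value: {:.3}, Analytical grad: {:.3}, Numerical grad: {:.3}'.format(p, a, n))
```

The analytical values agree with the ones reported by the library at the same point: -3.97, -3.97 and 2.92.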


# Example: single neuron

A 2-dimensional neuron computes the following function, f(x,y,a,b,c) = σ(ax + by + c), where σ is the sigmoid function.

```python
a, b, c = param([1.0, 2.0, -3.0], lr=0.01)
x, y = const([-1.0, 3.0])
s = sigmoid(a * x + b * y + c)

assert(s.check_numerical_gradient(verbose=True))
print '---'
print 'Initial output: {:}'.format(s.compute())  # 0.880797077978
s.backprop()
print 'Final output: {:.5}'.format(s.compute())  # 0.882
```

```
Param value: 1.0, Analytical grad: -0.10499, Numerical grad: -0.10499, Diff: -2.3805e-07
Param value: 2.0, Analytical grad: 0.31498, Numerical grad: 0.31498, Diff: -6.2994e-08
Param value: -3.0, Analytical grad: 0.10499, Numerical grad: 0.10499, Diff: -9.5013e-08
---
Initial output: 0.880797077978
Final output: 0.882
```
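
The analytical gradients printed above can be recovered by hand from the chain rule. The derivation below is a worked sketch using the identity σ'(u) = σ(u)(1 − σ(u)), with u = ax + by + c = 2 at the initial values:

```latex
\sigma(u) = \frac{1}{1 + e^{-u}}, \qquad
\sigma'(u) = \sigma(u)\,\bigl(1 - \sigma(u)\bigr), \qquad
\sigma(2) \approx 0.8808, \quad \sigma'(2) \approx 0.10499

\frac{\partial f}{\partial a} = \sigma'(u)\,x \approx 0.10499 \cdot (-1) = -0.10499, \qquad
\frac{\partial f}{\partial b} = \sigma'(u)\,y \approx 0.10499 \cdot 3 = 0.31498, \qquad
\frac{\partial f}{\partial c} = \sigma'(u) \approx 0.10499
```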


A single neuron can also be defined as a single gate with five inputs:

```python
a, b, c = param([1.0, 2.0, -3.0], lr=0.01)
x, y = const([-1.0, 3.0])
n = neuron(a, b, c, x, y)

assert(n.check_numerical_gradient(verbose=True))
print '---'
print 'Initial output: {:}'.format(n.compute())  # 0.880797077978
n.backprop()
print 'Final output: {:.5}'.format(n.compute())  # 0.882
```

```
Param value: 1.0, Analytical grad: -0.10499, Numerical grad: -0.10499, Diff: -2.3805e-07
Param value: 2.0, Analytical grad: 0.31498, Numerical grad: 0.31498, Diff: -6.2994e-08
Param value: -3.0, Analytical grad: 0.10499, Numerical grad: 0.10499, Diff: -9.5013e-08
---
Initial output: 0.880797077978
Final output: 0.882
```
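
A fused gate of this kind evaluates the whole neuron in one forward step and produces all five input gradients in one backward step. The functions `neuron_forward` and `neuron_backward` below are a hypothetical, library-free sketch and are not the repository's `neuron` implementation.

```python
import math


def neuron_forward(a, b, c, x, y):
    # sigmoid(a*x + b*y + c) evaluated in a single step
    return 1.0 / (1.0 + math.exp(-(a * x + b * y + c)))


def neuron_backward(a, b, c, x, y, upstream=1.0):
    # The local sigmoid derivative s * (1 - s) is shared by all five inputs.
    s = neuron_forward(a, b, c, x, y)
    ds = s * (1.0 - s) * upstream
    # Gradients in input order: a, b, c, x, y
    return (ds * x, ds * y, ds, ds * a, ds * b)


out = neuron_forward(1.0, 2.0, -3.0, -1.0, 3.0)
grads = neuron_backward(1.0, 2.0, -3.0, -1.0, 3.0)

print('{:.5f}'.format(out))  # 0.88080
print(', '.join('{:.5f}'.format(g) for g in grads[:3]))  # -0.10499, 0.31498, 0.10499
```

The gradients for `x` and `y` are computed as well, but a constant input would simply ignore them, which matches the three parameter rows in the output above.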


# Binary classification

In [23]:
vectors = [[1.2, 0.7], [-0.3, 0.5], [-3, -1],
           [0.1, 1.0], [3.0, 1.1], [2.1, -3]]
labels = [+1, -1, +1, -1, -1, +1]
