# Educational neural networks in Python

This code is loosely inspired by [Andrej Karpathy](https://cs.stanford.edu/people/karpathy/)'s excellent but discontinued [Hacker's guide to Neural Networks](http://karpathy.github.io/neuralnets/).

This implementation is not a one-to-one transliteration of the original JavaScript code into Python. [There](https://github.com/urwithajit9/HG_NeuralNetwork) [are](https://github.com/johnashu/hackers_guide_to_neural_networks) [many](https://github.com/saiashirwad/Hackers-Guide-To-Neural-Networks-Python) [repositories](https://github.com/pannous/karpathy_neuralnets_python) [on](https://github.com/techniquark/Hacker-s-Guide-to-Neural-Networks-in-Python) [GitHub](https://github.com/Mutinix/hacker-nn/) that closely match it line by line; use those to follow along with the blog post.

The purpose of this version is to simplify network definition and to automate the computation of forward and backward passes. Karpathy's code spells out both tasks by hand, for clarity's sake.

# Base case: single gate in the circuit

The simplest circuit is a single gate that computes f(x, y) = xy.

In [None]:
from gates import *

a = Constant(3)
b = Constant(-1)
ab = MulGate(a, b)

print 'a * b = ', ab.compute()

a * b =  -3.0


The code above is equivalent to:

In [18]:
from utils.sugar import *

a = const(3)
b = const(-1)
ab = a * b

print 'a * b = ', ab.compute()


a * b =  -3.0
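
The `gates` and `utils.sugar` modules are not shown here, so the following is only a minimal sketch of how such syntactic sugar could work: operator overloading on a base class builds the gate object. The names `SketchNode`, `SketchConst` and `SketchMul` are hypothetical and are not the classes used above.

```python
class SketchNode(object):
    """Hypothetical base class: anything that can compute a value."""

    def __mul__(self, other):
        # Writing `a * b` builds a multiplication gate instead of a number.
        return SketchMul(self, other)


class SketchConst(SketchNode):
    """Hypothetical leaf node holding a fixed value."""

    def __init__(self, value):
        self.value = float(value)

    def compute(self):
        return self.value


class SketchMul(SketchNode):
    """Hypothetical gate computing the product of its two inputs."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def compute(self):
        # Forward pass: evaluate both inputs, then multiply.
        return self.left.compute() * self.right.compute()


sketch_ab = SketchConst(3) * SketchConst(-1)
print('a * b = {}'.format(sketch_ab.compute()))  # a * b = -3.0
```

Because the product is itself a `SketchNode`, gates built this way can be nested into larger expressions.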


## Strategy #1: Random Local Search

Random local search: perturb the inputs at random and keep the perturbed inputs whenever they improve the output.

In [19]:
from random import random

best_in = (-2, 3)
best_out = best_in[0] * best_in[1]

print 'Initial output: {} * {} = {}'.format(best_in[0], best_in[1], best_out)

tweak_amount = 0.01
for _ in range(100):
    
    best_plus_noise = tuple(x + tweak_amount * (random() * 2 - 1) for x in best_in)
    out = best_plus_noise[0] * best_plus_noise[1]
    if out > best_out:
        best_in = best_plus_noise
        best_out = out
        
print 'Final output: {:.3} * {:.3} = {:.3}'.format(best_in[0], best_in[1], best_out)

Initial output: -2 * 3 = -6
Final output: -1.8 * 2.93 = -5.28


## Strategy #2: Numerical Gradient

Perform one step of gradient ascent, estimating the gradient numerically with finite differences.

In [20]:
a, b = -2, 3  # initial inputs
eps = 0.0001  # tweak amount
out = a * b

da = ((a + eps) * b - out) / eps  # 3.0
db = (a * (b + eps) - out) / eps  # -2.0

step_size = 0.01
a, b = a + step_size * da, b + step_size * db
print 'Initial output: {}\nFinal output: {:.3}'.format(out, a * b)

Initial output: -6
Final output: -5.87


## Strategy #3: Analytical Gradient

Perform one step of gradient ascent using analytical derivatives.

In [21]:
a = param(-2)
b = param(3)
ab = a * b

print 'Initial output: {:}'.format(ab.compute())
ab.backprop(lr=0.01)
print 'Final output: {:.3}'.format(ab.compute())
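
For f(a, b) = ab the partial derivatives are ∂f/∂a = b and ∂f/∂b = a. The sketch below performs the same kind of step in plain Python, without the gate classes. It assumes the update adds `lr` times the gradient to each parameter, as in the numerical example above. The function name `analytical_step` is hypothetical.

```python
def analytical_step(a, b, lr=0.01):
    """One gradient step on f(a, b) = a * b using exact derivatives."""
    da = b  # df/da = b
    db = a  # df/db = a
    # Assumption: parameters move along the gradient (output increases).
    return a + lr * da, b + lr * db


a_new, b_new = analytical_step(-2.0, 3.0)
print('Initial output: {}'.format(-2.0 * 3.0))          # -6.0
print('Final output: {:.4f}'.format(a_new * b_new))     # -5.8706
```

The result agrees with the numerical strategy because the finite-difference estimates of 3.0 and -2.0 are essentially exact for a product of two inputs.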

# Recursive Case: Circuits with Multiple Gates

Here's an example with multiple gates that depend on each other:

In [22]:
x, y, z = param(-2, 5, -4)
xpyz = (x + y) * z

print 'Initial output: {:}'.format(xpyz.compute())  # -12
xpyz.backprop(0.01)
print 'Final output: {:.4}'.format(xpyz.compute())  # -11.59

Initial output: -12.0
Final output: -11.59
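
The backward pass through this circuit is the chain rule applied gate by gate. Below is a minimal hand-written sketch for f = (x + y) * z, assuming the same update convention (parameter plus learning rate times gradient). The intermediate name `q` and the function name `chain_rule_step` are introduced only for illustration.

```python
def chain_rule_step(x, y, z, lr=0.01):
    """Forward pass, backward pass and one update for f = (x + y) * z."""
    # Forward pass.
    q = x + y
    f = q * z

    # Backward pass: the multiply gate first, then the add gate.
    df_dq = z              # d(q * z)/dq
    df_dz = q              # d(q * z)/dz
    df_dx = df_dq * 1.0    # dq/dx = 1, chained with df/dq
    df_dy = df_dq * 1.0    # dq/dy = 1, chained with df/dq

    # Update step.
    x, y, z = x + lr * df_dx, y + lr * df_dy, z + lr * df_dz
    return f, (x, y, z), (x + y) * z


f_before, new_params, f_after = chain_rule_step(-2.0, 5.0, -4.0)
print('Initial output: {}'.format(f_before))      # -12.0
print('Final output: {:.4f}'.format(f_after))     # -11.5924
```

The updated inputs are approximately (-2.04, 4.96, -3.97).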


Let's compare analytical and numerical gradients:

In [23]:
from tests.numerical_gradients import check_gradients
assert(check_gradients(xpyz, verbose=True))

Param value: -2.04, Analytical grad: -1.6674, Numerical grad: -1.6674, Abs. Diff: 1.2975e-11
Param value: 4.96, Analytical grad: -1.6674, Numerical grad: -1.6674, Abs. Diff: -6.1632e-11
Param value: -3.97, Analytical grad: 1.2264, Numerical grad: 1.2264, Abs. Diff: -7.2884e-12
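
The internals of `check_gradients` are not shown here, so its printed values need not match the sketch below. The general idea of a gradient check can be illustrated in plain Python by comparing the exact derivatives of (x + y) * z with central finite differences. All function names in the sketch are hypothetical.

```python
def circuit(x, y, z):
    return (x + y) * z


def analytical_grad(x, y, z):
    # Exact partial derivatives of (x + y) * z with respect to x, y, z.
    return (z, z, x + y)


def numerical_grad(f, args, eps=1e-5):
    """Central-difference estimate of each partial derivative of f."""
    grads = []
    for i in range(len(args)):
        plus = list(args)
        minus = list(args)
        plus[i] += eps
        minus[i] -= eps
        grads.append((f(*plus) - f(*minus)) / (2 * eps))
    return tuple(grads)


def gradients_match(args, tol=1e-6):
    exact = analytical_grad(*args)
    approx = numerical_grad(circuit, args)
    return all(abs(e - n) < tol for e, n in zip(exact, approx))


print(gradients_match((-2.04, 4.96, -3.97)))  # True
```

A central difference is used here instead of the forward difference from Strategy #2 because its truncation error shrinks with the square of `eps`.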


# Example: single neuron

A 2-dimensional neuron computes the function f(x, y, a, b, c) = σ(ax + by + c), where σ is the sigmoid function.

In [24]:
a, b, c = param(1.0, 2.0, -3.0)
x, y = const(-1.0, 3.0)
s = sigmoid(a * x + b * y + c)

assert(check_gradients(s, verbose=True))
print '---'
print 'Initial output: {}'.format(s.compute())  # 0.880797077978
s.backprop(lr=0.01)
print 'Final output: {}'.format(s.compute())  # 0.882


Param value: 1.0, Analytical grad: -0.044098, Numerical grad: -0.044097, Abs. Diff: -1.6793e-07
Param value: 2.0, Analytical grad: 0.13229, Numerical grad: 0.13229, Abs. Diff: -1.5113e-06
Param value: -3.0, Analytical grad: 0.044097, Numerical grad: 0.044097, Abs. Diff: -1.6792e-07
---
Initial output: 0.880797077978
Final output: 0.882004356849
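
The same step can be worked out by hand using the identity σ'(t) = σ(t)(1 - σ(t)). The sketch below is plain Python and does not use the gate classes. It assumes only the parameters a, b, c are updated, and that each one moves by `lr` times its gradient. The names `sketch_sigmoid` and `neuron_step` are hypothetical.

```python
import math


def sketch_sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def neuron_step(a, b, c, x, y, lr=0.01):
    """Forward pass, backward pass and one update for sigmoid(a*x + b*y + c)."""
    s = sketch_sigmoid(a * x + b * y + c)
    ds = s * (1.0 - s)  # derivative of the sigmoid at the current input

    # Chain rule: d(a*x + b*y + c)/da = x, /db = y, /dc = 1.
    da, db, dc = x * ds, y * ds, ds

    # Only the parameters move; the inputs x and y stay fixed.
    a, b, c = a + lr * da, b + lr * db, c + lr * dc
    return s, sketch_sigmoid(a * x + b * y + c)


before, after = neuron_step(1.0, 2.0, -3.0, x=-1.0, y=3.0)
print('Initial output: {:.6f}'.format(before))  # 0.880797
print('Final output: {:.6f}'.format(after))     # 0.882004
```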


A single neuron can also be defined as a single gate with five inputs:

In [25]:
a, b, c = param(1.0, 2.0, -3.0)
x, y = const(-1.0, 3.0)
n = neuron(a, x, b, y, c)

assert(check_gradients(n, verbose=True))
print '---'
print 'Initial output: {}'.format(n.compute())  # 0.880797077978
n.backprop(lr=0.01)
print 'Final output: {}'.format(n.compute())  # 0.882


Param value: 1.0, Analytical grad: 0.060897, Numerical grad: 0.060897, Abs. Diff: 2.319e-07
Param value: 2.0, Analytical grad: -0.18268, Numerical grad: -0.18269, Abs. Diff: 2.087e-06
Param value: -3.0, Analytical grad: -0.060896, Numerical grad: -0.060896, Abs. Diff: 2.319e-07
---
Initial output: 0.880797077978
Final output: 0.882004356849


# Binary classification

In [26]:
from random import choice
from utils.sugar import *

dataset = (((1.2, 0.7), +1.0), ((-0.3, 0.5), -1.0), ((-3.0, -1.0), +1.0),
           ((0.1, 1.0), -1.0), ((3.0, 1.1), -1.0), ((2.1, -3.0), +1.0))

a, b, c = param(1, -2, -1)  # initial solution
x, y, label = const(0, 0, 0)  # not affected by backprop
f = minimum(1, label * (a * x + b * y + c))

for iteration in range(35001):

    if iteration % 2500 == 0 or (iteration % 10 == 0 and iteration < 40):
        correct = sum(f.compute() > 0 for (x.val, y.val), label.val in dataset)
        print 'Accuracy at iteration {}: {:.1f} [{:.2f} {:.2f} {:.2f}]'.format(
            iteration, (100.0 * correct) / len(dataset), a.val, b.val, c.val)

    (x.val, y.val), label.val = choice(dataset)
    f.compute()
    f.backprop()
    
    # a.grad += -a.val
    # b.grad += -b.val

    f.update_parameters(0.1)

Accuracy at iteration 0: 66.7 [1.00 -2.00 -1.00]
Accuracy at iteration 10: 83.3 [0.10 -2.30 -0.70]
Accuracy at iteration 20: 83.3 [0.22 -2.23 -0.60]
Accuracy at iteration 30: 83.3 [0.04 -2.26 -0.40]
Accuracy at iteration 2500: 83.3 [1.05 -4.88 1.70]
Accuracy at iteration 5000: 83.3 [1.47 -7.03 3.10]
Accuracy at iteration 7500: 83.3 [1.83 -9.46 4.20]
Accuracy at iteration 10000: 83.3 [2.16 -11.78 5.60]
Accuracy at iteration 12500: 100.0 [2.67 -13.88 6.80]
Accuracy at iteration 15000: 100.0 [3.09 -15.96 8.00]
Accuracy at iteration 17500: 83.3 [3.09 -18.21 8.90]
Accuracy at iteration 20000: 100.0 [3.42 -20.36 10.20]
Accuracy at iteration 22500: 100.0 [4.17 -22.14 11.40]
Accuracy at iteration 25000: 100.0 [4.11 -24.18 12.20]
Accuracy at iteration 27500: 100.0 [4.89 -25.70 13.40]
Accuracy at iteration 30000: 100.0 [5.07 -27.34 14.20]
Accuracy at iteration 32500: 100.0 [5.04 -27.78 14.40]
Accuracy at iteration 35000: 100.0 [5.04 -27.78 14.40]
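
The objective `minimum(1, label * (a * x + b * y + c))` equals 1 minus the hinge loss max(0, 1 - margin), so maximizing it pushes every example toward a margin of at least 1 and then stops. The sketch below restates the training loop in plain Python under that reading. The names `sketch_accuracy` and `sketch_train`, the fixed random seed, and the `reg` parameter (modelled on the commented-out `a.grad += -a.val` lines) are assumptions, not the library's API.

```python
import random

DATASET = (((1.2, 0.7), +1.0), ((-0.3, 0.5), -1.0), ((-3.0, -1.0), +1.0),
           ((0.1, 1.0), -1.0), ((3.0, 1.1), -1.0), ((2.1, -3.0), +1.0))


def sketch_accuracy(params):
    """Fraction of points whose score has the same sign as the label."""
    a, b, c = params
    correct = sum(1 for (x, y), label in DATASET
                  if label * (a * x + b * y + c) > 0)
    return correct / float(len(DATASET))


def sketch_train(steps=1000, lr=0.1, reg=0.0, seed=0):
    """Stochastic gradient ascent on min(1, label * (a*x + b*y + c))."""
    rng = random.Random(seed)
    a, b, c = 1.0, -2.0, -1.0  # same initial solution as above
    for _ in range(steps):
        (x, y), label = rng.choice(DATASET)
        if label * (a * x + b * y + c) < 1:
            # Margin below 1: the gradient of min(1, m) is the gradient of m.
            ga, gb, gc = label * x, label * y, label
        else:
            # Margin already satisfied: the objective is flat.
            ga = gb = gc = 0.0
        # Optional pull of the weights toward zero (assumed regularizer).
        ga -= reg * a
        gb -= reg * b
        a, b, c = a + lr * ga, b + lr * gb, c + lr * gc
    return a, b, c


print('Initial accuracy: {:.1f}'.format(100 * sketch_accuracy((1.0, -2.0, -1.0))))  # 66.7
print('Accuracy after training: {:.1f}'.format(100 * sketch_accuracy(sketch_train())))
```

With `reg=0.0` the weights can keep growing until every margin reaches 1, which is consistent with the parameters above settling once accuracy hits 100%.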
