# `Python` Hacker's guide to Neural Networks

Hi there, I am not Andrej Karpathy. My name is James and I'm a computational linguist and PhD candidate in Anthropology at the University of Missouri. I work with neural nets in my everyday life, and every once in a while someone gets interested enough to ask me about my work. Neural nets aren't necessarily the easiest concept to explain, but then I found [this blog post](http://karpathy.github.io/neuralnets/), which really simplifies the language of neural nets and provides a good primer for anyone interested. Thanks, Andrej! This post is rad!

This will also go in my blog-in-progress, [Gradient Dissents](https://jcbain.github.io). I know, there isn't a lot up there yet, as anthropologists don't usually think of blogging as a way to get their work out there. We usually just wear clothes with skulls on them. Don't worry, I will start posting some of my PhD work there soon.

### Motivation

The original post is written in `javascript`, but many people, often academic types, have very little exposure to `javascript`. Take me for example: I am a computational linguist by way of Evolutionary Anthropology (don't ask...or do) and my interest in programming flourished through computational statistics. `R` and `Python` are the first languages that come to mind. So this is a post for those people and for anyone else who is interested in `Python`. Again, this is in no way an attack against `javascript` or the original post. Remember, the post was so great that I decided to take the time to convert it to `Python`.


**Disclaimer**: *These chapters will be lightly edited.*




# Chapter 1: Real-valued Circuits



In [1]:
import math
import random

In [2]:
def forward_multiply_gate(x,y):
    return x * y 

forward_multiply_gate(-2,3)  # returns -6

-6

In [3]:
# circuit with a single gate

def forward_multiply_gate(x,y): 
    return x *y

x, y = -2,3 # input values

# try changing x,y randomly small amounts and keep track of what works best
tweak_amount = 0.01
best_out = -math.inf
best_x, best_y = x, y

for i in range(100):
    x_try = x + tweak_amount * (random.random() * 2 - 1) # tweak x a little bit
    y_try = y + tweak_amount * (random.random() * 2 - 1) # tweak y a little bit
    out = forward_multiply_gate(x_try,y_try)
    if out > best_out:
        best_out = out
        best_x, best_y = x_try, y_try

In [4]:
print("best x: {}".format(best_x))
print("best y: {}".format(best_y))
print("best out: {}".format(best_out))

best x: -1.9905475335740135
best y: 2.994992467407144
best out: -5.961674869070039


In [5]:
x, y = -2, 3
out = forward_multiply_gate(x,y)
h = 0.0001

# compute derivative with respect to x
xph = x + h
out2 = forward_multiply_gate(xph, y) # -5.9997
x_derivative = (out2 - out) / h # 3.0

# compute derivative with respect to y
yph = y + h
out3 = forward_multiply_gate(x, yph) # -6.0002
y_derivative = (out3 - out)/h # -2.0

In [6]:
y_derivative

-2.0000000000042206
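
The two derivatives above tell us which way to nudge each input to make the output go up. Here is a minimal sketch of that idea, in the spirit of the original post. The helper name `numerical_gradient` and the `step_size` of 0.01 are my own picks for illustration, not something from the cells above.

```python
def forward_multiply_gate(x, y):
    return x * y

# hypothetical helper: finite-difference estimate of the gradient of f at (x, y)
def numerical_gradient(f, x, y, h=0.0001):
    out = f(x, y)
    dx = (f(x + h, y) - out) / h  # how much the output changes per unit change in x
    dy = (f(x, y + h) - out) / h  # how much the output changes per unit change in y
    return dx, dy

x, y = -2, 3
step_size = 0.01  # assumed value, small enough that the estimate stays valid

dx, dy = numerical_gradient(forward_multiply_gate, x, y)

# step each input a little in the direction of its derivative
x_new = x + step_size * dx
y_new = y + step_size * dy
out_new = forward_multiply_gate(x_new, y_new)

print("old out: {}".format(forward_multiply_gate(x, y)))
print("new out: {}".format(out_new))
```

The new output should land a bit above -6, and unlike the random search in cell 3 it gets there in a single step instead of 100 tries.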

In [7]:
class Unit:
    def __init__(self, value, grad):
        self.value = value
        self.grad = grad
        


In [8]:
class MultiplyGate:
    def forward(self,u0,u1):
        self.u0 = u0
        self.u1 = u1
        self.utop = Unit(u0.value*u1.value, 0.0)
        return self.utop
    
    def backward(self):
        # chain rule: local gradient (the other input's value) times the gradient from above
        self.u0.grad += self.u1.value * self.utop.grad
        self.u1.grad += self.u0.value * self.utop.grad

In [9]:
class AddGate:
    def forward(self,u0,u1):
        self.u0 = u0
        self.u1 = u1
        self.utop = Unit(u0.value + u1.value, 0.0)
        return self.utop
    
    def backward(self):
        self.u0.grad += 1 * self.utop.grad
        self.u1.grad += 1 * self.utop.grad

In [10]:
# helper function
def sigmoid_fun(x):
    return 1/(1+math.exp(-x))

class SigmoidGate:

    
    def forward(self,u0):
        self.u0 = u0
        self.utop = Unit(sigmoid_fun(self.u0.value), 0.0)
        return self.utop
    
    def backward(self):
        s = sigmoid_fun(self.u0.value)
        self.u0.grad += (s * (1 - s)) * self.utop.grad
    

In [11]:
a = Unit(1.0,0.0)
b = Unit(2.0,0.0)
c = Unit(-3.0, 0.0)
x = Unit(-1.0,0.0)
y = Unit(3.0,0.0)

In [12]:
mulg0 = MultiplyGate()
mulg1 = MultiplyGate()
addg0 = AddGate()
addg1 = AddGate()
sg0 = SigmoidGate()

In [13]:


ax = mulg0.forward(a,x)
by = mulg1.forward(b,y)
axpby = addg0.forward(ax,by)
axpbypc = addg1.forward(axpby,c)
s = sg0.forward(axpbypc)


In [14]:
s.value

0.8807970779778823

In [15]:
s.grad = 1.0 # seed the backward pass: the gradient of the output with respect to itself is 1

In [16]:
sg0.backward()
addg1.backward()
addg0.backward()
mulg1.backward()
mulg0.backward()

In [17]:
a.grad

-0.10499358540350662

In [18]:
step_size = 0.01

a.value += step_size * a.grad
b.value += step_size * b.grad
c.value += step_size * c.grad
x.value += step_size * x.grad
y.value += step_size * y.grad

In [19]:
ax = mulg0.forward(a,x)
by = mulg1.forward(b,y)
axpby = addg0.forward(ax,by)
axpbypc = addg1.forward(axpby,c)
s = sg0.forward(axpbypc)

In [20]:
s.value

0.8825501816218984

In [21]:
def forward_fast_circuit(a,b,c,x,y):
    return 1/(1 + math.exp(- (a*x + b*y + c)))

In [22]:
a,b,c,x,y = 1,2,-3,-1,3
h = 0.0001

In [23]:
a_grad = (forward_fast_circuit(a+h,b,c,x,y) - forward_fast_circuit(a,b,c,x,y))/h
b_grad = (forward_fast_circuit(a,b+h,c,x,y) - forward_fast_circuit(a,b,c,x,y))/h
c_grad = (forward_fast_circuit(a,b,c+h,x,y) - forward_fast_circuit(a,b,c,x,y))/h
x_grad = (forward_fast_circuit(a,b,c,x+h,y) - forward_fast_circuit(a,b,c,x,y))/h
y_grad = (forward_fast_circuit(a,b,c,x,y+h) - forward_fast_circuit(a,b,c,x,y))/h

In [24]:
print(a_grad,b_grad,c_grad,x_grad,y_grad)

-0.10499758359205913 0.3149447748351797 0.10498958734506125 0.10498958734506125 0.2099711788272618
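
These numerical gradients are close to what backprop gave us (compare `a_grad` here with `a.grad` from cell 17), just not identical, because `h` is small but not infinitely small. As a sanity check, here is a sketch that puts the finite-difference estimates next to the analytic ones. The function `analytic_gradients` is a hypothetical helper of mine that applies the chain rule to the same expression by hand.

```python
import math

def forward_fast_circuit(a, b, c, x, y):
    return 1 / (1 + math.exp(-(a * x + b * y + c)))

# hypothetical helper: chain rule through sigmoid(a*x + b*y + c), done by hand
def analytic_gradients(a, b, c, x, y):
    s = forward_fast_circuit(a, b, c, x, y)
    ds = s * (1 - s)  # derivative of the sigmoid with respect to its input
    # order: gradients for a, b, c, x, y
    return (x * ds, y * ds, ds, a * ds, b * ds)

a, b, c, x, y = 1, 2, -3, -1, 3
h = 0.0001
base = forward_fast_circuit(a, b, c, x, y)

numerical = (
    (forward_fast_circuit(a + h, b, c, x, y) - base) / h,
    (forward_fast_circuit(a, b + h, c, x, y) - base) / h,
    (forward_fast_circuit(a, b, c + h, x, y) - base) / h,
    (forward_fast_circuit(a, b, c, x + h, y) - base) / h,
    (forward_fast_circuit(a, b, c, x, y + h) - base) / h,
)
analytic = analytic_gradients(a, b, c, x, y)

for name, n, g in zip("abcxy", numerical, analytic):
    print("{}: numerical {:.8f}  analytic {:.8f}".format(name, n, g))

max_diff = max(abs(n - g) for n, g in zip(numerical, analytic))
print("largest difference: {}".format(max_diff))
```

With `h = 0.0001` the two columns should agree to roughly four decimal places. A bigger gap than that usually means a bug in one of the `backward()` methods.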
