# Gradients!

Most of the content is inspired by [this](https://karpathy.github.io/neuralnets/) blog post by [Andrej Karpathy](https://karpathy.ai/)!

## Base case: Single Gate in the circuit

The circuit takes two real-valued inputs, `x` and `y`, and computes `x*y` with a single gate, `*`.

$$f(x,y) = xy$$

In [None]:
def forwardMultiplyGate (x, y):
  return x * y

forwardMultiplyGate(2, -3)

-6

## Goal?

How can we tweak the input slightly to increase the output?

## Approach #1: Random!

In [None]:
import random
x, y = 2, -3

tweak_amount = 0.01
best_out = float('-inf')
best_x, best_y = x, y

for _ in range(100):
  x_try = x + tweak_amount * random.uniform(-1, 1)
  y_try = y + tweak_amount * random.uniform(-1, 1)

  out = forwardMultiplyGate(x_try, y_try)

  if out > best_out:
    best_out = out
    best_x, best_y = x_try, y_try

print(f"New x = {best_x}; New y = {best_y}")
print(f"Out = {best_out}")
print(f"Has the output increased? {best_out > forwardMultiplyGate(x, y)}")

New x = 1.9912347470185285; New y = -2.990072615170193
Out = -5.953936487435449
Has the output increased? True


## Approach #2: Numerical Gradient

Idea: Instead of pulling on the circuit’s output, we’ll iterate over every input one by one, increase it very slightly and look at what happens to the output value.

> The derivative can be thought of as a force on each input as we pull on the output to become higher.

$$\frac{\partial f(x,y)}{\partial x} = \frac{f(x+h,y) - f(x,y)}{h}$$

$$\frac{\partial f(x,y)}{\partial y} = \frac{f(x,y+h) - f(x,y)}{h}$$

`h` is a small *tweak amount*, similar to the one we used in **Approach #1**. Strictly speaking, the derivative is the limit of the right-hand side as `h` goes to zero; numerically, we just pick a small `h`. In plain words, the left-hand side is "the rate at which `f(x,y)` changes when we change `x` by some *small* amount", and it is read as the *derivative of `f(x,y)` with respect to `x`*!

In [None]:
x, y = 2, -3
out = forwardMultiplyGate(x, y)
h = 0.01

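# df/dx: nudge x by h and measure how much the output changes
xph = x + h
out_w_x = forwardMultiplyGate(xph, y)
x_deriv = (out_w_x - out) / h
print(xph, out_w_x, x_deriv)

# df/dy: nudge y by h and measure how much the output changes
yph = y + h
out_w_y = forwardMultiplyGate(x, yph)
y_deriv = (out_w_y - out) / h
print(yph, out_w_y, y_deriv)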

2.01 -6.029999999999999 -2.999999999999936
-2.99 -5.98 1.9999999999999574


## Approach #3: Analytical Gradient



Let us start from the same expression we used for the numerical gradient and expand it for our gate,

$$\frac{\partial f(x,y)}{\partial x} = \frac{f(x+h,y) - f(x,y)}{h}$$

$$\frac{\partial f(x,y)}{\partial x} = \frac{f(x+h,y)-f(x,y)}{h} = \frac{(x+h)y-xy}{h} = \frac{xy+hy-xy}{h} = \frac{hy}{h} = y$$

Hmm, interesting! The derivative of `f(x,y)` with respect to `x` is just `y`! That matches what we found numerically in the previous approach, where `x_deriv` came out as roughly `-3`. By symmetry, the derivative with respect to `y` is `x`.
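To spell out the symmetric case, the same expansion applied to `y` (using the definition of $\partial f / \partial y$ from Approach #2) would go as follows:

$$\frac{\partial f(x,y)}{\partial y} = \frac{f(x,y+h)-f(x,y)}{h} = \frac{x(y+h)-xy}{h} = \frac{xy+xh-xy}{h} = \frac{xh}{h} = x$$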

In [None]:
x, y = 2, -3
out = forwardMultiplyGate(x, y)
x_grad = y
y_grad = x

alpha = 0.01
x += alpha * x_grad
y += alpha * y_grad
new_out = forwardMultiplyGate(x, y)
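To see the effect of this update, here is a minimal, self-contained sketch that takes one step along the analytical gradient and compares the output before and after. The helper name `analytic_step` is hypothetical and introduced only for illustration; `forwardMultiplyGate` and `alpha = 0.01` are the same as in the cell above.

```python
def forwardMultiplyGate(x, y):
    return x * y

def analytic_step(x, y, alpha=0.01):
    # Hypothetical helper: for f(x, y) = x * y we derived df/dx = y and df/dy = x.
    x_grad, y_grad = y, x
    # Move each input a small step in the direction of its gradient.
    return x + alpha * x_grad, y + alpha * y_grad

x, y = 2, -3
out = forwardMultiplyGate(x, y)

new_x, new_y = analytic_step(x, y)
new_out = forwardMultiplyGate(new_x, new_y)

print(f"Old out = {out}; New out = {new_out}")
print(f"Has the output increased? {new_out > out}")
```

With these inputs the step moves `x` to roughly `1.97` and `y` to roughly `-2.98`, so the output rises from `-6` to about `-5.87`.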

To recap, we tried three approaches to nudging the inputs so that the output increases:
1. **Random Search** is a naive approach where we make small random changes to the inputs and keep track of which random combination yields the highest output.
2. The **Numerical Gradient** method is quite simple but expensive to compute, since it needs an extra forward pass for every input. We compute it by probing the circuit’s output value as we tweak the inputs one at a time.
3. Lastly, with the **Analytical Gradient** method we can be smarter and derive an expression for the gradient directly. It gives the same answer as the *numerical* method (exactly, rather than approximately), and it is the fastest because it needs no probing of the circuit.
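As a quick sanity check on the last point, the sketch below compares the numerical and analytical gradients of the multiply gate at the same input. The helper names `numerical_gradient` and `analytical_gradient` and the step size `h=1e-4` are assumptions made for this example, not part of the original notebook.

```python
def forwardMultiplyGate(x, y):
    return x * y

def numerical_gradient(x, y, h=1e-4):
    # Probe the gate: nudge one input at a time and measure the change in output.
    out = forwardMultiplyGate(x, y)
    dx = (forwardMultiplyGate(x + h, y) - out) / h
    dy = (forwardMultiplyGate(x, y + h) - out) / h
    return dx, dy

def analytical_gradient(x, y):
    # Derived by hand for f(x, y) = x * y: df/dx = y, df/dy = x.
    return y, x

x, y = 2, -3
num_grad = numerical_gradient(x, y)
ana_grad = analytical_gradient(x, y)

print("numerical: ", num_grad)
print("analytical:", ana_grad)
```

The two results should agree up to floating-point noise, while the analytical version needs no extra calls to the gate.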

## Circuits with Multiple Gates!

Let us say we have an expression such as

$$f(x,y,z) = (x+y)*z$$

In [None]:
def forwardMultiplyGate(a, b):
  return a * b

def forwardAddGate(a, b):
  return a + b

def forwardCircuit(x, y, z):
  q = forwardAddGate(x, y)
  f = forwardMultiplyGate(q, z)
  return f

x, y, z = -2, 5, -4
forwardCircuit(x, y, z)

What do we want? We are interested in finding the derivatives with respect to the three inputs `x`, `y`, and `z`. But how do we compute them now that there are multiple gates involved?
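One tool we already have still works here: the numerical gradient from Approach #2 treats the circuit as a black box, so it does not care how many gates are inside. The sketch below is only a baseline under that assumption, not the analytical answer to the question above. The helper `numerical_gradient` and the step size `h=1e-4` are hypothetical, and the gates are inlined to keep the example self-contained.

```python
def forwardCircuit(x, y, z):
    q = x + y      # add gate
    return q * z   # multiply gate

def numerical_gradient(f, inputs, h=1e-4):
    # Hypothetical helper: nudge each input in turn and measure the change in f.
    base = f(*inputs)
    grads = []
    for i in range(len(inputs)):
        nudged = list(inputs)
        nudged[i] += h
        grads.append((f(*nudged) - base) / h)
    return grads

x, y, z = -2, 5, -4
grads = numerical_gradient(forwardCircuit, [x, y, z])
print(grads)
```

For these inputs the three values should come out close to `-4`, `-4`, and `3`, at the cost of one extra forward pass per input.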