# Introduction

This is a Python notebook that explains the concept of neural networks and how you can code one yourself, using Andrej Karpathy's amazing <a href="https://www.youtube.com/watch?v=VMj-3S1tku0">video</a> as the reference for all the code.

# What is a Neural Network?
To put it in simple terms, a neural network is a mathematical expression that takes data as input, together with a set of weights and biases.  
This data is then sent through multiple 'layers', with each layer containing multiple 'neurons'. The network learns by adjusting the weights and biases of each neuron in order to <b>minimize the loss</b>, a number that measures how far the network's predictions are from the expected outputs. In general, the lower the loss, the better the model's predictions.
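
To make "adjusting a weight to minimize the loss" concrete, here is a minimal sketch with a single weight. The squared-error loss, the learning rate, and the numbers are all assumptions picked for illustration, not something taken from the video.

```python
# Hypothetical toy setup: one weight w, one training example (x, y_true),
# and a squared-error loss. None of these choices come from the notebook.
x, y_true = 2.0, 6.0   # the weight that fits this example exactly is 3.0
w = 0.0                # initial guess for the weight
learning_rate = 0.05   # assumed step size

def loss(w):
    """Squared error between the prediction w * x and the target."""
    return (w * x - y_true) ** 2

losses = []
for step in range(20):
    losses.append(loss(w))
    grad = 2 * (w * x - y_true) * x   # d(loss)/dw, derived by hand
    w -= learning_rate * grad         # nudge w against the gradient

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.6f}, w: {w:.4f}")
```

Each step nudges `w` in the direction that lowers the loss, so the recorded losses shrink and `w` moves towards 3.0.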

## What is a Neuron?
It is the basic building block of a neural network. A neuron combines its inputs using the following components:

- Weights: Each weight represents the importance of the corresponding input (how important a given feature is).
- Bias: A constant term added to the weighted sum, which shifts the neuron's output.
- Activation Function: This function squashes the output value into a specific range (for example, `tanh` maps any number to a value between -1 and 1). It determines the neuron's output based on the weighted sum of its inputs plus the bias.
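
As a rough sketch, the three pieces above fit together like this for a single neuron. The function name `neuron_output` and the numbers are hypothetical, and `tanh` is assumed as the activation function.

```python
import math

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, squashed by tanh."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(weighted_sum)

# Hypothetical numbers, chosen only for illustration.
inputs = [2.0, 0.0]
weights = [-3.0, 1.0]
bias = 6.5

out = neuron_output(inputs, weights, bias)
print(out)  # tanh(0.5), which lies strictly between -1 and 1
```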

## Building the Value object
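As we discussed in the previous section, a Neuron takes in multiple values as input. Each value has some mathematical operation performed on it, such as addition or multiplication. The `Value` object wraps a single number and remembers which operation and which other values produced it, so that gradients can be computed later.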

Below is a simple diagram that helps in visualizing the concept of a Neuron and a Value.

<img src="./images/simple_neuron.png" />

## Neural Network Structure
So far we have seen what a Neuron is and what a Value object is.
Now let's talk a bit about the overall structure of a neural network. A simple neural network consists of 3 types of layers:  
1. Input Layer: The number of neurons in this layer is based on the number of features/inputs in the initial data.
2. Hidden Layer(s): These are one or more layers of neurons that process the information from the input layer.
3. Output Layer: This layer produces the final results. Usually the number of (output) neurons in this layer depends on the problem being solved. For example, if we are training the neural network to classify 3 different types of dogs, there are 3 output neurons, and each one holds the probability of the input belonging to one of the three classes/types.
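
The notebook does not say how the raw outputs become probabilities. One common choice, assumed here purely for illustration, is the softmax function. The scores below are made up.

```python
import math

def softmax(scores):
    """Convert raw output scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of three output neurons, one per dog type.
raw_scores = [2.0, 1.0, 0.1]
probabilities = softmax(raw_scores)
predicted_class = probabilities.index(max(probabilities))
print(probabilities, predicted_class)
```

The class with the largest raw score also gets the largest probability, so the prediction is simply the index of the biggest entry.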

We will now move on to talking about the different processes within a neural network starting with the forward pass.

### Forward Pass
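The forward pass (also called forward propagation) processes information in the following order:
1. Each neuron receives inputs.
2. The neuron computes the weighted sum of the inputs.
3. The neuron adds its bias to the weighted sum and passes the result through the activation function to produce its output.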
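
Before building this with `Value` objects, here is a plain-Python sketch of a forward pass through two layers. The helper `layer_forward`, the layer sizes, and every weight are hypothetical, and `tanh` is assumed as the activation.

```python
import math

def layer_forward(inputs, weights, biases):
    """Compute one layer: each row of `weights` belongs to one neuron."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(math.tanh(weighted_sum))
    return outputs

# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
x = [1.0, -2.0]
hidden_w = [[0.5, -0.5], [0.1, 0.2], [-0.3, 0.8]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[1.0, -1.0, 0.5]]
output_b = [0.2]

hidden = layer_forward(x, hidden_w, hidden_b)        # inputs -> hidden layer
output = layer_forward(hidden, output_w, output_b)   # hidden layer -> output layer
print(hidden, output)
```

The output of one layer becomes the input of the next, which is all "forward" means here.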

In [None]:
import math
# creating the Value object and enabling functions such as addition and subtraction
class Value:
    def __init__(self, data, _children=(), _op="", label="") -> None:
        """
        data: number stored in this Value (for example an input, a weight, or a neuron's activation)
        _children: the Values this Value was computed from; stored as a set
        _op: operation that produced this Value (for example "+" or "*")
        label: optional name for this Value, useful when inspecting the graph
        """
        self.data = data
        self._prev = set(_children)
        self._op = _op
        self.label = label
        # gradient of the final output (e.g. the loss) with respect to this node
        self.grad = 0.0
        # applies the chain rule for the operation that produced this node; leaf nodes have nothing to do
        self._backward = lambda: None

    def __repr__(self) -> str:
        """
        print the value stored within the Value object
        """
        return f"Value(data={self.data})"

    def __add__(self, other) -> "Value":
        # if other is a plain number (int/float), wrap it in a Value so both operands expose .data and .grad
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), "+")

        def _backward():
            # accumulate (+=) instead of assigning: a node used in several places receives gradient from multiple branches
            self.grad += 1.0 * out.grad
            other.grad += 1.0 * out.grad

        out._backward = _backward
        return out

    def __radd__(self, other):
        return self + other

    def __mul__(self, other) -> "Value":
        """
        other: the Value or plain number to multiply by (often a weight)
        """
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), "*")

        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    # fallback for when the left operand is not a Value (e.g. 2 * value): Python tries the number's own __mul__ first and,
    # when that fails, calls value.__rmul__(2). to learn more about it try this link : https://stackoverflow.com/questions/5181320/under-what-circumstances-are-rmul-called

    def __rmul__(self, other):
        return self * other

    def __truediv__(self, other):
        # self * 1/other = self/other
        return self * other**-1

    def __neg__(self):
        return self * -1

    def __sub__(self, other):
        # subtraction implementation
        return self + (-other)

    def __pow__(self, other):
        # power function.
        assert isinstance(
            other, (int, float)
        ), "only supporting int/float powers for now"
        out = Value(self.data**other, (self,), f"**{other}")

        def _backward():
            # power rule: d/dx x**n = n * x**(n - 1); multiplying by out.grad is essential to the chain rule
            new_val = other - 1
            self.grad += other * (self.data**new_val) * out.grad

        out._backward = _backward
        return out

    def tanh(self):
        x = self.data
        # math.tanh equals (e^x - e^-x) / (e^x + e^-x) but does not overflow for large |x|
        t = math.tanh(x)
        out = Value(t, (self,), "tanh")

        def _backward():
            # local derivative of tanh is 1 - tanh(x)**2; out.grad carries the gradient flowing back from the loss L
            self.grad += (1 - t**2) * out.grad

        out._backward = _backward
        return out

    def exp(self):
        x = self.data
        out = Value(math.exp(x), (self,), "exp")

        def _backward():
            # d/dx e**x = e**x, which is already stored in out.data
            self.grad += out.data * out.grad

        out._backward = _backward

        return out

    def backward(self):
        # topological order: every node appears after all the nodes it was computed from
        topo = []
        visited = set()

        def build_topo(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build_topo(child)
                topo.append(v)

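        build_topo(self)
        # the derivative of the output with respect to itself is 1
        self.grad = 1.0
        # walk from the output back to the inputs, applying each node's chain-rule step
        for node in reversed(topo):
            node._backward()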
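
The local derivative rules used in the `_backward` closures for `*`, `+` and `tanh` can be sanity-checked without the class at all. Below is a minimal sketch that applies the same rules by hand to a one-input neuron and compares them against a finite-difference estimate. The function `forward`, the step size `h`, and the numbers are assumptions made for this check.

```python
import math

def forward(w, x, b):
    """A one-input neuron: tanh(w * x + b)."""
    return math.tanh(w * x + b)

# Hypothetical values, chosen only for illustration.
w, x, b = -3.0, 2.0, 6.5

# Chain rule by hand, mirroring the local rules for tanh, + and *.
t = forward(w, x, b)
d_out_d_sum = 1 - t**2            # tanh rule: (1 - t**2)
dw_analytic = d_out_d_sum * x     # "*" rule: gradient for w is x times the upstream gradient
db_analytic = d_out_d_sum * 1.0   # "+" rule: the upstream gradient passes through unchanged

# Central finite differences as an independent estimate.
h = 1e-6
dw_numeric = (forward(w + h, x, b) - forward(w - h, x, b)) / (2 * h)
db_numeric = (forward(w, x, b + h) - forward(w, x, b - h)) / (2 * h)

print(dw_analytic, dw_numeric)
print(db_analytic, db_numeric)
```

If the two columns agree to several decimal places, the hand-written rules are consistent with what the function actually does.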