# Neural Networks – Learning Notes

## Biological Neurons

**Human Brain Neurons:**  
- ~10¹¹ neurons, each connected to ~10⁴ others.  
- Neuron switching time: ~0.001 sec; computer switching time: ~10⁻¹⁰ sec.  
- Despite much slower switching, massive parallelism lets the brain recognize and decide quickly.  

**Neuron Structure & Terminology:**  

| Component | Function |
|-----------|----------|
| **Dendrites** | Collect signals from neighboring neurons and carry them to the cell body (soma). |
| **Cell Body (Soma)** | Integrates the inputs from the dendrites; contains organelles (mitochondria, Golgi apparatus, etc.). |
| **Axon** | Carries impulses away from the soma toward the dendrites of the next neuron via synapses. Joins the soma at the **axon hillock**. |
| **Synapse** | Junction (typically chemical) between an axon terminal and the next neuron's dendrites. |

---

## Neural Network Concept

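- A biological neuron: **dendrites** collect input → **soma** integrates it → **axon** sends the output → **synapse** passes the signal to the next neuron.  
- **Mathematical Model:**  
  - Maps an input vector to an output (a scalar for one neuron, a vector for a layer or network).  
  - Synaptic strengths are modeled as **weights** that scale each input.  
- Multiple interconnected neurons form an **Artificial Neural Network (ANN)**.  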

---

## Artificial Neuron Models

### Linear Neuron
- Simplest model: the output is a weighted sum of the inputs plus a bias, with no nonlinearity, so it is computationally limited.  
- Same form as linear regression.  

**Mathematical Formulation:**  
- Scalar form:  
\[
y = b + \sum_{i=1}^{n} x_i w_i
\]  
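- Vector form (with \(\mathbf{x} = [x_1, \ldots, x_n]^\top\) and \(\mathbf{w} = [w_1, \ldots, w_n]^\top\)):  
\[
y = b + \mathbf{w} \cdot \mathbf{x} = b + \mathbf{w}^\top \mathbf{x}
\]  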

Where:  
- \(y\) = output (the neuron's prediction)  
- \(b\) = bias term  
- \(x_i\) = \(i\)-th input  
- \(w_i\) = \(i\)-th weight  
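A minimal Python sketch of the linear neuron, written directly from the scalar form \(y = b + \sum_i x_i w_i\). The function name `linear_neuron` and the example numbers are illustrative assumptions, not part of any library.

```python
def linear_neuron(x, w, b):
    """Linear neuron: y = b + sum_i x_i * w_i (no activation function)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return b + sum(x_i * w_i for x_i, w_i in zip(x, w))

x = [1.0, 2.0, 3.0]   # inputs (illustrative)
w = [0.5, -1.0, 2.0]  # weights (illustrative)
b = 0.5               # bias (illustrative)
print(linear_neuron(x, w, b))  # 0.5 + (0.5 - 2.0 + 6.0) = 5.0
```

Plain lists keep the sketch dependency-free; with NumPy arrays the same computation can be written as `b + w @ x`.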

---

### Perceptron (Rosenblatt, 1958)
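- Extends the linear neuron with a step (threshold) activation; also called a **Linear Threshold Unit (LTU)**.  
- Accepts real-valued (non-Boolean) inputs, each scaled by a weight.  
- Can correctly classify only **linearly separable data** (classes that a hyperplane can separate).  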

**Perceptron Components:**  
1. **Inputs:** Vector of features \(x = [x_1, x_2, ..., x_n]^T\)  
2. **Weights:** \(w = [w_1, w_2, ..., w_n]^T\)  
3. **Weighted Sum:**  
\[
\hat{y} = \sum_{i=1}^{n} w_i x_i
\]  
4. **Bias (Intercept):**  
\[
\hat{y} = b + \sum_{i=1}^{n} w_i x_i
\]  
5. **Step Activation Function:**  
\[
\text{output} =
\begin{cases}
1 & \text{if } \hat{y} \ge \text{threshold} \\
0 & \text{otherwise}
\end{cases}
\]  
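- The threshold can be absorbed into the bias: comparing \(\sum_i w_i x_i\) against a threshold is equivalent to comparing \(b + \sum_i w_i x_i\) against 0 with \(b = -\text{threshold}\).  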
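To check the threshold/bias equivalence concretely, here is a hedged sketch of a perceptron's forward pass. The weights \(w = [1, 1]\) and threshold 1.5 are hand-picked assumptions that happen to make the unit compute logical AND (a linearly separable function); the function names are hypothetical.

```python
def step(v, threshold=0.0):
    """Step activation: 1 if v >= threshold, else 0."""
    return 1 if v >= threshold else 0

def perceptron_threshold(x, w, threshold):
    """Fire when the weighted sum (no bias) reaches the threshold."""
    return step(sum(w_i * x_i for w_i, x_i in zip(w, x)), threshold)

def perceptron_bias(x, w, b):
    """Same unit with the threshold folded into the bias (b = -threshold)."""
    return step(b + sum(w_i * x_i for w_i, x_i in zip(w, x)), 0.0)

# Hand-picked (illustrative) parameters that make the unit compute logical AND.
w = [1.0, 1.0]
threshold = 1.5
b = -threshold

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_threshold(x, w, threshold), perceptron_bias(x, w, b))
# (0, 0) 0 0
# (0, 1) 0 0
# (1, 0) 0 0
# (1, 1) 1 1
```

Both versions produce identical outputs for every input, which is the point of treating the bias as a negated threshold.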

**Intuition:**  
- Large positive bias → neuron fires easily (low effective threshold)  
- Large negative bias → neuron rarely fires (high effective threshold)  
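The bias intuition can be seen by counting how often a fixed-weight unit fires over a grid of inputs as the bias changes. This sketch assumes the firing rule \(b + \sum_i w_i x_i \ge 0\), weights \(w = [1, 1]\), and a grid over \([-1, 1]^2\); all of these are illustrative choices.

```python
def fires(x, w, b):
    """Step-activated unit: 1 if b + w.x >= 0, else 0."""
    return 1 if b + sum(w_i * x_i for w_i, x_i in zip(w, x)) >= 0 else 0

# Fixed grid of 2-D inputs covering [-1, 1] x [-1, 1] in steps of 0.1 (illustrative).
grid = [(i / 10, j / 10) for i in range(-10, 11) for j in range(-10, 11)]
w = [1.0, 1.0]  # fixed weights (illustrative)

rates = {}
for b in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    rates[b] = sum(fires(x, w, b) for x in grid) / len(grid)
    print(f"bias={b:+.1f}  fraction of inputs that fire: {rates[b]:.2f}")
# The first line reports 0.00 (never fires) and the last reports 1.00 (always fires).
```

Since \(w_1 x_1 + w_2 x_2\) lies in \([-2, 2]\) on this grid, a bias of \(-3\) keeps the unit silent everywhere, \(+3\) makes it fire everywhere, and the fraction grows monotonically in between.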

# Computational Graph and Backpropagation – Learning Notes

## Computational Graphs
- A **computational graph** represents a mathematical expression as a directed graph.  
- Nodes represent **operations or variables**.  
- Edges represent the **data flow**: the values passed from one node to the next.  

**Illustration:**  
Equations:  
\[
s = 2x
\]  
\[
y = s + a
\]  

- The graph has variable nodes for `x`, `a`, `s`, and `y`, plus an operation node for each step (multiply by 2, add).  
- Edges show how data flows from the inputs through the operations to the output.  
- Forward propagation: evaluate the nodes in order, from the inputs to the output, to compute the prediction.  
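A minimal sketch of the forward and backward passes over the graph \(s = 2x,\ y = s + a\), using hand-written local derivatives (\(\partial s/\partial x = 2\), \(\partial y/\partial s = \partial y/\partial a = 1\)) combined by the chain rule, plus a central finite difference as a sanity check. The function names and the inputs \(x = 3, a = 4\) are illustrative.

```python
def forward(x, a):
    """Forward pass through the graph s = 2x, y = s + a."""
    s = 2 * x   # multiplication node
    y = s + a   # addition node
    return s, y

def backward():
    """Backward pass: multiply local derivatives along each path (chain rule)."""
    dy_dy = 1.0             # seed: derivative of the output w.r.t. itself
    dy_ds = dy_dy * 1.0     # y = s + a  ->  dy/ds = 1
    dy_da = dy_dy * 1.0     # y = s + a  ->  dy/da = 1
    dy_dx = dy_ds * 2.0     # s = 2x     ->  ds/dx = 2
    return dy_dx, dy_da

x, a = 3.0, 4.0
s, y = forward(x, a)
dy_dx, dy_da = backward()
print(s, y)          # 6.0 10.0
print(dy_dx, dy_da)  # 2.0 1.0

# Sanity check: central finite difference of y with respect to x.
eps = 1e-6
fd = (forward(x + eps, a)[1] - forward(x - eps, a)[1]) / (2 * eps)
print(round(fd, 4))  # 2.0
```

Here `backward` needs no inputs because every local derivative in this graph is a constant. In general local derivatives depend on forward values (for example, a product node), which is why backward passes usually reuse values cached during the forward pass.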

---

## Backpropagation
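- **Backpropagation** is an algorithm used to train neural networks: it **propagates errors backward** through the network to compute the gradient of the loss with respect to every weight/parameter, and an optimizer such as gradient descent then uses these gradients to update the parameters.  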
- It uses the **chain rule** of derivatives to compute how the loss changes with respect to each parameter.  

**Example:** Logistic Regression  

- Inputs: \(x_1, x_2\)  
- Weighted sum + bias: \(z = w_1 x_1 + w_2 x_2 + b\)  
- Activation: \(\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}\) (sigmoid function)  
- True label: \(y \in \{0, 1\}\)  
- Loss: Binary Cross-Entropy  
\[
\mathcal{L}(y, \hat{y}) = -\left[y \log(\hat{y}) + (1-y)\log(1-\hat{y})\right]
\]  
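The forward pass and loss of this example can be sketched in Python as follows. The inputs and weights are illustrative assumptions, chosen so that \(z = 0\) and \(\hat{y} = 0.5\). The clip inside `bce_loss` is an added safeguard against \(\log 0\), not part of the formula above.

```python
import math

def sigmoid(z):
    """Numerically stable logistic sigmoid: 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bce_loss(y, y_hat, eps=1e-12):
    """Binary cross-entropy for one example; y is 0 or 1."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))

def forward(x1, x2, w1, w2, b):
    """Weighted sum plus bias, then sigmoid activation."""
    z = w1 * x1 + w2 * x2 + b
    return z, sigmoid(z)

x1, x2 = 1.0, 2.0            # inputs (illustrative)
w1, w2, b = 0.5, -0.25, 0.0  # parameters (illustrative)
y = 1                        # true label

z, y_hat = forward(x1, x2, w1, w2, b)
loss = bce_loss(y, y_hat)
print(z, y_hat)        # 0.0 0.5
print(f"{loss:.4f}")   # 0.6931  (= ln 2)
```

A prediction of 0.5 is maximally uncertain, so the loss is \(\ln 2\) regardless of whether the label is 0 or 1.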

**Backpropagation Steps:**  
1. Compute **forward pass** to get predicted output \(\hat{y}\).  
2. Compute **loss** using the actual label \(y\).  
3. Compute **gradients** of the loss w.r.t. each weight (and likewise the bias) using the chain rule:  
   \[
   \frac{\partial \mathcal{L}}{\partial w_i} = \frac{\partial \mathcal{L}}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w_i}
   \]  
4. Update the parameters by gradient descent:  
   \[
   w_i \leftarrow w_i - \eta \frac{\partial \mathcal{L}}{\partial w_i}, \qquad b \leftarrow b - \eta \frac{\partial \mathcal{L}}{\partial b}
   \]  
   where \(\eta\) is the learning rate.  
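For the sigmoid plus binary cross-entropy pair, the three chain-rule factors can be worked out explicitly (a standard derivation; the bias gradient uses \(\partial z / \partial b = 1\)):

\[
\begin{aligned}
\frac{\partial \mathcal{L}}{\partial \hat{y}} &= -\frac{y}{\hat{y}} + \frac{1-y}{1-\hat{y}} = \frac{\hat{y} - y}{\hat{y}(1-\hat{y})} \\
\frac{\partial \hat{y}}{\partial z} &= \sigma(z)\bigl(1 - \sigma(z)\bigr) = \hat{y}(1-\hat{y}) \\
\frac{\partial z}{\partial w_i} &= x_i, \qquad \frac{\partial z}{\partial b} = 1 \\
\Rightarrow\quad \frac{\partial \mathcal{L}}{\partial w_i} &= (\hat{y} - y)\, x_i, \qquad \frac{\partial \mathcal{L}}{\partial b} = \hat{y} - y
\end{aligned}
\]

The \(\hat{y}(1-\hat{y})\) factors cancel, so each gradient is simply the prediction error times the corresponding input.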

- Repeat these steps over all training examples, for as many passes (epochs) as needed, until the loss converges.
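Putting the steps together, here is a hedged sketch of training the two-input logistic regression with per-example (stochastic) gradient descent, using the gradients \(\partial \mathcal{L}/\partial w_i = (\hat{y} - y)\,x_i\) and \(\partial \mathcal{L}/\partial b = \hat{y} - y\) that the chain rule gives for sigmoid plus binary cross-entropy. The AND dataset, learning rate 0.5, zero initialization, and fixed 2000 epochs (standing in for a convergence test) are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bce(y, y_hat, eps=1e-12):
    """Binary cross-entropy for one example, clipped to avoid log(0)."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))

def train(data, eta=0.5, epochs=2000):
    """Per-example gradient descent; data is a list of ((x1, x2), y) pairs.

    Returns (w1, w2, b, history), where history holds the average loss of
    each epoch, measured before each example's update.
    """
    w1 = w2 = b = 0.0
    history = []
    for _ in range(epochs):
        total = 0.0
        for (x1, x2), y in data:
            z = w1 * x1 + w2 * x2 + b   # 1. forward pass
            y_hat = sigmoid(z)
            total += bce(y, y_hat)      # 2. loss
            dz = y_hat - y              # 3. dL/dz via the chain rule
            w1 -= eta * dz * x1         # 4. gradient-descent updates
            w2 -= eta * dz * x2
            b -= eta * dz
        history.append(total / len(data))
    return w1, w2, b, history

# Logical AND: a small, linearly separable toy dataset (illustrative).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w1, w2, b, history = train(data)
print(f"first-epoch loss {history[0]:.3f}, last-epoch loss {history[-1]:.3f}")
for (x1, x2), y in data:
    print((x1, x2), y, round(sigmoid(w1 * x1 + w2 * x2 + b), 3))
```

Updating after every example follows "repeat for all training examples" literally; averaging the gradients over the whole dataset before each update (batch gradient descent) is the other common choice.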