Here's the complete explanation with proper Markdown and LaTeX formatting that you can directly copy into a Jupyter Notebook:

# Neural Network from Scratch: Mathematical Foundations

## 1. Network Architecture

### Layer Dimensions:
- **Input layer**: 2 neurons (for 2D input data)  
- **Hidden layer**: $n_h$ neurons (configurable)  
- **Output layer**: 1 neuron (single output)


```
Input → Hidden Layer → Output
(x₁,x₂)     h        ŷ
```
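As a minimal sketch of this architecture, the parameters can be allocated as four NumPy arrays whose shapes follow the layer sizes listed above. The helper name `init_params`, the standard-normal weight initialization, and the zero biases are illustrative assumptions, not something the notes prescribe.

```python
import numpy as np

def init_params(n_h, seed=0):
    """Allocate parameters for a 2 -> n_h -> 1 network (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0, size=(2, n_h)),  # input -> hidden weights
        "b1": np.zeros((1, n_h)),                   # hidden-layer biases
        "W2": rng.normal(0.0, 1.0, size=(n_h, 1)),  # hidden -> output weights
        "b2": np.zeros((1, 1)),                     # output bias
    }

params = init_params(n_h=4)
for name, value in params.items():
    print(name, value.shape)
```

Changing `n_h` is the only knob here; the input and output sizes are fixed by the architecture.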

## 2. Forward Propagation

### Hidden Layer Calculation
$$
\mathbf{h} = \sigma(\mathbf{X}\mathbf{W}_1 + \mathbf{b}_1)
$$

Where:
- $\mathbf{X} \in \mathbb{R}^{m \times 2}$: Input matrix ($m$ samples, one per row)
- $\mathbf{W}_1 \in \mathbb{R}^{2 \times n_h}$: Weight matrix
- $\mathbf{b}_1 \in \mathbb{R}^{1 \times n_h}$: Bias vector (added to every row of $\mathbf{X}\mathbf{W}_1$)
- $\sigma$: Sigmoid activation
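The hidden-layer formula maps onto a single NumPy expression. The sketch below assumes $m = 4$ samples and $n_h = 3$ hidden neurons, with made-up inputs and random weights chosen only to show the shapes; NumPy broadcasting adds the `(1, 3)` bias to each of the 4 rows.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative data: 4 samples with 2 features each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 3))  # 2 inputs -> 3 hidden neurons
b1 = np.zeros((1, 3))         # broadcast across the 4 rows

h = sigmoid(X @ W1 + b1)      # (4, 2) @ (2, 3) + (1, 3) -> (4, 3)
print(h.shape)                # (4, 3)
```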

### Output Layer Calculation
$$
\mathbf{\hat{y}} = \sigma(\mathbf{h}\mathbf{W}_2 + \mathbf{b}_2)
$$

Where:
- $\mathbf{W}_2 \in \mathbb{R}^{n_h \times 1}$: Weight matrix
- $\mathbf{b}_2 \in \mathbb{R}^{1 \times 1}$: Bias scalar
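Putting both layers together, a forward pass could look like the following sketch. The function name `forward` and the all-zero parameters are assumptions made for illustration; with zero weights every pre-activation is 0, so every activation is $\sigma(0) = 0.5$, which makes the result easy to check by hand.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(X, W1, b1, W2, b2):
    """Return hidden activations h and predictions y_hat."""
    h = sigmoid(X @ W1 + b1)      # (m, n_h)
    y_hat = sigmoid(h @ W2 + b2)  # (m, 1)
    return h, y_hat

n_h = 3
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
W1, b1 = np.zeros((2, n_h)), np.zeros((1, n_h))
W2, b2 = np.zeros((n_h, 1)), np.zeros((1, 1))

h, y_hat = forward(X, W1, b1, W2, b2)
print(h.shape, y_hat.shape)  # (4, 3) (4, 1)
```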

### Sigmoid Activation
$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$

### Sigmoid Derivative
$$
\sigma'(x) = \sigma(x)(1 - \sigma(x))
$$
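One way to gain confidence in the derivative formula is to compare it against a central finite difference. The sketch below does this on a handful of points; the step size `eps` and the grid of test points are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    """Analytic derivative: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.linspace(-5.0, 5.0, 11)
eps = 1e-6

# Central difference approximation of the derivative.
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2.0 * eps)
max_err = np.max(np.abs(numeric - sigmoid_prime(x)))

print(sigmoid_prime(0.0))  # 0.25, the maximum of the derivative
print(max_err < 1e-8)      # True
```

Because the derivative depends on $x$ only through $\sigma(x)$, code that already holds an activation `a = sigmoid(x)` can compute the derivative as `a * (1 - a)` without re-evaluating the exponential.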

## 3. Loss Function (MSE)

$$
\mathcal{L} = \frac{1}{m}\sum_{i=1}^m (y_i - \hat{y}_i)^2
$$
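In NumPy the loss is a one-liner. The sketch below uses made-up targets and a constant prediction of 0.5, so every squared error is $0.25$ and the mean is $0.25$ as well.

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error over all samples."""
    return np.mean((y - y_hat) ** 2)

# Illustrative targets and predictions, both of shape (m, 1).
y = np.array([[0.0], [1.0], [1.0], [0.0]])
y_hat = np.full((4, 1), 0.5)

loss = mse(y, y_hat)
print(loss)  # 0.25
```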


## 4. Backpropagation

### Output Layer Gradients
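Error at output, defined as the gradient of the loss with respect to the output pre-activation. The factor $\frac{2}{m}$ and the sign $(\mathbf{\hat{y}} - \mathbf{y})$ come from differentiating the MSE, and the sigmoid derivative is written in terms of the activation, since $\sigma'(z) = \sigma(z)(1 - \sigma(z))$:
$$
\mathbf{\delta}_2 = \frac{2}{m}(\mathbf{\hat{y}} - \mathbf{y}) \odot \mathbf{\hat{y}} \odot (1 - \mathbf{\hat{y}})
$$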

Weight gradients:
$$
\frac{\partial \mathcal{L}}{\partial \mathbf{W}_2} = \mathbf{h}^T \mathbf{\delta}_2
$$

Bias gradient:
$$
\frac{\partial \mathcal{L}}{\partial \mathbf{b}_2} = \sum \mathbf{\delta}_2 \text{ (across samples)}
$$
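The output-layer gradients can be sketched and sanity-checked as follows. Here $\mathbf{\delta}_2$ is taken to be the exact derivative of the MSE with respect to the output pre-activation, $\frac{2}{m}(\mathbf{\hat{y}} - \mathbf{y}) \odot \mathbf{\hat{y}} \odot (1 - \mathbf{\hat{y}})$, so that subtracting the gradient reduces the loss. The random stand-in values for $\mathbf{h}$, $\mathbf{y}$ and $\mathbf{W}_2$ are assumptions for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss_from_hidden(h, y, W2, b2):
    """MSE as a function of the output-layer parameters only."""
    y_hat = sigmoid(h @ W2 + b2)
    return np.mean((y - y_hat) ** 2)

rng = np.random.default_rng(0)
m, n_h = 5, 3
h = rng.uniform(0.1, 0.9, size=(m, n_h))           # stand-in hidden activations
y = rng.integers(0, 2, size=(m, 1)).astype(float)  # stand-in binary targets
W2 = rng.normal(size=(n_h, 1))
b2 = np.zeros((1, 1))

y_hat = sigmoid(h @ W2 + b2)
delta2 = (2.0 / m) * (y_hat - y) * y_hat * (1.0 - y_hat)  # (m, 1)
dW2 = h.T @ delta2                                        # (n_h, 1)
db2 = delta2.sum(axis=0, keepdims=True)                   # (1, 1)

# Central finite-difference check of dW2, one entry at a time.
eps = 1e-6
dW2_num = np.zeros_like(W2)
for i in range(n_h):
    W_plus, W_minus = W2.copy(), W2.copy()
    W_plus[i, 0] += eps
    W_minus[i, 0] -= eps
    dW2_num[i, 0] = (loss_from_hidden(h, y, W_plus, b2)
                     - loss_from_hidden(h, y, W_minus, b2)) / (2.0 * eps)

print(np.max(np.abs(dW2 - dW2_num)))  # should be tiny
```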


### Hidden Layer Gradients
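Error propagation. The sigmoid derivative is evaluated at the hidden pre-activation, which in terms of the activation is $\mathbf{h} \odot (1 - \mathbf{h})$:
$$
\mathbf{\delta}_1 = (\mathbf{\delta}_2 \mathbf{W}_2^T) \odot \mathbf{h} \odot (1 - \mathbf{h})
$$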

Weight gradients:
$$
\frac{\partial \mathcal{L}}{\partial \mathbf{W}_1} = \mathbf{X}^T \mathbf{\delta}_1
$$

Bias gradient:
$$
\frac{\partial \mathcal{L}}{\partial \mathbf{b}_1} = \sum \mathbf{\delta}_1 \text{ (across samples)}
$$
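A full backward pass can be sketched and verified with a numerical gradient check over all four parameters. The sketch takes $\mathbf{\delta}_2 = \frac{2}{m}(\mathbf{\hat{y}} - \mathbf{y}) \odot \mathbf{\hat{y}} \odot (1 - \mathbf{\hat{y}})$ and applies the sigmoid derivative at the hidden layer as $\mathbf{h} \odot (1 - \mathbf{h})$. The function names, the dictionary layout, and the random data are all illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(X, p):
    h = sigmoid(X @ p["W1"] + p["b1"])
    y_hat = sigmoid(h @ p["W2"] + p["b2"])
    return h, y_hat

def loss(X, y, p):
    return np.mean((y - forward(X, p)[1]) ** 2)

def backward(X, y, p):
    """Analytic gradients of the MSE with respect to every parameter."""
    m = X.shape[0]
    h, y_hat = forward(X, p)
    delta2 = (2.0 / m) * (y_hat - y) * y_hat * (1.0 - y_hat)  # (m, 1)
    delta1 = (delta2 @ p["W2"].T) * h * (1.0 - h)             # (m, n_h)
    return {
        "W1": X.T @ delta1,
        "b1": delta1.sum(axis=0, keepdims=True),
        "W2": h.T @ delta2,
        "b2": delta2.sum(axis=0, keepdims=True),
    }

def numeric_grad(X, y, p, name, eps=1e-6):
    """Central finite differences for one parameter array."""
    g = np.zeros_like(p[name])
    for idx in np.ndindex(p[name].shape):
        old = p[name][idx]
        p[name][idx] = old + eps
        loss_plus = loss(X, y, p)
        p[name][idx] = old - eps
        loss_minus = loss(X, y, p)
        p[name][idx] = old  # restore the original value
        g[idx] = (loss_plus - loss_minus) / (2.0 * eps)
    return g

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
y = rng.integers(0, 2, size=(6, 1)).astype(float)
p = {"W1": rng.normal(size=(2, 3)), "b1": rng.normal(size=(1, 3)),
     "W2": rng.normal(size=(3, 1)), "b2": rng.normal(size=(1, 1))}

grads = backward(X, y, p)
max_err = max(np.max(np.abs(grads[k] - numeric_grad(X, y, p, k))) for k in p)
print(max_err)  # should be tiny if the formulas and the code agree
```

Each gradient has the same shape as the parameter it belongs to, which is a quick first check before running the slower numerical comparison.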


## 5. Parameter Updates

Gradient descent updates:
$$
\mathbf{W} \leftarrow \mathbf{W} - \eta \frac{\partial \mathcal{L}}{\partial \mathbf{W}}
$$
$$
\mathbf{b} \leftarrow \mathbf{b} - \eta \frac{\partial \mathcal{L}}{\partial \mathbf{b}}
$$

Where $\eta$ is the learning rate.
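The update rule applies identically to every parameter, so it can be written once over a dictionary. The sketch below uses a hypothetical helper `gradient_step` and small made-up numbers so the arithmetic can be followed by hand with $\eta = 0.1$.

```python
import numpy as np

def gradient_step(params, grads, lr):
    """Return new parameters after one gradient-descent step."""
    return {name: params[name] - lr * grads[name] for name in params}

params = {"W": np.array([[1.0, -2.0]]), "b": np.array([[0.5]])}
grads = {"W": np.array([[0.5, 0.5]]), "b": np.array([[-1.0]])}

updated = gradient_step(params, grads, lr=0.1)
print(updated["W"])  # 1.0 - 0.1*0.5 = 0.95 and -2.0 - 0.1*0.5 = -2.05
print(updated["b"])  # 0.5 - 0.1*(-1.0) = 0.6
```

Returning a new dictionary leaves the old parameters untouched, which is convenient for debugging; an in-place `-=` update is the more common choice inside a training loop.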


## 6. Training Algorithm Steps

1. **Initialize** all weights and biases
2. **Forward pass**:
   - Compute hidden layer outputs
   - Compute final predictions
3. **Calculate loss** between predictions and true values
4. **Backward pass**:
   - Compute output layer gradients
   - Propagate error to hidden layer
   - Compute hidden layer gradients
5. **Update parameters** using gradient descent
6. **Repeat** for specified number of epochs
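The six steps above can be sketched as a single training function. The XOR dataset, the hyperparameter defaults (`n_h=4`, `lr=0.5`, `epochs=2000`), and the initialization scheme are assumptions chosen for illustration; the gradients use the exact MSE derivative, including the $\frac{2}{m}$ factor. Whether the network fully solves XOR depends on the random initialization, so the example only reports the loss before and after training.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(X, y, n_h=4, lr=0.5, epochs=2000, seed=0):
    rng = np.random.default_rng(seed)
    m, n_in = X.shape

    # 1. Initialize weights and biases.
    W1, b1 = rng.normal(size=(n_in, n_h)), np.zeros((1, n_h))
    W2, b2 = rng.normal(size=(n_h, 1)), np.zeros((1, 1))

    losses = []
    for _ in range(epochs):  # 6. Repeat for the given number of epochs.
        # 2. Forward pass.
        h = sigmoid(X @ W1 + b1)
        y_hat = sigmoid(h @ W2 + b2)

        # 3. Loss (MSE).
        losses.append(float(np.mean((y - y_hat) ** 2)))

        # 4. Backward pass (delta1 uses W2 before it is updated).
        delta2 = (2.0 / m) * (y_hat - y) * y_hat * (1.0 - y_hat)
        delta1 = (delta2 @ W2.T) * h * (1.0 - h)
        dW2, db2 = h.T @ delta2, delta2.sum(axis=0, keepdims=True)
        dW1, db1 = X.T @ delta1, delta1.sum(axis=0, keepdims=True)

        # 5. Gradient-descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    return (W1, b1, W2, b2), losses

# Illustrative dataset: XOR on 2D binary inputs.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

(W1, b1, W2, b2), losses = train(X, y)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```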


## Implementation Notes

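- All equations map directly onto NumPy array operations: `@` for matrix products and `*` for the element-wise product $\odot$
- Matrix dimensions must align at each step; for example, $\mathbf{X}\mathbf{W}_1$ is $(m \times 2)(2 \times n_h) = m \times n_h$
- The sigmoid activation introduces non-linearity; without it the two layers would collapse into a single linear map
- MSE loss penalizes large errors quadratically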

This Markdown content with LaTeX equations renders in a Jupyter Notebook Markdown cell. The equations display as formatted mathematical notation once the cell is run.

To use:
1. Create a new Markdown cell in Jupyter
2. Paste this entire content
3. Run the cell to render the equations and explanations

You can intersperse these explanation cells with code cells containing the actual implementation for a complete learning experience.