# Problem 1
Consider a neural network with inputs $x$, $y$, and $z$ that computes the following sequence of operations:

$
\begin{aligned}
    m &= x + y, \\
    n &= m \cdot z, \\
    p &= \max(0, n), \\
    q &= \sigma(p), \quad \text{where } \sigma(a) = \frac{1}{1 + e^{-a}} \text{ is the sigmoid function}, \\
    r &= xq + y.
\end{aligned}
$

- Draw the computational graph for this network.
- Compute the following partial derivatives:
  $
  \frac{\partial r}{\partial x}, \quad \frac{\partial r}{\partial y}, \quad \frac{\partial r}{\partial z}, \quad \frac{\partial r}{\partial m}, \quad \frac{\partial r}{\partial n}.
  $
- Express the derivatives using:
  1. Backpropagation rules, and
  2. Intermediate variables and derivatives (e.g., $\sigma'(a)$).
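
A hand-derived answer to Problem 1 can be checked numerically. The sketch below is one possible way to do that, not a reference solution: it runs the forward pass, applies the chain rule node by node in reverse order, and uses the identity $\sigma'(a) = \sigma(a)(1 - \sigma(a))$. The function names and the evaluation point $(x, y, z) = (1.0, 2.0, 0.5)$ are assumptions made for illustration, and the ReLU derivative at $n = 0$ is taken to be 0 by convention.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, y, z):
    """Forward pass; returns every intermediate value of the graph."""
    m = x + y
    n = m * z
    p = max(0.0, n)
    q = sigmoid(p)
    r = x * q + y
    return {"m": m, "n": n, "p": p, "q": q, "r": r}

def backward(x, y, z):
    """Reverse-mode derivatives of r, computed from the output back to the inputs."""
    v = forward(x, y, z)
    dr_dq = x                                      # r = x*q + y
    dr_dp = dr_dq * v["q"] * (1.0 - v["q"])        # sigma'(p) = q * (1 - q)
    dr_dn = dr_dp * (1.0 if v["n"] > 0 else 0.0)   # ReLU gate (0 at n = 0 by convention)
    dr_dm = dr_dn * z                              # n = m*z
    dr_dz = dr_dn * v["m"]                         # n = m*z
    dr_dx = v["q"] + dr_dm                         # direct path through r plus path through m
    dr_dy = 1.0 + dr_dm                            # direct path through r plus path through m
    return {"x": dr_dx, "y": dr_dy, "z": dr_dz, "m": dr_dm, "n": dr_dn}

# Hypothetical evaluation point, chosen only so that n > 0.
grads = backward(1.0, 2.0, 0.5)
print(grads)
```

Comparing `grads` against central finite differences of `forward(...)["r"]` is a quick way to catch a sign or path error in the derivation.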



# Problem 2: Forward Pass in a Neural Network

You have a neural network with:

- A single hidden layer with ReLU activation
- An output layer with sigmoid activation

Weight matrices and biases are as follows:

$
W^{(1)} = \begin{bmatrix} 0.5 & 0.2 \\ 0.3 & 0.7 \end{bmatrix}, \quad b^{(1)} = \begin{bmatrix} 0.1 \\ -0.3 \end{bmatrix},
$
$
W^{(2)} = \begin{bmatrix} 0.6 & 0.8 \end{bmatrix}, \quad b^{(2)} = \begin{bmatrix} 0.2 \end{bmatrix}.
$

The input is:

$
x = \begin{bmatrix} 1.0 \\ 2.0 \end{bmatrix}.
$

Tasks:

1. Perform a forward pass through the network to compute the output.
2. Use the L2 loss function:
   $
   L(y_\text{true}, y_\text{pred}) = \frac{1}{2}(y_\text{true} - y_\text{pred})^2
   $
   to calculate the loss for $y_\text{true} = 1.0$.
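
The forward pass in Problem 2 can be sketched in plain Python as a way to check a hand calculation. The helper names (`sigmoid`, `matvec`) and the variable names are assumptions made for this illustration; the weights, biases, input, and target are the ones given above.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def matvec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * vj for w, vj in zip(row, v)) for row in W]

W1 = [[0.5, 0.2], [0.3, 0.7]]
b1 = [0.1, -0.3]
W2 = [[0.6, 0.8]]
b2 = [0.2]
x = [1.0, 2.0]
y_true = 1.0

# Hidden layer: affine map followed by ReLU.
z1 = [z + b for z, b in zip(matvec(W1, x), b1)]
h = [max(0.0, z) for z in z1]

# Output layer: affine map followed by sigmoid.
z2 = matvec(W2, h)[0] + b2[0]
y_pred = sigmoid(z2)

# L2 loss with the 1/2 factor used in the problem statement.
loss = 0.5 * (y_true - y_pred) ** 2
print(z1, h, z2, y_pred, loss)
```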

---

# Problem 3: Backpropagation for Gradient Calculation

Consider a single-layer neural network with:

- Input: $(x_1, x_2)$
- Weights: $w = [w_1, w_2]$
- Bias: $b$
- Output: 
  $
  \hat{y} = \sigma(w_1 x_1 + w_2 x_2 + b), \quad \sigma(a) = \frac{1}{1 + e^{-a}}.
  $
- Loss Function: 
  $
  L(y, \hat{y}) = (y - \hat{y})^2.
  $

### Given:
$
(x_1, x_2) = (0.5, -1.0), \quad y = 0.3, \quad w_1 = 0.8, \quad w_2 = -0.5, \quad b = 0.1.
$


### a) Compute $\frac{\partial L}{\partial w_1}$ and $\frac{\partial L}{\partial w_2}$ using backpropagation.
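
One way to verify the gradients in Problem 3 is to code the chain rule directly: $\frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial \hat{y}} \cdot \sigma'(z) \cdot x_i$ with $z = w_1 x_1 + w_2 x_2 + b$. The sketch below assumes the function names `loss_fn` and `gradients`, which are illustrative only; the numbers are those given in the problem.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def loss_fn(w1, w2, b, x1, x2, y):
    """Squared-error loss (no 1/2 factor, as in the problem statement)."""
    y_hat = sigmoid(w1 * x1 + w2 * x2 + b)
    return (y - y_hat) ** 2

def gradients(w1, w2, b, x1, x2, y):
    """Backpropagation: returns (dL/dw1, dL/dw2)."""
    z = w1 * x1 + w2 * x2 + b
    y_hat = sigmoid(z)
    dL_dyhat = -2.0 * (y - y_hat)        # derivative of (y - y_hat)^2
    dyhat_dz = y_hat * (1.0 - y_hat)     # sigma'(z)
    dL_dz = dL_dyhat * dyhat_dz
    return dL_dz * x1, dL_dz * x2        # dz/dw_i = x_i

x1, x2, y = 0.5, -1.0, 0.3
w1, w2, b = 0.8, -0.5, 0.1
dL_dw1, dL_dw2 = gradients(w1, w2, b, x1, x2, y)
print(dL_dw1, dL_dw2)
```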

# Problem 4: KL Divergence and Cross-Entropy

Suppose we have a true distribution

$
P = \{A: 0.6, B: 0.3, C: 0.1\}
$

and an estimated distribution

$
Q = \{A: 0.4, B: 0.5, C: 0.1\}.
$

### a) Compute the KL divergence $D_{KL}(P \parallel Q)$.
### b) Suppose we obtain a better estimate $Q'$ such that

  $
  Q' = \{A: 0.7, B: 0.2, C: 0.1\}.
  $

Compute the cross-entropy $H(P, Q')$.
### c) Is the Kullback-Leibler divergence symmetric? That is, if we swap the roles of $P$ and $Q$, do we get the same result?
### d) Suppose we have a true label $y = [1, 0, 0]$, and a neural network model computes estimated probabilities $\hat{y} = [0.7, 0.2, 0.1]$. Compute the cross-entropy loss.
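
Parts a) to d) of Problem 4 can be checked with a few lines of Python. This is a sketch under one explicit assumption: the problem does not state the logarithm base, so the natural logarithm (results in nats) is used here. The function and variable names are illustrative.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i log q_i, in nats."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.6, 0.3, 0.1]          # order: A, B, C
Q = [0.4, 0.5, 0.1]
Q_prime = [0.7, 0.2, 0.1]

kl_pq = kl_divergence(P, Q)                # part a)
h_p_qprime = cross_entropy(P, Q_prime)     # part b)
kl_qp = kl_divergence(Q, P)                # part c): compare with kl_pq
ce_loss = cross_entropy([1, 0, 0], [0.7, 0.2, 0.1])  # part d)

print(kl_pq, kl_qp, h_p_qprime, ce_loss)
```

With a one-hot label, the cross-entropy reduces to the negative log of the probability assigned to the true class, which is why only one term survives in part d).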


# Problem 5: Backpropagation on Multiple Layers

Consider a two-layer neural network:

### Forward Pass Equations:

- **Input:**
  $
  \mathbf{x} = [x_1, x_2]
  $
  
- **Hidden layer:**
  $
  \mathbf{h} = \sigma(W^{(1)} \mathbf{x} + b^{(1)})
  $
  
- **Output:**
  $
  \hat{y} = \sigma(W^{(2)} \mathbf{h} + b^{(2)})
  $
  where $\sigma(a) = \frac{1}{1 + e^{-a}}$ is the sigmoid activation function.

### Given:

- **Weights and biases:**
  - $W^{(1)} = \begin{bmatrix} 0.1 & 0.3 \\ 0.2 & 0.4 \end{bmatrix}$
  - $b^{(1)} = [0.1, -0.2]$
  - $W^{(2)} = [0.5, 0.6]$
  - $b^{(2)} = [0.3]$


### a) Perform the forward pass to compute $\hat{y}$ for $\mathbf{x} = [1.0, -1.0]$.

### b) Compute the gradients of the loss function $L(y, \hat{y}) = \frac{1}{2}(y - \hat{y})^2$ with respect to $W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)}$ using backpropagation.
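
The forward and backward passes of Problem 5 can be sketched in plain Python as a numerical check on a hand derivation. Note that the problem does not give a target value $y$, so the value `y_target = 1.0` below is a hypothetical placeholder; the function names, and storing $W^{(2)}$ as a flat list with $b^{(2)}$ as a scalar, are likewise assumptions of this sketch.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(W1, b1, W2, b2, x):
    """Returns the hidden activations h and the output y_hat."""
    z1 = [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(W1, b1)]
    h = [sigmoid(z) for z in z1]
    z2 = sum(w * hj for w, hj in zip(W2, h)) + b2
    return h, sigmoid(z2)

def backward(W1, b1, W2, b2, x, y):
    """Gradients of L = 0.5 * (y - y_hat)^2 with respect to all parameters."""
    h, y_hat = forward(W1, b1, W2, b2, x)
    delta2 = (y_hat - y) * y_hat * (1.0 - y_hat)      # dL/dz2
    dW2 = [delta2 * hj for hj in h]
    db2 = delta2
    # Propagate through W2 and the hidden sigmoid: dL/dz1.
    delta1 = [delta2 * w * hj * (1.0 - hj) for w, hj in zip(W2, h)]
    dW1 = [[d * xj for xj in x] for d in delta1]
    db1 = list(delta1)
    return dW1, db1, dW2, db2

W1 = [[0.1, 0.3], [0.2, 0.4]]
b1 = [0.1, -0.2]
W2 = [0.5, 0.6]
b2 = 0.3
x = [1.0, -1.0]
y_target = 1.0  # hypothetical: the problem statement does not specify y

h, y_hat = forward(W1, b1, W2, b2, x)
dW1, db1, dW2, db2 = backward(W1, b1, W2, b2, x, y_target)
print(y_hat, dW1, db1, dW2, db2)
```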


# Problem 6: Neural Networks and Backpropagation

![image.png](attachment:image.png)

### Compute $\frac{\partial y}{\partial z^{(1)}}$, $\frac{\partial y}{\partial z^{(2)}}$, $\frac{\partial y}{\partial x}$, and $\frac{\partial y}{\partial A_{*,j}}$, where $A_{*,j}$ is the $j$-th column of $A$.

# Problem 7: Neural Networks and Backpropagation

![image.png](attachment:image.png)
### Compute $\frac{\partial \psi}{\partial w^{(1)}}$, $\frac{\partial \psi}{\partial w^{(2)}}$, $\frac{\partial \psi}{\partial A}$, and $\frac{\partial \psi}{\partial w^{(3)}}$.

![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)

# Problem 8: Kullback-Leibler Divergence

### Given:
1. **True Distribution (P):** A binomial distribution with parameters:
   - $N = 2$, $p = 0.4$
   - $x \in \{0, 1, 2\}$, where:
     $
     P(x) = \binom{N}{x} p^x (1-p)^{N-x}
     $

2. **Estimated Distribution (Q):** A discrete uniform distribution:
   - $Q(x) = \frac{1}{3}, \, \forall x \in \{0, 1, 2\}$

### Task:
**Compute $D_{KL}(P \parallel Q)$** using the formula:
$
D_{KL}(P \parallel Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}
$
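
The KL formula in Problem 8 can be evaluated numerically as a check on a hand calculation. The sketch below builds the binomial probabilities from the definition and sums the divergence terms. It assumes the natural logarithm (result in nats), since the problem does not fix the base, and the name `binomial_pmf` is illustrative. `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(n, p, x):
    """P(x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

N, p = 2, 0.4
support = [0, 1, 2]

P = [binomial_pmf(N, p, x) for x in support]   # true distribution
Q = [1.0 / len(support)] * len(support)        # discrete uniform estimate

# D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)), in nats.
kl = sum(px * math.log(px / qx) for px, qx in zip(P, Q) if px > 0)
print(P, kl)
```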
   
   
   
   







![image.png](attachment:image.png)
