## Problem 1 

### 1A) Autoencoder loss

Up to this point, the exercises have focused on supervised learning. In the coming weeks we will
start to focus on unsupervised approaches, and this week's exercise concentrates on autoencoders.
Assume a dataset of $N$ $d$-dimensional feature vectors, where the $n$th vector is represented as:
**$x^{(n)} = [x^{(n)}_1, x^{(n)}_2, \cdots, x^{(n)}_d]^T$.**

An autoencoder usually consists of two parts, an encoder and a decoder. The encoder is tasked with
learning a useful representation of the data while the decoder tries to reconstruct the original input from the
representation obtained by the encoder. Consider a simple autoencoder with one hidden layer consisting of
M neurons, no bias and linear activation functions. For a single sample, the encoder of this model can be
described by

### Encoder:
### **$h(x^{(n)}) = Wx^{(n)} = c^{(n)}$**

where:
- **$h(x^{(n)})$**: the **encoder function**, which takes the **input vector** $x^{(n)}$
- **$x^{(n)}$**: the **input vector**, a **$d$-dimensional** vector ($d$ features)
- **$W$**: the **weight matrix** of the encoder, of shape $M \times d$
- **$Wx^{(n)}$**: the **matrix-vector product** of the **weights** and the **input vector**
- **$c^{(n)}$**: the **compressed representation**, an $M$-dimensional vector



### Decoder:
### **$g(c^{(n)}) = W^* c^{(n)} = \tilde{x}^{(n)}$**

where:
- **$g(c^{(n)})$**: the decoder function, which takes the compressed representation and tries to reconstruct the original input
- **$c^{(n)}$**: the compressed representation
- **$W^*$**: the weight matrix of the decoder; with tied weights, $W^* = W^T$
- **$W^* c^{(n)}$**: the matrix-vector product of the decoder weight matrix $W^*$ and the compressed representation $c^{(n)}$
- **$\tilde{x}^{(n)}$**: the reconstructed output of the autoencoder


A common type of regularization for autoencoders is so-called tied weights. For the autoencoder described
above, this regularization can be described by setting $W^* = W^T$, thus limiting the capacity of the autoencoder
and reducing the potential for overfitting.

### Encoder Example:
 
 - A 3-dimensional vector ($d = 3$): $x^{(n)} = [1, 2, 3]^T$
 - Two neurons in the hidden layer ($M = 2$), so $W$ is a $2 \times 3$ matrix: $W = \begin{bmatrix} 0.5 & 0.1 & 0.3 \\ 0.2 & 0.4 & 0.6 \end{bmatrix}$
 
 - Multiplying W by x(n) gives: 
 $c^{(n)} = Wx^{(n)} = \begin{bmatrix} 0.5 \cdot 1 + 0.1 \cdot 2 + 0.3 \cdot 3 \\ 0.2 \cdot 1 + 0.4 \cdot 2 + 0.6 \cdot 3 \end{bmatrix}$
                    
 $c^{(n)} = \begin{bmatrix} 1.6 \\ 2.8 \end{bmatrix}$
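
The encoder arithmetic above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the variable names `W`, `x`, and `c` are chosen here to mirror the symbols in the text.

```python
import numpy as np

# Encoder weights from the example (M = 2 hidden neurons, d = 3 features)
W = np.array([[0.5, 0.1, 0.3],
              [0.2, 0.4, 0.6]])

# Input vector x^(n)
x = np.array([1.0, 2.0, 3.0])

# Encoder: c^(n) = W x^(n)
c = W @ x
print(c)  # approximately [1.6 2.8]
```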

 ### Decoder Example:
- The encoder produced the compressed representation: **$c^{(n)} = \begin{bmatrix} 1.6 \\ 2.8 \end{bmatrix}$**
- The decoder weight matrix is $W^* = W^T$ (the same weights as the encoder, but transposed):

- $W^* = W^T = \begin{bmatrix} 0.5 & 0.2 \\ 0.1 & 0.4 \\ 0.3 & 0.6 \end{bmatrix}$
- To reconstruct the input, we multiply $W^*$ by $c^{(n)}$:
- $\tilde{x}^{(n)} = W^* c^{(n)} = \begin{bmatrix} 0.5 \cdot 1.6 + 0.2 \cdot 2.8 \\ 0.1 \cdot 1.6 + 0.4 \cdot 2.8 \\ 0.3 \cdot 1.6 + 0.6 \cdot 2.8 \end{bmatrix} = \begin{bmatrix} 1.36 \\ 1.28 \\ 2.16 \end{bmatrix}$

- So, $\tilde{x}^{(n)} = \begin{bmatrix} 1.36 \\ 1.28 \\ 2.16 \end{bmatrix}$ is the reconstructed version of the original input $x^{(n)} = [1, 2, 3]^T$.
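
The decoder step can be verified the same way. This is a small sketch, assuming NumPy and reusing the example weights; `x_tilde` is a name introduced here for $\tilde{x}^{(n)}$.

```python
import numpy as np

# Example weights and input (M = 2, d = 3)
W = np.array([[0.5, 0.1, 0.3],
              [0.2, 0.4, 0.6]])
x = np.array([1.0, 2.0, 3.0])

c = W @ x           # encoder: c = W x
x_tilde = W.T @ c   # decoder with tied weights: x_tilde = W^T c

print(x_tilde)  # approximately [1.36 1.28 2.16]
```

The example weights are illustrative rather than trained, which is why the reconstruction differs noticeably from the input.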


### <span style="color: lightgreen">1A) Answer</span>

#### Assuming a mean squared error loss function and tied-weights, what would be the loss function of the autoencoder described above? 

- Mean squared error (MSE) is the average squared difference between the original input $x^{(n)}$ and the reconstructed output $\tilde{x}^{(n)}$.

- For a single data point $x^{(n)}$, the squared error is:

$\text{Squared Error} = \lVert x^{(n)} - \tilde{x}^{(n)} \rVert^2$

- Since the autoencoder uses tied weights ($W^* = W^T$), the reconstructed output $\tilde{x}^{(n)}$ is:

$\tilde{x}^{(n)} = g(c^{(n)}) = W^* c^{(n)} = W^T c^{(n)}$

and $c^{(n)} = h(x^{(n)})= Wx^{(n)}$

- Substituting $c^{(n)}$ into the reconstruction:
$\tilde{x}^{(n)} = W^T(Wx^{(n)})$

So, the MSE loss function for the entire dataset of $N$ samples is:
  
$\text{Loss} = \frac{1}{N}\sum_{n=1}^{N} \lVert x^{(n)} - W^T(Wx^{(n)}) \rVert^2$
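
The loss above can be written as a short function. This is a sketch under stated assumptions: NumPy is available, samples are stored as rows of an `(N, d)` array `X`, and the function name `tied_autoencoder_mse` is hypothetical, introduced only for illustration.

```python
import numpy as np

def tied_autoencoder_mse(X, W):
    """MSE reconstruction loss of a linear, bias-free autoencoder with tied weights.

    X: array of shape (N, d), one sample per row.
    W: encoder weight matrix of shape (M, d); the decoder uses W.T.
    """
    C = X @ W.T        # encode all samples, shape (N, M); row n is W x^(n)
    X_tilde = C @ W    # decode with W^T, shape (N, d); row n is W^T W x^(n)
    per_sample = np.sum((X - X_tilde) ** 2, axis=1)  # squared L2 norm per sample
    return per_sample.mean()                          # average over the N samples

# Example weights from the text and a dataset with a single sample (N = 1)
W = np.array([[0.5, 0.1, 0.3],
              [0.2, 0.4, 0.6]])
X = np.array([[1.0, 2.0, 3.0]])

loss = tied_autoencoder_mse(X, W)
print(loss)  # approximately 1.3536
```

With only the sample from the worked example, the loss equals that sample's squared error, $0.36^2 + 0.72^2 + 0.84^2 = 1.3536$.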





#### 1a) Assuming d > M , what do we call such an autoencoder?

If the input dimension $d$ is greater than the number of hidden neurons $M$ ($d > M$), the autoencoder is called an **undercomplete** autoencoder.

#### Why?
The hidden layer has fewer neurons ($M$) than the input dimension ($d$), which forces the autoencoder to learn a compressed representation of the data. This compression helps the model focus on the most important features of the data, making it useful for tasks like dimensionality reduction or feature extraction.
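
One way to see the compression: with tied weights, encoding followed by decoding is the $d \times d$ matrix $W^T W$, whose rank is at most $M$. When $d > M$ it cannot be the identity, so not every input can be reconstructed exactly. The sketch below, assuming NumPy and reusing the example weights, checks this numerically; it is an illustration and not part of the exercise.

```python
import numpy as np

# Example weights: M = 2 hidden neurons, d = 3 input features
W = np.array([[0.5, 0.1, 0.3],
              [0.2, 0.4, 0.6]])
M, d = W.shape

A = W.T @ W                       # encode-then-decode map, shape (d, d)
rank = np.linalg.matrix_rank(A)   # at most M, hence smaller than d here

print(A.shape)  # (3, 3)
print(rank)     # 2
```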
