# Deep Learning for Computer Vision

---

**Goethe University Frankfurt am Main**

Winter Semester 2022/23

<br>

## *Assignment 2 (Regularization)*

---

**Points:** 10<br>
**Due:** 10.11.2022, 10 am<br>
**Contact:** Matthias Fulde ([fulde@cs.uni-frankfurt.de](mailto:fulde@cs.uni-frankfurt.de))<br>

---

**Your Name:** Max Althaus

<br>

<br>

## Table of Contents

---

- [1 L1 Regularization](#1-L1-Regularization-(5-Points))
  - [1.1 Implementation](#1.1-Implementation-(3-Points))
  - [1.2 Explanation](#1.2-Explanation-(2-Points))
- [2 L2 Regularization](#2-L2-Regularization-(5-Points))
  - [2.1 Implementation](#2.1-Implementation-(3-Points))
  - [2.2 Explanation](#2.2-Explanation-(2-Points))


<br>

## Setup

---

In this notebook we use only the **NumPy** library.

We import definitions of regularizers from the `regularization.py` module and enable autoreload, so that the imported functions are automatically updated whenever the code is changed.

In [1]:
import numpy as np

from regularization import L1_reg, L2_reg

%load_ext autoreload
%autoreload 2

<br>

## Exercises

---

### 1 L1 Regularization (5 Points)

---

In this exercise we want to implement **L1 regularization**. Here, the regularizer is the sum of the absolute values of the model's weights, defined as

$$
    R(W) = \sum_{i=1}^D \sum_{j=1}^K \vert W_{i,j} \vert.
$$

In order to control the effects of the regularization term, we introduce the regularization strength $\lambda$ as a hyperparameter. The complete loss for our model is then the sum of the data loss $\mathcal{L}$ and the regularization loss $R$, that is

$$
    J(W) = \mathcal{L}(W) + \lambda R(W).
$$
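
To make the role of the gradient concrete, the derivative of the scaled regularization term with respect to a single parameter can be written out as below. This is a sketch under two assumptions: the bias is stored in row $D+1$ of $W$, and the value $0$ is chosen as the subgradient at $W_{i,j} = 0$, where the absolute value is not differentiable (so $\operatorname{sign}(0) = 0$).

$$
    \frac{\partial}{\partial W_{i,j}} \lambda R(W) =
    \begin{cases}
        \lambda \, \operatorname{sign}(W_{i,j}) & \text{if } i \leq D, \\
        0 & \text{if } i = D+1.
    \end{cases}
$$

Every nonzero weight therefore receives a gradient of the same magnitude $\lambda$, independent of how large the weight is.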


<br>

### 1.1 Implementation (3 Points)

---

Complete the definition of the `L1_reg` function in the `regularization.py` file.

The function takes a parameter matrix $W$ of shape $(D+1, K)$, where $K$ is the number of categories and $D$ is the dimension of the inputs. The last row is assumed to be the bias. The second parameter is the regularization strength.

The function should return a tuple $(R, dW)$ with the regularization loss $R$, computed only for the weights and not the bias, and the gradient $dW$ of the loss with respect to the parameters. Both values are scaled by the regularization strength, as the expected values in the test below show. The loss $R$ is a scalar and $dW$ has the same shape as $W$.

Use only vectorized NumPy operations for the implementation. No loops are allowed.
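
One possible vectorized implementation is sketched below. It is not the reference solution: the name `l1_reg_sketch` and the parameter name `reg` are hypothetical, and the scaling of both the loss and the gradient by the regularization strength is an assumption inferred from the expected values in the test.

```python
import numpy as np


def l1_reg_sketch(W, reg):
    """Hypothetical L1 regularizer; the last row of W is treated as the bias."""
    # Select only the weight rows, the bias row is not regularized.
    weights = W[:-1]

    # Scaled regularization loss: reg * sum of absolute weight values.
    R = reg * np.sum(np.abs(weights))

    # Gradient: reg * sign(w) for the weights, zero for the bias row.
    dW = np.zeros_like(W, dtype=float)
    dW[:-1] = reg * np.sign(weights)

    return R, dW
```

Slicing with `W[:-1]` keeps the computation free of loops, and writing into a zero matrix leaves the gradient of the bias row at zero.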

<br>

#### 1.1.1 Test

To test your implementation, you can run the following code.

In [32]:
# Define dummy parameters.
W = np.array([
    [ 1.2,  3.6,  8.1],
    [ 4.0, -1.0,  3.6],
    [-9.6,  2.5, -6.3],
    [ 3.5, -7.2, -2.0]
])

# Compute regularization loss.
R, dW = L1_reg(W, 0.5)


#### Why do we compute the derivative?

In [33]:
# Compare loss.
loss_equal = abs(R - 19.95) < 1e-5

# Compare derivatives.
grad_equal = np.array_equal(dW, np.array([
    [ 0.5,  0.5,  0.5],
    [ 0.5, -0.5,  0.5],
    [-0.5,  0.5, -0.5],
    [ 0.0,  0.0,  0.0]
]))

# Show results.
print(loss_equal and grad_equal)

True


<br>

### 1.2 Explanation (2 Points)

---

Briefly describe in your own words how the L1 regularization affects the parameters of the model.

<br>

##### Answer

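L1 regularization shrinks the weights of the model to help avoid overfitting. The penalty grows linearly with the sum of the absolute values of all weights, so every nonzero weight is pushed toward zero with the same strength, regardless of its magnitude.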
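
The effect can be illustrated with a small numerical sketch. The weight values, the regularization strength `reg`, and the learning rate `lr` below are made up for illustration, and the step considers only the regularization term, not a data loss.

```python
import numpy as np

# Hypothetical weights: two large and two small entries.
w = np.array([5.0, 0.1, -0.1, -5.0])
reg = 0.5  # regularization strength (lambda), chosen for illustration
lr = 0.1   # learning rate, chosen for illustration

# Gradient of reg * sum(|w|): same magnitude for every nonzero weight.
l1_grad = reg * np.sign(w)

# One plain gradient descent step on the regularization term only.
w_l1_step = w - lr * l1_grad

# Relative change of each weight caused by this step.
l1_rel_change = np.abs(w_l1_step - w) / np.abs(w)

print(l1_grad)
print(l1_rel_change)
```

Each weight moves by the same absolute amount `lr * reg`, which is a small relative change for the large weights and a large relative change for the small ones.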

<br>

### 2 L2 Regularization (5 Points)

---

In this exercise we want to implement **L2 regularization**. Here, the regularizer is the squared Euclidean norm of the model's weights, that is, the sum of the squared weights, defined as

$$
    R(W) = \sum_{i=1}^D \sum_{j=1}^K W_{i,j}^2.
$$

Again, we have the regularization strength $\lambda$ as an additional hyperparameter, controlling by how much we restrict the model's parameters. The complete loss for our model is the sum of the data loss $\mathcal{L}$ and the regularization loss $R$, that is

$$
    J(W) = \mathcal{L}(W) + \lambda R(W).
$$
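
The derivative of the scaled regularization term with respect to a single parameter can be written out as below. This is a sketch that assumes the bias is stored in row $D+1$ of $W$ and is excluded from the regularization.

$$
    \frac{\partial}{\partial W_{i,j}} \lambda R(W) =
    \begin{cases}
        2 \lambda W_{i,j} & \text{if } i \leq D, \\
        0 & \text{if } i = D+1.
    \end{cases}
$$

For $\lambda = 0.5$ the factor $2 \lambda$ equals $1$, so the gradient on the weight rows is equal to the weights themselves.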


<br>

### 2.1 Implementation (3 Points)

---

Complete the definition of the `L2_reg` function in the `regularization.py` file.

The function takes a parameter matrix $W$ of shape $(D+1, K)$, where $K$ is the number of categories and $D$ is the dimension of the inputs. The last row is assumed to be the bias. The second parameter is the regularization strength.

The function should return a tuple $(R, dW)$ with the regularization loss $R$, computed only for the weights and not the bias, and the gradient $dW$ of the loss with respect to the parameters. Both values are scaled by the regularization strength, as the expected values in the test below show. The loss $R$ is a scalar and $dW$ has the same shape as $W$.

Use only vectorized NumPy operations for the implementation. No loops are allowed.
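
One possible vectorized implementation is sketched below. It is not the reference solution: the name `l2_reg_sketch` and the parameter name `reg` are hypothetical, and the scaling of both the loss and the gradient by the regularization strength is an assumption inferred from the expected values in the test.

```python
import numpy as np


def l2_reg_sketch(W, reg):
    """Hypothetical L2 regularizer; the last row of W is treated as the bias."""
    # Select only the weight rows, the bias row is not regularized.
    weights = W[:-1]

    # Scaled regularization loss: reg * sum of squared weights.
    R = reg * np.sum(weights ** 2)

    # Gradient: 2 * reg * w for the weights, zero for the bias row.
    dW = np.zeros_like(W, dtype=float)
    dW[:-1] = 2.0 * reg * weights

    return R, dW
```

With a regularization strength of `0.5`, the factor `2.0 * reg` is `1.0`, so the gradient of the weight rows equals the weights themselves.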

<br>

#### 2.1.1 Test

To test your implementation, you can run the following code.

In [40]:
# Define dummy parameters.
W = np.array([
    [ 1.2,  3.6,  8.1],
    [ 4.0, -1.0,  3.6],
    [-9.6,  2.5, -6.3],
    [ 3.5, -7.2, -2.0]
])

# Compute regularization loss.
R, dW = L2_reg(W, 0.5)

#### Why is the derivative computed here without r?

In [39]:
# Compare loss.
loss_equal = abs(R - 124.035) < 1e-5

# Compare gradient.
grad_equal = np.array_equal(dW, [
    [ 1.2,  3.6,  8.1],
    [ 4.0, -1.0,  3.6],
    [-9.6,  2.5, -6.3],
    [ 0.0,  0.0,  0.0]
])

# Show results.
print(loss_equal and grad_equal)

True


<br>

### 2.2 Explanation (2 Points)

---

Briefly describe in your own words how the L2 regularization affects the parameters of the model.

<br>

##### Answer

L2 regularization penalizes the weights in $W$ quadratically. The penalty on each weight grows with its square, so very large values are penalized disproportionately more than small ones.
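
The effect can be illustrated with a small numerical sketch. The weight values, the regularization strength `reg`, and the learning rate `lr` below are made up for illustration, and the step considers only the regularization term, not a data loss.

```python
import numpy as np

# Hypothetical weights: two large and two small entries.
w = np.array([5.0, 0.1, -0.1, -5.0])
reg = 0.5  # regularization strength (lambda), chosen for illustration
lr = 0.1   # learning rate, chosen for illustration

# Gradient of reg * sum(w ** 2): proportional to the weight itself.
l2_grad = 2.0 * reg * w

# One plain gradient descent step on the regularization term only.
w_l2_step = w - lr * l2_grad

# Absolute and relative change of each weight caused by this step.
l2_abs_change = np.abs(w_l2_step - w)
l2_rel_change = l2_abs_change / np.abs(w)

print(l2_abs_change)
print(l2_rel_change)
```

The large weights move much further in absolute terms than the small ones, while every weight shrinks by the same relative factor `2 * lr * reg`.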