# Computer Vision PS (WS21/22)


# Exercise sheet D (ExD)

**Group members**: please list all group members here

**Total (possible) points**: 14

In [None]:
import torch
import torch.nn as nn
from torch import tensor
from torch.autograd import grad

## ExD.1 (3 points)

Let us assume we have a point $\mathbf{x} \in \mathbb{R}^3$ and 10 additional points $\mathbf{y}_1,\ldots,\mathbf{y}_{10}$ with $\forall i: \mathbf{y}_i \in \mathbb{R}^3$.

Now, say we want to compute (1) 

$$\mathbf{o}_{1i} = \mathbf{y}_i^2\enspace,$$

then (2)

$$ o_{2i} = \exp\left( \frac{-\| \mathbf{x} - \mathbf{o}_{1i} \|^2}{\sigma} \right)$$

followed by (3)

$$ o_{3} = \sum_{i} o_{2i}$$

and (4)

$$ o_4 = \frac{1}{1+\exp{(-o_3)}}$$

If we want to minimize $o_4$ over $\mathbf{y}_1,\ldots,\mathbf{y}_{10}$ via gradient descent, we need the gradients with respect to those points, i.e.,

$$ \frac{\partial o_4}{\partial \mathbf{y}_i}\enspace.$$
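As a sketch, applying the chain rule to (1)-(4) gives the expression below. Two assumptions: the square in (1) is read element-wise (as in the implementation further down), and $\odot$ denotes the element-wise product, a symbol not used elsewhere on this sheet.

$$ \frac{\partial o_4}{\partial \mathbf{y}_i} = \underbrace{o_4 (1 - o_4)}_{\partial o_4 / \partial o_3} \cdot \underbrace{1}_{\partial o_3 / \partial o_{2i}} \cdot \underbrace{\frac{4}{\sigma}\, o_{2i}\, (\mathbf{x} - \mathbf{y}_i^2) \odot \mathbf{y}_i}_{\partial o_{2i} / \partial \mathbf{y}_i}$$

The last factor follows from $\partial \| \mathbf{x} - \mathbf{y}_i^2 \|^2 / \partial \mathbf{y}_i = -4\,(\mathbf{x} - \mathbf{y}_i^2) \odot \mathbf{y}_i$.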

In [None]:
# set random seed
torch.manual_seed(123)

y = torch.randn(10, 3, requires_grad=True)  # y_i's with respect to which we need gradients
x = tensor([8.,3.,4.])                      # x

Below is an implementation of (1)-(4) from above. Use `torch.autograd.grad` to compute the gradients with respect to the $\mathbf{y}_i$.
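For orientation, here is a minimal sketch of how `torch.autograd.grad` is called. It uses a toy function $f(\mathbf{w}) = \sum_j w_j^2$ (not the function from this exercise), and the names `w`, `f_toy` and `df_dw` are only illustrative.

```python
import torch
from torch.autograd import grad

# Toy example (not the exercise's function): f(w) = sum_j w_j^2, so df/dw = 2w
w = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
f_toy = (w ** 2).sum()

# grad(outputs, inputs) returns a tuple with one gradient tensor per input
(df_dw,) = grad(f_toy, w)
print(df_dw)
```

The output passed to `grad` has to be a scalar here; for non-scalar outputs, `grad_outputs` would have to be supplied as well.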

In [None]:
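o1 = torch.pow(y, 2.)                        # (1) element-wise square of each y_i
o2 = torch.exp(-(x-o1).norm(dim=1, p=2)**2)  # (2) with sigma = 1
o3 = o2.sum()                                # (3) sum over all i
o4 = torch.sigmoid(o3)                       # (4) logistic sigmoid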

Compute $\frac{\partial o_4}{\partial \mathbf{y}_i}$ and explain why we do not get any useful gradients.

In [None]:
# YOUR CODE GOES HERE

**Explanation**: ...

## ExD.2 (5 points)

Say we have a linear map $f: \mathbb{R}^{100} \to \mathbb{R}^{45}, \ \mathbf{x} \mapsto \mathbf{A}\mathbf{x}$, implemented via `nn.Linear(100, 45, bias=False)` and we want to compute the *Lipschitz* constant of this linear map. 

**Definition**: A function $f: \mathbb{R}^n \to \mathbb{R}^m$ is called Lipschitz continuous if there exists a constant $L \geq 0$ such that 

$$\forall \mathbf{x},\mathbf{y} \in \mathbb{R}^n: \| f(\mathbf{x}) - f(\mathbf{y})\|_2 \leq L \| \mathbf{x}-\mathbf{y} \|_2$$

For a linear map (as above), the Lipschitz constant is given by the largest *singular value* of $\mathbf{A}$.
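To illustrate this statement, the sketch below uses a small hand-picked matrix (not the layer from this exercise) whose singular values are known, and checks the Lipschitz inequality on random pairs of points. All names (`A_toy`, `L_toy`, `u`, `v`) are illustrative assumptions.

```python
import torch

# Toy matrix (not the exercise's layer): its singular values are 4 and 3
A_toy = torch.tensor([[3.0, 0.0],
                      [0.0, 4.0]])

# torch.linalg.svdvals returns the singular values in descending order
L_toy = torch.linalg.svdvals(A_toy)[0]

# Empirical check of ||A u - A v|| <= L ||u - v|| on random pairs (u, v)
torch.manual_seed(0)
u, v = torch.randn(1000, 2), torch.randn(1000, 2)
lhs = (u @ A_toy.T - v @ A_toy.T).norm(dim=1)
rhs = L_toy * (u - v).norm(dim=1)
print(L_toy.item(), bool((lhs <= rhs + 1e-4).all()))
```

The small tolerance on the right-hand side only absorbs floating-point rounding.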

For this exercise, (1) initialize a linear layer with `nn.Linear(100, 45, bias=False)` and then (2) compute the Lipschitz constant (look into `torch.linalg` for help). Subsequently, write a function that takes, as arguments, a linear layer (i.e., a variable of type `nn.Linear`), a desired Lipschitz constant $L' \geq 0$ and reconfigures the weight matrix such that this Lipschitz constant is satisfied.

In [None]:
torch.manual_seed(1234)
# (1) initialize the linear layer in the configurations specified above
# (2) compute Lipschitz constant (L)

# YOUR CODE GOES HERE
# print('L(f): {:.3f}'.format(L))

In [None]:
def constrain(f: nn.Linear, L: float):
    # YOUR CODE GOES HERE
    pass

## ExD.3 (6 points)

Say you have a grayscale image of size $W \times H$, i.e., one "color" channel. This image can be stored as a $1 \times W \times H$ tensor. In case you have multiple ($B$) such images, you could store them as a $B \times 1 \times W \times H$ tensor.

In [None]:
# e.g., 6 (random) images of size 32x32
torch.manual_seed(1234)
img = torch.randn(6,1,32,32)

The idea of this exercise is to first split the image into $P \times P$ patches (e.g., $8 \times 8$), then vectorize each patch, and finally forward each vectorized patch through a simple MLP implementing

$$f: \mathbf{x} \mapsto \mathbf{B}\text{ReLU}(\mathbf{A}\mathbf{x})$$

where $\mathbf{A} \in \mathbb{R}^{P^2 \times P^2}$ and $\mathbf{B} \in \mathbb{R}^{128 \times P^2}$.

**Evaluation**: use the `img` tensor from above as input, and take $P=8$. The output should then be a tensor of size $6 \times 16 \times 128$.




The splitting into patches can be easily done using the `Rearrange` layer in the `einops` library (see [here](https://einops.rocks/)). You can install `einops` via 

```bash
pip install einops
```

Below you find a working example of this strategy.

**Important**: In the original version of this notebook, I wanted you to do this, but I accidentally included the solution :)

So, the task now is to implement the same functionality via 2D convolutions (think about the correct size and stride of the convolution kernel).
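As a hint at why convolutions can do this, the sketch below shows on a toy $4 \times 4$ image with $P=2$ (the exercise uses $P=8$) that a convolution whose kernel size and stride both equal the patch size computes one dot product per non-overlapping patch. It is only an illustration with a single output channel, not a solution; the names `P_toy`, `img_toy` and `conv_toy` are assumptions made for this example.

```python
import torch
import torch.nn as nn

P_toy = 2                                             # toy patch size
img_toy = torch.arange(16.0).reshape(1, 1, 4, 4)      # one 4x4 single-channel image

# Kernel size and stride both equal the patch size, so the kernel
# visits every non-overlapping P x P patch exactly once.
conv_toy = nn.Conv2d(1, 1, kernel_size=P_toy, stride=P_toy, bias=False)

with torch.no_grad():
    out_toy = conv_toy(img_toy)                       # shape (1, 1, 2, 2): one value per patch

    # Manual check for the top-left patch: flatten patch and kernel, take the dot product
    patch = img_toy[0, 0, :P_toy, :P_toy].reshape(-1)
    weight = conv_toy.weight[0, 0].reshape(-1)
    manual = patch @ weight

print(out_toy.shape, torch.allclose(out_toy[0, 0, 0, 0], manual))
```

With more output channels, each channel would correspond to one row of a weight matrix applied to the vectorized patch.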

In [None]:
# imports that require einops
from einops import rearrange, repeat
from einops.layers.torch import Rearrange
import torch.nn.functional as F

In [None]:
patch_H, patch_W = 8, 8
patch_D = patch_W * patch_H
embedding_dim = 128

mapping = nn.Sequential(
            Rearrange('b c (h p1) (w p2) -> b (h w) (p1 p2 c)', 
                       p1 = patch_H, 
                       p2 = patch_W),
            nn.Linear(patch_D, patch_D, bias=False),
            nn.ReLU(),
            nn.Linear(patch_D, embedding_dim, bias=False)
        )

In [None]:
import numpy as np
np.sum([p.numel() for p in mapping.parameters()])

In [None]:
# test with
out = mapping(img)
print(out.size())

In [None]:
# Below is a template class that you can use 

class MyOp(nn.Module):
    def __init__(self, patch_W=8, patch_H=8, embedding_dim=128):
        super().__init__()
        
        assert patch_W == patch_H
        
        self.patch_W = patch_W
        self.patch_H = patch_H
        self.embedding_dim = embedding_dim
        
        #
        #
        # YOUR CODE GOES HERE ... defining the operations
        #
        #
    
    def forward(self, x):
        #
        #
        # YOUR CODE GOES HERE ... defining the forward pass through the operations
        #
        #
        raise NotImplementedError  # placeholder so the template is valid Python; remove once implemented

In [None]:
# test with
obj = MyOp(8,8,128)
print(obj(img).size())