## Learnable strides and Backpropagation

A simple example. Let $y$ be the output image (after convolution) and $x$ the input image.

Let $R = \{ (0,0),(0,1),(1,0),(1,1) \}$ enumerate all locations of the kernel $w$.
In the case of multiple input channels, $R$ can be a set of 3D indices,
for example $R = \{ (0,0,0),(0,0,1), \dots \}$.

Also let $s=(s_x,s_y)$ be a pair of positive real numbers holding the strides
along the $x$ and $y$ axes.
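As an illustration, $R$ and $s$ could be represented in Python as below. The channel count `C_in` and the (channel, row, column) order of the 3D indices are assumptions made only for this sketch; the stride values are the ones used in the code cells further down.

```python
from itertools import product

# R for a 2x2 kernel: [(0, 0), (0, 1), (1, 0), (1, 1)]
R = list(product(range(2), range(2)))

# Multi-channel case: 3D indices (channel, row, column); C_in is hypothetical
C_in = 2
R_3d = list(product(range(C_in), range(2), range(2)))

# Real-valued, learnable strides (s_x, s_y)
s = (2.2, 2.4)

print(R)
print(R_3d[:2])  # [(0, 0, 0), (0, 0, 1)]
```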


The value of the output pixel $p$ is defined as
$$ y_{p} = \sum_{p_n \in R} w_{p_n}x_{p*s + p_n} $$
where $p*s$ denotes the element-wise product of the output coordinate $p$ and the strides $s$.

Because the strides are real-valued, $p*s + p_n$ is in general not an integer location, so the value $x_{p*s + p_n}$ is obtained from the input image via bilinear interpolation.
...
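A minimal NumPy sketch of this forward pass for a single-channel image and one output pixel follows. It is not the author's CUDA kernel; the helper names `bilinear` and `strided_conv_pixel` are hypothetical, and it assumes that $x$ is the column axis and $y$ the row axis, that the kernel is indexed `kernel[n_y, n_x]`, and that image corners outside the image contribute zero. The image and kernel values are taken from the code cells below.

```python
import numpy as np

def bilinear(img, h, w):
    """Bilinearly interpolate img at real-valued (row h, column w).
    Corners outside the image contribute zero (zero padding)."""
    height, width = img.shape
    if h <= -1 or h >= height or w <= -1 or w >= width:
        return 0.0
    h0, w0 = int(np.floor(h)), int(np.floor(w))
    dh, dw = h - h0, w - w0
    val = 0.0
    # Four corners of the interpolation cell and their bilinear coefficients
    for hh, ww, coef in ((h0, w0, (1 - dh) * (1 - dw)),
                         (h0, w0 + 1, (1 - dh) * dw),
                         (h0 + 1, w0, dh * (1 - dw)),
                         (h0 + 1, w0 + 1, dh * dw)):
        if 0 <= hh < height and 0 <= ww < width:
            val += coef * img[hh, ww]
    return val

def strided_conv_pixel(img, kernel, p, s):
    """y_p = sum over p_n in R of w_{p_n} * x_{p*s + p_n}, for one output pixel.
    p = (p_x, p_y), s = (s_x, s_y); x is the column axis, y the row axis."""
    kh, kw = kernel.shape
    y = 0.0
    for n_y in range(kh):
        for n_x in range(kw):
            col = p[0] * s[0] + n_x  # element-wise p*s plus kernel offset
            row = p[1] * s[1] + n_y
            y += kernel[n_y, n_x] * bilinear(img, row, col)
    return y

img = np.array([[10., 2., 3.], [0., 1., 2.], [30., 5., 2.]])
kernel = np.array([[0.4, 0.2], [0.1, 0.9]])
print(strided_conv_pixel(img, kernel, p=(0, 0), s=(2.2, 2.4)))
print(strided_conv_pixel(img, kernel, p=(1, 1), s=(0.3, 0.6)))
```

For $p = (0,0)$ every sampling point is an integer location, so the interpolation reduces to plain indexing; for $p = (1,1)$ with these strides all four samples are fractional.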

### Backpropagation
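Suppose we have an output $y \in \mathbb{R}^{2 \times 2}$. Flattening $y$ into a vector, the gradient of the loss $L$ with respect to $y$ is the row vector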

$$
\frac{\partial L}{\partial y} = 
\begin{bmatrix}
\frac{\partial L}{\partial y_{0,0}} & \frac{\partial L}{\partial y_{1,0}} &
\frac{\partial L}{\partial y_{0,1}} & \frac{\partial L}{\partial y_{1,1}}
\end{bmatrix}
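$$

The Jacobian of $y$ (flattened in the same order) with respect to the strides is the $4 \times 2$ matrix

$$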
\frac{\partial y}{\partial s} = 
\begin{bmatrix}
\frac{\partial y_{0,0}}{\partial s_{x}} & \frac{\partial y_{0,0}}{\partial s_{y}} \\
\frac{\partial y_{1,0}}{\partial s_{x}} & \frac{\partial y_{1,0}}{\partial s_{y}} \\
\frac{\partial y_{0,1}}{\partial s_{x}} & \frac{\partial y_{0,1}}{\partial s_{y}} \\
\frac{\partial y_{1,1}}{\partial s_{x}} & \frac{\partial y_{1,1}}{\partial s_{y}} \\
\end{bmatrix}
$$
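The two pieces combine through the chain rule: the $1 \times 4$ row vector $\frac{\partial L}{\partial y}$ times the $4 \times 2$ Jacobian $\frac{\partial y}{\partial s}$ gives the $1 \times 2$ gradient with respect to the strides, with the sums running over the four output pixels $p \in \{(0,0),(1,0),(0,1),(1,1)\}$:

$$
\frac{\partial L}{\partial s} = \frac{\partial L}{\partial y}\,\frac{\partial y}{\partial s}
= \begin{bmatrix}
\sum_{p} \frac{\partial L}{\partial y_{p}} \frac{\partial y_{p}}{\partial s_{x}} &
\sum_{p} \frac{\partial L}{\partial y_{p}} \frac{\partial y_{p}}{\partial s_{y}}
\end{bmatrix}
$$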

Since the kernel weights do not depend on $s$, each entry is a weighted sum of derivatives of interpolated input values; for example,
$$
\frac{\partial y_{0,0}}{\partial s_{x}} = \sum_{p_n \in R} w_{p_n} \frac{\partial x_{(0,0)*s + p_n}}{\partial s_{x}}
$$
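To make the inner derivative explicit, write the sampling location as $u = p*s + p_n$ with $p = (p_x, p_y)$ and $p_n = (p_{n,x}, p_{n,y})$, so that $u_x = p_x s_x + p_{n,x}$ and $\partial u_x / \partial s_x = p_x$. By the chain rule,

$$
\frac{\partial x_{p*s + p_n}}{\partial s_{x}} = p_x \left.\frac{\partial x}{\partial u_x}\right|_{u = p*s + p_n}
\quad\Longrightarrow\quad
\frac{\partial y_{p}}{\partial s_{x}} = p_x \sum_{p_n \in R} w_{p_n} \left.\frac{\partial x}{\partial u_x}\right|_{u = p*s + p_n}
$$

where $\partial x / \partial u_x$ is the derivative of the bilinear interpolant along $x$, the quantity `get_coordinate_weight` computes (at integer coordinates the interpolant has a kink, and that code takes the one-sided derivative on the cell starting at the floor). The same holds for $s_y$ with the factor $p_y$. In particular, for $p = (0,0)$ the sampling locations do not depend on $s$, so $\partial y_{0,0} / \partial s_x = \partial y_{0,0} / \partial s_y = 0$.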

```python
import numpy as np
```

```python
# Same logic as get_coordinate_weight in the original MMCV implementation,
# ported to Python. It returns the derivative of the bilinearly interpolated
# value of im_data at (argmax_h, argmax_w) with respect to argmax_h
# (bp_dir == 0) or argmax_w (bp_dir == 1).
# data_width is unused: in CUDA it is the row stride of the flattened image,
# while here im_data is a 2D array.

def get_coordinate_weight(argmax_h, argmax_w, height, width, im_data, data_width, bp_dir):
    # Sampling points outside (-1, height) x (-1, width) receive no gradient
    if argmax_h <= -1 or argmax_h >= height or argmax_w <= -1 or argmax_w >= width:
        return 0

    # Integer corners of the interpolation cell containing the sampling point
    argmax_h_low = int(np.floor(argmax_h))
    argmax_w_low = int(np.floor(argmax_w))
    argmax_h_high = argmax_h_low + 1
    argmax_w_high = argmax_w_low + 1

    weight = 0

    if bp_dir == 0:
        # Derivative along h: top corners enter with -, bottom corners with +
        if argmax_h_low >= 0 and argmax_w_low >= 0:
            weight += -1 * (argmax_w_low + 1 - argmax_w) * im_data[argmax_h_low, argmax_w_low]
        if argmax_h_low >= 0 and argmax_w_high <= width - 1:
            weight += -1 * (argmax_w - argmax_w_low) * im_data[argmax_h_low, argmax_w_high]
        if argmax_h_high <= height - 1 and argmax_w_low >= 0:
            weight += (argmax_w_low + 1 - argmax_w) * im_data[argmax_h_high, argmax_w_low]
        if argmax_h_high <= height - 1 and argmax_w_high <= width - 1:
            weight += (argmax_w - argmax_w_low) * im_data[argmax_h_high, argmax_w_high]

    elif bp_dir == 1:
        # Derivative along w: left corners enter with -, right corners with +
        if argmax_h_low >= 0 and argmax_w_low >= 0:
            weight += -1 * (argmax_h_low + 1 - argmax_h) * im_data[argmax_h_low, argmax_w_low]
        if argmax_h_low >= 0 and argmax_w_high <= width - 1:
            weight += (argmax_h_low + 1 - argmax_h) * im_data[argmax_h_low, argmax_w_high]
        if argmax_h_high <= height - 1 and argmax_w_low >= 0:
            weight += -1 * (argmax_h - argmax_h_low) * im_data[argmax_h_high, argmax_w_low]
        if argmax_h_high <= height - 1 and argmax_w_high <= width - 1:
            weight += (argmax_h - argmax_h_low) * im_data[argmax_h_high, argmax_w_high]

    return weight
```
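For reference, the function above differentiates the bilinear interpolant. Writing $(u, v)$ for (`argmax_h`, `argmax_w`), $\Delta u = u - \lfloor u \rfloor$, $\Delta v = v - \lfloor v \rfloor$, and $x_{ab}$ for the image value at row $\lfloor u \rfloor + a$, column $\lfloor v \rfloor + b$ (taken as zero outside the image):

$$
\begin{aligned}
x(u, v) &= (1-\Delta u)(1-\Delta v)\,x_{00} + (1-\Delta u)\,\Delta v\,x_{01} + \Delta u\,(1-\Delta v)\,x_{10} + \Delta u\,\Delta v\,x_{11} \\
\frac{\partial x}{\partial u} &= -(1-\Delta v)\,x_{00} - \Delta v\,x_{01} + (1-\Delta v)\,x_{10} + \Delta v\,x_{11} \\
\frac{\partial x}{\partial v} &= -(1-\Delta u)\,x_{00} + (1-\Delta u)\,x_{01} - \Delta u\,x_{10} + \Delta u\,x_{11}
\end{aligned}
$$

The second line is the `bp_dir == 0` branch and the third the `bp_dir == 1` branch; for instance, $1 - \Delta v$ is the factor `argmax_w_low + 1 - argmax_w`.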


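```python
# Computes sum_{p_n} w_{p_n} * dx/du at the sampling points of the output
# pixel y[h_out, w_out], where u is the sampling coordinate in the chosen
# direction. This agrees with my implementation in CUDA.
# For direction 1 (y) the sampling coordinate is h_out * s_y + j, so
# dy/ds_y is this value multiplied by h_out; for direction 0 (x) it is
# w_out * s_x + i, so dy/ds_x is this value multiplied by w_out. Both
# factors are 0 for the output pixel (0, 0) used here.

input_image = np.array([[[[10., 2., 3.], [0., 1., 2.], [30., 5., 2.]]]])
im = input_image[0, 0, :, :]  # first batch element, first channel

# Output pixel
w_out = 0
h_out = 0

# Learnable strides
s_x = 2.2
s_y = 2.4

# Top-left sampling position of the kernel for output pixel (h_out, w_out)
w_in = w_out * s_x - 0
h_in = h_out * s_y - 0

# Input dimensions
height = 3
width = 3

val = 0

# Kernel values
weight = np.array([0.4, 0.2, 0.1, 0.9]).reshape(2, 2)

# 0 for x, 1 for y
direction = 1

# Locations (i, j) of the 2x2 kernel
k = [0, 1, 0, 1]
l = [0, 0, 1, 1]
for i, j in zip(k, l):
    inv_h = h_in + j
    inv_w = w_in + i
    print('pos ', inv_h, inv_w)

    # Points outside the image are moved to (-2, -2), where the gradient is 0
    if inv_h <= -1 or inv_w <= -1 or inv_h >= height or inv_w >= width:
        inv_h = inv_w = -2

    # The x/y axes are indexed differently in the C++ code, so inv_w is
    # passed as the first coordinate (argmax_h) and inv_h as the second
    val += weight[i, j] * get_coordinate_weight(inv_w, inv_h, height, width, im, 0, direction)

print(val)
```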

```
pos  0.0 0.0
pos  0.0 1.0
pos  1.0 0.0
pos  1.0 1.0
-2.0
```
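As a sanity check of the chain rule, the sketch below compares an analytic stride gradient $\partial y_p / \partial s$ with central finite differences for an output pixel with nonzero coordinates. It is self-contained and uses its own conventions rather than the CUDA layout (assumptions: the image is indexed `img[row, col]`, $x$ is the column axis and $y$ the row axis, zero padding). The helper names `sample_and_grad` and `pixel_and_stride_grad`, the pixel $p = (1,1)$ and the strides $(0.3, 0.6)$ are hypothetical choices; the strides are picked so that no sampling point lies on an integer coordinate, where the bilinear interpolant is not differentiable.

```python
import numpy as np

def sample_and_grad(img, h, w):
    """Bilinear value at (row h, column w) and its derivatives d/dh, d/dw.
    Corners outside the image are treated as zero (zero padding)."""
    height, width = img.shape
    h0, w0 = int(np.floor(h)), int(np.floor(w))
    dh, dw = h - h0, w - w0

    def px(r, c):
        return img[r, c] if 0 <= r < height and 0 <= c < width else 0.0

    x00, x01 = px(h0, w0), px(h0, w0 + 1)
    x10, x11 = px(h0 + 1, w0), px(h0 + 1, w0 + 1)
    val = (1 - dh) * (1 - dw) * x00 + (1 - dh) * dw * x01 + dh * (1 - dw) * x10 + dh * dw * x11
    d_h = -(1 - dw) * x00 - dw * x01 + (1 - dw) * x10 + dw * x11
    d_w = -(1 - dh) * x00 + (1 - dh) * x01 - dh * x10 + dh * x11
    return val, d_h, d_w

def pixel_and_stride_grad(img, kernel, p, s):
    """Return y_p, dy_p/ds_x, dy_p/ds_y for p = (p_x, p_y), s = (s_x, s_y)."""
    y, g_x, g_y = 0.0, 0.0, 0.0
    kh, kw = kernel.shape
    for n_y in range(kh):
        for n_x in range(kw):
            row = p[1] * s[1] + n_y  # u_y = p_y * s_y + n_y
            col = p[0] * s[0] + n_x  # u_x = p_x * s_x + n_x
            val, d_row, d_col = sample_and_grad(img, row, col)
            y += kernel[n_y, n_x] * val
            g_x += kernel[n_y, n_x] * p[0] * d_col  # du_x/ds_x = p_x
            g_y += kernel[n_y, n_x] * p[1] * d_row  # du_y/ds_y = p_y
    return y, g_x, g_y

img = np.array([[10., 2., 3.], [0., 1., 2.], [30., 5., 2.]])
kernel = np.array([[0.4, 0.2], [0.1, 0.9]])
p, s, eps = (1, 1), (0.3, 0.6), 1e-6

_, g_x, g_y = pixel_and_stride_grad(img, kernel, p, s)
fd_x = (pixel_and_stride_grad(img, kernel, p, (s[0] + eps, s[1]))[0]
        - pixel_and_stride_grad(img, kernel, p, (s[0] - eps, s[1]))[0]) / (2 * eps)
fd_y = (pixel_and_stride_grad(img, kernel, p, (s[0], s[1] + eps))[0]
        - pixel_and_stride_grad(img, kernel, p, (s[0], s[1] - eps))[0]) / (2 * eps)
print(g_x, fd_x)
print(g_y, fd_y)

# For p = (0, 0) the sampling points do not move with s, so both gradients are 0
print(pixel_and_stride_grad(img, kernel, (0, 0), (2.2, 2.4))[1:])
```

The analytic and finite-difference values in each printed pair should agree up to round-off, and the last line shows both stride gradients vanishing at $p = (0,0)$, because the sampling points of that pixel do not move with $s$.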