<h3> Computer Vision building blocks </h3>
<p> I intend to use this notebook to implement the basic building blocks of computer vision algorithms. I often forget what an implementation of a convolution looks like, or what erosion and dilation actually do under the hood. I hope to implement some of these building blocks here and use them as a reference to come back to whenever I need to refresh a concept. </p>

<h4> Convolution </h4>

In [43]:
import numpy as np
import cv2
import math
np.set_printoptions(suppress=True)  # suppress scientific notation

In [56]:
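def convolution(input_matrix, conv_filter, stride=1, padding='valid'):
    """
    Slide conv_filter over input_matrix and, at each position, sum the
    element-wise products. The filter is not flipped, so strictly speaking
    this is cross-correlation (what most deep learning libraries call convolution).
    """
    input_matrix = np.asarray(input_matrix)
    conv_filter = np.asarray(conv_filter)
    filter_height, filter_width = conv_filter.shape
    if padding == 'same':  # zero-pad so that, at stride 1, output size == input size
        pad_h, pad_w = filter_height - 1, filter_width - 1
        input_matrix = np.pad(input_matrix, ((pad_h // 2, pad_h - pad_h // 2),
                                             (pad_w // 2, pad_w - pad_w // 2)))
    output_height = (input_matrix.shape[0] - filter_height) // stride + 1
    output_width = (input_matrix.shape[1] - filter_width) // stride + 1
    output = np.zeros((output_height, output_width))
    for i in range(output_height):
        for j in range(output_width):
            top, left = i * stride, j * stride  # top-left corner of the current window
            window = input_matrix[top:top + filter_height, left:left + filter_width]
            output[i, j] = np.sum(window * conv_filter)  # element-wise product, then sum

    return output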
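
The loop above slides the filter without flipping it. As a minimal sketch of the difference between that operation and convolution in the signal-processing sense, the snippet below flips the kernel on both axes before sliding it. The helper names `cross_correlate` and `true_convolve` are made up for this illustration, and `sliding_window_view` assumes NumPy 1.20 or newer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def cross_correlate(image, kernel):
    """'valid' cross-correlation: slide the kernel over the image without flipping it."""
    # windows has shape (H - kh + 1, W - kw + 1, kh, kw)
    windows = sliding_window_view(image, kernel.shape)
    # Multiply every window by the kernel element-wise and sum over the window axes.
    return np.einsum('ijkl,kl->ij', windows, kernel)


def true_convolve(image, kernel):
    """'valid' convolution: cross-correlation with the kernel flipped on both axes."""
    return cross_correlate(image, np.flip(kernel))


image = np.arange(1, 17).reshape(4, 4)
kernel = np.array([[1, 0], [0, -1]])  # asymmetric, so flipping changes the result

print(cross_correlate(image, kernel))  # every entry is top-left minus bottom-right
print(true_convolve(image, kernel))    # same magnitudes, opposite sign
```

For a symmetric kernel such as an all-ones box filter the two functions agree, which is why the distinction is easy to forget.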

In [92]:
def max_pooling(input_matrix, pool_size, stride=1):
    """
    Slide a pool_size x pool_size window over input_matrix in steps of
    stride and keep the maximum of each window.
    """
    input_matrix = np.asarray(input_matrix)
    output_height = (input_matrix.shape[0] - pool_size) // stride + 1
    output_width = (input_matrix.shape[1] - pool_size) // stride + 1
    output = np.zeros((output_height, output_width))
    for i in range(output_height):
        for j in range(output_width):
            top, left = i * stride, j * stride  # top-left corner of the current window
            output[i, j] = np.max(input_matrix[top:top + pool_size, left:left + pool_size])

    return output

In [109]:
A = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
conv_filter = np.array([[1, 1], [1, 1]])
# convolution(A, conv_filter)
max_pooling(A, pool_size=2, stride=2)

array([[ 6.,  8.],
       [14., 16.]])
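
When the stride equals the pool size, the windows do not overlap, and max pooling can be written as a reshape followed by a max over the block axes. This is only a sketch of that special case; the function name `max_pool_non_overlapping` is made up for this note, and rows or columns that do not fill a whole block are simply cropped.

```python
import numpy as np


def max_pool_non_overlapping(x, pool_size):
    """Max pooling for the special case stride == pool_size."""
    height, width = x.shape
    out_height, out_width = height // pool_size, width // pool_size
    # Drop trailing rows/columns that do not fill a complete block.
    x = x[:out_height * pool_size, :out_width * pool_size]
    # blocks[i, p, j, q] == x[i * pool_size + p, j * pool_size + q]
    blocks = x.reshape(out_height, pool_size, out_width, pool_size)
    # Reduce over the two within-block axes.
    return blocks.max(axis=(1, 3))


A = np.arange(1, 17).reshape(4, 4)
print(max_pool_non_overlapping(A, 2))
```

On the 4 x 4 matrix `A`, the four 2 x 2 blocks have maxima 6, 8, 14 and 16.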

In [99]:
arr = np.random.randint(0, 100, (4, 4))
arr[0:2, 0:2]

array([[35, 90],
       [22, 96]])

<h4> Geometry </h4>
<p>
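    Projective space: an extension of Euclidean space that adds points at infinity, so that any two distinct lines in a plane meet at exactly one point (parallel lines meet at a point at infinity).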
</p>
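
To make the definition concrete, here is a small sketch using homogeneous coordinates in the projective plane. A point (x, y) is written as (x, y, 1), a line ax + by + c = 0 as (a, b, c), and both "line through two points" and "intersection of two lines" become cross products. The helper names `line_through` and `intersect` are made up for this note.

```python
import numpy as np


def line_through(p, q):
    """Homogeneous line (a, b, c), with ax + by + c = 0, through 2D points p and q."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])


def intersect(line1, line2):
    """Homogeneous intersection point (x, y, w) of two lines."""
    return np.cross(line1, line2)


diagonal = line_through((0, 0), (1, 1))       # y = x
anti_diagonal = line_through((0, 2), (2, 0))  # y = 2 - x
shifted = line_through((0, 1), (1, 2))        # y = x + 1, parallel to `diagonal`

crossing = intersect(diagonal, anti_diagonal)
print(crossing / crossing[2])  # divide by w to get the ordinary point (1, 1)

at_infinity = intersect(diagonal, shifted)
print(at_infinity)  # w == 0: a point at infinity along the direction (1, 1)
```

The parallel pair still has an intersection; it just has w = 0, so it cannot be divided back into an ordinary Euclidean point.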

<h4> Distance metrics </h4>

<h4> Image transformation </h4>
References: <br>
https://docs.opencv.org/3.4/d4/d61/tutorial_warp_affine.html <br>
https://docs.opencv.org/4.x/dd/d52/tutorial_js_geometric_transformations.html

In [35]:
def get_rotation_matrix(img_center, degrees, scale):
    """Build the same 2x3 matrix as cv2.getRotationMatrix2D; img_center is (x, y)."""
    alpha = scale * math.cos(math.radians(degrees))  # scale * cos(theta)
    beta = scale * math.sin(math.radians(degrees))   # scale * sin(theta)
    center_x, center_y = img_center
    return np.array([[alpha, beta, (1 - alpha) * center_x - beta * center_y],
                     [-beta, alpha, beta * center_x + (1 - alpha) * center_y]])
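
The translation column in that matrix is easier to remember as a composition: move the center to the origin, rotate and scale, then move back. The sketch below builds the three steps as 3x3 homogeneous matrices with NumPy only; `rotation_about_center` is a name made up for this note, not an OpenCV function.

```python
import math
import numpy as np


def rotation_about_center(center, degrees, scale):
    """Compose translate(-center), rotate/scale, translate(+center) into a 2x3 matrix."""
    center_x, center_y = center
    alpha = scale * math.cos(math.radians(degrees))
    beta = scale * math.sin(math.radians(degrees))
    to_origin = np.array([[1.0, 0.0, -center_x],
                          [0.0, 1.0, -center_y],
                          [0.0, 0.0, 1.0]])
    rotate_scale = np.array([[alpha, beta, 0.0],
                             [-beta, alpha, 0.0],
                             [0.0, 0.0, 1.0]])
    back = np.array([[1.0, 0.0, center_x],
                     [0.0, 1.0, center_y],
                     [0.0, 0.0, 1.0]])
    # The rightmost matrix is applied first; drop the constant last row at the end.
    return (back @ rotate_scale @ to_origin)[:2]


matrix = rotation_about_center((128, 128), 45, 1.0)
print(matrix)
print(matrix @ np.array([128, 128, 1]))  # the center is a fixed point of the transform
```

Multiplying the three matrices out by hand gives the closed form `(1 - alpha) * x - beta * y` and `beta * x + (1 - alpha) * y` for the last column.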

In [48]:
def get_unscaled_rotation_matrix(degrees):
    """Standard 2x2 rotation matrix (counter-clockwise when the y axis points up)."""
    cos_theta = math.cos(math.radians(degrees))
    sin_theta = math.sin(math.radians(degrees))
    return np.array([[cos_theta, -sin_theta], [sin_theta, cos_theta]])
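
A quick way to check a 2x2 rotation matrix is to apply it to a known point and to confirm that it is orthonormal. The sketch below does both for a 90 degree rotation; `rotation_2d` is a stand-alone helper written for this note.

```python
import math
import numpy as np


def rotation_2d(degrees):
    """Standard rotation matrix [[cos, -sin], [sin, cos]]."""
    theta = math.radians(degrees)
    return np.array([[math.cos(theta), -math.sin(theta)],
                     [math.sin(theta), math.cos(theta)]])


R = rotation_2d(90)
point = np.array([1.0, 0.0])
rotated = R @ point
print(np.round(rotated, 6))  # the unit x vector ends up on the y axis

# A rotation preserves lengths and angles: R^T R = I and det(R) = 1.
print(np.allclose(R.T @ R, np.eye(2)), np.isclose(np.linalg.det(R), 1.0))
```

With the y axis pointing up this is a counter-clockwise rotation. Image coordinates have y pointing down, so the same matrix looks clockwise on screen, which is why the 2x3 OpenCV-style matrix above has the signs of its sine terms swapped.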

In [50]:
img = cv2.imread('dog.jpeg')
center = (img.shape[1] // 2, img.shape[0] // 2)  # (x, y) order: img.shape is (rows, cols, channels)
degrees = 45
scale = 1.0
rot_mat = cv2.getRotationMatrix2D(center, degrees, scale)
print('OpenCV rotation matrix')
print(rot_mat)
rot_mat = get_rotation_matrix(center, degrees, scale)
print('My rotation matrix')
print(rot_mat)
print('Unscaled rotation matrix')
rot_mat = get_unscaled_rotation_matrix(degrees)
print(rot_mat)

OpenCV rotation matrix
[[  0.70710678   0.70710678 -53.01933598]
 [ -0.70710678   0.70710678 128.        ]]
My rotation matrix
[[  0.70710678   0.70710678 -53.01933598]
 [ -0.70710678   0.70710678 128.        ]]
Unscaled rotation matrix
[[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]


In [53]:
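# Rotate an image about its center by the given angle (positive = counter-clockwise)
def rotate(img_path, degrees, scale):
    img = cv2.imread(img_path)
    if img is None:  # cv2.imread returns None instead of raising when the file cannot be read
        raise FileNotFoundError(f'Could not read image: {img_path}')
    height, width = img.shape[:2]
    center = (width // 2, height // 2)  # OpenCV expects (x, y)
    rotation_matrix = cv2.getRotationMatrix2D(center, degrees, scale)
    rotated_img = cv2.warpAffine(img, rotation_matrix, (width, height))  # dsize is (width, height)
    cv2.imwrite('rotated_img.jpg', rotated_img)
    return rotated_img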
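
Because the output canvas has the same size as the input, the corners of the image are cut off for angles that are not multiples of 90 degrees, and the uncovered areas are filled with black. One common workaround, sketched here under the assumption that the whole rotated image should stay visible, is to enlarge the canvas to the bounding box of the rotated image and shift the translation accordingly. The function `rotation_without_cropping` is hypothetical and uses NumPy only.

```python
import math
import numpy as np


def rotation_without_cropping(width, height, degrees):
    """Return a 2x3 rotation matrix and the (width, height) canvas that fits the result."""
    center_x, center_y = width / 2.0, height / 2.0
    cos_t = math.cos(math.radians(degrees))
    sin_t = math.sin(math.radians(degrees))
    # Same layout as the matrix from cv2.getRotationMatrix2D with scale = 1.
    matrix = np.array([[cos_t, sin_t, (1 - cos_t) * center_x - sin_t * center_y],
                       [-sin_t, cos_t, sin_t * center_x + (1 - cos_t) * center_y]])
    # Bounding box of the rotated rectangle; round first to absorb float noise.
    new_width = math.ceil(round(height * abs(sin_t) + width * abs(cos_t), 6))
    new_height = math.ceil(round(height * abs(cos_t) + width * abs(sin_t), 6))
    # Shift so the old center lands on the center of the new canvas.
    matrix[0, 2] += new_width / 2.0 - center_x
    matrix[1, 2] += new_height / 2.0 - center_y
    return matrix, (new_width, new_height)


matrix, (new_width, new_height) = rotation_without_cropping(256, 256, 45)
print(new_width, new_height)
```

The returned matrix and size could then be passed to `cv2.warpAffine(img, matrix, (new_width, new_height))` in place of the original arguments.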

In [54]:
rotate('dog.jpeg', degrees=-45, scale=1.0)

array([[[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       ...,

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]], dtype=uint8)