Helin Aslı Aksoy
150200705

Running time is about 15-20 minutes.


# Homework 4

Please read the instructions before starting.

- Only use array manipulation functions from ```numpy```. (Similar to last homework)
- You can use ```PIL``` for reading images and ```ipywidgets``` and ```display``` to display them.
- Use ```numpy``` operations and arrays as much as possible for performance criteria. Try to avoid using for-loops as they will drastically slow down your implementations for large-scale images. Slow implementations will have a penalty during grading.
- You can overwrite the template as long as the above conditions are not violated and the functionality is kept the same. Keep in mind that you will **only** submit the ```hw4.ipynb``` notebook and the ```previous_homework.py``` file.

Fill in the marked areas in the cells for each question.

- - -


 
## Question 1 [80pt]

In this question, you will implement the Lucas-Kanade optical flow algorithm.

- We begin with calculating $I_x$, $I_y$, and $I_t$.
- For each window, formulate the least-squares problem from the constraint $I_x u + I_y v + I_t = 0$.
- Ignore ill-conditioned pixels, i.e. those where the minimum eigenvalue of the covariance matrix $A^TA$ is small. Here $A$ denotes the matrix $[I_x, I_y]$ built from the points in a window $\mathcal{W}_p$ around pixel $p$.
- Solve the Least-Squares equations for $u$ and $v$.
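
The steps above can be sketched without explicit loops. Stacking the constraint over a window gives the normal equations $A^TA\,[u, v]^T = -A^T I_t$, and every entry of $A^TA$ and $A^T I_t$ is a windowed sum that can be computed for all pixels at once. The following is a minimal sketch, not the reference solution: the helper names `box_sum` and `lucas_kanade_vectorized` are hypothetical, the window size is assumed to be odd, and windows near the border are zero-padded.

```python
import numpy as np


def box_sum(img: np.ndarray, k: int) -> np.ndarray:
    """Sum of img over a k x k window centred on each pixel (k odd, zero padding)."""
    r = k // 2
    h, w = img.shape
    padded = np.pad(img.astype(np.float64), r, mode="constant")
    # Integral image with an extra leading row and column of zeros
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    return ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]


def lucas_kanade_vectorized(Ix, Iy, It, window_size=15, threshold=1e-3):
    """Per-pixel flow of shape (H, W, 2); zero where A^T A is ill-conditioned."""
    Ix, Iy, It = (a.astype(np.float64) for a in (Ix, Iy, It))
    # Entries of A^T A and A^T I_t for every window
    Sxx = box_sum(Ix * Ix, window_size)
    Sxy = box_sum(Ix * Iy, window_size)
    Syy = box_sum(Iy * Iy, window_size)
    Sxt = box_sum(Ix * It, window_size)
    Syt = box_sum(Iy * It, window_size)

    # Smaller eigenvalue of the symmetric 2x2 matrix [[Sxx, Sxy], [Sxy, Syy]]
    trace = Sxx + Syy
    det = Sxx * Syy - Sxy ** 2
    lam_min = trace / 2 - np.sqrt(np.maximum(trace ** 2 / 4 - det, 0.0))

    ok = lam_min > threshold
    safe_det = np.where(ok, det, 1.0)  # avoid dividing by ~0 at rejected pixels
    # Cramer's rule for A^T A [u, v]^T = -A^T I_t
    u = np.where(ok, (-Syy * Sxt + Sxy * Syt) / safe_det, 0.0)
    v = np.where(ok, (Sxy * Sxt - Sxx * Syt) / safe_det, 0.0)
    return np.stack([u, v], axis=-1)


# Usage on synthetic derivatives that satisfy the constraint for flow (0.5, -0.25)
rng = np.random.default_rng(0)
Ix = rng.standard_normal((20, 20))
Iy = rng.standard_normal((20, 20))
It = -(Ix * 0.5 + Iy * -0.25)
flow = lucas_kanade_vectorized(Ix, Iy, It, window_size=5, threshold=1e-3)
print(flow[10, 10])
```

Using an integral image keeps the cost independent of the window size, which is one way to satisfy the performance criterion with `numpy` operations only.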

In [14]:
import cv2
import numpy as np


def video_to_numpy(path: str) -> np.ndarray:
    """ Convert the video frames into a numpy array

    Args:
        path (str): path of the video

    Returns:
        np.ndarray: 3D numpy array of shape (T, H, W)
    """
    cap = cv2.VideoCapture(path)

    frames = []

    ret, frame = cap.read()
    while ret:
        grayFrame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        img_arr = grayFrame[::3, ::3]
        frames.append(img_arr)
        ret, frame = cap.read()

    # Release the capture handle once all frames have been read
    cap.release()

    return np.stack(frames).astype(np.uint8)


In [15]:
from typing import Tuple
import numpy as np

from previous_homework import sobel_horizontal, sobel_vertical, gaussian_filter

# Determine a value
WINDOW_SIZE = 15
THRESHOLD =  0.001

image_sequence = video_to_numpy("video.mp4")
u_sequence = np.zeros(image_sequence.shape, dtype=np.float32)
v_sequence = np.zeros(image_sequence.shape, dtype=np.float32)


def derivatives(img: np.ndarray, next_img: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """ Calculate derivative images.

    Args:
        img (np.ndarray): 2D gray video frame of shape (H, W)
        next_img (np.ndarray): 2D next gray video frame of shape (H, W)

    Returns:
        Tuple[np.ndarray, np.ndarray, np.ndarray]:
            - x derivative I_x of shape (H, W)
            - y derivative I_y of shape (H, W)
            - temporal derivative I_t of shape (H, W)
    """
    Ix = sobel_horizontal(img)
    Iy = sobel_vertical(img)
    # Frames are uint8, so cast before subtracting to avoid wrap-around
    It = next_img.astype(np.float32) - img.astype(np.float32)

    return Ix, Iy, It


def lucas_kanade(x_derivative: np.ndarray,
                 y_derivative: np.ndarray,
                 time_derivative: np.ndarray,
                 window_size: int,
                 threshold: float
                 ) -> np.ndarray:
    """ Lucas Kanade optical flow for single frame transition

    Args:
        x_derivative (np.ndarray): x derivative I_x of shape (H, W)
        y_derivative (np.ndarray): y derivative I_y of shape (H, W)
        time_derivative (np.ndarray): temporal derivative I_t of shape (H, W)
        window_size (int): Window size of W_p (square windows)
        threshold (float): Eigenvalue threshold of the covariance matrix A^T A

    Returns:
        np.ndarray: flow matrix of shape (H, W, 2) containing x and y flows.
    """
    half_window = window_size // 2  # half-width of the square window
    height, width = x_derivative.shape
    flow = np.zeros((height, width, 2))
    for i in range(height):
        for j in range(width):
            # Clip the window at the borders; the +1 keeps it centred on (i, j)
            top = max(0, i - half_window)
            bottom = min(height, i + half_window + 1)
            left = max(0, j - half_window)
            right = min(width, j + half_window + 1)
            Ix = x_derivative[top:bottom, left:right]
            Iy = y_derivative[top:bottom, left:right]
            It = time_derivative[top:bottom, left:right]

            Ix = Ix.reshape(-1, 1)
            Iy = Iy.reshape(-1, 1)
            b = -It.reshape(-1, 1)

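            A = np.hstack((Ix, Iy)).astype(np.float64)
            ATA = A.T.dot(A)
            # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
            if np.linalg.eigvalsh(ATA)[0] < threshold:
                flow[i, j] = [0, 0]
            else:
                # Solve the normal equations directly instead of forming the inverse
                flow[i, j] = np.linalg.solve(ATA, A.T.dot(b)).reshape(2)

    return flow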


for index in range(len(image_sequence) - 1):

    x_derivative, y_derivative, time_derivative = derivatives(
        image_sequence[index], image_sequence[index + 1])

    uv_values = lucas_kanade(
        x_derivative, y_derivative, time_derivative,
        window_size=WINDOW_SIZE, threshold=THRESHOLD)

    u_sequence[index] = uv_values[:, :, 0]
    v_sequence[index] = uv_values[:, :, 1]


### Run the cell below to visualize your implementation

In [16]:
from visualizers.flow import FlowRenderer

FlowRenderer(image_sequence,
             u_sequence,
             v_sequence)()



## Question 2 [20pt]

Write your answers under the questions in the cells below

- Why can we not reliably compute flow for windows $\mathcal{W}_p$ with small eigenvalues?

The flow at $p$ is obtained by solving $A^TA\,[u, v]^T = -A^T I_t$ (with $I_t$ stacked over the window), so the estimate depends on the inverse of $A^TA$. A small eigenvalue means that the image intensity barely changes along the corresponding eigenvector within the window. The matrix is then close to singular, and inverting it amplifies noise in the right-hand side by the reciprocal of that eigenvalue, so the flow component along that direction is essentially arbitrary.

Two cases occur. If both eigenvalues are small, the window is nearly uniform and contains no texture to track. If only one eigenvalue is small, the window contains a single edge, and motion along the edge leaves the appearance of the window unchanged.

Flow is therefore reliable only where both eigenvalues are large, that is, in windows with strong gradients in more than one direction. This is why pixels whose minimum eigenvalue falls below the threshold are skipped.
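
As a small numerical illustration of this answer, the sketch below compares a window with gradients in both directions against one whose gradients are almost entirely horizontal. The synthetic gradient matrices and the helper name `worst_case_gain` are assumptions made for illustration. The worst-case amplification of noise in $b = -I_t$ by the least-squares solution is $1/\sqrt{\lambda_{\min}}$, where $\lambda_{\min}$ is the minimum eigenvalue of $A^TA$.

```python
import numpy as np


def worst_case_gain(A: np.ndarray) -> float:
    """Largest factor by which noise in b can grow in the least-squares flow."""
    lam_min = np.linalg.eigvalsh(A.T @ A)[0]  # eigenvalues come in ascending order
    return float(1.0 / np.sqrt(lam_min))


rng = np.random.default_rng(0)
n = 15 * 15  # number of pixels in a 15 x 15 window

# Textured window: gradients vary independently in x and y
A_textured = rng.standard_normal((n, 2))
# Edge-like window: the y gradient is almost zero everywhere
A_edge = np.column_stack((rng.standard_normal(n), 1e-3 * rng.standard_normal(n)))

gain_textured = worst_case_gain(A_textured)
gain_edge = worst_case_gain(A_edge)
print(f"textured gain: {gain_textured:.4f}, edge gain: {gain_edge:.1f}")
```

The edge-like window amplifies noise by several orders of magnitude more than the textured one, which is the behaviour the eigenvalue threshold is meant to filter out.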

- Explain the aperture problem.

The aperture problem is the ambiguity that arises when motion is observed through a small window (the aperture). The constraint $I_x u + I_y v + I_t = 0$ is a single equation in two unknowns, so at one pixel it determines only the flow component along the image gradient, called the normal flow. If the window contains a single straight edge, all gradients in it are parallel and the component of motion along the edge cannot be observed: many different true motions produce the same local change.

The ambiguity is resolved by using a window that contains gradients in at least two directions, such as a corner or a textured region, so that $A^TA$ has full rank. Larger windows, coarse-to-fine estimation, or a global smoothness constraint are other ways to supply the missing information.
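
For optical flow, the aperture problem can be reproduced numerically with a window that contains a single straight edge. The sketch below is illustrative only: the edge orientation, the gradient magnitudes, and the true motion are made-up values. All gradients are parallel to one unit normal $n$, so $A$ has rank 1 and the minimum-norm least-squares solution recovers only the projection of the true motion onto $n$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unit normal of a 45-degree edge; every gradient in the window is parallel to it
normal = np.array([1.0, 1.0]) / np.sqrt(2.0)
magnitudes = rng.uniform(0.5, 2.0, size=25)  # gradient strength per pixel
A = magnitudes[:, None] * normal[None, :]    # rows are [I_x, I_y]

true_flow = np.array([1.0, 0.0])             # the edge actually moves to the right
b = A @ true_flow                            # b = -I_t under brightness constancy

# lstsq returns the minimum-norm solution when A is rank deficient
estimate, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)
normal_flow = (true_flow @ normal) * normal  # projection of the motion onto the normal

print("rank of A:", rank)
print("estimated flow:", estimate)
print("normal flow:   ", normal_flow)
```

The estimate matches the normal flow rather than the true motion, and any flow with the same projection onto $n$ would explain the data equally well.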

- Are image corners well-conditioned points for optical flow? Show your answer.

Yes. In the Lucas-Kanade setting, a point is well-conditioned when the matrix $A^TA$ of its window can be inverted stably, that is, when both of its eigenvalues are large.

At a corner, two edges with different orientations meet, so the window contains strong gradients in two independent directions. Writing $A^TA = \sum_{q \in \mathcal{W}_p} \nabla I(q)\,\nabla I(q)^T$, we have $d^T A^TA\, d = \sum_{q \in \mathcal{W}_p} (\nabla I(q) \cdot d)^2$ for any unit vector $d$. Because the gradients span two directions, no $d$ is orthogonal to all of them, so this sum is large for every $d$ and the minimum eigenvalue $\lambda_2 = \min_d d^T A^TA\, d$ is large. By contrast, an edge gives $\lambda_1 \gg \lambda_2 \approx 0$ and a flat region gives $\lambda_1 \approx \lambda_2 \approx 0$. The minimum eigenvalue of this same matrix is the score used by the Shi-Tomasi corner detector.

Corners can still be tracked poorly in practice, for example when the corner is occluded between frames or lies so close to the image border that the window is truncated.

In summary, image corners are well-conditioned points for optical flow because they are locations where the minimum eigenvalue of $A^TA$ is large.
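
The behaviour of corners can be checked on synthetic patches. The sketch below is a minimal illustration, assuming 15 x 15 binary patches and `np.gradient` as a stand-in for the Sobel filters from `previous_homework.py`; the helper name `window_eigenvalues` is hypothetical.

```python
import numpy as np


def window_eigenvalues(patch: np.ndarray) -> np.ndarray:
    """Eigenvalues (ascending) of A^T A for one window, with A = [I_x, I_y]."""
    Iy, Ix = np.gradient(patch.astype(np.float64))  # axis 0 is y, axis 1 is x
    A = np.column_stack((Ix.ravel(), Iy.ravel()))
    return np.linalg.eigvalsh(A.T @ A)


flat = np.zeros((15, 15))
edge = np.zeros((15, 15))
edge[:, 7:] = 1.0      # vertical step edge
corner = np.zeros((15, 15))
corner[7:, 7:] = 1.0   # bright quadrant forming a corner at the centre

for name, patch in [("flat", flat), ("edge", edge), ("corner", corner)]:
    print(name, window_eigenvalues(patch))
```

Only the corner patch has two clearly nonzero eigenvalues, so it is the only one of the three that would pass a minimum-eigenvalue threshold.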