In [None]:
%matplotlib inline
import cv2
import matplotlib.pyplot as plt
import numpy as np
from scipy import signal
from scipy.ndimage import gaussian_filter
from tqdm import tqdm

# Optical Flow

Throughout the class, we've been looking at many ways to analyze and extract spatial information from images. Now we'll briefly talk about how we can extract the temporal information in a sequence of images.

One way to get temporal information is through optical flow, which is the apparent motion of brightness patterns in the image. Ideally, optical flow would equal the motion field, which is the projection of real-world 3D motion onto the image plane. So by estimating the optical flow field, we can hopefully observe the real-world motion of objects present in the images.

In [None]:
vid_path = 'data/taxi.gif'

from IPython.display import Image
Image(open(vid_path, 'rb').read())

We will use `cv2.VideoCapture()` to load the data in the gif file, retrieve some properties of the sequence, and convert it to a sequence of grayscale images.

(2 points) Complete the cell below. You can refer to OpenCV's docs about the [`get()`](https://docs.opencv.org/4.6.0/d8/dfe/classcv_1_1VideoCapture.html#aa6480e6972ef4c00d74814ec841a2939) and [`read()`](https://docs.opencv.org/4.6.0/d8/dfe/classcv_1_1VideoCapture.html#a473055e77dd7faa4d26d686226b292c1) methods of the `VideoCapture` object.

In [None]:
gif = cv2.VideoCapture(vid_path)

num_frames = 
height = 
width = 

frames = []
for i in range(num_frames):
    # TODO
    frames.append(frame_gray)

plt.imshow(frames[0], cmap='gray')
plt.colorbar()
plt.show()

In the lecture, we saw that to calculate optical flow, we assume brightness constancy, small motion, and spatial coherence. These assumptions let us use gradients within a frame and between frames to determine how pixels move from one frame to the next.
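As a reminder of where these gradients come from (the standard derivation, writing $(u, v)$ for a pixel's displacement between consecutive frames): brightness constancy says a moving point keeps its intensity, and the small-motion assumption justifies a first-order Taylor expansion:

$$
I(x + u,\ y + v,\ t + 1) = I(x, y, t)
\quad\Longrightarrow\quad
I_x u + I_y v + I_t \approx 0
$$

This gives only one equation in two unknowns per pixel, which is why the spatial coherence assumption is needed as well.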

(3 points) Fill out the function below that takes two frames, computes the $\sum I_xI_x$, $\sum I_xI_y$, and $\sum I_yI_y$ terms for the second frame, and the $\sum I_xI_t$ and $\sum I_yI_t$ terms between the two frames.

For $I_x$ and $I_y$, use `np.gradient()` on the second frame; $I_t$ should be the pixel-wise difference between the two frames (second frame minus first). The products $I_xI_x$, $I_yI_y$, $I_xI_y$, $I_xI_t$, and $I_yI_t$ are computed with pixel-wise multiplication as usual. To compute the sums, at every pixel sum the products in a window weighted by Gaussian weights; specifically, use `scipy.ndimage.gaussian_filter()` with `sigma=3`.
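A minimal sketch of these steps, assuming the frames can be converted to float arrays; the name `windowed_gradient_products` is hypothetical. Two details are easy to get wrong: `np.gradient()` returns the derivative along axis 0 (rows, i.e. $I_y$) before the derivative along axis 1 (columns, i.e. $I_x$), and `gaussian_filter()` normalizes its kernel, so each "sum" is really a weighted average. Because every term is scaled by the same kernel, that scale cancels when solving for $[u, v]^T$.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def windowed_gradient_products(im1, im2, sigma=3):
    """Sketch: Gaussian-weighted sums of gradient products for two frames.

    Returns (sum IxIx, sum IxIy, sum IyIy, sum IxIt, sum IyIt), each with
    the same shape as the input frames.
    """
    im1 = np.asarray(im1, dtype=np.float64)
    im2 = np.asarray(im2, dtype=np.float64)

    # np.gradient returns the axis-0 (row, y) derivative first,
    # then the axis-1 (column, x) derivative.
    i_y, i_x = np.gradient(im2)

    # Temporal derivative: later frame minus earlier frame.
    i_t = im2 - im1

    def window_sum(a):
        # Gaussian-weighted sum over each pixel's neighborhood
        # (scipy normalizes the kernel, so this is a weighted average).
        return gaussian_filter(a, sigma=sigma)

    return (window_sum(i_x * i_x), window_sum(i_x * i_y), window_sum(i_y * i_y),
            window_sum(i_x * i_t), window_sum(i_y * i_t))
```

For example, for a horizontal intensity ramp `im1[y, x] = x` and `im2 = im1 + 1`, the sketch returns $\sum I_xI_x = \sum I_xI_t = 1$ everywhere and zeros for the other three terms.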

In [None]:
def get_gradients(im1, im2):
    """Computes the gradients needed to find optical flow between im2 and im1.
      
      Args:
      - im1: First frame of shape (height, width).
      - im2: Second frame of shape (height, width).
      Returns:
      - i_xx_sum: Array of shape (height, width) where each element represents
          the sum of I_x I_x values in a Gaussian window around the pixel.
      - i_xy_sum: ... I_x I_y ...
      - i_yy_sum: ... I_y I_y ...
      - i_xt_sum: ... I_x I_t ...
      - i_yt_sum: ... I_y I_t ...
    """
    # TODO
    
    return i_xx_sum, i_xy_sum, i_yy_sum, i_xt_sum, i_yt_sum

(1 point) Why do we sum the gradient terms in a window around each pixel? What assumption are we making here that allows us to do this?

In [None]:
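# Here we assume spatial coherence: neighboring pixels move together, so all
# pixels in the window share the same (u, v). A single pixel gives only one
# equation, I_x u + I_y v + I_t = 0, for two unknowns; summing over the
# window pools many such equations so that (u, v) can be solved for.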
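A small numeric sketch of this point, using made-up gradient values: a single pixel contributes one row $[I_x, I_y]$ to $A$, so $A^TA$ has rank 1 and cannot be inverted. Pooling a window helps only if the gradients inside it point in different directions; on a straight edge they are all parallel and $A^TA$ stays singular (the aperture problem). The helper `normal_matrix` is hypothetical.

```python
import numpy as np


def normal_matrix(i_x, i_y):
    """Return A^T A, where row k of A is [I_x[k], I_y[k]] for pixel k of a window."""
    a = np.stack([np.ravel(i_x), np.ravel(i_y)], axis=1)
    return a.T @ a


# A single pixel: one equation for two unknowns.
one_pixel = normal_matrix([0.8], [0.3])

# A 5x5 window on a vertical edge: every gradient points along x.
edge = normal_matrix(np.ones(25), np.zeros(25))

# A 5x5 window at a corner: some gradients point along x, others along y.
corner = normal_matrix(np.r_[np.ones(12), np.zeros(13)],
                       np.r_[np.zeros(12), np.ones(13)])

for name, m in [("one pixel", one_pixel), ("edge", edge), ("corner", corner)]:
    print(name, "rank:", np.linalg.matrix_rank(m))
```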

(4 points) Using those gradients, we can set up a least-squares problem whose $A^TA$ matrix has the same form as the second-moment matrix in the Harris corner detector. In the function below, fill in the missing lines to finish computing the optical flow.

Note that when solving for $[u, v]^T$, we don't use `np.linalg.lstsq()` or other solvers from an external library; we can solve for it directly, since from the slides $d=(A^TA)^{-1}(-A^Tb)$, after first checking whether $A^TA$ is invertible.
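Spelled out, with the Gaussian-weighted sums from `get_gradients()` as the entries, the $2\times 2$ system and its closed-form solution (using the standard $2\times 2$ inverse) are:

$$
A^TA = \begin{bmatrix} \sum I_xI_x & \sum I_xI_y \\ \sum I_xI_y & \sum I_yI_y \end{bmatrix},
\qquad
A^Tb = \begin{bmatrix} \sum I_xI_t \\ \sum I_yI_t \end{bmatrix}
$$

$$
\det(A^TA) = \sum I_xI_x \sum I_yI_y - \left(\sum I_xI_y\right)^2
$$

$$
\begin{bmatrix} u \\ v \end{bmatrix}
= \frac{1}{\det(A^TA)}
\begin{bmatrix} \sum I_yI_y & -\sum I_xI_y \\ -\sum I_xI_y & \sum I_xI_x \end{bmatrix}
\left(-\begin{bmatrix} \sum I_xI_t \\ \sum I_yI_t \end{bmatrix}\right)
$$

The matrix inverse exists exactly when $\det(A^TA) \neq 0$, which is the check performed before solving.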

In [None]:
def optical_flow(im1, im2, stride):
    """Computes the optical flow between frames im2 and im1.
    Args:
        im1: First frame of size [height, width].
        im2: Second frame of size [height, width].
        stride: Determines how dense you want the optical flow field to be.
    Returns:
        u: x component of the optical flow vector at every pixel.
        v: y component of the optical flow vector at every pixel.
    """

    # Calls our previous function to get the image gradients.
    height, width = im1.shape
    i_xx_sum, i_xy_sum, i_yy_sum, i_xt_sum, i_yt_sum = \
        get_gradients(im1, im2)

    # Create containers for storing the u, v components of optical flow
    # vector for each pixel.
    u = np.zeros(im1.shape)
    v = np.zeros(im2.shape)

    # Visit every `stride`-th pixel along each axis.
    for i in range(0, height, stride):
        for j in range(0, width, stride):
            # Find the A^T @ A matrix for this pixel (see slides).
            # TODO
            
            # Find the A^T @ b matrix for this pixel (see slides).
            # TODO

            # Compute the determinant of the A^T @ A matrix; if it is zero,
            # move on to the next pixel.
            # TODO
            if det == 0:
                continue

            # Directly solve for the u, v terms.
            # TODO

    return u, v

We can now test it on the image sequence we extracted from the gif file. If it takes too long, you can change the stride from 1 to 2. (For me, it takes around 31 seconds with `stride=1`.)
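If the per-pixel loop is slow, one option (not what the function above asks for) is to evaluate the closed-form $2\times 2$ solution for every pixel at once with NumPy. This is a minimal sketch assuming the five summed arrays come from `get_gradients()`; the helper name `solve_flow_vectorized` and the small threshold `eps`, used here instead of the exact `det == 0` test, are assumptions for illustration.

```python
import numpy as np


def solve_flow_vectorized(sxx, sxy, syy, sxt, syt, eps=1e-9):
    """Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = -[sxt, syt] at every pixel.

    Pixels whose determinant magnitude is <= eps get u = v = 0.
    """
    det = sxx * syy - sxy ** 2
    ok = np.abs(det) > eps
    # Replace near-zero determinants by 1 so the division never warns;
    # those pixels are zeroed out by np.where anyway.
    safe_det = np.where(ok, det, 1.0)
    u = np.where(ok, (-syy * sxt + sxy * syt) / safe_det, 0.0)
    v = np.where(ok, (sxy * sxt - sxx * syt) / safe_det, 0.0)
    return u, v

# Possible use with the notebook's own function:
# u, v = solve_flow_vectorized(*get_gradients(frames[i-1], frames[i]))
```

The result should closely match `optical_flow(..., stride=1)` except at pixels where `eps` and the exact zero test disagree. Since the determinant scales with image intensity squared, `eps` is a tunable assumption rather than a principled constant.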

In [None]:
us = []
vs = []

for i in tqdm(range(1, num_frames)):
    u, v = optical_flow(frames[i-1], frames[i], 1)
    us.append(u)
    vs.append(v)

Let's visualize the magnitude of the optical flow field. The cell below computes the optical flow magnitude at every pixel for each frame pair, scales it to the range 0 to 255, and saves the result as a gif file.

In [None]:
ms = np.sqrt(np.array(us)**2 + np.array(vs)**2)
ms /= np.max(ms)
ms = (ms * 255).astype(np.uint8)

import imageio
import os
out_path = 'output/flow_magnitude.gif'
os.makedirs('output', exist_ok=True)
imageio.mimsave(out_path, ms)
Image(open(out_path, 'rb').read())
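One caveat with the cell above: if every flow vector is zero, `np.max(ms)` is 0 and the division produces NaNs. A minimal guarded sketch follows; the function name `flow_magnitude_frames` is hypothetical, and `np.hypot` computes $\sqrt{u^2 + v^2}$ elementwise.

```python
import numpy as np


def flow_magnitude_frames(us, vs):
    """Turn lists of u, v flow fields into uint8 magnitude frames in [0, 255]."""
    mag = np.hypot(np.asarray(us, dtype=np.float64),
                   np.asarray(vs, dtype=np.float64))
    peak = mag.max()
    if peak > 0:  # avoid 0 / 0 -> NaN when there is no motion at all
        mag = mag / peak
    return (mag * 255).astype(np.uint8)
```

The output can be passed to `imageio.mimsave(out_path, ...)` in place of `ms`.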

We can also visualize the optical flow vector field directly using `plt.quiver()`. For clarity, we only display the flow field at index `i`, i.e. the flow between `frames[i]` and `frames[i+1]`.

In [None]:
i = 1

x = np.arange(0, width, 1)
y = np.arange(0, height, 1)
x, y = np.meshgrid(x, y)
step = 10
plt.figure(figsize=(12, 8))
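plt.quiver(
    x[::step, ::step], y[::step, ::step],
    us[i][::step, ::step], vs[i][::step, ::step],
    # angles='xy' draws arrows in data coordinates, so their direction
    # respects the inverted y-axis (image rows increase downward).
    angles='xy', color='r', pivot='middle', headwidth=2, headlength=3)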
plt.gca().invert_yaxis()
plt.show()