# Frame Interpolation

In this notebook, we'll create three slow-motion versions of a given video using different interpolation methods. Then, we'll discuss optical flow and the pinhole camera model.

First, we'll implement the slow-motion methods:

1. Repetition interpolation:
   $ f_{\text{new}} = f_{\text{previous}} $

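2. Linear interpolation:
   $ f_{\text{new}} = (1 - t) \cdot f_{\text{previous}} + t \cdot f_{\text{later}} $

3. Optical flow interpolation:
   $ f_{\text{new}} = (1 - t) \cdot f_{\text{optical\_flow}} + t \cdot f_{\text{later}} $

Here $ t \in (0, 1) $ is the fractional position of the new frame between the two original frames (in the code below, $ t = i / \text{fator} $), and $ f_{\text{optical\_flow}} $ is the previous frame warped by the optical flow scaled by $ t $.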
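The linear blend in item 2 can be sketched on tiny synthetic arrays instead of real video frames. This is only an illustration: the helper name `blend_frames` and the pixel values are assumptions, and the weights mirror the ones used in the processing loop (previous frame weighted by `(n - i) / n`, later frame by `i / n`).

```python
import numpy as np

def blend_frames(prev_frame, next_frame, i, n):
    """Hypothetical helper: cross-fade for the i-th step out of n between two frames."""
    w_prev = (n - i) / n   # weight of the previous frame
    w_next = i / n         # weight of the later frame
    blended = (w_prev * prev_frame.astype(np.float32)
               + w_next * next_frame.astype(np.float32))
    # round and clip back to the 8-bit range used by video frames
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)

# Assumed toy frames: all black, then uniform gray (value 200)
prev_frame = np.zeros((2, 2, 3), dtype=np.uint8)
next_frame = np.full((2, 2, 3), 200, dtype=np.uint8)

# Step 2 of 8: 75% previous frame, 25% later frame
mid = blend_frames(prev_frame, next_frame, i=2, n=8)
print(mid[0, 0])
```

With these values every pixel of `mid` is a quarter of the way from 0 to 200. Nothing moves between the frames; only intensities are mixed, which is why this method cannot represent motion.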

In [56]:
import cv2
import numpy as np

video_name = 'BusterKeaton'
fator = 8  # slow-motion factor: fator - 1 frames are inserted between each pair of consecutive frames of the original video

# Read the video
cap = cv2.VideoCapture(f'video/{video_name}.mp4')
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# repetition interpolation
outrep_width = width
outrep_height = height

# linear interpolation
outlin_width = width
outlin_height = height

# optical flow interpolation
outopt_width = width
outopt_height = height

# video with the 3 methods combined side by side
outcomb_width = 3*width 
outcomb_height = height

# Create VideoWriter object to save the videos
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
outrep_path = 'video/out_rep.mp4'
outlin_path = 'video/out_lin.mp4'
outopt_path = 'video/out_opt.mp4'
outcomb_path = f'video/out_comb_{video_name}.mp4'

outrep = cv2.VideoWriter(outrep_path, fourcc, fps, (outrep_width, outrep_height))
outlin = cv2.VideoWriter(outlin_path, fourcc, fps, (outlin_width, outlin_height))
outopt = cv2.VideoWriter(outopt_path, fourcc, fps, (outopt_width, outopt_height))
outcomb = cv2.VideoWriter(outcomb_path, fourcc, fps, (outcomb_width, outcomb_height))

In [57]:
# Auxiliary function to combine the frames of the videos into a single frame
def combine_frames(frames):
    # Define the dimensions of the new frame
    height = frames[0][0].shape[0]
    width = frames[0][0].shape[1]
    channels = frames[0][0].shape[2]
    combined_frame = np.zeros((height, width * len(frames), channels), dtype=np.uint8)

    # Combine the frames of the videos
    for i, frame in enumerate(frames):
        combined_frame[:, i * width : (i + 1) * width, :] = frame[0]

    return combined_frame

# x and y coordinate map (original video) - for the optical flow method
coord_x, coord_y = np.meshgrid(np.arange(width), np.arange(height))
cont_frames = 0
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
bloco = max(1, total_frames // 10)  # progress step; at least 1 to avoid a modulo by zero on very short videos

# rewind the original video to the beginning
cap.set(cv2.CAP_PROP_POS_FRAMES, 0)
ret, prev_frame = cap.read()
if not ret:
    raise RuntimeError(f'Could not read the first frame of video/{video_name}.mp4')

while cap.isOpened():

    cont_frames += 1

    # print the processing progress
    if cont_frames % bloco == 0:
        print('Processing: ', int(cont_frames / bloco) * 10, '%')
    ret, frame = cap.read()

    if not ret:
        break

    # sequence starts with the previous frame (prev_frame)
    frame_repeat = cv2.resize(prev_frame, (outrep_width, outrep_height))
    frame_linear = cv2.resize(prev_frame, (outlin_width, outlin_height))
    frame_optflow = cv2.resize(prev_frame, (outopt_width, outopt_height))

    combined_frame = combine_frames([[frame_repeat], [frame_linear], [frame_optflow]])
    combined_frame = cv2.resize(combined_frame, (outcomb_width, outcomb_height))

    # write each frame to the corresponding output video
    outrep.write(frame_repeat)
    outlin.write(frame_linear)
    outopt.write(frame_optflow)
    outcomb.write(combined_frame)
    
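    # Compute dense optical flow (Farneback) from the previous frame to the current one
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # positional arguments: pyr_scale=0.5, levels=3, winsize=15, iterations=3, poly_n=5, poly_sigma=1.2, flags=0
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)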

    # insert intermediate frames
    for i in range(1, fator):

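        optical_flow_weight = (fator - i) / fator  # weight of the previous (or flow-warped) frame
        next_frame_weight = i / fator              # weight of the next frame; the two weights sum to 1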

        # Repetition interpolation
        frame_repeat = prev_frame

        # Linear interpolation
        frame_linear = cv2.addWeighted(prev_frame, optical_flow_weight, frame, next_frame_weight, 0)

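        # Optical flow interpolation: displace the coordinate grid by a fraction of the flow.
        # Note: cv2.remap is a backward mapping (dst(x, y) = src(map_x(x, y), map_y(x, y))), so
        # adding the prev->next flow samples prev_frame "ahead" along the flow; subtracting the
        # scaled flow would be the usual approximation of warping prev_frame toward the next frame.
        # Lookups that fall outside the frame are filled with black (default BORDER_CONSTANT).
        map_x = coord_x + (flow[..., 0] * (i / fator))
        map_y = coord_y + (flow[..., 1] * (i / fator))

        frame_optflow = cv2.remap(prev_frame, map_x.astype(np.float32), map_y.astype(np.float32), interpolation=cv2.INTER_LINEAR)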

        # finally, perform linear interpolation between the next frame and the frame transformed with optical flow
        frame_optflow = cv2.addWeighted(frame_optflow, optical_flow_weight, frame, next_frame_weight, 0)

        # Combine the frames of the videos (repetition, linear, and optical flow)
        combined_frame = combine_frames([[frame_repeat], [frame_linear], [frame_optflow]])

        combined_frame = cv2.resize(combined_frame, (outcomb_width, outcomb_height))
        
        # write each frame to the corresponding output video
        outrep.write(frame_repeat)
        outlin.write(frame_linear)
        outopt.write(frame_optflow)
        outcomb.write(combined_frame)

    prev_frame = frame

cap.release()
outrep.release()
outlin.release()
outopt.release()
outcomb.release()

Processing:  10 %
Processing:  20 %
Processing:  30 %
Processing:  40 %
Processing:  50 %
Processing:  60 %
Processing:  70 %
Processing:  80 %
Processing:  90 %
Processing:  100 %


## Notes:

1. **Repetition Interpolation:**
   This method simply repeats the previous frame to fill in the intermediate frames. Because the inserted frames carry no new motion information, playback looks "stuttered": each original frame is held for `fator` output frames and then jumps to the next. The method is trivial to implement but causes some visual discomfort.

2. **Linear Interpolation:**
   Linear interpolation calculates the intermediate frames as a weighted combination of the previous and subsequent frames, which produces smoother playback than repetition interpolation. However, with fast or complex movement we notice ghosting (a blur or lag) in the image: the cross-fade only blends pixel intensities at fixed positions and does not move anything, so a moving object appears faintly at both its old and its new position. For example, in the Buster Keaton video, we can see the hand in two positions at the same time.

3. **Optical Flow Interpolation:**
   Optical flow interpolation estimates the movement of pixels between two consecutive frames and then shifts the pixel coordinates by a fraction of that movement to build intermediate frames, which are finally blended with the next frame. In tests, the result was very similar to linear interpolation, presenting the same sensation of blur or lag at times, especially when the movement was more complex. Additionally, it showed failures not seen in the other methods, such as black spots in the corners of the image. These are consistent with `cv2.remap` filling lookups that fall outside the frame with black, which is its default border handling.
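The warping step in item 3 relies on a backward mapping: each output pixel looks up a source pixel at the mapped coordinates. The sketch below imitates that lookup with a hypothetical NumPy helper, `backward_warp`, which uses nearest-neighbour sampling rather than OpenCV's bilinear interpolation, and an assumed uniform flow of 2 pixels. It is meant to show two things under those assumptions: the sign of the displacement decides in which direction the content moves, and lookups outside the frame come out black.

```python
import numpy as np

def backward_warp(src, map_x, map_y):
    """Hypothetical nearest-neighbour stand-in for a backward remap:
    dst[y, x] = src[map_y[y, x], map_x[y, x]], black where the lookup is out of bounds."""
    h, w = src.shape[:2]
    xs = np.rint(map_x).astype(int)
    ys = np.rint(map_y).astype(int)
    inside = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    dst = np.zeros_like(src)
    dst[inside] = src[ys[inside], xs[inside]]
    return dst

# 1x5 grayscale "frame" with a single bright pixel at x = 2
src = np.array([[0, 0, 255, 0, 0]], dtype=np.uint8)
coord_x, coord_y = np.meshgrid(np.arange(5), np.arange(1))
shift = 2.0  # assumed uniform horizontal flow of +2 pixels

plus = backward_warp(src, coord_x + shift, coord_y)   # grid + flow
minus = backward_warp(src, coord_x - shift, coord_y)  # grid - flow

print(plus)   # bright pixel ends up at x = 0 (moved against the flow)
print(minus)  # bright pixel ends up at x = 4 (moved along the flow)
```

In both results, the two columns whose lookups fall outside the frame stay at zero, which is the same effect as the dark regions near the image borders.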

## Optical Flow and the Pinhole Camera

Suppose the following scenario: the pinhole camera captures a video while moving horizontally at a constant speed $ v_{\text{cam}} $. A fixed object (a tree, for example) is at a distance $ d $ from the image plane (the plane containing the sensors). The camera generates a video at 60 frames per second, and the distance from the camera's aperture (pinhole) to the image plane (focal distance) is $ f $.

For a pinhole camera, the relationship between the position of a point in the real world and its projection in the image is given by the pinhole formula:

$ x = \frac{X \cdot f}{Z} $

where:
- $ x $ is the horizontal coordinate of the point projected in the image,
- $ X $ is the horizontal coordinate of the point in the real world,
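- $ Z $ is the depth of the point, i.e. its distance along the optical axis (strictly measured from the pinhole; here we take $ Z \approx d $, which is accurate when $ d \gg f $),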
- $ f $ is the focal distance.

Between two consecutive frames (an interval $ \Delta t $ of 1/60 seconds), the camera moves horizontally by $ \Delta X = v_{\text{cam}} \cdot \Delta t $. Relative to the camera, the horizontal coordinate $ X $ of the fixed object therefore changes by the same amount in the opposite direction, while its depth $ d $ stays the same. This translation causes an apparent displacement of the points in the image.

Applying the pinhole formula to this change, the magnitude of the displacement in the image is $ \Delta x_{\text{img}} = \frac{\Delta X \cdot f}{d} $, that is:

$ \Delta x_{\text{img}} = \frac{v_{\text{cam}} \cdot \Delta t \cdot f}{d} $

where:
- $ \Delta t $ is the time between two consecutive frames, which is $ \frac{1}{60} $ seconds for a 60 fps video.

Substituting $ \Delta t $:

$ \Delta x_{\text{img}} = \frac{v_{\text{cam}} \cdot \frac{1}{60} \cdot f}{d} $

The optical flow is this apparent displacement in the image:

$ \text{Optical Flow} = \Delta x_{\text{img}} = \frac{v_{\text{cam}} \cdot f}{60 \cdot d} $

To calculate the distance $ d $ from the optical flow, we can rearrange the optical flow equation:

$ d = \frac{v_{\text{cam}} \cdot f}{60 \cdot \text{Optical Flow}} $
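As a quick numerical check of the two formulas above, the following sketch computes the per-frame optical flow and then recovers the distance from it. The function names and all numbers (camera speed, focal distance, tree distance) are assumptions chosen for illustration; the focal distance is taken in pixels so that the flow comes out in pixels per frame.

```python
def optical_flow_per_frame(v_cam, f, d, fps=60.0):
    """Apparent image displacement per frame: v_cam * f / (fps * d)."""
    return v_cam * f / (fps * d)

def depth_from_flow(v_cam, f, flow, fps=60.0):
    """Rearranged formula: d = v_cam * f / (fps * flow)."""
    return v_cam * f / (fps * flow)

# Assumed values: camera at 1.2 m/s, focal distance 800 px, tree 10 m away
v_cam = 1.2
f = 800.0
d_true = 10.0

flow = optical_flow_per_frame(v_cam, f, d_true)   # about 1.6 px per frame
d_est = depth_from_flow(v_cam, f, flow)           # recovers about 10 m

print(f"flow = {flow:.3f} px/frame, estimated d = {d_est:.3f} m")
```

The round trip only works when the flow is non-zero; a measured flow close to zero makes the estimated distance very large and very sensitive to noise.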


When $ d $ tends to infinity, the fraction $ \frac{1}{d} $ tends to zero. Therefore, we can analyze the behavior of the optical flow under this condition:

$ \lim_{d \to \infty} \text{Optical Flow} = \lim_{d \to \infty} \frac{v_{\text{cam}} \cdot f}{60 \cdot d} = 0 $

In other words, when $ d $ is extremely large (practically infinite), the optical flow tends to zero. For very distant objects, such as the Moon or the stars, the apparent movement in the image caused by the horizontal displacement of the camera is imperceptible. This matches everyday experience: when we observe the Moon or the stars while moving laterally, they appear fixed in the sky. The result of the optical flow analysis is therefore consistent with our expectations.
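The trend toward zero can be illustrated numerically. The camera speed and focal distance below are assumed values, as are the tree and building distances; the Moon's distance is rounded to about 3.84e8 m.

```python
# Assumed camera parameters: 1.2 m/s, focal distance 800 px, 60 fps
v_cam, f, fps = 1.2, 800.0, 60.0

# Distances in meters (tree and building are illustrative)
distances_m = {"tree": 10.0, "building": 1_000.0, "Moon": 3.84e8}

# Optical flow per frame for each object: v_cam * f / (fps * d)
flows = {name: v_cam * f / (fps * d) for name, d in distances_m.items()}

for name, flow in flows.items():
    print(f"{name}: {flow:.2e} px/frame")
```

Under these assumptions the tree moves by more than a pixel per frame, while the Moon's displacement is many orders of magnitude below one pixel, so it cannot be distinguished from zero in the image.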

