# Dense Optical Flow with Gunnar Farnebäck's Algorithm

Dense optical flow identifies the movement of each pixel across a series of images. In
this notebook, I'll build Farnebäck's algorithm from the ground up.

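**RULES:** As usual, **`OpenCV`** is banned in this repository for the algorithm itself;
`cv2` appears only in the demo cells, for reading and writing images and video.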

**References:**

-   [1]
    [Polynomial Expansion for Orientation and Motion Estimation](https://www.ida.liu.se/ext/WITAS-ev/Computer_Vision_Technologies/PaperInfo/farneback02.html)

-   [2]
    [Optical Flow - Michael Black - MLSS 2013 Tübingen](https://www.youtube.com/watch?v=tIwpDuqJqcE)

-   [3]
    [ericPrince's Pure python implementation of Gunnar Farneback's optical flow algorithm](https://github.com/ericPrince/optical-flow)

**Important Note:** To fully grasp Farnebäck's Algorithm, I highly recommend reading
[1]. This notebook provides a foundational overview, sufficient for basic learning but
not an exhaustive understanding of the algorithm.


In [None]:
import numpy as np
from scipy.ndimage import correlate1d
from functools import partial
import skimage
import my_utils

# np.set_printoptions(threshold=np.inf)

## Motivation

In the world of computer vision, tracking the displacement of pixels across consecutive
frames is usually based on two assumptions:

1. **Brightness Constancy:** It states that the brightness of a pixel remains constant
   between consecutive images, despite its movement to a new position. This idea is
   captured by the equation:

    $$
    I(x + u, y + v, t + 1) = I(x, y, t)
    $$

    Here, $u$ and $v$ represent the horizontal and vertical shifts in the pixel's
    position, illustrating that the pixel's intensity doesn't change as it moves.

2. **Spatial Smoothness:** It is based on the idea that neighboring pixels usually move
   in a similar fashion because they are likely part of the same surface. Surfaces tend
   to be smooth, implying that adjacent pixels will have comparable motion. This concept
   is summarized as follows:

    $$
    u_p = u_n \quad \text{and} \quad v_p = v_n, \quad \forall n \in G(p)
    $$

    It means that the movement of a pixel $p$ in both horizontal ($u$) and vertical
    ($v$) directions is similar to that of its neighboring pixel $n$, suggesting that
    optical flow changes smoothly across the image.

**Objective Function:**

Derived from these assumptions, we can define the objective functions as follows:

-   **Brightness Constancy:**

    $$
    E_D(u, v) = \sum (I(x + u, y + v, t + 1) - I(x, y, t))^2
    $$

    Minimizing this sum of squared differences penalizes deviations from brightness
    constancy. The quadratic penalty corresponds to assuming Gaussian noise.

-   **Spatial Smoothness:**

    $$
    E_s(u, v) = \sum(u_p - u_n)^2 + \sum(v_p - v_n)^2, \quad \forall n \in G(p)
    $$

    This formula encourages smoothness by penalizing variations in the motion between a
    pixel and its neighbors.
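
To make the two energies concrete, here is a minimal sketch that evaluates them for a
constant integer flow on a synthetic frame pair. The helper names `data_term` and
`smoothness_term` are hypothetical, the frames are random noise, and `np.roll` is used
so borders wrap around; none of this is part of the notebook's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
frame1 = rng.random((32, 32))

# Assumed ground-truth integer flow: u pixels to the right, v pixels down.
u_true, v_true = 2, 1
# Build frame2 so that frame2[y + v, x + u] == frame1[y, x] (borders wrap).
frame2 = np.roll(frame1, shift=(v_true, u_true), axis=(0, 1))


def data_term(frame1, frame2, u, v):
    """E_D for a constant integer flow (u, v), with wrap-around borders."""
    # warped[y, x] == frame2[y + v, x + u], i.e. I(x + u, y + v, t + 1)
    warped = np.roll(frame2, shift=(-v, -u), axis=(0, 1))
    return np.sum((warped - frame1) ** 2)


def smoothness_term(u_field, v_field):
    """E_s summed over vertically and horizontally adjacent pixel pairs."""
    return sum(
        np.sum(np.diff(field, axis=axis) ** 2)
        for field in (u_field, v_field)
        for axis in (0, 1)
    )


e_true = data_term(frame1, frame2, u_true, v_true)   # flow matches the motion
e_wrong = data_term(frame1, frame2, 0, 0)            # flow ignores the motion
e_smooth = smoothness_term(np.full((32, 32), 2.0), np.full((32, 32), 1.0))
```

The data term vanishes only for the flow that actually explains the motion, while a
constant flow field has zero smoothness cost because no neighbor differs from another.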

**Solving the Equations**

The principle of brightness constancy implies that a pixel's intensity does not change
over time as it moves. This is mathematically represented as:

$$
I(x, y, t) = I(x + \Delta x, y + \Delta y, t + \Delta t)
$$

For slight movements ($\Delta x$, $\Delta y$, $\Delta t$), we can use the first-order
Taylor series expansion for image intensity $I$:

$$
I(x + \Delta x, y + \Delta y, t + \Delta t) \approx I(x, y, t) + \frac{\partial I}{\partial x} \Delta x + \frac{\partial I}{\partial y} \Delta y + \frac{\partial I}{\partial t} \Delta t
$$

By applying the brightness constancy condition and simplifying, we eliminate the term
$I(x, y, t)$ on both sides:

$$
I_x \Delta x + I_y \Delta y + I_t \Delta t = 0
$$

Dividing through by $\Delta t$ (assuming it is not zero) and using
$\Delta x / \Delta t = u$ and $\Delta y / \Delta t = v$ for velocity components, we
arrive at the optical flow constraint equation:

$$
I_x u + I_y v + I_t = 0
$$
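
The constraint gives one equation per pixel for two unknowns, so a single flow vector
can only be estimated by pooling many pixels. The sketch below does this with ordinary
least squares on a smooth synthetic pattern. The pattern, the sub-pixel motion, and the
use of `np.gradient` for the derivatives are all assumptions made for illustration.

```python
import numpy as np

# Axis 0 is y (rows), axis 1 is x (columns).
y, x = np.mgrid[0:64, 0:64].astype(float)


def pattern(x, y):
    """Smooth synthetic intensity pattern (an arbitrary choice)."""
    return np.sin(0.2 * x) + np.cos(0.15 * y)


u_true, v_true = 0.3, -0.2  # assumed sub-pixel motion
frame1 = pattern(x, y)
frame2 = pattern(x - u_true, y - v_true)  # frame1 translated by (u, v)

# Finite-difference estimates of the partial derivatives.
I_y, I_x = np.gradient(frame1)
I_t = frame2 - frame1

# One constraint per pixel: I_x * u + I_y * v = -I_t
M = np.stack([I_x.ravel(), I_y.ravel()], axis=1)
flow, *_ = np.linalg.lstsq(M, -I_t.ravel(), rcond=None)
u_est, v_est = flow
```

The estimate is only approximate because both the Taylor expansion and the finite
differences are first-order approximations; it degrades as the motion grows.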


## Polynomial Expansion for Orientation and Motion Estimation

Polynomial expansion fits polynomials to local pixel neighborhoods to approximate image
motion and structure as detailed in [1]. It uses quadratic, linear, and constant terms
(A, B, C) to describe local image structure and motion:

-   **Quadratic Term (A):** Represents the curvature of the image intensity surface,
    indicating the geometric structure like edges or flat areas. High values signal
    significant curvature or rapid intensity changes.

-   **Linear Term (B):** Represents the gradient at each pixel, showing the direction
    and magnitude of the most substantial intensity change, crucial for identifying
    motion direction and feature orientation.

-   **Constant Term (C):** Represents the average intensity in a pixel's neighborhood,
    indicating the area's overall brightness or darkness without detailing structure or
    motion.


### Signal, Certainty, and Applicability

Let $f$ be the signal and $\hat{f}$ its local approximation within a specific
neighborhood. The local signal is a finite-dimensional vector in $\mathbb{C}^n$,
represented as an $n \times 1$ column vector.

**Certainty** is the confidence in signal values, indicated by non-negative real
numbers. The field of certainty is $c$, with $\hat{c}$ as the $n \times 1$ vector for
local certainty levels.

**Applicability** defines the relevance of basis functions within the neighborhood,
expressed as non-negative values in an $n \times 1$ vector, $a$. Unlike certainty,
applicability focuses on the significance of each point, with non-zero values indicating
relevance. While intuitively applicability might range between $[0, 1]$, no such
restriction is necessary, as the scale of these values doesn't impact their relevance.

For generating applicability and certainty specifics, see page 43 and Section 3.10 of
[1]. The functions `generate_applicability` and `generate_certainty` detail their
implementation.


In [None]:
def generate_applicability(sigma):
    """Calculate 1D Gaussian applicability kernel."""
    n = int(4 * sigma + 1)  # Capture significant parts of the Gaussian distribution
    x = np.arange(-n, n + 1)
    applicability = np.exp(-(x**2) / (2 * sigma**2))
    return x, applicability


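def generate_certainty(height, width, denominator=5):
    """Certainty map that ramps linearly from 0 at the image border to 1
    at `denominator` pixels inside (open question: Gaussian or linear?)."""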
    kernel = np.minimum(
        1,
        1 / denominator * np.minimum(np.arange(height)[:, None], np.arange(width)),
    )
    kernel = np.minimum(
        kernel,
        1
        / denominator
        * np.minimum(
            height - 1 - np.arange(height)[:, None],
            width - 1 - np.arange(width),
        ),
    )
    return kernel

### Separable Correlation

Separable correlation is a technique that enhances computational efficiency in signal
processing and image analysis by simplifying correlation operations, especially with
large filters. It involves decomposing a two-dimensional kernel into two orthogonal
one-dimensional kernels, allowing two-dimensional correlation to be performed as two
separate one-dimensional processes: first across rows, then down columns, or vice versa.

Mathematically, if a two-dimensional kernel $H$ can be expressed as the outer product of
two one-dimensional vectors $u$ and $v$:

$$
H = u \otimes v
$$

then the kernel is separable. For an input signal or image $I$, the two-dimensional
correlation with a separable kernel $H$ occurs in two phases:

1. **Horizontal Pass:** Apply one-dimensional kernel $u$ across each row of $I$.
2. **Vertical Pass:** Apply one-dimensional kernel $v$ down each column of the
   horizontal pass result.
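
As a quick check of the equivalence, the following sketch compares a direct 2D
correlation against the two 1D passes, using plain NumPy and zero padding. The function
names are hypothetical, and the kernel is built as `np.outer(k_vertical, k_horizontal)`,
which is an assumed convention for which factor indexes rows.

```python
import numpy as np


def correlate2d_zero_padded(image, kernel):
    """Direct 2D correlation with zero padding (odd-sized kernel)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(image.shape)
    for i in range(kh):
        for j in range(kw):
            window = padded[i : i + image.shape[0], j : j + image.shape[1]]
            out += kernel[i, j] * window
    return out


def correlate_separable(image, k_vertical, k_horizontal):
    """Two 1D passes: along each row, then down each column."""
    rows_done = np.apply_along_axis(
        np.correlate, 1, image, k_horizontal, mode="same"
    )
    return np.apply_along_axis(
        np.correlate, 0, rows_done, k_vertical, mode="same"
    )


rng = np.random.default_rng(0)
image = rng.random((12, 10))
k_vertical = np.array([1.0, 2.0, 1.0])
k_horizontal = np.array([-1.0, 0.0, 1.0])
H = np.outer(k_vertical, k_horizontal)  # rows are indexed by the vertical kernel

direct = correlate2d_zero_padded(image, H)
separable = correlate_separable(image, k_vertical, k_horizontal)
```

For a kernel of size $k \times k$, the direct form costs about $k^2$ multiplications per
pixel and the separable form about $2k$, which is why the notebook applies
`correlate1d` once per axis.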


### Defining `poly_exp`

To construct the polynomial expansion function `poly_exp`, we follow the guidelines of
Chapter 4 in [1]. The process is divided into four main parts:

-   **Initialization and Applicability Generation:** Detailed on page 43, this step
    involves setting up the initial conditions and generating the applicability matrix
    based on $\sigma$.

-   **Calculation of Polynomial Coefficients `b`:** As described in Section 4.3, this
    involves calculating the polynomial expansion coefficients for horizontal
    (x-direction) and vertical (y-direction) correlations separately. This step also
    includes multiplying the certainty map with the image signal to prioritize signal
    values based on their certainty.

-   **Cross-Correlation and Polynomial Parameters Calculation:** Utilizes matrices `G`
    and `v` to solve for `r` using Eq. 4.9, where $r = G^{-1}v$. Here, `r` represents
    the parameters of the second-order polynomial modeling the signal `f`.

-   **Final Polynomial Terms Extraction:** Based on equations 4.3 and 4.4, the quadratic
    (`A`), linear (`B`), and constant (`C`) terms of the polynomial are derived,
    respectively.

Finally, the dimensions of the coefficients and residual error calculation are
summarized as follows:

-   Basis functions `b`: (n, n, 6), built separably from `bx` and `by`, each (n, 6)
-   Parameters `r`: (height, width, 6)
-   Signal `f`: (height, width)
-   Residual error $e$ is calculated as $\|b \cdot r - f\|_W$, as shown in equation 3.19.
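
The same normal-equation solve can be tried on a single 1D neighborhood, which makes
the roles of `G`, `v`, and `r` easier to see. This is a reduced sketch under assumed
values (three basis functions instead of six, `sigma = 1.5`, full certainty), not the
notebook's `poly_exp`.

```python
import numpy as np

# Neighborhood coordinates and Gaussian applicability (sigma is an assumed value).
sigma = 1.5
x = np.arange(-4, 5, dtype=float)
applicability = np.exp(-(x**2) / (2 * sigma**2))
certainty = np.ones_like(x)  # every sample is fully trusted

# 1D basis {1, x, x^2}; the 2D version uses {1, x, y, x^2, y^2, xy}.
basis = np.stack([np.ones_like(x), x, x**2], axis=-1)  # (n, 3)

# A signal that is exactly quadratic: f(x) = 1 + 3x + 2x^2
f = 1.0 + 3.0 * x + 2.0 * x**2

# Normal equations of the weighted least-squares fit: G r = v
weights = applicability * certainty
G = basis.T @ (weights[:, None] * basis)  # (3, 3)
v = basis.T @ (weights * f)               # (3,)
r = np.linalg.solve(G, v)                 # [constant, linear, quadratic]
```

Because the signal lies exactly in the span of the basis, the fit recovers the
coefficients regardless of the weights; the weights only matter once the signal
deviates from a quadratic.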


In [None]:
def poly_exp(f, certainty, sigma):
    # Initialization and Applicability Generation
    height, width = f.shape
    x, applicability = generate_applicability(sigma)

    # Calculation of Polynomial Coefficients `b`
    bx = np.stack(
        [
            np.ones(applicability.shape),
            x,
            np.ones(applicability.shape),
            x**2,
            np.ones(applicability.shape),
            x,
        ],
        axis=-1,
    )  # (n, 6)
    by = np.stack(
        [
            np.ones(applicability.shape),
            np.ones(applicability.shape),
            x,
            np.ones(applicability.shape),
            x**2,
            x,
        ],
        axis=-1,
    )  # (n, 6)

    # Pre-calculate product of certainty and signal
    cf = certainty * f

    # Cross-Correlation and Polynomial Parameters Calculation
    G = np.empty(list(f.shape) + [bx.shape[-1]] * 2)  # (height, width, 6, 6)
    v = np.empty(list(f.shape) + [bx.shape[-1]])  # (height, width, 6)

    # Apply separable cross-correlations
    # Pre-calculate quantities.
    ab = np.einsum("i,ij->ij", applicability, bx)
    abb = np.einsum("ij,ik->ijk", ab, bx)
    # Calculate G and v for each pixel with cross-correlation
    for i in range(bx.shape[-1]):
        for j in range(bx.shape[-1]):
            G[..., i, j] = correlate1d(
                certainty, abb[..., i, j], axis=0, mode="constant", cval=0
            )

        v[..., i] = correlate1d(cf, ab[..., i], axis=0, mode="constant", cval=0)

    # Pre-calculate quantities.
    ab = np.einsum("i,ij->ij", applicability, by)
    abb = np.einsum("ij,ik->ijk", ab, by)
    # Calculate G and v for each pixel with cross-correlation
    for i in range(bx.shape[-1]):
        for j in range(bx.shape[-1]):
            G[..., i, j] = correlate1d(
                G[..., i, j], abb[..., i, j], axis=1, mode="constant", cval=0
            )

        v[..., i] = correlate1d(v[..., i], ab[..., i], axis=1, mode="constant", cval=0)

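    # Solve r for each pixel. The trailing axis makes v a stack of column
    # vectors, which np.linalg.solve treats consistently across NumPy versions.
    r = np.linalg.solve(G, v[..., None])[..., 0]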

    # Final Polynomial Terms Extraction
    # Quadratic term
    A = np.empty(list(f.shape) + [2, 2])
    A[..., 0, 0] = r[..., 3]
    A[..., 0, 1] = r[..., 5] / 2
    A[..., 1, 0] = A[..., 0, 1]
    A[..., 1, 1] = r[..., 4]

    # Linear term
    B = np.empty(list(f.shape) + [2])
    B[..., 0] = r[..., 1]
    B[..., 1] = r[..., 2]

    # constant term
    C = r[..., 0]

    return A, B, C

After defining the polynomial expansion function `poly_exp`, we can visualize the terms
`A`, `B`, and `C` as follows. The quadratic term `A` captures the even part of the
signal (excluding its DC component), while the linear term `B` captures the odd part.
Meanwhile, `C` varies with the local DC level, roughly the local average intensity.


In [None]:
import cv2

img = cv2.imread("input/yosemite/yos02.tif")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
print("img:", img.shape, img.dtype, img.min(), img.max())
# TODO: Use float64 should also work. Input image should be floating point.
# gray = gray.astype(np.float64)
# gray /= 255
print(f"gray: {gray.shape}, {gray.dtype}, {gray.min()}, {gray.max()}")

height, width, *_ = img.shape
certainty = generate_certainty(height, width, denominator=5)
print(f"certainty: {certainty.shape}, {certainty.dtype}, {certainty.min()}, {certainty.max()}")

A, B, C = poly_exp(f=gray, certainty=certainty, sigma=4)
print("A:", A.shape)
print("B:", B.shape)
print("C:", C.shape)

my_utils.visualize_polynomial_expansion(
    img, A, B, C, out_path="./output/polynomial_expansion.png"
)

## Displacement Estimation

Polynomial expansion allows for approximating the neighborhood of a pixel. When considering an ideal translation, we analyze the behavior of a quadratic polynomial signal:

$$
f_1(x) = x^T A_1 x + b_1^T x + c_1
$$

and construct a new signal $f_2$ displaced globally by $d$, as shown in equation 7.2 in [1]:

$$
f_2(x) = f_1(x - d) = (x^T A_2 x) + (b_2^T x) + c_2.
$$

By equating coefficients in the quadratic polynomials, we find:

- $A_2 = A_1$,
- $b_2 = b_1 - 2A_1d$,
- $c_2 = d^T A_1 d - b_1^T d + c_1$.

For non-singular $A_1$, the translation $d$ can be calculated as:

$$
d = -\frac{1}{2} A_1^{-1} (b_2 - b_1).
$$

This principle, as [1] states, is applicable regardless of signal dimensionality.
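
The coefficient identities can be checked numerically. The sketch below builds a 2D
quadratic with assumed values for $A_1$, $b_1$, $c_1$, and $d$, derives the displaced
coefficients, and then recovers $d$ from them.

```python
import numpy as np

# Assumed example values; A1 is symmetric and non-singular.
A1 = np.array([[2.0, 0.5], [0.5, 1.0]])
b1 = np.array([1.0, -2.0])
c1 = 0.5
d_true = np.array([0.7, -1.3])


def f1(x):
    return x @ A1 @ x + b1 @ x + c1


# Coefficients of f2(x) = f1(x - d), obtained by equating terms.
A2 = A1
b2 = b1 - 2 * A1 @ d_true
c2 = d_true @ A1 @ d_true - b1 @ d_true + c1


def f2(x):
    return x @ A2 @ x + b2 @ x + c2


# Recover the displacement: d = -1/2 * A1^{-1} (b2 - b1)
d_est = -0.5 * np.linalg.solve(A1, b2 - b1)

x_probe = np.array([0.4, 2.0])  # arbitrary test point
```

Evaluating `f2(x_probe)` and `f1(x_probe - d_true)` gives the same value, which
confirms that the three coefficient relations describe a pure translation.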

Furthermore, [1] suggests utilizing local polynomial approximations for each pixel across images, implying the use of $A_1(x)$, $b_1(x)$, and $c_1(x)$ for the first image, and $A_2(x)$, $b_2(x)$, and $c_2(x)$ for the second. Ideally, this would result in $A_1 = A_2$, but practically, an approximation is used:

$$
A(x) = \frac{A_1(x) + A_2(x)}{2}.
$$

Additionally, [1] defines

$$
\Delta b(x) = -\frac{1}{2}(b_2(x) - b_1(x)),
$$

leading to the primary constraint for displacement estimation:

$$
A(x)d(x) = \Delta b(x),
$$

where $d(x)$ represents a spatially varying displacement field, moving from a global to a local displacement context.


### Calculating Displacement over a Neighborhood

Section 7.3 in [1] indicates that solving the equation

$$
A(x)d(x) = \Delta b(x),
$$

pointwise does not yield satisfactory results. To improve accuracy, [1] recommends
optimizing the solution across a neighborhood $I$ around $x$. This is achieved by
minimizing the weighted sum:

$$
\sum_{\Delta x \in I} w(\Delta x)\|A(x + \Delta x)d(x) - \Delta b(x + \Delta x)\|^2,
$$

where $w(\Delta x)$ serves as a weight function, enhancing the contribution of more
relevant points within the neighborhood. The optimal displacement $d(x)$ is found with:

$$
d(x) = \left(\sum wA^T A\right)^{-1} \sum wA^T \Delta b,
$$

after simplifying the notation for clarity. The corresponding minimum error $e(x)$,
indicating the solution's accuracy, is expressed as:

$$
e(x) = \sum w\Delta b^T \Delta b - d(x)^T \left(\sum wA^T \Delta b\right).
$$

This formulation computes $A^T A$, $A^T \Delta b$, and $\Delta b^T \Delta b$ pointwise,
then averages these using the weight $w$ before calculating the displacement. The error
$e(x)$ inversely relates to confidence: smaller values indicate greater reliability. The
solution is both existent and unique unless the neighborhood is uniformly affected by
the aperture problem.
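
A small numeric sketch shows why pooling over a neighborhood helps. Each $A_i$ below is
rank 1, so no single point determines $d$, yet the weighted sum is invertible because
the points have different orientations. The orientations, weights, and displacement are
assumed values chosen for illustration.

```python
import numpy as np

d_true = np.array([1.5, -0.5])

# Three neighborhood points with different local orientations.
angles = np.deg2rad([0.0, 60.0, 120.0])
normals = np.stack([np.cos(angles), np.sin(angles)], axis=-1)  # (3, 2)

# Rank-1 A_i = n_i n_i^T: each point constrains d only along n_i.
A = np.einsum("ni,nj->nij", normals, normals)  # (3, 2, 2)
delta_b = np.einsum("nij,j->ni", A, d_true)    # consistent, noise-free constraints
w = np.array([0.25, 0.5, 0.25])                # neighborhood weights

A_T = A.swapaxes(-1, -2)
G = np.einsum("n,nij->ij", w, A_T @ A)                           # sum w A^T A
h = np.einsum("n,ni->i", w, (A_T @ delta_b[..., None])[..., 0])  # sum w A^T delta_b

d_est = np.linalg.solve(G, h)

# Minimum error e = sum w delta_b^T delta_b - d^T (sum w A^T delta_b)
e = np.einsum("n,ni,ni->", w, delta_b, delta_b) - d_est @ h
```

With noise-free constraints the error `e` is zero up to rounding. If all three normals
pointed the same way, `G` would be singular, which is the uniform aperture problem
mentioned above.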


### Estimating a Parameterized Displacement Field

Following the approach in section 6.3 of [1], we can enhance robustness by
parameterizing the displacement field according to a specific motion model. For a 2D
context, we utilize the eight-parameter model described by equation (6.6) of [1]:

$$
d_x(x, y) = a_1 + a_2 x + a_3 y + a_7 x^2 + a_8 xy,
$$

$$
d_y(x, y) = a_4 + a_5 x + a_6 y + a_7 xy + a_8 y^2.
$$

Expressing this model in matrix form, akin to equations (6.28) and (6.29) but without an
extra temporal dimension, we get:

$$
d = Sp,
$$

where $S$ is the matrix containing the spatial variables,

$$
S = \left[ \begin{array}{cccccccc}
1 & x & y & 0 & 0 & 0 & x^2 & xy \\
0 & 0 & 0 & 1 & x & y & xy & y^2 \\
\end{array} \right],
$$

and $p$ is the parameter vector,

$$
p = \left[ \begin{array}{c}
a_1 \\
a_2 \\
a_3 \\
a_4 \\
a_5 \\
a_6 \\
a_7 \\
a_8 \\
\end{array} \right].
$$

For the weighted least squares problem, we solve

$$
\sum w_i \|A_i S_i p - \Delta b_i\|^2,
$$

where $i$ indexes coordinates in a neighborhood. The solution,

$$
p = \left( \sum w_i S_i^T A_i^T A_i S_i \right)^{-1} \sum w_i S_i^T A_i^T \Delta b_i,
$$

illustrates that any linearly parameterizable motion model is applicable. As seen in
section 6.3, calculations for $S^T A^T AS$ and $S^T A^T \Delta b$ can be averaged with
weights $w$. This methodology echoes the procedure for constant motion models but
applies broadly across different parameterizations.

An alternative approach suggests using one parametric displacement field to approximate
the entire signal, simplifying the calculation of parameters to

$$
p = \left( \sum S_i^T A_i^T A_i S_i \right)^{-1} \sum S_i^T A_i^T \Delta b_i,
$$

summing over the entire signal to compute the displacement field parameters.
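
The global variant can be sketched for the six-parameter affine model, which is the
eight-parameter model without the last two columns of $S$. Everything below (grid size,
`p_true`, the random per-pixel `A`) is assumed test data; the coordinate order follows
the notebook's convention of row index first.

```python
import numpy as np

rng = np.random.default_rng(0)
height, width = 6, 8
rows, cols = np.mgrid[0:height, 0:width].astype(float)
ones, zeros = np.ones_like(rows), np.zeros_like(rows)

# Affine S at every pixel, shape (height, width, 2, 6).
S = np.stack(
    [
        np.stack([ones, rows, cols, zeros, zeros, zeros], axis=-1),
        np.stack([zeros, zeros, zeros, ones, rows, cols], axis=-1),
    ],
    axis=-2,
)

p_true = np.array([0.5, 0.01, -0.02, -1.0, 0.03, 0.0])
d_true = (S @ p_true[:, None])[..., 0]  # (height, width, 2)

# Random symmetric positive-definite A per pixel, with consistent delta_b = A d.
M = rng.normal(size=(height, width, 2, 2))
A = M @ M.swapaxes(-1, -2) + np.eye(2)
delta_b = (A @ d_true[..., None])[..., 0]

S_T = S.swapaxes(-1, -2)
A_T = A.swapaxes(-1, -2)

# Sum the per-pixel normal equations over the whole signal, then solve once.
G = (S_T @ A_T @ A @ S).sum(axis=(0, 1))                       # (6, 6)
h = (S_T @ A_T @ delta_b[..., None])[..., 0].sum(axis=(0, 1))  # (6,)
p_est = np.linalg.solve(G, h)
```

A single 6x6 solve replaces one 2x2 solve per pixel, at the price of assuming that one
affine motion explains the whole image.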


In [None]:
def motion_model(x, model):
    # Evaluate warp parametrization model at pixel coordinates
    if model == "constant":
        S = np.eye(2)

    elif model in ("affine", "eight_param"):
        # (height, width, 2, 6 or 8)
        S = np.empty(list(x.shape) + [6 if model == "affine" else 8])

        S[..., 0, 0] = 1
        S[..., 0, 1] = x[..., 0]
        S[..., 0, 2] = x[..., 1]
        S[..., 0, 3] = 0
        S[..., 0, 4] = 0
        S[..., 0, 5] = 0

        S[..., 1, 0] = 0
        S[..., 1, 1] = 0
        S[..., 1, 2] = 0
        S[..., 1, 3] = 1
        S[..., 1, 4] = x[..., 0]
        S[..., 1, 5] = x[..., 1]

        if model == "eight_param":
            S[..., 0, 6] = x[..., 0] ** 2
            S[..., 0, 7] = x[..., 0] * x[..., 1]

            S[..., 1, 6] = x[..., 0] * x[..., 1]
            S[..., 1, 7] = x[..., 1] ** 2

    else:
        raise ValueError("Invalid parametrization model")

    return S

### Incorporating Prior Knowledge

A principal challenge with the method so far is the assumption that local polynomials at
the same coordinates in two signals are identical, except for a displacement. Since the
polynomial expansions are local models, they will vary spatially, introducing errors in
the constraints

$$
A(x)d(x) = \Delta b(x).
$$

For small displacements, this issue is not too severe, but it becomes more problematic
with larger displacements. Fortunately, we are not limited to comparing two polynomials
at the exact same coordinate. If we possess prior knowledge about the displacement
field, we can compare the polynomial at $x$ in the first signal to the polynomial at
$x + \tilde{d}(x)$ in the second signal, where $\tilde{d}(x)$ is the initial
displacement field rounded to integer values. This approach essentially allows us to
estimate the relative displacement between the real value and the rounded a priori
estimate, which is hopefully smaller.

This observation is incorporated into the algorithm by replacing the definitions of $A(x)$ and $\Delta b(x)$ with

$$
A(x) = \frac{A_1(x) + A_2(\tilde{x})}{2},
$$

$$
\Delta b(x) = -\frac{1}{2}(b_2(\tilde{x}) - b_1(x)) + A(x) \tilde{d}(x),
$$

where

$$
\tilde{x} = x + \tilde{d}(x).
$$

The first two terms in $\Delta b$ are involved in computing the remaining displacement,
while the last term adds back the rounded a priori displacement. We can observe that for
$\tilde{d}$ identically zero, these equations revert to the original form, as would be
expected.
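
A 1D sketch makes the bookkeeping visible. It assumes a global quadratic signal, for
which the local linear coefficient at a point can be written in closed form, and a
rough prior of 3.2 samples against a true displacement of 3.4. The helper names
`b1_local` and `b2_local` are hypothetical.

```python
import numpy as np

# Assumed setup: f1(x) = a x^2 + b x + c and f2(x) = f1(x - d_true).
a, b = 0.8, -1.0
d_true = 3.4


def b1_local(x):
    """Linear coefficient of f1 when expanded around x."""
    return 2 * a * x + b


def b2_local(x):
    """Linear coefficient of f2 when expanded around x."""
    return 2 * a * (x - d_true) + b


x = np.arange(10.0)
d_in = np.full_like(x, 3.2)  # rough a priori displacement
d_tilde = np.rint(d_in)      # rounded to integer samples
x_tilde = x + d_tilde

# For a global quadratic, (A1(x) + A2(x_tilde)) / 2 is just a.
A = a
delta_b = -0.5 * (b2_local(x_tilde) - b1_local(x)) + A * d_tilde
d_est = delta_b / A
```

The difference term alone yields the residual `d_true - d_tilde`, here 0.4, and the
`A * d_tilde` term restores the rounded prior so that `d_est` is the full displacement.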

The displacement estimation algorithm derived in the last three sections is illustrated
with a block diagram below. Inputs are the quadratic polynomial expansion coefficients
for the two signals, $A_1$, $b_1$, $A_2$, $b_2$, and an a priori displacement field
$d_{in}$. The output is the estimated displacement field $d_{out}$.

<figure>
  <div style="display: flex; justify-content: space-between; max-width: 300px; /* Adjust this value as needed */">
    <div style="text-align: center;">
      <img src="./images/fig_7_8.png" style="max-width: 100%; height: auto;">
      <p><strong></strong></p>
      <p>The block diagram of the basic displacement estimation algorithm (DE) adapted from [1].</p>
    </div>
  </div>
</figure>


### Reducing Noise

Upon closely examining the residual displacement field, it becomes evident that the
majority of noise originates from areas that either lack significant structures or have
very low contrast. This issue is especially pronounced in regions affected by the
aperture problem, where it shows up as noise in the parallel displacement component. To
address this, one technique is to "enforce" the background field onto uncertain
estimates. This is achieved by incorporating a regularization term into equation (7.22)
of [1], aiming to minimize the expression:

$$
\sum_{\Delta x \in I} w(\Delta x)\|A(x + \Delta x)d(x) - \Delta b(x + \Delta x)\|^2 + \mu\|d(x) - d' (x)\|^2,
$$

where $d'$ denotes the previously estimated background displacement field, and $\mu$
represents a constant. The underlying concept is that the regularization term exerts
minimal impact when the displacement is strongly constrained by the summation in the
formula but becomes significant in its absence. This method is particularly effective
for the aperture problem, where the normal component is well-constrained, unlike the
parallel component. The solution to this minimization, equation (7.35) in [1], is:

$$
d(x) = \left(\mu I + \sum wA^T A\right)^{-1} \left(\mu d'(x) + \sum wA^T \Delta b\right),
$$

simplifying the notation to enhance readability. Figure 7.19 in [1] outlines a block
diagram for the modified basic displacement estimation algorithm. This can be integrated
with the processes depicted in figure 7.9 or figure 7.10 of [1] for iterative or
multi-scale algorithm variations.
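
The effect of the regularizer on the aperture problem can be seen in a two-dimensional
toy case. Below, `G` stands in for $\sum wA^T A$ and is rank 1, so only motion along
`n` is observable. The values of `d_true`, `d_background`, and `mu` are assumptions.

```python
import numpy as np

n = np.array([1.0, 0.0])  # only motion along n is constrained by the data
G = np.outer(n, n)        # stands in for sum(w A^T A); singular on its own
d_true = np.array([2.0, -1.0])
h = G @ d_true            # stands in for sum(w A^T delta_b)

d_background = np.array([0.5, 0.25])  # previously estimated background field d'
mu = 0.1

# d = (mu I + G)^{-1} (mu d' + h)
d_reg = np.linalg.solve(mu * np.eye(2) + G, mu * d_background + h)

# Closed form along n for this toy case: (mu * d'_n + d_n) / (1 + mu)
normal_expected = (mu * d_background[0] + d_true[0]) / (1 + mu)
```

The unconstrained component `d_reg[1]` is taken entirely from the background field,
while the constrained component `d_reg[0]` stays close to the data and is pulled only
slightly toward the background.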

Selecting an appropriate value for $\mu$ remains an open challenge. The method explored
in [1] sets $\mu$ to half the trace of $G_{avg}$ (using the notation of figure 7.19 in
[1]), where $G_{avg}$ is averaged across the entire image. Although this approach
markedly diminishes noise, it also tends to decrease the magnitude of actual residuals.
In the context of motion detection, this compromise is deemed reasonable. Figure 7.20 in
[1] displays both the total and residual displacement fields from figures 7.2(a) and
7.18(b) of [1], re-evaluated using this refined algorithm.


In [None]:
def estimate_displacement_with_regularization(A, S_T, S, delB, w, mu):
    # Pre-calculate quantities recommended by paper
    A_T = A.swapaxes(-1, -2)
    ATA = S_T @ A_T @ A @ S
    ATb = (S_T @ A_T @ delB[..., None])[..., 0]
    # btb = delB.swapaxes(-1, -2) @ delB
    G_avg = np.mean(ATA, axis=(0, 1))
    h_avg = np.mean(ATb, axis=(0, 1))
    p_avg = np.linalg.solve(G_avg, h_avg)  # fig. 7.8
    d_avg = (S @ p_avg[..., None])[..., 0]

    # Default value for mu is to set mu to 1/2 the trace of G_avg
    if mu is None:
        mu = 1 / 2 * np.trace(G_avg)

    # Apply separable cross-correlation to calculate linear equation
    # G = correlate1d(A_T @ A, w, axis=0, mode="constant", cval=0)
    G = correlate1d(ATA, w, axis=0, mode="constant", cval=0)
    G = correlate1d(G, w, axis=1, mode="constant", cval=0)

    # h = correlate1d(
    #     (A_T @ delB[..., None])[..., 0], w, axis=0, mode="constant", cval=0
    # )
    h = correlate1d(ATb, w, axis=0, mode="constant", cval=0)
    h = correlate1d(h, w, axis=1, mode="constant", cval=0)

    # Refine estimate of displacement field.
    # np.eye(2) assumes the "constant" model, where G is 2x2 at every pixel.
    d = np.linalg.solve(G + mu * np.eye(2), (h + mu * d_avg)[..., None])[..., 0]
    return d


def estimate_displacement_without_regularization(A, S_T, S, delB, w):
    # Pre-calculate quantities recommended by paper
    A_T = A.swapaxes(-1, -2)
    ATA = S_T @ A_T @ A @ S
    ATb = (S_T @ A_T @ delB[..., None])[..., 0]
    # btb = delB.swapaxes(-1, -2) @ delB

    # If mu is 0, it means the global/average parametrized warp should not be
    # calculated, and the parametrization should apply to the local calculations
    # if mu == 0: # page 132
    # Apply separable cross-correlation to calculate linear equation
    # for each pixel: G*d = h
    G = correlate1d(ATA, w, axis=0, mode="constant", cval=0)
    G = correlate1d(G, w, axis=1, mode="constant", cval=0)

    h = correlate1d(ATb, w, axis=0, mode="constant", cval=0)
    h = correlate1d(h, w, axis=1, mode="constant", cval=0)

    # Solve G p = h per pixel (h as a stack of column vectors), then map p to d
    d = (S @ np.linalg.solve(G, h[..., None]))[..., 0]
    return d

### Multi-scale Displacement Estimation

The issue of overly large displacements can be mitigated by performing the analysis at a
coarser scale. This entails utilizing a larger applicability kernel for the polynomial
expansion and/or applying a lowpass filter to the signal first, as discussed in section
4.5. The result is an algorithm capable of handling larger displacements, albeit with a
decrease in accuracy.

This leads to the adoption of a multi-scale approach. Begin with a coarse scale to
achieve a rough yet reasonable displacement estimate, and then refine this estimate
across finer scales to achieve progressively more accurate estimates. Figure 7.10
illustrates a diagram for a three-scale displacement estimation algorithm. To minimize
computations, both signals $f_1$ and $f_2$ are lowpass filtered and subsampled between
scales, but the algorithm is compatible with any multi-scale polynomial expansion
scheme. If the signal undergoes subsampling, it's necessary to upsample the estimated
displacement fields between scales, adjusting the values to match the new scale
accordingly. As in previous methods, the a priori displacement $d_{in}$ at the coarsest
scale is initially set to zero, unless there is direct knowledge of the displacement
field.

Unlike the iterative displacement estimation algorithm (described in the next section),
this method necessitates the calculation of new polynomial expansion coefficients for
each scale. However, this only marginally impacts the computational complexity,
especially if subsampling is employed. It's also possible to integrate both strategies,
iterating multiple times at each scale, although this might not be an efficient
practice, except perhaps at the coarsest scale.
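
The rescaling between scales is easy to get wrong, so here is a minimal sketch of it.
The helper `upsample_flow_nearest` is hypothetical and uses nearest-neighbor repetition
to stay dependency-free; a smoothing upsampler such as
`skimage.transform.pyramid_expand` is the more usual choice.

```python
import numpy as np


def upsample_flow_nearest(d, target_shape):
    """Double the resolution of a flow field and rescale its vectors.

    d has shape (height, width, 2). When the image grid doubles in size,
    a displacement of 1 coarse pixel becomes 2 fine pixels.
    """
    up = np.repeat(np.repeat(d, 2, axis=0), 2, axis=1)
    # Crop in case the finer level has an odd size.
    up = up[: target_shape[0], : target_shape[1]]
    return 2.0 * up


coarse = np.zeros((3, 4, 2))
coarse[..., 0] = 1.5
coarse[..., 1] = -0.5

fine = upsample_flow_nearest(coarse, target_shape=(5, 8))
```

Forgetting the factor of 2 leaves the prior displacement at half its true size on the
finer grid, which defeats the purpose of the coarse estimate.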

Note that `gen_gaussian_pyramids` converts its output to floating point in the range 0
to 1.


In [None]:
def gen_gaussian_pyramids(img_list, n_pyr):
    """
    Applies Gaussian pyramid transformations to a list of images, zips the transformed images together,
    and then reverses the order of the resulting list.

    Parameters:
    - img_list: List of images to transform. Each image should be compatible with skimage.transform.pyramid_gaussian.
    - n_pyr: The number of pyramid layers to use in the transformation.

    Returns:
    - An iterator over tuples of pyramid levels (one image per input), ordered from coarsest to finest.
    """
    # Apply the Gaussian pyramid transformation to each image in the list with the specified number of layers
    transformed_images = list(
        map(partial(skimage.transform.pyramid_gaussian, max_layer=n_pyr), img_list)
    )

    # Zip the transformed images together and reverse the order
    zipped_and_reversed = reversed(list(zip(*transformed_images)))

    return zipped_and_reversed

### Iterative Displacement Estimation

The simplest solution, as depicted in figure 7.9 of [1], involves iterating the
displacement estimation process three times. The output displacement from one iteration serves as the
a priori displacement for the subsequent iteration. Initially, the a priori displacement
field $d_{in}$ is typically set to zero, unless there is actual knowledge available
about the displacement field. The same polynomial expansion coefficients are utilized
across all iterations and are required to be computed only once. While it is feasible to
set a fixed number of iterations, iterating until the displacement estimates have
converged is also a viable approach.

The vulnerability of this method lies primarily in the first iteration. If the
displacements (relative to the a priori displacements) are excessively large, it is
unreasonable to anticipate improvements in the output displacements, rendering further
iterations ineffective.


In [None]:
def flow_iterative(
    f1, f2, sigma, sigma_flow, num_iter=1, d=None, model="constant", mu=None
):

    # TODO: add initial warp parameters as optional input?

    height, width, *_ = f1.shape
    c1 = generate_certainty(height, width, 5)  # (height, width) float64 0.0 1.0
    c2 = generate_certainty(height, width, 5)  # (height, width) float64 0.0 1.0

    # Calculate the polynomial expansion at each point in the images
    A1, B1, C1 = poly_exp(f1, c1, sigma)
    A2, B2, C2 = poly_exp(f2, c2, sigma)

    # Pixel coordinates of each point in the images, (height, width, 2)
    x = np.stack(
        np.broadcast_arrays(np.arange(f1.shape[0])[:, None], np.arange(f1.shape[1])),
        axis=-1,
    ).astype(int)

    # Initialize displacement field
    if d is None:
        d = np.zeros(list(f1.shape) + [2])  # (height, width, 2)

    # Set up applicability convolution window
    n_flow = int(4 * sigma_flow + 1)
    xw = np.arange(-n_flow, n_flow + 1)
    w = np.exp(-(xw**2) / (2 * sigma_flow**2))

    S = motion_model(x, model)

    S_T = S.swapaxes(-1, -2)

    # Iterate convolutions to estimate the optical flow
    for _ in range(num_iter):
        # Set d~ as displacement field fit to nearest pixel (and constrain to not
        # being off image). Note we are setting certainty to 0 for points that
        # would have been off-image had we not constrained them
        d_ = np.rint(d).astype(int)  # a priori displacement, rounded to nearest pixel
        x_ = x + d_

        # x_ = np.maximum(np.minimum(x_, np.array(f1.shape) - 1), 0)

        # Constrain d~ to be on-image, and find points that would have
        # been off-image
        x_2 = np.maximum(np.minimum(x_, np.array(f1.shape) - 1), 0)
        off_f = np.any(x_ != x_2, axis=-1)
        x_ = x_2

        # Set certainty to 0 for off-image points
        c_ = c1[x_[..., 0], x_[..., 1]]
        c_[off_f] = 0

        # Calculate A and delB for each point, according to paper
        A = (A1 + A2[x_[..., 0], x_[..., 1]]) / 2
        A *= c_[
            ..., None, None
        ]  # recommendation in paper: add in certainty by applying to A and delB

        delB = -1 / 2 * (B2[x_[..., 0], x_[..., 1]] - B1) + (A @ d_[..., None])[..., 0]
        delB *= c_[
            ..., None
        ]  # recommendation in paper: add in certainty by applying to A and delB

        if mu == 0:
            d = estimate_displacement_without_regularization(A, S_T, S, delB, w)
        else:
            d = estimate_displacement_with_regularization(A, S_T, S, delB, w, mu)

    return d

In [None]:
def calc_optical_flow_farneback(img1, img2):
    assert img1.shape == img2.shape

    n_pyr = 3
    opts = dict(
        sigma=4.0,
        sigma_flow=4.0,
        num_iter=3,
        model="constant",
        # mu=0 can make the per-pixel solve fail with a singular-matrix error;
        # None lets mu be computed automatically (half the trace of G_avg).
        mu=None,
    )

    d = None  # optical flow field

    gaussian_pyramids = gen_gaussian_pyramids([img1, img2], n_pyr=n_pyr)
    for pyr1, pyr2 in gaussian_pyramids:
        if d is not None:
            # TODO: account for shapes not quite matching
            d = skimage.transform.pyramid_expand(d, channel_axis=-1)
            d = d[: pyr1.shape[0], : pyr2.shape[1]] * 2

        d = flow_iterative(pyr1, pyr2, d=d, **opts)

    return d

In [None]:
import cv2

cap = cv2.VideoCapture(cv2.samples.findFile("input/snatch.mp4"))  # 25 fps
ret, frame1 = cap.read()
prev = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
# prev: (1080, 1920) uint8 0 255
hsv = np.zeros_like(frame1)
hsv[..., 1] = 255

fourcc = cv2.VideoWriter_fourcc(*"mp4v")  # Codec
print("cap.get(cv2.CAP_PROP_FPS)", cap.get(cv2.CAP_PROP_FPS))
out = cv2.VideoWriter(
    "./output/snatch_optical_flow.mp4",
    fourcc,
    fps=cap.get(cv2.CAP_PROP_FPS),
    frameSize=(frame1.shape[1], frame1.shape[0]),
)

stop = 1
while 1:
    ret, frame2 = cap.read()
    if not ret or stop == 0:
        print("No frames grabbed!")
        break
    next = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)
    # next: (1080, 1920) uint8 0 255

    flow = calc_optical_flow_farneback(prev, next)
    # flow: (1080, 1920, 2) float64

    bgr = my_utils.flow_to_color(flow, hsv)

    # out.write(bgr)
    cv2.imwrite("./output_farneback.png", bgr)

    # cv2.imshow("frame2", bgr)
    # k = cv2.waitKey(30) & 0xFF
    # if k == 27:
    #     break
    # elif k == ord("s"):
    #     cv2.imwrite("opticalfb.png", frame2)
    #     cv2.imwrite("opticalhsv.png", bgr)

    prev = next
    stop -= 1
    print("stop: ", stop)

cap.release()
out.release()
cv2.destroyAllWindows()