# Dense Optical Flow with Gunnar Farnebäck's Algorithm

Dense optical flow aims to identify the movement of each pixel across a series of
images. In this notebook, I'll build Farnebäck's algorithm from the ground up. Dense
optical flow is important in fields such as structure from motion, video compression,
and video stabilization, among others.

**RULES:** As usual, **`OpenCV`** is banned in this repository for the algorithm itself. The demo cells below use it only for image and video I/O.

**References:**

-   [1]
    [Optical Flow - Michael Black - MLSS 2013 Tübingen](https://www.youtube.com/watch?v=tIwpDuqJqcE)

-   [2]
    [Polynomial Expansion for Orientation and Motion Estimation](https://www.ida.liu.se/ext/WITAS-ev/Computer_Vision_Technologies/PaperInfo/farneback02.html)
    
-   [3]
    [ericPrince's Pure python implementation of Gunnar Farneback's optical flow algorithm](https://github.com/ericPrince/optical-flow)


In [None]:
import numpy as np
from scipy.ndimage import correlate1d
from functools import partial
import skimage.io
import skimage.transform
import my_utils

# np.set_printoptions(threshold=np.inf)

## Motivation

Optical flow estimation tracks how each pixel is displaced between consecutive frames of an image sequence. The classical formulation rests on two key assumptions:

**Assumptions:**

1. **Brightness Constancy:** This principle states that the brightness of a pixel remains constant between consecutive images, despite its movement to a new position. This idea is captured by the equation:

   $$
   I(x + u, y + v, t + 1) = I(x, y, t)
   $$

   Here, $u$ and $v$ represent the horizontal and vertical shifts in the pixel's position, illustrating that the pixel's intensity doesn't change as it moves.

2. **Spatial Smoothness:** This assumption is based on the idea that neighboring pixels usually move in a similar fashion because they are likely part of the same surface. Surfaces tend to be smooth, implying that adjacent pixels will have comparable motion. This concept is summarized as follows:

   $$
   u_p = u_n \quad \text{and} \quad v_p = v_n, \quad \forall n \in G(p)
   $$

   It means that the movement of a pixel $p$ in both horizontal ($u$) and vertical ($v$) directions is similar to that of its neighboring pixel $n$, suggesting that optical flow changes smoothly across the image.

**Objective Function:**

Derived from these assumptions, the objective functions are:

-   **Brightness Constancy:**

    $$
    E_D(u, v) = \sum (I(x + u, y + v, t + 1) - I(x, y, t))^2
    $$

    This energy penalizes deviations from brightness constancy. The squared penalty corresponds to assuming that those deviations are Gaussian noise.

-   **Spatial Smoothness:**

    $$
    E_s(u, v) = \sum(u_p - u_n)^2 + \sum(v_p - v_n)^2, \quad \forall n \in G(p)
    $$

    This formula encourages smoothness by penalizing variations in the motion between a pixel and its neighbors.
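To make the two energies concrete, here is a minimal sketch that evaluates them on a synthetic image pair. It assumes an integer, spatially constant flow and periodic image borders so that the warp is exact. The helper names `data_energy` and `smoothness_energy` are mine, not part of the notebook, and each neighbor pair is counted once.

```python
import numpy as np


def data_energy(img1, img2, u, v):
    """E_D for an integer, spatially constant flow (u, v), periodic borders assumed."""
    # I(x + u, y + v, t + 1): sample the second frame at the displaced position
    warped = np.roll(img2, shift=(-v, -u), axis=(0, 1))
    return np.sum((warped - img1) ** 2)


def smoothness_energy(u_field, v_field):
    """E_s over horizontal and vertical neighbor pairs, each pair counted once."""
    energy = 0.0
    for field in (u_field, v_field):
        energy += np.sum(np.diff(field, axis=0) ** 2)
        energy += np.sum(np.diff(field, axis=1) ** 2)
    return energy


rng = np.random.default_rng(0)
frame1 = rng.random((32, 32))
true_u, true_v = 3, -2  # horizontal and vertical shift in pixels
frame2 = np.roll(frame1, shift=(true_v, true_u), axis=(0, 1))

e_true = data_energy(frame1, frame2, true_u, true_v)  # correct flow
e_zero = data_energy(frame1, frame2, 0, 0)            # wrong flow
e_smooth = smoothness_energy(np.full((32, 32), 3.0), np.full((32, 32), -2.0))
```

The correct flow gives zero data energy, the zero flow does not, and a constant flow field has zero smoothness energy.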

**Solving the Equations**

The principle of brightness constancy implies that a pixel's intensity does not change over time as it moves. This is mathematically represented as:

$$
I(x, y, t) = I(x + \Delta x, y + \Delta y, t + \Delta t)
$$

For slight movements ($\Delta x$, $\Delta y$, $\Delta t$), we can use the first-order Taylor series expansion for image intensity $I$:

$$
I(x + \Delta x, y + \Delta y, t + \Delta t) \approx I(x, y, t) + \frac{\partial I}{\partial x} \Delta x + \frac{\partial I}{\partial y} \Delta y + \frac{\partial I}{\partial t} \Delta t
$$

By applying the brightness constancy condition and simplifying, we eliminate the term $I(x, y, t)$ on both sides:

$$
I_x \Delta x + I_y \Delta y + I_t \Delta t = 0
$$

Dividing through by $\Delta t$ (assuming it is not zero) and using $\Delta x / \Delta t = u$ and $\Delta y / \Delta t = v$ for velocity components, we arrive at the optical flow constraint equation:

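$$
I_x u + I_y v + I_t = 0
$$

This is a single equation in two unknowns ($u$ and $v$), so it cannot be solved per pixel without an additional constraint such as spatial smoothness.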

## Polynomial Expansion for Orientation and Motion Estimation

In Gunnar Farnebäck's thesis [2], the local neighborhood of each pixel is approximated
by a quadratic polynomial fitted to the intensity values of that neighborhood. This
polynomial expansion is the basis for the optical flow estimate. Its quadratic, linear,
and constant terms each describe a different aspect of the local image structure. For a
2D grayscale image, A, B, and C can be interpreted as follows:

**Quadratic Term (A):** The quadratic term, represented by matrix A, captures the
curvature of the image intensity surface in the local neighborhood of a pixel. This term
can be thought of as containing coefficients of a quadratic polynomial that models how
the intensity varies in two dimensions. It reflects the local geometric structure of the
image around each pixel, such as edges, corners, or flat areas. High values in this term
indicate regions with high curvature or rapid changes in intensity, which often
correspond to edges or texture in the image.

**Linear Term (B):** The linear term, represented by vector B, captures the first-order
change in intensity in the local neighborhood, essentially representing the gradient of
the image at each pixel. This term indicates the direction and magnitude of the steepest
intensity change. In the context of motion, the difference in this term between two
frames is what the displacement estimate is computed from.

**Constant Term (C):** The constant term, represented by C, corresponds to the average
or base intensity level within the local neighborhood around a pixel. It represents the
overall intensity offset of the local region. This term is less about the local
structure or motion and more about the general brightness or darkness of the area.
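As a small illustration of what A, B, and C mean, the sketch below fits a quadratic to a single patch with an ordinary (unweighted) least-squares fit. This is a simplification of the weighted fit used later in `poly_exp`. The function name `fit_quadratic_patch` and the convention that the first coordinate is the column offset are assumptions made for this example only.

```python
import numpy as np


def fit_quadratic_patch(patch):
    """Unweighted least-squares fit of c + b^T x + x^T A x to an odd-sized square patch."""
    half = patch.shape[0] // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    x, y = xs.ravel(), ys.ravel()
    # Basis functions in the same order as the notebook: 1, x, y, x^2, y^2, xy
    basis = np.stack([np.ones_like(x), x, y, x**2, y**2, x * y], axis=1).astype(float)
    r, *_ = np.linalg.lstsq(basis, patch.ravel().astype(float), rcond=None)
    A = np.array([[r[3], r[5] / 2], [r[5] / 2, r[4]]])  # quadratic term
    b = r[1:3]                                          # linear term
    c = r[0]                                            # constant term
    return A, b, c


# Synthetic patch that is exactly quadratic, so the fit should recover it
ys, xs = np.mgrid[-3:4, -3:4]
patch = 2.0 + 0.5 * xs - 1.0 * ys + 0.25 * xs**2 + 0.1 * xs * ys
A, b, c = fit_quadratic_patch(patch)
```

The mixed coefficient of $xy$ is split evenly between the two off-diagonal entries of A, which is why `r[5] / 2` appears twice.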


### Signal, Certainty, and Applicability

Let $f$ represent the complete signal, while $\hat{f}$ denotes the local neighborhood
around a specific point. The neighborhood is of finite size, so $\hat{f}$ is a member of
a finite-dimensional vector space, denoted as $\mathbb{C}^n$. Regardless of the
dimensionality of the original signal space, $\hat{f}$ is represented as an
$n \times 1$ column vector.

Certainty quantifies our confidence in the signal's values at each location, represented
by non-negative real numbers. The symbol $c$ denotes the entire certainty field, while
the $n \times 1$ column vector $\hat{c}$ denotes the certainty values within $\hat{f}$.

Applicability is a kind of "certainty" for the basis functions. Rather than acting as a
direct measure of confidence, it expresses the relative importance of each point within
the neighborhood. Like certainty, applicability, denoted by $a$, is an $n \times 1$
vector of non-negative values. Points of zero applicability could technically be omitted
from the neighborhood, but practical considerations might justify their inclusion.
Although it might seem natural to confine applicability values to the $[0, 1]$ range,
this is unnecessary because the overall scale of these values does not affect the result.
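The roles of certainty and applicability can be seen in a one-neighborhood, 1D weighted least-squares fit, sketched below. I am assuming the weighted normal equations $r = (B^T W_a W_c B)^{-1} B^T W_a W_c \hat{f}$ with diagonal weight matrices. The name `weighted_fit` and the test signal are made up for this example.

```python
import numpy as np


def weighted_fit(f_hat, c_hat, a, basis):
    """Solve the weighted normal equations for one neighborhood."""
    w = a * c_hat                          # combined weight per sample
    G = basis.T @ (w[:, None] * basis)     # B^T W_a W_c B
    h = basis.T @ (w * f_hat)              # B^T W_a W_c f
    return np.linalg.solve(G, h)


x = np.arange(-4, 5, dtype=float)
basis = np.stack([np.ones_like(x), x, x**2], axis=1)  # 1, x, x^2
a = np.exp(-(x**2) / (2 * 1.5**2))                    # Gaussian applicability

f_hat = 1.0 + 2.0 * x - 0.5 * x**2
f_hat[6] = 100.0            # corrupted sample ...
c_hat = np.ones_like(x)
c_hat[6] = 0.0              # ... marked as completely uncertain

r = weighted_fit(f_hat, c_hat, a, basis)
r_scaled = weighted_fit(f_hat, c_hat, 10.0 * a, basis)
```

The corrupted sample has zero certainty and therefore no influence, and multiplying the applicability by a constant leaves the coefficients unchanged.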

For further details on generating applicability, refer to page 43 and Section 3.10 of
[2]. The functions `generate_applicability` and `generate_certainty` are implemented as
follows.


In [None]:
def generate_applicability(sigma):
    """Calculate 1D Gaussian applicability kernel."""
    n = int(4 * sigma + 1)  # Capture significant parts of the Gaussian distribution
    x = np.arange(-n, n + 1)
    applicability = np.exp(-(x**2) / (2 * sigma**2))
    return x, applicability


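def generate_certainty(height, width, denominator=5):
    """Linear certainty ramp: 0 at the image border, reaching 1 after `denominator` pixels."""
    # Open question: should the falloff be Gaussian rather than linear?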
    kernel = np.minimum(
        1,
        1 / denominator * np.minimum(np.arange(height)[:, None], np.arange(width)),
    )
    kernel = np.minimum(
        kernel,
        1
        / denominator
        * np.minimum(
            height - 1 - np.arange(height)[:, None],
            width - 1 - np.arange(width),
        ),
    )
    return kernel

### Separable Correlation

Separable correlation is a standard way to speed up correlation with large filters or
kernels in signal processing and image analysis. A kernel is separable when the
two-dimensional kernel can be decomposed into two one-dimensional kernels, one acting
along each axis. A two-dimensional correlation can then be carried out as two sequential
one-dimensional passes, first across rows and then down columns, or vice versa.

To understand this concept mathematically, consider a two-dimensional kernel $H$ that
can be represented as the outer product of two one-dimensional vectors, $u$ and $v$:

$$
H = u \otimes v
$$

In this expression, $\otimes$ symbolizes the outer product. A kernel is deemed separable
if it is possible to identify vectors $u$ and $v$ for which the above relationship is
true.

When applying this to an input signal or image, denoted as $I$, the operation of
two-dimensional correlation with a separable kernel $H$ unfolds in two distinct phases:

1. **Horizontal Pass:** The initial one-dimensional kernel $u$ is applied across each
   row of $I$.
2. **Vertical Pass:** Subsequently, the second one-dimensional kernel $v$ is applied
   down each column of the result from the horizontal pass.
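The two passes above can be checked numerically. The sketch below compares a full 2D correlation against the horizontal-then-vertical version using `scipy.ndimage`. The kernel values are arbitrary, and since arrays are indexed `[row, column]`, the 2D kernel is built as `np.outer(v, u)` so that $u$ acts along rows and $v$ down columns.

```python
import numpy as np
from scipy.ndimage import correlate, correlate1d

rng = np.random.default_rng(0)
image = rng.random((20, 30))
u = np.array([1.0, 2.0, 1.0])    # applied across each row (axis=1)
v = np.array([-1.0, 0.0, 1.0])   # applied down each column (axis=0)

# Full 2D kernel: H[i, j] = v[i] * u[j]
H = np.outer(v, u)
full = correlate(image, H, mode="constant", cval=0.0)

# Horizontal pass, then vertical pass on the intermediate result
horizontal = correlate1d(image, u, axis=1, mode="constant", cval=0.0)
separable = correlate1d(horizontal, v, axis=0, mode="constant", cval=0.0)

same_result = np.allclose(full, separable)
```

With zero padding at the borders (the mode used throughout this notebook), the two results agree everywhere, including at the image edges.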

This strategy substantially lowers the computational cost. For a signal of size
$M \times N$ and a kernel of size $K \times K$, a direct two-dimensional correlation
requires $O(MNK^2)$ multiplications. With separable correlation, the count drops to
about $2MNK$, that is, $O(MNK)$.

The term "separable correlation" applies to any correlation whose kernel factors in
this way. In this notebook it matters because the Gaussian applicability and the
polynomial basis functions both factor into one-dimensional parts, so every
two-dimensional correlation in `poly_exp` is carried out as two `correlate1d` passes.


In [None]:
def poly_exp(f, certainty, sigma):
    x, applicability = generate_applicability(sigma)

    # Basis functions b from the paper, split into per-axis 1D factors so the
    # correlations stay separable. Column order: 1, x, y, x^2, y^2, xy.
    # Shape of bx and by: [n, 6]
    bx = np.stack(
        [
            np.ones(applicability.shape),
            x,
            np.ones(applicability.shape),
            x**2,
            np.ones(applicability.shape),
            x,
        ],
        axis=-1,
    )
    by = np.stack(
        [
            np.ones(applicability.shape),
            np.ones(applicability.shape),
            x,
            np.ones(applicability.shape),
            x**2,
            x,
        ],
        axis=-1,
    )

    # Pre-calculate product of certainty and signal
    cf = certainty * f

    # G and v define the linear system G r = v from the paper (eq. 4.9), where
    # r holds the coefficients of the 2nd-order polynomial fitted to f
    G = np.empty(list(f.shape) + [bx.shape[-1]] * 2)  # (height, width, 6, 6)
    # v: weighted projection of the signal onto each basis function
    v = np.empty(list(f.shape) + [bx.shape[-1]])  # (height, width, 6)

    # Apply separable cross-correlations

    # Pre-calculate quantities recommended in paper
    ab = np.einsum("i,ij->ij", applicability, bx)
    abb = np.einsum("ij,ik->ijk", ab, bx)

    # Calculate G and v for each pixel with cross-correlation
    for i in range(bx.shape[-1]):
        for j in range(bx.shape[-1]):
            G[..., i, j] = correlate1d(
                certainty, abb[..., i, j], axis=0, mode="constant", cval=0
            )  # use certainty

        v[..., i] = correlate1d(cf, ab[..., i], axis=0, mode="constant", cval=0)

    # Pre-calculate quantities recommended in paper
    ab = np.einsum("i,ij->ij", applicability, by)
    abb = np.einsum("ij,ik->ijk", ab, by)

    # Calculate G and v for each pixel with cross-correlation
    for i in range(bx.shape[-1]):
        for j in range(bx.shape[-1]):
            G[..., i, j] = correlate1d(
                G[..., i, j], abb[..., i, j], axis=1, mode="constant", cval=0
            )  # use G

        v[..., i] = correlate1d(v[..., i], ab[..., i], axis=1, mode="constant", cval=0)

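    # Solve G r = v for each pixel. The trailing axis makes v an explicit stack
    # of column vectors, which behaves the same in NumPy 1.x and 2.x
    r = np.linalg.solve(G, v[..., None])[..., 0]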

    # Quadratic term
    A = np.empty(list(f.shape) + [2, 2])
    A[..., 0, 0] = r[..., 3]
    A[..., 0, 1] = r[..., 5] / 2
    A[..., 1, 0] = A[..., 0, 1]
    A[..., 1, 1] = r[..., 4]

    # Linear term
    B = np.empty(list(f.shape) + [2])
    B[..., 0] = r[..., 1]
    B[..., 1] = r[..., 2]

    # constant term
    C = r[..., 0]

    # b: [n, n, 6]
    # r: [f, f, 6]
    # f: [f, f]
    # e = b*r - f, eq. 3.19

    return A, B, C



After defining the polynomial expansion function `poly_exp`, we can proceed to visualize
the variables `A`, `B`, and `C`.


In [None]:
import cv2

img = cv2.imread("input/yosemite/yos02.tif")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
print("img:", img.shape, img.dtype, img.min(), img.max())
# TODO: float64 input should also work. Ideally the input image is floating point.
# gray = gray.astype(np.float64)
# gray /= 255
print(f"gray: {gray.shape}, {gray.dtype}, {gray.min()}, {gray.max()}")

height, width, *_ = img.shape
certainty = generate_certainty(height, width, denominator=5)
print(f"certainty: {certainty.shape}, {certainty.dtype}, {certainty.min()}, {certainty.max()}")

A, B, C = poly_exp(f=gray, certainty=certainty, sigma=4)
print("A:", A.shape)
print("B:", B.shape)
print("C:", C.shape)

my_utils.visualize_polynomial_expansion(
    img, A, B, C, out_path="./output/polynomial_expansion.png"
)

## Displacement Estimation

Since the result of polynomial expansion is that each neighborhood is approximated by a
polynomial, it becomes interesting to analyze what happens if a polynomial undergoes an
ideal translation. Consider the exact quadratic polynomial:

$$
f_1(x) = x^T A_1 x + b_1^T x + c_1
$$

and construct a new signal $f_2$ by a global displacement $d$, as in eq. 7.2 of [2]:

$$
f_2(x) = f_1(x - d) = x^T A_2 x + b_2^T x + c_2.
$$

Equating the coefficients in the quadratic polynomials yields

$$
A_2 = A_1,
$$

$$
b_2 = b_1 - 2A_1 d,
$$

$$
c_2 = d^T A_1 d - b_1^T d + c_1.
$$

The key observation is that by equation $b_2 = b_1 - 2A_1 d$, we can solve for the
translation $d$, at least if $A_1$ is non-singular,

$$
d = -\frac{1}{2} A_1^{-1} (b_2 - b_1).
$$

We note that this observation holds for any signal dimensionality.
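The coefficient relations can be verified numerically. The sketch below picks an arbitrary non-singular $A_1$, builds $f_2$ from the formulas above, checks that $f_2(x) = f_1(x - d)$ at a test point, and then recovers $d$ from the linear terms. All numeric values are arbitrary choices for this example.

```python
import numpy as np

A1 = np.array([[2.0, 0.5], [0.5, -1.0]])   # symmetric, non-singular
b1 = np.array([1.0, -3.0])
c1 = 0.7
d_true = np.array([1.5, -0.25])

# Coefficients of the displaced polynomial
A2 = A1
b2 = b1 - 2.0 * A1 @ d_true
c2 = d_true @ A1 @ d_true - b1 @ d_true + c1


def f1(x):
    return x @ A1 @ x + b1 @ x + c1


def f2(x):
    return x @ A2 @ x + b2 @ x + c2


x_test = np.array([0.3, -1.2])
shift_ok = np.isclose(f2(x_test), f1(x_test - d_true))

# d = -1/2 * A1^{-1} (b2 - b1), computed with a linear solve
d_est = -0.5 * np.linalg.solve(A1, b2 - b1)
```

Only the linear terms are needed to recover the displacement. The constant terms $c_1$ and $c_2$ play no role.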

The intuitive explanation is that the stationary points of

$$
f(x) = x^T Ax + b^T x + c
$$

can be found by differentiating $f$ and setting the result to 0,

$$
\nabla f(x) = 2Ax + b = 0,
$$

$$
x = -\frac{1}{2} A^{-1} b.
$$

If we assume that $A$ is non-singular and rewrite the equation for $d$ as

$$
d = \left(-\frac{1}{2} A_2^{-1} b_2\right) - \left(-\frac{1}{2} A_1^{-1} b_1\right),
$$

we obtain the displacement as the observed movement of the stationary point.

Obviously, the assumptions that an entire signal can be represented as a single
polynomial and that two signals are related by a global translation are quite
unrealistic. Nonetheless, the basic relation can still be applied to real signals, even
though errors are introduced when these assumptions are relaxed. The question then
becomes whether these errors can be kept small enough to yield useful algorithms.

To address this in practice, we start by replacing the global polynomial
described in the equation with local polynomial approximations. This means conducting a
polynomial expansion of both images, resulting in expansion coefficients $A_1(x)$,
$b_1(x)$, and $c_1(x)$ for the first image and $A_2(x)$, $b_2(x)$, and
$c_2(x)$ for the second image. Ideally, this should result in $A_1 = A_2$ according
to the earlier equation, but in practice, we have to settle for the approximation

$$
A(x) = \frac{A_1(x) + A_2(x)}{2}.
$$

We also introduce

$$
\Delta b(x) = -\frac{1}{2}(b_2(x) - b_1(x))
$$

to obtain the primary constraint

$$
A(x)d(x) = \Delta b(x),
$$

where $d(x)$ indicates that we have also replaced the global displacement with a
spatially varying displacement field.


### Calculating Displacement over a Neighborhood

In this section, we estimate a general displacement field. According to [2], a pointwise solution of

$$
A(x)d(x) = \Delta b(x)
$$

does not yield satisfactory results. To improve on this, we assume that the displacement field varies only slowly, and look for the $d(x)$ that satisfies this constraint as closely as possible over a neighborhood $I$ of $x$, or more formally, by minimizing

$$
\sum_{\Delta x \in I} w(\Delta x)\|A(x + \Delta x)d(x) - \Delta b(x + \Delta x)\|^2,
$$

where $w(\Delta x)$ acts as a weight function (applicability). The minimum is obtained for

$$
d(x) = \left(\sum wA^T A\right)^{-1} \sum wA^T \Delta b,
$$

having simplified some indices for readability. The minimum value is given by

$$
e(x) = \sum w\Delta b^T \Delta b - d(x)^T \left(\sum wA^T \Delta b\right).
$$

In practical terms, this means we compute $A^T A$, $A^T \Delta b$, and $\Delta b^T \Delta b$ pointwise and average these with $w$ before solving for the displacement. The minimum value $e(x)$ serves as a reversed confidence indicator, where smaller numbers signify higher confidence. The solution presented exists and is unique unless the entire neighborhood faces the aperture problem.

At times, it may be beneficial to include a certainty weight $c(x + \Delta x)$ in the equation. This adjustment is most straightforwardly addressed by scaling $A$ and $\Delta b$ accordingly.
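The neighborhood solution can be sketched with two separable correlations, one for $\sum w A^T A$ and one for $\sum w A^T \Delta b$. In the example below, the fields `A` and `delta_b` are synthetic (random symmetric matrices and a known constant displacement), and the helper names `smooth` and `solve_neighborhood` are mine.

```python
import numpy as np
from scipy.ndimage import correlate1d


def smooth(field, w):
    """Weighted sum over a separable 2D window, applied along the two image axes."""
    out = correlate1d(field, w, axis=0, mode="constant", cval=0.0)
    return correlate1d(out, w, axis=1, mode="constant", cval=0.0)


def solve_neighborhood(A, delta_b, w):
    """d(x) = (sum w A^T A)^{-1} (sum w A^T delta_b) for every pixel."""
    A_T = np.swapaxes(A, -1, -2)
    G = smooth(A_T @ A, w)                                # (height, width, 2, 2)
    h = smooth((A_T @ delta_b[..., None])[..., 0], w)     # (height, width, 2)
    return np.linalg.solve(G, h[..., None])[..., 0]


rng = np.random.default_rng(0)
height, width = 12, 16
A = rng.normal(size=(height, width, 2, 2))
A = A + np.swapaxes(A, -1, -2)           # make each A(x) symmetric
d_true = np.array([0.7, -1.3])
delta_b = A @ d_true                     # constraint holds exactly at every pixel

w = np.exp(-(np.arange(-3, 4) ** 2) / (2 * 1.5**2))   # 1D Gaussian applicability
d_est = solve_neighborhood(A, delta_b, w)
```

Because the synthetic constraints are exact and share one displacement, the estimate matches `d_true` at every pixel. On real data, the averaging is what suppresses the noise of the pointwise solution.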


### Estimating a Parameterized Displacement Field

As in section 6.3 of [2], we can improve robustness if the displacement field can be
parameterized according to some motion model. The approach is very similar, and we
derive it for the eight-parameter model in 2D, given by equation (6.6) of [2],

$$
d_x(x, y) = a_1 + a_2 x + a_3 y + a_7 x^2 + a_8 xy,
$$

$$
d_y(x, y) = a_4 + a_5 x + a_6 y + a_7 xy + a_8 y^2.
$$

We can rewrite this in matrix form similar to (6.28) and (6.29), except that we do not
have an extra temporal dimension,

$$
d = Sp,
$$

$$
S = \left[ \begin{array}{cccccccc}
1 & x & y & 0 & 0 & 0 & x^2 & xy \\
0 & 0 & 0 & 1 & x & y & xy & y^2 \\
\end{array} \right],
$$

$$
p = \left[ \begin{array}{c}
a_1 \\
a_2 \\
a_3 \\
a_4 \\
a_5 \\
a_6 \\
a_7 \\
a_8 \\
\end{array} \right].
$$

Inserting this into the weighted least squares problem, we minimize

$$
\sum w_i \|A_i S_i p - \Delta b_i\|^2,
$$

where we use $i$ to index the coordinates in a neighborhood. The solution is

$$
p = \left( \sum w_i S_i^T A_i^T A_i S_i \right)^{-1} \sum w_i S_i^T A_i^T \Delta b_i.
$$

We can notice that, just as in section 6.3, any motion model which is linear in its
parameters can be used. We also notice that, like in the previous section, we can
compute $S^T A^T AS$ and $S^T A^T \Delta b$ pointwise and then average these with $w$.
In fact, the solution reduces to the previous equation for the constant motion model.

A minor variation of the idea is to approximate the entire signal with one parametric
displacement field, allowing us to compute the parameters by

$$
p = \left( \sum S_i^T A_i^T A_i S_i \right)^{-1} \sum S_i^T A_i^T \Delta b_i,
$$

where the summation is over the whole signal.
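As a sanity check of the whole-signal formula, the sketch below fits the six-parameter affine model (the first six columns of $S$) to synthetic constraints generated from known parameters. The helper names `affine_basis` and `fit_global_parameters`, the random `A` field, and the parameter values are assumptions for this example.

```python
import numpy as np


def affine_basis(coords):
    """S for the affine model, shape (..., 2, 6), from coordinates of shape (..., 2)."""
    S = np.zeros(coords.shape[:-1] + (2, 6))
    S[..., 0, 0] = 1.0
    S[..., 0, 1] = coords[..., 0]
    S[..., 0, 2] = coords[..., 1]
    S[..., 1, 3] = 1.0
    S[..., 1, 4] = coords[..., 0]
    S[..., 1, 5] = coords[..., 1]
    return S


def fit_global_parameters(A, delta_b, S):
    """p = (sum S^T A^T A S)^{-1} sum S^T A^T delta_b, summed over the whole signal."""
    A_T = np.swapaxes(A, -1, -2)
    S_T = np.swapaxes(S, -1, -2)
    G = np.sum(S_T @ A_T @ A @ S, axis=(0, 1))                          # (6, 6)
    h = np.sum((S_T @ A_T @ delta_b[..., None])[..., 0], axis=(0, 1))   # (6,)
    return np.linalg.solve(G, h)


rng = np.random.default_rng(0)
height, width = 10, 14
coords = np.stack(
    np.meshgrid(np.arange(height), np.arange(width), indexing="ij"), axis=-1
).astype(float)
S = affine_basis(coords)

p_true = np.array([0.5, 0.01, -0.02, -0.3, 0.03, 0.0])
d_true = S @ p_true                                   # (height, width, 2)
A = rng.normal(size=(height, width, 2, 2))
delta_b = (A @ d_true[..., None])[..., 0]             # exact constraints

p_est = fit_global_parameters(A, delta_b, S)
```

Any model that is linear in its parameters fits the same pattern. Only the construction of `S` changes.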


In [None]:
def motion_model(x, model):
    # Evaluate warp parametrization model at pixel coordinates
    if model == "constant":
        S = np.eye(2)

    elif model in ("affine", "eight_param"):
        # (height, width, 2, 6 or 8)
        S = np.empty(list(x.shape) + [6 if model == "affine" else 8])

        S[..., 0, 0] = 1
        S[..., 0, 1] = x[..., 0]
        S[..., 0, 2] = x[..., 1]
        S[..., 0, 3] = 0
        S[..., 0, 4] = 0
        S[..., 0, 5] = 0

        S[..., 1, 0] = 0
        S[..., 1, 1] = 0
        S[..., 1, 2] = 0
        S[..., 1, 3] = 1
        S[..., 1, 4] = x[..., 0]
        S[..., 1, 5] = x[..., 1]

        if model == "eight_param":
            S[..., 0, 6] = x[..., 0] ** 2
            S[..., 0, 7] = x[..., 0] * x[..., 1]

            S[..., 1, 6] = x[..., 0] * x[..., 1]
            S[..., 1, 7] = x[..., 1] ** 2

    else:
        raise ValueError("Invalid parametrization model")

    return S

### Incorporating Prior Knowledge

A principal challenge with the method so far is the assumption that local polynomials at
the same coordinates in two signals are identical, except for a displacement. Since the
polynomial expansions are local models, they will vary spatially, introducing errors in
the constraints

$$
A(x)d(x) = \Delta b(x).
$$

For small displacements, this issue is not too severe, but it becomes more problematic
with larger displacements. Fortunately, we are not limited to comparing two polynomials
at the exact same coordinate. If we possess prior knowledge about the displacement
field, we can compare the polynomial at $x$ in the first signal to the polynomial at
$x + \tilde{d}(x)$ in the second signal, where $\tilde{d}(x)$ is the initial
displacement field rounded to integer values. This approach essentially allows us to
estimate the relative displacement between the real value and the rounded a priori
estimate, which is hopefully smaller.

This observation is incorporated into the algorithm by replacing the earlier definitions of $A(x)$ and $\Delta b(x)$ with

$$
A(x) = \frac{A_1(x) + A_2(\tilde{x})}{2},
$$

$$
\Delta b(x) = -\frac{1}{2}(b_2(\tilde{x}) - b_1(x)) + A(x) \tilde{d}(x),
$$

where

$$
\tilde{x} = x + \tilde{d}(x).
$$

The first two terms in $\Delta b$ are involved in computing the remaining displacement,
while the last term adds back the rounded a priori displacement. We can observe that for
$\tilde{d}$ identically zero, these equations revert to the original form, as would be
expected.
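The sketch below builds $A(x)$ and $\Delta b(x)$ with an a priori displacement and checks them on an exact global quadratic, whose local linear term at position $x_0$ is $b + 2Ax_0$. For such a signal the prior is not actually needed, so this only confirms that the compensation term $A(x)\tilde{d}(x)$ is consistent. The function name `constraint_with_prior` and all numeric values are assumptions.

```python
import numpy as np


def constraint_with_prior(A1, b1, A2, b2, d_tilde):
    """A(x) and delta_b(x) when x in signal 1 is compared with x + d_tilde in signal 2."""
    height, width = d_tilde.shape[:2]
    rows, cols = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    # x~ = x + d~, clipped so the lookup stays on the image
    r2 = np.clip(rows + d_tilde[..., 0], 0, height - 1)
    c2 = np.clip(cols + d_tilde[..., 1], 0, width - 1)
    A = 0.5 * (A1 + A2[r2, c2])
    delta_b = -0.5 * (b2[r2, c2] - b1) + (A @ d_tilde[..., None])[..., 0]
    return A, delta_b


height, width = 8, 9
A0 = np.array([[1.0, 0.2], [0.2, 0.5]])
b0 = np.array([0.3, -0.1])
d_true = np.array([2.3, -1.6])
coords = np.stack(
    np.meshgrid(np.arange(height), np.arange(width), indexing="ij"), axis=-1
).astype(float)

A1 = np.broadcast_to(A0, (height, width, 2, 2))
A2 = A1
b1 = b0 + 2.0 * coords @ A0        # local linear term b + 2 A x0 (A0 is symmetric)
b2 = b1 - 2.0 * A0 @ d_true        # same signal displaced by d_true

d_tilde = np.broadcast_to(np.rint(d_true).astype(int), (height, width, 2))
A, delta_b = constraint_with_prior(A1, b1, A2, b2, d_tilde)
d_with_prior = np.linalg.solve(A, delta_b[..., None])[..., 0]

zero_prior = np.zeros((height, width, 2), dtype=int)
A_0, delta_b_0 = constraint_with_prior(A1, b1, A2, b2, zero_prior)
d_zero_prior = np.linalg.solve(A_0, delta_b_0[..., None])[..., 0]
```

Away from the border, both variants return `d_true`. Where $\tilde{x}$ had to be clipped, the estimate with a prior is wrong, which is why `flow_iterative` sets the certainty of such points to zero.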

The displacement estimation algorithm derived in the last three sections is illustrated
with a block diagram in figure 7.8 of [2]. Inputs are the quadratic polynomial expansion
coefficients for the two signals, $A_1$, $b_1$, $A_2$, $b_2$, and an a priori
displacement field $d_{in}$. The output is the estimated displacement field $d_{out}$.



### Iterative Displacement Estimation

The simplest solution, as depicted in figure 7.9 of [2], involves iterating the displacement
estimation process three times. The output displacement from one iteration serves as the
a priori displacement for the subsequent iteration. Initially, the a priori displacement
field $d_{in}$ is typically set to zero, unless there is actual knowledge available
about the displacement field. The same polynomial expansion coefficients are utilized
across all iterations and are required to be computed only once. While it is feasible to
set a fixed number of iterations, iterating until the displacement estimates have
converged is also a viable approach.
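The control flow of "iterate until converged" can be sketched independently of the estimator itself. In the example below, `estimate_step` stands for one pass of displacement estimation, and `toy_step` is a purely hypothetical stand-in that removes 80% of the remaining error per pass. Neither is part of the notebook.

```python
import numpy as np


def iterate_displacement(estimate_step, d_init, max_iter=10, tol=1e-3):
    """Feed each output displacement back in as the next a priori displacement."""
    d = d_init
    n_used = 0
    for n_used in range(1, max_iter + 1):
        d_new = estimate_step(d)
        change = np.max(np.abs(d_new - d))
        d = d_new
        if change < tol:   # converged: the estimate no longer moves
            break
    return d, n_used


d_target = np.full((4, 5, 2), [1.5, -0.5])


def toy_step(d_prior):
    """Hypothetical estimator: recovers 80% of the remaining error per pass."""
    return d_prior + 0.8 * (d_target - d_prior)


d_final, n_used = iterate_displacement(toy_step, np.zeros_like(d_target))
```

A fixed iteration count, as used by `flow_iterative` through `num_iter`, corresponds to setting `tol` to zero.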

The vulnerability of this method lies primarily in the first iteration. If the
displacements (relative to the a priori displacements) are excessively large, it is
unreasonable to anticipate improvements in the output displacements, rendering further
iterations ineffective.


### Multi-scale Displacement Estimation

The issue of overly large displacements can be mitigated by performing the analysis at a
coarser scale. This entails utilizing a larger applicability kernel for the polynomial
expansion and/or applying a lowpass filter to the signal first, as discussed in section
4.5. The result is an algorithm capable of handling larger displacements, albeit with a
decrease in accuracy.

This leads to the adoption of a multi-scale approach. Begin with a coarse scale to
achieve a rough yet reasonable displacement estimate, and then refine this estimate
across finer scales to achieve progressively more accurate estimates. Figure 7.10
illustrates a diagram for a three-scale displacement estimation algorithm. To minimize
computations, both signals $f_1$ and $f_2$ are lowpass filtered and subsampled between
scales, but the algorithm is compatible with any multi-scale polynomial expansion
scheme. If the signal undergoes subsampling, it's necessary to upsample the estimated
displacement fields between scales, adjusting the values to match the new scale
accordingly. As in previous methods, the a priori displacement $d_{in}$ at the coarsest
scale is initially set to zero, unless there is direct knowledge of the displacement
field.
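Moving a displacement field to the next finer scale involves two things: resampling the field to the new size and rescaling its values. The sketch below uses nearest-neighbor upsampling by a factor of 2 with plain NumPy as a simplified stand-in for the smooth interpolation a pyramid expansion would perform. The function name `upsample_displacement` is mine.

```python
import numpy as np


def upsample_displacement(d_coarse, fine_shape):
    """Nearest-neighbor upsampling by 2, with the displacement values doubled."""
    d_fine = np.repeat(np.repeat(d_coarse, 2, axis=0), 2, axis=1)
    d_fine = d_fine[: fine_shape[0], : fine_shape[1]]   # crop for odd-sized levels
    # One pixel at the coarse scale spans two pixels at the fine scale
    return 2.0 * d_fine


d_coarse = np.zeros((3, 4, 2))
d_coarse[..., 0] = 1.0       # one pixel along axis 0 at the coarse scale
d_fine = upsample_displacement(d_coarse, fine_shape=(5, 8))
```

Forgetting the factor of 2 is an easy mistake: the field would have the right shape but describe only half the motion.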

Unlike the iterative displacement estimation algorithm, this method necessitates the
calculation of new polynomial expansion coefficients for each scale. However, as we will
see in the following section, this only marginally impacts the computational complexity,
especially if subsampling is employed. It's also possible to integrate both strategies,
iterating multiple times at each scale, although this might not be an efficient
practice, except perhaps at the coarsest scale.

Note that `gen_gaussian_pyramids` converts the output to floating point values ranging
from 0 to 1.


In [None]:
def gen_gaussian_pyramids(img_list, n_pyr):
    """
    Applies Gaussian pyramid transformations to a list of images, zips the transformed images together,
    and then reverses the order of the resulting list.

    Parameters:
    - img_list: List of images to transform. Each image should be compatible with skimage.transform.pyramid_gaussian.
    - n_pyr: Passed as max_layer, so each pyramid has up to n_pyr + 1 levels (including the original image).

    Returns:
    - A list of per-level image tuples, ordered from coarsest to finest.
    """
    # Build one Gaussian pyramid (a generator, finest level first) per image
    transformed_images = list(
        map(partial(skimage.transform.pyramid_gaussian, max_layer=n_pyr), img_list)
    )

    # Group the levels across images, then reverse so the coarsest level comes first
    zipped_and_reversed = list(zip(*transformed_images))[::-1]

    return zipped_and_reversed

### Reducing Noise

Upon closely examining the residual displacement field, it becomes evident that the
majority of noise originates from areas that either lack significant structures or have
very low contrast. Notably, this issue is pronounced in regions experiencing the
aperture problem, leading to noise within the parallel displacement components. To
address this, a technique involves "enforcing" the background field onto estimates that
are uncertain. This is achieved by incorporating a regularization term into equation
(7.22), aiming to minimize the expression:

$$
\sum_{\Delta x \in I} w(\Delta x)\|A(x + \Delta x)d(x) - \Delta b(x + \Delta x)\|^2 + \mu\|d(x) - d' (x)\|^2,
$$

where $d'$ denotes the previously estimated background displacement field, and $\mu$
represents a constant. The underlying concept is that the regularization term exerts
minimal impact when the displacement is strongly constrained by the summation in the
formula but becomes significant in its absence. This method is particularly effective
for the aperture problem, where the normal component is well-constrained, unlike the
parallel component. The solution to equation (7.35) is articulated as:

$$
d(x) = \left(\mu I + \sum wA^T A\right)^{-1} \left(\mu d'(x) + \sum wA^T \Delta b\right),
$$

simplifying the notation to enhance readability. Figure 7.19 outlines a block diagram
for the modified basic displacement estimation algorithm. This can be integrated with
the processes depicted in either figure 7.9 or figure 7.10 for iterative or multi-scale
algorithm variations.
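The behavior of the regularized solution is easiest to see on two hand-made cases. In the sketch below, `G` stands for $\sum wA^T A$ and `h` for $\sum wA^T \Delta b$ at a single pixel. The matrices and the value of `mu` are made up for illustration, and `regularized_solve` is my own helper name.

```python
import numpy as np


def regularized_solve(G, h, d_prior, mu):
    """d = (mu I + G)^{-1} (mu d' + h) for a single pixel."""
    return np.linalg.solve(mu * np.eye(2) + G, mu * d_prior + h)


d_prior = np.array([1.0, 1.0])   # background displacement d'

# Aperture problem: the data constrain only the first component
G_aperture = np.array([[4.0, 0.0], [0.0, 0.0]])
h_aperture = G_aperture @ np.array([3.0, 0.0])
d_aperture = regularized_solve(G_aperture, h_aperture, d_prior, mu=0.01)

# Well-constrained pixel: the data dominate and the prior barely matters
G_strong = 100.0 * np.eye(2)
h_strong = G_strong @ np.array([3.0, -2.0])
d_strong = regularized_solve(G_strong, h_strong, d_prior, mu=0.01)
```

In the aperture case the constrained component stays close to 3 while the unconstrained one falls back to the background value 1. Without the regularization term, `G_aperture` would be singular and the solve would fail.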

Selecting an appropriate value for $\mu$ remains an open challenge. The method we
explored involves setting $\mu$ to the mean of half the trace of $G_{avg}$ (referencing
the notation from figure 7.19), calculated across the entire image. Although this
approach markedly diminishes noise, it also tends to decrease the magnitude of actual
residuals. In the context of motion detection, this compromise is deemed reasonable.
Figure 7.20 displays both the total and residual displacement fields from figures 7.2(a)
and 7.18(b), re-evaluated using this refined algorithm.


In [None]:
def estimate_displacement_with_regularization(A, S_T, S, delB, w, mu):
    # Pre-calculate quantities recommended by paper
    A_T = A.swapaxes(-1, -2)
    ATA = S_T @ A_T @ A @ S
    ATb = (S_T @ A_T @ delB[..., None])[..., 0]
    # btb = delB.swapaxes(-1, -2) @ delB
    G_avg = np.mean(ATA, axis=(0, 1))
    h_avg = np.mean(ATb, axis=(0, 1))
    p_avg = np.linalg.solve(G_avg, h_avg)  # fig. 7.8
    d_avg = (S @ p_avg[..., None])[..., 0]

    # Default value for mu is to set mu to 1/2 the trace of G_avg
    if mu is None:
        mu = 1 / 2 * np.trace(G_avg)

    # Apply separable cross-correlation to calculate linear equation
    # G = correlate1d(A_T @ A, w, axis=0, mode="constant", cval=0)
    G = correlate1d(ATA, w, axis=0, mode="constant", cval=0)
    G = correlate1d(G, w, axis=1, mode="constant", cval=0)

    # h = correlate1d(
    #     (A_T @ delB[..., None])[..., 0], w, axis=0, mode="constant", cval=0
    # )
    h = correlate1d(ATb, w, axis=0, mode="constant", cval=0)
    h = correlate1d(h, w, axis=1, mode="constant", cval=0)

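    # Refine estimate of displacement field (mu * np.eye(2) assumes the
    # "constant" motion model). The trailing axis keeps the right-hand side an
    # explicit stack of column vectors in both NumPy 1.x and 2.x
    d = np.linalg.solve(G + mu * np.eye(2), (h + mu * d_avg)[..., None])[..., 0]
    return d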


def estimate_displacement_without_regularization(A, S_T, S, delB, w):
    # Pre-calculate quantities recommended by paper
    A_T = A.swapaxes(-1, -2)
    ATA = S_T @ A_T @ A @ S
    ATb = (S_T @ A_T @ delB[..., None])[..., 0]
    # btb = delB.swapaxes(-1, -2) @ delB

    # If mu is 0, it means the global/average parametrized warp should not be
    # calculated, and the parametrization should apply to the local calculations
    # if mu == 0: # page 132
    # Apply separable cross-correlation to calculate linear equation
    # for each pixel: G*d = h
    G = correlate1d(ATA, w, axis=0, mode="constant", cval=0)
    G = correlate1d(G, w, axis=1, mode="constant", cval=0)

    h = correlate1d(ATb, w, axis=0, mode="constant", cval=0)
    h = correlate1d(h, w, axis=1, mode="constant", cval=0)

    # Solve with an explicit column-vector axis (works in NumPy 1.x and 2.x)
    d = (S @ np.linalg.solve(G, h[..., None]))[..., 0]
    return d


def flow_iterative(
    f1, f2, sigma, sigma_flow, num_iter=1, d=None, model="constant", mu=None
):

    # TODO: add initial warp parameters as optional input?

    height, width, *_ = f1.shape
    c1 = generate_certainty(height, width, 5)  # (height, width) float64 0.0 1.0
    c2 = generate_certainty(height, width, 5)  # (height, width) float64 0.0 1.0

    # Calculate the polynomial expansion at each point in the images
    A1, B1, C1 = poly_exp(f1, c1, sigma)
    A2, B2, C2 = poly_exp(f2, c2, sigma)

    # Pixel coordinates of each point in the images, (height, width, 2)
    x = np.stack(
        np.broadcast_arrays(np.arange(f1.shape[0])[:, None], np.arange(f1.shape[1])),
        axis=-1,
    ).astype(int)

    # Initialize displacement field
    if d is None:
        d = np.zeros(list(f1.shape) + [2])  # (height, width, 2)

    # Set up applicability convolution window
    n_flow = int(4 * sigma_flow + 1)
    xw = np.arange(-n_flow, n_flow + 1)
    w = np.exp(-(xw**2) / (2 * sigma_flow**2))

    S = motion_model(x, model)

    S_T = S.swapaxes(-1, -2)

    # Iterate convolutions to estimate the optical flow
    for _ in range(num_iter):
        # Set d~ to the displacement field rounded to the nearest pixel. Points
        # that would land off-image are clamped below, and their certainty is
        # set to 0
        d_ = np.rint(d).astype(int)  # a priori displacement
        x_ = x + d_

        # x_ = np.maximum(np.minimum(x_, np.array(f1.shape) - 1), 0)

        # Constrain d~ to be on-image, and find points that would have
        # been off-image
        x_2 = np.maximum(np.minimum(x_, np.array(f1.shape) - 1), 0)
        off_f = np.any(x_ != x_2, axis=-1)
        x_ = x_2

        # Set certainty to 0 for off-image points
        c_ = c1[x_[..., 0], x_[..., 1]]
        c_[off_f] = 0

        # Calculate A and delB for each point, according to paper
        A = (A1 + A2[x_[..., 0], x_[..., 1]]) / 2
        A *= c_[
            ..., None, None
        ]  # recommendation in paper: add in certainty by applying to A and delB

        delB = -1 / 2 * (B2[x_[..., 0], x_[..., 1]] - B1) + (A @ d_[..., None])[..., 0]
        delB *= c_[
            ..., None
        ]  # recommendation in paper: add in certainty by applying to A and delB

        if mu == 0:
            d = estimate_displacement_without_regularization(A, S_T, S, delB, w)
        else:
            d = estimate_displacement_with_regularization(A, S_T, S, delB, w, mu)

    return d

In [None]:
def calc_optical_flow_farneback(img1, img2):
    assert img1.shape == img2.shape

    n_pyr = 3
    opts = dict(
        sigma=4.0,
        sigma_flow=4.0,
        num_iter=3,
        model="constant",
        mu=None,  # mu=0 can raise a singular-matrix LinAlgError; None computes mu automatically
    )

    d = None  # optical flow field

    gaussian_pyramids = gen_gaussian_pyramids([img1, img2], n_pyr=n_pyr)
    for pyr1, pyr2 in gaussian_pyramids:
        if d is not None:
            # Upsample the flow to the next finer level. Displacements double
            # along with the resolution
            # TODO: account for shapes not quite matching
            d = skimage.transform.pyramid_expand(d, channel_axis=-1)
            d = d[: pyr1.shape[0], : pyr1.shape[1]] * 2

        d = flow_iterative(pyr1, pyr2, d=d, **opts)

    return d

In [None]:
import cv2

cap = cv2.VideoCapture(cv2.samples.findFile("input/snatch.mp4"))  # 25 fps
ret, frame1 = cap.read()
prev = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
# prev: (1080, 1920) uint8 0 255
hsv = np.zeros_like(frame1)
hsv[..., 1] = 255

fourcc = cv2.VideoWriter_fourcc(*"mp4v")  # Codec
print("cap.get(cv2.CAP_PROP_FPS)", cap.get(cv2.CAP_PROP_FPS))
out = cv2.VideoWriter(
    "./output/snatch_optical_flow.mp4",
    fourcc,
    fps=cap.get(cv2.CAP_PROP_FPS),
    frameSize=(frame1.shape[1], frame1.shape[0]),
)

max_pairs = 1  # number of frame pairs to process
while max_pairs > 0:
    ret, frame2 = cap.read()
    if not ret:
        print("No frames grabbed!")
        break
    # `curr` instead of `next` to avoid shadowing the built-in
    curr = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)
    # curr: (1080, 1920) uint8 0 255

    flow = calc_optical_flow_farneback(prev, curr)
    # flow: (1080, 1920, 2) float64

    bgr = my_utils.flow_to_color(flow, hsv)

    # out.write(bgr)
    cv2.imwrite("./output_farneback.png", bgr)

    # cv2.imshow("frame2", bgr)
    # k = cv2.waitKey(30) & 0xFF
    # if k == 27:
    #     break
    # elif k == ord("s"):
    #     cv2.imwrite("opticalfb.png", frame2)
    #     cv2.imwrite("opticalhsv.png", bgr)

    prev = curr
    max_pairs -= 1
    print("remaining pairs:", max_pairs)

cap.release()
out.release()
cv2.destroyAllWindows()