1. **Definitions**:
    - Let $ s_t $ represent the image frame at time step $ t $.
    - Let $ s_{t+k} $ represent the image frame at time step $ t+k $, which is $ k $ time steps after $ s_t $.
2. **Problem Statement**:
    - **a)** Compute the geodesic distance between $ s_t $ and $ s_{t+k} $ in a suitable image embedding space.
    - **b)** Sample intermediate image frames that lie on the geodesic path between $ s_t $ and $ s_{t+k} $.

3. **Proposed Solution**:
    - **a)** Use the second-to-last layer of a pretrained network as the embedding space. Because the network was trained on a potentially large and diverse dataset, this embedding should capture the content of the images meaningfully.
    - **b)** Intermediate Frames:
        - i) Generate intermediate frames by gradient-based optimization in pixel space toward a target image (Note: this can be computationally intensive and may not produce visually coherent transitions, especially if $ s_t $ and $ s_{t+k} $ differ substantially).
        - ii) Image morphing techniques
        - iii) Variational Autoencoders (VAEs)
        - iv) Manifold Learning
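
Option 3.b.i can be illustrated with a minimal sketch. It assumes a toy linear embedding (`W`, `embed`) as a hypothetical stand-in for the pretrained network, so that the gradient can be written by hand; `synthesize_frame` and all sizes are illustrative names and values, not part of any library. With a real network, the gradient would come from automatic differentiation instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: a fixed random linear map stands in for the pretrained network.
# It sends flattened 8x8 "images" (64 pixels) to a 16-dimensional embedding.
W = rng.standard_normal((16, 64)) / np.sqrt(64)


def embed(image):
    """Toy embedding f(image) = W @ image.ravel()."""
    return W @ image.ravel()


def synthesize_frame(start, target_embedding, lr=0.5, n_steps=200):
    """Gradient descent in pixel space on 0.5 * ||f(x) - target_embedding||^2."""
    x = start.astype(float).copy()
    for _ in range(n_steps):
        residual = embed(x) - target_embedding
        # For the linear toy embedding, the gradient w.r.t. pixels is W^T residual.
        grad = (W.T @ residual).reshape(x.shape)
        x -= lr * grad
    return x


s_start = rng.random((8, 8))  # plays the role of s_t
s_end = rng.random((8, 8))    # plays the role of s_{t+k}

# Target: the point halfway between the two embeddings.
z_mid = 0.5 * (embed(s_start) + embed(s_end))
frame_mid = synthesize_frame(s_start, z_mid)

loss_before = np.linalg.norm(embed(s_start) - z_mid)
loss_after = np.linalg.norm(embed(frame_mid) - z_mid)
print(f"embedding error before: {loss_before:.4f}, after: {loss_after:.2e}")
```

Starting the optimization from $ s_t $ rather than from noise keeps the result close to a real frame, but nothing in this objective enforces visual coherence, which is the caveat noted in 3.b.i.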

4. **Mathematical Formulation**:
    - Let $ f $ be a neural network function that maps an image to a lower-dimensional embedding space, i.e., $ f: \mathcal{I} \rightarrow \mathbb{R}^d $ where $ \mathcal{I} $ is the space of images.
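    - Define the distance $ d $ between $ s_t $ and $ s_{t+k} $ as the Euclidean distance between their embeddings, $ d(s_t, s_{t+k}) = \| f(s_t) - f(s_{t+k}) \| $, where $ \| \cdot \| $ denotes the Euclidean norm. Under this definition, the geodesic in embedding space is the straight line segment from $ f(s_t) $ to $ f(s_{t+k}) $.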
    - Intermediate frames $ s_{t+i} $ for $ i = 1, 2, \ldots, k-1 $ are generated by $ s_{t+i} = g(\theta_{t+i}) $, where $ g $ maps a parameter vector back to an image and $ \theta_{t+i} $ is obtained by interpolating in the embedding space between the parameters representing $ s_t $ and $ s_{t+k} $.
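
The formulation above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions: the embeddings are given directly as small hand-picked vectors rather than computed by a network, the interpolation is assumed to be linear with evenly spaced steps, and `embedding_distance` and `interpolate_embeddings` are hypothetical helper names. The mapping $ g $ back to image space is left out.

```python
import numpy as np


def embedding_distance(z_a, z_b):
    """Euclidean distance d = ||z_a - z_b|| between two embedding vectors."""
    return float(np.linalg.norm(z_a - z_b))


def interpolate_embeddings(z_start, z_end, k):
    """Return the k-1 intermediate points theta_{t+i}, i = 1, ..., k-1.

    ASSUMPTION: linear interpolation with evenly spaced steps i / k.
    """
    alphas = np.arange(1, k) / k
    return [(1 - a) * z_start + a * z_end for a in alphas]


# Hand-picked vectors standing in for f(s_t) and f(s_{t+k}).
z_t = np.array([0.0, 0.0, 0.0])
z_tk = np.array([3.0, 4.0, 0.0])

d = embedding_distance(z_t, z_tk)
thetas = interpolate_embeddings(z_t, z_tk, k=5)

# Summing the segment lengths along the interpolated path recovers d,
# because the straight line is the shortest path in Euclidean space.
path = [z_t, *thetas, z_tk]
path_length = sum(embedding_distance(a, b) for a, b in zip(path[:-1], path[1:]))
print(d, len(thetas), path_length)
```

Each `thetas[i]` would then be passed to $ g $, or used as the target embedding of a pixel-space optimization, to obtain the frame $ s_{t+i} $.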

In [1]:
import numpy as np
import matplotlib.pyplot as plt
# so that relative sizes of axes created by po.imshow and others look right
plt.rcParams['figure.dpi'] = 72
%matplotlib inline

import pyrtools as pt
import plenoptic as po
from plenoptic.tools import to_numpy
%load_ext autoreload
%autoreload 2

import torch
import torch.nn as nn
# this notebook uses torchvision, which is an optional dependency.
# if this fails, install torchvision in your plenoptic environment 
# and restart the notebook kernel.
try:
    import torchvision
except ModuleNotFoundError:
    raise ModuleNotFoundError("optional dependency torchvision not found!"
                              " please install it in your plenoptic environment "
                              "and restart the notebook kernel")
import torchvision.transforms as transforms
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype  = torch.float32
torch.__version__

'2.2.2'