# Understanding the model

## General idea

The model takes video data as input and produces new video data as output. The input is a high-dimensional time series (one frame per time step), while the output is a new video sequence generated from the learned dynamic texture.

The model consists of two essential stages:
1. Dynamic texture modeling
2. Dynamic texture synthesis

## Dynamic texture modeling

Dynamic texture modeling consists of two steps:
- Dimensionality reduction
- Dynamic texture learning

![Model graph](data/graph.jpg)

The model can be written as a latent state-space system:
$$x_{t+1} = f(x_t,A) + n_{x,t}$$
$$y_t = g(x_t,B) + n_{y,t}$$

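$y_t$ - the frame at time $t$ unfolded into a column vector, $y_t \in \mathbb{R}^D$, where $D$ (the number of pixels per frame) is large

$x_t$ - latent variable that governs the dynamic behaviour, $x_t \in \mathbb{R}^Q$, $Q \ll D$

$g()$ - latent mapping from the latent space to the observation space, $\mathbb{R}^Q \to \mathbb{R}^D$; learning it performs the dimensionality reduction

$f()$ - dynamics mapping, $\mathbb{R}^Q \to \mathbb{R}^Q$

$n_{x,t}, n_{y,t}$ - process noise and observation noise, respectively

$A, B$ - parameters of $f()$ and $g()$, respectively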
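
To make the roles of $f()$, $g()$ and the noise terms concrete, the sketch below rolls the two equations forward for a toy system. It is illustrative only: the rotation used as $f()$, the random `tanh` map used as $g()$, the sizes $Q = 2$ and $D = 100$, and the helper name `simulate` are hypothetical choices, not the learned functions of this work (which are Gaussian processes).

```python
import numpy as np

def simulate(f, g, x1, n_frames, sigma_x=0.0, sigma_y=0.0, seed=0):
    """Roll out x_{t+1} = f(x_t) + n_x and y_t = g(x_t) + n_y with isotropic Gaussian noise."""
    rng = np.random.default_rng(seed)
    xs = [np.asarray(x1, dtype=float)]
    ys = []
    for t in range(n_frames):
        x = xs[-1]
        y = g(x)
        ys.append(y + sigma_y * rng.standard_normal(y.shape))         # observation noise n_{y,t}
        if t < n_frames - 1:
            xs.append(f(x) + sigma_x * rng.standard_normal(x.shape))  # process noise n_{x,t}
    return np.array(xs), np.array(ys)

# Hypothetical toy system: Q = 2 latent dimensions, D = 100 "pixels".
Q, D = 2, 100
angle = 0.1
A = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])      # parameters of f: a 2-D rotation
B = np.random.default_rng(1).standard_normal((D, Q))  # parameters of g: a random lift to D dims

def f(x):
    """Toy f(x_t, A): linear rotation of the latent state."""
    return A @ x

def g(x):
    """Toy g(x_t, B): nonlinear map from latent space to observation space."""
    return np.tanh(B @ x)

X, Y = simulate(f, g, x1=[1.0, 0.0], n_frames=50, sigma_x=0.01, sigma_y=0.05)
print(X.shape, Y.shape)  # (50, 2) (50, 100)
```

The latent trajectory has only $Q = 2$ coordinates per step while every observed frame has $D = 100$; this gap is what the dimensionality reduction exploits.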

### Dimensionality reduction

The initial dimensionality $D$ equals the number of pixels in a frame, i.e. the resolution of the input video. For instance, a 160 x 120 grayscale video gives 19200 intensity values per frame. At this dimensionality the curse of dimensionality becomes a problem: the data is very sparse and distance functions become unreliable. To avoid this, dimensionality reduction is applied.
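
Stacking the unfolded frames row by row gives the observation matrix $Y$, with one row $y_t^T$ per frame (a common convention in Gaussian-process latent variable models). A minimal NumPy sketch, assuming a grayscale video stored as a `(T, H, W)` array; `unfold_frames` and `fold_frames` are hypothetical helper names:

```python
import numpy as np

def unfold_frames(video):
    """(T, H, W) grayscale video -> (T, D) matrix Y, one unfolded frame per row, D = H * W."""
    video = np.asarray(video, dtype=float)
    return video.reshape(video.shape[0], -1)

def fold_frames(Y, height, width):
    """Inverse of unfold_frames: (T, D) rows -> (T, H, W) frames."""
    return np.asarray(Y).reshape(-1, height, width)

# 10 random frames at 160 x 120 pixels (H = 120 rows, W = 160 columns).
video = np.random.default_rng(0).random((10, 120, 160))
Y = unfold_frames(video)
print(Y.shape)  # (10, 19200)
```

Each row has $D = 19200$ entries, matching the 160 x 120 example; `fold_frames` turns synthesized rows back into images.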

The reduction can use a linear approach (such as PCA) or a nonlinear one. Linear algorithms cannot capture complex dynamic textures, while some nonlinear algorithms produce an irreversible mapping and/or different coordinate systems. The chosen algorithm should avoid these weaknesses. **In this work the reduction function is inferred using a Gaussian process.**
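
For comparison, the linear option (PCA) fits in a few lines of NumPy. It reduces each frame to $Q$ coordinates and has a linear inverse, but because both directions are linear it cannot represent the nonlinear structure of complex dynamic textures. In Gaussian-process latent variable models PCA is also often used to initialize the latent coordinates; whether this work does so is not stated. `pca_reduce` and `pca_reconstruct` are hypothetical names.

```python
import numpy as np

def pca_reduce(Y, Q):
    """PCA: project the centred rows of Y (N x D) onto the top-Q principal directions."""
    mean = Y.mean(axis=0)
    Yc = Y - mean
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)  # rows of Vt sorted by singular value
    P = Vt[:Q].T                                       # D x Q orthonormal projection
    return Yc @ P, P, mean                             # latent X is N x Q

def pca_reconstruct(X, P, mean):
    """Linear map back to observation space: Y_hat = X P^T + mean."""
    return X @ P.T + mean

rng = np.random.default_rng(0)
Y = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 300))  # rank-2 data, D = 300
X, P, mean = pca_reduce(Y, Q=2)
print(X.shape, np.allclose(pca_reconstruct(X, P, mean), Y))  # (50, 2) True
```

The toy data lies exactly on a 2-dimensional linear subspace, so $Q = 2$ reconstructs it perfectly; real video frames generally do not.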

It is assumed that the dynamic texture sequence $y_i$ is a multivariate Gaussian process indexed by $x_i$: $P(Y|X,\theta) = f(Y, K_Y, D, N)$, where $Y$ - observed dynamic texture sequence, $X$ - latent variables, $\theta$ - kernel hyperparameters ($\theta_1, \theta_2, \theta_3$ in the covariance function below), $K_Y$ - kernel matrix of the latent mapping $g()$, $D$ - dimensionality of the observed frames, $N$ - number of examples (frames).

To achieve a nonlinear mapping, a squared exponential covariance function is used:
$$(K_Y)_{ij} = k_Y(x_i,x_j) = \theta_1 \exp\left(-\frac{\theta_2}{2} \lVert x_i - x_j \rVert^2\right) + \theta_3 \delta_{x_i,x_j}$$
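
A minimal NumPy sketch of $K_Y$ and of $\ln P(Y|X,\theta)$. The text writes the likelihood only as $f(Y, K_Y, D, N)$; the code assumes the standard GP-LVM form $\ln P(Y|X,\theta) = -\frac{DN}{2}\ln 2\pi - \frac{D}{2}\ln|K_Y| - \frac{1}{2}\operatorname{tr}(K_Y^{-1} Y Y^T)$, in which the $D$ columns of a centred $Y$ are independent Gaussian processes sharing $K_Y$. The $\delta$ term is taken as a Kronecker delta on frame indices (added to the diagonal). Function names are hypothetical.

```python
import numpy as np

def se_kernel(X1, X2, theta1, theta2):
    """theta1 * exp(-theta2 / 2 * ||x_i - x_j||^2) for every pair of rows of X1 and X2."""
    sq_dists = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return theta1 * np.exp(-0.5 * theta2 * sq_dists)

def kernel_matrix_Y(X, theta):
    """K_Y for latent points X (N x Q); the delta term adds theta3 on the diagonal."""
    theta1, theta2, theta3 = theta
    return se_kernel(X, X, theta1, theta2) + theta3 * np.eye(X.shape[0])

def log_likelihood_Y(Y, X, theta):
    """Assumed GP-LVM form: -DN/2 ln(2 pi) - D/2 ln|K_Y| - 1/2 tr(K_Y^{-1} Y Y^T)."""
    N, D = Y.shape
    K = kernel_matrix_Y(X, theta)
    L = np.linalg.cholesky(K)                              # K_Y = L L^T
    K_inv_Y = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # K_Y^{-1} Y
    log_det_K = 2.0 * np.sum(np.log(np.diag(L)))
    trace_term = np.sum(Y * K_inv_Y)                       # tr(K_Y^{-1} Y Y^T)
    return -0.5 * D * N * np.log(2.0 * np.pi) - 0.5 * D * log_det_K - 0.5 * trace_term

# Usage with random data: N = 20 frames, D = 50 pixels, Q = 2 latent dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))
Y = rng.standard_normal((20, 50))
print(log_likelihood_Y(Y, X, theta=(1.0, 1.0, 0.1)))
```

The Cholesky factor gives both $\ln|K_Y|$ and $K_Y^{-1}Y$ without forming an explicit inverse; in a full model this quantity would be maximized over $X$ and $\theta$ together with the other terms of the generative model.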

### Dynamic texture learning

A linear dynamic model is not sufficient, because the dynamics of most textures are nonlinear. Switching or piecewise-linear models are more expressive, but they still do not work for all dynamic textures. A more flexible model is therefore needed.

**In this work, the dynamic texture is modeled by a first-order Markov model based on a Gaussian process.** The kernel function of the Gaussian process determines the dynamic behaviour of the latent variables, and this behaviour differs between types of dynamic textures. A multi-kernel dynamic model is therefore used: $P(X|\lambda,W) = f(x_1, X, K_X, Q, N, W)$, where $X$ - latent variables, $x_1$ - initial latent state, $Q$ - dimensionality of the latent variables, $N$ - number of examples (frames), $W$ - kernel functions (with their weights $w_l$), $\lambda$ - kernel parameters, $K_X$ - kernel matrix of the dynamics mapping $f()$, constructed by the multi-kernel function:
$$(K_X)_{ij} = k_X(x_i,x_j) = \sum_{l=1}^{M} w_l k_l(x_i,x_j) + w_{\delta} \delta_{x_i,x_j}$$
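where $i,j \in [1, N-1]$ (each of $x_1, \dots, x_{N-1}$ is mapped to its successor, so only the first $N-1$ states are kernel inputs), $k_l$ - individual kernel functions, $l \in [1, M]$, $M$ - number of kernel functions, $w_l$ - weight of kernel function $k_l$, $w_\delta$ - weight of the noise term.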
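
A sketch of the multi-kernel $K_X$ and of $\ln P(X|\lambda,W)$. It assumes, hypothetically, $M = 2$ base kernels (an RBF and a linear kernel) and a GP-dynamics likelihood in which the $Q$ coordinates of each target $x_{t+1}$ are independent Gaussian processes of $x_t$ sharing $K_X$. The prior on $x_1$ is left out, and all names are illustrative rather than taken from this work.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Hypothetical k_1: exp(-gamma / 2 * ||x_i - x_j||^2)."""
    sq_dists = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * gamma * sq_dists)

def linear_kernel(X1, X2):
    """Hypothetical k_2: x_i^T x_j."""
    return X1 @ X2.T

def kernel_matrix_X(X, kernels, weights, w_delta):
    """K_X over the dynamics inputs x_1..x_{N-1}: sum_l w_l k_l + w_delta * I."""
    X_in = X[:-1]
    K = sum(w * k(X_in, X_in) for w, k in zip(weights, kernels))
    return K + w_delta * np.eye(len(X_in))

def log_likelihood_dynamics(X, kernels, weights, w_delta):
    """ln P(x_2..x_N | x_1): Gaussian with covariance K_X per latent coordinate (x_1 prior omitted)."""
    X_out = X[1:]                                           # targets x_2..x_N, shape (N-1, Q)
    n, Q = X_out.shape
    K = kernel_matrix_X(X, kernels, weights, w_delta)
    _, log_det_K = np.linalg.slogdet(K)
    trace_term = np.sum(X_out * np.linalg.solve(K, X_out))  # tr(K_X^{-1} X_out X_out^T)
    return -0.5 * Q * n * np.log(2.0 * np.pi) - 0.5 * Q * log_det_K - 0.5 * trace_term

# Usage: a noisy circular latent trajectory with N = 30 states, Q = 2.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 30)
X = np.stack([np.cos(t), np.sin(t)], axis=1) + 0.01 * rng.standard_normal((30, 2))
print(log_likelihood_dynamics(X, [rbf_kernel, linear_kernel], weights=[1.0, 0.5], w_delta=0.01))
```

Changing `weights` shifts how much each base kernel contributes, which is how one model can adapt to textures whose latent dynamics differ.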

## Dynamic texture synthesis

The goal of this stage is to generate new video data from the learned dynamic texture. This is done in two steps: first the necessary parameters are estimated (the latent variables, the observed dynamic texture vectors, the hyperparameters of the kernel matrix $K_Y$, the weights of the kernel functions and the parameters of the individual kernels), then a new sequence of frames is predicted.

The generative model is:
$$P(X,Y,\theta,\lambda|W) = P(Y|X,\theta)P(X|\lambda,W)P(\theta)P(\lambda)$$
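
Assuming the parameters are estimated by maximum a posteriori (MAP) estimation, a common choice for Gaussian-process dynamical models that the text does not state explicitly, the factorization turns into an objective to minimize:

$$\mathcal{L}(X,\theta,\lambda) = -\ln P(Y|X,\theta) - \ln P(X|\lambda,W) - \ln P(\theta) - \ln P(\lambda)$$

$$(\hat{X}, \hat{\theta}, \hat{\lambda}) = \arg\min_{X,\theta,\lambda} \mathcal{L}(X,\theta,\lambda)$$

The first two terms are the negative log-likelihoods of the latent mapping and of the dynamics; the last two act as regularizers on the hyperparameters.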

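**In this work, new data is synthesized with a mean-prediction method based on the first-order Markov model: each new latent state is taken as the mean of the Gaussian-process prediction from the previous state, and each new frame as the predicted mean of the observation mapping at that state.**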
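
A minimal sketch of mean-prediction synthesis under simplifying assumptions: a single squared-exponential kernel with fixed hyperparameters stands in for both the learned $K_Y$ and the multi-kernel $K_X$, the latent states $X$ are assumed to be already learned, and frames are centred before regression. `se_kernel`, `gp_mean_predictor` and `synthesize` are hypothetical names, not this work's implementation.

```python
import numpy as np

def se_kernel(X1, X2, theta1=1.0, theta2=1.0):
    """theta1 * exp(-theta2 / 2 * ||x_i - x_j||^2) for every pair of rows of X1 and X2."""
    sq_dists = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return theta1 * np.exp(-0.5 * theta2 * sq_dists)

def gp_mean_predictor(inputs, targets, kernel, noise):
    """Return a function x -> GP posterior mean k(x, inputs) (K + noise I)^{-1} targets."""
    K = kernel(inputs, inputs) + noise * np.eye(len(inputs))
    alpha = np.linalg.solve(K, targets)  # computed once, reused for every prediction
    return lambda x: (kernel(np.atleast_2d(x), inputs) @ alpha)[0]

def synthesize(X, Y, n_new, kernel=se_kernel, noise_x=1e-3, noise_y=1e-3):
    """Mean prediction: x_{t+1} = mean of f(x_t), then each frame = mean of g at that state."""
    y_mean = Y.mean(axis=0)
    step = gp_mean_predictor(X[:-1], X[1:], kernel, noise_x)      # first-order Markov dynamics
    to_frame = gp_mean_predictor(X, Y - y_mean, kernel, noise_y)  # latent state -> centred frame
    x = X[-1]                                                     # start from the last learned state
    frames = []
    for _ in range(n_new):
        x = step(x)
        frames.append(to_frame(x) + y_mean)
    return np.array(frames)

# Usage with a toy latent trajectory (N = 40, Q = 2) and random 30-pixel frames.
t = np.linspace(0.0, 4.0 * np.pi, 40)
X = np.stack([np.cos(t), np.sin(t)], axis=1)
Y = np.random.default_rng(0).standard_normal((40, 30))
new_frames = synthesize(X, Y, n_new=15)
print(new_frames.shape)  # (15, 30)
```

Because no noise is sampled, the rollout is deterministic; the rows of `new_frames` can be reshaped back into images using the frame height and width.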