# Model Overview

We consider a set of $N$ vector time series $\{X_1, \dots, X_N\}$, where each $X_n \in \mathbb{R}^{T \times D}$ contains $D$ features measured over $T$ time points.
We assume that each series $X_n$ is generated according to a set of *discrete* dynamical states $\mathcal{S}_n=\{\mathcal{S}_{n,t}\}_{t=1}^T$, where $\mathcal{S}_{n,t}\in\{1,\dots,S\}$, and their corresponding low-dimensional *continuous* temporal latents $Z_n = \{Z_{n,t}\in \mathbb{R}^{K}\}_{t=1}^T$, as follows:

\begin{align}
    X_{n} &\sim p_\theta(X_{n}\,|\,Z_{n}),\\
    Z_n &\sim p_\theta(Z_n\,|\,\mathcal{S}_n),\\
    \mathcal{S}_n &\sim p_\theta(\mathcal{S}),
    \label{eqn:pgen}
\end{align}

where $p_\theta(\mathcal{S})$ is a generative Markovian prior over $\mathcal{S}_n$:

\begin{equation}
    p_\theta(\mathcal{S}_{n,t}|\mathcal{S}_{n,t-1}) = \text{Cat}\left(\textit{softmax}\left(\mathbf{\Phi}_{\boldsymbol{\theta}}\Pi_{\mathcal{S}_{n,t-1}}\right)\right),
\end{equation}

where $\mathbf{\Phi}_{\boldsymbol{\theta}}$ is a state transition matrix and $\Pi_{\mathcal{S}_{n,t-1}}$ is the posterior parameter vector of $\mathcal{S}_{n,t-1}$. 
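
As an illustration of this prior, the following NumPy sketch samples a state sequence. It assumes that $\mathbf{\Phi}_{\boldsymbol{\theta}}$ is an $S \times S$ matrix applied to a length-$S$ vector, and, for simplicity, it substitutes a one-hot encoding of the previously sampled state for the posterior parameter vector $\Pi_{\mathcal{S}_{n,t-1}}$. The names `softmax`, `sample_states`, and `phi` are illustrative and are not part of the `dsarf` package.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(v - np.max(v))
    return e / e.sum()

def sample_states(phi, T, rng):
    """Draw S_1..S_T with p(S_t | S_{t-1}) = Cat(softmax(phi @ pi_{t-1})).

    Assumption: pi_{t-1} is a one-hot encoding of the previous state, so
    phi @ pi_{t-1} selects one column of logits. States are 0-indexed here.
    """
    S = phi.shape[0]
    states = np.empty(T, dtype=int)
    pi = np.full(S, 1.0 / S)  # uniform initial parameter vector (assumed)
    for t in range(T):
        probs = softmax(phi @ pi)
        states[t] = rng.choice(S, p=probs)
        pi = np.eye(S)[states[t]]  # one-hot vector of the sampled state
    return states

rng = np.random.default_rng(0)
phi = 4.0 * np.eye(3)  # large diagonal logits favor staying in the same state
states = sample_states(phi, T=50, rng=rng)
```

With a strongly diagonal `phi`, the sampled sequence tends to stay in one state for long stretches, which is the regime a switching model is meant to capture.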

The term $p_\theta(Z_n|\mathcal{S}_n)$ is a switching dynamical autoregressive prior over $Z_n$ (also known as a transition model):
\begin{align}
   p_\theta(Z_{n,t}|Z_{n,t-\ell},\mathcal{S}_{n,t} = s) =
   \text{Norm}\Big(\boldsymbol{\mu}_{\boldsymbol{\theta}}^{\textbf{Z},s}(Z_{n,t-\ell}),
   \boldsymbol{\sigma}_{\boldsymbol{\theta}}^{\textbf{Z},s}(Z_{n,t-\ell})\Big),
\end{align}
where $\ell$ denotes a lag set (e.g., $\ell=\{1,2\}$ for a second-order Markov model), and state-specific $\boldsymbol{\mu}_{\boldsymbol{\theta}}^{\textbf{Z},s}(\cdot)$ and $\boldsymbol{\sigma}_{\boldsymbol{\theta}}^{\textbf{Z},s}(\cdot)$ are parameterized by MLPs.
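
The transition model can be sketched in NumPy as follows. This is a toy illustration under several assumptions that the text does not specify: the lagged latents are concatenated before entering the networks, the networks output a mean and a log standard deviation, the MLPs are small and randomly initialized, and the first $\max(\ell)$ latents are drawn from a standard normal. All function names are hypothetical.

```python
import numpy as np

def make_mlp(in_dim, hidden_dim, out_dim, rng):
    """Return a tiny randomly initialized one-hidden-layer MLP as a closure."""
    W1 = rng.normal(scale=0.3, size=(in_dim, hidden_dim))
    W2 = rng.normal(scale=0.3, size=(hidden_dim, out_dim))
    return lambda x: np.tanh(x @ W1) @ W2

def sample_latents(states, lags, K, mu_nets, log_sigma_nets, rng):
    """Sample Z_t ~ Norm(mu^s(Z_{t-lags}), sigma^s(Z_{t-lags})) with s = states[t]."""
    T = len(states)
    max_lag = max(lags)
    Z = np.zeros((T, K))
    Z[:max_lag] = rng.normal(size=(max_lag, K))  # initial latents (assumed)
    for t in range(max_lag, T):
        s = states[t]
        # Assumption: lagged latents are concatenated into one input vector.
        z_lagged = np.concatenate([Z[t - lag] for lag in lags])
        mu = mu_nets[s](z_lagged)
        sigma = np.exp(log_sigma_nets[s](z_lagged))  # networks emit log-sigma (assumed)
        Z[t] = mu + sigma * rng.normal(size=K)
    return Z

rng = np.random.default_rng(0)
K, S, lags, hidden = 2, 3, (1, 2), 8
# One pair of networks per discrete state makes the dynamics state-specific.
mu_nets = [make_mlp(len(lags) * K, hidden, K, rng) for _ in range(S)]
log_sigma_nets = [make_mlp(len(lags) * K, hidden, K, rng) for _ in range(S)]
states = rng.integers(0, S, size=40)
Z = sample_latents(states, lags, K, mu_nets, log_sigma_nets, rng)
```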

Finally, we consider a Gaussian distribution for our emission model such that:
\begin{align}
   p_\theta(X_{n,t}|Z_{n,t}) =
   \text{Norm}\big(\boldsymbol{\mu}_{\boldsymbol{\theta}}^{\textbf{X}}(Z_{n,t}),
   \boldsymbol{\sigma}^{\textbf{X}} \text{I}\big),
\end{align}
where $\boldsymbol{\mu}_{\boldsymbol{\theta}}^{\textbf{X}}(\cdot)$ is either a linear mapping (for factorization) or a nonlinear mapping parameterized by a neural network.
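
For the linear case, the emission model reduces to a noisy low-rank factorization of the data. The sketch below assumes a loading matrix `W` of shape $K \times D$ and treats $\boldsymbol{\sigma}^{\textbf{X}}$ as a scalar standard deviation; both are illustrative choices rather than details taken from the implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, D = 100, 3, 10

Z = rng.normal(size=(T, K))   # continuous latents, one row per time point
W = rng.normal(size=(K, D))   # hypothetical loading matrix of the linear map
sigma_x = 0.1                 # emission noise, treated as a standard deviation (assumed)

X_mean = Z @ W                                   # mu^X(Z_t) for every t
X = X_mean + sigma_x * rng.normal(size=(T, D))   # X_t ~ Norm(mu^X(Z_t), sigma^X I)
```

Because `X_mean` is the product of a $T \times K$ and a $K \times D$ matrix, its rank is at most $K$, which is why the linear option is described as a factorization.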

# Documentation

**CLASS** `dsarf.DSARF(D, factor_dim, L, S, transition_dim=None,
                     VI = {'rnn_dim': None, 'combine': False, 'S': False},
                     recurrent = False, recursive_state = False, factorization = True,
                     lr = 1e-2, batch_size = 20)`
> ### Arguments:
> `D` &ndash; feature dimension of data $X_n \in \mathbb{R}^{T\times D}$.\
> `factor_dim` &ndash; dimension of continuous latent $Z_{n,t} \in \mathbb{R}^{K}$.\
> `L` &ndash; a tuple containing temporal lags $\ell$.\
> `S` &ndash; number of discrete states.\
> `transition_dim` &ndash; hidden dimension of transition model network $\boldsymbol{\mu}_{\boldsymbol{\theta}}^{\textbf{Z},s}$. Default is `None`, in which case it is set to `factor_dim`.\
> `VI` &ndash; activates *amortized* variational inference (VI) for $Z_n$ using an LSTM if `VI['rnn_dim']` (hidden dimension of the LSTM) is set; activates structured VI if `VI['combine']` is set; activates amortized VI for $\mathcal{S}_n$ if `VI['S']` is set. Default is *non-amortized* inference.\
> `recurrent` &ndash; if set, conditions discrete state $\mathcal{S}_{n,t}$ on its preceding continuous latent $Z_{n,t-1}$. Default is `False`.\
> `recursive_state` &ndash; if set to `True`, estimates the prior and posterior of states recursively over time (slower); otherwise, parallelizes their estimation. Default is `False`.\
> `factorization` &ndash; whether to have a linear (factorization) or nonlinear emission model $\boldsymbol{\mu}_{\boldsymbol{\theta}}^{\textbf{X}}$. Default is `True`.\
> `lr` &ndash; learning rate. Default is `1e-2`.\
> `batch_size` &ndash; batch size. Set it to 1 if the data have varying time lengths. Default is 20.
> ### Function Attributes:
> **CLASS** `fit(data, epoch_num= 500)`\
> This attribute fits the DSARF model to time series training data.
>> #### Arguments:
>> `data` &ndash; A list or ndarray containing $N$ data arrays of size $T\times D$ ($T$ can have different lengths across data).\
>> `epoch_num` &ndash; number of epochs for training.
>> #### Variables
>> `~.q_s` &ndash; tensor of size $N \times T_{\text{max}}\times S$ containing posterior of state variables.\
>> `~.q_z_mu` &ndash; tensor of size $N \times T_{\text{max}}\times \text{factor_dim}$ containing posterior mean of continuous latents.\
>> `~.q_z_sig` &ndash; tensor of size $N \times T_{\text{max}}\times \text{factor_dim}$ containing posterior log-sigma of continuous latents.
>> #### Function Attributes:
>> **def** `report_stats(data)`
>>> **Arguments**\
>>> `data` &ndash; ground truth data $X$.\
\
>>> **Returns**\
>>> `NRMSE` &ndash; a dict containing NRMSE of reconstruction and short-term prediction.
>> #### 
>> **def** `short_predict()`
>>> **Returns**\
>>> `y_pred`, `y_pred_n`, `y_pred_p` &ndash; short-term predicted data and their $\pm\sigma$ uncertainty intervals.
>> #### 
>> **def** `long_predict(steps, s=None)`
>>> **Arguments**\
>>> `steps` &ndash; number of time steps to predict ahead.\
>>> `s` &ndash; specifies a dynamical state to use for prediction (can be used to explore each state individually). Default is set to `None` and all states are employed according to the state transition model.\
\
>>> **Returns**\
>>> `y_pred`, `y_pred_n`, `y_pred_p` &ndash; long-term predicted data and their $\pm\sigma$ uncertainty intervals.
>> #### 
>> **def** `plot_predict(data, steps = None, path = './plots/')`\
>> Plots short- or long-term predicted data alongside the ground truth for a few data instances and dimensions.
>>> **Arguments**\
>>> `data` &ndash; ground truth data $X$.\
>>> `steps` &ndash; number of time steps to predict ahead for long-term prediction. Default is `None`, in which case short-term prediction is plotted.\
>>> `path` &ndash; relative path to save plotted figures.
>> #### 
>> **def** `plot_states(index = None, k_smooth = None, path = './plots/')`\
>> Plots inferred states for a few data instances and dimensions.
>>> **Arguments**\
>>> `index` &ndash; data index (number) for which to plot inferred states. Default is `None`, in which case states are plotted for a few data instances.\
>>> `k_smooth` &ndash; kernel size (odd number) for smoothing states over time. Default is `None` and no smoothing is applied.\
>>> `path` &ndash; relative path to save plotted figure.
> ### 
> **CLASS** `infer(data, epoch_num= 500)`\
> This attribute runs inference on test data to infer their latent values (model parameters of the `dsarf.DSARF` class are kept frozen during inference).
>> #### Arguments:
>> `data` &ndash; A list or ndarray containing $N$ data arrays of size $T\times D$ ($T$ can have different lengths across data).\
>> `epoch_num` &ndash; number of epochs to run inference. In case of amortized VI, this argument is ignored.
>> #### Variables / Function Attributes:
>> This class supports all variables and function attributes of **CLASS** `fit(data, epoch_num= 500)` (see above).
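
The documented pieces can be combined roughly as in the sketch below. The toy data helper `make_toy_data` and the wrapper `run_dsarf` are hypothetical names introduced here, and the hyperparameter values are arbitrary. The sketch assumes that `fit` and `infer` return objects exposing the variables and function attributes listed above; `batch_size=1` is used because the toy series have different lengths.

```python
import numpy as np

def make_toy_data(N, D, T_min, T_max, rng):
    """Return a list of N random-walk arrays of shape (T_n, D) with varying T_n."""
    lengths = rng.integers(T_min, T_max + 1, size=N)
    return [rng.normal(size=(int(T), D)).cumsum(axis=0) for T in lengths]

def run_dsarf(data_train, data_test, D):
    """Fit on training data, then infer latents on held-out data.

    Assumption: `fit` and `infer` return objects with the documented
    attributes (`report_stats`, `plot_states`, `plot_predict`, ...).
    Requires the `dsarf` package to be installed.
    """
    import dsarf  # imported lazily so the data helper works without the package

    model = dsarf.DSARF(D, factor_dim=5, L=(1, 2), S=2, batch_size=1)
    trained = model.fit(data_train, epoch_num=100)
    print(trained.report_stats(data_train))  # dict of NRMSE values
    trained.plot_states(k_smooth=5)           # smoothed state plots in ./plots/
    tested = model.infer(data_test, epoch_num=100)
    tested.plot_predict(data_test)            # short-term prediction plots
    return trained, tested

rng = np.random.default_rng(0)
D = 4
data_train = make_toy_data(N=8, D=D, T_min=50, T_max=60, rng=rng)
data_test = make_toy_data(N=2, D=D, T_min=50, T_max=60, rng=rng)
```

Calling `run_dsarf(data_train, data_test, D)` then trains the model and writes the plots to the default `./plots/` path.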