# About
Estimation of the distribution of the dynamical (drift) and diffusional parameters of Ito
processes. Unlike most known models, this approach does not assume any functional
representation of the parameters.

### Brief overview

$S(t) = S_t$ is called an Ito process if it satisfies the following stochastic differential equation, where $W_t$ is a standard Wiener process:
$$
dS_t = a(t, S_t) \cdot dt + b(t, S_t) \cdot dW_t
$$ (Ito_proc)
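The sketch below implements the Euler-Maruyama scheme for the equation above. It assumes a uniform time grid and user-supplied coefficient functions `a(t, s)` and `b(t, s)`. It only illustrates the definition. `eulerMaruyama` is a hypothetical helper and is not the `finito.simulator` implementation used later. The coefficients are evaluated at the start of each step, which is the standard choice for Ito integrals.

```python
import numpy as np


def eulerMaruyama(a, b, s0, dt, nSteps, rng=None):
    """Simulate dS_t = a(t, S_t) dt + b(t, S_t) dW_t on a uniform grid (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    s = np.empty(nSteps + 1)
    s[0] = s0
    # Wiener increments dW ~ N(0, dt); numpy's `scale` is the standard deviation sqrt(dt)
    dW = rng.normal(0.0, np.sqrt(dt), size=nSteps)
    for i in range(nSteps):
        t = i * dt
        # Coefficients evaluated at the left end of the step (Ito convention)
        s[i + 1] = s[i] + a(t, s[i]) * dt + b(t, s[i]) * dW[i]
    return s


# Constant coefficients a = 3, b = 0.4 give a generalized Wiener process
path = eulerMaruyama(
    lambda t, s: 3.0, lambda t, s: 0.4, s0=0.0, dt=1e-3, nSteps=10_000,
    rng=np.random.default_rng(0),
)
print(path.shape)  # (10001,)
```

Time- or state-dependent coefficients are passed the same way. For example, `lambda t, s: -s` gives a mean-reverting drift.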


We can estimate $a(t, S_t)$ and $b(t, S_t)$ in two ways:
1. By their distribution
2. Point-to-point


### 1. Estimation of the coefficients' distribution

Let $S(t_i) = S_i, \quad a(t_i, S_i) = a_i, \quad b(t_i, S_i) = b_i$ and
$i \in \{1, \dots, T\}$. Then
$$
\Delta S = S_i - S_{i-1} \approx a(t_i, S_i) \Delta t + b(t_i, S_i) \Delta W 
$$ (my_label)

$$
\Rightarrow
\mathbb{P}(\Delta S < x) \approx \mathbb{P}\left(\Delta W < \frac{x - a_i \Delta t}{b_i}\right),
\quad \text{where } \Delta W \sim \mathcal{N}(0,\Delta t) \text{ and } b_i > 0
$$

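Here $\Delta t$ denotes the time discretization step, and $\mathcal{N}(0, \Delta t)$ is the normal distribution with mean $0$ and variance $\Delta t$.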

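Denote $\xi = \frac{x - a_i \Delta t}{b_i}$. The coefficients $a_i$ and $b_i$ are themselves treated as random, so $\xi$ is a random variable. Assume $\xi$ is independent of $\Delta W$. Since $\Delta W$ has standard deviation $\sqrt{\Delta t}$, conditionally on $\xi$ we get
$\mathbb{P}(\Delta W < \xi \mid \xi) = \Phi\left(\frac{\xi - 0}{\sqrt{\Delta t}}\right)$, where $\Phi$ is the standard normal CDF.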
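This identity can be checked numerically with the sketch below. It makes the following assumptions:

* $a_i$ and $b_i$ are drawn from arbitrary, hypothetical distributions with $b_i > 0$, independently of $\Delta W$.
* $\Delta S = a_i \Delta t + b_i \Delta W$ is then simulated from those draws.

The sketch compares the Monte Carlo estimate of $\mathbb{P}(\Delta S < x)$ with the average of $\Phi(\xi / \sqrt{\Delta t})$ over the draws of $\xi$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
dt, n, x = 1e-3, 200_000, 0.01

# Hypothetical coefficient distributions (b must stay positive)
a = rng.normal(3.0, 1.0, size=n)
b = rng.uniform(0.2, 0.6, size=n)
# Wiener increments; numpy's scale is the standard deviation sqrt(dt)
dW = rng.normal(0.0, np.sqrt(dt), size=n)
dS = a * dt + b * dW

xi = (x - a * dt) / b                       # one random threshold per draw
lhs = np.mean(dS < x)                       # Monte Carlo estimate of P(dS < x)
rhs = np.mean(norm.cdf(xi / np.sqrt(dt)))   # average of Phi(xi / sqrt(dt)) over xi
print(f"P(dS < x) ~ {lhs:.4f},  E[Phi(xi / sqrt(dt))] ~ {rhs:.4f}")
```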


```python
from finito.simulator import generateGeneralWiener
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Simulate a generalized Wiener process (constant a and b):
# a = 3, b = 0.4, time step dt = 1 ms, horizon T = 10 s
dummyS = generateGeneralWiener(
    a=3, b=0.4, dt=np.timedelta64(1, "ms"), T=np.timedelta64(10, "s")
)
print("Shape of data:", dummyS.shape)
```

#### 1.1. EM-algorithm

```python
# Increments Delta S_i = S_i - S_{i-1}
deltaS = np.diff(dummyS)

# Take a look at the result
fig, ax = plt.subplots(3, 1, figsize=(10, 10))
ax[0].plot(dummyS)
ax[0].set_title("Original process")
ax[1].plot(deltaS)
ax[1].set_title("Differences")
sns.histplot(deltaS, kde=True, ax=ax[2])
ax[2].set_title("Distribution of differences")

plt.tight_layout()
plt.show()
```

```python
from sklearn.mixture import GaussianMixture

# A single Gaussian component: for a generalized Wiener process the increments are
# i.i.d. N(a*dt, b^2*dt), so the mixture reduces to one normal distribution
gmm = GaussianMixture(n_components=1)
gmm.fit(deltaS.reshape(-1, 1))

print(
    "Model parameters after fitting:",
    f"Means: {gmm.means_.flatten()}",
    f"Covariances: {gmm.covariances_.flatten()}",
    sep="\n",
)

# Number of steps per second, i.e. 1 / dt with dt = 1 ms
invDt = np.timedelta64(1, "s") / np.timedelta64(1, "ms")
print(
    "The fitted moments are E[dS] = a*dt and Var[dS] = b^2*dt; rescaling them by 1/dt =",
    invDt,
)
print(
    f"a ~ {gmm.means_.flatten() * invDt}",
    f"b^2 ~ {gmm.covariances_.flatten() * invDt}",
    f"b ~ {np.sqrt(gmm.covariances_.flatten() * invDt)}",
    sep="\n",
)
```
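The model fits a single Gaussian component. Its parameters are therefore just the sample mean and the (biased) sample variance of the increments, up to sklearn's small default `reg_covar` regularization.

The same rescaling can be sketched with plain NumPy method-of-moments estimates:

* $\hat a = \overline{\Delta S} / \Delta t$
* $\hat b = \sqrt{\widehat{\operatorname{Var}}(\Delta S) / \Delta t}$

The synthetic increments below are a hypothetical stand-in for `deltaS`.

```python
import numpy as np

# Hypothetical data: increments of a generalized Wiener process
rng = np.random.default_rng(0)
aTrue, bTrue, dt = 3.0, 0.4, 1e-3   # dt = 1 ms expressed in seconds
deltaS = aTrue * dt + bTrue * np.sqrt(dt) * rng.standard_normal(10_000)

# E[dS] = a*dt and Var[dS] = b^2*dt, so divide both moments by dt
aHat = deltaS.mean() / dt
bHat = np.sqrt(deltaS.var() / dt)
print(f"aHat = {aHat:.3f}, bHat = {bHat:.3f}")
```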

#### 1.2. Minimization problem on distribution

The distribution $F_{dS}(x)$ of the continuous increment $dS$ can be approximated by the
distribution $F_{\Delta S, T}(x)$ of the discrete increments. This approximation can be estimated in two ways: theoretically and empirically.

On the one hand, treating the coefficients as a mixture of $K$ components $(a_k, b_k)$ with weights $p_k$:
$$
F^{(theor)}_{\Delta S, T}(x) = \mathbb{P}(\Delta S < x) \approx 
\sum_{k=1}^{K}{p_k \cdot \Phi \left( \frac{x - a_k \Delta t}{b_k \sqrt{\Delta t}} \right)} 
\xrightarrow{K \rightarrow \infty} F_{dS}
$$

On the other hand, the empirical distribution of $\Delta S$ can be estimated as:

$$
F^{(emp)}_{\Delta S, T}(x) = 
\frac{1}{T} \cdot \sum_{j=1}^{T} \mathbb{I} \left(\Delta S_j < x\right)
\xrightarrow{\Delta t \rightarrow 0} F_{dS}
$$
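Both estimates can be compared in a sketch under a hypothetical two-regime model. At every step, the pair $(a_k, b_k)$ is drawn independently with probability $p_k$. The marginal distribution of $\Delta S$ is then exactly the mixture $\sum_k p_k \Phi\left(\frac{x - a_k \Delta t}{b_k \sqrt{\Delta t}}\right)$, which follows from the per-step distribution in section 1. All parameter values below are illustrative.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical two-regime model: at each step regime k is chosen with probability p_k
rng = np.random.default_rng(1)
dt, T = 1e-3, 20_000
a = np.array([3.0, -1.0])
b = np.array([0.4, 1.2])
p = np.array([0.7, 0.3])

k = rng.choice(len(p), size=T, p=p)   # regime index for every step
dS = a[k] * dt + b[k] * np.sqrt(dt) * rng.standard_normal(T)

x = np.linspace(-0.1, 0.1, 401)
# Theoretical: F(x) = sum_k p_k * Phi((x - a_k*dt) / (b_k*sqrt(dt)))
F_theor = norm.cdf((x[:, None] - a * dt) / (b * np.sqrt(dt))) @ p
# Empirical: F(x) = (1/T) * sum_j I(dS_j < x)
F_emp = (dS[:, None] < x).mean(axis=0)

print("max |F_emp - F_theor| =", np.abs(F_emp - F_theor).max())
```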

To estimate the parameters $V_k = \{a_k, b_k, p_k\}$ (with $b_k > 0$, $p_k \ge 0$ and
$\sum_k p_k = 1$), we can pose a minimization problem on the distance (in some metric
$\rho(f,g)$) between the theoretical and empirical distributions:

$$
\min_{\{V_k\}_{k=1}^{K}} \rho \left(F^{(emp)}_{\Delta S, T}, F^{(theor)}_{\Delta S, T}\right)
$$
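Below is a minimal sketch of this minimization for the simplest case $K = 1$ (so $p_1 = 1$), on hypothetical synthetic increments. The following choices are assumptions, not part of the method as stated:

* $\rho$ is the squared $L_2$ distance between the two CDFs on a fixed grid.
* $b$ is optimized on a log scale to keep it positive.
* SciPy's Nelder-Mead method does the minimization.
* The starting point comes from the median and interquartile range of $\Delta S$.

The helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical data: constant a and b, so K = 1 and p_1 = 1
rng = np.random.default_rng(0)
aTrue, bTrue, dt, n = 3.0, 0.4, 1e-3, 10_000
dS = aTrue * dt + bTrue * np.sqrt(dt) * rng.standard_normal(n)


def empiricalCdf(data, x):
    """F_emp(x) = (1/T) * #{j : data_j < x}, via binary search on sorted data."""
    return np.searchsorted(np.sort(data), x, side="left") / len(data)


def mixtureCdf(x, a, b, p, dt):
    """F_theor(x) = sum_k p_k * Phi((x - a_k*dt) / (b_k*sqrt(dt)))."""
    z = (np.asarray(x, dtype=float)[:, None] - np.asarray(a) * dt) / (np.asarray(b) * np.sqrt(dt))
    return norm.cdf(z) @ np.asarray(p, dtype=float)


# Evaluate both CDFs on a fixed grid covering the bulk of the data
grid = np.linspace(np.quantile(dS, 0.001), np.quantile(dS, 0.999), 200)
F_emp = empiricalCdf(dS, grid)


def rho(theta):
    """Squared L2 distance between F_emp and F_theor on the grid (assumed metric)."""
    a, logB = theta
    return np.sum((F_emp - mixtureCdf(grid, [a], [np.exp(logB)], [1.0], dt)) ** 2)


# Robust starting point: for a normal law, IQR = 1.349 * standard deviation
q25, q50, q75 = np.quantile(dS, [0.25, 0.5, 0.75])
theta0 = [q50 / dt, np.log((q75 - q25) / (1.349 * np.sqrt(dt)))]

res = minimize(rho, theta0, method="Nelder-Mead",
               options={"xatol": 1e-6, "fatol": 1e-12, "maxiter": 2000})
aHat, bHat = res.x[0], np.exp(res.x[1])
print(f"aHat = {aHat:.3f}, bHat = {bHat:.3f}")
```

For $K > 1$ the same objective applies with $3K$ parameters. The weights can be kept on the simplex, for example through a softmax parameterization.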

##### 1.2.1. Empirical distribution

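```python
# TODO: Optimize the loop. Currently O(M*N); sorting `data` once and using binary search
# gives O((M + N) log N), where M is len(x) and N is len(data)
def empiricalDistribution(data, x):
    """Empirical CDF as defined above: fraction of samples strictly below each point of x."""
    res = []
    nSamples = len(data)
    for xi in x:
        res.append(np.sum(data < xi) / nSamples)
    return np.array(res)
```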
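The TODO above can be addressed by sorting once and then using binary search. `np.searchsorted(..., side="left")` returns, for each query point, the number of sorted samples strictly less than it. That is exactly the count `np.sum(data < xi)`. `empiricalDistributionFast` is a hypothetical name for this sketch.

```python
import numpy as np


def empiricalDistributionFast(data, x):
    """Same result as empiricalDistribution: fraction of samples strictly below each x."""
    sortedData = np.sort(np.asarray(data))
    # side="left" gives, for each x, the number of elements strictly less than x
    return np.searchsorted(sortedData, x, side="left") / len(sortedData)


rng = np.random.default_rng(0)
data = rng.normal(size=1_000)
x = np.linspace(-3, 3, 50)
loop = np.array([np.sum(data < xi) / len(data) for xi in x])
print(np.allclose(empiricalDistributionFast(data, x), loop))  # True
```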

```python
import scipy.stats as stats

# Compare theoretical CDFs (blue) with empirical ones (orange)
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].set_title("Binomial distribution")
ax[1].set_title("Normal distribution")

nSamples = 15
binomRV = stats.binom(n=10, p=0.5)
X = np.arange(-5, nSamples)
# scipy's cdf is P(X <= x), while empiricalDistribution counts samples < x,
# so compare against P(X <= x - 1) = P(X < x) for integer x
sns.pointplot(x=X, y=binomRV.cdf(X - 1), ax=ax[0], color="blue")

binomSamples = binomRV.rvs(size=nSamples)
sns.pointplot(x=X, y=empiricalDistribution(binomSamples, X), ax=ax[0], color="orange")

nSamples = 20
normRV = stats.norm(loc=0, scale=1)
X = np.linspace(-3, 3, nSamples)
# For a continuous distribution P(X < x) = P(X <= x), so no shift is needed
sns.pointplot(x=X, y=normRV.cdf(X), ax=ax[1], color="blue")

normSamples = normRV.rvs(size=nSamples)
sns.pointplot(x=X, y=empiricalDistribution(normSamples, X), ax=ax[1], color="orange")
# Hide the crowded float tick labels on the categorical x axis
ax[1].tick_params(axis="x", which="both", labelsize=0)

plt.show()
```
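The metric $\rho$ in the minimization problem is left open. One common choice is the Kolmogorov (sup) distance $\sup_x |F^{(emp)}(x) - F(x)|$. For a continuous $F$, it is attained at the sample points, just before or just after a jump of the empirical CDF.

The sketch below computes this distance for 20 normal samples, as in the right panel. It cross-checks the result against the statistic of `scipy.stats.kstest`. `kolmogorovDistance` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats


def kolmogorovDistance(samples, cdf):
    """sup_x |F_emp(x) - F(x)| for a continuous F, evaluated at the sorted samples."""
    x = np.sort(samples)
    n = len(x)
    F = cdf(x)
    i = np.arange(1, n + 1)
    # Just after x_(i) the empirical CDF equals i/n, just before it equals (i-1)/n
    return max(np.max(i / n - F), np.max(F - (i - 1) / n))


rng = np.random.default_rng(0)
normSamples = rng.normal(size=20)
d = kolmogorovDistance(normSamples, stats.norm.cdf)
print(d, stats.kstest(normSamples, "norm").statistic)
```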