### 1.2. Minimization problem on distribution

The discrete approximation $F_{\Delta S, T}(x)$ of the continuous distribution 
$F_{dS}(x)$ can be estimated in two ways: theoretically and empirically. 

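On the one hand, the theoretical estimate is a mixture of $K$ Gaussian components, where $\Phi$ denotes the standard normal CDF: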
$$
F^{(theor)}_{\Delta S, T}(s) =
\sum_{k=1}^{K}{p_k \cdot \Phi \left( \frac{s - y_k \Delta t}{z_k \Delta t} \right)} 
\xrightarrow{K \rightarrow \infty , \,\Delta t \rightarrow 0} F_{dS}
$$
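
As a minimal sketch of the formula above, the mixture CDF can be evaluated with the standard library alone. The names `std_normal_cdf` and `mixture_cdf`, the parameter values, and the default `dt = 1.0` are illustrative assumptions, not part of the original text.

```python
import math


def std_normal_cdf(u):
    """Standard normal CDF: Phi(u) = 0.5 * (1 + erf(u / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def mixture_cdf(s, y, z, p, dt=1.0):
    """Evaluate sum_k p_k * Phi((s - y_k * dt) / (z_k * dt))."""
    return sum(
        p_k * std_normal_cdf((s - y_k * dt) / (z_k * dt))
        for y_k, z_k, p_k in zip(y, z, p)
    )


# Illustrative parameters (assumed): K = 3 components, weights sum to 1
y = [-1.0, 3.0, 5.0]
z = [1.0, 0.5, 0.2]
p = [0.2, 0.5, 0.3]

for s in (-10.0, 3.0, 10.0):
    print(s, mixture_cdf(s, y, z, p))
```

The result is non-decreasing in `s` and stays within $[0, 1]$ as long as the weights `p` are non-negative and sum to one.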

On the other hand, we can estimate the empirical distribution of $\Delta S$ from $T$ observed increments $\Delta S_j$ as:

$$
F^{(emp)}_{\Delta S, T}(s) = 
\frac{1}{T} \cdot \sum_{j=1}^{T} \mathbb{I} \left(\Delta S_j < s\right)
\xrightarrow{\Delta t \rightarrow 0} F_{dS}
$$
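
The empirical estimate is simply the fraction of observed increments that fall strictly below $s$. A small sketch, assuming the increments are available as a plain list (the helper name `empirical_cdf` and the sample data are hypothetical):

```python
from bisect import bisect_left


def empirical_cdf(samples):
    """Return s -> (1/T) * #{j : samples[j] < s}, the strict-inequality ECDF."""
    ordered = sorted(samples)
    n = len(ordered)

    def cdf(s):
        # bisect_left counts the elements strictly less than s
        return bisect_left(ordered, s) / n

    return cdf


# Illustrative increments (assumed data)
increments = [0.3, -1.2, 0.8, 0.1, -0.4]
F_emp = empirical_cdf(increments)
print(F_emp(0.0))  # two of the five increments are below 0
```

Sorting once and using binary search keeps each evaluation at $O(\log T)$, which helps when the function is evaluated at many points.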

To estimate the parameters $V_k = \{y_k, z_k, p_k\}$, we can pose a 
minimization problem on the distance (in some metric $\rho(f,g)$) between
the theoretical and empirical distributions:


$$ 
\begin{equation}
\left\{ \begin{aligned} 
    &\min_{\{V_k\}_{k=1}^{K}} \rho \left(F^{(emp)}_{\Delta S, T}, F^{(theor)}_{\Delta S, T}\right) \\
    &p_k \geq 0 \; \wedge \; \sum_{k=1}^{K}p_k = 1 \\
    &0 \leq z_k \in B \subset \mathbb{R}^+ \\
    &y_k \in A \subset \mathbb{R}
\end{aligned} \right.
\end{equation}
$$
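
The metric $\rho$ is left open here. One possible concrete choice, stated purely as an assumption, is a least-squares distance evaluated at the observed points $s_j$. The constraints on $p_k$ and $z_k$ can then be absorbed by a reparametrization with hypothetical unconstrained variables $\theta_k, \zeta_k \in \mathbb{R}$:

$$
\begin{aligned}
\rho \left(F^{(emp)}_{\Delta S, T}, F^{(theor)}_{\Delta S, T}\right) &= \frac{1}{T} \sum_{j=1}^{T} \left( F^{(emp)}_{\Delta S, T}(s_j) - F^{(theor)}_{\Delta S, T}(s_j) \right)^2 \\
p_k &= \frac{e^{\theta_k}}{\sum_{i=1}^{K} e^{\theta_i}} \\
z_k &= e^{\zeta_k}
\end{aligned}
$$

With this substitution $p_k > 0$, $\sum_k p_k = 1$ and $z_k > 0$ hold automatically, so a gradient-based optimizer can work on $\theta_k$, $\zeta_k$ and $y_k$ without explicit constraints. Any bounds implied by the sets $A$ and $B$ would still need separate handling.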


## Neural network

We need to approximate a known function: the empirical distribution.

<!-- Let's denote observed and calculated values as next:
$$
\mathbf{x} = \left[x_1, x_2, \dots, x_N\right] \text{ - input values,} \\
\mathbf{a} = \Delta t \cdot \left[y_1, y_2, \dots, y_K\right] \text{- expectation values,}\\
\mathbf{b} = \Delta t \cdot \left[z_1, z_2, \dots, z_K\right] \text{- standard deviation values,}\\
\mathbf{p} = \left[p_1, p_2, \dots, p_K\right] \text{- joint probability density values}\\
\mathbf{y} = F^{(emp)}_{\Delta S, T}(x) = \bigl\{F^{(emp)}_{\Delta S, T} = f\bigr\} = \left[f(x_1), f(x_2), \dots, f(x_M)\right] \text{- targets} \\
\^{y} = p \cdot \Phi{\left( \right)} \\ 
$$ -->

$$
\begin{aligned}
\hat{y} &= p_1 \cdot \Phi{\left( c_1\right)} + \dots + p_K \cdot \Phi{\left( c_K\right)} \\
c_n &= w_n \cdot x + b_n, \quad n = 1, \dots, K
\end{aligned}
$$
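
Matching each term with the theoretical mixture CDF above shows how the network weights relate to the mixture parameters. A short derivation, assuming $z_n \Delta t > 0$:

$$
\begin{aligned}
c_n &= \frac{x - y_n \Delta t}{z_n \Delta t} = \frac{1}{z_n \Delta t} \cdot x - \frac{y_n}{z_n} \\
w_n &= \frac{1}{z_n \Delta t}, \qquad b_n = -\frac{y_n}{z_n}
\end{aligned}
$$

Under this reading the hidden layer has $K$ units with activation $\Phi$ and positive weights $w_n$, while the output layer carries the mixture weights $p_n$, which must be non-negative and sum to one.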

<img src="../src/docs/static/nn_gaussian_mixture.svg" align="center" width="700"/>





https://docs.pytorch.org/tutorials/intermediate/parametrizations.html
https://medium.com/@minh.hoque/demystifying-neural-network-normalization-techniques-4a21d35b14f8
https://towardsdatascience.com/deep-dive-into-softmax-regression-62deea103cb8/
https://www.geeksforgeeks.org/deep-learning/what-is-layer-normalization/
https://medium.com/@aya_hesham/function-approximation-with-deep-learning-a-practical-guide-with-code-examples-b817a59755af


```{python}
import torch
import torch.nn as nn

class ConstrainedLinear(nn.Module):
    def __init__(self, in_features, out_features, eps=1e-7):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.eps = eps
        # Define raw weights as a learnable parameter
        self.weight_raw = nn.Parameter(torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.randn(out_features))

    def forward(self, x):
        # Take absolute values so every weight is non-negative, then normalize
        # each row to sum to 1; the clamp guards against division by zero
        weight_abs = self.weight_raw.abs()
        normalized_weight = weight_abs / weight_abs.sum(dim=1, keepdim=True).clamp(min=self.eps)
        return torch.nn.functional.linear(x, normalized_weight, self.bias)

# Example usage
model = ConstrainedLinear(in_features=10, out_features=5)
input_tensor = torch.randn(1, 10)
output = model(input_tensor)
```

In [None]:
import tensorflow as tf
from scipy import stats


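# tf.Tensor cannot be constructed directly; tf.constant creates the tensor
c = tf.constant([1.0])
# SciPy works on NumPy arrays, so convert before calling norm.cdf
c * tf.constant(stats.norm.cdf(c.numpy()), dtype=c.dtype)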

In [None]:
import tensorflow as tf
from scipy import stats
import pandas as pd
import numpy as np
import plotly.express as px

from finito.gauss import GaussianMixture

from tensorflow.keras import backend


class SumToOne(tf.keras.constraints.Constraint):
    """Project weights onto non-negative values that sum to 1."""

    def __call__(self, w):
        # Absolute values make every weight non-negative, as required for p_k
        w_abs = tf.abs(w)
        # epsilon() keeps the division finite when all weights are zero
        return w_abs / (tf.reduce_sum(w_abs) + backend.epsilon())

    def get_config(self):
        return super().get_config()


window_size = 100
K = 3


def norm_cdf(x):
    # Φ(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 0.5 * (1.0 + tf.math.erf(x / tf.sqrt(2.0)))


model = tf.keras.models.Sequential(
    [
        tf.keras.Input(shape=(1,)),
        #   tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(K, activation=norm_cdf, use_bias=True),
        # tf.keras.layers.Dense(K, kernel_constraint=SumToOne(), use_bias=False),
        # tf.keras.layers.Dense(1, kernel_constraint=SumToOne(), use_bias=False),
        tf.keras.layers.Dense(1),
        #   tf.keras.layers.Dropout(0.2),
    ]
)

model.summary()

In [None]:
a = [-1, 3, 5]
b = [1, 0.5, 0.2]
p = [0.2, 0.5, 0.3]

gm = GaussianMixture(a=a, b=b, p=p)
samples = gm.sample(1000)
x_train = np.sort(samples)
# x_train = np.linspace(-3, 7, 100)
y_train = gm.cdf(x_train)
# px.line(x=x_train, y=y_train)

In [None]:
model.compile(optimizer="adam", loss="mse")
# Add a feature axis so the input shape matches Input(shape=(1,))
model.fit(np.expand_dims(x_train, axis=1), y_train, epochs=100)  # Train for a sufficient number of epochs

In [None]:
# Print info about weights
model.weights
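
If the trained network has the form $\hat{y} = \sum_k p_k \Phi(w_k x + b_k)$, the mixture parameters can be read back from the weights. The sketch below assumes the weights have already been extracted into plain Python lists. The helper `recover_mixture_params` is hypothetical and the numbers are illustrative, not actual training output.

```python
def recover_mixture_params(w, b, p):
    """Map hidden-layer weights w_k, biases b_k and output weights p_k
    to component means, standard deviations and mixture weights.

    Phi(w_k * x + b_k) = Phi((x - mean_k) / std_k) with
    std_k = 1 / w_k and mean_k = -b_k / w_k, assuming w_k > 0.
    """
    if any(w_k <= 0 for w_k in w):
        raise ValueError("each w_k must be positive to define a standard deviation")
    means = [-b_k / w_k for w_k, b_k in zip(w, b)]
    stds = [1.0 / w_k for w_k in w]
    total = sum(p)
    weights = [p_k / total for p_k in p]  # renormalize in case sum(p) != 1
    return means, stds, weights


# Illustrative weights (assumed), corresponding to
# means [-1, 3, 5] and standard deviations [1, 0.5, 0.2]
w = [1.0, 2.0, 5.0]
b = [1.0, -6.0, -25.0]
p = [0.2, 0.5, 0.3]
means, stds, weights = recover_mixture_params(w, b, p)
print(means, stds, weights)
```

This mapping only makes sense when the output layer has no bias and its weights are non-negative. With an unconstrained final `Dense(1)` layer the fitted curve may approximate the CDF well without corresponding to a valid mixture.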

In [None]:
x_test = np.sort(gm.sample(1000))
y_test = gm.cdf(x_test)
x = np.expand_dims(x_test, axis=1)

predictions = model.predict(x)

In [None]:
df = pd.DataFrame({"x": x_test, "true": y_test, "nn": predictions.flatten()})

In [None]:
# Reference the DataFrame columns by name so the axis is labelled "x"
px.line(df, x="x", y=["true", "nn"])