### Logistic regression models of $C_t$ given $\textbf{X}$.
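
As a reminder of what is being fitted, here is the model in symbols. This is a sketch under one assumption: each row $\textbf{x}_t$ of $\textbf{X}$ serves as the feature vector for the label $C_t$. The symbols $\textbf{w}_k$, $b_k$ and $K$ (the number of distinct values of $C_t$) are notation introduced here, not taken from the simulation code.

$$
P(C_t = k \mid \textbf{x}_t) = \frac{\exp(\textbf{w}_k^\top \textbf{x}_t + b_k)}{\sum_{j=1}^{K} \exp(\textbf{w}_j^\top \textbf{x}_t + b_j)}, \qquad k = 1, \ldots, K.
$$

For $K = 2$ this is equivalent to applying a sigmoid to a single linear score.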


In [81]:
from experiments.simulation import forward_simulation
from sklearn.linear_model import LogisticRegression
from hmm.types import FloatArray

import numpy as np

A helper that returns simulated data as sample pairs $((C_1, \ldots, C_T), \textbf{X})$, pooled over several simulation runs.

In [82]:
def simulated_c_and_x_data(
    n: int,
    t: int,
    amount_of_data: int
) -> tuple[list[int], FloatArray]:
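    # Run one simulation, then append the remaining ones:
    # labels are concatenated into one list, observations are stacked row-wise.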
    cs, _, xs = forward_simulation(n, t)

    for _ in range(1, amount_of_data):
        c, _, x = forward_simulation(n, t)

        cs += c
        xs = np.vstack([xs, x])

    return cs, xs
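
The pooling step can be tried without the `experiments` package. The sketch below is only illustrative: `toy_forward_simulation` is a hypothetical stand-in whose return shape (a list of `t` labels, an unused middle value, and a `(t, n)` array) is an assumption about `forward_simulation`, and its data are plain Gaussian noise rather than HMM output.

```python
import numpy as np


def toy_forward_simulation(n: int, t: int, rng: np.random.Generator):
    """Hypothetical stand-in for forward_simulation (assumed return shape)."""
    c = rng.integers(0, 2, size=t).tolist()  # t binary labels as a plain list
    # One n-dimensional observation per time step, centred on the label value.
    x = rng.normal(loc=np.array(c)[:, None], scale=1.0, size=(t, n))
    return c, None, x


rng = np.random.default_rng(0)

cs: list[int] = []
x_blocks = []
for _ in range(3):
    c, _, x = toy_forward_simulation(n=8, t=100, rng=rng)
    cs += c             # concatenate the labels
    x_blocks.append(x)  # collect the blocks and stack them once at the end

xs = np.vstack(x_blocks)
print(len(cs), xs.shape)  # 300 (300, 8)
```

Collecting the blocks and calling `np.vstack` once avoids re-copying the growing array on every iteration. The result is the same as stacking inside the loop.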

Fit a logistic regression model to simulated data and report its accuracy on that same training data.

In [83]:
def fit_logistic_regression_model(
    n: int,
    t: int,
    amount_of_data: int,
    print_stuff: bool = True,
    **kwargs
) -> tuple[LogisticRegression, float]:
    c, x = simulated_c_and_x_data(
        n=n,
        t=t,
        amount_of_data=amount_of_data
    )

    clf = LogisticRegression(**kwargs).fit(x, c)
    accuracy = clf.score(x, c)

    if print_stuff:
        print(f"Training accuracy of model (t={t}, n={n}, amount_of_data={amount_of_data}):", accuracy)

    return clf, accuracy
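
The `**kwargs` are forwarded unchanged to the `LogisticRegression` constructor, so any of its parameters can be set from the call site. Below is a minimal sketch of that pattern. It uses `make_classification` as a stand-in data source (an assumption, since the simulation package is not available here), and the values of `C` and `max_iter` are arbitrary illustrations.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in data: 500 samples with 8 features and binary labels.
x, c = make_classification(n_samples=500, n_features=8, random_state=0)

# Keyword arguments that would be passed through **kwargs.
kwargs = {"C": 0.5, "max_iter": 1000}

clf = LogisticRegression(**kwargs).fit(x, c)
accuracy = clf.score(x, c)  # mean accuracy on the training data itself
print("Training accuracy:", accuracy)
```

With the notebook's function, the equivalent call would look like `fit_logistic_regression_model(n=8, t=100, amount_of_data=1000, C=0.5, max_iter=1000)`.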

We train 10 models, each on its own freshly simulated data set.

In [None]:
models = [fit_logistic_regression_model(n=8, t=100, amount_of_data=1000, print_stuff=False) for _ in range(10)]

Next, we compute the mean training accuracy over the trained models.

In [91]:
mean_train_accuracy: float = sum(accuracy for _, accuracy in models) / len(models)

To estimate test accuracy, we score every model on a freshly simulated data set that none of them was trained on.

In [102]:
c, x = simulated_c_and_x_data(n=8, t=100, amount_of_data=1000)
mean_test_accuracy = sum(model.score(x, c) for model, _ in models) / len(models)

In [103]:
print("Mean accuracy, training data:", mean_train_accuracy)
print("Mean accuracy, test data:", mean_test_accuracy)

Mean accuracy, training data: 0.956446
Mean accuracy, test data: 0.956095
