# Comparing two regression models using `stambo`

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/Oulu-IMEDS/stambo/main?labpath=notebooks%2FRegression.ipynb)

V1.1.3: © Aleksei Tiulpin, PhD, 2025

This notebook shows an end-to-end example of how to take a dataset, train two machine learning models, and run a statistical test to assess whether their performance differs.

## Import of necessary libraries

In [1]:
import stambo

from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error

SEED = 2025

stambo.__version__

'0.1.4'

## Loading the diabetes dataset and creating train-test split

In [2]:
X, y = load_diabetes(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=SEED)

scaler = StandardScaler()
scaler.fit(Xtr)

Xtr = scaler.transform(Xtr)
Xte = scaler.transform(Xte)

## Training the models

We train a kNN regressor and a linear regression. The output below shows that the linear regression achieves a lower MAE than the kNN.

In [3]:
model = KNeighborsRegressor(n_neighbors=3)
model.fit(Xtr, ytr)
preds_knn = model.predict(Xte)

model = LinearRegression()
model.fit(Xtr, ytr)
preds_lr = model.predict(Xte)


mae_knn, mae_lr = mean_absolute_error(yte, preds_knn), mean_absolute_error(yte, preds_lr)
print(f"kNN MAE: {mae_knn:.4f} / LR MAE: {mae_lr:.4f}")

kNN MAE: 49.8643 / LR MAE: 46.0043


## Statistical testing

The testing routine returns a `dict`. The keys in the dict are the metric tags, and each value is an array that stores the results in the following order:

* p-value ($H_0: model_1 = model_2$)
* Effect size (model 2 minus model 1)
* CI low (effect size)
* CI high (effect size)
* Empirical value (model 1)
* CI low (model 1)
* CI high (model 1)
* Empirical value (model 2)
* CI low (model 2)
* CI high (model 2)

If you launch the code in Binder, decrease the number of bootstrap iterations (`10000` by default).

**Important to note:** Regression metrics are *errors*, so a lower value is better (unlike typical classification scores). Therefore, the question we actually ask is whether the kNN has a larger MAE than the linear regression.

So, model 1 here is actually the *improved* model (linear regression).
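To build intuition for this kind of one-sided comparison, here is a minimal sketch of a generic paired bootstrap on the MAE difference. It is not `stambo`'s implementation, and its p-value convention may differ from the library's. The function name `paired_bootstrap_mae`, its parameters, and the synthetic toy data are assumptions made for illustration only. The sketch uses only the standard library so that it runs without extra dependencies.

```python
import random


def paired_bootstrap_mae(y_true, preds_1, preds_2, n_bootstrap=2000, alpha=0.05, seed=2025):
    """Hypothetical helper: paired bootstrap of MAE(model 2) - MAE(model 1)."""
    rng = random.Random(seed)
    n = len(y_true)
    # Per-sample absolute errors are computed once and then resampled jointly,
    # which keeps the pairing between the two models intact.
    err_1 = [abs(t - p) for t, p in zip(y_true, preds_1)]
    err_2 = [abs(t - p) for t, p in zip(y_true, preds_2)]
    observed = (sum(err_2) - sum(err_1)) / n

    diffs = []
    for _ in range(n_bootstrap):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        diffs.append(sum(err_2[i] - err_1[i] for i in idx) / n)
    diffs.sort()

    # Percentile confidence interval for the difference in MAE.
    ci_low = diffs[int((alpha / 2) * n_bootstrap)]
    ci_high = diffs[min(int((1 - alpha / 2) * n_bootstrap), n_bootstrap - 1)]
    # One-sided p-value: share of resamples where model 2 is NOT worse than
    # model 1 (the +1 terms avoid reporting exactly zero).
    p_value = (sum(d <= 0 for d in diffs) + 1) / (n_bootstrap + 1)
    return observed, ci_low, ci_high, p_value


# Synthetic toy data (not the diabetes dataset): model 1 is far more accurate.
toy_rng = random.Random(0)
y = [toy_rng.gauss(0, 1) for _ in range(200)]
p1 = [t + toy_rng.gauss(0, 0.1) for t in y]
p2 = [t + toy_rng.gauss(0, 1.0) for t in y]

observed, ci_low, ci_high, p_value = paired_bootstrap_mae(y, p1, p2)
print(f"MAE difference: {observed:.3f} [{ci_low:.3f}, {ci_high:.3f}], p = {p_value:.4f}")
```

Because the errors are lower-is-better, a positive difference means that model 2 is worse, which is why the improved model is passed first.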

In [4]:
testing_result = stambo.compare_models(yte, preds_lr, preds_knn, metrics=("MAE", "MSE"), seed=SEED)

Bootstrapping: 100%|██████████| 10000/10000 [00:05<00:00, 1974.59it/s]


If we want to inspect the testing results, they are available in a dict in the format described above:

In [5]:
testing_result

{'MAE': array([ 0.06019398,  3.85996728, -0.13865207,  7.89012645, 46.00428611,
        41.71098241, 50.5036501 , 49.86425339, 44.71489442, 55.10746606]),
 'MSE': array([5.59944006e-03, 8.02532883e+02, 2.21743600e+02, 1.41137391e+03,
        3.22590754e+03, 2.68874421e+03, 3.81527655e+03, 4.02844042e+03,
        3.25164827e+03, 4.87204549e+03])}
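The raw arrays are easier to read once each position has a name. The sketch below copies the printed values from the output above into plain lists and labels them. The field order is inferred from that output and from the table `to_latex` produces, so treat it as an assumption and check it against the `stambo` documentation. The names `FIELDS`, `unpack_result`, `printed_result`, and `ALPHA` are hypothetical.

```python
# Field order inferred from the printed arrays in this notebook (an assumption,
# not taken from the stambo API reference).
FIELDS = (
    "p_value",
    "effect_size", "effect_ci_low", "effect_ci_high",
    "m1_value", "m1_ci_low", "m1_ci_high",
    "m2_value", "m2_ci_low", "m2_ci_high",
)


def unpack_result(values):
    """Hypothetical helper: label the entries of one metric's result array."""
    values = [float(v) for v in values]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


# Values copied from the printed output above (model 1 = LR, model 2 = kNN).
printed_result = {
    "MAE": [0.06019398, 3.85996728, -0.13865207, 7.89012645, 46.00428611,
            41.71098241, 50.5036501, 49.86425339, 44.71489442, 55.10746606],
    "MSE": [5.59944006e-03, 8.02532883e+02, 2.21743600e+02, 1.41137391e+03,
            3.22590754e+03, 2.68874421e+03, 3.81527655e+03, 4.02844042e+03,
            3.25164827e+03, 4.87204549e+03],
}

ALPHA = 0.05  # conventional significance level, chosen here for illustration
summary = {}
for metric, values in printed_result.items():
    row = unpack_result(values)
    row["significant"] = row["p_value"] < ALPHA
    summary[metric] = row
    print(f"{metric}: p = {row['p_value']:.4f}, "
          f"effect = {row['effect_size']:.2f} "
          f"[{row['effect_ci_low']:.2f}, {row['effect_ci_high']:.2f}], "
          f"significant at {ALPHA}: {row['significant']}")
```

In a live session, the same helper should work on `testing_result[metric]` directly, since it only iterates over the values.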

Most commonly, though, we want to present them in a report, paper, or presentation. For that, we can use the function `to_latex` to get a cut-and-paste `tabular`. To use it in a LaTeX document, remember to load the `booktabs` package.

In [6]:
print(stambo.to_latex(testing_result, m2_name="kNN", m1_name="LR"))

% \usepackage{booktabs} <-- do not forget to have this imported. 
\begin{tabular}{lll} \\ 
\toprule 
\textbf{Model} & \textbf{MAE} & \textbf{MSE} \\ 
\midrule 
LR & $46.00$ [$41.71$-$50.50$] & $3225.91$ [$2688.74$-$3815.28$] \\ 
kNN & $49.86$ [$44.71$-$55.11$] & $4028.44$ [$3251.65$-$4872.05$] \\ 
\midrule
Effect size & $3.86$ [$-0.14$-$7.89]$ & $802.53$ [$221.74$-$1411.37]$ \\ 
\midrule
$p$-value & $0.06$ & $0.01$ \\ 
\bottomrule
\end{tabular}
