# Modelling

In this notebook, we build a predictive model. To assess its quality, we use the true positive rate (recall) as our metric. This choice is justified by the following arguments.

## Setup

The following cell imports all modules required for the modelling tasks.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import cross_val_score, train_test_split

from ipynb_utils.cfg import CFG
from ipynb_utils import dump_df, load_data

Let us reload the data produced by the preceding notebook.

In [None]:
df = load_data("1--df_retrieved.pkl")

First, we split the data into features and target, and then into training and test sets.

In [None]:
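# Columns excluded from the feature matrix.
X_cols_blacklist = [
    "target",
]
X_cols = [col for col in df.columns if col not in X_cols_blacklist]

# Columns used as the prediction target.
y_cols_whitelist = [
    "target",
]
y_cols = [col for col in df.columns if col in y_cols_whitelist]

X = df[X_cols].values
y = df[y_cols].values.ravel()  # flatten to a 1-D label vector

# Suffix 0 denotes the training part, suffix 1 the test part (20 % of the data).
X_0, X_1, y_0, y_1 = train_test_split(
    X, y, test_size=0.2, random_state=CFG["RSEED"],
)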
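
Since the chosen metric depends on the positive class, it can be worth checking that both parts of the split contain a similar share of positives. The sketch below is not part of the pipeline above: it uses synthetic data (the names `X_demo`, `y_demo`, `X_tr`, `X_te`, `y_tr`, `y_te` are made up for illustration) and the optional `stratify` argument of `train_test_split`, which the split above does not use.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced toy data (assumption: 80 negatives, 20 positives).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array([0] * 80 + [1] * 20)

# stratify=y_demo keeps the class proportions equal in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo,
)

print("Shapes:", X_tr.shape, X_te.shape)
print("Positive share (train/test):", y_tr.mean(), y_te.mean())
```

Whether stratification is appropriate for the actual data depends on how imbalanced `target` is, which this notebook does not state.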

The next cell creates a model and fits it to the training set.

In [None]:
model = LogisticRegression()
model.fit(X_0, y_0)
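
The setup cell imports `cross_val_score`, which is not used in the cells shown here. As a hedged illustration of how it could complement the single fit above, the following sketch estimates recall with 5-fold cross-validation. It runs on synthetic data from `make_classification`; the names `X_demo`, `y_demo`, and `cv_recall`, the fold count, and `max_iter=1000` are assumptions for this example, not settings taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data, for illustration only.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=5, random_state=0,
)

# scoring="recall" evaluates the true positive rate on each held-out fold.
cv_recall = cross_val_score(
    LogisticRegression(max_iter=1000), X_demo, y_demo, cv=5, scoring="recall",
)

print("Recall per fold:", cv_recall.round(2))
print("Mean recall:    ", cv_recall.mean().round(2))
```

On the real data, the same call would presumably take `X_0` and `y_0` in place of the synthetic arrays, so that the test set stays untouched until the final evaluation.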

As the final step, we evaluate the performance of the model on the test set.

In [None]:
z_1 = model.predict(X_1)

accuracy = accuracy_score(y_1, z_1)
precision = precision_score(y_1, z_1)
recall = recall_score(y_1, z_1)
f1 = f1_score(y_1, z_1)
cm = confusion_matrix(y_1, z_1)

print("Evaluation Metrics:")
print(f"  Accuracy:  {accuracy:.2f}")
print(f"  Precision: {precision:.2f}")
print(f"  Recall:    {recall:.2f}")
print(f"  F1 Score:  {f1:.2f}")
print("")
print("Confusion Matrix:")
print(cm)
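
To make the link between the confusion matrix and the chosen metric explicit: the true positive rate is TP / (TP + FN), which is what `recall_score` reports for the positive class. The sketch below checks this on a small hand-made example; the label vectors `y_true` and `y_pred` are invented for illustration and are unrelated to the model above.

```python
from sklearn.metrics import confusion_matrix, recall_score

# Hand-made labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]

# For binary labels {0, 1}, ravel() yields the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# True positive rate: share of actual positives that were predicted positive.
tpr = tp / (tp + fn)

print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")
print(f"TPR from confusion matrix: {tpr:.2f}")
print(f"recall_score:              {recall_score(y_true, y_pred):.2f}")
```

The same unpacking would apply to `cm` above, provided `target` is binary with labels 0 and 1.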