# Logistic Regression Metrics Example
This example uses scikit-learn's built-in `breast_cancer` dataset to fit a binary logistic regression model.

Metrics included in this example:
- Accuracy
- AUC-ROC
- Sensitivity (recall)
- Specificity
- Negative predictive value (NPV)
- Positive predictive value (PPV)

Metrics NOT included in this example:
- Neutrophil to Lymphocyte Ratio (NLR): not applicable to the current dataset.
- Platelet to Lymphocyte Ratio (PLR): not applicable to the current dataset.
- Confidence Interval (CI): statsmodels is needed to output the CIs.

In [65]:
import sklearn.linear_model
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, roc_auc_score
import warnings
warnings.filterwarnings('ignore')

In [66]:
logisticRegr = None
X_train = None
y_train = None
X_test = None
y_test = None

# load dataset
print('Loading breast_cancer dataset...')
X, y = load_breast_cancer(return_X_y = True)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the numerical features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
print(f'Training data shape: {X_train_scaled.shape}')
print(f'Training label shape: {y_train.shape}')
print(f'Test data shape: {X_test_scaled.shape}')
print(f'Test label shape: {y_test.shape}')

# Initialize Logistic regression model
logisticRegr = sklearn.linear_model.LogisticRegression(
    penalty="l2",
    max_iter=1,  # local epoch
    warm_start=True,  # prevent refreshing weights when fitting
)

Loading breast_cancer dataset...
Training data shape: (455, 30)
Training label shape: (455,)
Test data shape: (114, 30)
Test label shape: (114,)


In [67]:
for i in range(1):
    print(f'Starting training for round [{i}]...')
    logisticRegr.fit(X_train_scaled, y_train)
    score = logisticRegr.score(X_test_scaled, y_test)
    print(f'Training complete. Model score: {score}')

Starting training for round [0]...
Training complete. Model score: 0.9649122807017544
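The `max_iter=1` and `warm_start=True` settings only pay off when `fit` is called more than once, since each call then continues from the previous coefficients instead of starting over. The following is a minimal sketch of that pattern, not the notebook's actual training loop: `num_rounds` and `round_scores` are hypothetical names, the value 5 is an assumption, and `penalty="l2"` is omitted because it is the default.

```python
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same data preparation as the notebook.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# One solver iteration per fit() call; warm_start keeps the coefficients between calls.
model = LogisticRegression(max_iter=1, warm_start=True)

num_rounds = 5  # hypothetical value; the notebook runs a single round
round_scores = []
for round_idx in range(num_rounds):
    with warnings.catch_warnings():
        # max_iter=1 stops before convergence on purpose, so silence that warning only.
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        model.fit(X_train_scaled, y_train)
    round_scores.append(model.score(X_test_scaled, y_test))
    print(f"Round [{round_idx}] test accuracy: {round_scores[-1]:.4f}")
```

Silencing only `ConvergenceWarning` inside the loop is a narrower alternative to a global `warnings.filterwarnings('ignore')`, which would also hide unrelated warnings.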


In [68]:
# Accuracy metric
y_predict = logisticRegr.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_predict)
print(f'Accuracy: {accuracy}')
# Store the result under a new name so the imported function is not overwritten
report = classification_report(y_test, y_predict)
print(f'classification report : \n  {report}')

Accuracy: 0.9649122807017544
classification report : 
                precision    recall  f1-score   support

           0       0.95      0.95      0.95        43
           1       0.97      0.97      0.97        71

    accuracy                           0.96       114
   macro avg       0.96      0.96      0.96       114
weighted avg       0.96      0.96      0.96       114



In [69]:
# Documentation: https://scikit-learn.org/stable/modules/model_evaluation.html
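# ROC-AUC
# Note: y_predict holds hard 0/1 labels, so this value equals the balanced accuracy.
# Pass predict_proba scores to roc_auc_score for a threshold-independent AUC.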
auc = roc_auc_score(y_test, y_predict)
print(f'AUC: {auc}')

AUC: 0.9626596790042582
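The AUC cell passes hard class predictions to `roc_auc_score`. For a binary problem this collapses the ROC curve to a single operating point, so the result is (sensitivity + specificity) / 2, which is the balanced accuracy. The sketch below contrasts that with an AUC computed from predicted probabilities. It rebuilds the data and model so that it runs on its own; `y_score`, `auc_from_labels`, and `auc_from_scores` are names introduced here for illustration, and no particular output value is claimed.

```python
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same data preparation as the notebook.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

with warnings.catch_warnings():
    # A single solver iteration triggers a convergence warning by design.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    model = LogisticRegression(max_iter=1, warm_start=True)
    model.fit(X_train_scaled, y_train)

y_predict = model.predict(X_test_scaled)            # hard 0/1 labels
y_score = model.predict_proba(X_test_scaled)[:, 1]  # probability of class 1

auc_from_labels = roc_auc_score(y_test, y_predict)
auc_from_scores = roc_auc_score(y_test, y_score)
balanced_acc = balanced_accuracy_score(y_test, y_predict)

print(f"AUC from hard labels:   {auc_from_labels}")
print(f"Balanced accuracy:      {balanced_acc}")
print(f"AUC from probabilities: {auc_from_scores}")
```

The first two printed values agree, while the third ranks test samples by predicted probability and therefore reflects every possible threshold rather than only the default 0.5 cut-off.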


In [70]:
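# Use the confusion matrix to calculate the metrics.
# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp,
# with label 1 (benign in this dataset) treated as the positive class.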
tn, fp, fn, tp = confusion_matrix(y_test, y_predict).ravel()

accuracy = (tp + tn) / (tn + fp + fn + tp)
print(f'Accuracy: {accuracy}')

sensitivity = tp / (tp + fn)
print(f'Sensitivity: {sensitivity}')

specificity = tn / (tn + fp)
print(f'Specificity: {specificity}')

npv = tn / (tn + fn)
print(f'NPV: {npv}')

ppv = tp / (tp + fp)
print(f'PPV: {ppv}')

Accuracy: 0.9649122807017544
Sensitivity: 0.971830985915493
Specificity: 0.9534883720930233
NPV: 0.9534883720930233
PPV: 0.971830985915493
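The hand-computed ratios can be cross-checked against scikit-learn's own scorers: sensitivity is recall for label 1, specificity is recall for label 0, PPV is precision for label 1, and NPV is precision for label 0. The sketch below uses small hypothetical label arrays (`y_true`, `y_pred`) rather than the notebook's test split, purely so that it is self-contained.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical toy labels, used only to keep the sketch self-contained.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Manual definitions, as in the notebook cell.
manual = {
    "sensitivity": tp / (tp + fn),
    "specificity": tn / (tn + fp),
    "ppv": tp / (tp + fp),
    "npv": tn / (tn + fn),
}

# Equivalent scikit-learn calls; pos_label selects which class counts as "positive".
library = {
    "sensitivity": recall_score(y_true, y_pred, pos_label=1),
    "specificity": recall_score(y_true, y_pred, pos_label=0),
    "ppv": precision_score(y_true, y_pred, pos_label=1),
    "npv": precision_score(y_true, y_pred, pos_label=0),
}

for name in manual:
    print(f"{name}: manual={manual[name]:.4f} sklearn={library[name]:.4f}")
```

One practical difference is that the manual ratios divide by zero when a class is never predicted, whereas the scikit-learn scorers expose a `zero_division` parameter for that case.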


In [None]:
# Confidence intervals for the logistic regression model require statsmodels; skipped for now.
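The note above concerns confidence intervals for the fitted model, which this notebook leaves out. A different and simpler question, how uncertain a test-set metric is, can be approached without statsmodels by using a percentile bootstrap over the test samples. The sketch below is an assumed alternative, not the approach the notebook intends: `n_resamples`, the seed, and the 95% level are all illustrative choices, and it says nothing about intervals for the model coefficients.

```python
import warnings

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same data preparation and model settings as the notebook.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    model = LogisticRegression(max_iter=1, warm_start=True)
    model.fit(X_train_scaled, y_train)

y_predict = model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_predict)

# Percentile bootstrap: resample test indices with replacement and recompute the metric.
rng = np.random.default_rng(0)  # fixed seed (assumption) for repeatability
n_resamples = 1000              # illustrative choice
n_test = len(y_test)
boot_acc = np.empty(n_resamples)
for b in range(n_resamples):
    idx = rng.integers(0, n_test, size=n_test)
    boot_acc[b] = accuracy_score(y_test[idx], y_predict[idx])

ci_lower, ci_upper = np.percentile(boot_acc, [2.5, 97.5])
print(f"Accuracy: {accuracy:.4f}, 95% bootstrap CI: [{ci_lower:.4f}, {ci_upper:.4f}]")
```

The same loop works for sensitivity, specificity, NPV, or PPV by swapping the metric function, although resamples that happen to contain a single class need separate handling for some metrics.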