# Confusion Matrices and Expected Value

This notebook demonstrates how to use `ConfusionMatrix` in CUAnalytics to:

- inspect confusion matrix layouts (`normal` and `inverted`)
- retrieve confusion-matrix metrics
- compute expected value from a cost/benefit matrix


In [1]:
import pandas as pd
import cuanalytics as ca


## 1) Build a ConfusionMatrix from labels

In [2]:
y_true = ['No', 'No', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes']
y_pred = ['No', 'No', 'No', 'No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes']

cm = ca.ConfusionMatrix(y_true, y_pred, labels=['No', 'Yes'])
cm.to_dataframe(display='normal')

| | Pred No | Pred Yes |
|---|---|---|
| Actual No | 4 | 1 |
| Actual Yes | 2 | 3 |


In [3]:
# Inverted display puts predictions on rows and actuals on columns, positive class ('Yes') first.
# Only the presentation changes; the underlying counts are the same.
cm.to_dataframe(display='inverted')

| | Actual Yes | Actual No |
|---|---|---|
| Pred Yes | 3 | 1 |
| Pred No | 2 | 4 |
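
The two layouts above hold the same four counts in different positions. As a minimal sketch in plain Python (not the CUAnalytics implementation; the names `pair_counts`, `normal`, and `inverted` are illustrative), the counts can be tallied directly from the label lists and arranged both ways:

```python
from collections import Counter

y_true = ['No', 'No', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes']
y_pred = ['No', 'No', 'No', 'No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes']
labels = ['No', 'Yes']

# Tally how often each (actual, predicted) pair occurs.
pair_counts = Counter(zip(y_true, y_pred))

# Normal layout: rows = actual class, columns = predicted class, in label order.
normal = [[pair_counts[(actual, pred)] for pred in labels] for actual in labels]

# Inverted layout: rows = predicted class, columns = actual class,
# with the positive class ('Yes') listed first.
flipped = labels[::-1]
inverted = [[pair_counts[(actual, pred)] for actual in flipped] for pred in flipped]

print(normal)    # [[4, 1], [2, 3]]
print(inverted)  # [[3, 1], [2, 4]]
```

Reading the inverted result row by row gives TP, FP, FN, TN, the order used later for the cost/benefit matrix.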


## 1b) Print a readable summary

Use `summary()` for a console-style confusion matrix report.

In [12]:
cm.summary()


Confusion Matrix and Statistics

Confusion Matrix:
          Actual Yes  Actual No
Pred Yes           3          1
Pred No            2          4

Overall Statistics:
Accuracy            0.7000
Kappa               0.4000
Macro Precision     0.7083
Macro Recall        0.7000
Macro Specificity   0.7000
Macro F1            0.6970
Precision           0.7500
Recall              0.6000
Sensitivity         0.6000
Specificity         0.8000
FPR                 0.2000
FNR                 0.4000
F-measure           0.6667

Binary Counts (Positive = 'Yes'):
TP: 3  FP: 1  FN: 2  TN: 4

Statistics by Class:
     precision  recall  sensitivity  specificity     f1
No      0.6667  0.8000       0.8000       0.6000 0.7273
Yes     0.7500  0.6000       0.6000       0.8000 0.6667


| | Actual Yes | Actual No |
|---|---|---|
| Pred Yes | 3 | 1 |
| Pred No | 2 | 4 |
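
The Kappa and macro-averaged rows in the report can be checked by hand from the four binary counts. The sketch below uses the standard textbook definitions of Cohen's kappa and macro averaging; it is an assumption that `summary()` computes them this way, although the numbers agree with the report above.

```python
# Binary counts from the summary (positive class = 'Yes').
tp, fp, fn, tn = 3, 1, 2, 4
n = tp + fp + fn + tn

accuracy = (tp + tn) / n

# Agreement expected by chance: for each class, P(actual) * P(predicted), summed.
chance_yes = ((tp + fn) / n) * ((tp + fp) / n)
chance_no = ((tn + fp) / n) * ((tn + fn) / n)
expected_agreement = chance_yes + chance_no

# Cohen's kappa: how far accuracy exceeds chance, scaled by the maximum possible excess.
kappa = (accuracy - expected_agreement) / (1 - expected_agreement)

# Per-class precision and recall, treating each class in turn as the positive class.
precision_yes, recall_yes = tp / (tp + fp), tp / (tp + fn)
precision_no, recall_no = tn / (tn + fn), tn / (tn + fp)
f1_yes = 2 * precision_yes * recall_yes / (precision_yes + recall_yes)
f1_no = 2 * precision_no * recall_no / (precision_no + recall_no)

# Macro averages weight both classes equally, regardless of class size.
macro_precision = (precision_yes + precision_no) / 2
macro_f1 = (f1_yes + f1_no) / 2

print(f'kappa={kappa:.4f} macro_precision={macro_precision:.4f} macro_f1={macro_f1:.4f}')
# kappa=0.4000 macro_precision=0.7083 macro_f1=0.6970
```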


## 2) Retrieve confusion matrix metrics

In [5]:
metrics = cm.get_metrics()

summary_fields = {
    'accuracy': metrics['accuracy'],
    'kappa': metrics['kappa'],
    'precision': metrics['precision'],
    'recall': metrics['recall'],
    'sensitivity': metrics['sensitivity'],
    'specificity': metrics['specificity'],
    'tpr': metrics['tpr'],
    'fpr': metrics['fpr'],
    'f_measure': metrics['f_measure'],
    'binary_counts': metrics['binary_counts']
}
pd.Series(summary_fields)

accuracy                                          0.7
kappa                                             0.4
precision                                        0.75
recall                                            0.6
sensitivity                                       0.6
specificity                                       0.8
tpr                                               0.6
fpr                                               0.2
f_measure                                    0.666667
binary_counts    {'tp': 3, 'fp': 1, 'fn': 2, 'tn': 4}
dtype: object
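
Every binary metric in the series above follows from `binary_counts`. A minimal sketch using the usual definitions (the dictionary name `derived` is illustrative, and this is not necessarily how `get_metrics()` is implemented):

```python
counts = {'tp': 3, 'fp': 1, 'fn': 2, 'tn': 4}
tp, fp, fn, tn = (counts[key] for key in ('tp', 'fp', 'fn', 'tn'))

derived = {
    'accuracy': (tp + tn) / (tp + fp + fn + tn),
    'precision': tp / (tp + fp),
    'recall': tp / (tp + fn),        # same quantity as sensitivity and tpr
    'specificity': tn / (tn + fp),
    'fpr': fp / (fp + tn),           # equals 1 - specificity
}

# F-measure is the harmonic mean of precision and recall.
p, r = derived['precision'], derived['recall']
derived['f_measure'] = 2 * p * r / (p + r)

for name, value in derived.items():
    print(f'{name:<12}{value:.4f}')
```

Because `recall`, `sensitivity`, and `tpr` are three names for the same ratio, they always report identical values, as they do in the output above.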

## 3) Compute expected value

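`get_expected_value(costs)` expects `costs` in the same layout as the default confusion-matrix display (`inverted`), so each cost or benefit lines up with the count in the same cell.

For binary labels `['No', 'Yes']`, where `'Yes'` is the positive class, that layout is: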

- rows = predicted class
- columns = actual class
- top-left = TP
- top-right = FP
- bottom-left = FN
- bottom-right = TN


In [6]:
# Costs follow the default inverted layout.
# Positive entries are benefits; negative entries are costs.
# [[TP, FP],
#  [FN, TN]]
costs = [
    [8.0, -5.0],
    [-20.0, 1.0]
]

ev = cm.get_expected_value(costs)
print(f'Expected value per observation: {ev:.3f}')

Expected value per observation: -1.700
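
The printed value matches a count-weighted average of the cost/benefit entries: multiply each cell's count by its value, sum the four products, and divide by the number of observations. The sketch below reproduces that arithmetic in plain Python; it illustrates the calculation and is not taken from the library's source.

```python
# Counts and values in the inverted layout: [[TP, FP], [FN, TN]].
counts = [[3, 1], [2, 4]]
costs = [[8.0, -5.0], [-20.0, 1.0]]

n = sum(sum(row) for row in counts)

# Pair each count with the value in the same cell and total the products:
# 3*8 + 1*(-5) + 2*(-20) + 4*1 = -17
total_value = sum(
    count * value
    for count_row, cost_row in zip(counts, costs)
    for count, value in zip(count_row, cost_row)
)

ev = total_value / n
print(f'Expected value per observation: {ev:.3f}')  # -1.700
```

The two false negatives, each valued at -20, account for most of the loss, which is why the result is negative despite 70% accuracy.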


## 4) Use ConfusionMatrix from a fitted model

In [7]:
df = ca.load_breast_cancer_data()
train_df, test_df = ca.split_data(df, test_size=0.25, random_state=42)

model = ca.fit_logit(train_df, formula='diagnosis ~ .')
test_cm = model.get_confusion_matrix(test_df)

# Display with normal orientation
test_cm.to_dataframe(display='normal')


Logistic Regression fitted successfully!
  Classes: ['B', 'M']
  Features: 30
  Training samples: 426
  C parameter: 1.0
  Solver: lbfgs


| | Pred B | Pred M |
|---|---|---|
| Actual B | 87 | 2 |
| Actual M | 3 | 51 |


In [8]:
# Display with inverted orientation
test_cm.to_dataframe(display='inverted')

| | Actual M | Actual B |
|---|---|---|
| Pred M | 51 | 2 |
| Pred B | 3 | 87 |


In [9]:
test_metrics = test_cm.get_metrics()
pd.Series({
    'accuracy': test_metrics['accuracy'],
    'kappa': test_metrics['kappa'],
    'precision': test_metrics.get('precision', float('nan')),
    'recall': test_metrics.get('recall', float('nan')),
    'specificity': test_metrics.get('specificity', float('nan')),
    'f_measure': test_metrics.get('f_measure', float('nan')),
})

accuracy       0.965035
kappa          0.925342
precision      0.962264
recall         0.944444
specificity    0.977528
f_measure      0.953271
dtype: float64

In [10]:
# Example expected-value setup for this model's confusion matrix.
# Adjust these numbers to reflect your business context.
costs_model = [
    [1.0, -10.0],
    [-25.0, 12.0]
]

ev_model = test_cm.get_expected_value(costs_model)
print(f'Model expected value per observation: {ev_model:.3f}')

Model expected value per observation: 6.993
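
An expected value is easier to interpret next to a trivial policy. As a hedged sketch, the hypothetical helper `expected_value` below (not part of CUAnalytics) applies the same count-weighted average to the model's test counts and to two baselines that label every test case the same way, using the same example `costs_model` values:

```python
def expected_value(counts, costs):
    """Count-weighted average of cell values; both use [[TP, FP], [FN, TN]]."""
    n = sum(sum(row) for row in counts)
    total = sum(
        count * value
        for count_row, cost_row in zip(counts, costs)
        for count, value in zip(count_row, cost_row)
    )
    return total / n

costs_model = [[1.0, -10.0], [-25.0, 12.0]]

# Test set from the tables above: 54 actual 'M' (positive) and 89 actual 'B'.
model_counts = [[51, 2], [3, 87]]
always_b = [[0, 0], [54, 89]]   # every case predicted 'B': all positives become FN
always_m = [[54, 89], [0, 0]]   # every case predicted 'M': all negatives become FP

for name, counts in [('model', model_counts), ('always B', always_b), ('always M', always_m)]:
    print(f'{name:<9}{expected_value(counts, costs_model):.3f}')
# model    6.993
# always B -1.972
# always M -5.846
```

Under these example values both baselines lose money per observation, so the model's positive expected value comes from its predictions and not from the cost matrix alone.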


In [19]:
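# Refit the model, then inspect the data, the model summary, and a confusion matrix.
df = ca.load_breast_cancer_data()
print(df.head())
train_df, test_df = ca.split_data(df, test_size=0.25, random_state=42)
model = ca.fit_logit(train_df, formula='diagnosis ~ .')
model.summary()
# No DataFrame is passed here. The counts below sum to 426, the training-set size,
# so this matrix describes the training data, not test_df.
train_cm = model.get_confusion_matrix()
train_cm.summary()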

   mean radius  mean texture  mean perimeter  mean area  mean smoothness  \
0        17.99         10.38          122.80     1001.0          0.11840   
1        20.57         17.77          132.90     1326.0          0.08474   
2        19.69         21.25          130.00     1203.0          0.10960   
3        11.42         20.38           77.58      386.1          0.14250   
4        20.29         14.34          135.10     1297.0          0.10030   

   mean compactness  mean concavity  mean concave points  mean symmetry  \
0           0.27760          0.3001              0.14710         0.2419   
1           0.07864          0.0869              0.07017         0.1812   
2           0.15990          0.1974              0.12790         0.2069   
3           0.28390          0.2414              0.10520         0.2597   
4           0.13280          0.1980              0.10430         0.1809   

   mean fractal dimension  ...  worst texture  worst perimeter  worst area  \
0             

| | Actual M | Actual B |
|---|---|---|
| Pred M | 148 | 6 |
| Pred B | 10 | 262 |
