# **Measuring Efficacy in Classification**


This notebook is a tutorial on auditing efficacy in a binary classification task, using the **efficacy metrics** module of the holisticai library.
The sections are organised as follows:

1. Load the data: we load the Law School dataset as a pandas DataFrame.
2. Train a model: we train a simple logistic regression model (sklearn).
3. Measure efficacy: we compute a few efficacy metrics.

## **1. Load the data**

In [1]:
# Imports
import numpy as np
import pandas as pd
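import sys
# Add the parent directories to the import path so that a local copy
# of holisticai can be imported (presumably not needed if the package is installed)
sys.path.append('../../')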

The holisticai library hosts a few example datasets for quick loading and experimentation. Here we load and use the Law School dataset. The task is to predict the binary attribute 'bar' (whether a student passes the bar exam). The protected attributes are race and gender. We pay special attention to race in this case, because preliminary exploration suggests strong inequality with respect to that attribute.

In [2]:
# Get data
from holisticai.datasets import load_law_school
df = load_law_school()['frame']
df



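| | age | decile1 | decile3 | fam_inc | lsat | ugpa | gender | race1 | cluster | fulltime | bar | ugpagt3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 62.0 | 10.0 | 10.0 | 5.0 | 44.0 | 3.5 | female | white | 1 | 1 | TRUE | 1.0 |
| 1 | 62.0 | 5.0 | 4.0 | 4.0 | 29.0 | 3.5 | female | white | 2 | 1 | TRUE | 1.0 |
| 2 | 61.0 | 8.0 | 7.0 | 3.0 | 37.0 | 3.4 | male | white | 1 | 1 | TRUE | 1.0 |
| 3 | 60.0 | 8.0 | 7.0 | 4.0 | 43.0 | 3.3 | female | white | 1 | 1 | TRUE | 1.0 |
| 4 | 57.0 | 3.0 | 2.0 | 4.0 | 41.0 | 3.3 | female | white | 4 | 1 | TRUE | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 20795 | 60.0 | 9.0 | 8.0 | 4.0 | 42.0 | 3.0 | male | white | 5 | 1 | TRUE | 0.0 |
| 20796 | 61.0 | 4.0 | 9.0 | 4.0 | 29.5 | 3.5 | male | white | 3 | 1 | TRUE | 1.0 |
| 20797 | 62.0 | 1.0 | 1.0 | 3.0 | 33.0 | 3.1 | male | non-white | 3 | 1 | FALSE | 1.0 |
| 20798 | 65.0 | 4.0 | 5.0 | 3.0 | 32.0 | 3.0 | male | white | 3 | 2 | TRUE | 0.0 |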
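
The kind of preliminary exploration mentioned above can be sketched as a comparison of pass rates across groups. The snippet below is only an illustration on a small, made-up frame: the column names 'race1' and 'bar' match the dataset, but the values are hypothetical, and it assumes 'bar' holds the strings 'TRUE' and 'FALSE' as displayed in the output.

```python
import pandas as pd

# Toy stand-in for the Law School frame (hypothetical values, same column names).
toy = pd.DataFrame({
    "race1": ["white", "white", "non-white", "non-white", "white", "non-white"],
    "bar":   ["TRUE",  "TRUE",  "FALSE",     "TRUE",      "TRUE",  "FALSE"],
})

# Boolean indicator of a positive label (assumes string labels 'TRUE'/'FALSE').
passed = toy["bar"].eq("TRUE")

# Fraction of positive labels overall and within each group.
overall_rate = passed.mean()
rate_by_group = passed.groupby(toy["race1"]).mean()

print(f"overall pass rate: {overall_rate:.3f}")
print(rate_by_group)
```

On the real data, the same two lines applied to `df` would show how imbalanced the target is and how far the groups differ, which is useful context when reading accuracy-style metrics later on.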


## **2. Train a model**

We first encode the binary target column 'bar' as 0/1, then split the data into training and test sets.

In [6]:
from sklearn.model_selection import train_test_split

# Simple preprocessing before training: encode the target as 0/1
df_enc = df.copy()
df_enc['bar'] = df_enc['bar'].replace({'FALSE':0, 'TRUE':1})

# Separate features and target, then split into train and test sets (70% / 30%)
X = df_enc.drop(columns=['bar', 'ugpagt3'])
y = df_enc['bar']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
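
When encoding labels by hand, it can help to check that every label was actually covered by the mapping. The following is a minimal sketch on a made-up Series (hypothetical values), assuming the target holds the strings 'TRUE' and 'FALSE'. It uses `Series.map` instead of `replace`, so that any unexpected label surfaces as NaN and can be caught early.

```python
import pandas as pd

# Toy target column (hypothetical values) with the same string labels as 'bar'.
bar = pd.Series(["TRUE", "FALSE", "TRUE", "TRUE"], name="bar")

# Explicit label-to-integer mapping; labels missing from it become NaN under .map().
mapping = {"FALSE": 0, "TRUE": 1}
encoded = bar.map(mapping)

# Fail early if any label was not covered by the mapping.
unmapped = bar[encoded.isna()].unique()
if len(unmapped) > 0:
    raise ValueError(f"unmapped labels: {list(unmapped)}")

encoded = encoded.astype(int)
print(encoded.tolist())  # [1, 0, 1, 1]
```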

Here we train a logistic regression classifier on standardised features. The protected attributes 'race1' and 'gender' are dropped before fitting.

In [8]:
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Standardise the features (the scaler is fitted on the training set only), then train the model
scaler = StandardScaler()
X_train_t = scaler.fit_transform(X_train.drop(columns=['race1', 'gender']))
LR = LogisticRegression(random_state=42, max_iter=500)
LR.fit(X_train_t, y_train)

# Apply the same scaling to the test set and predict
X_test_t = scaler.transform(X_test.drop(columns=['race1', 'gender']))
y_pred = LR.predict(X_test_t)          # hard class labels
y_proba = LR.predict_proba(X_test_t)   # class probabilities, shape (n_samples, 2)
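
The scale-then-fit steps can also be bundled into a single scikit-learn pipeline, which makes it harder to forget the scaling at prediction time. The sketch below is an alternative formulation, not the notebook's own code, and it runs on synthetic data (the names `X_toy`, `y_toy` and the data itself are hypothetical).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical): 200 samples, 3 numeric features.
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(200, 3))
y_toy = (X_toy[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
X_tr, X_te, y_tr, y_te = X_toy[:150], X_toy[150:], y_toy[:150], y_toy[150:]

# The pipeline fits the scaler on the training data only and reuses it when predicting.
clf = make_pipeline(StandardScaler(), LogisticRegression(random_state=42, max_iter=500))
clf.fit(X_tr, y_tr)

pred = clf.predict(X_te)         # hard labels, shape (50,)
proba = clf.predict_proba(X_te)  # class probabilities, shape (50, 2)
pos_proba = proba[:, 1]          # probability of the positive class
```

Note that `predict_proba` returns one column per class; metrics that expect a single score per sample usually take the positive-class column.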

## **3. Measure Efficacy**

In [9]:
from holisticai.efficacy.metrics import classification_efficacy_metrics
classification_efficacy_metrics(y_pred, y_test, y_proba)

| Metric | Value | Reference |
|---|---|---|
| Accuracy | 0.902724 | 1 |
| Balanced accuracy | 0.775417 | 1 |
| Precision | 0.984372 | 1 |
| Recall | 0.913333 | 1 |
| F1-Score | 0.947523 | 1 |
| AUC | 0.775417 | 1 |
| Log Loss | 3.506169 | 0 |
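
The Reference column appears to give the ideal value of each metric (1 for scores where higher is better, 0 for log loss). To make the metric definitions concrete, here is a sketch that computes the same kinds of quantities directly with scikit-learn on a tiny, made-up example (the labels and scores are hypothetical, and this is not necessarily how holisticai computes them internally). It also illustrates a general fact that may help when reading the table: for binary problems, an AUC computed from hard 0/1 predictions equals the balanced accuracy, whereas an AUC computed from probability scores generally does not.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, balanced_accuracy_score, f1_score,
    log_loss, precision_score, recall_score, roc_auc_score,
)

# Toy labels and positive-class scores (hypothetical values, not the notebook's data).
y_true = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
p_pos = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.65, 0.3, 0.2, 0.1])
y_hat = (p_pos >= 0.5).astype(int)  # hard labels at a 0.5 threshold

metrics = {
    "Accuracy": accuracy_score(y_true, y_hat),
    "Balanced accuracy": balanced_accuracy_score(y_true, y_hat),
    "Precision": precision_score(y_true, y_hat),
    "Recall": recall_score(y_true, y_hat),
    "F1-Score": f1_score(y_true, y_hat),
    "AUC (from scores)": roc_auc_score(y_true, p_pos),
    "AUC (from hard labels)": roc_auc_score(y_true, y_hat),
    "Log Loss (from scores)": log_loss(y_true, p_pos),
}

for name, value in metrics.items():
    print(f"{name:<24} {value:.6f}")
```

In the notebook's output, AUC and balanced accuracy are identical, which is consistent with the AUC having been computed from hard predictions. If a ranking-based AUC is wanted, passing the positive-class probabilities to `roc_auc_score` as above is one way to cross-check it.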
