# Credit Pipeline Usage

This notebook shows how to use the pipeline.

In [1]:
from sklearn.linear_model import LogisticRegression
from lightgbm import LGBMClassifier
import credit_pipeline as cp

import sys
sys.path.append("../scripts")
import experiments


In [2]:
# configuration of reproducibility
N_FOLDS = 10
FOLD = 0
SEED = 0
DATASET = "german" # ["german", "taiwan", "homecredit"]

In [3]:
# load the train/validation/test split for the chosen fold and seed
X_train, Y_train, X_val, Y_val, X_test, Y_test = experiments.load_split(DATASET, FOLD, SEED)

# protected attribute used by the fairness metrics: 1 = female, 0 = otherwise
A_test = (X_test["Gender"] == "Female").astype(int)
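The internals of `experiments.load_split` are not shown in this notebook. As a rough illustration of how a seeded K-fold split can give a reproducible train/validation/test partition for one fold, here is a minimal sketch. The function `load_split_sketch`, the use of `StratifiedKFold`, and the `val_size` parameter are assumptions for illustration, not the script's actual logic.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split


def load_split_sketch(X, y, fold, seed, n_folds=10, val_size=0.1):
    """Hypothetical stand-in for experiments.load_split (not its real implementation).

    Fold `fold` of a seeded, stratified K-fold split becomes the test set;
    the remaining rows are split again into training and validation sets.
    """
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    train_idx, test_idx = list(skf.split(X, y))[fold]
    X_tr, X_va, y_tr, y_va = train_test_split(
        X[train_idx], y[train_idx],
        test_size=val_size, stratify=y[train_idx], random_state=seed,
    )
    return X_tr, y_tr, X_va, y_va, X[test_idx], y[test_idx]


# synthetic data standing in for a credit dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.array([0, 1] * 100)

split_a = load_split_sketch(X, y, fold=0, seed=0)
split_b = load_split_sketch(X, y, fold=0, seed=0)
# in this sketch, the same (fold, seed) pair always yields the same partition
print(all(np.array_equal(a, b) for a, b in zip(split_a, split_b)))
```

In this scheme, fixing `FOLD` and `SEED` is what makes a run repeatable, while changing `FOLD` rotates which slice of the data is held out for testing.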

One of the package's main features is the data pre-processing pipeline. The pipeline can be created with or without a classifier head.

In [4]:
pipeline = cp.training.create_pipeline(X_train, Y_train)
pipeline.fit(X_train, Y_train);
pipeline[:-1]
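The exact steps built by `cp.training.create_pipeline` are defined by the package. The general pattern it follows, a scikit-learn `Pipeline` of pre-processing steps with an optional classifier head, and why slicing with `[:-1]` exposes only the steps before the last one, can be sketched as follows. `make_pipeline_sketch` and its imputation, scaling and encoding choices are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_pipeline_sketch(X, head=None):
    """Hypothetical illustration: pre-processing steps plus an optional classifier head."""
    numeric = X.select_dtypes(include="number").columns.tolist()
    categorical = X.select_dtypes(exclude="number").columns.tolist()
    preprocess = ColumnTransformer([
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])
    steps = [("preprocess", preprocess)]
    if head is not None:  # append the classifier only when one is given
        steps.append(("classifier", head))
    return Pipeline(steps)


X = pd.DataFrame({
    "age": [25, 40, np.nan, 33],
    "housing": ["own", "rent", "own", "free"],
})
y = np.array([0, 1, 0, 1])

pipe = make_pipeline_sketch(X, head=LogisticRegression())
pipe.fit(X, y)

# slicing a Pipeline returns a new Pipeline without the last step (the head here),
# reusing the already fitted pre-processing step
preprocessing_only = pipe[:-1]
print(preprocessing_only.transform(X).shape)
```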

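Once a classifier head is chosen, we can tune its hyperparameters with the package's optimization utilities. The classifiers and fairness-aware methods available in the experiment scripts are listed below.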
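How `optimize_model_fast` searches the parameter space, and what `param_space = "suggest"` expands to, are defined by the package and not shown here. The underlying idea of validation-based hyperparameter search can be illustrated with a minimal random search over the regularization strength `C` of a logistic regression, scored by validation AUC. The function name, the search range and the use of plain random search are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def random_search_sketch(X_tr, y_tr, X_va, y_va, n_trials=20, seed=0):
    """Hypothetical illustration: sample C log-uniformly and keep the model
    with the highest AUC on the validation set."""
    rng = np.random.default_rng(seed)
    best_auc, best_model, best_C = -np.inf, None, None
    for _ in range(n_trials):
        C = 10 ** rng.uniform(-3, 2)  # log-uniform draw in [1e-3, 1e2)
        candidate = LogisticRegression(C=C, max_iter=1000).fit(X_tr, y_tr)
        auc = roc_auc_score(y_va, candidate.predict_proba(X_va)[:, 1])
        if auc > best_auc:
            best_auc, best_model, best_C = auc, candidate, C
    return best_model, best_C, best_auc


X_demo, y_demo = make_classification(n_samples=400, random_state=0)
X_tr, y_tr = X_demo[:300], y_demo[:300]
X_va, y_va = X_demo[300:], y_demo[300:]

search_model, best_C, best_auc = random_search_sketch(X_tr, y_tr, X_va, y_va)
print(f"best C = {best_C:.4g}, validation AUC = {best_auc:.3f}")
```

A dedicated optimizer can replace the random draws with a smarter sampler, but the contract stays the same: fit on the training split, score on the validation split, keep the best configuration.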

In [5]:
experiments.MODEL_CLASS_LIST

[sklearn.linear_model._logistic.LogisticRegression,
 credit_pipeline.models.MLPClassifier,
 sklearn.ensemble._forest.RandomForestClassifier,
 lightgbm.sklearn.LGBMClassifier]

In [6]:
experiments.FAIRNESS_CLASS_LIST

['Reweighing',
 'DemographicParityClassifier',
 'EqualOpportunityClassifier',
 'FairGBMClassifier',
 'ThresholdOptimizer']

In [7]:
study, model = cp.training.optimize_model_fast(
    model_class = LogisticRegression,
    param_space = "suggest",
    X_train = X_train,
    y_train = Y_train,
    X_val = X_val,
    y_val = Y_val,
    n_trials = 100
)


In [8]:
model_dict = {}

In [9]:
Y_pred = model.predict_proba(X_train)[:, 1]
threshold = cp.training.ks_threshold(Y_train, Y_pred)
model_dict["LogisticRegression"] = [model, threshold]
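The name `ks_threshold` suggests a cutoff chosen via the Kolmogorov-Smirnov (KS) statistic, which for a binary score equals the maximum of TPR - FPR over all thresholds. Assuming that is what the package computes (its source is not shown here), a minimal equivalent using scikit-learn's `roc_curve` looks like this; `ks_threshold_sketch` is a hypothetical name.

```python
import numpy as np
from sklearn.metrics import roc_curve


def ks_threshold_sketch(y_true, y_score):
    """Hypothetical stand-in for cp.training.ks_threshold.

    Returns the score cutoff that maximizes TPR - FPR (the KS statistic)
    together with the statistic itself.
    """
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    ks_values = tpr - fpr
    best = int(np.argmax(ks_values))
    return thresholds[best], ks_values[best]


y_true = np.array([0, 0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.2, 0.7, 0.8, 0.9])
ks_cutoff, ks = ks_threshold_sketch(y_true, y_score)
# roc_curve treats score >= threshold as a positive prediction
demo_pred = (y_score >= ks_cutoff).astype(int)
print(ks_cutoff, ks, demo_pred)
```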

In [10]:
study, model = cp.training.optimize_model_fast(
    model_class = LGBMClassifier,
    param_space = "suggest",
    X_train = X_train,
    y_train = Y_train,
    X_val = X_val,
    y_val = Y_val,
    n_trials = 100
)


In [11]:
Y_pred = model.predict_proba(X_train)[:, 1]
threshold = cp.training.ks_threshold(Y_train, Y_pred)
model_dict["LGBMClassifier"] = [model, threshold]

By building a dict that maps each model's name to the model, we can evaluate all models at once with the package's evaluation functions. To use a different threshold for each model, set each dict value to a pair holding the model and its threshold (the cells above use a list `[model, threshold]`).
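As an illustration of how such a dict can be turned into a metrics table, here is a minimal sketch built on scikit-learn metrics. `get_metrics_sketch` is hypothetical, and so is its fallback to a 0.5 threshold when a dict value holds only a model; the package's `get_metrics` may compute some columns differently.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             brier_score_loss, f1_score, precision_score,
                             recall_score, roc_auc_score)


def get_metrics_sketch(model_dict, X, y):
    """Hypothetical: one row of metrics per model. Each value is either a
    model or a (model, threshold) pair; a bare model uses an assumed 0.5."""
    rows = []
    for name, entry in model_dict.items():
        model, threshold = entry if isinstance(entry, (list, tuple)) else (entry, 0.5)
        proba = model.predict_proba(X)[:, 1]        # scores for the positive class
        pred = (proba >= threshold).astype(int)     # hard labels at the threshold
        rows.append({
            "model": name,
            "AUC": roc_auc_score(y, proba),
            "Brier Score": brier_score_loss(y, proba),
            "Balanced Accuracy": balanced_accuracy_score(y, pred),
            "Accuracy": accuracy_score(y, pred),
            "Precision": precision_score(y, pred, zero_division=0),
            "Recall": recall_score(y, pred),
            "F1": f1_score(y, pred),
        })
    return pd.DataFrame(rows)


X, y = make_classification(n_samples=300, random_state=1)
clf = LogisticRegression(max_iter=1000).fit(X[:200], y[:200])
demo_dict = {
    "with_threshold": [clf, 0.5],  # model paired with an explicit threshold
    "model_only": clf,             # falls back to the assumed 0.5 default
}
print(get_metrics_sketch(demo_dict, X[200:], y[200:]))
```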

In [12]:
cp.evaluate.get_metrics(model_dict, X_test, Y_test)

|   | model | AUC | Brier Score | Balanced Accuracy | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|---|
| 0 | LogisticRegression | 0.742657 | 0.202639 | 0.672453 | 0.7 | 0.577465 | 0.577465 | 0.577465 |
| 1 | LGBMClassifier | 0.715034 | 0.218764 | 0.642537 | 0.69 | 0.576271 | 0.478873 | 0.523077 |


In [13]:
cp.evaluate.get_fairness_metrics(model_dict, X_test, Y_test, A_test)

|   | model | DPD | EOD | AOD | APVD | GMA | balanced_accuracy |
|---|---|---|---|---|---|---|---|
| 0 | LogisticRegression | 0.13115 | 0.057734 | 0.134626 | -0.134633 | 0.671465 | 0.691451 |
| 1 | LGBMClassifier | -0.041345 | -0.153595 | -0.057977 | -0.147733 | 0.682789 | 0.614041 |
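The notebook does not spell out the fairness abbreviations. Assuming DPD, EOD and AOD follow the common definitions of demographic parity difference, equal opportunity difference and average odds difference, they can be sketched as below. The signed values in the table are consistent with between-group differences, but the package's exact sign convention and group order are not shown; this sketch takes group 1 (`A_test == 1`, female) minus group 0. `fairness_sketch` and `group_rates` are hypothetical helpers.

```python
import numpy as np


def group_rates(y_true, y_pred, mask):
    """Selection rate, true positive rate and false positive rate within one group."""
    yt, yp = y_true[mask], y_pred[mask]
    selection = yp.mean()
    tpr = yp[yt == 1].mean()
    fpr = yp[yt == 0].mean()
    return selection, tpr, fpr


def fairness_sketch(y_true, y_pred, a):
    """Hypothetical: group-1 minus group-0 differences for three common metrics."""
    y_true, y_pred, a = map(np.asarray, (y_true, y_pred, a))
    sel1, tpr1, fpr1 = group_rates(y_true, y_pred, a == 1)
    sel0, tpr0, fpr0 = group_rates(y_true, y_pred, a == 0)
    return {
        "DPD": sel1 - sel0,                            # demographic parity difference
        "EOD": tpr1 - tpr0,                            # equal opportunity difference
        "AOD": 0.5 * ((fpr1 - fpr0) + (tpr1 - tpr0)),  # average odds difference
    }


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
a = [1, 1, 1, 1, 0, 0, 0, 0]
print(fairness_sketch(y_true, y_pred, a))
```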
