# Checking ML model fairness with the open-source Fairlearn package, and integrating with Azure Machine Learning

- References
  - Docs (English): https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-fairness-aml
    - Docs (Korean, machine-translated): https://docs.microsoft.com/ko-kr/azure/machine-learning/how-to-machine-learning-fairness-aml
  - Fairlearn quickstart (English): https://fairlearn.github.io/v0.5.0/quickstart.html


## Training an example model

- Dataset: the Adult census dataset hosted on OpenML (URL: https://www.openml.org/d/1590)
- Model: a DecisionTreeClassifier that predicts whether annual income exceeds $50,000

In [1]:
import numpy as np
import pandas as pd

from sklearn.datasets import fetch_openml

# Load the census dataset
data = fetch_openml(data_id=1590, as_frame=True)

# Keep sensitive features such as sex and race aside (in A) and exclude them from model training
X_raw = data.data
y_true = (data.target == ">50K") * 1
A = X_raw[["race", "sex"]]
X_raw = pd.get_dummies(X_raw.drop(labels=['sex', 'race'],axis = 1))
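
The cell above sets the sensitive columns aside and one-hot encodes what remains. A minimal sketch of that pattern on a tiny made-up frame (the rows and the `workclass` values are invented for illustration and are not taken from the census data):

```python
import pandas as pd

# Made-up rows that mimic the shape of the census data (illustrative only)
raw = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": ["Private", "Self-emp", "Private"],
    "sex": ["Male", "Female", "Male"],
    "race": ["White", "Black", "White"],
})

# Keep the sensitive columns separately for the later fairness assessment
sensitive = raw[["race", "sex"]]

# Drop them from the training features, then one-hot encode the categoricals
features = pd.get_dummies(raw.drop(labels=["sex", "race"], axis=1))

print(list(features.columns))
```

Numeric columns pass through unchanged, while each categorical column becomes one indicator column per category.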

In [2]:
# Check the distribution of the sex column
sex = data.data['sex']
sex.value_counts()

Male      32650
Female    16192
Name: sex, dtype: int64

In [3]:
# Check the distribution of the race column
race = data.data['race']
race.value_counts()

White                 41762
Black                  4685
Asian-Pac-Islander     1519
Amer-Indian-Eskimo      470
Other                   406
Name: race, dtype: int64

In [4]:
from sklearn.model_selection import train_test_split

# Split the data into training ("train") and test ("test") sets
(X_train, X_test, y_train, y_test, A_train, A_test) = train_test_split(
    X_raw, y_true, A, test_size=0.3, random_state=12345, stratify=y_true
)
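
The `stratify=y_true` argument asks `train_test_split` to keep the share of positive labels roughly the same in both splits. A minimal sketch on synthetic labels (the 25% positive rate is an assumption chosen for illustration, not the census rate):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 250 positives out of 1000 (illustrative only)
y = np.array([1] * 250 + [0] * 750)
X = np.arange(len(y)).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=12345, stratify=y
)

# With stratification, both splits keep close to the original positive rate
train_rate = y_tr.mean()
test_rate = y_te.mean()
print(f"positive rate: train={train_rate:.3f}, test={test_rate:.3f}")
```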

In [5]:
# Ensure indices are aligned between X, y and A,
# after all the slicing and splitting of DataFrames
# and Series
X_train = X_train.reset_index(drop=True)
X_test = X_test.reset_index(drop=True)
y_train = y_train.reset_index(drop=True)
y_test = y_test.reset_index(drop=True)
A_train = A_train.reset_index(drop=True)
A_test = A_test.reset_index(drop=True)
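
One reason to reset the indices is that pandas aligns on index labels rather than on row position, so objects that carry shuffled labels can silently fail to line up with freshly built ones. A minimal sketch with made-up values:

```python
import pandas as pd

# Two Series describing the same three rows but carrying different index labels,
# as can happen when a shuffled split meets a freshly created Series (made-up values)
y = pd.Series([1, 0, 1], index=[7, 9, 5])
pred = pd.Series([1, 1, 1], index=[0, 1, 2])

# Arithmetic aligns on labels; no label matches, so every entry is NaN
misaligned = y - pred
n_missing = int(misaligned.isna().sum())

# After reset_index(drop=True) both use 0..n-1 and line up row by row
aligned = y.reset_index(drop=True) - pred.reset_index(drop=True)

print(n_missing, aligned.tolist())
```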

## Checking metrics for the classification model
- Docs (English): https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/evaluate-model#metrics-for-classification-models
- Docs (Korean, machine-translated): https://docs.microsoft.com/ko-kr/azure/machine-learning/algorithm-module-reference/evaluate-model#metrics-for-classification-models

In [6]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

classifier = DecisionTreeClassifier(min_samples_leaf=10, max_depth=4)
classifier.fit(X_train, y_train)

# Predict on the test set
y_pred_tr = classifier.predict(X_test)
print('Accuracy: %.3f' % accuracy_score(y_test, y_pred_tr))
print('Precision: %.3f' % precision_score(y_test, y_pred_tr))
print('Recall: %.3f' % recall_score(y_test, y_pred_tr))
print('F1 score: %.3f' % f1_score(y_test, y_pred_tr))
# Note: this AUC is computed from hard 0/1 predictions, not from predicted probabilities
print('AUC: %.3f' % roc_auc_score(y_test, y_pred_tr))

Accuracy: 0.840
Precision: 0.723
Recall: 0.535
F1 score: 0.615
AUC: 0.735
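
The AUC above is computed from the hard 0/1 output of `predict`. `roc_auc_score` also accepts continuous scores, and the two values generally differ. A minimal sketch on synthetic data from `make_classification` (not the census data), reusing the tree settings from the cell above:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem (illustrative only)
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = DecisionTreeClassifier(min_samples_leaf=10, max_depth=4, random_state=0)
clf.fit(X_tr, y_tr)

# AUC from hard labels: the ROC curve collapses to a single operating point
auc_from_labels = roc_auc_score(y_te, clf.predict(X_te))

# AUC from the predicted probability of the positive class: uses all thresholds
auc_from_scores = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

print(f"AUC from labels: {auc_from_labels:.3f}")
print(f"AUC from scores: {auc_from_scores:.3f}")
```

Which of the two to report depends on whether the model is evaluated at one fixed decision threshold or across all of them.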


## Fairlearn library: view the dashboard directly in Jupyter

In [7]:
# View this model in Fairlearn's fairness dashboard, and see the disparities which appear:
from fairlearn.widget import FairlearnDashboard
FairlearnDashboard(sensitive_features=A_test, 
                   sensitive_feature_names=['Race', 'Sex'],
                   y_true=y_test,
                   y_pred={"model": y_pred_tr})

<fairlearn.widget._fairlearn_dashboard.FairlearnDashboard at 0x7fd963d79e48>
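
The dashboard compares metrics across the groups of a sensitive feature. As a rough, hand-rolled illustration of that idea (plain pandas, not Fairlearn's own implementation), the sketch below computes per-group accuracy and selection rate on a tiny made-up table:

```python
import pandas as pd

# Tiny made-up example (not the census data)
df = pd.DataFrame({
    "sex":    ["Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female"],
    "y_true": [1, 0, 1, 0, 1, 0, 0, 1],
    "y_pred": [1, 0, 1, 1, 0, 0, 0, 1],
})
df["correct"] = (df["y_true"] == df["y_pred"]).astype(int)

# Accuracy and selection rate (share of positive predictions) per group
by_group = df.groupby("sex").agg(
    accuracy=("correct", "mean"),
    selection_rate=("y_pred", "mean"),
)

# Gap between the highest and lowest selection rate across groups
disparity = by_group["selection_rate"].max() - by_group["selection_rate"].min()

print(by_group)
print("selection rate difference:", disparity)
```

In this toy table both groups have the same accuracy, yet the model selects one group three times as often, which is the kind of gap a per-group breakdown makes visible.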

## Connecting to Azure Machine Learning: enables MLOps integration

In [8]:
# Import what is needed to connect to Azure Machine Learning
from azureml.core import Workspace, Experiment, Model
import joblib
import os

# Load the workspace settings from the config.json file
# See: https://docs.microsoft.com/ko-kr/azure/machine-learning/how-to-configure-environment#workspace
ws = Workspace.from_config()
ws.get_details()

os.makedirs('models', exist_ok=True)

In [14]:
# The model must be registered (once is enough; if it is already registered, switch to fetching the existing model instead)

# Function to register models into Azure Machine Learning
def register_model(name, model):
    print("Registering ", name)
    model_path = "models/{0}.pkl".format(name)
    joblib.dump(value=model, filename=model_path)
    registered_model = Model.register(model_path=model_path,
                                    model_name=name,
                                    workspace=ws)
    print("Registered ", registered_model.id)
    return registered_model.id

# Call the register_model function 
dt_classifier_id = register_model("fairness_DecisionTreeClassifier", classifier)

# This example shows how to use an existing registered model id
#dt_classifier_id = Model.list(workspace=ws)[0].id

Registering  fairness_DecisionTreeClassifier
Registering model fairness_DecisionTreeClassifier
Registered  fairness_DecisionTreeClassifier:2
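
Before registration, `register_model` serializes the estimator to disk with `joblib.dump`. A minimal local sketch of that serialization step alone, with no Azure calls (the file name `demo_model.pkl` and the synthetic data are assumptions for illustration):

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Fit a small tree on synthetic data (illustrative only)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, mirroring the "models/<name>.pkl" pattern above
    model_path = os.path.join(tmp_dir, "demo_model.pkl")
    joblib.dump(value=clf, filename=model_path)
    restored = joblib.load(model_path)

# The restored model should predict exactly like the original
same_predictions = bool((restored.predict(X) == clf.predict(X)).all())
print("round trip preserved predictions:", same_predictions)
```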


In [15]:
# Precompute the fairness metrics

# Create a dictionary of the model(s) you want to assess for fairness
sf = { 'Race': A_test.race, 'Sex': A_test.sex}
ys_pred = { dt_classifier_id:y_pred_tr }

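# Note: _create_group_metric_set is a private (underscore-prefixed) helper and may change between Fairlearn versions
from fairlearn.metrics._group_metric_set import _create_group_metric_set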

dash_dict = _create_group_metric_set(y_true=y_test,
                                    predictions=ys_pred,
                                    sensitive_features=sf,
                                    prediction_type='binary_classification')

In [16]:
# Upload the precomputed fairness metrics to Azure Machine Learning

from azureml.contrib.fairness import upload_dashboard_dictionary, download_dashboard_by_upload_id

exp = Experiment(ws, "Test_Fairness_Census_Demo-testset")
print(exp)

run = exp.start_logging()

# Upload the dashboard to Azure Machine Learning
try:
    dashboard_title = "Fairness insights of Decision Tree Classifier"
    # Set validate_model_ids parameter of upload_dashboard_dictionary to False if you have not registered your model(s)
    upload_id = upload_dashboard_dictionary(run,
                                            dash_dict,
                                            dashboard_name=dashboard_title)
    print("\nUploaded to id: {0}\n".format(upload_id))

    # To test the dashboard, you can download it back and ensure it contains the right information
    downloaded_dict = download_dashboard_by_upload_id(run, upload_id)
finally:
    run.complete()
    
# The uploaded dashboard can now be viewed in Azure Machine Learning

Experiment(Name: Test_Fairness_Census_Demo-testset,
Workspace: fair-ml)


INFO:/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/contrib/fairness/_dashboard_validation.py:Starting validation of dashboard dictionary
INFO:/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/contrib/fairness/_dashboard_validation.py:Validation of dashboard dictionary successful
INFO:/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/contrib/fairness/_azureml_validation.py:Validating model ids exist
INFO:/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/contrib/fairness/_azureml_validation.py:Checking fairness_DecisionTreeClassifier:2
INFO:/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/contrib/fairness/_azureml_validation.py:Validation of model ids complete
INFO:/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/contrib/fairness/_fairness_client.py:Uploading y_true
INFO:azureml.FairnessArtifactClient:Uploading to azureml.fairness/dashboard.metrics/7f84dcf1-4c2b-4382-b84f-887cf315b894/y_true/86b4b5c


Uploaded to id: 7f84dcf1-4c2b-4382-b84f-887cf315b894



INFO:/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/contrib/fairness/_fairness_client.py:Populating y_pred
INFO:azureml.FairnessArtifactClient:Downloading from azureml.fairness/dashboard.metrics/7f84dcf1-4c2b-4382-b84f-887cf315b894/y_pred/54cdde09-65d6-4b2a-b73a-7c08bb130834.json
INFO:/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/contrib/fairness/_fairness_client.py:Populating sensitive features
INFO:azureml.FairnessArtifactClient:Downloading from azureml.fairness/dashboard.metrics/7f84dcf1-4c2b-4382-b84f-887cf315b894/sensitive_features_column/323854ea-fba8-485e-b163-f2c3c5757cfa.json
INFO:azureml.FairnessArtifactClient:Downloading from azureml.fairness/dashboard.metrics/7f84dcf1-4c2b-4382-b84f-887cf315b894/sensitive_features_column/cda2641c-6fc2-4a32-92a0-54029a452fdf.json
INFO:/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/contrib/fairness/_fairness_client.py:Populating metrics
INFO:azureml.FairnessArtifactClient:Downloading from az