# 📘 PyCaret 기반 분류 모델 성능 비교 및 F2-score 최적화 분석 설명서

본 노트북은 PyCaret을 활용하여 정기예금 마케팅 데이터에서 다양한 분류 모델을 비교 평가하고,  
**F2-score 중심의 모델 선택 및 Threshold 최적화**를 수행하는 통합 분석 파이프라인입니다.

---

## 📌 목적
- 은행 마케팅 캠페인에서 **가입 고객을 놓치는 손실(FN)**이 **불필요한 콜(FP)**보다 훨씬 크다는 특성에 따라,  
  **Recall 중심의 평가 지표(F2-score)**를 활용해 **실질적인 비즈니스 손실 최소화**를 목표로 합니다.

---

## 📊 전체 분석 절차 요약

### 1. 라이브러리 임포트
- `pandas`, `numpy`, `matplotlib`: 데이터 처리 및 시각화
- `sklearn.metrics`: F2-score 및 Precision-Recall Curve 계산
- `pycaret.classification`: 모델 설정, 학습, 비교, 해석, 시각화 전반 지원

---

### 2. 데이터 로드
- `data.csv` 파일로부터 마케팅 데이터를 불러오고, 초기 5행 미리보기 출력

---

### 3. PyCaret 환경 설정 (`setup`)
- `target='y'`: 종속 변수 지정
- `train_size=0.8`: 훈련/검증 데이터 비율 8:2
- `session_id=123`: 실험 재현성 확보
- `preprocess=False`: 사용자 사전 전처리 유지

---

### 4. 모델 성능 비교 (`compare_models`)
- PyCaret 내장 분류 모델을 대상으로 기본 성능 비교 수행
- Accuracy, AUC, Recall, Precision, F1 등 주요 지표 확인
- `pull()`로 결과 DataFrame 저장 후 출력

---

### 5. 후보 모델 필터링
- `models()`를 통해 현재 환경에서 사용 가능한 모델 ID 조회
- 관심 모델 목록(`lightgbm`, `xgboost`, `rf`, `et`, `gbc`, `nb`)과 교집합으로 후보 모델 선정

---

### 6. F2-score 기반 모델별 성능 분석
- 각 후보 모델에 대해:
  - 모델 학습 (`create_model`)
  - Hold-out 데이터 예측 (`predict_model`)
  - Precision-Recall Curve 기반 F2-score 계산
  - **최적 Threshold 및 해당 F2-score 저장**

- **F2-score (β=2)**:
  → Recall을 Precision보다 4배 중요하게 반영

---

### 7. 모델별 최적 F2-score 결과 비교
- 모델별 `Best_F2`, `Best_Threshold`를 DataFrame으로 정리
- F2-score 기준으로 **비즈니스 최적화 관점에서 우수 모델 선별 가능**

---


In [27]:
# ▒▒ 1. 라이브러리 임포트 ▒▒
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import fbeta_score, precision_recall_curve
from IPython.display import display

# PyCaret 분류 모듈
from pycaret.classification import (
    setup, create_model, predict_model, plot_model,
    interpret_model, evaluate_model, models,
    compare_models, pull
)

# ▒▒ 2. 데이터 로드 ▒▒
df = pd.read_csv('C:/ITStudy/bank-marketing/data.csv')
display(df.head())

# ▒▒ 3. PyCaret 환경 설정 ▒▒
clf_setup = setup(
    data=df,
    target='y',
    session_id=123,        # 재현성 확보
    train_size=0.8,        # 학습:검증 = 8:2
    preprocess=False,      # 사용자 전처리 유지
    html=False,
    verbose=False
)

# ▒▒ 4. 기본 모델 성능 비교 ▒▒
# compare_models 결과 → pull로 저장
compare_models(n_select=10)
baseline_result_df = pull()
display(baseline_result_df)

# ▒▒ 5. F2-score 기반 후보 모델 정의 ▒▒
available_model_ids = models().index.tolist()
desired_models = ['lightgbm', 'xgboost', 'rf', 'et', 'gbc', 'nb']
candidate_models = [m for m in desired_models if m in available_model_ids]

# ▒▒ 6. 각 모델별 F2-score 계산 및 최적 Threshold 탐색 ▒▒
results = []

for model_name in candidate_models:
    model = create_model(model_name, fold=5)
    preds = predict_model(model)

    # 실제값 및 예측 확률
    y_true = preds['y']
    y_score = preds['prediction_score']

    # Precision-Recall Curve → F2-score 계산
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    f2_scores = [
        (1 + 2**2) * (p * r) / ((2**2 * p) + r + 1e-8)
        for p, r in zip(precision[:-1], recall[:-1])
    ]
    
    best_idx = np.argmax(f2_scores)
    best_thresh = thresholds[best_idx]
    best_f2 = f2_scores[best_idx]

    results.append({
        'Model': model_name,
        'Best_F2': best_f2,
        'Best_Threshold': best_thresh,
        'Model_Object': model
    })

# ▒▒ 7. 모델별 F2-score 비교 결과 출력 ▒▒
f2_df = pd.DataFrame(results).sort_values(by='Best_F2', ascending=False)
display(f2_df[['Model', 'Best_F2', 'Best_Threshold']])

Unnamed: 0,age,education,contact,month,campaign,previous,emp.var.rate,cons.price.idx,cons.conf.idx,euribor3m,...,loan_yes,day_of_week_fri,day_of_week_mon,day_of_week_thu,day_of_week_tue,day_of_week_wed,contacted_before,poutcome_failure,poutcome_nonexistent,poutcome_success
0,1.533034,2,0,5,-0.565922,-0.349494,0.648092,0.722722,0.886447,0.71246,...,False,False,True,False,False,False,0,False,True,False
1,1.628993,5,0,5,-0.565922,-0.349494,0.648092,0.722722,0.886447,0.71246,...,False,False,True,False,False,False,0,False,True,False
2,-0.290186,5,0,5,-0.565922,-0.349494,0.648092,0.722722,0.886447,0.71246,...,False,False,True,False,False,False,0,False,True,False
3,-0.002309,3,0,5,-0.565922,-0.349494,0.648092,0.722722,0.886447,0.71246,...,False,False,True,False,False,False,0,False,True,False
4,1.533034,5,0,5,-0.565922,-0.349494,0.648092,0.722722,0.886447,0.71246,...,True,False,True,False,False,False,0,False,True,False


                                                                      

                                    Model  Accuracy     AUC  Recall   Prec.  \
gbc          Gradient Boosting Classifier    0.9012  0.8020  0.2513  0.6625   
lightgbm  Light Gradient Boosting Machine    0.9012  0.8017  0.2691  0.6486   
lr                    Logistic Regression    0.8999  0.7793  0.2190  0.6724   
ada                  Ada Boost Classifier    0.8996  0.7942  0.2204  0.6655   
ridge                    Ridge Classifier    0.8995  0.7736  0.1883  0.7037   
svm                   SVM - Linear Kernel    0.8968  0.7498  0.1848  0.6629   
rf               Random Forest Classifier    0.8937  0.7718  0.2969  0.5540   
lda          Linear Discriminant Analysis    0.8931  0.7736  0.3456  0.5407   
knn                K Neighbors Classifier    0.8902  0.7174  0.2659  0.5264   
dummy                    Dummy Classifier    0.8873  0.5000  0.0000  0.0000   
et                 Extra Trees Classifier    0.8811  0.7465  0.3023  0.4582   
dt               Decision Tree Classifier    0.8395 



Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
gbc,Gradient Boosting Classifier,0.9012,0.802,0.2513,0.6625,0.364,0.322,0.3672,0.97
lightgbm,Light Gradient Boosting Machine,0.9012,0.8017,0.2691,0.6486,0.3801,0.3363,0.3752,0.236
lr,Logistic Regression,0.8999,0.7793,0.219,0.6724,0.3301,0.2908,0.3453,0.11
ada,Ada Boost Classifier,0.8996,0.7942,0.2204,0.6655,0.3304,0.2907,0.3436,0.301
ridge,Ridge Classifier,0.8995,0.7736,0.1883,0.7037,0.2969,0.2617,0.3293,0.021
svm,SVM - Linear Kernel,0.8968,0.7498,0.1848,0.6629,0.2859,0.2489,0.3097,0.116
rf,Random Forest Classifier,0.8937,0.7718,0.2969,0.554,0.3863,0.3339,0.3536,0.804
lda,Linear Discriminant Analysis,0.8931,0.7736,0.3456,0.5407,0.4215,0.3658,0.377,0.034
knn,K Neighbors Classifier,0.8902,0.7174,0.2659,0.5264,0.3526,0.2997,0.321,0.339
dummy,Dummy Classifier,0.8873,0.5,0.0,0.0,0.0,0.0,0.0,0.016


                                                                      

      Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
Fold                                                          
0       0.9015  0.7939  0.2709  0.6505  0.3825  0.3387  0.3774
1       0.9012  0.8090  0.2803  0.6400  0.3899  0.3449  0.3800
2       0.8980  0.8074  0.2695  0.6061  0.3731  0.3264  0.3584
3       0.8992  0.8027  0.2705  0.6223  0.3771  0.3314  0.3657
4       0.9027  0.8114  0.2651  0.6747  0.3807  0.3386  0.3825
Mean    0.9005  0.8049  0.2713  0.6387  0.3807  0.3360  0.3728
Std     0.0017  0.0062  0.0050  0.0235  0.0056  0.0064  0.0092
                             Model  Accuracy    AUC  Recall   Prec.     F1  \
0  Light Gradient Boosting Machine    0.8996  0.797  0.2651  0.6292  0.373   

    Kappa     MCC  
0  0.3281  0.3647  


                                                                      

      Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
Fold                                                          
0       0.8950  0.7654  0.3073  0.5616  0.3972  0.3451  0.3639
1       0.8933  0.7834  0.2898  0.5499  0.3795  0.3272  0.3474
2       0.8930  0.7816  0.3019  0.5450  0.3886  0.3352  0.3528
3       0.8894  0.7653  0.2799  0.5174  0.3633  0.3086  0.3261
4       0.8959  0.7745  0.2826  0.5785  0.3797  0.3302  0.3556
Mean    0.8933  0.7740  0.2923  0.5505  0.3817  0.3292  0.3492
Std     0.0022  0.0077  0.0107  0.0202  0.0113  0.0120  0.0127
                      Model  Accuracy     AUC  Recall   Prec.      F1  Kappa  \
0  Random Forest Classifier      0.89  0.7667  0.2823  0.5219  0.3664  0.312   

      MCC  
0  0.3298  


                                                                      

      Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
Fold                                                          
0       0.8821  0.7313  0.3046  0.4641  0.3678  0.3058  0.3141
1       0.8830  0.7617  0.3086  0.4702  0.3727  0.3112  0.3196
2       0.8816  0.7542  0.3113  0.4620  0.3720  0.3094  0.3167
3       0.8778  0.7414  0.2880  0.4367  0.3471  0.2829  0.2903
4       0.8809  0.7426  0.2948  0.4562  0.3581  0.2958  0.3044
Mean    0.8811  0.7462  0.3015  0.4579  0.3635  0.3010  0.3090
Std     0.0018  0.0106  0.0088  0.0115  0.0097  0.0105  0.0107
                    Model  Accuracy     AUC  Recall   Prec.      F1   Kappa  \
0  Extra Trees Classifier    0.8773  0.7413  0.2931  0.4338  0.3498  0.2849   

      MCC  
0  0.2916  


                                                                      

      Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
Fold                                                          
0       0.9032  0.7876  0.2574  0.6871  0.3745  0.3336  0.3814
1       0.8995  0.8068  0.2493  0.6379  0.3585  0.3152  0.3566
2       0.9012  0.8023  0.2601  0.6542  0.3722  0.3293  0.3709
3       0.8989  0.8045  0.2463  0.6332  0.3547  0.3112  0.3524
4       0.9026  0.8095  0.2396  0.6980  0.3567  0.3174  0.3713
Mean    0.9011  0.8022  0.2505  0.6621  0.3633  0.3213  0.3665
Std     0.0017  0.0076  0.0075  0.0261  0.0083  0.0086  0.0106
                          Model  Accuracy     AUC  Recall   Prec.     F1  \
0  Gradient Boosting Classifier    0.9012  0.7975  0.2554  0.6583  0.368   

    Kappa    MCC  
0  0.3255  0.369  


                                                                      

      Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
Fold                                                          
0       0.8288  0.7439  0.4367  0.3133  0.3649  0.2690  0.2740
1       0.8264  0.7629  0.4677  0.3166  0.3776  0.2810  0.2883
2       0.8229  0.7491  0.4380  0.3023  0.3577  0.2590  0.2650
3       0.8250  0.7525  0.4401  0.3073  0.3619  0.2642  0.2700
4       0.8347  0.7511  0.4293  0.3242  0.3694  0.2765  0.2801
Mean    0.8276  0.7519  0.4424  0.3128  0.3663  0.2700  0.2755
Std     0.0041  0.0062  0.0132  0.0075  0.0068  0.0080  0.0081
         Model  Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
0  Naive Bayes    0.8288  0.7549   0.458  0.3191  0.3761  0.2806  0.2867


Unnamed: 0,Model,Best_F2,Best_Threshold
4,nb,0.388773,0.5107
0,lightgbm,0.38835,0.5005
1,rf,0.388285,0.5
2,et,0.388285,0.5
3,gbc,0.388285,0.5001
