### Models to Apply (8 in total)
- Logistic Regression
- LDA
- QDA
- KNN
- Decision Tree
- Random Forest
- XGBoost
- LightGBM

### Oversampling Methods to Apply (4 in total)
- Original data without oversampling
- SMOTE
- ADASYN
- Distribution-SMOTE

### ▶ Print evaluation metrics for all 8*4=32 oversampling + modeling combinations
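
The 32 cases are simply the cross product of the two lists above. A minimal sketch that enumerates them follows; the short labels mirror the ones used later in the notebook, and `"None"` is an assumed label for the original, non-oversampled data.

```python
from itertools import product

# Short labels for the 8 models and the 4 data variants.
models = ["LR", "LDA", "QDA", "KNN", "DT", "RF", "XGB", "Light_GBM"]
samplers = ["None", "SMOTE", "ADASYN", "Distribution-SMOTE"]

# Every (oversampling method, model) pair that gets evaluated.
cases = list(product(samplers, models))
print(len(cases))
print(cases[0])
```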

---

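## (2) Modeling Contents
1. **Define the evaluation function:** create a function that prints the evaluation metrics
    

2. **Apply the models for each oversampling method:** print the results for each of the 32 cases<br>
    **2.(1) No oversampling**<br>
    2.(1)-1. Data preparation: load the data produced by the given oversampling method and prepare it for model fitting<br>
    2.(1)-2. Model fitting: fit the 8 models to the data from the given oversampling method<br>
    2.(1)-3. Performance evaluation: for each fitted model, compute the confusion matrix, accuracy, precision, recall, F1-score, AUC, and geometric mean<br>
    
    **2.(2) SMOTE**<br>
    2.(2)-1. Data preparation<br>
    2.(2)-2. Model fitting<br>
    2.(2)-3. Performance evaluation<br>
    
    **2.(3) ADASYN**<br>
    2.(3)-1. Data preparation<br>
    2.(3)-2. Model fitting<br>
    2.(3)-3. Performance evaluation<br>
    
    **2.(4) Distribution-SMOTE**<br>
    2.(4)-1. Data preparation<br>
    2.(4)-2. Model fitting<br>
    2.(4)-3. Performance evaluation<br>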


---

In [2]:
import warnings
warnings.filterwarnings(action='ignore')

## 1. Defining the Evaluation Function

In [3]:
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.metrics import roc_curve
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
from imblearn.metrics import geometric_mean_score

def get_clf_eval(y_test, y_pred):
    # Print the confusion matrix and summary metrics for binary predictions.
    confmat = pd.DataFrame(confusion_matrix(y_test, y_pred),
                           index=['True[0]', 'True[1]'],
                           columns=['Predict[0]', 'Predict[1]'])
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    # Note: AUC is computed from hard 0/1 predictions, not predicted probabilities.
    AUC = roc_auc_score(y_test, y_pred)
    g_means = geometric_mean_score(y_test, y_pred)
    print(confmat)
    print("\nAccuracy : {:.3f} \nPrecision : {:.3f} \nRecall : {:.3f} \nF1-score : {:.3f} \nAUC : {:.3f} \nG-mean : {:.3f} \n".format(
        accuracy, precision, recall, f1, AUC, g_means))
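
Because `get_clf_eval` passes hard 0/1 predictions to `roc_auc_score`, the reported AUC says nothing about how well a model ranks cases by risk. The sketch below illustrates the difference with a small rank-based AUC written in plain Python; `auc_from_scores` and the toy scores are hypothetical and are not part of the notebook. With scikit-learn, the usual way to get a ranking-based AUC is to pass `model.predict_proba(X_test)[:, 1]` as the second argument.

```python
def auc_from_scores(y_true, scores):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 0, 1, 1]
proba = [0.10, 0.20, 0.45, 0.40, 0.30]   # made-up scores, all below 0.5
hard = [1 if p >= 0.5 else 0 for p in proba]  # every case is predicted as class 0

auc_hard = auc_from_scores(y_true, hard)   # all pairs tie
auc_soft = auc_from_scores(y_true, proba)  # uses the ranking of the scores
print(auc_hard, round(auc_soft, 3))
```

In this toy case the hard-label AUC is 0.5 even though the scores rank four of the six positive/negative pairs correctly.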

## 2. Applying the Models for Each Oversampling Method

## 2.(1) No Oversampling

### 2.(1)-1. Data Preparation

In [13]:
import pandas as pd
df = pd.read_csv("Loan_data.csv")

In [14]:
df

Unnamed: 0,Id,Income,Age,Experience,Married/Single,House_Ownership,Car_Ownership,Profession,CITY,STATE,CURRENT_JOB_YRS,CURRENT_HOUSE_YRS,Risk_Flag
0,1,1303834,23,3,single,rented,no,Mechanical_engineer,Rewa,Madhya_Pradesh,3,13,0
1,2,7574516,40,10,single,rented,no,Software_Developer,Parbhani,Maharashtra,9,13,0
2,3,3991815,66,4,married,rented,no,Technical_writer,Alappuzha,Kerala,4,10,0
3,4,6256451,41,2,single,rented,yes,Software_Developer,Bhubaneswar,Odisha,2,12,1
4,5,5768871,47,11,single,rented,no,Civil_servant,Tiruchirappalli[10],Tamil_Nadu,3,14,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...
251995,251996,8154883,43,13,single,rented,no,Surgeon,Kolkata,West_Bengal,6,11,0
251996,251997,2843572,26,10,single,rented,no,Army_officer,Rewa,Madhya_Pradesh,6,11,0
251997,251998,4522448,46,7,single,rented,no,Design_Engineer,Kalyan-Dombivli,Maharashtra,7,12,0
251998,251999,6507128,45,0,single,rented,no,Graphic_Designer,Pondicherry,Puducherry,0,10,0


In [15]:
# Split into features and target
X = df.drop(['Id','Risk_Flag'], axis=1)
y = df.Risk_Flag

In [16]:
# Label-encode the categorical variables

from sklearn.preprocessing import LabelEncoder

en = LabelEncoder()
category_cols = ['Married/Single','House_Ownership','Car_Ownership', 'Profession', 'CITY', 'STATE']
for cols in category_cols:
    X[cols] = en.fit_transform(X[cols])
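
As a rough illustration of what the loop above does to each column, here is a plain-Python sketch of label encoding. `label_encode` is a hypothetical helper, not the scikit-learn implementation, but it follows the same convention of numbering the distinct values in sorted order.

```python
def label_encode(values):
    # Map each distinct value to its position in sorted order,
    # which mirrors how LabelEncoder orders its classes.
    classes = sorted(set(values))
    mapping = {c: i for i, c in enumerate(classes)}
    return [mapping[v] for v in values], classes

codes, classes = label_encode(["single", "single", "married", "single"])
print(classes)
print(codes)
```

The resulting integers carry an arbitrary order (for example, city codes sorted alphabetically), which distance-based and linear models such as KNN and logistic regression will treat as numeric magnitude. One-hot encoding is a common alternative when that is a concern.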

In [17]:
# Split into train and test sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 101)

### 2.(1)-2. Model Fitting

In [30]:
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
import xgboost as xgb
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
import time

models_X = []
models_X.append(('LR', LogisticRegression()))  # logistic regression
models_X.append(('LDA', LinearDiscriminantAnalysis()))  # LDA
models_X.append(('QDA', QuadraticDiscriminantAnalysis()))  # QDA
models_X.append(('KNN', KNeighborsClassifier()))  # KNN
models_X.append(('DT', DecisionTreeClassifier()))  # decision tree
models_X.append(('RF', RandomForestClassifier()))  # random forest
models_X.append(('XGB', XGBClassifier()))  # XGBoost
models_X.append(('Light_GBM', LGBMClassifier(boost_from_average=False)))  # LightGBM

for name, model in models_X:
    start = time.time()
    model.fit(X_train, y_train)
    end = time.time() - start  # elapsed fitting time in seconds
    msg = "%s - train_score : %.3f, test score : %.3f, time : %.5f s" % (name, model.score(X_train, y_train), model.score(X_test, y_test), end)
    print(msg)

LR - train_score : 0.877, test score : 0.877, time : 0.97798 s
LDA - train_score : 0.877, test score : 0.877, time : 0.65543 s
QDA - train_score : 0.877, test score : 0.877, time : 0.23437 s
KNN - train_score : 0.901, test score : 0.889, time : 2.71192 s
DT - train_score : 0.937, test score : 0.879, time : 2.28047 s
RF - train_score : 0.937, test score : 0.899, time : 50.35796 s
XGB - train_score : 0.894, test score : 0.887, time : 16.81503 s
Light_GBM - train_score : 0.880, test score : 0.878, time : 1.83809 s
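
The 0.877 test score shared by LR, LDA, and QDA is worth reading against a trivial baseline. The short check below uses the test-set class counts that appear in the confusion matrices of section 2.(1)-3 and computes the accuracy of a classifier that always predicts class 0.

```python
# Test-set class counts, read off the confusion matrices in section 2.(1)-3.
n_class0 = 66292
n_class1 = 9308

# Accuracy of a classifier that always predicts class 0.
baseline_accuracy = n_class0 / (n_class0 + n_class1)
print(round(baseline_accuracy, 3))
```

The baseline rounds to 0.877, so on this data an accuracy near that value does not by itself show that a model has learned anything about the minority class.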


### 2.(1)-3. Performance Evaluation

In [31]:
# Evaluate every fitted model on the test set
for name, model in models_X:
    print("----------No oversampling + %s model----------" % name)
    get_clf_eval(y_test, model.predict(X_test))

----------No oversampling + LR model----------
         Predict[0]  Predict[1]
True[0]       66292           0
True[1]        9308           0

Accuracy : 0.877
Precision : 0.000
Recall : 0.000
F1-score : 0.000
AUC : 0.500
G-mean : 0.000

----------No oversampling + LDA model----------
         Predict[0]  Predict[1]
True[0]       66292           0
True[1]        9308           0

Accuracy : 0.877
Precision : 0.000
Recall : 0.000
F1-score : 0.000
AUC : 0.500
G-mean : 0.000

----------No oversampling + QDA model----------
         Predict[0]  Predict[1]
True[0]       66292           0
True[1]        9308           0

Accuracy : 0.877
Precision : 0.000
Recall : 0.000
F1-score : 0.000
AUC : 0.500
G-mean : 0.000

----------No oversampling + KNN model----------
         Predict[0]  Predict[1]
True[0]       62474        3818
True[1]        4573        4735

Accuracy : 0.889
Precision : 0.554
Recall : 0.509
F1-score : 0.530
AUC : 0.726
G-mean : 0.692

----------No oversampling + DT model----------
         Predict[0]
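
To make the printed metrics easier to interpret, the following worked example recomputes them by hand from the KNN confusion matrix reported above. It is a sketch in plain Python rather than the scikit-learn or imbalanced-learn code path; the variable names are illustrative.

```python
import math

# KNN confusion matrix on the non-oversampled data, as reported above.
tn, fp = 62474, 3818
fn, tp = 4573, 4735

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)          # true positive rate
specificity = tn / (tn + fp)     # true negative rate
f1 = 2 * precision * recall / (precision + recall)
g_mean = math.sqrt(recall * specificity)
# With hard 0/1 predictions the ROC curve has a single interior point,
# so its area reduces to the mean of recall and specificity.
auc_hard = (recall + specificity) / 2

for name, value in [("Accuracy", accuracy), ("Precision", precision),
                    ("Recall", recall), ("F1-score", f1),
                    ("AUC", auc_hard), ("G-mean", g_mean)]:
    print(f"{name} : {value:.3f}")
```

The results match the printed KNN values (0.889, 0.554, 0.509, 0.530, 0.726, 0.692). The same formulas explain why a model that never predicts class 1 gets a recall and G-mean of 0 and an AUC of exactly 0.5.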

## 2.(2) SMOTE

### 2.(2)-1. Data Preparation

In [25]:
X_s = pd.read_csv("x_smote.csv")
y_s = pd.read_csv("y_smote.csv")

In [26]:
X_s.drop(['Unnamed: 0'], axis=1, inplace=True)
X_s

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10
0,8196231,37,14,1,2,1,20,63,1,7,13
1,2869872,56,19,1,2,0,19,86,1,11,12
2,7361429,58,2,1,2,0,50,292,12,2,12
3,5921974,25,3,0,2,0,50,194,6,3,11
4,1583605,45,13,1,2,0,5,249,13,13,14
...,...,...,...,...,...,...,...,...,...,...,...
309419,8928667,38,15,1,2,0,9,256,2,3,10
309420,7761009,38,3,1,2,0,33,164,4,3,12
309421,4002405,68,6,1,2,0,31,65,13,3,13
309422,8517978,58,7,1,2,0,8,160,12,4,10
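
The CSV files loaded here were oversampled beforehand, and the code that produced them is not shown in this section. For orientation only, the core step of SMOTE is to place a synthetic point at a random position on the line segment between a minority sample and one of its minority-class nearest neighbours. A minimal sketch of that interpolation follows; `smote_like_sample` and the two example points are hypothetical, and neighbour search is left out.

```python
import random

def smote_like_sample(x, neighbor, rng):
    # Pick a random point on the segment between x and one of its
    # minority-class neighbours: x + lam * (neighbor - x), lam in [0, 1).
    lam = rng.random()
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]

rng = random.Random(0)
x = [30.0, 3.0]          # hypothetical minority sample (e.g. age, experience)
neighbor = [40.0, 7.0]   # hypothetical nearest minority neighbour
synthetic = smote_like_sample(x, neighbor, rng)
print(synthetic)
```

In practice, `imblearn.over_sampling.SMOTE` handles the neighbour search and sampling.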


In [27]:
y_s.drop(['Unnamed: 0'], axis=1, inplace=True)
y_s

Unnamed: 0,0
0,0
1,0
2,1
3,0
4,0
...,...
309419,1
309420,1
309421,1
309422,1


### 2.(2)-2. Model Fitting

In [43]:
models_s = []
models_s.append(('LR', LogisticRegression()))  # logistic regression
models_s.append(('LDA', LinearDiscriminantAnalysis()))  # LDA
models_s.append(('QDA', QuadraticDiscriminantAnalysis()))  # QDA
models_s.append(('KNN', KNeighborsClassifier()))  # KNN
models_s.append(('DT', DecisionTreeClassifier()))  # decision tree
models_s.append(('RF', RandomForestClassifier()))  # random forest
models_s.append(('XGB', XGBClassifier()))  # XGBoost
models_s.append(('Light_GBM', LGBMClassifier(boost_from_average=False)))  # LightGBM

# Fit on the SMOTE-oversampled data; score on the original test split from 2.(1)-1
for name, model in models_s:
    start = time.time()
    model.fit(X_s, y_s)
    end = time.time() - start  # elapsed fitting time in seconds
    msg = "%s - train_score : %.3f, test score : %.3f, time : %.5f s" % (name, model.score(X_s, y_s), model.score(X_test, y_test), end)
    print(msg)

LR - train_score : 0.500, test score : 0.877, time : 0.88835 s
LDA - train_score : 0.551, test score : 0.514, time : 1.07414 s
QDA - train_score : 0.558, test score : 0.470, time : 0.39894 s
KNN - train_score : 0.913, test score : 0.860, time : 4.93093 s
DT - train_score : 0.958, test score : 0.862, time : 3.91880 s
RF - train_score : 0.958, test score : 0.873, time : 98.57001 s
XGB - train_score : 0.889, test score : 0.843, time : 32.76039 s
Light_GBM - train_score : 0.804, test score : 0.747, time : 3.32928 s
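
The fit-and-time loop is repeated almost verbatim for each oversampling method. One way to reduce the duplication, offered as a sketch rather than as the notebook's approach, is a small helper that takes the model list and the data as arguments. `fit_and_report` and `MajorityStub` are hypothetical names; the stub only stands in for a scikit-learn estimator so the example runs without any dependencies.

```python
import time

def fit_and_report(models, X_fit, y_fit, X_eval, y_eval):
    """Fit each (name, model) pair and return (name, train score, test score, seconds)."""
    rows = []
    for name, model in models:
        start = time.perf_counter()
        model.fit(X_fit, y_fit)
        elapsed = time.perf_counter() - start
        rows.append((name, model.score(X_fit, y_fit), model.score(X_eval, y_eval), elapsed))
    return rows

class MajorityStub:
    """Hypothetical stand-in for a classifier: always predicts the majority class."""
    def fit(self, X, y):
        self.label_ = max(set(y), key=list(y).count)
        return self

    def score(self, X, y):
        return sum(1 for v in y if v == self.label_) / len(y)

rows = fit_and_report([("stub", MajorityStub())],
                      [[0], [1], [2]], [0, 0, 1],   # toy training data
                      [[3], [4]], [0, 1])           # toy evaluation data
for name, train_score, test_score, seconds in rows:
    print("%s - train_score : %.3f, test score : %.3f, time : %.5f s"
          % (name, train_score, test_score, seconds))
```

With the real data, a call such as `fit_and_report(models_s, X_s, y_s, X_test, y_test)` would replace the loop above. `time.perf_counter()` is used here because it is intended for measuring short durations.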


### 2.(2)-3. Performance Evaluation

In [44]:
# Evaluate every fitted model on the test set
for name, model in models_s:
    print("----------SMOTE + %s model----------" % name)
    get_clf_eval(y_test, model.predict(X_test))

----------SMOTE + LR model----------
         Predict[0]  Predict[1]
True[0]       66292           0
True[1]        9308           0

Accuracy : 0.877
Precision : 0.000
Recall : 0.000
F1-score : 0.000
AUC : 0.500
G-mean : 0.000

----------SMOTE + LDA model----------
         Predict[0]  Predict[1]
True[0]       33796       32496
True[1]        4259        5049

Accuracy : 0.514
Precision : 0.134
Recall : 0.542
F1-score : 0.216
AUC : 0.526
G-mean : 0.526

----------SMOTE + QDA model----------
         Predict[0]  Predict[1]
True[0]       29605       36687
True[1]        3409        5899

Accuracy : 0.470
Precision : 0.139
Recall : 0.634
F1-score : 0.227
AUC : 0.540
G-mean : 0.532

----------SMOTE + KNN model----------
         Predict[0]  Predict[1]
True[0]       56966        9326
True[1]        1294        8014

Accuracy : 0.860
Precision : 0.462
Recall : 0.861
F1-score : 0.601
AUC : 0.860
G-mean : 0.860

----------SMOTE + DT model----------
         Predict[0]  Predict[1]
True[0]       57156        9136
True[1]        1315 

## 2.(3) ADASYN

### 2.(3)-1. Data Preparation

In [32]:
X_a = pd.read_csv("x_adasyn.csv")
y_a = pd.read_csv("y_adasyn.csv")

In [33]:
X_a

Unnamed: 0.1,Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10
0,0,8196231,37,14,1,2,1,20,63,1,7,13
1,1,2869872,56,19,1,2,0,19,86,1,11,12
2,2,7361429,58,2,1,2,0,50,292,12,2,12
3,3,5921974,25,3,0,2,0,50,194,6,3,11
4,4,1583605,45,13,1,2,0,5,249,13,13,14
...,...,...,...,...,...,...,...,...,...,...,...,...
307024,307024,9768782,61,10,1,2,1,46,113,2,5,11
307025,307025,9768782,61,10,1,2,1,46,113,2,5,11
307026,307026,9768782,61,10,1,2,1,46,113,2,5,11
307027,307027,9768782,61,10,1,2,1,46,113,2,5,11


In [34]:
X_a.drop(['Unnamed: 0'], axis=1, inplace=True)
X_a

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10
0,8196231,37,14,1,2,1,20,63,1,7,13
1,2869872,56,19,1,2,0,19,86,1,11,12
2,7361429,58,2,1,2,0,50,292,12,2,12
3,5921974,25,3,0,2,0,50,194,6,3,11
4,1583605,45,13,1,2,0,5,249,13,13,14
...,...,...,...,...,...,...,...,...,...,...,...
307024,9768782,61,10,1,2,1,46,113,2,5,11
307025,9768782,61,10,1,2,1,46,113,2,5,11
307026,9768782,61,10,1,2,1,46,113,2,5,11
307027,9768782,61,10,1,2,1,46,113,2,5,11


In [35]:
y_a.drop(['Unnamed: 0'], axis=1, inplace=True)
y_a

Unnamed: 0,0
0,0
1,0
2,1
3,0
4,0
...,...
307024,1
307025,1
307026,1
307027,1


### 2.(3)-2. Model Fitting

In [45]:
models_a = []
models_a.append(('LR', LogisticRegression()))  # logistic regression
models_a.append(('LDA', LinearDiscriminantAnalysis()))  # LDA
models_a.append(('QDA', QuadraticDiscriminantAnalysis()))  # QDA
models_a.append(('KNN', KNeighborsClassifier()))  # KNN
models_a.append(('DT', DecisionTreeClassifier()))  # decision tree
models_a.append(('RF', RandomForestClassifier()))  # random forest
models_a.append(('XGB', XGBClassifier()))  # XGBoost
models_a.append(('Light_GBM', LGBMClassifier(boost_from_average=False)))  # LightGBM

# Fit on the ADASYN-oversampled data; score on the original test split from 2.(1)-1
for name, model in models_a:
    start = time.time()
    model.fit(X_a, y_a)
    end = time.time() - start  # elapsed fitting time in seconds
    msg = "%s - train_score : %.3f, test score : %.3f, time : %.5f s" % (name, model.score(X_a, y_a), model.score(X_test, y_test), end)
    print(msg)

LR - train_score : 0.504, test score : 0.877, time : 0.59398 s
LDA - train_score : 0.546, test score : 0.540, time : 1.02734 s
QDA - train_score : 0.572, test score : 0.591, time : 0.41605 s
KNN - train_score : 0.899, test score : 0.848, time : 4.69969 s
DT - train_score : 0.957, test score : 0.860, time : 3.79192 s
RF - train_score : 0.957, test score : 0.866, time : 108.34988 s
XGB - train_score : 0.878, test score : 0.835, time : 32.26624 s
Light_GBM - train_score : 0.800, test score : 0.758, time : 3.60067 s
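
LR scores about 0.50 on the oversampled training data and 0.877 on the test set, which is the pattern a constant class-0 prediction would produce. One possible cause, stated here as an assumption that has not been verified on this data, is that the features sit on very different scales (income in the millions next to single-digit codes), which can keep the default solver from converging to a useful fit. A plain-Python sketch of z-score standardization is shown below, using the first five income values from the table above; in scikit-learn the equivalent tool is `StandardScaler`.

```python
from statistics import mean, pstdev

def standardize(column):
    # Rescale a numeric column to zero mean and unit (population) standard deviation.
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]

income = [8196231, 2869872, 7361429, 5921974, 1583605]  # first five rows shown above
scaled = standardize(income)
print([round(v, 3) for v in scaled])
```

Whether scaling actually changes the LR results here would need to be checked by rerunning the fit. The scaler should be fitted on the training data only and then applied to the test set.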


### 2.(3)-3. Performance Evaluation

In [46]:
# Evaluate every fitted model on the test set
for name, model in models_a:
    print("----------ADASYN + %s model----------" % name)
    get_clf_eval(y_test, model.predict(X_test))

----------ADASYN + LR model----------
         Predict[0]  Predict[1]
True[0]       66292           0
True[1]        9308           0

Accuracy : 0.877
Precision : 0.000
Recall : 0.000
F1-score : 0.000
AUC : 0.500
G-mean : 0.000

----------ADASYN + LDA model----------
         Predict[0]  Predict[1]
True[0]       36424       29868
True[1]        4918        4390

Accuracy : 0.540
Precision : 0.128
Recall : 0.472
F1-score : 0.202
AUC : 0.511
G-mean : 0.509

----------ADASYN + QDA model----------
         Predict[0]  Predict[1]
True[0]       41017       25275
True[1]        5677        3631

Accuracy : 0.591
Precision : 0.126
Recall : 0.390
F1-score : 0.190
AUC : 0.504
G-mean : 0.491

----------ADASYN + KNN model----------
         Predict[0]  Predict[1]
True[0]       56171       10121
True[1]        1341        7967

Accuracy : 0.848
Precision : 0.440
Recall : 0.856
F1-score : 0.582
AUC : 0.852
G-mean : 0.852

----------ADASYN + DT model----------
         Predict[0]  Predict[1]
True[0]       57035        9257
True[1]        

## 2.(4) Distribution-SMOTE

### 2.(4)-1. Data Preparation

In [47]:
X_d = pd.read_csv("X_smova.csv")
y_d = pd.read_csv("y_smova.csv")

In [48]:
X_d

Unnamed: 0.1,Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10
0,0,8.196231e+06,37.000000,14.000000,1.000000,2.000000,1.000000,20.000000,63.000000,1.000000,7.000000,13.000000
1,1,2.869872e+06,56.000000,19.000000,1.000000,2.000000,0.000000,19.000000,86.000000,1.000000,11.000000,12.000000
2,2,7.361429e+06,58.000000,2.000000,1.000000,2.000000,0.000000,50.000000,292.000000,12.000000,2.000000,12.000000
3,3,5.921974e+06,25.000000,3.000000,0.000000,2.000000,0.000000,50.000000,194.000000,6.000000,3.000000,11.000000
4,4,1.583605e+06,45.000000,13.000000,1.000000,2.000000,0.000000,5.000000,249.000000,13.000000,13.000000,14.000000
...,...,...,...,...,...,...,...,...,...,...,...,...
309419,309419,7.826657e+06,30.000000,3.000000,1.000000,2.000000,0.000000,16.000000,66.000000,2.000000,3.000000,10.000000
309420,309420,8.981043e+06,57.000000,8.000000,1.000000,2.000000,1.000000,47.000000,201.000000,11.000000,8.000000,12.000000
309421,309421,6.299623e+05,48.145033,15.698746,0.433751,2.000000,0.000000,24.963742,173.990159,19.794986,12.132498,13.433751
309422,309422,2.117133e+06,67.555042,4.304676,0.217626,0.435252,0.174101,39.730221,300.983823,10.569062,3.348201,10.696402


In [49]:
X_d.drop(['Unnamed: 0'], axis=1, inplace=True)
X_d

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10
0,8.196231e+06,37.000000,14.000000,1.000000,2.000000,1.000000,20.000000,63.000000,1.000000,7.000000,13.000000
1,2.869872e+06,56.000000,19.000000,1.000000,2.000000,0.000000,19.000000,86.000000,1.000000,11.000000,12.000000
2,7.361429e+06,58.000000,2.000000,1.000000,2.000000,0.000000,50.000000,292.000000,12.000000,2.000000,12.000000
3,5.921974e+06,25.000000,3.000000,0.000000,2.000000,0.000000,50.000000,194.000000,6.000000,3.000000,11.000000
4,1.583605e+06,45.000000,13.000000,1.000000,2.000000,0.000000,5.000000,249.000000,13.000000,13.000000,14.000000
...,...,...,...,...,...,...,...,...,...,...,...
309419,7.826657e+06,30.000000,3.000000,1.000000,2.000000,0.000000,16.000000,66.000000,2.000000,3.000000,10.000000
309420,8.981043e+06,57.000000,8.000000,1.000000,2.000000,1.000000,47.000000,201.000000,11.000000,8.000000,12.000000
309421,6.299623e+05,48.145033,15.698746,0.433751,2.000000,0.000000,24.963742,173.990159,19.794986,12.132498,13.433751
309422,2.117133e+06,67.555042,4.304676,0.217626,0.435252,0.174101,39.730221,300.983823,10.569062,3.348201,10.696402
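
The last two rows contain fractional values in columns that hold label-encoded categories (for example 0.433751 in column 3), which suggests the synthetic rows were generated on a continuous scale. Such values do not correspond to any category. If integer codes are wanted, one option is to round those columns, as sketched below. This is an assumption, not part of the original workflow: the column positions 3 to 8 are inferred from the order of `category_cols` in section 2.(1)-1, and `round_categorical` is a hypothetical helper.

```python
CATEGORICAL_IDX = [3, 4, 5, 6, 7, 8]  # assumed positions of the label-encoded columns

def round_categorical(row, categorical_idx=CATEGORICAL_IDX):
    # Round only the encoded-category positions to the nearest integer code.
    return [int(round(v)) if i in categorical_idx else v for i, v in enumerate(row)]

# Last row of the table above.
row = [2.117133e+06, 67.555042, 4.304676, 0.217626, 0.435252, 0.174101,
       39.730221, 300.983823, 10.569062, 3.348201, 10.696402]
fixed = round_categorical(row)
print(fixed[3:9])
```

Rounding restores valid-looking codes, but for nominal variables such as city or profession the rounded value is still arbitrary, since the numeric order of the codes carries no meaning.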


In [50]:
y_d.drop(['Unnamed: 0'], axis=1, inplace=True)
y_d

Unnamed: 0,0
0,0
1,0
2,1
3,0
4,0
...,...
309419,1
309420,1
309421,1
309422,1


### 2.(4)-2. Model Fitting

In [51]:
models_d = []
models_d.append(('LR', LogisticRegression()))  # logistic regression
models_d.append(('LDA', LinearDiscriminantAnalysis()))  # LDA
models_d.append(('QDA', QuadraticDiscriminantAnalysis()))  # QDA
models_d.append(('KNN', KNeighborsClassifier()))  # KNN
models_d.append(('DT', DecisionTreeClassifier()))  # decision tree
models_d.append(('RF', RandomForestClassifier()))  # random forest
models_d.append(('XGB', XGBClassifier()))  # XGBoost
models_d.append(('Light_GBM', LGBMClassifier(boost_from_average=False)))  # LightGBM

# Fit on the Distribution-SMOTE data; score on the original test split from 2.(1)-1
for name, model in models_d:
    start = time.time()
    model.fit(X_d, y_d)
    end = time.time() - start  # elapsed fitting time in seconds
    msg = "%s - train_score : %.3f, test score : %.3f, time : %.5f s" % (name, model.score(X_d, y_d), model.score(X_test, y_test), end)
    print(msg)

LR - train_score : 0.500, test score : 0.877, time : 0.88376 s
LDA - train_score : 0.536, test score : 0.514, time : 1.46114 s
QDA - train_score : 0.559, test score : 0.401, time : 0.44552 s
KNN - train_score : 0.920, test score : 0.861, time : 4.46038 s
DT - train_score : 0.958, test score : 0.875, time : 3.63129 s
RF - train_score : 0.958, test score : 0.897, time : 105.32473 s
XGB - train_score : 0.923, test score : 0.880, time : 40.34094 s
Light_GBM - train_score : 0.857, test score : 0.854, time : 4.15872 s


### 2.(4)-3. Performance Evaluation

In [52]:
# Evaluate every fitted model on the test set
for name, model in models_d:
    print("----------Distribution-SMOTE + %s model----------" % name)
    get_clf_eval(y_test, model.predict(X_test))

----------Distribution-SMOTE + LR model----------
         Predict[0]  Predict[1]
True[0]       66292           0
True[1]        9308           0

Accuracy : 0.877
Precision : 0.000
Recall : 0.000
F1-score : 0.000
AUC : 0.500
G-mean : 0.000

----------Distribution-SMOTE + LDA model----------
         Predict[0]  Predict[1]
True[0]       33566       32726
True[1]        3980        5328

Accuracy : 0.514
Precision : 0.140
Recall : 0.572
F1-score : 0.225
AUC : 0.539
G-mean : 0.538

----------Distribution-SMOTE + QDA model----------
         Predict[0]  Predict[1]
True[0]       23562       42730
True[1]        2538        6770

Accuracy : 0.401
Precision : 0.137
Recall : 0.727
F1-score : 0.230
AUC : 0.541
G-mean : 0.508

----------Distribution-SMOTE + KNN model----------
         Predict[0]  Predict[1]
True[0]       57170        9122
True[1]        1403        7905

Accuracy : 0.861
Precision : 0.464
Recall : 0.849
F1-score : 0.600
AUC : 0.856
G-mean : 0.856

----------Distribution-SMOTE + DT model----------
         Predict[0]  Predict[1]
True[0]       58539        7753
True[1]        
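
The per-case printouts are hard to compare side by side. As a sketch of how the results could be gathered into one ranking, the snippet below collects the KNN test-set metrics reported in the four evaluation sections and sorts them by G-mean. Only KNN is included because it is the one non-degenerate model whose output is complete for every method in this section; the dictionary layout is an illustrative choice.

```python
# KNN test-set metrics copied from the printouts in sections 2.(1)-3 to 2.(4)-3.
knn_results = [
    {"method": "None", "precision": 0.554, "recall": 0.509, "f1": 0.530, "g_mean": 0.692},
    {"method": "SMOTE", "precision": 0.462, "recall": 0.861, "f1": 0.601, "g_mean": 0.860},
    {"method": "ADASYN", "precision": 0.440, "recall": 0.856, "f1": 0.582, "g_mean": 0.852},
    {"method": "Distribution-SMOTE", "precision": 0.464, "recall": 0.849, "f1": 0.600, "g_mean": 0.856},
]

# Rank the oversampling methods by G-mean, highest first.
ranked = sorted(knn_results, key=lambda r: r["g_mean"], reverse=True)
for r in ranked:
    print("%-20s recall=%.3f  precision=%.3f  f1=%.3f  g_mean=%.3f"
          % (r["method"], r["recall"], r["precision"], r["f1"], r["g_mean"]))
```

For KNN, all three oversampling methods raise recall from about 0.51 to about 0.85 at the cost of lower precision, and their G-means differ by less than 0.01.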