# WORK for P05

## [Work-1]
To handle imbalanced data, install imbalanced-learn by following the instructions on the page below.<br>
https://imbalanced-learn.readthedocs.io/en/stable/install.html
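
Once the package is installed (for example with pip or conda, as the page above describes), a quick check like the following can confirm that Python sees it. This is only a minimal sketch: the helper name `installed_version` is hypothetical and not part of imbalanced-learn; it relies on the standard library alone.

```python
import importlib.util
from importlib import metadata


def installed_version(dist_name, module_name):
    """Return the installed version of a package, or None if it is missing.

    Hypothetical helper for this worksheet, not part of imbalanced-learn.
    """
    # find_spec returns None when the top-level module cannot be located
    if importlib.util.find_spec(module_name) is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# The distribution is named "imbalanced-learn"; the importable module is "imblearn"
version = installed_version("imbalanced-learn", "imblearn")
if version is None:
    print("imbalanced-learn not found; install it first")
else:
    print("imbalanced-learn", version)
```

If the check reports that the package is missing, the cells in Work-2 that import from `imblearn` will fail with an import error.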

## [Work-2]
Confirm that SMOTE can be built into a Pipeline once the Pipeline is switched from sklearn's to imbalanced-learn's.<br>
https://imbalanced-learn.readthedocs.io/en/stable/generated/imblearn.pipeline.Pipeline.html?highlight=pipeline
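
To make clear what SMOTE adds to the training data, the idea behind it can be sketched in plain Python: a synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. The function `smote_sketch` and the toy points below are assumptions made for illustration; this is not imbalanced-learn's implementation, which should be used for real work.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Illustrative only; the name and signature are hypothetical.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the point itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + u * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class with two features (made-up values)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6, k=2)
print(len(new_points))
```

Because the number of samples changes, a step like this has to return both features and labels, which is why it needs a pipeline that supports resampling steps rather than plain transformers.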

In [None]:
import pandas as pd
from sklearn.impute import SimpleImputer

# import data
df = pd.read_csv('./data/av_loan_u6lujuX_CVtuZ9i.csv', header=0)
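X = df.iloc[:,:-1]           # the last column is the loan decision, so all columns before it are read as features X
X = X.drop('Loan_ID',axis=1) # the first column, Loan_ID, is only an ID for the loan application, so drop it from the feature vector
y = df.iloc[:,-1]            # read the last column as the target labels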

# Map samples whose loan application was rejected (N) to 1
class_mapping = {'N':1, 'Y':0}
y = y.map(class_mapping)

# one-hot encoding
ohe_columns = ['Dependents','Gender','Married','Education','Self_Employed','Property_Area']
X_ohe = pd.get_dummies(X, dummy_na=True, columns=ohe_columns)

# Impute missing values
imp = SimpleImputer()
imp.fit(X_ohe)
X_ohe_columns = X_ohe.columns.values
X_ohe = pd.DataFrame(imp.transform(X_ohe), columns=X_ohe_columns)

# Show results
print('X_new_shape:(%i,%i)' % X_ohe.shape)
print(y.value_counts())

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from imblearn.pipeline import Pipeline   # note: Pipeline is now imported from imblearn instead of sklearn
from imblearn.over_sampling import SMOTE

# holdout
X_train, X_test, y_train, y_test = train_test_split(X_ohe, 
                                                    y,
                                                    test_size=0.3)
# set pipeline
pipe_gb = Pipeline([('sm', SMOTE(random_state=0)),
                    ('scl',StandardScaler()),
                    ('est',GradientBoostingClassifier(random_state=1))])
# Modeling
pipe_gb.fit(X_train, y_train)

# Evaluation
print('F1 @train', f1_score(y_train, pipe_gb.predict(X_train)))
print('F1 @test',  f1_score(y_test, pipe_gb.predict(X_test)))
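
Because rejected loans were mapped to 1 above, `f1_score` with its default settings reports the F1 of the rejected class. As a rough illustration of what that number means, the sketch below computes binary F1 by hand from true positives, false positives, and false negatives. The function `f1_binary` and the toy labels are hypothetical and are not taken from the loan data.

```python
def f1_binary(y_true, y_pred, positive=1):
    """Binary F1 for the given positive label, computed from raw counts.

    Hypothetical helper for illustration; use sklearn.metrics.f1_score in practice.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positive predictions, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (made up): 1 = rejected, 0 = approved
y_true_toy = [1, 0, 1, 1, 0, 0]
y_pred_toy = [1, 0, 0, 1, 1, 0]
f1_toy = f1_binary(y_true_toy, y_pred_toy)
print(round(f1_toy, 3))
```

Here there are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3 and F1 is 2/3 as well.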