<a href="https://colab.research.google.com/github/hikaru122700/kaggle-pub/blob/GCI-%E3%82%B3%E3%83%B3%E3%83%9A%EF%BC%92-Home-Credit/006.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Home Credit Default Risk
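This notebook walks through the workflow from loading the data to producing a prediction file, and introduces the basic techniques used along the way.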


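First, let's review the task (see README.ipynb for details).
- **Objective**: Predict, from customer data, the probability that a customer will default on a loan.
- **Evaluation metric**: AUC (Area Under the ROC Curve)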
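AUC can be read as the probability that a randomly chosen defaulter (TARGET = 1) receives a higher predicted score than a randomly chosen non-defaulter, with ties counting as one half. Only the ranking of the scores matters, not their calibration. A minimal sketch on made-up toy labels, comparing that pairwise definition (the helper `pairwise_auc` is hypothetical, written only for illustration) with scikit-learn's `roc_auc_score`:

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def pairwise_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every positive score with every negative score via broadcasting
    diff = pos[:, None] - neg[None, :]
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return correct / (len(pos) * len(neg))


# Toy labels and predicted default probabilities (illustrative values only)
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

print(pairwise_auc(y_true, y_score))
print(roc_auc_score(y_true, y_score))
```

Here 8 of the 9 (defaulter, non-defaulter) pairs are ranked correctly, so both calls should give 8/9 ≈ 0.889. The pairwise version builds an n_pos × n_neg matrix, so it only suits small inputs; `roc_auc_score` is the practical choice on the real data.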

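## Table of Contents
0. Loading libraries and data
1. Data visualization and analysis
2. Preprocessing and feature engineering
3. Building the machine learning model
4. Generating predictions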

## 0. Loading Libraries and Data

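Load the basic libraries. The modeling libraries (CatBoost, LightGBM, XGBoost, Optuna) and the scikit-learn utilities used later are imported in the same cell.
- numpy: efficient numerical computation
- pandas: convenient data analysis and manipulation
- matplotlib: plotting
- seaborn: statistical plotting built on matplotlib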

In [32]:

from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [33]:
!pip install optuna
!pip install catboost



In [34]:
# Load libraries
import gc
import time
import warnings
from itertools import combinations

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import optuna
import xgboost as xgb
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier

from sklearn.metrics import *  # includes roc_auc_score, log_loss, mean_squared_log_error
from sklearn.model_selection import KFold, train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

warnings.filterwarnings('ignore')
gc.collect()

In [35]:

# Show all columns when displaying DataFrames
pd.set_option('display.max_columns', None)


Load the required data. The path below assumes the folder structure of the GCI course materials.

In [36]:
# Load the data
# Set `path` to the directory that contains train.csv and the other data files.

path = "/content/drive/My Drive/松尾研/GCI/コンペ２/"

train = pd.read_csv(path + "train.csv")
test = pd.read_csv(path + "test.csv")
sample_sub = pd.read_csv(path + "sample_submission.csv")

## 1. Data Visualization and Analysis

### 1.1 Data Overview
Before the main analysis, take a quick look at the data.

In [37]:
# Inspect the train data
print(f"train shape: {train.shape}")
train.head(3)

train shape: (171202, 51)


Unnamed: 0,SK_ID_CURR,TARGET,NAME_CONTRACT_TYPE,CODE_GENDER,FLAG_OWN_CAR,FLAG_OWN_REALTY,CNT_CHILDREN,AMT_INCOME_TOTAL,AMT_CREDIT,AMT_ANNUITY,AMT_GOODS_PRICE,NAME_TYPE_SUITE,NAME_INCOME_TYPE,NAME_EDUCATION_TYPE,NAME_FAMILY_STATUS,NAME_HOUSING_TYPE,REGION_POPULATION_RELATIVE,DAYS_BIRTH,DAYS_EMPLOYED,DAYS_REGISTRATION,DAYS_ID_PUBLISH,OWN_CAR_AGE,FLAG_MOBIL,FLAG_EMP_PHONE,FLAG_WORK_PHONE,FLAG_CONT_MOBILE,FLAG_PHONE,FLAG_EMAIL,OCCUPATION_TYPE,CNT_FAM_MEMBERS,REGION_RATING_CLIENT,REGION_RATING_CLIENT_W_CITY,REG_REGION_NOT_LIVE_REGION,REG_REGION_NOT_WORK_REGION,LIVE_REGION_NOT_WORK_REGION,REG_CITY_NOT_LIVE_CITY,REG_CITY_NOT_WORK_CITY,LIVE_CITY_NOT_WORK_CITY,ORGANIZATION_TYPE,EXT_SOURCE_1,EXT_SOURCE_2,EXT_SOURCE_3,OBS_30_CNT_SOCIAL_CIRCLE,DEF_30_CNT_SOCIAL_CIRCLE,OBS_60_CNT_SOCIAL_CIRCLE,DEF_60_CNT_SOCIAL_CIRCLE,DAYS_LAST_PHONE_CHANGE,AMT_REQ_CREDIT_BUREAU_HOUR,AMT_REQ_CREDIT_BUREAU_MON,AMT_REQ_CREDIT_BUREAU_QRT,AMT_REQ_CREDIT_BUREAU_YEAR
0,0,0,Cash loans,F,N,N,0,112500.0,755190.0,36328.5,675000.0,Unaccompanied,Working,Higher education,Married,House / apartment,0.010032,-9233,-878,-333.0,-522,,1,1,1,1,0,0,Core staff,2.0,2,2,0,1,1,0,1,1,School,,0.372591,,0.0,0.0,0.0,0.0,-292.0,,,,
1,1,0,Cash loans,F,N,Y,0,225000.0,585000.0,16893.0,585000.0,Unaccompanied,Pensioner,Secondary / secondary special,Married,House / apartment,0.008019,-20148,365243,-4469.0,-3436,,1,0,0,1,0,0,,2.0,2,2,0,0,0,0,0,0,XNA,,0.449567,0.553165,0.0,0.0,0.0,0.0,-617.0,0.0,0.0,0.0,1.0
2,2,0,Cash loans,F,N,Y,0,54000.0,334152.0,18256.5,270000.0,Family,State servant,Secondary / secondary special,Married,House / apartment,0.00496,-18496,-523,-3640.0,-2050,,1,1,1,1,1,0,Core staff,2.0,2,2,0,0,0,0,0,0,Postal,,0.569503,,4.0,0.0,4.0,0.0,-542.0,,,,
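Even these first rows contain many missing values (for example `EXT_SOURCE_1`, `EXT_SOURCE_3`, and `OWN_CAR_AGE`). A per-column missing-rate summary is a quick way to see how widespread this is. A minimal sketch; the helper `missing_summary` is hypothetical, and the toy frame stands in for `train` (on the real data you would call `missing_summary(train)`):

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and share of missing values per column, most-missing first."""
    summary = pd.DataFrame({
        "n_missing": df.isna().sum(),
        "missing_rate": df.isna().mean(),
    })
    # Keep only the columns that actually contain missing values
    summary = summary[summary["n_missing"] > 0]
    return summary.sort_values("missing_rate", ascending=False)


# Toy stand-in for train (values are illustrative)
toy = pd.DataFrame({
    "EXT_SOURCE_1": [np.nan, np.nan, np.nan, 0.5],
    "EXT_SOURCE_2": [0.37, 0.45, 0.57, 0.60],
    "EXT_SOURCE_3": [np.nan, 0.55, np.nan, 0.40],
})
print(missing_summary(toy))
```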


In [38]:
# Inspect the test data
print(f"test shape: {test.shape}")
test.head(3)

test shape: (61500, 50)


Unnamed: 0,SK_ID_CURR,NAME_CONTRACT_TYPE,CODE_GENDER,FLAG_OWN_CAR,FLAG_OWN_REALTY,CNT_CHILDREN,AMT_INCOME_TOTAL,AMT_CREDIT,AMT_ANNUITY,AMT_GOODS_PRICE,NAME_TYPE_SUITE,NAME_INCOME_TYPE,NAME_EDUCATION_TYPE,NAME_FAMILY_STATUS,NAME_HOUSING_TYPE,REGION_POPULATION_RELATIVE,DAYS_BIRTH,DAYS_EMPLOYED,DAYS_REGISTRATION,DAYS_ID_PUBLISH,OWN_CAR_AGE,FLAG_MOBIL,FLAG_EMP_PHONE,FLAG_WORK_PHONE,FLAG_CONT_MOBILE,FLAG_PHONE,FLAG_EMAIL,OCCUPATION_TYPE,CNT_FAM_MEMBERS,REGION_RATING_CLIENT,REGION_RATING_CLIENT_W_CITY,REG_REGION_NOT_LIVE_REGION,REG_REGION_NOT_WORK_REGION,LIVE_REGION_NOT_WORK_REGION,REG_CITY_NOT_LIVE_CITY,REG_CITY_NOT_WORK_CITY,LIVE_CITY_NOT_WORK_CITY,ORGANIZATION_TYPE,EXT_SOURCE_1,EXT_SOURCE_2,EXT_SOURCE_3,OBS_30_CNT_SOCIAL_CIRCLE,DEF_30_CNT_SOCIAL_CIRCLE,OBS_60_CNT_SOCIAL_CIRCLE,DEF_60_CNT_SOCIAL_CIRCLE,DAYS_LAST_PHONE_CHANGE,AMT_REQ_CREDIT_BUREAU_HOUR,AMT_REQ_CREDIT_BUREAU_MON,AMT_REQ_CREDIT_BUREAU_QRT,AMT_REQ_CREDIT_BUREAU_YEAR
0,171202,Cash loans,F,N,N,1,144000.0,961146.0,28233.0,688500.0,Unaccompanied,Working,Higher education,Married,House / apartment,0.025164,-12108,-2372,-2446.0,-3022,,1,1,0,1,1,0,Medicine staff,3.0,2,2,0,0,0,0,0,0,Kindergarten,,0.720416,,2.0,0.0,2.0,0.0,-1.0,,,,
1,171203,Cash loans,F,N,N,0,103500.0,296280.0,16069.5,225000.0,Unaccompanied,Working,Secondary / secondary special,Married,House / apartment,0.00702,-17907,-1712,-10450.0,-253,,1,1,1,1,0,0,Cleaning staff,2.0,2,2,0,0,0,0,0,0,School,,0.287306,,5.0,0.0,5.0,0.0,-212.0,,,,
2,171204,Cash loans,F,N,Y,1,180000.0,183694.5,11236.5,139500.0,Children,Commercial associate,Secondary / secondary special,Single / not married,House / apartment,0.006852,-15221,-553,-1056.0,-4495,,1,1,0,1,0,0,,2.0,3,3,0,0,0,1,1,0,Trade: type 7,,0.352456,0.389339,7.0,0.0,7.0,0.0,-428.0,0.0,1.0,1.0,1.0
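`train` has 51 columns and `test` has 50; the difference should be exactly the `TARGET` label. A minimal sketch of that check on toy frames (the helper `column_diff` is hypothetical); on the real data you would call `column_diff(train, test)`:

```python
import pandas as pd


def column_diff(train_df: pd.DataFrame, test_df: pd.DataFrame):
    """Return (columns only in train, columns only in test), each sorted."""
    only_train = sorted(set(train_df.columns) - set(test_df.columns))
    only_test = sorted(set(test_df.columns) - set(train_df.columns))
    return only_train, only_test


# Toy frames with a few values copied from the rows shown above
toy_train = pd.DataFrame({"SK_ID_CURR": [0, 1], "TARGET": [0, 0],
                          "AMT_CREDIT": [755190.0, 585000.0]})
toy_test = pd.DataFrame({"SK_ID_CURR": [171202], "AMT_CREDIT": [961146.0]})

print(column_diff(toy_train, toy_test))  # (['TARGET'], [])
```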


In [39]:
train["ORGANIZATION_TYPE"].describe()

Unnamed: 0,ORGANIZATION_TYPE
count,171202
unique,58
top,Business Entity Type 3
freq,37943


In [40]:
all_df = pd.concat([train, test], axis=0)
all_df.drop("SK_ID_CURR", axis=1, inplace=True)
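The cell above concatenates train and test before encoding. `pd.get_dummies` creates one column per category it sees, so encoding the two frames separately can leave them with different columns whenever a category appears in only one of them. A minimal sketch with toy frames (values are illustrative):

```python
import pandas as pd

# Toy frames: "Revolving loans" appears only in the test frame
tr = pd.DataFrame({"NAME_CONTRACT_TYPE": ["Cash loans", "Cash loans"]})
te = pd.DataFrame({"NAME_CONTRACT_TYPE": ["Cash loans", "Revolving loans"]})

# Encoding separately gives mismatched column sets
print(list(pd.get_dummies(tr).columns))
print(list(pd.get_dummies(te).columns))

# Encoding the concatenation gives one shared column set; split back by row position
both = pd.get_dummies(pd.concat([tr, te], axis=0, ignore_index=True))
tr_enc, te_enc = both.iloc[:len(tr)], both.iloc[len(tr):]
print(list(tr_enc.columns) == list(te_enc.columns))  # True
```

Splitting back by position with `.iloc` also works when the concatenated index contains duplicates, as it does in the notebook, where both train and test start at index 0.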


In [41]:
# Ratio (quotient) features; adding 1e-6 to each denominator avoids division by zero
all_df['CREDIT_INCOME_RATIO'] = all_df['AMT_CREDIT'] / (all_df['AMT_INCOME_TOTAL'] + 1e-6)     # credit amount / total income
all_df['CREDIT_ANNUITY_RATIO'] = all_df['AMT_CREDIT'] / (all_df['AMT_ANNUITY'] + 1e-6)         # credit amount / loan annuity
all_df['CREDIT_GOODS_RATIO'] = all_df['AMT_CREDIT'] / (all_df['AMT_GOODS_PRICE'] + 1e-6)       # credit amount / goods price
all_df['ANNUITY_INCOME_RATIO'] = all_df['AMT_ANNUITY'] / (all_df['AMT_INCOME_TOTAL'] + 1e-6)   # loan annuity / total income
all_df['GOODS_INCOME_RATIO'] = all_df['AMT_GOODS_PRICE'] / (all_df['AMT_INCOME_TOTAL'] + 1e-6) # goods price / total income
all_df['ANNUITY_GOODS_RATIO'] = all_df['AMT_ANNUITY'] / (all_df['AMT_GOODS_PRICE'] + 1e-6)     # loan annuity / goods price
all_df['EMPLOYED_BIRTH_RATIO'] = all_df['DAYS_EMPLOYED'] / (all_df['DAYS_BIRTH'] + 1e-6)       # days employed / age in days

# Product features
all_df['CREDIT_INCOME_PRODUCT'] = all_df['AMT_CREDIT'] * all_df['AMT_INCOME_TOTAL']            # credit amount × total income
all_df['CREDIT_ANNUITY_PRODUCT'] = all_df['AMT_CREDIT'] * all_df['AMT_ANNUITY']                # credit amount × loan annuity
all_df['CREDIT_GOODS_PRODUCT'] = all_df['AMT_CREDIT'] * all_df['AMT_GOODS_PRICE']              # credit amount × goods price
all_df['INCOME_ANNUITY_PRODUCT'] = all_df['AMT_INCOME_TOTAL'] * all_df['AMT_ANNUITY']          # total income × loan annuity
all_df['INCOME_GOODS_PRODUCT'] = all_df['AMT_INCOME_TOTAL'] * all_df['AMT_GOODS_PRICE']        # total income × goods price
all_df['ANNUITY_GOODS_PRODUCT'] = all_df['AMT_ANNUITY'] * all_df['AMT_GOODS_PRICE']            # loan annuity × goods price

# Product of DAYS_EMPLOYED and DAYS_BIRTH
# The day counts are stored as negative values; using the product as-is may let the model
# capture a nonlinear relationship between employment length and age
all_df['EMPLOYED_BIRTH_PRODUCT'] = all_df['DAYS_EMPLOYED'] * all_df['DAYS_BIRTH']             # days employed × age in days

# Combination features with EXT_SOURCE_1, EXT_SOURCE_2, and EXT_SOURCE_3
# (computed only for the EXT_SOURCE_x columns that exist)
for i in range(1, 4):
    ext_source_col = f'EXT_SOURCE_{i}'
    if ext_source_col in all_df.columns:
        all_df[f'CREDIT_EXT_SOURCE_{i}_RATIO'] = all_df['AMT_CREDIT'] / (all_df[ext_source_col] + 1e-6)
        all_df[f'INCOME_EXT_SOURCE_{i}_RATIO'] = all_df['AMT_INCOME_TOTAL'] / (all_df[ext_source_col] + 1e-6)
        all_df[f'CREDIT_EXT_SOURCE_{i}_PRODUCT'] = all_df['AMT_CREDIT'] * all_df[ext_source_col]
        all_df[f'INCOME_EXT_SOURCE_{i}_PRODUCT'] = all_df['AMT_INCOME_TOTAL'] * all_df[ext_source_col]

# If needed, other day-count features (DAYS_REGISTRATION, DAYS_ID_PUBLISH, etc.) and other
# numeric features can be combined into additional products and ratios in the same way.

# Example: combining DAYS_REGISTRATION with AMT_CREDIT
if 'DAYS_REGISTRATION' in all_df.columns:
    # abs() makes the (negative) day count positive before dividing
    all_df['CREDIT_REGISTRATION_RATIO'] = all_df['AMT_CREDIT'] / (all_df['DAYS_REGISTRATION'].abs() + 1e-6)
    all_df['CREDIT_REGISTRATION_PRODUCT'] = all_df['AMT_CREDIT'] * all_df['DAYS_REGISTRATION']
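One value in the sample rows deserves attention: the pensioner in row 1 of `train.head()` has `DAYS_EMPLOYED = 365243` (roughly 1,000 years), which turns `EMPLOYED_BIRTH_RATIO` into about -18.1 in the engineered data. A common treatment, sketched here under the assumption that 365243 is a placeholder rather than a real employment length, is to add a flag and set the value to NaN before building the ratio and product features; LightGBM, XGBoost, and CatBoost can all handle NaN inputs. The helper `clean_days_employed` is hypothetical:

```python
import numpy as np
import pandas as pd

# Assumption: 365243 is a placeholder rather than a real employment length
DAYS_EMPLOYED_PLACEHOLDER = 365243


def clean_days_employed(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with an anomaly flag and the placeholder replaced by NaN."""
    out = df.copy()
    is_anom = out["DAYS_EMPLOYED"] == DAYS_EMPLOYED_PLACEHOLDER
    out["DAYS_EMPLOYED_ANOM"] = is_anom.astype(int)
    # mask() sets the flagged entries to NaN (the column becomes float)
    out["DAYS_EMPLOYED"] = out["DAYS_EMPLOYED"].mask(is_anom)
    return out


# Toy rows taken from the first two rows of train.head() shown above
toy = pd.DataFrame({"DAYS_BIRTH": [-9233, -20148], "DAYS_EMPLOYED": [-878, 365243]})
cleaned = clean_days_employed(toy)
cleaned["EMPLOYED_BIRTH_RATIO"] = cleaned["DAYS_EMPLOYED"] / (cleaned["DAYS_BIRTH"] + 1e-6)
print(cleaned)
```

On the real data this would run on `all_df` before the feature cell, e.g. `all_df = clean_days_employed(all_df)`; whether it improves the validation AUC has to be checked experimentally.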


In [42]:
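# One-hot encode the categorical columns, then split back into train and test by row position
n_train = len(train)
all_df = pd.get_dummies(all_df)
train = all_df.iloc[:n_train]
test = all_df.iloc[n_train:]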

In [43]:
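# Detect column names that contain special characters
import re

special_chars = r'["\\/\b\f\n\r\t]'
invalid_columns = [col for col in all_df.columns if re.search(special_chars, col)]
print("Columns containing special characters:", invalid_columns)

# Replace special characters with underscores.
# train and test were already sliced from all_df, so they must be renamed too.
def replace_special_chars(columns):
    return [re.sub(special_chars, '_', col) for col in columns]

all_df.columns = replace_special_chars(all_df.columns)
train.columns = replace_special_chars(train.columns)
test.columns = replace_special_chars(test.columns)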


Columns containing special characters: ['NAME_EDUCATION_TYPE_Secondary / secondary special', 'NAME_FAMILY_STATUS_Single / not married', 'NAME_HOUSING_TYPE_House / apartment', 'OCCUPATION_TYPE_Waiters/barmen staff']
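The pattern above matches the characters that need escaping in JSON strings (`"`, `\`, `/`, and control characters). Column names in this dataset also contain characters it does not touch, such as `:` in `ORGANIZATION_TYPE_Industry: type 1`, `,` in `NAME_TYPE_SUITE_Spouse, partner`, and spaces. If a model library still rejects some feature names, one stricter option is to keep only letters, digits, and underscores and then verify that no two names collapse into the same string. A minimal sketch (the helper `sanitize_columns` is hypothetical):

```python
import re


def sanitize_columns(columns):
    """Map each name to [A-Za-z0-9_] only; raise if two names collide after cleaning."""
    cleaned = [re.sub(r"[^0-9A-Za-z_]", "_", str(c)) for c in columns]
    if len(set(cleaned)) != len(cleaned):
        seen, dupes = set(), set()
        for name in cleaned:
            if name in seen:
                dupes.add(name)
            seen.add(name)
        raise ValueError(f"column names collide after cleaning: {sorted(dupes)}")
    return cleaned


cols = ["ORGANIZATION_TYPE_Industry: type 1",
        "NAME_TYPE_SUITE_Spouse, partner",
        "OCCUPATION_TYPE_Waiters/barmen staff"]
print(sanitize_columns(cols))
```

It can be applied as `all_df.columns = sanitize_columns(all_df.columns)`, with `train` and `test` renamed the same way.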


In [44]:
# Inspect the train data
print(f"train shape: {train.shape}")
train.head(3)

train shape: (171202, 184)


Unnamed: 0,TARGET,CNT_CHILDREN,AMT_INCOME_TOTAL,AMT_CREDIT,AMT_ANNUITY,AMT_GOODS_PRICE,REGION_POPULATION_RELATIVE,DAYS_BIRTH,DAYS_EMPLOYED,DAYS_REGISTRATION,DAYS_ID_PUBLISH,OWN_CAR_AGE,FLAG_MOBIL,FLAG_EMP_PHONE,FLAG_WORK_PHONE,FLAG_CONT_MOBILE,FLAG_PHONE,FLAG_EMAIL,CNT_FAM_MEMBERS,REGION_RATING_CLIENT,REGION_RATING_CLIENT_W_CITY,REG_REGION_NOT_LIVE_REGION,REG_REGION_NOT_WORK_REGION,LIVE_REGION_NOT_WORK_REGION,REG_CITY_NOT_LIVE_CITY,REG_CITY_NOT_WORK_CITY,LIVE_CITY_NOT_WORK_CITY,EXT_SOURCE_1,EXT_SOURCE_2,EXT_SOURCE_3,OBS_30_CNT_SOCIAL_CIRCLE,DEF_30_CNT_SOCIAL_CIRCLE,OBS_60_CNT_SOCIAL_CIRCLE,DEF_60_CNT_SOCIAL_CIRCLE,DAYS_LAST_PHONE_CHANGE,AMT_REQ_CREDIT_BUREAU_HOUR,AMT_REQ_CREDIT_BUREAU_MON,AMT_REQ_CREDIT_BUREAU_QRT,AMT_REQ_CREDIT_BUREAU_YEAR,CREDIT_INCOME_RATIO,CREDIT_ANNUITY_RATIO,CREDIT_GOODS_RATIO,ANNUITY_INCOME_RATIO,GOODS_INCOME_RATIO,ANNUITY_GOODS_RATIO,EMPLOYED_BIRTH_RATIO,CREDIT_INCOME_PRODUCT,CREDIT_ANNUITY_PRODUCT,CREDIT_GOODS_PRODUCT,INCOME_ANNUITY_PRODUCT,INCOME_GOODS_PRODUCT,ANNUITY_GOODS_PRODUCT,EMPLOYED_BIRTH_PRODUCT,CREDIT_EXT_SOURCE_1_RATIO,INCOME_EXT_SOURCE_1_RATIO,CREDIT_EXT_SOURCE_1_PRODUCT,INCOME_EXT_SOURCE_1_PRODUCT,CREDIT_EXT_SOURCE_2_RATIO,INCOME_EXT_SOURCE_2_RATIO,CREDIT_EXT_SOURCE_2_PRODUCT,INCOME_EXT_SOURCE_2_PRODUCT,CREDIT_EXT_SOURCE_3_RATIO,INCOME_EXT_SOURCE_3_RATIO,CREDIT_EXT_SOURCE_3_PRODUCT,INCOME_EXT_SOURCE_3_PRODUCT,CREDIT_REGISTRATION_RATIO,CREDIT_REGISTRATION_PRODUCT,NAME_CONTRACT_TYPE_Cash loans,NAME_CONTRACT_TYPE_Revolving loans,CODE_GENDER_F,CODE_GENDER_M,CODE_GENDER_XNA,FLAG_OWN_CAR_N,FLAG_OWN_CAR_Y,FLAG_OWN_REALTY_N,FLAG_OWN_REALTY_Y,NAME_TYPE_SUITE_Children,NAME_TYPE_SUITE_Family,NAME_TYPE_SUITE_Group of people,NAME_TYPE_SUITE_Other_A,NAME_TYPE_SUITE_Other_B,"NAME_TYPE_SUITE_Spouse, partner",NAME_TYPE_SUITE_Unaccompanied,NAME_INCOME_TYPE_Businessman,NAME_INCOME_TYPE_Commercial associate,NAME_INCOME_TYPE_Maternity leave,NAME_INCOME_TYPE_Pensioner,NAME_INCOME_TYPE_State servant,NAME_INCOME_TYPE_Student,NAME_INCOME_TYPE_Unemployed,NAME_INCOME_TYPE_Working,NAME_EDUCATION_TYPE_Academic degree,NAME_EDUCATION_TYPE_Higher education,NAME_EDUCATION_TYPE_Incomplete higher,NAME_EDUCATION_TYPE_Lower secondary,NAME_EDUCATION_TYPE_Secondary / secondary special,NAME_FAMILY_STATUS_Civil marriage,NAME_FAMILY_STATUS_Married,NAME_FAMILY_STATUS_Separated,NAME_FAMILY_STATUS_Single / not married,NAME_FAMILY_STATUS_Unknown,NAME_FAMILY_STATUS_Widow,NAME_HOUSING_TYPE_Co-op apartment,NAME_HOUSING_TYPE_House / apartment,NAME_HOUSING_TYPE_Municipal apartment,NAME_HOUSING_TYPE_Office apartment,NAME_HOUSING_TYPE_Rented apartment,NAME_HOUSING_TYPE_With parents,OCCUPATION_TYPE_Accountants,OCCUPATION_TYPE_Cleaning staff,OCCUPATION_TYPE_Cooking staff,OCCUPATION_TYPE_Core staff,OCCUPATION_TYPE_Drivers,OCCUPATION_TYPE_HR staff,OCCUPATION_TYPE_High skill tech staff,OCCUPATION_TYPE_IT staff,OCCUPATION_TYPE_Laborers,OCCUPATION_TYPE_Low-skill Laborers,OCCUPATION_TYPE_Managers,OCCUPATION_TYPE_Medicine staff,OCCUPATION_TYPE_Private service staff,OCCUPATION_TYPE_Realty agents,OCCUPATION_TYPE_Sales staff,OCCUPATION_TYPE_Secretaries,OCCUPATION_TYPE_Security staff,OCCUPATION_TYPE_Waiters/barmen staff,ORGANIZATION_TYPE_Advertising,ORGANIZATION_TYPE_Agriculture,ORGANIZATION_TYPE_Bank,ORGANIZATION_TYPE_Business Entity Type 1,ORGANIZATION_TYPE_Business Entity Type 2,ORGANIZATION_TYPE_Business Entity Type 3,ORGANIZATION_TYPE_Cleaning,ORGANIZATION_TYPE_Construction,ORGANIZATION_TYPE_Culture,ORGANIZATION_TYPE_Electricity,ORGANIZATION_TYPE_Emergency,ORGANIZATION_TYPE_Government,ORGANIZATION_TYPE_Hotel,ORGANIZATION_TYPE_Housing,ORGANIZATION_TYPE_Industry: type 1,ORGANIZATION_TYPE_Industry: type 10,ORGANIZATION_TYPE_Industry: type 11,ORGANIZATION_TYPE_Industry: type 12,ORGANIZATION_TYPE_Industry: type 13,ORGANIZATION_TYPE_Industry: type 2,ORGANIZATION_TYPE_Industry: type 3,ORGANIZATION_TYPE_Industry: type 4,ORGANIZATION_TYPE_Industry: type 5,ORGANIZATION_TYPE_Industry: type 6,ORGANIZATION_TYPE_Industry: type 7,ORGANIZATION_TYPE_Industry: type 8,ORGANIZATION_TYPE_Industry: type 9,ORGANIZATION_TYPE_Insurance,ORGANIZATION_TYPE_Kindergarten,ORGANIZATION_TYPE_Legal Services,ORGANIZATION_TYPE_Medicine,ORGANIZATION_TYPE_Military,ORGANIZATION_TYPE_Mobile,ORGANIZATION_TYPE_Other,ORGANIZATION_TYPE_Police,ORGANIZATION_TYPE_Postal,ORGANIZATION_TYPE_Realtor,ORGANIZATION_TYPE_Religion,ORGANIZATION_TYPE_Restaurant,ORGANIZATION_TYPE_School,ORGANIZATION_TYPE_Security,ORGANIZATION_TYPE_Security Ministries,ORGANIZATION_TYPE_Self-employed,ORGANIZATION_TYPE_Services,ORGANIZATION_TYPE_Telecom,ORGANIZATION_TYPE_Trade: type 1,ORGANIZATION_TYPE_Trade: type 2,ORGANIZATION_TYPE_Trade: type 3,ORGANIZATION_TYPE_Trade: type 4,ORGANIZATION_TYPE_Trade: type 5,ORGANIZATION_TYPE_Trade: type 6,ORGANIZATION_TYPE_Trade: type 7,ORGANIZATION_TYPE_Transport: type 1,ORGANIZATION_TYPE_Transport: type 2,ORGANIZATION_TYPE_Transport: type 3,ORGANIZATION_TYPE_Transport: type 4,ORGANIZATION_TYPE_University,ORGANIZATION_TYPE_XNA
0,0.0,0,112500.0,755190.0,36328.5,675000.0,0.010032,-9233,-878,-333.0,-522,,1,1,1,1,0,0,2.0,2,2,0,1,1,0,1,1,,0.372591,,0.0,0.0,0.0,0.0,-292.0,,,,,6.7128,20.787811,1.1188,0.32292,6.0,0.05382,0.095094,84958880000.0,27434920000.0,509753200000.0,4086956000.0,75937500000.0,24521740000.0,8106574,,,,,2026857.0,301939.131163,281376.735276,41916.448468,,,,,2267.837831,-251478300.0,True,False,True,False,False,True,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1,0.0,0,225000.0,585000.0,16893.0,585000.0,0.008019,-20148,365243,-4469.0,-3436,,1,0,0,1,0,0,2.0,2,2,0,0,0,0,0,0,,0.449567,0.553165,0.0,0.0,0.0,0.0,-617.0,0.0,0.0,0.0,1.0,2.6,34.629728,1.0,0.07508,2.6,0.028877,-18.128003,131625000000.0,9882405000.0,342225000000.0,3800925000.0,131625000000.0,9882405000.0,-7358915964,,,,,1301249.0,500480.552704,262996.646938,101152.556515,1057549.0,406749.732494,323601.348781,124462.057223,130.901768,-2614365000.0,True,False,True,False,False,True,False,False,True,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True
2,0.0,0,54000.0,334152.0,18256.5,270000.0,0.00496,-18496,-523,-3640.0,-2050,,1,1,1,1,1,0,2.0,2,2,0,0,0,0,0,0,,0.569503,,4.0,0.0,4.0,0.0,-542.0,,,,,6.188,18.30318,1.2376,0.338083,5.0,0.067617,0.028276,18044210000.0,6100446000.0,90221040000.0,985851000.0,14580000000.0,4929255000.0,9673408,,,,,586741.8,94819.2935,190300.683015,30753.180836,,,,,91.8,-1216313000.0,True,False,True,False,False,True,False,False,True,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False


In [45]:
train["TARGET"].value_counts()

TARGET,count
0.0,157381
1.0,13821
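Only about 8% of the training rows are defaults (13,821 of 171,202). Two common ways to account for this are sketched below under illustrative assumptions. First, scikit-learn's `StratifiedKFold` keeps the positive rate nearly identical in every validation fold, which plain `KFold` (imported above) does not guarantee. Second, the negative-to-positive ratio is a common starting value for the `scale_pos_weight` parameter of XGBoost and LightGBM if you choose to reweight the positive class. The labels in the fold demo are synthetic:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Class counts from train["TARGET"].value_counts()
n_neg, n_pos = 157381, 13821
print(f"positive rate: {n_pos / (n_neg + n_pos):.4f}")  # 0.0807
print(f"neg/pos ratio: {n_neg / n_pos:.2f}")            # 11.39

# Synthetic labels with a similar ~8% positive rate (illustrative only)
rng = np.random.default_rng(0)
y = (rng.random(10_000) < 0.08).astype(int)
X = np.zeros((len(y), 1))  # features do not affect how the folds are assigned

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_rates = [y[valid_idx].mean() for _, valid_idx in skf.split(X, y)]
print("overall:", round(y.mean(), 4), "per fold:", np.round(fold_rates, 4))
```

With stratification, each fold's positive count differs from the others by at most one, so the per-fold rates stay within a fraction of a percentage point of the overall rate.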


In [46]:
# Inspect the test data
print(f"test shape: {test.shape}")
test.head(3)

test shape: (61500, 184)


Unnamed: 0,TARGET,CNT_CHILDREN,AMT_INCOME_TOTAL,AMT_CREDIT,AMT_ANNUITY,AMT_GOODS_PRICE,REGION_POPULATION_RELATIVE,DAYS_BIRTH,DAYS_EMPLOYED,DAYS_REGISTRATION,DAYS_ID_PUBLISH,OWN_CAR_AGE,FLAG_MOBIL,FLAG_EMP_PHONE,FLAG_WORK_PHONE,FLAG_CONT_MOBILE,FLAG_PHONE,FLAG_EMAIL,CNT_FAM_MEMBERS,REGION_RATING_CLIENT,REGION_RATING_CLIENT_W_CITY,REG_REGION_NOT_LIVE_REGION,REG_REGION_NOT_WORK_REGION,LIVE_REGION_NOT_WORK_REGION,REG_CITY_NOT_LIVE_CITY,REG_CITY_NOT_WORK_CITY,LIVE_CITY_NOT_WORK_CITY,EXT_SOURCE_1,EXT_SOURCE_2,EXT_SOURCE_3,OBS_30_CNT_SOCIAL_CIRCLE,DEF_30_CNT_SOCIAL_CIRCLE,OBS_60_CNT_SOCIAL_CIRCLE,DEF_60_CNT_SOCIAL_CIRCLE,DAYS_LAST_PHONE_CHANGE,AMT_REQ_CREDIT_BUREAU_HOUR,AMT_REQ_CREDIT_BUREAU_MON,AMT_REQ_CREDIT_BUREAU_QRT,AMT_REQ_CREDIT_BUREAU_YEAR,CREDIT_INCOME_RATIO,CREDIT_ANNUITY_RATIO,CREDIT_GOODS_RATIO,ANNUITY_INCOME_RATIO,GOODS_INCOME_RATIO,ANNUITY_GOODS_RATIO,EMPLOYED_BIRTH_RATIO,CREDIT_INCOME_PRODUCT,CREDIT_ANNUITY_PRODUCT,CREDIT_GOODS_PRODUCT,INCOME_ANNUITY_PRODUCT,INCOME_GOODS_PRODUCT,ANNUITY_GOODS_PRODUCT,EMPLOYED_BIRTH_PRODUCT,CREDIT_EXT_SOURCE_1_RATIO,INCOME_EXT_SOURCE_1_RATIO,CREDIT_EXT_SOURCE_1_PRODUCT,INCOME_EXT_SOURCE_1_PRODUCT,CREDIT_EXT_SOURCE_2_RATIO,INCOME_EXT_SOURCE_2_RATIO,CREDIT_EXT_SOURCE_2_PRODUCT,INCOME_EXT_SOURCE_2_PRODUCT,CREDIT_EXT_SOURCE_3_RATIO,INCOME_EXT_SOURCE_3_RATIO,CREDIT_EXT_SOURCE_3_PRODUCT,INCOME_EXT_SOURCE_3_PRODUCT,CREDIT_REGISTRATION_RATIO,CREDIT_REGISTRATION_PRODUCT,NAME_CONTRACT_TYPE_Cash loans,NAME_CONTRACT_TYPE_Revolving loans,CODE_GENDER_F,CODE_GENDER_M,CODE_GENDER_XNA,FLAG_OWN_CAR_N,FLAG_OWN_CAR_Y,FLAG_OWN_REALTY_N,FLAG_OWN_REALTY_Y,NAME_TYPE_SUITE_Children,NAME_TYPE_SUITE_Family,NAME_TYPE_SUITE_Group of people,NAME_TYPE_SUITE_Other_A,NAME_TYPE_SUITE_Other_B,"NAME_TYPE_SUITE_Spouse, partner",NAME_TYPE_SUITE_Unaccompanied,NAME_INCOME_TYPE_Businessman,NAME_INCOME_TYPE_Commercial associate,NAME_INCOME_TYPE_Maternity leave,NAME_INCOME_TYPE_Pensioner,NAME_INCOME_TYPE_State servant,NAME_INCOME_TYPE_Student,NAME_INCOME_TYPE_Unemployed,NAME_INCOME_TYPE_Working,NAME_EDUCATION_TYPE_Academic degree,NAME_EDUCATION_TYPE_Higher education,NAME_EDUCATION_TYPE_Incomplete higher,NAME_EDUCATION_TYPE_Lower secondary,NAME_EDUCATION_TYPE_Secondary / secondary special,NAME_FAMILY_STATUS_Civil marriage,NAME_FAMILY_STATUS_Married,NAME_FAMILY_STATUS_Separated,NAME_FAMILY_STATUS_Single / not married,NAME_FAMILY_STATUS_Unknown,NAME_FAMILY_STATUS_Widow,NAME_HOUSING_TYPE_Co-op apartment,NAME_HOUSING_TYPE_House / apartment,NAME_HOUSING_TYPE_Municipal apartment,NAME_HOUSING_TYPE_Office apartment,NAME_HOUSING_TYPE_Rented apartment,NAME_HOUSING_TYPE_With parents,OCCUPATION_TYPE_Accountants,OCCUPATION_TYPE_Cleaning staff,OCCUPATION_TYPE_Cooking staff,OCCUPATION_TYPE_Core staff,OCCUPATION_TYPE_Drivers,OCCUPATION_TYPE_HR staff,OCCUPATION_TYPE_High skill tech staff,OCCUPATION_TYPE_IT staff,OCCUPATION_TYPE_Laborers,OCCUPATION_TYPE_Low-skill Laborers,OCCUPATION_TYPE_Managers,OCCUPATION_TYPE_Medicine staff,OCCUPATION_TYPE_Private service staff,OCCUPATION_TYPE_Realty agents,OCCUPATION_TYPE_Sales staff,OCCUPATION_TYPE_Secretaries,OCCUPATION_TYPE_Security staff,OCCUPATION_TYPE_Waiters/barmen staff,ORGANIZATION_TYPE_Advertising,ORGANIZATION_TYPE_Agriculture,ORGANIZATION_TYPE_Bank,ORGANIZATION_TYPE_Business Entity Type 1,ORGANIZATION_TYPE_Business Entity Type 2,ORGANIZATION_TYPE_Business Entity Type 3,ORGANIZATION_TYPE_Cleaning,ORGANIZATION_TYPE_Construction,ORGANIZATION_TYPE_Culture,ORGANIZATION_TYPE_Electricity,ORGANIZATION_TYPE_Emergency,ORGANIZATION_TYPE_Government,ORGANIZATION_TYPE_Hotel,ORGANIZATION_TYPE_Housing,ORGANIZATION_TYPE_Industry: type 1,ORGANIZATION_TYPE_Industry: type 10,ORGANIZATION_TYPE_Industry: type 11,ORGANIZATION_TYPE_Industry: type 12,ORGANIZATION_TYPE_Industry: type 13,ORGANIZATION_TYPE_Industry: type 2,ORGANIZATION_TYPE_Industry: type 3,ORGANIZATION_TYPE_Industry: type 4,ORGANIZATION_TYPE_Industry: type 5,ORGANIZATION_TYPE_Industry: type 6,ORGANIZATION_TYPE_Industry: type 7,ORGANIZATION_TYPE_Industry: type 8,ORGANIZATION_TYPE_Industry: type 9,ORGANIZATION_TYPE_Insurance,ORGANIZATION_TYPE_Kindergarten,ORGANIZATION_TYPE_Legal Services,ORGANIZATION_TYPE_Medicine,ORGANIZATION_TYPE_Military,ORGANIZATION_TYPE_Mobile,ORGANIZATION_TYPE_Other,ORGANIZATION_TYPE_Police,ORGANIZATION_TYPE_Postal,ORGANIZATION_TYPE_Realtor,ORGANIZATION_TYPE_Religion,ORGANIZATION_TYPE_Restaurant,ORGANIZATION_TYPE_School,ORGANIZATION_TYPE_Security,ORGANIZATION_TYPE_Security Ministries,ORGANIZATION_TYPE_Self-employed,ORGANIZATION_TYPE_Services,ORGANIZATION_TYPE_Telecom,ORGANIZATION_TYPE_Trade: type 1,ORGANIZATION_TYPE_Trade: type 2,ORGANIZATION_TYPE_Trade: type 3,ORGANIZATION_TYPE_Trade: type 4,ORGANIZATION_TYPE_Trade: type 5,ORGANIZATION_TYPE_Trade: type 6,ORGANIZATION_TYPE_Trade: type 7,ORGANIZATION_TYPE_Transport: type 1,ORGANIZATION_TYPE_Transport: type 2,ORGANIZATION_TYPE_Transport: type 3,ORGANIZATION_TYPE_Transport: type 4,ORGANIZATION_TYPE_University,ORGANIZATION_TYPE_XNA
0,,1,144000.0,961146.0,28233.0,688500.0,0.025164,-12108,-2372,-2446.0,-3022,,1,1,0,1,1,0,3.0,2,2,0,0,0,0,0,0,,0.720416,,2.0,0.0,2.0,0.0,-1.0,,,,,6.674625,34.043354,1.396,0.196062,4.78125,0.041007,0.195904,138405000000.0,27136040000.0,661749000000.0,4065552000.0,99144000000.0,19438420000.0,28720176,,,,,1334153.0,199884.349347,692424.556171,103739.843987,,,,,392.946034,-2350963000.0,True,False,True,False,False,True,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1,,0,103500.0,296280.0,16069.5,225000.0,0.00702,-17907,-1712,-10450.0,-253,,1,1,1,1,0,0,2.0,2,2,0,0,0,0,0,0,,0.287306,,5.0,0.0,5.0,0.0,-212.0,,,,,2.862609,18.437412,1.3168,0.155261,2.173913,0.07142,0.095605,30664980000.0,4761071000.0,66663000000.0,1663193000.0,23287500000.0,3615638000.0,30656784,,,,,1031230.0,360241.507727,85123.098282,29736.19776,,,,,28.352153,-3096126000.0,True,False,True,False,False,True,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,True,False,True,False,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
2,,1,180000.0,183694.5,11236.5,139500.0,0.006852,-15221,-553,-1056.0,-4495,,1,1,0,1,0,0,2.0,3,3,0,0,0,1,1,0,,0.352456,0.389339,7.0,0.0,7.0,0.0,-428.0,0.0,1.0,1.0,1.0,1.020525,16.348018,1.316806,0.062425,0.775,0.080548,0.036331,33065010000.0,2064083000.0,25625380000.0,2022570000.0,25110000000.0,1567492000.0,8417213,,,,,521182.8,510700.65697,64744.220975,63442.072438,471810.238375,462321.097841,71519.394699,70080.982532,173.953125,-193981400.0,True,False,True,False,False,True,False,False,True,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False


In [55]:
train

Unnamed: 0,TARGET,CNT_CHILDREN,AMT_INCOME_TOTAL,AMT_CREDIT,AMT_ANNUITY,AMT_GOODS_PRICE,REGION_POPULATION_RELATIVE,DAYS_BIRTH,DAYS_EMPLOYED,DAYS_REGISTRATION,DAYS_ID_PUBLISH,OWN_CAR_AGE,FLAG_MOBIL,FLAG_EMP_PHONE,FLAG_WORK_PHONE,FLAG_CONT_MOBILE,FLAG_PHONE,FLAG_EMAIL,CNT_FAM_MEMBERS,REGION_RATING_CLIENT,REGION_RATING_CLIENT_W_CITY,REG_REGION_NOT_LIVE_REGION,REG_REGION_NOT_WORK_REGION,LIVE_REGION_NOT_WORK_REGION,REG_CITY_NOT_LIVE_CITY,REG_CITY_NOT_WORK_CITY,LIVE_CITY_NOT_WORK_CITY,EXT_SOURCE_1,EXT_SOURCE_2,EXT_SOURCE_3,OBS_30_CNT_SOCIAL_CIRCLE,DEF_30_CNT_SOCIAL_CIRCLE,OBS_60_CNT_SOCIAL_CIRCLE,DEF_60_CNT_SOCIAL_CIRCLE,DAYS_LAST_PHONE_CHANGE,AMT_REQ_CREDIT_BUREAU_HOUR,AMT_REQ_CREDIT_BUREAU_MON,AMT_REQ_CREDIT_BUREAU_QRT,AMT_REQ_CREDIT_BUREAU_YEAR,CREDIT_INCOME_RATIO,CREDIT_ANNUITY_RATIO,CREDIT_GOODS_RATIO,ANNUITY_INCOME_RATIO,GOODS_INCOME_RATIO,ANNUITY_GOODS_RATIO,EMPLOYED_BIRTH_RATIO,CREDIT_INCOME_PRODUCT,CREDIT_ANNUITY_PRODUCT,CREDIT_GOODS_PRODUCT,INCOME_ANNUITY_PRODUCT,INCOME_GOODS_PRODUCT,ANNUITY_GOODS_PRODUCT,EMPLOYED_BIRTH_PRODUCT,CREDIT_EXT_SOURCE_1_RATIO,INCOME_EXT_SOURCE_1_RATIO,CREDIT_EXT_SOURCE_1_PRODUCT,INCOME_EXT_SOURCE_1_PRODUCT,CREDIT_EXT_SOURCE_2_RATIO,INCOME_EXT_SOURCE_2_RATIO,CREDIT_EXT_SOURCE_2_PRODUCT,INCOME_EXT_SOURCE_2_PRODUCT,CREDIT_EXT_SOURCE_3_RATIO,INCOME_EXT_SOURCE_3_RATIO,CREDIT_EXT_SOURCE_3_PRODUCT,INCOME_EXT_SOURCE_3_PRODUCT,CREDIT_REGISTRATION_RATIO,CREDIT_REGISTRATION_PRODUCT,NAME_CONTRACT_TYPE_Cash loans,NAME_CONTRACT_TYPE_Revolving loans,CODE_GENDER_F,CODE_GENDER_M,CODE_GENDER_XNA,FLAG_OWN_CAR_N,FLAG_OWN_CAR_Y,FLAG_OWN_REALTY_N,FLAG_OWN_REALTY_Y,NAME_TYPE_SUITE_Children,NAME_TYPE_SUITE_Family,NAME_TYPE_SUITE_Group of people,NAME_TYPE_SUITE_Other_A,NAME_TYPE_SUITE_Other_B,"NAME_TYPE_SUITE_Spouse, partner",NAME_TYPE_SUITE_Unaccompanied,NAME_INCOME_TYPE_Businessman,NAME_INCOME_TYPE_Commercial associate,NAME_INCOME_TYPE_Maternity leave,NAME_INCOME_TYPE_Pensioner,NAME_INCOME_TYPE_State servant,NAME_INCOME_TYPE_Student,NAME_INCOME_TYPE_Unemployed,NAME_INCOME_TYPE_Working,NAME_EDUCATION_TYPE_Academic degree,NAME_EDUCATION_TYPE_Higher education,NAME_EDUCATION_TYPE_Incomplete higher,NAME_EDUCATION_TYPE_Lower secondary,NAME_EDUCATION_TYPE_Secondary / secondary special,NAME_FAMILY_STATUS_Civil marriage,NAME_FAMILY_STATUS_Married,NAME_FAMILY_STATUS_Separated,NAME_FAMILY_STATUS_Single / not married,NAME_FAMILY_STATUS_Unknown,NAME_FAMILY_STATUS_Widow,NAME_HOUSING_TYPE_Co-op apartment,NAME_HOUSING_TYPE_House / apartment,NAME_HOUSING_TYPE_Municipal apartment,NAME_HOUSING_TYPE_Office apartment,NAME_HOUSING_TYPE_Rented apartment,NAME_HOUSING_TYPE_With parents,OCCUPATION_TYPE_Accountants,OCCUPATION_TYPE_Cleaning staff,OCCUPATION_TYPE_Cooking staff,OCCUPATION_TYPE_Core staff,OCCUPATION_TYPE_Drivers,OCCUPATION_TYPE_HR staff,OCCUPATION_TYPE_High skill tech staff,OCCUPATION_TYPE_IT staff,OCCUPATION_TYPE_Laborers,OCCUPATION_TYPE_Low-skill Laborers,OCCUPATION_TYPE_Managers,OCCUPATION_TYPE_Medicine staff,OCCUPATION_TYPE_Private service staff,OCCUPATION_TYPE_Realty agents,OCCUPATION_TYPE_Sales staff,OCCUPATION_TYPE_Secretaries,OCCUPATION_TYPE_Security staff,OCCUPATION_TYPE_Waiters/barmen staff,ORGANIZATION_TYPE_Advertising,ORGANIZATION_TYPE_Agriculture,ORGANIZATION_TYPE_Bank,ORGANIZATION_TYPE_Business Entity Type 1,ORGANIZATION_TYPE_Business Entity Type 2,ORGANIZATION_TYPE_Business Entity Type 3,ORGANIZATION_TYPE_Cleaning,ORGANIZATION_TYPE_Construction,ORGANIZATION_TYPE_Culture,ORGANIZATION_TYPE_Electricity,ORGANIZATION_TYPE_Emergency,ORGANIZATION_TYPE_Government,ORGANIZATION_TYPE_Hotel,ORGANIZATION_TYPE_Housing,ORGANIZATION_TYPE_Industry: type 1,ORGANIZATION_TYPE_Industry: type 10,ORGANIZATION_TYPE_Industry: type 11,ORGANIZATION_TYPE_Industry: type 12,ORGANIZATION_TYPE_Industry: type 13,ORGANIZATION_TYPE_Industry: type 2,ORGANIZATION_TYPE_Industry: type 3,ORGANIZATION_TYPE_Industry: type 4,ORGANIZATION_TYPE_Industry: type 5,ORGANIZATION_TYPE_Industry: type 6,ORGANIZATION_TYPE_Industry: type 7,ORGANIZATION_TYPE_Industry: type 8,ORGANIZATION_TYPE_Industry: type 9,ORGANIZATION_TYPE_Insurance,ORGANIZATION_TYPE_Kindergarten,ORGANIZATION_TYPE_Legal Services,ORGANIZATION_TYPE_Medicine,ORGANIZATION_TYPE_Military,ORGANIZATION_TYPE_Mobile,ORGANIZATION_TYPE_Other,ORGANIZATION_TYPE_Police,ORGANIZATION_TYPE_Postal,ORGANIZATION_TYPE_Realtor,ORGANIZATION_TYPE_Religion,ORGANIZATION_TYPE_Restaurant,ORGANIZATION_TYPE_School,ORGANIZATION_TYPE_Security,ORGANIZATION_TYPE_Security Ministries,ORGANIZATION_TYPE_Self-employed,ORGANIZATION_TYPE_Services,ORGANIZATION_TYPE_Telecom,ORGANIZATION_TYPE_Trade: type 1,ORGANIZATION_TYPE_Trade: type 2,ORGANIZATION_TYPE_Trade: type 3,ORGANIZATION_TYPE_Trade: type 4,ORGANIZATION_TYPE_Trade: type 5,ORGANIZATION_TYPE_Trade: type 6,ORGANIZATION_TYPE_Trade: type 7,ORGANIZATION_TYPE_Transport: type 1,ORGANIZATION_TYPE_Transport: type 2,ORGANIZATION_TYPE_Transport: type 3,ORGANIZATION_TYPE_Transport: type 4,ORGANIZATION_TYPE_University,ORGANIZATION_TYPE_XNA
0,0.0,0,112500.0,755190.0,36328.5,675000.0,0.010032,-9233,-878,-333.0,-522,,1,1,1,1,0,0,2.0,2,2,0,1,1,0,1,1,,0.372591,,0.0,0.0,0.0,0.0,-292.0,,,,,6.712800,20.787811,1.118800,0.322920,6.000000,0.053820,0.095094,8.495888e+10,2.743492e+10,5.097532e+11,4.086956e+09,7.593750e+10,2.452174e+10,8106574,,,,,2.026857e+06,301939.131163,281376.735276,41916.448468,,,,,2267.837831,-2.514783e+08,True,False,True,False,False,True,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1,0.0,0,225000.0,585000.0,16893.0,585000.0,0.008019,-20148,365243,-4469.0,-3436,,1,0,0,1,0,0,2.0,2,2,0,0,0,0,0,0,,0.449567,0.553165,0.0,0.0,0.0,0.0,-617.0,0.0,0.0,0.0,1.0,2.600000,34.629728,1.000000,0.075080,2.600000,0.028877,-18.128003,1.316250e+11,9.882405e+09,3.422250e+11,3.800925e+09,1.316250e+11,9.882405e+09,-7358915964,,,,,1.301249e+06,500480.552704,262996.646938,101152.556515,1.057549e+06,406749.732494,323601.348781,124462.057223,130.901768,-2.614365e+09,True,False,True,False,False,True,False,False,True,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True
2,0.0,0,54000.0,334152.0,18256.5,270000.0,0.004960,-18496,-523,-3640.0,-2050,,1,1,1,1,1,0,2.0,2,2,0,0,0,0,0,0,,0.569503,,4.0,0.0,4.0,0.0,-542.0,,,,,6.188000,18.303180,1.237600,0.338083,5.000000,0.067617,0.028276,1.804421e+10,6.100446e+09,9.022104e+10,9.858510e+08,1.458000e+10,4.929255e+09,9673408,,,,,5.867418e+05,94819.293500,190300.683015,30753.180836,,,,,91.800000,-1.216313e+09,True,False,True,False,False,True,False,False,True,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
3,0.0,0,67500.0,152820.0,8901.0,135000.0,0.005002,-24177,365243,-4950.0,-3951,,1,0,0,1,1,0,1.0,3,3,0,0,0,0,0,0,,0.105235,0.767523,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.264000,17.168857,1.132000,0.131867,2.000000,0.065933,-15.107044,1.031535e+10,1.360251e+09,2.063070e+10,6.008175e+08,9.112500e+09,1.201635e+09,-8830480011,,,,,1.452170e+06,641418.005730,16081.949503,7103.334586,1.991078e+05,87945.120669,117292.880853,51807.809564,30.872727,-7.564590e+08,True,False,True,False,False,True,False,False,True,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True
4,1.0,0,157500.0,271066.5,21546.0,234000.0,0.006296,-10685,-697,-5101.0,-3226,,1,1,1,1,0,0,2.0,3,3,0,0,0,0,1,1,0.342344,0.202490,0.669057,0.0,0.0,0.0,0.0,-1243.0,0.0,0.0,0.0,4.0,1.721057,12.580827,1.158404,0.136800,1.485714,0.092077,0.065232,4.269297e+10,5.840399e+09,6.342956e+10,3.393495e+09,3.685500e+10,5.041764e+09,7447445,7.917935e+05,460062.305904,92797.972133,53919.169690,1.338661e+06,777813.421035,54888.179808,31892.130971,4.051467e+05,235405.707502,181358.856556,105376.429428,53.139875,-1.382710e+09,True,False,False,True,False,True,False,True,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
171197,0.0,0,83250.0,404325.0,20772.0,337500.0,0.031329,-20529,-3059,-11581.0,-3689,21.0,1,1,1,1,0,0,2.0,2,2,0,0,0,0,1,1,,0.404560,0.768808,0.0,0.0,0.0,0.0,-2341.0,0.0,0.0,1.0,0.0,4.856757,19.464905,1.198000,0.249514,4.054054,0.061547,0.149009,3.366006e+10,8.398639e+09,1.364597e+11,1.729269e+09,2.809688e+10,7.010550e+09,62798211,,,,,9.994175e+05,205778.782548,163573.584369,33679.591662,5.259111e+05,108284.432487,310848.121884,64003.230438,34.912788,-4.682488e+09,True,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,True,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
171198,0.0,0,247500.0,601470.0,29065.5,450000.0,0.010006,-22083,-129,-4629.0,-1773,1.0,1,1,0,1,0,0,1.0,2,1,0,0,0,0,0,0,,0.608542,,0.0,0.0,0.0,0.0,-1688.0,0.0,0.0,1.0,5.0,2.430182,20.693606,1.336600,0.117436,1.818182,0.064590,0.005842,1.488638e+11,1.748203e+10,2.706615e+11,7.193711e+09,1.113750e+11,1.307948e+10,2848707,,,,,9.883778e+05,406709.409349,366019.514067,150614.045142,,,,,129.935191,-2.784205e+09,True,False,True,False,False,False,True,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
171199,0.0,2,292500.0,1237684.5,49216.5,1138500.0,0.006629,-11053,-2536,-4858.0,-3393,,1,1,0,1,0,1,4.0,2,2,0,0,0,0,0,0,,0.664305,0.758393,2.0,1.0,2.0,1.0,-515.0,0.0,0.0,0.0,1.0,4.231400,25.147755,1.087119,0.168262,3.892308,0.043229,0.229440,3.620227e+11,6.091450e+10,1.409104e+12,1.439583e+10,3.330112e+11,5.603299e+10,28030408,,,,,1.863124e+06,440309.113433,822200.041383,194309.221861,1.631981e+06,385683.399655,938651.337391,221829.970551,254.772437,-6.012671e+09,True,False,True,False,False,True,False,True,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
171200,0.0,0,112500.0,239850.0,25186.5,225000.0,0.009657,-8505,-165,-3318.0,-1176,7.0,1,1,0,1,0,0,1.0,2,2,0,0,0,1,1,0,0.210918,0.627050,,0.0,0.0,0.0,0.0,-1133.0,,,,,2.132000,9.522959,1.066000,0.223880,2.000000,0.111940,0.019400,2.698312e+10,6.040982e+09,5.396625e+10,2.833481e+09,2.531250e+10,5.666962e+09,1403325,1.137166e+06,533380.083405,50588.685083,23728.276305,3.825049e+05,179411.286114,150397.907248,70543.108465,,,,,72.287523,-7.958223e+08,True,False,False,True,False,False,True,True,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False


In [56]:
test

Unnamed: 0,TARGET,CNT_CHILDREN,AMT_INCOME_TOTAL,AMT_CREDIT,AMT_ANNUITY,AMT_GOODS_PRICE,REGION_POPULATION_RELATIVE,DAYS_BIRTH,DAYS_EMPLOYED,DAYS_REGISTRATION,DAYS_ID_PUBLISH,OWN_CAR_AGE,FLAG_MOBIL,FLAG_EMP_PHONE,FLAG_WORK_PHONE,FLAG_CONT_MOBILE,FLAG_PHONE,FLAG_EMAIL,CNT_FAM_MEMBERS,REGION_RATING_CLIENT,REGION_RATING_CLIENT_W_CITY,REG_REGION_NOT_LIVE_REGION,REG_REGION_NOT_WORK_REGION,LIVE_REGION_NOT_WORK_REGION,REG_CITY_NOT_LIVE_CITY,REG_CITY_NOT_WORK_CITY,LIVE_CITY_NOT_WORK_CITY,EXT_SOURCE_1,EXT_SOURCE_2,EXT_SOURCE_3,OBS_30_CNT_SOCIAL_CIRCLE,DEF_30_CNT_SOCIAL_CIRCLE,OBS_60_CNT_SOCIAL_CIRCLE,DEF_60_CNT_SOCIAL_CIRCLE,DAYS_LAST_PHONE_CHANGE,AMT_REQ_CREDIT_BUREAU_HOUR,AMT_REQ_CREDIT_BUREAU_MON,AMT_REQ_CREDIT_BUREAU_QRT,AMT_REQ_CREDIT_BUREAU_YEAR,CREDIT_INCOME_RATIO,CREDIT_ANNUITY_RATIO,CREDIT_GOODS_RATIO,ANNUITY_INCOME_RATIO,GOODS_INCOME_RATIO,ANNUITY_GOODS_RATIO,EMPLOYED_BIRTH_RATIO,CREDIT_INCOME_PRODUCT,CREDIT_ANNUITY_PRODUCT,CREDIT_GOODS_PRODUCT,INCOME_ANNUITY_PRODUCT,INCOME_GOODS_PRODUCT,ANNUITY_GOODS_PRODUCT,EMPLOYED_BIRTH_PRODUCT,CREDIT_EXT_SOURCE_1_RATIO,INCOME_EXT_SOURCE_1_RATIO,CREDIT_EXT_SOURCE_1_PRODUCT,INCOME_EXT_SOURCE_1_PRODUCT,CREDIT_EXT_SOURCE_2_RATIO,INCOME_EXT_SOURCE_2_RATIO,CREDIT_EXT_SOURCE_2_PRODUCT,INCOME_EXT_SOURCE_2_PRODUCT,CREDIT_EXT_SOURCE_3_RATIO,INCOME_EXT_SOURCE_3_RATIO,CREDIT_EXT_SOURCE_3_PRODUCT,INCOME_EXT_SOURCE_3_PRODUCT,CREDIT_REGISTRATION_RATIO,CREDIT_REGISTRATION_PRODUCT,NAME_CONTRACT_TYPE_Cash loans,NAME_CONTRACT_TYPE_Revolving loans,CODE_GENDER_F,CODE_GENDER_M,CODE_GENDER_XNA,FLAG_OWN_CAR_N,FLAG_OWN_CAR_Y,FLAG_OWN_REALTY_N,FLAG_OWN_REALTY_Y,NAME_TYPE_SUITE_Children,NAME_TYPE_SUITE_Family,NAME_TYPE_SUITE_Group of people,NAME_TYPE_SUITE_Other_A,NAME_TYPE_SUITE_Other_B,"NAME_TYPE_SUITE_Spouse, partner",NAME_TYPE_SUITE_Unaccompanied,NAME_INCOME_TYPE_Businessman,NAME_INCOME_TYPE_Commercial associate,NAME_INCOME_TYPE_Maternity leave,NAME_INCOME_TYPE_Pensioner,NAME_INCOME_TYPE_State servant,NAME_INCOME_TYPE_Student,NAME_INCOME_TYPE_Unemployed,NAME_INCOME_TYPE_Working,NAME_EDUCATION_TYPE_Academic degree,NAME_EDUCATION_TYPE_Higher education,NAME_EDUCATION_TYPE_Incomplete higher,NAME_EDUCATION_TYPE_Lower secondary,NAME_EDUCATION_TYPE_Secondary / secondary special,NAME_FAMILY_STATUS_Civil marriage,NAME_FAMILY_STATUS_Married,NAME_FAMILY_STATUS_Separated,NAME_FAMILY_STATUS_Single / not married,NAME_FAMILY_STATUS_Unknown,NAME_FAMILY_STATUS_Widow,NAME_HOUSING_TYPE_Co-op apartment,NAME_HOUSING_TYPE_House / apartment,NAME_HOUSING_TYPE_Municipal apartment,NAME_HOUSING_TYPE_Office apartment,NAME_HOUSING_TYPE_Rented apartment,NAME_HOUSING_TYPE_With parents,OCCUPATION_TYPE_Accountants,OCCUPATION_TYPE_Cleaning staff,OCCUPATION_TYPE_Cooking staff,OCCUPATION_TYPE_Core staff,OCCUPATION_TYPE_Drivers,OCCUPATION_TYPE_HR staff,OCCUPATION_TYPE_High skill tech staff,OCCUPATION_TYPE_IT staff,OCCUPATION_TYPE_Laborers,OCCUPATION_TYPE_Low-skill Laborers,OCCUPATION_TYPE_Managers,OCCUPATION_TYPE_Medicine staff,OCCUPATION_TYPE_Private service staff,OCCUPATION_TYPE_Realty agents,OCCUPATION_TYPE_Sales staff,OCCUPATION_TYPE_Secretaries,OCCUPATION_TYPE_Security staff,OCCUPATION_TYPE_Waiters/barmen staff,ORGANIZATION_TYPE_Advertising,ORGANIZATION_TYPE_Agriculture,ORGANIZATION_TYPE_Bank,ORGANIZATION_TYPE_Business Entity Type 1,ORGANIZATION_TYPE_Business Entity Type 2,ORGANIZATION_TYPE_Business Entity Type 3,ORGANIZATION_TYPE_Cleaning,ORGANIZATION_TYPE_Construction,ORGANIZATION_TYPE_Culture,ORGANIZATION_TYPE_Electricity,ORGANIZATION_TYPE_Emergency,ORGANIZATION_TYPE_Government,ORGANIZATION_TYPE_Hotel,ORGANIZATION_TYPE_Housing,ORGANIZATION_TYPE_Industry: type 1,ORGANIZATION_TYPE_Industry: type 10,ORGANIZATION_TYPE_Industry: type 11,ORGANIZATION_TYPE_Industry: type 12,ORGANIZATION_TYPE_Industry: type 13,ORGANIZATION_TYPE_Industry: type 2,ORGANIZATION_TYPE_Industry: type 3,ORGANIZATION_TYPE_Industry: type 4,ORGANIZATION_TYPE_Industry: type 5,ORGANIZATION_TYPE_Industry: type 6,ORGANIZATION_TYPE_Industry: type 7,ORGANIZATION_TYPE_Industry: type 8,ORGANIZATION_TYPE_Industry: type 9,ORGANIZATION_TYPE_Insurance,ORGANIZATION_TYPE_Kindergarten,ORGANIZATION_TYPE_Legal Services,ORGANIZATION_TYPE_Medicine,ORGANIZATION_TYPE_Military,ORGANIZATION_TYPE_Mobile,ORGANIZATION_TYPE_Other,ORGANIZATION_TYPE_Police,ORGANIZATION_TYPE_Postal,ORGANIZATION_TYPE_Realtor,ORGANIZATION_TYPE_Religion,ORGANIZATION_TYPE_Restaurant,ORGANIZATION_TYPE_School,ORGANIZATION_TYPE_Security,ORGANIZATION_TYPE_Security Ministries,ORGANIZATION_TYPE_Self-employed,ORGANIZATION_TYPE_Services,ORGANIZATION_TYPE_Telecom,ORGANIZATION_TYPE_Trade: type 1,ORGANIZATION_TYPE_Trade: type 2,ORGANIZATION_TYPE_Trade: type 3,ORGANIZATION_TYPE_Trade: type 4,ORGANIZATION_TYPE_Trade: type 5,ORGANIZATION_TYPE_Trade: type 6,ORGANIZATION_TYPE_Trade: type 7,ORGANIZATION_TYPE_Transport: type 1,ORGANIZATION_TYPE_Transport: type 2,ORGANIZATION_TYPE_Transport: type 3,ORGANIZATION_TYPE_Transport: type 4,ORGANIZATION_TYPE_University,ORGANIZATION_TYPE_XNA
0,,1,144000.0,961146.0,28233.0,688500.0,0.025164,-12108,-2372,-2446.0,-3022,,1,1,0,1,1,0,3.0,2,2,0,0,0,0,0,0,,0.720416,,2.0,0.0,2.0,0.0,-1.0,,,,,6.674625,34.043354,1.396000,0.196062,4.781250,0.041007,0.195904,1.384050e+11,2.713604e+10,6.617490e+11,4.065552e+09,9.914400e+10,1.943842e+10,28720176,,,,,1.334153e+06,1.998843e+05,692424.556171,103739.843987,,,,,392.946034,-2.350963e+09,True,False,True,False,False,True,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
1,,0,103500.0,296280.0,16069.5,225000.0,0.007020,-17907,-1712,-10450.0,-253,,1,1,1,1,0,0,2.0,2,2,0,0,0,0,0,0,,0.287306,,5.0,0.0,5.0,0.0,-212.0,,,,,2.862609,18.437412,1.316800,0.155261,2.173913,0.071420,0.095605,3.066498e+10,4.761071e+09,6.666300e+10,1.663193e+09,2.328750e+10,3.615638e+09,30656784,,,,,1.031230e+06,3.602415e+05,85123.098282,29736.197760,,,,,28.352153,-3.096126e+09,True,False,True,False,False,True,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,True,False,True,False,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
2,,1,180000.0,183694.5,11236.5,139500.0,0.006852,-15221,-553,-1056.0,-4495,,1,1,0,1,0,0,2.0,3,3,0,0,0,1,1,0,,0.352456,0.389339,7.0,0.0,7.0,0.0,-428.0,0.0,1.0,1.0,1.0,1.020525,16.348018,1.316806,0.062425,0.775000,0.080548,0.036331,3.306501e+10,2.064083e+09,2.562538e+10,2.022570e+09,2.511000e+10,1.567492e+09,8417213,,,,,5.211828e+05,5.107007e+05,64744.220975,63442.072438,4.718102e+05,4.623211e+05,71519.394699,70080.982532,173.953125,-1.939814e+08,True,False,True,False,False,True,False,False,True,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False
3,,2,225000.0,450000.0,22500.0,450000.0,0.035792,-11217,-1438,-6096.0,-1189,,1,1,0,1,0,0,4.0,2,2,0,0,0,0,0,0,,0.470384,0.217629,2.0,0.0,2.0,0.0,-442.0,0.0,0.0,0.0,3.0,2.000000,20.000000,1.000000,0.100000,2.000000,0.050000,0.128198,1.012500e+11,1.012500e+10,2.025000e+11,5.062500e+09,1.012500e+11,1.012500e+10,16130046,,,,,9.566629e+05,4.783314e+05,211672.867328,105836.433664,2.067734e+06,1.033867e+06,97932.834125,48966.417063,73.818898,-2.743200e+09,False,True,True,False,False,True,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
4,,2,144000.0,545040.0,26640.0,450000.0,0.020713,-11415,-2362,-3257.0,-1728,14.0,1,1,0,1,0,0,4.0,3,3,0,0,0,0,0,0,0.269931,0.373133,,2.0,0.0,2.0,0.0,-1333.0,0.0,0.0,0.0,3.0,3.785000,20.459459,1.211200,0.185000,3.125000,0.059200,0.206921,7.848576e+10,1.451987e+10,2.452680e+11,3.836160e+09,6.480000e+10,1.198800e+10,26962230,2.019176e+06,5.334678e+05,147123.159144,38870.055256,1.460710e+06,3.859207e+05,203372.233717,53731.105341,,,,,167.344182,-1.775195e+09,True,False,True,False,False,False,True,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61495,,0,315000.0,1288350.0,37800.0,1125000.0,0.007020,-11430,-792,-9772.0,-2705,12.0,1,1,0,1,0,0,2.0,2,2,0,0,0,0,0,0,0.263678,0.018172,0.307737,0.0,0.0,0.0,0.0,-1.0,0.0,1.0,1.0,1.0,4.090000,34.083333,1.145200,0.120000,3.571429,0.033600,0.069291,4.058302e+11,4.869963e+10,1.449394e+12,1.190700e+10,3.543750e+11,4.252500e+10,9052560,4.886047e+06,1.194632e+06,339710.101547,83058.704535,7.089224e+07,1.733307e+07,23412.355884,5724.292392,4.186520e+06,1.023599e+06,396472.572780,96937.059359,131.840974,-1.258976e+10,True,False,True,False,False,False,True,False,True,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False
61496,,0,90000.0,273636.0,15408.0,247500.0,0.006671,-17181,-839,-5125.0,-668,,1,1,0,1,0,0,2.0,2,2,0,0,0,1,1,0,,0.668578,0.434733,0.0,0.0,0.0,0.0,-2732.0,0.0,0.0,0.0,0.0,3.040400,17.759346,1.105600,0.171200,2.750000,0.062255,0.048833,2.462724e+10,4.216183e+09,6.772491e+10,1.386720e+09,2.227500e+10,3.813480e+09,14414859,,,,,4.092800e+05,1.346139e+05,182946.981663,60172.010809,6.294328e+05,2.070230e+05,118958.667255,39125.992388,53.392390,-1.402384e+09,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,True,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
61497,,0,144000.0,291384.0,26725.5,270000.0,0.018801,-14515,-722,-7225.0,-4795,,1,1,1,1,0,0,1.0,2,2,0,0,0,0,0,0,0.510226,0.574151,,0.0,0.0,0.0,0.0,-615.0,0.0,0.0,1.0,0.0,2.023500,10.902846,1.079200,0.185594,1.875000,0.098983,0.049742,4.195930e+10,7.787383e+09,7.867368e+10,3.848472e+09,3.888000e+10,7.215885e+09,10479830,5.710864e+05,2.822270e+05,148671.837945,73472.615737,5.075035e+05,2.508048e+05,167298.313471,82677.693833,,,,,40.329965,-2.105249e+09,True,False,True,False,False,True,False,False,True,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
61498,,1,193500.0,746280.0,59094.0,675000.0,0.002042,-16914,-8756,-5233.0,-231,,1,1,0,1,0,0,3.0,3,3,0,0,0,0,1,1,0.353295,0.226714,,2.0,0.0,2.0,0.0,-1610.0,0.0,0.0,1.0,3.0,3.856744,12.628693,1.105600,0.305395,3.488372,0.087547,0.517678,1.444052e+11,4.410067e+10,5.037390e+11,1.143469e+10,1.306125e+11,3.988845e+10,148098984,2.112334e+06,5.476987e+05,263657.340583,68362.672727,3.291712e+06,8.534951e+05,169191.994474,43869.125436,,,,,142.610357,-3.905283e+09,True,False,True,False,False,True,False,False,True,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False


In [57]:
test.drop("TARGET", axis=1, inplace=True)
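The `test` output above has a `TARGET` column that is entirely empty, and it carries exactly the same one-hot columns as `train`, including rare levels such as `CODE_GENDER_XNA`. That layout is what you get when train and test are stacked before one-hot encoding and then split back apart. The earlier preprocessing cells are not shown here, so treat this as an assumption about how the frames were built. A toy sketch of the pattern, with made-up data:

```python
import pandas as pd

# Toy frames (hypothetical data): test has no TARGET and contains a level unseen in train
train_demo = pd.DataFrame({"TARGET": [0, 1], "CODE_GENDER": ["F", "M"]})
test_demo = pd.DataFrame({"CODE_GENDER": ["F", "XNA"]})

# Stack the two frames so get_dummies sees every category level once
stacked = pd.concat([train_demo, test_demo], ignore_index=True)
encoded = pd.get_dummies(stacked, columns=["CODE_GENDER"])

# Split back by row count; test rows have TARGET = NaN because they carry no label
n_train = len(train_demo)
train_enc = encoded.iloc[:n_train]
test_enc = encoded.iloc[n_train:].drop(columns="TARGET")

print(train_enc)
print(test_enc)
```

Because both parts come from one `get_dummies` call, they end up with identical dummy columns, and the only extra column on the test side is the all-NaN `TARGET`, which is why it has to be dropped.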

In [58]:
# Import libraries
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score
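Models in this notebook are compared by AUC, computed with `roc_auc_score`. As a rough illustration of what that number means: AUC equals the probability that a randomly chosen positive (a default) gets a higher predicted score than a randomly chosen negative, with ties counted as one half. The sketch below computes that pairwise definition directly with NumPy on made-up labels and scores and compares it with `roc_auc_score`. The helper `pairwise_auc` is hypothetical, and it uses O(n_pos × n_neg) memory, so it is only meant for small sanity checks, not for the full training set.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pairwise_auc(y_true, y_score):
    """AUC as P(score of random positive > score of random negative), ties count 0.5."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every positive score with every negative score via broadcasting
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))

# Toy data: 3 positives, 3 negatives; 8 of the 9 positive/negative pairs are ranked correctly
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
print(pairwise_auc(y_true, y_score))   # 8/9, about 0.889
print(roc_auc_score(y_true, y_score))  # same value
```

Only the ranking of the scores matters, which is why AUC is a convenient metric for a default-probability task: any monotone rescaling of the predictions leaves it unchanged.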

Split the data into explanatory variables (features) and the target variable. The target here is "TARGET", so every other column is used as a feature.

In [59]:
# Split into features and target
train_x = train.drop("TARGET", axis=1)
train_y = train["TARGET"]
test_x = test
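Because the model is trained on `train_x` and later applied to `test_x`, both frames should contain the same feature columns in the same order. The outputs above suggest they already do; a cheap check is still useful after any further feature engineering. A minimal sketch with a hypothetical helper, `align_test_columns`, that fails loudly on a mismatch and otherwise reorders the test columns to match training:

```python
import pandas as pd

def align_test_columns(train_x: pd.DataFrame, test_x: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return test_x with train_x's columns, in train_x's order."""
    missing = [c for c in train_x.columns if c not in test_x.columns]
    extra = [c for c in test_x.columns if c not in train_x.columns]
    if missing or extra:
        raise ValueError(f"column mismatch: missing={missing}, extra={extra}")
    return test_x[list(train_x.columns)]

# Toy usage: same columns in a different order
tr = pd.DataFrame({"a": [1], "b": [2]})
te = pd.DataFrame({"b": [3], "a": [4]})
print(align_test_columns(tr, te).columns.tolist())  # ['a', 'b']
```

In the notebook this would be applied as `test_x = align_test_columns(train_x, test_x)`.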

In [60]:
# Split the training data into a training set and a validation set
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=71)
tr_idx, va_idx = list(kf.split(train_x))[0]
tr_x, va_x = train_x.iloc[tr_idx], train_x.iloc[va_idx]
tr_y, va_y = train_y.iloc[tr_idx], train_y.iloc[va_idx]
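The hold-out split above uses plain `KFold`, while the CV inside `evaluate` uses `StratifiedKFold`. Since defaults are a minority class, a stratified split keeps the positive rate of `va_y` as close as possible to that of `train_y`. A sketch of a stratified version of the same "first fold as hold-out" idea, shown on synthetic imbalanced labels (the roughly 8% positive rate is an arbitrary choice for the demo, not a figure from this dataset). The demo uses its own variable names so it does not overwrite `tr_x`/`va_x`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

def stratified_holdout(X, y, n_splits=5, random_state=71):
    """Use the first StratifiedKFold fold as a train/validation split."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    tr_idx, va_idx = next(skf.split(X, y))
    return X.iloc[tr_idx], X.iloc[va_idx], y.iloc[tr_idx], y.iloc[va_idx]

# Synthetic, imbalanced demo data (about 8% positives)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({"f": rng.normal(size=1000)})
y_demo = pd.Series((rng.random(1000) < 0.08).astype(int))

X_tr, X_va, y_tr, y_va = stratified_holdout(X_demo, y_demo)
# Positive rate overall, in the training part, and in the validation part
print(y_demo.mean(), y_tr.mean(), y_va.mean())
```

In the notebook, the equivalent change would be to build `kf` with `StratifiedKFold(n_splits=5, shuffle=True, random_state=71)` and pass `train_y` to `kf.split(train_x, train_y)`.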

In [62]:
import optuna
from catboost import CatBoostClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def evaluate(features):
    # Score a feature subset by stratified CV on the training split (tr_x, tr_y).
    # Note: the hold-out split (va_x, va_y) is not used here.
    X = tr_x[features]
    y = tr_y

    # CV settings (adjust the number of folds as needed)
    n_splits = 5
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=13)

    params = {
        "random_state": 13,
        "verbose": 0
    }

    # AUC of each CV fold
    auc_scores = []

    # CV loop
    for train_index, valid_index in skf.split(X, y):
        X_train_fold, X_valid_fold = X.iloc[train_index], X.iloc[valid_index]
        y_train_fold, y_valid_fold = y.iloc[train_index], y.iloc[valid_index]

        model = CatBoostClassifier(**params)
        model.fit(X_train_fold, y_train_fold)

        fold_pred = model.predict_proba(X_valid_fold)[:, 1]
        fold_auc = roc_auc_score(y_valid_fold, fold_pred)
        auc_scores.append(fold_auc)

    # Mean AUC over all folds
    mean_auc = sum(auc_scores) / len(auc_scores)

    return mean_auc

def objective(trial):
    # Feature selection with trial.suggest_categorical:
    # each column is independently switched on (True) or off (False)
    selected_features = []
    for col in train_x.columns:
        use_feature = trial.suggest_categorical(col, [True, False])
        if use_feature:
            selected_features.append(col)

    # If no feature was selected, skip training and return the worst possible AUC
    if len(selected_features) == 0:
        return 0.0

    # Score the selected subset with the evaluation function
    score = evaluate(selected_features)
    return score

# Run the Optuna optimization
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=20)  # adjust the number of trials as needed
best_trial = study.best_trial

print("Best Score:", best_trial.value)
print("Best Feature Set:")
best_features = [k for k, v in best_trial.params.items() if v]
print(best_features)



[I 2024-12-10 04:49:43,862] A new study created in memory with name: no-name-5babaab0-1289-48f6-9c26-43a1c6164dec
[I 2024-12-10 04:53:27,309] Trial 0 finished with value: 0.754416469378179 and parameters: {'CNT_CHILDREN': False, 'AMT_INCOME_TOTAL': False, 'AMT_CREDIT': False, 'AMT_ANNUITY': False, 'AMT_GOODS_PRICE': True, 'REGION_POPULATION_RELATIVE': True, 'DAYS_BIRTH': True, 'DAYS_EMPLOYED': True, 'DAYS_REGISTRATION': False, 'DAYS_ID_PUBLISH': True, 'OWN_CAR_AGE': False, 'FLAG_MOBIL': True, 'FLAG_EMP_PHONE': True, 'FLAG_WORK_PHONE': True, 'FLAG_CONT_MOBILE': False, 'FLAG_PHONE': False, 'FLAG_EMAIL': True, 'CNT_FAM_MEMBERS': False, 'REGION_RATING_CLIENT': False, 'REGION_RATING_CLIENT_W_CITY': False, 'REG_REGION_NOT_LIVE_REGION': True, 'REG_REGION_NOT_WORK_REGION': True, 'LIVE_REGION_NOT_WORK_REGION': False, 'REG_CITY_NOT_LIVE_CITY': False, 'REG_CITY_NOT_WORK_CITY': True, 'LIVE_CITY_NOT_WORK_CITY': False, 'EXT_SOURCE_1': True, 'EXT_SOURCE_2': True, 'EXT_SOURCE_3': False, 'OBS_30_CNT_SO

Best Score: 0.7558534017076193
Best Feature Set:
['AMT_ANNUITY', 'AMT_GOODS_PRICE', 'REGION_POPULATION_RELATIVE', 'DAYS_BIRTH', 'DAYS_EMPLOYED', 'DAYS_REGISTRATION', 'DAYS_ID_PUBLISH', 'OWN_CAR_AGE', 'FLAG_MOBIL', 'FLAG_EMP_PHONE', 'FLAG_WORK_PHONE', 'FLAG_EMAIL', 'REG_REGION_NOT_LIVE_REGION', 'REG_CITY_NOT_WORK_CITY', 'EXT_SOURCE_1', 'EXT_SOURCE_2', 'OBS_30_CNT_SOCIAL_CIRCLE', 'DEF_30_CNT_SOCIAL_CIRCLE', 'DAYS_LAST_PHONE_CHANGE', 'AMT_REQ_CREDIT_BUREAU_HOUR', 'AMT_REQ_CREDIT_BUREAU_QRT', 'AMT_REQ_CREDIT_BUREAU_YEAR', 'CREDIT_ANNUITY_RATIO', 'CREDIT_GOODS_RATIO', 'ANNUITY_INCOME_RATIO', 'ANNUITY_GOODS_RATIO', 'CREDIT_INCOME_PRODUCT', 'INCOME_ANNUITY_PRODUCT', 'INCOME_GOODS_PRODUCT', 'CREDIT_EXT_SOURCE_1_RATIO', 'INCOME_EXT_SOURCE_1_PRODUCT', 'CREDIT_EXT_SOURCE_2_PRODUCT', 'CREDIT_EXT_SOURCE_3_RATIO', 'CREDIT_EXT_SOURCE_3_PRODUCT', 'INCOME_EXT_SOURCE_3_PRODUCT', 'NAME_CONTRACT_TYPE_Revolving loans', 'CODE_GENDER_M', 'FLAG_OWN_CAR_Y', 'FLAG_OWN_REALTY_N', 'NAME_TYPE_SUITE_Family', 'NAME_
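`evaluate` scores every candidate subset by CV on `tr_x` only; `va_x`/`va_y` are never used for scoring. Because the search keeps the best of many noisy CV scores, the reported "Best Score" can be somewhat optimistic. One way to check, assuming the hold-out is kept for exactly this purpose, is to fit once on `tr_x` and compare `best_features` against all features on `va_x`. The hypothetical helper `holdout_auc` below takes a zero-argument model factory so it works with CatBoost or any scikit-learn style classifier; the self-test uses `LogisticRegression` on synthetic data only to keep it light.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def holdout_auc(make_model, features, X_tr, y_tr, X_va, y_va):
    """Hypothetical helper: fit a fresh model on X_tr[features], return AUC on X_va[features]."""
    model = make_model()
    model.fit(X_tr[features], y_tr)
    return roc_auc_score(y_va, model.predict_proba(X_va[features])[:, 1])

# Synthetic demo: "signal" drives the label, "noise" does not
rng = np.random.default_rng(0)
n = 2000
X_demo = pd.DataFrame({"signal": rng.normal(size=n), "noise": rng.normal(size=n)})
y_demo = pd.Series((X_demo["signal"] + 0.5 * rng.normal(size=n) > 1.0).astype(int))
X_tr, X_va = X_demo.iloc[:1500], X_demo.iloc[1500:]
y_tr, y_va = y_demo.iloc[:1500], y_demo.iloc[1500:]

print("signal only:", holdout_auc(LogisticRegression, ["signal"], X_tr, y_tr, X_va, y_va))
print("noise only: ", holdout_auc(LogisticRegression, ["noise"], X_tr, y_tr, X_va, y_va))
```

In the notebook, the comparison would look like `holdout_auc(lambda: CatBoostClassifier(random_state=13, verbose=0), best_features, tr_x, tr_y, va_x, va_y)` versus the same call with `list(train_x.columns)`. If the selected subset does not beat the full feature set on the hold-out, the CV gain seen during the search is probably selection noise.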

In [63]:
train[best_features]

Unnamed: 0,AMT_ANNUITY,AMT_GOODS_PRICE,REGION_POPULATION_RELATIVE,DAYS_BIRTH,DAYS_EMPLOYED,DAYS_REGISTRATION,DAYS_ID_PUBLISH,OWN_CAR_AGE,FLAG_MOBIL,FLAG_EMP_PHONE,FLAG_WORK_PHONE,FLAG_EMAIL,REG_REGION_NOT_LIVE_REGION,REG_CITY_NOT_WORK_CITY,EXT_SOURCE_1,EXT_SOURCE_2,OBS_30_CNT_SOCIAL_CIRCLE,DEF_30_CNT_SOCIAL_CIRCLE,DAYS_LAST_PHONE_CHANGE,AMT_REQ_CREDIT_BUREAU_HOUR,AMT_REQ_CREDIT_BUREAU_QRT,AMT_REQ_CREDIT_BUREAU_YEAR,CREDIT_ANNUITY_RATIO,CREDIT_GOODS_RATIO,ANNUITY_INCOME_RATIO,ANNUITY_GOODS_RATIO,CREDIT_INCOME_PRODUCT,INCOME_ANNUITY_PRODUCT,INCOME_GOODS_PRODUCT,CREDIT_EXT_SOURCE_1_RATIO,INCOME_EXT_SOURCE_1_PRODUCT,CREDIT_EXT_SOURCE_2_PRODUCT,CREDIT_EXT_SOURCE_3_RATIO,CREDIT_EXT_SOURCE_3_PRODUCT,INCOME_EXT_SOURCE_3_PRODUCT,NAME_CONTRACT_TYPE_Revolving loans,CODE_GENDER_M,FLAG_OWN_CAR_Y,FLAG_OWN_REALTY_N,NAME_TYPE_SUITE_Family,NAME_TYPE_SUITE_Group of people,NAME_INCOME_TYPE_Commercial associate,NAME_INCOME_TYPE_Student,NAME_INCOME_TYPE_Unemployed,NAME_INCOME_TYPE_Working,NAME_EDUCATION_TYPE_Academic degree,NAME_EDUCATION_TYPE_Higher education,NAME_EDUCATION_TYPE_Incomplete higher,NAME_EDUCATION_TYPE_Secondary / secondary special,NAME_FAMILY_STATUS_Unknown,NAME_FAMILY_STATUS_Widow,NAME_HOUSING_TYPE_Co-op apartment,OCCUPATION_TYPE_Accountants,OCCUPATION_TYPE_Cooking staff,OCCUPATION_TYPE_Core staff,OCCUPATION_TYPE_HR staff,OCCUPATION_TYPE_High skill tech staff,OCCUPATION_TYPE_IT staff,OCCUPATION_TYPE_Laborers,OCCUPATION_TYPE_Managers,OCCUPATION_TYPE_Medicine staff,OCCUPATION_TYPE_Private service staff,OCCUPATION_TYPE_Realty agents,OCCUPATION_TYPE_Sales staff,OCCUPATION_TYPE_Secretaries,OCCUPATION_TYPE_Waiters/barmen staff,ORGANIZATION_TYPE_Advertising,ORGANIZATION_TYPE_Agriculture,ORGANIZATION_TYPE_Bank,ORGANIZATION_TYPE_Business Entity Type 2,ORGANIZATION_TYPE_Business Entity Type 3,ORGANIZATION_TYPE_Cleaning,ORGANIZATION_TYPE_Construction,ORGANIZATION_TYPE_Electricity,ORGANIZATION_TYPE_Government,ORGANIZATION_TYPE_Industry: type 3,ORGANIZATION_TYPE_Industry: type 5,ORGANIZATION_TYPE_Industry: type 6,ORGANIZATION_TYPE_Industry: type 7,ORGANIZATION_TYPE_Industry: type 9,ORGANIZATION_TYPE_Insurance,ORGANIZATION_TYPE_Medicine,ORGANIZATION_TYPE_Military,ORGANIZATION_TYPE_Mobile,ORGANIZATION_TYPE_Other,ORGANIZATION_TYPE_Police,ORGANIZATION_TYPE_Religion,ORGANIZATION_TYPE_Restaurant,ORGANIZATION_TYPE_School,ORGANIZATION_TYPE_Services,ORGANIZATION_TYPE_Trade: type 1,ORGANIZATION_TYPE_Trade: type 4,ORGANIZATION_TYPE_Transport: type 2,ORGANIZATION_TYPE_XNA
0,36328.5,675000.0,0.010032,-9233,-878,-333.0,-522,,1,1,1,0,0,1,,0.372591,0.0,0.0,-292.0,,,,20.787811,1.118800,0.322920,0.053820,8.495888e+10,4.086956e+09,7.593750e+10,,,281376.735276,,,,False,False,False,True,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False
1,16893.0,585000.0,0.008019,-20148,365243,-4469.0,-3436,,1,0,0,0,0,0,,0.449567,0.0,0.0,-617.0,0.0,0.0,1.0,34.629728,1.000000,0.075080,0.028877,1.316250e+11,3.800925e+09,1.316250e+11,,,262996.646938,1.057549e+06,323601.348781,124462.057223,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True
2,18256.5,270000.0,0.004960,-18496,-523,-3640.0,-2050,,1,1,1,0,0,0,,0.569503,4.0,0.0,-542.0,,,,18.303180,1.237600,0.338083,0.067617,1.804421e+10,9.858510e+08,1.458000e+10,,,190300.683015,,,,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
3,8901.0,135000.0,0.005002,-24177,365243,-4950.0,-3951,,1,0,0,0,0,0,,0.105235,0.0,0.0,0.0,0.0,0.0,0.0,17.168857,1.132000,0.131867,0.065933,1.031535e+10,6.008175e+08,9.112500e+09,,,16081.949503,1.991078e+05,117292.880853,51807.809564,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True
4,21546.0,234000.0,0.006296,-10685,-697,-5101.0,-3226,,1,1,1,0,0,1,0.342344,0.202490,0.0,0.0,-1243.0,0.0,0.0,4.0,12.580827,1.158404,0.136800,0.092077,4.269297e+10,3.393495e+09,3.685500e+10,7.917935e+05,53919.169690,54888.179808,4.051467e+05,181358.856556,105376.429428,False,True,False,True,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
171197,20772.0,337500.0,0.031329,-20529,-3059,-11581.0,-3689,21.0,1,1,1,0,0,1,,0.404560,0.0,0.0,-2341.0,0.0,1.0,0.0,19.464905,1.198000,0.249514,0.061547,3.366006e+10,1.729269e+09,2.809688e+10,,,163573.584369,5.259111e+05,310848.121884,64003.230438,False,True,False,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
171198,29065.5,450000.0,0.010006,-22083,-129,-4629.0,-1773,1.0,1,1,0,0,0,0,,0.608542,0.0,0.0,-1688.0,0.0,1.0,5.0,20.693606,1.336600,0.117436,0.064590,1.488638e+11,7.193711e+09,1.113750e+11,,,366019.514067,,,,False,False,True,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
171199,49216.5,1138500.0,0.006629,-11053,-2536,-4858.0,-3393,,1,1,0,1,0,0,,0.664305,2.0,1.0,-515.0,0.0,0.0,1.0,25.147755,1.087119,0.168262,0.043229,3.620227e+11,1.439583e+10,3.330112e+11,,,822200.041383,1.631981e+06,938651.337391,221829.970551,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False
171200,25186.5,225000.0,0.009657,-8505,-165,-3318.0,-1176,7.0,1,1,0,0,0,1,0.210918,0.627050,0.0,0.0,-1133.0,,,,9.522959,1.066000,0.223880,0.111940,2.698312e+10,2.833481e+09,2.531250e+10,1.137166e+06,23728.276305,150397.907248,,,,False,True,True,True,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False


In [64]:
# @title Cross-validated ensemble of XGBoost and CatBoost models
import gc
from itertools import combinations

import numpy as np
from catboost import CatBoostClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold
from xgboost import XGBClassifier


def get_models_trained(train, test, target, num_folds, train_eval):
    kf = KFold(n_splits=num_folds, shuffle=True, random_state=13)

    oof_predictions = np.zeros(len(train))
    test_predictions = np.zeros(len(test))
    train_eval_predictions = np.zeros(len(train_eval))

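    # Running total of validation AUC per combination
    # (despite the name sum_error, it accumulates AUC, so higher is better)
    sum_error = {}
    # Number of folds in which each combination was selected as the best
    combination_count = {}

    # Names of the individual models
    model_names = ["model1", "model2", "model3", "model4"]

    # Generate every combination of 1, 2, 3, and 4 models (15 in total)
    all_combinations = []
    for r in range(1, 5):
        all_combinations.extend(combinations(model_names, r))

    # Initialize the counters
    for comb in all_combinations:
        sum_error[comb] = 0
        combination_count[comb] = 0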

    for fold, (train_index, valid_index) in enumerate(kf.split(train, target)):
        print(f"Starting Fold {fold + 1}")
        X_train, X_valid = train[train_index], train[valid_index]
        y_train, y_valid = target[train_index], target[valid_index]

        # Per-fold AUC and validation predictions, keyed by combination
        loss_dict = {}
        valid_pred_dict = {}

        # Model 1: XGBClassifier
        params1 = {
            "n_estimators": 626,
            "max_depth": 3,
            "random_state": 13,
            "min_child_weight": 0.001190123543553736,
            "learning_rate": 0.010519736270936835,
            "subsample": 0.7304788478701394,
            "colsample_bylevel": 0.604447278915981,
            "colsample_bytree": 0.7616852136157319,
            "reg_alpha": 0.115175569924065,
            "reg_lambda": 0.07155347824929895
        }
        model1 = XGBClassifier(**params1)

        # Model 2: CatBoostClassifier
        params2 = {
            'iterations': 254,
            'depth': 5,
            'learning_rate': 0.08377009991199288,
            'l2_leaf_reg': 1,
            'bagging_temperature': 0.7204457890870082,
            'min_data_in_leaf': 4,
            "random_state": 13,
            "verbose": 0
        }
        model2 = CatBoostClassifier(**params2)

        # Model 3: XGBClassifier
        params3 = {
            "n_estimators": 897,
            "max_depth": 4,
            "min_child_weight": 1.9636282677053687,
            "learning_rate": 0.006151391207761763,
            "subsample": 0.8251910979922186,
            "colsample_bylevel": 0.6454805596196158,
            "colsample_bytree": 0.598042694363472,
            "reg_alpha": 2.1719824223479005,
            "reg_lambda": 0.009192236594241635,
        }
        model3 = XGBClassifier(**params3)

        # Model 4: CatBoostClassifier
        params4 = {
            "iterations": 940,
            "depth": 3,
            "learning_rate": 0.019908189422344794,
            "l2_leaf_reg": 1,
            "bagging_temperature": 0.5063893392618839,
            "min_data_in_leaf": 1,
            'random_state': 42
        }
        model4 = CatBoostClassifier(**params4, verbose=0)

        # Train each model and predict on the validation fold
        models = {
            "model1": model1,
            "model2": model2,
            "model3": model3,
            "model4": model4
        }

        for name, model in models.items():
            model.fit(X_train, y_train)
            preds = model.predict_proba(X_valid)[:, 1]
            auc = roc_auc_score(y_valid, preds)
            print(f"Fold {fold + 1} AUC for {name} = {auc}")
            loss_dict[(name,)] = auc  # 1-tuple key, matching the combination keys
            valid_pred_dict[(name,)] = preds
            sum_error[(name,)] += auc

        # Evaluate every 2-model and 3-model combination
        for r in [2, 3]:
            for comb in combinations(model_names, r):
                # Simple (unweighted) average of the member models' probabilities
                preds_comb = np.mean([valid_pred_dict[(comb_part,)] for comb_part in comb], axis=0)
                auc_comb = roc_auc_score(y_valid, preds_comb)
                print(f"Fold {fold + 1} AUC for {comb} = {auc_comb}")
                loss_dict[comb] = auc_comb
                valid_pred_dict[comb] = preds_comb
                sum_error[comb] += auc_comb

        # Ensemble of all four models
        preds_all = np.mean([valid_pred_dict[(model,)] for model in model_names], axis=0)
        auc_all = roc_auc_score(y_valid, preds_all)
        print(f"Fold {fold + 1} AUC for all models ensemble = {auc_all}")
        loss_dict[tuple(model_names)] = auc_all
        valid_pred_dict[tuple(model_names)] = preds_all
        sum_error[tuple(model_names)] += auc_all

        # Pick the combination with the highest validation AUC in this fold.
        # Because the choice is made on the same fold that supplies the OOF
        # predictions, the final OOF AUC is optimistically biased.
        best_combination = max(loss_dict, key=loss_dict.get)
        best_auc = loss_dict[best_combination]
        print(f"Fold {fold + 1} best combination: {best_combination} with AUC = {best_auc}")

        # Update OOF, test, and train_eval predictions with the chosen combination.
        # Every key is a tuple of model names, so single models and blends
        # need no separate branch.
        oof_predictions[valid_index] = valid_pred_dict[best_combination]
        test_preds = np.mean([models[m].predict_proba(test)[:, 1] for m in best_combination], axis=0)
        train_eval_preds = np.mean([models[m].predict_proba(train_eval)[:, 1] for m in best_combination], axis=0)

        # Count how often each combination is selected
        combination_count[best_combination] += 1

        # Average the test and train_eval predictions over the folds
        test_predictions += test_preds / kf.n_splits
        train_eval_predictions += train_eval_preds / kf.n_splits

        # Free memory; the models dict also references the fitted models, so delete it too
        del X_train, X_valid, y_train, y_valid, model1, model2, model3, model4, models
        gc.collect()

        print('---------------\n')

    final_AUC = roc_auc_score(target, oof_predictions)
    print(f"OOF AUC = {final_AUC}")

    # Show the summed AUC and the selection count for each combination
    print("Sum of AUC for each combination:")
    for comb, error in sum_error.items():
        print(f"{comb}: {error}")

    print("\nNumber of times each combination was selected:")
    for comb, count in combination_count.items():
        print(f"{comb}: {count}")

    return oof_predictions, test_predictions, train_eval_predictions
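The core step inside `get_models_trained` is scoring every simple-average blend of the four models on a validation fold. The following is a minimal, self-contained sketch of just that step, using synthetic labels and probabilities; the data and the helper name `score_all_combinations` are illustrative assumptions, not part of the notebook. With four models there are 4 + 6 + 4 + 1 = 15 candidate combinations.

```python
from itertools import combinations

import numpy as np
from sklearn.metrics import roc_auc_score

# Synthetic stand-ins for validation labels and per-model probabilities
# (illustrative data only, not the notebook's real predictions).
rng = np.random.default_rng(0)
y_valid = rng.integers(0, 2, size=1000)
valid_preds = {
    name: np.clip(0.5 * y_valid + rng.normal(0.25, 0.3, size=1000), 0, 1)
    for name in ["model1", "model2", "model3", "model4"]
}


def score_all_combinations(y_true, preds_by_model):
    """Return {combination tuple: AUC of the simple average of its members}."""
    names = list(preds_by_model)
    scores = {}
    for r in range(1, len(names) + 1):
        for comb in combinations(names, r):
            blended = np.mean([preds_by_model[n] for n in comb], axis=0)
            scores[comb] = roc_auc_score(y_true, blended)
    return scores


scores = score_all_combinations(y_valid, valid_preds)
best = max(scores, key=scores.get)
print(len(scores), best, round(scores[best], 4))
```

In the notebook the same dictionary is built inside the fold loop as `loss_dict`, and `max(loss_dict, key=loss_dict.get)` picks the winner.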


In [65]:
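# Split into features (X) and target (y); .values gives NumPy arrays, which get_models_trained indexes by row position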
X = train[list(best_features)].values
y = train["TARGET"].values
X_test = test[list(best_features)].values
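`get_models_trained` slices rows with `train[train_index]`, which works positionally on a NumPy array; on a DataFrame, `df[...]` with an array selects columns instead, which is why `.values` is used here. One caveat is sketched below on a toy frame (column names borrowed from the table above, values made up): when float columns are mixed with the bool one-hot columns, `.values` returns an `object` array. XGBoost and CatBoost appear to have coerced it in this run, but converting explicitly with `to_numpy(dtype=float)` is a safer assumption.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Toy frame (made-up values) mixing float columns, including NaN,
# with a bool one-hot column, like the dtypes in train[list(best_features)].
df = pd.DataFrame({
    "AMT_ANNUITY": [1000.0, 2000.0, 3000.0, 4000.0],
    "OWN_CAR_AGE": [np.nan, 5.0, np.nan, 10.0],
    "CODE_GENDER_M": [False, True, False, True],
})

# .values on mixed float/bool columns falls back to dtype=object.
print(df.values.dtype)  # object

# Explicit conversion: bools become 0.0/1.0 and NaN is preserved.
X = df.to_numpy(dtype=float)
print(X.dtype)  # float64

# Positional row selection, as get_models_trained does with train[train_index].
kf = KFold(n_splits=2, shuffle=True, random_state=13)
train_index, valid_index = next(iter(kf.split(X)))
X_train, X_valid = X[train_index], X[valid_index]
print(X_train.shape, X_valid.shape)  # (2, 3) (2, 3)
```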

In [66]:
oof_predictions, test_preds, train_preds = get_models_trained(X, X_test, y, 5, X)

Starting Fold 1
Fold 1 AUC for model1 = 0.743128519456653
Fold 1 AUC for model2 = 0.7527624788235782
Fold 1 AUC for model3 = 0.7453060421619297
Fold 1 AUC for model4 = 0.7468308898332493
Fold 1 AUC for ('model1', 'model2') = 0.7502719257002883
Fold 1 AUC for ('model1', 'model3') = 0.7443615531466973
Fold 1 AUC for ('model1', 'model4') = 0.7461578100132908
Fold 1 AUC for ('model2', 'model3') = 0.7509236188071304
Fold 1 AUC for ('model2', 'model4') = 0.750673993636522
Fold 1 AUC for ('model3', 'model4') = 0.7470671993910735
Fold 1 AUC for ('model1', 'model2', 'model3') = 0.7490996636496645
Fold 1 AUC for ('model1', 'model2', 'model4') = 0.7495094507817149
Fold 1 AUC for ('model1', 'model3', 'model4') = 0.7461458793150502
Fold 1 AUC for ('model2', 'model3', 'model4') = 0.7499748161236474
Fold 1 AUC for all models ensemble = 0.7488991007795276
Fold 1 best combination: ('model2',) with AUC = 0.7527624788235782
---------------

Starting Fold 2
Fold 2 AUC for model1 = 0.745528336790434
Fold 2
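In each fold the code keeps whichever combination scored best on that same validation fold, so the reported OOF AUC is optimistically biased, and the test predictions can mix different blends from fold to fold. One alternative, sketched below on synthetic data, is to pick a single combination by its total validation AUC across all folds (the quantity the notebook already accumulates in `sum_error`, although the function does not return it) and then build the OOF predictions from that fixed combination. The record layout `fold_records` and the helpers `blend` and `oof_for` are hypothetical. This still selects on the same folds, so it is less optimistic rather than unbiased; a fully unbiased estimate would need nested cross-validation.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold

# Synthetic labels and per-fold validation predictions (illustration only).
rng = np.random.default_rng(13)
n = 500
y = rng.integers(0, 2, size=n)
names = ["model1", "model2"]

# Hypothetical record per fold: validation indices plus each model's probabilities.
fold_records = []
kf = KFold(n_splits=5, shuffle=True, random_state=13)
for _, valid_index in kf.split(np.zeros((n, 1))):
    preds = {m: 0.4 * y[valid_index] + 0.6 * rng.random(len(valid_index)) for m in names}
    fold_records.append({"valid_index": valid_index, "preds": preds})

candidates = [("model1",), ("model2",), ("model1", "model2")]


def blend(record, comb):
    """Simple average of the chosen models' validation probabilities."""
    return np.mean([record["preds"][m] for m in comb], axis=0)


# Total validation AUC across folds for each combination (what sum_error holds).
total_auc = {
    c: sum(roc_auc_score(y[r["valid_index"]], blend(r, c)) for r in fold_records)
    for c in candidates
}
best = max(total_auc, key=total_auc.get)


def oof_for(comb):
    """OOF predictions built with the SAME combination in every fold."""
    oof = np.zeros(n)
    for r in fold_records:
        oof[r["valid_index"]] = blend(r, comb)
    return oof


oof = oof_for(best)
print(best, round(roc_auc_score(y, oof), 4))
```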

In [67]:
test_preds

array([0.04173568, 0.20366068, 0.10515281, ..., 0.05447447, 0.19043375,
       0.12228982])

In [68]:
# Store the predictions in the submission format
sample_sub['TARGET'] = test_preds
sample_sub

,SK_ID_CURR,TARGET
0,171202,0.041736
1,171203,0.203661
2,171204,0.105153
3,171205,0.110303
4,171206,0.142513
...,...,...
61495,232697,0.164670
61496,232698,0.035766
61497,232699,0.054474
61498,232700,0.190434
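Before writing the CSV, a few cheap checks catch common submission mistakes: wrong columns, a row count that does not match the test set, duplicate IDs, missing values, or values outside [0, 1]. The helper below is a hypothetical sketch, not part of the notebook; here it would be called as `check_submission(sample_sub, len(test))`.

```python
import numpy as np
import pandas as pd


def check_submission(sub, n_expected, id_col="SK_ID_CURR", target_col="TARGET"):
    """Hypothetical sanity checks for a probability submission; raises AssertionError on failure."""
    assert list(sub.columns) == [id_col, target_col], f"unexpected columns: {list(sub.columns)}"
    assert len(sub) == n_expected, f"{len(sub)} rows, expected {n_expected}"
    assert sub[id_col].is_unique, "duplicate IDs"
    assert sub[target_col].notna().all(), "missing predictions"
    assert sub[target_col].between(0, 1).all(), "predictions outside [0, 1]"
    return True


# Toy frame built from the first rows printed above.
sub = pd.DataFrame({
    "SK_ID_CURR": [171202, 171203, 171204],
    "TARGET": [0.041736, 0.203661, 0.105153],
})
print(check_submission(sub, n_expected=3))  # True

# A frame with a missing prediction fails the check.
bad = sub.assign(TARGET=[0.041736, np.nan, 0.105153])
try:
    check_submission(bad, n_expected=3)
except AssertionError as err:
    print("rejected:", err)  # rejected: missing predictions
```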


In [70]:
# Write the submission CSV file
sample_sub.to_csv('GCI-コンペ２_006.csv',index=False)

This concludes the Home Credit Default Risk competition tutorial. There is still plenty of room for improvement, so use this notebook and the earlier course materials as references and keep working to raise your score!