# Tabular Playground Series - Jul 2021
Continued from [last time](https://www.kaggle.com/astashiro/tps-jul2021-01eda).

## Examining the effect of blending in PyCaret
This notebook checks the effect of blending in PyCaret. For comparison, it uses the same dataset as last time.

In [None]:
!pip install pycaret==2.3.1

In [None]:
!pip install shap

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

from pycaret.regression import setup, compare_models, create_model, tune_model, finalize_model, blend_models, predict_model, interpret_model
import shap

In [None]:
train = pd.read_csv('../input/tabular-playground-series-jul-2021/train.csv')
test = pd.read_csv('../input/tabular-playground-series-jul-2021/test.csv')
sub = pd.read_csv('../input/tabular-playground-series-jul-2021/sample_submission.csv')

### Carbon monoxide

In [None]:
train1 = train.drop(['date_time','target_benzene','target_nitrogen_oxides'], axis=1)
train1.head()

In [None]:
reg1 = setup(data=train1, target='target_carbon_monoxide', session_id=1)

In [None]:
best1 = compare_models()

In [None]:
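# Train each candidate model with cross-validation (uses the most recent setup()).
catboost1 = create_model("catboost")
et1 = create_model("et")
lightgbm1 = create_model("lightgbm")
gbr1 = create_model("gbr")
rf1 = create_model("rf")
# Blend the five models and evaluate the blend with 4-fold cross-validation.
blend1 = blend_models(estimator_list=[catboost1, et1, lightgbm1, gbr1, rf1], fold=4)
# Score the blend on the hold-out set that setup() split off.
pred_h1 = predict_model(blend1)
# Refit on the full training data, then predict the test set.
final1 = finalize_model(blend1)
pred1 = predict_model(final1, data=test)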
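
As far as I understand, `blend_models` in PyCaret 2.x regression wraps the estimators in scikit-learn's `VotingRegressor`, which averages their predictions with equal weights by default. The sketch below only illustrates that averaging step. The prediction values and the `blend` helper are made up for illustration and are not taken from this notebook's models.

```python
from statistics import mean

# Hypothetical predictions from three models for the same four rows.
predictions = {
    "catboost": [2.1, 1.4, 3.0, 0.9],
    "lightgbm": [2.3, 1.2, 2.8, 1.1],
    "rf": [1.9, 1.6, 3.2, 1.0],
}


def blend(per_model_predictions):
    """Average the models' predictions row by row with equal weights."""
    rows = zip(*per_model_predictions.values())
    return [mean(row) for row in rows]


blended = blend(predictions)
print([round(value, 2) for value in blended])
```

Each blended value sits between the individual predictions for that row, so a model that overshoots can be offset by one that undershoots.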

In [None]:
interpret_model(catboost1)

In [None]:
interpret_model(lightgbm1)

In [None]:
interpret_model(rf1)

### Benzene

In [None]:
train2 = train.drop(['date_time','target_carbon_monoxide','target_nitrogen_oxides'], axis=1)
train2.head()

In [None]:
reg2 = setup(data=train2, target='target_benzene', session_id=2)

In [None]:
best2 = compare_models()

In [None]:
# Same workflow as for carbon monoxide: train five models, blend, evaluate, finalize, predict.
catboost2 = create_model("catboost")
et2 = create_model("et")
lightgbm2 = create_model("lightgbm")
gbr2 = create_model("gbr")
rf2 = create_model("rf")
blend2 = blend_models(estimator_list=[catboost2, et2, lightgbm2, gbr2, rf2], fold=4)
pred_h2 = predict_model(blend2)
final2 = finalize_model(blend2)
pred2 = predict_model(final2, data=test)

In [None]:
interpret_model(catboost2)

In [None]:
interpret_model(lightgbm2)

In [None]:
interpret_model(rf2)

### Nitrogen oxides

In [None]:
train3 = train.drop(['date_time','target_carbon_monoxide','target_benzene'], axis=1)
train3.head()

In [None]:
reg3 = setup(data=train3, target='target_nitrogen_oxides', session_id=3)

In [None]:
best3 = compare_models()

In [None]:
# Same workflow as for carbon monoxide: train five models, blend, evaluate, finalize, predict.
catboost3 = create_model("catboost")
et3 = create_model("et")
lightgbm3 = create_model("lightgbm")
gbr3 = create_model("gbr")
rf3 = create_model("rf")
blend3 = blend_models(estimator_list=[catboost3, et3, lightgbm3, gbr3, rf3], fold=4)
pred_h3 = predict_model(blend3)
final3 = finalize_model(blend3)
pred3 = predict_model(final3, data=test)

In [None]:
interpret_model(catboost3)

In [None]:
interpret_model(lightgbm3)

In [None]:
interpret_model(rf3)

Looking at CatBoost, LightGBM, and Random Forest, the models for which SHAP plots can be displayed, shows that the features each model emphasizes differ slightly. Will blending them improve the score?
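
One reason to expect a gain: if the models make partly different errors, averaging cancels some of them. The simulation below is a rough sketch under made-up assumptions (five hypothetical models whose errors share one common component and otherwise vary independently). It does not use this notebook's data or models.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

n_rows, n_models = 5000, 5
truth = [random.uniform(0.0, 10.0) for _ in range(n_rows)]

# Assumption: every model shares one error component and adds its own independent error.
shared_error = [random.gauss(0.0, 0.5) for _ in range(n_rows)]
model_preds = [
    [t + s + random.gauss(0.0, 1.0) for t, s in zip(truth, shared_error)]
    for _ in range(n_models)
]


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


single_rmse = [rmse(truth, preds) for preds in model_preds]
# Equal-weight blend: average the five predictions for each row.
blend_preds = [sum(row) / n_models for row in zip(*model_preds)]
blend_rmse = rmse(truth, blend_preds)

print("single models:", [round(score, 3) for score in single_rmse])
print("blend        :", round(blend_rmse, 3))
```

In this setup, averaging shrinks the independent part of the error but leaves the shared part untouched, so the size of the gain depends on how different the models really are.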

## Submission

Let's submit the blend of the five models.

In [None]:
# predict_model() in PyCaret 2.x returns its predictions in the 'Label' column.
sub.target_carbon_monoxide = pred1.Label
sub.target_benzene = pred2.Label
sub.target_nitrogen_oxides = pred3.Label
sub.to_csv('pycaretblend_submission.csv', index=False)
sub

**Public Score: 0.25441** (LightGBM only: 0.30025)

Blending improved the score!
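
To put the two scores in context, here is a small sketch. It assumes the leaderboard metric is the mean column-wise RMSLE (lower is better), which is how I recall this competition being scored. Treat that, and the helper names `rmsle` and `mean_columnwise_rmsle`, as assumptions rather than the official implementation.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for one target column."""
    squared_log_errors = [
        (math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


def mean_columnwise_rmsle(true_columns, pred_columns):
    """Average the per-column RMSLE over all target columns."""
    scores = [rmsle(t, p) for t, p in zip(true_columns, pred_columns)]
    return sum(scores) / len(scores)


# Public scores reported in this notebook.
blend_score = 0.25441
lightgbm_score = 0.30025

# Relative reduction in the score when going from LightGBM alone to the blend.
relative_improvement = (lightgbm_score - blend_score) / lightgbm_score
print(f"Relative improvement: {relative_improvement:.1%}")
```

Under that assumption, the blend lowers the public score by roughly 15% relative to LightGBM alone.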