# Decision Tree Modeling with Microsoft InterpretML

This notebook builds a Decision Tree model with the implementation built into [Microsoft InterpretML](https://github.com/microsoft/interpret).

## 1. Data Preparation

### Data Import

In [4]:
import pandas as pd

df = pd.read_csv("../data/Factory.csv")

In [5]:
from sklearn.model_selection import train_test_split

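# Features: every column except the target "Quality" and the "ID" column
X = df.drop(columns=["Quality", "ID"])
y = df["Quality"].values

# Hold out 10% for testing; stratify to preserve the class ratio of y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=100, stratify=y
)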
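
To see what `stratify=y` does, the following sketch splits a synthetic, imbalanced label vector with the same settings and compares the class ratios of the two partitions. The arrays `X_toy` and `y_toy` are made up for illustration and are not part of the Factory data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, 20% positive class
X_toy = np.arange(100).reshape(-1, 1)
y_toy = np.array([0] * 80 + [1] * 20)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.1, random_state=100, stratify=y_toy
)

# With stratification, both partitions keep the 20% positive rate
train_ratio = y_tr.mean()
test_ratio = y_te.mean()
print(f"train positive rate: {train_ratio:.2f}, test positive rate: {test_ratio:.2f}")
```

Without `stratify`, a 10% test set drawn from a small or imbalanced dataset can end up with a noticeably different class mix from the training set.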

### Data Exploration

In [6]:
from interpret import show
from interpret.data import ClassHistogram

# Per-feature histograms of the training data, broken down by class
hist = ClassHistogram().explain_data(X_train, y_train, name='Train Data')
show(hist)

## 2. Model Training

In [14]:
from interpret.glassbox import ClassificationTree

seed = 1234

# One-hot encode categorical variables so the decision tree receives numeric inputs
X_enc = pd.get_dummies(X, prefix_sep='.')
feature_names = list(X_enc.columns)
# Note: this re-splits the data (80/20, no stratification) and overwrites
# y_train and y_test from the earlier 90/10 split
X_train_enc, X_test_enc, y_train, y_test = train_test_split(X_enc, y, test_size=0.20, random_state=seed)

tree = ClassificationTree()
tree.fit(X_train_enc, y_train)

<interpret.glassbox.decisiontree.ClassificationTree at 0x11fa635f8>
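
As a small illustration of the encoding step in the cell above, the sketch below applies the same `pd.get_dummies(..., prefix_sep='.')` call to a toy frame. The column names `temp` and `line` are hypothetical and do not come from `Factory.csv`.

```python
import pandas as pd

# Hypothetical toy frame: one numeric and one categorical column
toy = pd.DataFrame({"temp": [20.5, 22.1, 19.8], "line": ["A", "B", "A"]})

# Categorical columns become one indicator column per level,
# named <column>.<level>; numeric columns pass through unchanged
encoded = pd.get_dummies(toy, prefix_sep='.')
encoded_columns = list(encoded.columns)
print(encoded_columns)
```

Encoding the full feature table before splitting, as the notebook does, keeps the train and test sets on the same set of indicator columns.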

## 3. Model Interpretation

### Checking Accuracy (ROC, Residuals)

In [17]:
from interpret.perf import ROC

tree_perf = ROC(tree.predict_proba).explain_perf(X_test_enc, y_test, name='Classification Tree')
show(tree_perf)
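
The area under the ROC curve (AUC) is the usual one-number summary of such a curve. As a rough sketch of what it measures, the helper below (a hypothetical function, not part of InterpretML) computes AUC as the fraction of positive/negative pairs in which the positive example receives the higher score. The labels and scores are toy values.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
auc = pairwise_auc(toy_labels, toy_scores)
print(auc)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking. The quadratic pair loop is only meant for illustration on small inputs.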

### Global Model Interpretation

In [16]:
tree_global = tree.explain_global(name='Tree')
show(tree_global)
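
When the interactive visualization is not available (for example in a plain terminal), the rules of a decision tree can also be inspected as text. The sketch below uses scikit-learn's `DecisionTreeClassifier` and `export_text` directly rather than InterpretML's `ClassificationTree`. The toy data, the feature names `flag` and `noise`, and the `max_depth=3` setting are assumptions made for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: "flag" separates the classes perfectly, "noise" does not
toy_X = [[0, 5], [1, 3], [0, 4], [1, 6]]
toy_y = [0, 1, 0, 1]
toy_features = ["flag", "noise"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(toy_X, toy_y)

# Text rendering of the learned decision rules
rules = export_text(clf, feature_names=toy_features)
print(rules)
```

A shallow depth limit keeps the printed rule list short enough to read, which is the main reason to use a decision tree as an interpretable model.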