# Decision Tree Modeling with Microsoft Interpret ML

This notebook builds a Decision Tree model using the implementation bundled with [Microsoft Interpret ML](https://github.com/microsoft/interpret).

## 1. Data Preparation

### Data Import

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("../data/Factory.csv")
```

```python
# Features: every column except the target (Quality) and the row identifier (ID)
X = df.drop(columns=["Quality", "ID"])
# Target: the Quality label as a NumPy array
y = df["Quality"].values
```

### Data Exploration

```python
from interpret import show
from interpret.data import ClassHistogram

# Per-class histogram of each feature (computed on the full dataset, before the split)
hist = ClassHistogram().explain_data(X, y, name='Train Data')
show(hist)
```
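`ClassHistogram` shows, for each feature, how its values are distributed within each target class. The counting behind such a plot can be sketched with the standard library as follows. The bin edges, feature values and `"Good"`/`"Bad"` labels are hypothetical and are not taken from `Factory.csv`.

```python
from collections import Counter, defaultdict


def class_histogram(values, labels, edges):
    """Count how many samples of each class fall into each bin.

    Bins are half-open [edges[i], edges[i+1]), except the last bin, which
    also includes its right edge. Values outside all bins are ignored.
    """
    counts = defaultdict(Counter)
    n_bins = len(edges) - 1
    for v, label in zip(values, labels):
        for i in range(n_bins):
            lo, hi = edges[i], edges[i + 1]
            is_last = i == n_bins - 1
            if lo <= v < hi or (is_last and v == hi):
                counts[label][i] += 1
                break
    # Counter returns 0 for empty bins, so every class gets a full-length list
    return {label: [c[i] for i in range(n_bins)] for label, c in counts.items()}


# Hypothetical values of one feature and the corresponding quality labels
pressure = [1.0, 1.5, 2.2, 2.8, 3.1, 3.9]
quality = ["Good", "Good", "Good", "Bad", "Bad", "Bad"]
print(class_histogram(pressure, quality, edges=[1.0, 2.0, 3.0, 4.0]))
```

A feature whose per-class histograms barely overlap, like this toy example, is a likely candidate for an early split in the tree.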

## 2. Model Training

```python
from interpret.glassbox import ClassificationTree

seed = 1234

# One-hot encode any categorical columns, because ClassificationTree needs numeric input
X_enc = pd.get_dummies(X, prefix_sep='.')
feature_names = list(X_enc.columns)
print(feature_names)
```

```text
['ProcessA-Pressure', 'ProcessA-Humidity', 'ProcessA-Vibration', 'ProcessB-Light', 'ProcessB-Skill', 'ProcessB-Temp', 'ProcessB-Rotation', 'ProcessC-Density', 'ProcessC-PH', 'ProcessC-skewness', 'ProcessC-Time']
```
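None of the printed names contains the `.` separator, which suggests that no categorical columns were encoded here and that `X_enc` has the same columns as `X`. The step only changes the data when categorical columns exist. To show what `pd.get_dummies(..., prefix_sep='.')` does to such a column, here is a standard-library sketch. The `Line` column and its values are hypothetical, and the 0/1 integers are a simplification (pandas may return boolean or `uint8` dummies depending on the version).

```python
def one_hot(rows, column, prefix_sep="."):
    """Replace a categorical column with one 0/1 indicator column per category.

    Follows the get_dummies naming convention: column 'Line' with value 'A'
    becomes 'Line.A'. Categories are sorted so the column order is stable.
    """
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep all other columns unchanged
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}{prefix_sep}{cat}"] = int(row[column] == cat)
        encoded.append(new_row)
    return encoded


# Hypothetical rows with one numeric and one categorical column
rows = [{"Pressure": 1.2, "Line": "B"}, {"Pressure": 0.9, "Line": "A"}]
print(one_hot(rows, "Line"))
```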


```python
# Split the data: 90% for training, 10% held out for evaluation
X_train_enc, X_test_enc, y_train, y_test = train_test_split(
    X_enc, y, test_size=0.10, random_state=seed
)
```
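The call above shuffles the rows with a fixed seed and holds out 10% of them. Below is a standard-library sketch of the same idea. It rounds the test size up, as scikit-learn does for a fractional `test_size`, but it uses Python's `random` module. As a result, the actual rows it selects will differ from those chosen by `train_test_split`. The function `split_indices` is hypothetical.

```python
import math
import random


def split_indices(n, test_size=0.10, seed=1234):
    """Shuffle row indices reproducibly and hold out a fraction as the test set."""
    rng = random.Random(seed)          # local RNG so the global state is untouched
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = math.ceil(n * test_size)  # round the held-out count up
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(100)
print(len(train_idx), len(test_idx))
```

Fixing the seed makes the split reproducible, so each time the notebook is re-run, the model is evaluated on the same held-out rows.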

```python
# Train the model
tree = ClassificationTree()
tree.fit(X_train_enc, y_train)
```

```text
<interpret.glassbox.decisiontree.ClassificationTree at 0x119e0e588>
```
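Our understanding is that `ClassificationTree` wraps scikit-learn's `DecisionTreeClassifier`, which uses Gini impurity by default. Treat this as an assumption and check it against the InterpretML version you have installed. The core of fitting a tree is picking, at each node, the threshold that gives the lowest weighted impurity in the two child nodes. The following standard-library sketch applies this to a single feature. Thresholds are placed at midpoints between consecutive distinct sorted values, and the toy data is hypothetical.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted child impurity) of the best split value <= threshold.

    Returns (None, parent impurity) if no split improves on the parent node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical single feature: low values are "Good", high values are "Bad"
print(best_threshold([1, 2, 3, 4, 5, 6], ["Good"] * 3 + ["Bad"] * 3))
```

A real tree repeats this search over every feature, takes the best split overall, and recurses into each child until a stopping rule (such as a depth limit) is reached.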

## 3. Model Interpretation

### Checking Performance (ROC)

```python
from interpret.perf import ROC

# ROC curve and AUC of the predicted probabilities on the held-out test set
tree_perf = ROC(tree.predict_proba).explain_perf(X_test_enc, y_test, name='Classification Tree')
show(tree_perf)
```
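An ROC curve plots the true positive rate against the false positive rate as the decision threshold on the predicted probability is lowered. AUC is the area under that curve. The following standard-library sketch computes both. It assumes binary 0/1 labels with 1 as the positive class, and the scores are hypothetical rather than outputs of the tree above.

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr) points from sweeping a threshold down over the scores.

    labels must be 0/1 (1 = positive) and contain both classes.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Tied scores share one threshold, so emit a point only after the last tie
        if i == len(pairs) - 1 or pairs[i + 1][0] != score:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


# Hypothetical predicted probabilities of the positive class and true labels
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_curve(scores, labels)
print(round(auc(fpr, tpr), 4))
```

A shallow tree predicts only a few distinct probabilities (one per leaf), so its ROC curve usually has only a few corners.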

### Global Interpretation

```python
tree_global = tree.explain_global(name='Tree')
show(tree_global)
```
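The global explanation of a decision tree is the tree itself. Each root-to-leaf path reads as one if-then rule covering the whole input space. The following sketch lists those rules for a small hand-written tree. The features come from the list printed earlier, but the tree structure, thresholds and leaf labels are hypothetical and are not the fitted model. The nested-dict representation is also an illustration, not InterpretML's internal format.

```python
# Hypothetical depth-2 tree: thresholds and leaf labels are invented for illustration
TREE = {
    "feature": "ProcessB-Temp", "threshold": 35.0,
    "left": {
        "feature": "ProcessA-Pressure", "threshold": 1.2,
        "left": {"leaf": "Good"},
        "right": {"leaf": "Bad"},
    },
    "right": {"leaf": "Bad"},
}


def rules(node, conditions=()):
    """Yield one (conditions, prediction) pair per root-to-leaf path."""
    if "leaf" in node:
        yield list(conditions), node["leaf"]
        return
    f, t = node["feature"], node["threshold"]
    yield from rules(node["left"], conditions + (f"{f} <= {t}",))
    yield from rules(node["right"], conditions + (f"{f} > {t}",))


for conds, pred in rules(TREE):
    print(" AND ".join(conds), "->", pred)
```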

### Local Interpretation

```python
# Explain the predictions for the first 20 test samples individually
tree_local = tree.explain_local(X_test_enc[:20], y_test[:20], name='Tree')
show(tree_local)
```
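For a single sample, a tree's explanation comes down to the path that sample takes from the root to a leaf. As we understand it, the local view in InterpretML highlights this path for each explained sample. The sketch below traces such a path through the same kind of hypothetical hand-written tree as a nested dict. The thresholds, leaf labels and sample values are invented for illustration.

```python
# Hypothetical depth-2 tree: thresholds and leaf labels are invented for illustration
TREE = {
    "feature": "ProcessB-Temp", "threshold": 35.0,
    "left": {
        "feature": "ProcessA-Pressure", "threshold": 1.2,
        "left": {"leaf": "Good"},
        "right": {"leaf": "Bad"},
    },
    "right": {"leaf": "Bad"},
}


def decision_path(node, sample):
    """Return the tests a sample passes on its way down, and the leaf it reaches."""
    path = []
    while "leaf" not in node:
        f, t = node["feature"], node["threshold"]
        if sample[f] <= t:
            path.append(f"{f} = {sample[f]} <= {t}")
            node = node["left"]
        else:
            path.append(f"{f} = {sample[f]} > {t}")
            node = node["right"]
    return path, node["leaf"]


# Hypothetical sample: only the features the tree actually tests are needed
sample = {"ProcessB-Temp": 30.0, "ProcessA-Pressure": 1.5}
path, prediction = decision_path(TREE, sample)
print(path, prediction)
```

Because only the tested features appear on the path, the local explanation shows directly which measurements drove the prediction for that sample.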