## Analyze quality defects using highly interpretable models

We will build highly interpretable models that predict quality defects from data such as sensor readings collected during the manufacturing and inspection processes.

After training, we will examine each model's explanations to understand the likely causes of quality defects.

### Algorithms to use
- Logistic regression
- Decision tree
- Generalized additive model
    - We will use the Explainable Boosting Machine (EBM) as the estimation algorithm. EBM is included in [interpretml - interpret](https://github.com/interpretml/interpret).
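
For reference, the generalized additive form that EBM fits is commonly written as below. The notation is introduced here for illustration and does not appear in the notebook: $g$ is the link function (the logit for classification), $f_j$ is the shape function learned for feature $x_j$, and $f_{ij}$ are the optional pairwise interaction terms.

$$
g\big(\mathbb{E}[y]\big) = \beta_0 + \sum_{j} f_j(x_j) + \sum_{(i,j)} f_{ij}(x_i, x_j)
$$

Because each term depends on at most two features, every term can be plotted and read on its own, which is what makes the model interpretable.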

### 0. Preparation
- Jupyter kernel: select `factoryqc-glassbox`.
- IDE: VS Code or Jupyter Notebook is assumed.

---

### 1. Data preparation
Set the visualization provider, then load the data into a pandas DataFrame.

In [None]:
from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())


In [None]:
import pandas as pd

seed = 1234

# Load dummy data for the manufacturing process
df = pd.read_csv("../data/Factory.csv")

In [None]:
df.head()

In [None]:
from sklearn.model_selection import train_test_split

# Select the features for training the model
X = df.drop(columns=["Quality", "ID"])

# Select the target variable for training the model
y = df["Quality"].values

# Divide the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=100, stratify=y)
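
To see what `stratify=y` does in the split above, the following minimal sketch checks that the defect rate is preserved in both partitions. It uses synthetic labels (`X_demo`, `y_demo` are hypothetical names, not the contents of `Factory.csv`) and assumes a binary 0/1 target.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 rows, 10% labeled as defects (assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 3))
y_demo = np.array([1] * 100 + [0] * 900)

# Same split settings as the notebook cell above
_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=100, stratify=y_demo
)

# With stratification, both partitions keep roughly the same defect rate
train_rate = y_tr.mean()
test_rate = y_te.mean()
print(f"train defect rate: {train_rate:.3f}, test defect rate: {test_rate:.3f}")
```

Running the same comparison on `y_train` and `y_test` is a quick sanity check before training, especially when defects are rare.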

---

### 2. Data exploration
InterpretML provides methods for visualizing data.

In [None]:
from interpret import show
from interpret.data import ClassHistogram

hist = ClassHistogram().explain_data(X_train, y_train, name='Train Data')
show(hist)

---

### 3. Model training using logistic regression and decision tree
#### 3.1 Model training
We will train a logistic regression model and a decision tree model, then interpret each one.

In [None]:
from interpret.glassbox import LogisticRegression, ClassificationTree

# X_enc = pd.get_dummies(X, prefix_sep='.')
# feature_names = list(X_enc.columns)
# X_train_enc, X_test_enc, y_train, y_test = train_test_split(X_enc, y, test_size=0.20, random_state=seed)

lr = LogisticRegression(random_state=seed, penalty='l2')
lr.fit(X_train, y_train)

tree = ClassificationTree()
tree.fit(X_train, y_train)

#### 3.2 Model interpretation (global)

In [None]:
lr_global = lr.explain_global(name='LR')
tree_global = tree.explain_global(name='Tree')

show(lr_global)
show(tree_global)

#### 3.3 Model interpretation (local)

In [None]:
lr_local = lr.explain_local(X_test[:20], y_test[:20], name='LR')
tree_local = tree.explain_local(X_test[:20], y_test[:20], name='Tree')

show(lr_local)
show(tree_local)
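
As a rough guide to reading a local explanation of a linear model, the sketch below decomposes one prediction into per-feature terms (coefficient times feature value) plus the intercept. It uses scikit-learn's `LogisticRegression` directly on synthetic data rather than the `interpret.glassbox` wrapper, and it assumes the local plot reports terms of this kind; all `*_demo` names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data with three numeric features (assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
noise = rng.normal(scale=0.5, size=200)
y_demo = (X_demo[:, 0] - 2 * X_demo[:, 1] + noise > 0).astype(int)

clf = LogisticRegression().fit(X_demo, y_demo)

# Per-feature contribution to the log-odds for a single row
x = X_demo[0]
contributions = clf.coef_[0] * x

# Intercept plus contributions reproduces the model's log-odds for that row
logit = clf.intercept_[0] + contributions.sum()
print(contributions, logit)
```

The sign of each contribution shows whether that feature pushes the prediction toward or away from the positive class for this particular row.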

#### 3.4 Model accuracy
Check the accuracy of each model using its ROC curve.

In [None]:
from interpret.perf import ROC

lr_perf = ROC(lr.predict_proba).explain_perf(X_test, y_test, name='Logistic Regression')
tree_perf = ROC(tree.predict_proba).explain_perf(X_test, y_test, name='Classification Tree')

show(lr_perf)
show(tree_perf)
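
For readers who want the numbers behind the plot, here is a minimal sketch of the ROC computation using scikit-learn on a tiny hand-made example. The labels and scores are illustrative values, not results from the factory data.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Toy labels and predicted probabilities for the positive class (assumption)
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# False positive rate and true positive rate at each score threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Area under the ROC curve: 3 of the 4 (positive, negative) pairs are ranked correctly
auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.2f}")  # → AUC = 0.75
```

With a fitted binary classifier, the positive-class column of `predict_proba` (for example `model.predict_proba(X_test)[:, 1]`) would take the place of `y_score`.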

---

### 4. Model training using Explainable Boosting Machine (EBM)
#### 4.1 Model training
We will use the EBM algorithm to fit a generalized additive model. To include interaction terms, set `interactions` to either the number of pairwise terms to select or an explicit list of column-index combinations.

In [None]:
from interpret.glassbox import ExplainableBoostingClassifier

ebm = ExplainableBoostingClassifier(random_state=seed, interactions=4)
ebm.fit(X_train, y_train)

#### 4.2 Model interpretation (global)
Display the overall importance of each term, along with graphs showing how each explanatory variable and interaction term contributes to the predicted value.

In [None]:
ebm_global = ebm.explain_global(name='EBM')
show(ebm_global)

#### 4.3 Model interpretation (local)
Interpret the individual predictions that the trained EBM model makes on the test data.

In [None]:
# As an example, explain the first 20 rows of the test data
ebm_local = ebm.explain_local(X_test[:20], y_test[:20], name='EBM')
show(ebm_local)

#### 4.4 Model accuracy
Check the accuracy of the model using its ROC curve.

In [None]:
from interpret.perf import ROC

ebm_perf = ROC(ebm.predict_proba).explain_perf(X_test, y_test, name='EBM')
show(ebm_perf)

---

### 5. Model accuracy comparison on the dashboard
Compare the accuracy of the trained models using the dashboard.

#### Dashboard

In [None]:
# Dashboard for comparing EBM, Logistic Regression, and Decision Tree
show([hist, lr_global, lr_local, lr_perf, tree_global, tree_local, tree_perf, ebm_global, ebm_local, ebm_perf], share_tables=True)