## Debugging a Machine Learning Model
We build a regression model that predicts used-car prices with a random forest, then debug it with the Responsible AI Toolbox.

### 0. Prerequisites
- Jupyter kernel: select `rai-toolbox`.
    - Build it beforehand by following the steps in [0-Setup.ipynb](./0-Setup.ipynb).

### 1. Libraries
Import the required Python libraries.

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

### 2. Data Preparation
Load the sample automobile price data as a pandas DataFrame.

In [None]:
# Load the data and drop every row that contains a missing value
df = pd.read_csv("../data/automobile.csv")
df = df.dropna()
# Separate the target column from the features
target_feature = 'price'
data_df = df.drop([target_feature], axis=1)
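
Before relying on `dropna()`, it can help to see how many rows it removes. The following is a minimal sketch on a hypothetical toy DataFrame; the column names and values are illustrative assumptions, not the contents of `automobile.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for automobile.csv; columns and values are illustrative.
toy_df = pd.DataFrame({
    "make": ["audi", "bmw", None, "toyota"],
    "horsepower": [102.0, np.nan, 110.0, 62.0],
    "price": [13950.0, 16430.0, 17710.0, 5348.0],
})

# Count missing values per column before dropping anything.
missing_per_column = toy_df.isna().sum()
print(missing_per_column)

# dropna() removes every row that has at least one missing value.
cleaned_df = toy_df.dropna()
rows_dropped = len(toy_df) - len(cleaned_df)
print(f"dropped {rows_dropped} of {len(toy_df)} rows")
```

Because `dropna()` runs before the split, the imputers in the preprocessing pipeline would only come into play for new data that contains missing values.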

In [None]:
df.head()

In [None]:
df.describe()

In [None]:
# Split into training and test data
X_train_original, X_test_original, y_train, y_test = train_test_split(data_df, df[target_feature], test_size=0.2, random_state=1234)

train_data = X_train_original.copy()
test_data = X_test_original.copy()
train_data[target_feature] = y_train
test_data[target_feature] = y_test
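
The split above re-attaches the target column to copies of the feature frames. A small sketch of the same pattern on hypothetical toy data (the values and the names `toy_df`, `train_part`, and `test_part` are assumptions for illustration) shows that the rows stay matched and the two sets do not overlap:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the automobile dataset.
toy_df = pd.DataFrame({
    "horsepower": [102, 115, 110, 62, 70, 88, 160, 95, 121, 73],
    "price": [13950, 17450, 17710, 5348, 6338, 8558, 30760, 9980, 20970, 6488],
})
features_df = toy_df.drop(columns=["price"])

X_tr, X_te, y_tr, y_te = train_test_split(
    features_df, toy_df["price"], test_size=0.2, random_state=1234
)

# Re-attach the target; assignment aligns on the index, so rows stay matched.
train_part = X_tr.copy()
train_part["price"] = y_tr
test_part = X_te.copy()
test_part["price"] = y_te

print(train_part.shape, test_part.shape)
```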

### 3. Model Building

#### Creating the scikit-learn pipeline

In [None]:


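def clean_data(X, y, target_feature):
    """Fit the preprocessing pipeline on X and return the transformed features.

    y and target_feature are accepted but not used inside this function.
    """
    features = X.columns.values.tolist()
    # Split the columns by dtype: numeric vs. categorical (object)
    pipe_cfg = {
        'num_cols': X.dtypes[(X.dtypes == 'int64') | (X.dtypes == "float64")].index.values.tolist(),
        'cat_cols': X.dtypes[X.dtypes == 'object'].index.values.tolist(),
    }
    # Numeric columns: median imputation, then standardization
    num_pipe = Pipeline([
        ('num_imputer', SimpleImputer(strategy='median')),
        ('num_scaler', StandardScaler())
    ])
    # Categorical columns: fill missing values with '?', then one-hot encode.
    # Note: scikit-learn 1.2 renamed `sparse` to `sparse_output`;
    # use that name if the installed version no longer accepts `sparse`.
    cat_pipe = Pipeline([
        ('cat_imputer', SimpleImputer(strategy='constant', fill_value='?')),
        ('cat_encoder', OneHotEncoder(handle_unknown='ignore', sparse=False))
    ])
    # Apply each sub-pipeline to its own group of columns
    feat_pipe = ColumnTransformer([
        ('num_pipe', num_pipe, pipe_cfg['num_cols']),
        ('cat_pipe', cat_pipe, pipe_cfg['cat_cols'])
    ])
    X = feat_pipe.fit_transform(X)
    print("categorical:", pipe_cfg['cat_cols'])
    print("numerical:", pipe_cfg['num_cols'])

    return X, feat_pipe, features, pipe_cfg

# Fit the preprocessing on the training features only
X_train, feat_pipe, features, pipe_cfg = clean_data(X_train_original, y_train, target_feature)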
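
The encoder above uses `handle_unknown='ignore'`, which matters when test data contains a category that was absent from the training data. The sketch below illustrates that behavior in isolation; the `fuel_type` column and its values are hypothetical, and the encoder is left at its default sparse output so the example does not depend on the `sparse` / `sparse_output` parameter name.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column; "electric" appears only in the test data.
train_cats = pd.DataFrame({"fuel_type": ["gas", "diesel", "gas"]})
test_cats = pd.DataFrame({"fuel_type": ["gas", "electric"]})

# Fit on the training data only, as the notebook does with fit_transform.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train_cats)

# Default output is a SciPy sparse matrix; convert to dense for display.
encoded_test = encoder.transform(test_cats).toarray()
print(encoder.categories_[0])
print(encoded_test)
```

The unseen category is encoded as a row of zeros instead of raising an error, so new data only needs `transform`, never a second `fit`.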


#### Training the Random Forest model

In [None]:
model = RandomForestRegressor()
model.fit(X_train, y_train)
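
The notebook trains the model but does not score it before opening the dashboard. As a rough sketch of a quick held-out check, the example below fits a random forest on synthetic data and reports MAE and R². The data, the `toy_` names, and the choice of metrics are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: price grows roughly linearly with horsepower, plus noise.
rng = np.random.default_rng(0)
horsepower = rng.uniform(50, 250, size=200)
toy_X = pd.DataFrame({"horsepower": horsepower})
toy_y = 100.0 * horsepower + rng.normal(0, 500, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=1234
)

# Fixing random_state makes the fitted forest reproducible between runs.
toy_model = RandomForestRegressor(n_estimators=50, random_state=0)
toy_model.fit(X_tr, y_tr)

# Score on rows the model has not seen during training.
predictions = toy_model.predict(X_te)
mae = mean_absolute_error(y_te, predictions)
r2 = r2_score(y_te, predictions)
print(f"MAE: {mae:.1f}, R^2: {r2:.3f}")
```

In this notebook the equivalent call would presumably be `model.predict(feat_pipe.transform(X_test_original))`, since the model was trained on the transformed features.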

### 4. Building the Responsible AI Dashboard

In [None]:
from raiwidgets import ResponsibleAIDashboard
from responsibleai import RAIInsights

In [None]:
dashboard_pipeline = Pipeline(steps=[('preprocess', feat_pipe), ('model', model)])

rai_insights = RAIInsights(dashboard_pipeline, train_data, test_data, target_feature, 'regression',
                               categorical_features=pipe_cfg['cat_cols'])
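
`RAIInsights` is given `train_data` and `test_data`, both of which had the target column re-attached earlier, together with the list of categorical column names. A lightweight pre-check along the following lines may catch mismatched inputs early. `check_rai_inputs` is a hypothetical helper written for this sketch, not part of the `responsibleai` API, and the toy frames are illustrative.

```python
import pandas as pd

def check_rai_inputs(train_df, test_df, target, categorical):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    for name, frame in (("train", train_df), ("test", test_df)):
        # Both frames are expected to carry the target column.
        if target not in frame.columns:
            problems.append(f"{name}: missing target column '{target}'")
        # Every declared categorical feature should exist as a column.
        missing = [c for c in categorical if c not in frame.columns]
        if missing:
            problems.append(f"{name}: missing categorical columns {missing}")
    if list(train_df.columns) != list(test_df.columns):
        problems.append("train and test have different columns")
    return problems

toy_train = pd.DataFrame({"make": ["audi", "bmw"], "price": [13950, 16430]})
toy_test = pd.DataFrame({"make": ["toyota"]})  # target deliberately left out

print(check_rai_inputs(toy_train, toy_train, "price", ["make"]))
print(check_rai_inputs(toy_train, toy_test, "price", ["make"]))
```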

In [None]:
rai_insights.explainer.add()
rai_insights.compute()

In [None]:
ResponsibleAIDashboard(rai_insights)