# Regression with DL and tabular data

**Author**: Jonathan TRICARD

**Summary**: using a dataset provided by scikit-learn, we build a basic neural network to predict house prices. We then explain the model's predictions on tabular data in a regression setting.

**ExplainDL**: creates an HTML report with visualizations that explain how the deep learning model works.

## Import libraries

In [None]:
import os
import pandas as pd 

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

from readml.logger import ROOT_DIR
from readml.explainers.dl.explain_dl import ExplainDL

**WARNING**: In notebooks, you must run the cell below before training the model and before running ExplainDL; otherwise ```ExplainDL().explain_tabular()``` raises an error.

In [None]:
tf.compat.v1.disable_v2_behavior()

In [None]:
print(tf.executing_eagerly())
assert not tf.executing_eagerly()  # eager execution must be off at this point

## Import data

In [None]:
def create_and_split_dataframe():
    """Load the California housing data and split it into train and test sets."""
    dict_data = fetch_california_housing()
    X = pd.DataFrame(dict_data["data"], columns=dict_data["feature_names"])
    y = pd.DataFrame(dict_data["target"], columns=dict_data["target_names"])
    # Hold out 30% of the rows for testing; the fixed seed keeps the split reproducible
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    # Also build dataframes that hold the features and the target together
    target_name = y.columns.values[0]
    df_train = X_train.copy()
    df_train[target_name] = y_train
    df_test = X_test.copy()
    df_test[target_name] = y_test
    return X_train, X_test, y_train, y_test, df_train, df_test

In [None]:
X_train, X_test, y_train, y_test, df_train, df_test = create_and_split_dataframe() 

## Train model

In [None]:
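def baseline_model(input_dim):
    """Build a small fully connected network: one hidden layer and one linear output."""
    model = Sequential()
    # Hidden layer: 13 units with ReLU activation
    model.add(Dense(13, input_dim=input_dim, kernel_initializer='normal', activation='relu'))
    # Output layer: a single unit with no activation, as needed for regression
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model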

In [None]:
model = baseline_model(X_train.shape[1])
model.fit(X_train, y_train, epochs=5, batch_size=10)

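To make the architecture of `baseline_model` concrete, here is a minimal pure-Python sketch of what the two `Dense` layers compute for a single sample, along with the number of trainable parameters. It assumes 8 input features (the number of feature columns in the California housing data); the random weights, the sample values and the helper names are illustrative assumptions, not the values Keras would produce.

```python
import random


def dense_param_count(n_inputs, n_units):
    """Number of weights plus biases in one fully connected layer."""
    return n_inputs * n_units + n_units


def relu(x):
    return max(0.0, x)


def dense_forward(inputs, weights, biases, activation=None):
    """Forward pass of one layer for one sample.

    weights[i][j] links input i to unit j.
    """
    outputs = []
    for j, bias in enumerate(biases):
        z = bias + sum(x * weights[i][j] for i, x in enumerate(inputs))
        outputs.append(activation(z) if activation else z)
    return outputs


n_features = 8     # assumption: 8 feature columns in the dataset
hidden_units = 13  # same width as the hidden layer of baseline_model

total_params = (
    dense_param_count(n_features, hidden_units)  # 8 * 13 + 13 = 117
    + dense_param_count(hidden_units, 1)         # 13 * 1 + 1 = 14
)
print("trainable parameters:", total_params)

# Illustrative small random weights (a stand-in for the real initializer)
rng = random.Random(0)
w1 = [[rng.gauss(0.0, 0.05) for _ in range(hidden_units)] for _ in range(n_features)]
b1 = [0.0] * hidden_units
w2 = [[rng.gauss(0.0, 0.05)] for _ in range(hidden_units)]
b2 = [0.0]

sample = [rng.random() for _ in range(n_features)]       # one made-up input row
hidden = dense_forward(sample, w1, b1, activation=relu)  # hidden layer with ReLU
prediction = dense_forward(hidden, w2, b2)               # linear output layer
print("prediction for the made-up sample:", prediction[0])
```

The parameter total should match the one reported by `model.summary()` if the input really has 8 features.
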
In [None]:
# Predict house values on the held-out test set
y_pred_nn = model.predict(X_test)

In [None]:
# Evaluate the neural network with standard regression metrics
mae_nn = mean_absolute_error(y_test, y_pred_nn)
mse_nn = mean_squared_error(y_test, y_pred_nn)
r2_nn = r2_score(y_test, y_pred_nn)

print("MAE : ", mae_nn)
print("MSE : ", mse_nn)
print("R2 : ", r2_nn)

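For readers unfamiliar with the three metrics printed above, the following sketch recomputes them by hand on a small made-up example. The function names and the toy values are assumptions for illustration only; in the notebook itself the scikit-learn functions are the ones to use.

```python
def mae_sketch(y_true, y_pred):
    """Mean absolute error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse_sketch(y_true, y_pred):
    """Mean squared error: average of (true - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_sketch(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values, not taken from the housing dataset
y_true_toy = [1.0, 2.0, 3.0, 4.0]
y_pred_toy = [1.5, 2.0, 2.5, 5.0]

print("MAE :", mae_sketch(y_true_toy, y_pred_toy))  # 0.5
print("MSE :", mse_sketch(y_true_toy, y_pred_toy))  # 0.375
print("R2  :", r2_sketch(y_true_toy, y_pred_toy))   # about 0.7
```

A lower MAE or MSE means smaller errors, while an R2 close to 1 means the model explains most of the variance of the target.
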
## Use intelligibility from readml

**WARNING**: Remember to adapt config_local.cfg to your use case. You may need to re-run ```pip install -e .``` after changing the configuration.

In [None]:
model_explain = model  # the trained model to explain
task = "regression"  # here we solve a regression problem
features_name = list(X_train.columns)  # all the features, without the target column
out_path = "../outputs/notebooks/"  # where the report is saved, relative to ROOT_DIR
out_path = os.path.join(ROOT_DIR, out_path)
# Create the output directory if it does not exist yet
os.makedirs(out_path, exist_ok=True)

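Because the output path contains a `..` component, it may not be obvious where the report ends up. The sketch below shows how such a path resolves, using a hypothetical root directory (the real value of `ROOT_DIR` depends on where readml is installed) and `posixpath` so the result is the same on every platform.

```python
import posixpath

# Hypothetical value, for illustration only; not the real ROOT_DIR
hypothetical_root = "/home/user/project/readml"
relative_out = "../outputs/notebooks/"

joined = posixpath.join(hypothetical_root, relative_out)
resolved = posixpath.normpath(joined)  # collapses the ".." component

print(joined)    # /home/user/project/readml/../outputs/notebooks/
print(resolved)  # /home/user/project/outputs/notebooks
```

With this hypothetical root, the report directory sits next to the package directory rather than inside it.
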
If you just want to try it out, pass only the ```.head()``` of the dataframe so that the report covers a small number of observations: the ```.explain_tabular()``` method computes an explanation for every row it receives.

In [None]:
exp = ExplainDL(
    model=model_explain,
    out_path=out_path,
)

In [None]:
exp.explain_tabular(
    df_test.head(),  # the dataframe must contain both the features and the target
    features_name=features_name,
    task_name=task,
)