# Neptune + Dalex

## Introduction

[Dalex](https://dalex.drwhy.ai/) is an open-source tool for exploring and explaining model behaviour, which helps you understand how complex models work.  
This guide will show you how to:

* Upload a pickled dalex explainer object to Neptune
* Upload dalex's interactive reports to Neptune

This guide is adapted from the dalex documentation [here](https://dalex.drwhy.ai/python-dalex-titanic.html).

## Before you start

This notebook example lets you try out Neptune as an anonymous user, with zero setup.

If you want to see the example logged to your own workspace instead:

  1. Create a Neptune account. [Register &rarr;](https://neptune.ai/register)
  1. Create a Neptune project that you will use for tracking metadata. For instructions, see [Creating a project](https://docs.neptune.ai/setup/creating_project) in the Neptune docs.

## Install Neptune and dependencies

In [1]:
%pip install -U dalex neptune pandas
%pip install -U --user scikit-learn

Collecting dalex
  Downloading dalex-1.6.0.tar.gz (1.0 MB)
     ---------------------------------------- 1.0/1.0 MB 6.4 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting neptune
  Using cached neptune-1.2.0-py3-none-any.whl (448 kB)
Building wheels for collected packages: dalex
  Building wheel for dalex (setup.py): started
  Building wheel for dalex (setup.py): finished with status 'done'
  Created wheel for dalex: filename=dalex-1.6.0-py3-none-any.whl size=1046062 sha256=f479644e8e5c79090d32bad8f340f00942f4bb3ecdcb74b8437106a931d53483
  Stored in directory: c:\users\siddh\appdata\local\pip\cache\wheels\dd\07\2f\f5456a1d25db6d8e3568bef25dfa7c6b180921487177da33be
Successfully built dalex
Installing collected packages: dalex, neptune
Successfully installed dalex-1.6.0 neptune-1.2.0
Note: you may need to restart the kernel to use updated packages.

## Import libraries

In [55]:
import dalex as dx

import pandas as pd

from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer

import warnings
warnings.filterwarnings('ignore')

## Load data

In [56]:
data = dx.datasets.load_titanic()

X = data.drop(columns='survived')
y = data.survived

data.head(10)

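|   | gender | age | class | embarked | fare | sibsp | parch | survived |
|---|--------|-----|-------|----------|------|-------|-------|----------|
| 0 | male | 42.0 | 3rd | Southampton | 7.11 | 0 | 0 | 0 |
| 1 | male | 13.0 | 3rd | Southampton | 20.05 | 0 | 2 | 0 |
| 2 | male | 16.0 | 3rd | Southampton | 20.05 | 1 | 1 | 0 |
| 3 | female | 39.0 | 3rd | Southampton | 20.05 | 1 | 1 | 1 |
| 4 | female | 16.0 | 3rd | Southampton | 7.13 | 0 | 0 | 1 |
| 5 | male | 25.0 | 3rd | Southampton | 7.13 | 0 | 0 | 1 |
| 6 | male | 30.0 | 2nd | Cherbourg | 24.0 | 1 | 0 | 0 |
| 7 | female | 28.0 | 2nd | Cherbourg | 24.0 | 1 | 0 | 1 |
| 8 | male | 27.0 | 3rd | Cherbourg | 18.1509 | 0 | 0 | 1 |
| 9 | male | 20.0 | 3rd | Southampton | 7.1806 | 0 | 0 | 1 |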


## Create a pipeline model

In [57]:
numerical_features = ['age', 'fare', 'sibsp', 'parch']
numerical_transformer = Pipeline(
    steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler())
    ]
)

categorical_features = ['gender', 'class', 'embarked']
categorical_transformer = Pipeline(
    steps=[
        ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ]
)

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_features),
        ('cat', categorical_transformer, categorical_features)
    ]
)

classifier = MLPClassifier(hidden_layer_sizes=(150,100,50), max_iter=500, random_state=0)

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', classifier)])

## Fit the model

In [58]:
clf.fit(X, y)

### (Neptune) Start a run

To create a new run for tracking the metadata, you tell Neptune who you are (`api_token`) and where to send the data (`project`).

You can use the default code cell below to create an anonymous run in the public project [common/dalex-support](https://app.neptune.ai/o/common/org/dalex-support). **Note**: Public projects are cleaned regularly, so anonymous runs are only stored temporarily.

### Log to your own project instead

Replace the code below with the following:

```python
import neptune
from getpass import getpass

run = neptune.init_run(
    project="workspace-name/project-name",  # replace with your own (see instructions below)
    api_token=getpass("Enter your Neptune API token: "),
    tags=["reports"],  # (optional) replace with your own
)
```

To find your API token and full project name:

1. [Log in to Neptune](https://app.neptune.ai/).
1. In the bottom-left corner, expand your user menu and select **Get your API token**.
1. The workspace name is displayed in the top-left corner of the app.

    To copy the project path, in the top-right corner, open the settings menu and select **Properties**.

For more help, see [Setting Neptune credentials](https://docs.neptune.ai/setup/setting_credentials) in the Neptune docs.

In [79]:
import neptune

run = neptune.init_run(
    api_token=neptune.ANONYMOUS_API_TOKEN,
    project="common/dalex-support",
    tags=["dalex reports"] # (optional) replace with your own
)

https://app.neptune.ai/common/dalex-support/e/DLX-1


**To open the run in the Neptune web app, click the link that appeared in the cell output.**

We'll use the `run` object we just created to log metadata. You'll see the metadata appear in the app.

## Create an explainer for the model

In [80]:
exp = dx.Explainer(clf, X, y)

Preparation of a new explainer is initiated

  -> data              : 2207 rows 7 cols
  -> target variable   : Parameter 'y' was a pandas.Series. Converted to a numpy.ndarray.
  -> target variable   : 2207 values
  -> model_class       : sklearn.neural_network._multilayer_perceptron.MLPClassifier (default)
  -> label             : Not specified, model's class short name will be used. (default)
  -> predict function  : <function yhat_proba_default at 0x0000017FF73B75E0> will be used (default)
  -> predict function  : Accepts only pandas.DataFrame, numpy.ndarray causes problems.
  -> predicted values  : min = 2.72e-06, mean = 0.337, max = 1.0
  -> model type        : classification will be used (default)
  -> residual function : difference between y and yhat (default)
  -> residuals         : min = -0.921, mean = -0.0146, max = 0.975
  -> model_info        : package sklearn

A new explainer has been created!


### (Neptune) Upload explainer object to Neptune
You can use dalex's [`dumps()`](https://dalex.drwhy.ai/python/api/#dalex.Explainer.dumps) method to get a pickled representation of the explainer, and then upload it to Neptune using Neptune's [`from_content()`](https://docs.neptune.ai/api/field_types/#from_content) method.

In [81]:
from neptune.types import File

run["pickled_explainer"].upload(File.from_content(exp.dumps()))

  -> 'residual_function' attribute is a local function; thus, has to be dropped.
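
The message in the output above says that the `residual_function` attribute was dropped because it is a local function. This reflects a general limitation of Python's `pickle`: functions are stored by reference to their importable name, so a function defined inside another function cannot be pickled. The sketch below reproduces that behaviour with the standard library only. The names `make_state`, `is_picklable`, and `dumps_without_unpicklable` are hypothetical and chosen for illustration; this is not how dalex implements `dumps()`.

```python
import pickle


def make_state():
    """Build a toy 'explainer state' dict (hypothetical, not dalex internals)."""

    def residual_function(y, yhat):  # defined locally, so pickle cannot reference it
        return y - yhat

    return {
        "label": "MLPClassifier",
        "y": [0, 1, 1],
        "residual_function": residual_function,
    }


def is_picklable(obj):
    """Return True if pickle.dumps() accepts the object."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def dumps_without_unpicklable(state):
    """Pickle only the picklable entries and report the names that were dropped."""
    kept = {name: value for name, value in state.items() if is_picklable(value)}
    dropped = [name for name in state if name not in kept]
    return pickle.dumps(kept), dropped


payload, dropped = dumps_without_unpicklable(make_state())
print("dropped:", dropped)
print("restored:", pickle.loads(payload))
```

If you later download the pickled explainer and restore it, expect any dropped attribute to be missing from the restored object.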


## Model-level explanations

### model_performance

This function calculates various Model Performance measures:

- __Classification:__ F1, accuracy, recall, precision and AUC
- __Regression:__ mean squared error, R squared, median absolute deviation
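
To make the classification measures listed above concrete, here is a minimal sketch that computes them from true labels and predicted probabilities using the textbook definitions. It is pure Python and independent of dalex; the function names, the toy data, and the 0.5 cutoff are assumptions made for illustration, and dalex's own implementation may differ in details.

```python
def classification_measures(y_true, y_prob, cutoff=0.5):
    """Compute accuracy, precision, recall and F1 at a given probability cutoff."""
    y_pred = [1 if p >= cutoff else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities (made up for this example)
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.3, 0.7, 0.1]

measures = classification_measures(y_true, y_prob)
measures["auc"] = auc(y_true, y_prob)
print(measures)
```

Unlike the other four measures, AUC does not depend on the cutoff, because it only looks at how the predicted probabilities rank positives against negatives.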


In [None]:
mp = exp.model_performance()
mp.plot(geom="roc")

#### (Neptune) Upload ROC plot to Neptune
These plots can be uploaded to Neptune by setting `show=False`.  
To distinguish between the plot types, you can use namespaces. For example, "model/performance/roc", "model/performance/ecdf", etc.
You can learn more about Neptune namespaces and fields in the [documentation](https://docs.neptune.ai/about/namespaces_and_fields/).

In [None]:
run["model/performance/roc"].upload(mp.plot(geom="roc", show=False))
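
One way to picture namespaces is as folders: each `/` in a field path opens a nested level, and the last segment is the field that holds the uploaded file. The sketch below groups a few of the paths used in this guide into a nested dictionary. It only illustrates how the paths are organized; `build_tree` is a hypothetical helper and says nothing about how Neptune stores metadata internally.

```python
def build_tree(paths):
    """Group slash-separated field paths into a nested dict of namespaces."""
    tree = {}
    for path in paths:
        *namespaces, field = path.split("/")
        node = tree
        for name in namespaces:
            node = node.setdefault(name, {})
        node[field] = "<file>"  # placeholder for the uploaded plot
    return tree


paths = [
    "model/performance/roc",
    "model/performance/ecdf",
    "model/variable_importance/single",
]
tree = build_tree(paths)
print(tree)
```

Keeping related plots under a shared prefix such as `model/performance` makes them appear together when you browse the run's metadata.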

### model_parts
This function calculates Variable Importance.

In [None]:
vi = exp.model_parts()
vi.plot()
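
As far as I understand, dalex's variable importance is permutation-based: a variable matters if shuffling its values makes the model's loss worse. The sketch below shows that idea in pure Python on a toy model. The loss function (mean squared error), the number of repeats, the toy data, and all function names are assumptions made for illustration and do not mirror dalex's defaults or internals.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, n_repeats=10, seed=0):
    """Average increase in loss after shuffling each variable's column."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(r) for r in rows])
    importances = {}
    for name in rows[0]:
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            # Replace only this variable's values; all other columns stay intact
            permuted = [{**r, name: v} for r, v in zip(rows, shuffled)]
            loss = mean_squared_error(y, [predict(r) for r in permuted])
            increases.append(loss - baseline)
        importances[name] = sum(increases) / n_repeats
    return importances


def toy_model(row):
    """Toy model that depends on 'age' and ignores 'fare'."""
    return 3.0 * row["age"] + 0.0 * row["fare"]


rows = [{"age": float(i), "fare": float((i * 7) % 5)} for i in range(20)]
y = [toy_model(r) for r in rows]

importance = permutation_importance(toy_model, rows, y)
print(importance)
```

Shuffling `fare` leaves the loss unchanged because the toy model ignores it, while shuffling `age` increases the loss, so `age` ranks as the more important variable.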

You can also calculate the variable importance of a group of variables:

In [None]:
vi_grouped = exp.model_parts(variable_groups={'personal': ['gender', 'age', 'sibsp', 'parch'],
                                     'wealth': ['class', 'fare']})
vi_grouped.plot()

#### (Neptune) Upload variable importance plots to Neptune

In [None]:
run["model/variable_importance/single"].upload(vi.plot(show=False))
run["model/variable_importance/grouped"].upload(vi_grouped.plot(show=False))

### model_profile
This function calculates explanations that explore model response as a function of selected variables.  
The explanations can be calculated as Partial Dependence Profile or Accumulated Local Dependence Profile.

In [None]:
pdp_num = exp.model_profile(type = 'partial', label="pdp")
ale_num = exp.model_profile(type = 'accumulated', label="ale")
pdp_num.plot(ale_num)

Calculating ceteris paribus: 100%|██████████| 7/7 [00:00<00:00, 18.89it/s]
Calculating ceteris paribus: 100%|██████████| 7/7 [00:00<00:00, 22.54it/s]
Calculating accumulated dependency: 100%|██████████| 4/4 [00:00<00:00, 18.01it/s]
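
The idea behind a partial dependence profile can be sketched in a few lines: for each value on a grid, set the selected variable to that value in every row, predict, and average the predictions. The example below uses a toy model and made-up data; `partial_dependence` and `toy_model` are hypothetical names, and the sketch does not cover accumulated profiles or dalex's grid selection.

```python
def partial_dependence(predict, rows, variable, grid):
    """Average prediction over all rows with `variable` forced to each grid value."""
    profile = []
    for value in grid:
        preds = [predict({**row, variable: value}) for row in rows]
        profile.append(sum(preds) / len(preds))
    return profile


def toy_model(row):
    """Toy model standing in for a fitted classifier or regressor."""
    return 2.0 * row["age"] + row["fare"]


rows = [
    {"age": 10.0, "fare": 1.0},
    {"age": 20.0, "fare": 3.0},
    {"age": 30.0, "fare": 5.0},
]

profile = partial_dependence(toy_model, rows, "age", grid=[0.0, 10.0, 20.0])
print(profile)  # [3.0, 23.0, 43.0]
```

The profile rises by 20 for every 10 units of `age`, which matches the coefficient of 2 in the toy model; the constant offset of 3 is the average `fare` in the data.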


In [None]:
pdp_cat = exp.model_profile(type = 'partial', variable_type='categorical',
                            variables = ["gender","class"], label="pdp")
ale_cat = exp.model_profile(type = 'accumulated', variable_type='categorical',
                            variables = ["gender","class"], label="ale")
ale_cat.plot(pdp_cat)

Calculating ceteris paribus: 100%|██████████| 2/2 [00:00<00:00, 105.27it/s]
Calculating ceteris paribus: 100%|██████████| 2/2 [00:00<00:00, 117.67it/s]
Calculating accumulated dependency: 100%|██████████| 2/2 [00:00<00:00, 33.04it/s]


#### (Neptune) Upload model profile plots to Neptune

In [None]:
run["model/profile/num"].upload(pdp_num.plot(ale_num, show=False))
run["model/profile/cat"].upload(ale_cat.plot(pdp_cat, show=False))

## Prediction-level explanations

Let's create two example persons for this tutorial.

In [82]:
john = pd.DataFrame({'gender': ['male'],
                     'age': [25],
                     'class': ['1st'],
                     'embarked': ['Southampton'],
                     'fare': [72],
                     'sibsp': [0],
                     'parch': [0]},
                     index = ['John'])

mary = pd.DataFrame({'gender': ['female'],
                     'age': [35],
                     'class': ['3rd'],
                     'embarked': ['Cherbourg'],
                     'fare': [25],
                     'sibsp': [0],
                     'parch': [0]},
                     index = ['Mary'])

### predict_parts
This function calculates Variable Attributions as Break Down, iBreakDown or Shapley Values explanations.  
The model prediction is decomposed into parts that are attributed to particular variables.

Breakdown values for John's predictions

In [83]:
bd_john = exp.predict_parts(john, type='break_down', label=john.index[0])
bd_interactions_john = exp.predict_parts(john, type='break_down_interactions', label="John+")
bd_john.plot(bd_interactions_john)

Shapley values for Mary's predictions

In [84]:
sh_mary = exp.predict_parts(mary, type='shap', B = 10, label=mary.index[0])
sh_mary.plot()
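
To show what a Shapley-style attribution measures, the sketch below computes exact Shapley values for a toy model by averaging each variable's contribution over every ordering of the variables. This is feasible only for a handful of variables; as far as I understand, dalex's `type='shap'` instead averages over `B` random orderings. The toy model, the background data, and the function names are assumptions made for illustration.

```python
from itertools import permutations


def shapley_values(predict, instance, background):
    """Exact Shapley attributions, averaged over all orderings of the variables."""
    names = list(instance)

    def value(known):
        # Mean prediction when the variables in `known` take the instance's
        # values and the remaining variables come from the background rows.
        fixed = {n: instance[n] for n in known}
        preds = [predict({**row, **fixed}) for row in background]
        return sum(preds) / len(preds)

    orderings = list(permutations(names))
    contributions = {n: 0.0 for n in names}
    for order in orderings:
        known = []
        previous = value(known)
        for name in order:
            known.append(name)
            current = value(known)
            contributions[name] += (current - previous) / len(orderings)
            previous = current
    return contributions


def toy_model(row):
    return 2.0 * row["age"] + row["fare"] - 5.0 * row["sibsp"]


background = [
    {"age": 20.0, "fare": 10.0, "sibsp": 0.0},
    {"age": 30.0, "fare": 20.0, "sibsp": 1.0},
    {"age": 40.0, "fare": 30.0, "sibsp": 2.0},
]
person = {"age": 35.0, "fare": 25.0, "sibsp": 0.0}

attributions = shapley_values(toy_model, person, background)
print(attributions)
```

The attributions add up to the difference between the prediction for this person and the average prediction over the background data, which is the property that makes the decomposition readable as a plot.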

#### (Neptune) Upload plots to Neptune

In [85]:
run["prediction/breakdown/john"].upload(bd_john.plot(bd_interactions_john, show=False))
run["prediction/shapley/mary"].upload(sh_mary.plot(show=False))

### predict_profile

This function computes individual profiles, also known as Ceteris Paribus profiles.

In [86]:
cp_mary = exp.predict_profile(mary, label=mary.index[0])
cp_john = exp.predict_profile(john, label=john.index[0])

Calculating ceteris paribus: 100%|██████████| 7/7 [00:00<00:00, 212.49it/s]
Calculating ceteris paribus: 100%|██████████| 7/7 [00:00<00:00, 154.67it/s]


In [87]:
cp_mary.plot(cp_john)

In [88]:
cp_john.plot(cp_mary, variable_type = "categorical")
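
A Ceteris Paribus profile answers a "what if" question for one observation: how would the prediction change if a single variable took a different value while everything else stayed the same? The sketch below illustrates this with a toy model; `ceteris_paribus`, `toy_model`, and the toy values are hypothetical and do not reproduce dalex's grid or output format.

```python
def ceteris_paribus(predict, instance, variable, grid):
    """Predictions for one observation as `variable` moves along `grid`."""
    return [(value, predict({**instance, variable: value})) for value in grid]


def toy_model(row):
    """Toy survival score: higher for women, decreasing with age."""
    base = 0.8 if row["gender"] == "female" else 0.3
    return max(0.0, base - 0.005 * row["age"])


person = {"gender": "female", "age": 35}

# Numerical variable: vary age, keep gender fixed
age_profile = ceteris_paribus(toy_model, person, "age", grid=[20, 40, 60])
# Categorical variable: vary gender, keep age fixed
gender_profile = ceteris_paribus(toy_model, person, "gender", grid=["female", "male"])

for value, prediction in age_profile + gender_profile:
    print(value, round(prediction, 3))
```

The original observation is never modified; each grid point is evaluated on a copy, so profiles for several variables can be computed from the same starting point.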

#### (Neptune) Upload CP plots to Neptune

In [89]:
run["prediction/profile/numerical"].upload(cp_mary.plot(cp_john, show=False))
run["prediction/profile/categorical"].upload(cp_mary.plot(cp_john, variable_type = "categorical", show=False))

### Stop logging

Once you are done logging, stop tracking the run.

In [102]:
run.stop()

Shutting down background jobs, please wait a moment...
Done!
All 0 operations synced, thanks for waiting!
Explore the metadata in the Neptune app:
https://app.neptune.ai/common/dalex-support/e/DLX-1/metadata


### Analyze reports in the Neptune app
Go to the run link and explore the reports. 
You can also explore this [example run](https://app.neptune.ai/o/common/org/dalex-support/runs/details?viewId=standard-view&detailsTab=dashboard&dashboardId=993ea4c1-c528-4d6d-86ba-1a7a3bd65e7e&shortId=DLX-2&type=run).