# Partial Dependence Plot

## Summary

Partial dependence plots (PDPs) visualize the dependence between the response and a set of target features (usually one or two), marginalizing over all the other features. For a perturbation-based interpretability method, they are relatively quick to compute. PDP assumes that the target features are independent of the remaining features, and the plots can be misleading when this assumption does not hold (e.g. when features are strongly correlated) or when the model has many high-order interactions, since averaging can hide effects that differ across the data.
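To make the interaction caveat concrete, here is a minimal sketch using only NumPy. The model `interaction_model` and the synthetic data are assumptions made up for illustration; they are not part of `interpret` or of the example further down. The sketch averages predictions over the non-target feature, which is the marginalization described above.

```python
import numpy as np


def interaction_model(X):
    # Hypothetical model: the effect of feature 0 flips sign with feature 1.
    return X[:, 0] * X[:, 1]


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(1000, 2))

grid = np.linspace(-1.0, 1.0, 5)
pd_values = []
for value in grid:
    X_mod = X.copy()
    X_mod[:, 0] = value  # set feature 0 to the grid value in every row
    pd_values.append(interaction_model(X_mod).mean())  # average over feature 1
pd_values = np.array(pd_values)

# Compare the averaged curve with the spread of individual predictions.
print(np.round(pd_values, 3))
print(interaction_model(X).min(), interaction_model(X).max())
```

Because feature 1 is roughly symmetric around zero in this toy data, the positive and negative effects of feature 0 cancel in the average. The resulting curve stays close to zero even though feature 0 strongly influences individual predictions.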

## How it Works

The PDP module for `scikit-learn` {cite}`pedregosa2011scikit` provides a succinct description of the algorithm [here](https://scikit-learn.org/stable/modules/partial_dependence.html).

Christoph Molnar's "Interpretable Machine Learning" e-book {cite}`molnar2020interpretable` has an excellent overview of partial dependence that can be found [here](https://christophm.github.io/interpretable-ml-book/pdp.html).

The original paper, "Greedy Function Approximation: A Gradient Boosting Machine" {cite}`friedman2001greedy`, which introduced partial dependence, provides a good motivation and definition.
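As a rough illustration of the brute-force estimate these resources describe, the sketch below fixes the target feature at each grid value for every row, calls the model, and averages the predictions. The helper `partial_dependence_1d`, the toy model, and the synthetic data are hypothetical names introduced here; this is a sketch under those assumptions, not the implementation used by `interpret` or `scikit-learn`.

```python
import numpy as np


def partial_dependence_1d(predict_fn, X, feature_idx, num_points=10):
    """Brute-force partial dependence of predict_fn on one column of X."""
    X = np.asarray(X, dtype=float)
    # Grid of values spanning the observed range of the target feature.
    column = X[:, feature_idx]
    grid = np.linspace(column.min(), column.max(), num_points)
    averages = np.empty(num_points)
    for i, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature_idx] = value  # force every row to the grid value
        averages[i] = np.mean(predict_fn(X_mod))  # average over the other features
    return grid, averages


def toy_model(X):
    # Hypothetical additive model with no interaction between the features.
    return 2.0 * X[:, 0] + X[:, 1] ** 2


rng = np.random.default_rng(0)
X_demo = rng.uniform(-1.0, 1.0, size=(200, 2))
grid, pd_values = partial_dependence_1d(toy_model, X_demo, feature_idx=0, num_points=5)
print(grid)
print(pd_values)
```

For an additive model like this one, the curve recovers the effect of the target feature (here `2 * x`) shifted by a constant, namely the average contribution of the other feature. The cost is one model call over the full dataset per grid point, which is why the method is relatively cheap for one or two target features.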

## Code Example

The following code trains a blackbox pipeline on the breast cancer dataset and then interprets the pipeline and its decisions with partial dependence plots. The visualizations provided are global explanations.

```python
from interpret import set_visualize_provider
from interpret.provider import InlineProvider

# Render visualizations inline in the notebook
set_visualize_provider(InlineProvider())
```

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

from interpret import show
from interpret.blackbox import PartialDependence

# Load the data and hold out 20% for testing
seed = 1
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)

# Blackbox pipeline: PCA followed by a random forest (seeded for reproducibility)
pca = PCA()
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=seed)

blackbox_model = Pipeline([('pca', pca), ('rf', rf)])
blackbox_model.fit(X_train, y_train)

# Compute partial dependence of the predicted probabilities on the training data
pdp = PartialDependence(predict_fn=blackbox_model.predict_proba, data=X_train)
pdp_global = pdp.explain_global()

show(pdp_global)
```

## Further Resources

- [Original paper: "Greedy Function Approximation: A Gradient Boosting Machine"](https://projecteuclid.org/download/pdf_1/euclid.aos/1013203451)
- [scikit-learn on their PDP module](https://scikit-learn.org/stable/modules/partial_dependence.html)

## Bibliography

```{bibliography} references.bib
:style: unsrt
:filter: docname in docnames
```

## API

### PartialDependence

```{eval-rst}
.. autoclass:: interpret.blackbox.PartialDependence
   :members:
   :inherited-members:
```