# Interpretability of Supervised Learning Models

**BMI 773 Clinical Research Informatics**

*Yuriy Sverchkov*

*April 15, 2020*

[![xkcd: AI hiring algorithm](images/ai_hiring_algorithm.png)](https://xkcd.com/2237/)

## Readings

* Z. C. Lipton, “The mythos of model interpretability,” Commun. ACM, vol. 61, no. 10, pp. 36–43, Sep. 2018, [doi: 10.1145/3233231](https://doi.org/10.1145/3233231). [arXiv](https://arxiv.org/pdf/1606.03490v2)
* Caruana paper
* LIME
* GradCAM


### To review (potential readings)

* http://philsci-archive.pitt.edu/16734/1/preprint.pdf

## Why do we want to interpret models?

* __Trust__: having an interpretation along with a prediction can bring a practitioner to agree with a model.
* __Causality__: understanding the associations driving model decisions can help uncover underlying mechanisms.
* __Transferability__: understanding how a model makes decisions informs how it will perform on a different data distribution.
* __Informativeness__: pointing out evidence to support a decision (decision support systems).

## What makes a model interpretable?

* __Simulatability__: Can a person look at the description of the model and figure out what the model's prediction for a given case would be?
* __Decomposability__: Is the model's decision made up of semantically meaningful components?

## What constitutes an interpretation?

* Model-agnostic vs Model-based
* Global vs Local
* Feature importances - Matt Churpek mentioned
* Feature influences
* Models that are interpretable by design
    * Logistic regression
    * Decision trees
    * Rule sets
    * Additive model (Caruana)
* Model-based post-hoc interpretation
    * GradCAM and variants - Matt Churpek mentioned
    * Attention?

## Models that are interpretable by design

* Rule-based models
* Decision trees
* Linear and logistic regression
* Additive models

### Interpreting linear regression coefficients

__Model:__
$y = \beta_0 + \sum_{i=1}^d x_i \beta_i$

__Interpretation:__
Holding all other features fixed, an increase in the value of feature $i$ by 1 unit corresponds to a change in the outcome of $\beta_i$ units (an increase if $\beta_i > 0$, a decrease if $\beta_i < 0$).
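
The interpretation can be checked numerically. The sketch below is a minimal illustration, not part of the original material: the coefficients and feature values are hypothetical, and `predict_linear` is a helper name introduced here.

```python
def predict_linear(beta0, betas, x):
    """Linear model: y = beta0 + sum_i x_i * beta_i."""
    return beta0 + sum(b * xi for b, xi in zip(betas, x))


# Hypothetical coefficients and feature vector, for illustration only.
beta0 = 1.0
betas = [0.5, -2.0, 3.0]
x = [2.0, 1.0, 4.0]

i = 1  # index of the feature we increase by one unit
x_plus = list(x)
x_plus[i] += 1.0  # all other features are held fixed

# The change in the prediction equals the coefficient of feature i.
delta = predict_linear(beta0, betas, x_plus) - predict_linear(beta0, betas, x)
print(delta, betas[i])
```

The difference does not depend on the starting values in `x`, which is what makes a single coefficient a global description of the feature's effect in this model.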

### Interpreting logistic regression coefficients

__Model:__
$$ \overbrace{ \log \left( \frac{ P(y=1) }{ P(y=0) }  \right) }^\text{log odds} = \beta_0 + \sum_{i=1}^d x_i \beta_i $$

__Interpretation:__ Holding all other features fixed, an (additive) increase in the value of feature $i$ by 1 unit corresponds to a change in the odds of the outcome by a (multiplicative) factor of $e^{\beta_i}$. Equivalently, the log odds change additively by $\beta_i$.
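
Exponentiating both sides of the model gives the odds as $\exp(\beta_0 + \sum_i x_i \beta_i)$, so a one-unit increase in $x_i$ multiplies the odds by $\exp(\beta_i)$. The sketch below checks this numerically; the coefficients and feature values are hypothetical, and the helper names are introduced here for illustration.

```python
import math


def log_odds(beta0, betas, x):
    """Linear predictor of the logistic model: beta0 + sum_i x_i * beta_i."""
    return beta0 + sum(b * xi for b, xi in zip(betas, x))


def odds(beta0, betas, x):
    """Odds P(y=1) / P(y=0), obtained by exponentiating the log odds."""
    return math.exp(log_odds(beta0, betas, x))


def probability(beta0, betas, x):
    """P(y=1), obtained by applying the logistic function to the log odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(beta0, betas, x)))


# Hypothetical coefficients and feature vector, for illustration only.
beta0 = -1.0
betas = [0.7, -0.3]
x = [1.0, 2.0]

i = 0  # index of the feature we increase by one unit
x_plus = list(x)
x_plus[i] += 1.0  # all other features are held fixed

# The ratio of the odds equals exp(beta_i), regardless of the starting x.
odds_ratio = odds(beta0, betas, x_plus) / odds(beta0, betas, x)
print(odds_ratio, math.exp(betas[i]))

# The change in probability, by contrast, depends on the starting x.
print(probability(beta0, betas, x_plus) - probability(beta0, betas, x))
```

Note that the constant multiplicative effect applies to the odds only. The corresponding change in the predicted probability varies with the values of the other features.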

## Models that generate explanations along with predictions

* Image captioning
* Attention-based models

## Post-hoc model-aware interpretation

* __Post-hoc__ - the interpretation is not built into the predictive model.
* __Model-aware__ - the interpretation exploits knowledge about the model's internals.

### Feature importances in random forests

### Saliency maps

## Post-hoc model-agnostic interpretation

![Feature importance vs model translation](images/akshay-slide.png)

### Eliciting feature importances from black-box models
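
One widely used model-agnostic approach is permutation importance: shuffle one feature's column, re-score the model, and record how much the error grows. Whether this is the specific method intended for this section is an assumption. The sketch below uses only the model's predictions; `black_box`, the synthetic data, and all parameter values are hypothetical stand-ins.

```python
import random


def black_box(x):
    """Stand-in for an opaque model: uses features 0 and 1, ignores feature 2."""
    return 3.0 * x[0] - 1.0 * x[1]


def mean_squared_error(model, X, y):
    """Average squared difference between model predictions and targets."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(X)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean increase in error when each feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the target.
            column = [x[j] for x in X]
            rng.shuffle(column)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            increases.append(mean_squared_error(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Synthetic data: three independent standard normal features (assumption).
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [black_box(x) for x in X]  # targets taken from the model itself

importances = permutation_importance(black_box, X, y)
print(importances)
```

In this toy setup the ignored feature receives an importance of zero, and the feature with the larger coefficient receives the larger importance. With correlated features, permutation can produce unrealistic inputs, so the scores should be read with care.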

### Learning high-fidelity mimic models

* Trepan
* LIME