---
author: Christian Kaestner and Claire Le Goues
title: "MLiP: Explainability and Interpretability"
semester: Spring 2024
footer: "Machine Learning in Production/AI Engineering • Christian Kaestner & Claire Le Goues, Carnegie Mellon University • Spring 2024"
license: Creative Commons Attribution 4.0 International (CC BY 4.0)
---

Machine Learning in Production

Explainability and Interpretability


Administrativa

Homework I4 to be released Monday; 1 week assignment, due Apr 17

diabetic retinopathy, screening photo

diabetic retinopathy, eye photo

smartphone with lense attached


Research in this Course (in I4)

We are conducting academic research on explainability policies and evidence. This research will involve analyzing student work of this assignment. You will not be asked to do anything above and beyond the normal learning activities and assignments that are part of this course. You are free not to participate in this research, and your participation will have no influence on your grade for this course or your academic career at CMU. If you do not wish to participate, please send an email to Nadia Nahar (nadian@andrew.cmu.edu). Participants will not receive any compensation or extra credit. The data collected as part of this research will not include student grades. All analyses of data from participants’ coursework will be conducted after the course is over and final grades are submitted -- instructors will not know who chooses not to participate before final grades are submitted. All data will be analyzed in de-identified form and presented in the aggregate, without any personal identifiers. If you have questions pertaining to your rights as a research participant, or to report concerns to this study, please contact Nadia Nahar (nadian@andrew.cmu.edu) or the Office of Research Integrity and Compliance at Carnegie Mellon University (irb-review@andrew.cmu.edu; phone: 412-268-4721).


Milestone 3: Availability Requirement

10pt: The recommendation service is at least 70% available in the 72 hours before the submission and the 96 hours after (i.e., max downtime of 50h), while at least two updates are performed in that time period.

5pt: Bonus points if the recommendation service is at least 99% available in the same 7-day window (max 100min downtime), while at least two updates are performed in that time period.


Solar eclipse


TAing Next Semester?


Explainability as Building Block in Responsible Engineering

Overview of course content


"Readings"

Required one of:

Recommended supplementary reading:


Learning Goals

  • Understand the importance of and use cases for interpretability
  • Explain the tradeoffs between inherently interpretable models and post-hoc explanations
  • Measure interpretability of a model
  • Select and apply techniques to debug/provide explanations for data, models and model predictions
  • Evaluate when to use interpretable models rather than ex-post explanations

Motivating Examples


Turtle recognized as gun


Adversarial examples

Image: Gong, Yuan, and Christian Poellabauer. "An overview of vulnerabilities of voice controlled systems." arXiv preprint arXiv:1803.09156 (2018).


Detecting Anomalous Commits

Reported commit

Goyal, Raman, Gabriel Ferreira, Christian Kästner, and James Herbsleb. "Identifying unusual commits on GitHub." Journal of Software: Evolution and Process 30, no. 1 (2018): e1893.


Is this recidivism model fair?

IF age between 18-20 and sex is male THEN
  predict arrest
ELSE IF age between 21-23 and 2-3 prior offenses THEN
  predict arrest
ELSE IF more than three priors THEN
  predict arrest
ELSE
  predict no arrest

Rudin, Cynthia. "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead." Nature Machine Intelligence 1, no. 5 (2019): 206-215.


How to interpret the results?

Screenshot of the COMPAS tool

Image source (CC BY-NC-ND 4.0): Christin, Angèle. (2017). Algorithms in practice: Comparing web journalism and criminal justice. Big Data & Society. 4.


How to consider seriousness of the crime?

Recidivism scoring systems

Rudin, Cynthia, and Berk Ustun. "Optimized scoring systems: Toward trust in machine learning for healthcare and criminal justice." Interfaces 48, no. 5 (2018): 449-466.


What factors go into predicting stroke risk?

Scoring system for stroke risk

Rudin, Cynthia, and Berk Ustun. "Optimized scoring systems: Toward trust in machine learning for healthcare and criminal justice." Interfaces 48, no. 5 (2018): 449-466.


Is there an actual problem? How to find out?



News headline: Stanford algorithm for vaccine priority controversy


The "algorithm" used at Stanford


Explaining Decisions

Cat? Dog? Lion? -- Confidence? Why?

Cat


What's happening here?

Perceptron


Explaining Decisions

Slack Notifications Decision Tree


Explainability in ML

Explain how the model made a decision

  • Rules, cutoffs, reasoning?
  • What are the relevant factors?
  • Why those rules/cutoffs?

Challenging because models too complex and based on data

  • Can we understand the rules?
  • Can we understand why these rules?

Why Explainability?


Why Explainability?


Debugging

  • Why did the system make a wrong prediction in this case?
  • What does it actually learn?
  • What data makes it better?
  • How reliable/robust is it?
  • How much does a second model rely on the outputs of the first?
  • Understanding edge cases

Turtle recognized as gun

Debugging is the most common use in practice (Bhatt et al. "Explainable machine learning in deployment." In Proc. FAccT. 2020.)


Auditing

  • Understand safety implications
  • Ensure predictions use objective criteria and reasonable rules
  • Inspect fairness properties
  • Reason about biases and feedback loops
  • Validate "learned specifications/requirements" with stakeholders
IF age between 18-20 and sex is male THEN predict arrest
ELSE IF age between 21-23 and 2-3 prior offenses THEN predict arrest
ELSE IF more than three priors THEN predict arrest
ELSE predict no arrest

Trust

Users are more likely to accept a prediction if it is clear how it is made, e.g.,

  • Model reasoning matches intuition; reasoning meets fairness criteria
  • Features are difficult to manipulate
  • Confidence that the model generalizes beyond target distribution

Trust model

Conceptual model of trust: R. C. Mayer, J. H. Davis, and F. D. Schoorman. An integrative model of organizational trust. Academy of Management Review, 20(3):709–734, July 1995.


Actionable Insights to Improve Outcomes

"What can I do to get the loan?"

"How can I change my message to get more attention on Twitter?"

"Why is my message considered as spam?"


Regulation / Legal Requirements

The EU General Data Protection Regulation extends the automated decision-making rights [...] to provide a legally disputed form of a right to an explanation: "[the data subject should have] the right ... to obtain an explanation of the decision reached"

US Equal Credit Opportunity Act requires to notify applicants of action taken with specific reasons: "The statement of reasons for adverse action required by paragraph (a)(2)(i) of this section must be specific and indicate the principal reason(s) for the adverse action."

See also https://en.wikipedia.org/wiki/Right_to_explanation


Curiosity, learning, discovery, science

Statistical modeling from Badges Paper


Curiosity, learning, discovery, science

News article about using machine learning for detecting smells


Settings where Interpretability is not Important?

Notes:

  • Model has no significant impact (e.g., exploration, hobby)
  • Problem is well studied? e.g., optical character recognition
  • Security by obscurity? -- avoid gaming

Exercise: Debugging a Model

Consider the following debugging challenges. In groups discuss how you would debug the problem. In 3 min report back to the class.

Algorithm bad at recognizing some signs in some conditions: Stop Sign with Bounding Box

Graduate appl. system seems to rank applicants from HBCUs low: Cheyney University founded in 1837 is the oldest HBCU

Left Image: CC BY-SA 4.0, Adrian Rosebrock


Defining Interpretability

Interpretability Definitions

Two common approaches:

Interpretability is the degree to which a human can understand the cause of a decision

Interpretability is the degree to which a human can consistently predict the model’s result.

(No mathematical definition)

How would you measure interpretability?

Explanation

Understanding a single prediction for a given input

Your loan application has been declined. If your savings account had had more than $100 your loan application would be accepted.

Answer why questions, such as

  • Why was the loan rejected? (justification)
  • Why did the treatment not work for the patient? (debugging)
  • Why is turnover higher among women? (general science question)

How would you measure explanation quality?


Intrinsic interpretability vs Post-hoc explanation?

Models simple enough to understand (e.g., short decision trees, sparse linear models)

Scoring system

Explanation of opaque model, local or global

Your loan application has been declined. If your savings account had more than $100 your loan application would be accepted.


On Terminology

Rudin's terminology and this lecture:

  • Interpretable models: Intrinsically interpretable models
  • Explainability: Post-hoc explanations

Interpretability: property of a model

Explainability: ability to explain the workings/predictions of a model

Explanation: justification of a single prediction

Transparency: The user is aware that a model is used / how it works

These terms are often used inconsistently or interchangeably

Random letters


Understanding a Model

Levels of explanations:

  • Understanding a model
  • Explaining a prediction
  • Understanding the data

Inherently Interpretable: Sparse Linear Models

$f(x) = \alpha + \beta_1 x_1 + ... + \beta_n x_n$

Truthful explanations, easy to understand for humans

Easy to derive contrastive explanation and feature importance

Requires feature selection/regularization to restrict the model to few important features (e.g., Lasso); possibly also restricting the range of parameter values
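
For illustration, a minimal sketch of fitting a sparse linear model with Lasso regularization in scikit-learn; the dataset and the regularization strength `alpha` are placeholder choices:

```python
# Minimal sketch: Lasso drives most coefficients to zero, leaving a sparse,
# readable linear model. Dataset and alpha are placeholder choices.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

data = load_diabetes()
model = Lasso(alpha=0.1).fit(data.data, data.target)  # larger alpha -> fewer nonzero coefficients

# Only the surviving (nonzero) coefficients need to be explained to a human
for name, coef in zip(data.feature_names, model.coef_):
    if abs(coef) > 1e-6:
        print(f"{name}: {coef:+.1f}")
```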


Score card: Sparse linear model with "round" coefficients

Scoring card


Inherently Interpretable: Shallow Decision Trees

Easy to interpret up to a certain size

Possible to derive counterfactuals and feature importance

Unstable with small changes to training data

IF age between 18-20 and sex is male THEN predict arrest
ELSE IF age between 21-23 and 2-3 prior offenses THEN predict arrest
ELSE IF more than three priors THEN predict arrest
ELSE predict no arrest
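
For illustration, a minimal scikit-learn sketch; limiting `max_depth` is what keeps the learned tree small enough to read (dataset and depth are placeholder choices):

```python
# Minimal sketch: a depth-limited decision tree printed as readable if-then rules.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
tree = DecisionTreeClassifier(max_depth=3).fit(data.data, data.target)  # depth limit = interpretability knob

# Render the learned decision rules as nested if-then-else text
print(export_text(tree, feature_names=list(data.feature_names)))
```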

Not all Linear Models and Decision Trees are Inherently Interpretable

  • Models can be very big, many parameters (factors, decisions)
  • Nonlinear interactions possibly hard to grasp
  • Tool support can help (views)
  • Random forests, ensembles no longer easily interpretable
173554.681081086 * root + 318523.818532818 * heuristicUnit + -103411.870761673 * eq + -24600.5000000002 * heuristicVsids +
-11816.7857142856 * heuristicVmtf + -33557.8961038976 * heuristic + -95375.3513513509 * heuristicUnit * satPreproYes + 
3990.79729729646 * transExt * satPreproYes + -136928.416666666 * eq * heuristicUnit + 12309.4990990994 * eq * satPreproYes + 
33925.0833333346 * eq * heuristic + -643.428571428088 * backprop * heuristicVsids + -11876.2857142853 * backprop * 
heuristicUnit + 1620.24242424222 * eq * backprop + -7205.2500000002 * eq * heuristicBerkmin + -2 * Num1 * Num2 + 10 * Num3 * Num4

Notes: Example of a performance influence model from http://www.fosd.de/SPLConqueror/ -- not the worst in terms of interpretability, but certainly not small or well formatted or easy to approach.


Inherently Interpretable: Decision Rules

If-then rules mined from data

Easy to interpret if there are only a few, simple rules

See also association rule mining:

{Diaper, Beer} -> Milk  (40% support, 66% confidence)
Milk -> {Diaper, Beer}  (40% support, 50% confidence)
{Diaper, Beer} -> Bread (40% support, 66% confidence)
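
For illustration, a minimal sketch of how support and confidence are computed, on a made-up basket table chosen to reproduce the numbers above; libraries such as mlxtend implement the full apriori mining of such rules:

```python
# Minimal sketch: compute support and confidence of one rule by hand.
# The toy baskets below are made up to match the example numbers above.
import pandas as pd

baskets = pd.DataFrame(
    [[1, 1, 1, 0], [1, 1, 0, 1], [0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 1, 1]],
    columns=["Diaper", "Beer", "Milk", "Bread"],
).astype(bool)

def rule_metrics(df, antecedent, consequent):
    has_antecedent = df[antecedent].all(axis=1)
    has_both = has_antecedent & df[consequent].all(axis=1)
    support = has_both.mean()                           # P(antecedent and consequent)
    confidence = has_both.sum() / has_antecedent.sum()  # P(consequent | antecedent)
    return support, confidence

print(rule_metrics(baskets, ["Diaper", "Beer"], ["Milk"]))  # (0.4, 0.66...)
```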

Research in Inherently Interpretable Models

Several approaches to learn sparse constrained models (e.g., fit score cards, simple if-then-else rules)

Often heavy emphasis on feature engineering and domain-specificity

Possibly computationally expensive

Rudin, Cynthia. "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead." Nature Machine Intelligence 1, no. 5 (2019): 206-215.


Post-Hoc Model Explanation: Global Surrogates

  1. Select dataset X (previous training set or new dataset from same distribution)
  2. Collect model predictions for every value: $y_i=f(x_i)$
  3. Train inherently interpretable model $g$ on (X,Y)
  4. Interpret surrogate model $g$

Can measure how well $g$ fits $f$ with common model quality measures, typically $R^2$

Advantages? Disadvantages?

Notes: Flexible, intuitive, easy approach, easy to compare quality of surrogate model with validation data ($R^2$). But: Insights not based on real model; unclear how well a good surrogate model needs to fit the original model; surrogate may not be equally good for all subsets of the data; illusion of interpretability. Why not use surrogate model to begin with?
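
For illustration, a minimal sketch of the four steps above with scikit-learn, using a gradient-boosted classifier as the opaque model and agreement with its predictions (fidelity) as the classification analogue of $R^2$; all model choices are placeholders:

```python
# Minimal sketch of a global surrogate: approximate an opaque model with a shallow tree.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)

f = GradientBoostingClassifier().fit(X, y)             # 1. opaque model f (placeholder)
y_hat = f.predict(X)                                   # 2. collect f's predictions on dataset X
g = DecisionTreeClassifier(max_depth=3).fit(X, y_hat)  # 3. interpretable surrogate g on (X, f(X))

# 4. check how faithfully g mimics f, then interpret g instead of f
print("fidelity:", accuracy_score(y_hat, g.predict(X)))
print(export_text(g))
```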


Advantages and Disadvantages of Surrogates?


Advantages and Disadvantages of Surrogates?

  • short, contrastive explanations possible
  • useful for debugging
  • easy to use; works on lots of different problems
  • explanations may use different features than original model
  • explanation not necessarily truthful
  • explanations may be unstable
  • likely not sufficient for compliance scenario

Post-Hoc Model Explanation: Feature Importance

FI example


Feature Importance

  • Permute a feature's values in the validation data -> hides that feature's information from the model
  • Measure the influence on accuracy
  • -> Evaluates a feature's influence without retraining the model
  • Highly compressed, global insights
  • Captures the effect of the feature and its interactions
  • Can only be computed on labeled data; depends on model accuracy; randomness from permutation
  • May produce unrealistic inputs when features are correlated

(Can be evaluated both on training and validation data)

Note: Training vs validation is not an obvious answer and both cases can be made, see Molnar's book. Feature importance on the training data indicates which features the model has learned to use for predictions.
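
For illustration, a minimal sketch with scikit-learn's `permutation_importance` on held-out validation data; the model and dataset are placeholder choices:

```python
# Minimal sketch: permutation feature importance measured on validation data.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_val, y_train, y_val = train_test_split(data.data, data.target, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature several times and record the drop in validation accuracy
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)
ranked = sorted(zip(data.feature_names, result.importances_mean), key=lambda t: -t[1])
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.3f}")
```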


Post-Hoc Model Explanation: Partial Dependence Plot (PDP)

PDP Example


Partial Dependence Plot

  • Computes marginal effect of feature on predicted outcome
  • Identifies relationship between feature and outcome (linear, monotonic, complex, ...)
  • Intuitive, easy interpretation
  • Assumes no correlation among features
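
For illustration, a minimal sketch using scikit-learn's `PartialDependenceDisplay` (available in recent versions); features 2 and 3 of the diabetes dataset ("bmi" and "bp") are arbitrary examples:

```python
# Minimal sketch: partial dependence of the predicted outcome on two features.
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

data = load_diabetes()
model = GradientBoostingRegressor().fit(data.data, data.target)

# Marginal effect of features 2 ("bmi") and 3 ("bp") on the prediction
PartialDependenceDisplay.from_estimator(model, data.data, features=[2, 3],
                                        feature_names=data.feature_names)
plt.show()
```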

Partial Dependence Plot for Interactions

PDP Example

Probability of cancer; source: Christoph Molnar. "Interpretable Machine Learning." 2019


Concept Bottleneck Models

Hybrid/partially interpretable model

Force the model to first predict human-interpretable features (concepts) rather than the final prediction directly; then use an inherently interpretable model on those features

Requires labeled features in the training data

Illustration from the paper how CNN are used to extract features

Koh, Pang Wei, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. "Concept bottleneck models." In Proc. ICML, 2020.


Summary: Understanding a Model

Understanding of the whole model, not individual predictions!

Some models inherently interpretable:

  • Sparse linear models
  • Shallow decision trees

Ex-post explanations for opaque models:

  • Global surrogate models
  • Feature importance, partial dependence plots
  • Many more in the literature

Explaining a Prediction

Levels of explanations:

  • Understanding a model
  • Explaining a prediction
  • Understanding the data

Understanding Predictions from Inherently Interpretable Models is easy

Derive key influence factors or decisions from model parameters

Derive contrastive counterfactuals from the model

Example: Predict arrest for an 18-year-old male with one prior:

IF age between 18-20 and sex is male THEN predict arrest
ELSE IF age between 21-23 and 2-3 prior offenses THEN predict arrest
ELSE IF more than three priors THEN predict arrest
ELSE predict no arrest

Posthoc Prediction Explanation: Feature Influences

Which features were most influential for a specific prediction?

Lime Example

Source: https://github.com/marcotcr/lime


Feature Influences in Images

Lime Example

Source: https://github.com/marcotcr/lime


Feature Importance vs Feature Influence

Feature importance is global for the entire model (all predictions)

FI example

Feature influence is for a single prediction

Lime Example


Feature Infl. with Local Surrogates (LIME)

Create an inherently interpretable model (e.g. sparse linear model) for the area around a prediction

Lime approach:

  • Create random samples in the area around the data point of interest
  • Collect model predictions with $f$ for each sample
  • Learn surrogate model $g$, weighing samples by distance
  • Interpret surrogate model $g$

Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. ""Why should I trust you?" Explaining the predictions of any classifier." In Proc International Conference on Knowledge Discovery and Data Mining, pp. 1135-1144. 2016.
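
For illustration, a minimal sketch using the lime package (assuming it is installed); the opaque model and dataset are placeholder choices:

```python
# Minimal sketch: explain one prediction of an opaque classifier with LIME.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier().fit(data.data, data.target)

explainer = LimeTabularExplainer(data.data,
                                 feature_names=list(data.feature_names),
                                 class_names=list(data.target_names),
                                 mode="classification")

# Fit a sparse local surrogate around instance 0 and report the top 5 feature influences
explanation = explainer.explain_instance(data.data[0], model.predict_proba, num_features=5)
print(explanation.as_list())
```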


Lime Example


Lime Example


Lime Example


LIME Example

Lime Example

Source: Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. ""Why should I trust you?" Explaining the predictions of any classifier." In Proc. KDD. 2016.


Advantages and Disadvantages of Local Surrogates?


Posthoc Prediction Explanation: Shapley Values / SHAP

  • Game-theoretic foundation for local explanations (1953)
  • Explains the contribution of each feature, averaged over predictions with different feature subsets
    • "The Shapley value is the average marginal contribution of a feature value across all possible coalitions"
  • Solid theory ensures a fair attribution of influence to features
  • Requires heavy computation; usually only approximations are feasible
  • Explanations contain all features (i.e., not sparse)
  • Currently the most common local explanation method used in practice

Lundberg, Scott M., and Su-In Lee. "A unified approach to interpreting model predictions." In Advances in neural information processing systems, pp. 4765-4774. 2017.
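
For illustration, a minimal sketch using the shap package's TreeExplainer on a regression model (a setting where the returned values are easy to read); the model and dataset are placeholder choices. The package also provides plotting helpers such as force plots (next slide).

```python
# Minimal sketch: (approximate) Shapley values for one prediction of a tree ensemble.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

data = load_diabetes()
model = GradientBoostingRegressor().fit(data.data, data.target)

explainer = shap.TreeExplainer(model)           # efficient Shapley approximation for trees
shap_values = explainer.shap_values(data.data)  # one row of feature contributions per instance

# Contributions plus the base value add up to the model's prediction for the instance
print("base value:", explainer.expected_value)
for name, value in zip(data.feature_names, shap_values[0]):
    print(f"  {name}: {value:+.1f}")
```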


SHAP Force Plot

Shap force plot


Anchors

Anchors example


Counterfactual Explanations

if X had not occurred, Y would not have happened

Your loan application has been declined. If your savings account had had more than $100 your loan application would be accepted.

-> Smallest change to feature values that results in a given (different) output


Multiple Counterfactuals

Often long or multiple explanations

Your loan application has been declined. If your savings account ...

Your loan application has been declined. If you lived in ...

Report all or select "best" (e.g. shortest, most actionable, likely values)

(Rashomon effect)

Rashomon


Counterfactuals example


Searching for Counterfactuals?


Searching for Counterfactuals

Random search (with growing distance) possible, but inefficient

Many search heuristics, e.g. hill climbing or Nelder–Mead, may use gradient of model if available

Can incorporate distance in loss function

$$L(x,x^\prime,y^\prime,\lambda)=\lambda\cdot(\hat{f}(x^\prime)-y^\prime)^2+d(x,x^\prime)$$

(similar to finding adversarial examples)
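
For illustration, a minimal sketch of a random hill-climbing search minimizing the loss above (with an L1 distance), assuming a scikit-learn-style binary classifier with `predict_proba` and purely numeric features; λ, step size, and iteration count are arbitrary choices:

```python
# Minimal sketch: search for a counterfactual x' close to x whose predicted
# probability approaches the target outcome y'. Hyperparameters are placeholders.
import numpy as np

def counterfactual_loss(model, x_prime, y_target, x, lam):
    # lambda * (f(x') - y')^2 + d(x, x'), with d chosen as L1 distance
    prediction = model.predict_proba(x_prime.reshape(1, -1))[0, 1]
    return lam * (prediction - y_target) ** 2 + np.abs(x - x_prime).sum()

def find_counterfactual(model, x, y_target, lam=10.0, steps=5000, step_size=0.05, seed=0):
    rng = np.random.default_rng(seed)
    best, best_loss = x.copy(), counterfactual_loss(model, x, y_target, x, lam)
    for _ in range(steps):
        candidate = best + rng.normal(scale=step_size, size=x.shape)  # small random move
        loss = counterfactual_loss(model, candidate, y_target, x, lam)
        if loss < best_loss:
            best, best_loss = candidate, loss
    return best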


Adversarial examples


Discussion: Counterfactuals

  • Easy to interpret; can report either the alternative instance or the required change
  • No access to model or data required, easy to implement
  • Often many possible explanations (Rashomon effect), requires selection/ranking
  • May require changes to many features, not all feasible
  • May not find counterfactual within given distance
  • Large search spaces, especially with high-cardinality categorical features

Actionable Counterfactuals

Example: Denied loan application

  • Customer wants feedback on how to get the loan approved
  • Some suggestions are more actionable than others, e.g.,
    • Easier to change income than gender
    • Cannot change past, but can wait
  • In distance function, not all features may be weighted equally

Similarity

  • k-Nearest Neighbors inherently interpretable (assuming an intuitive distance function)
  • Attempts to build inherently interpretable image classification models based on similarity of fragments

Paper screenshot from "this looks like that paper"

Chen, Chaofan, Oscar Li, Daniel Tao, Alina Barnett, Cynthia Rudin, and Jonathan K. Su. "This looks like that: deep learning for interpretable image recognition." In NeurIPS (2019).


Summary: Understanding a Prediction

Understanding a single prediction, not the model as a whole

Explaining influences, providing counterfactuals and sufficient conditions, showing similar instances

Easy on inherently interpretable models

Ex-post explanations for opaque models:

  • Feature influences (LIME, SHAP, attention maps)
  • Searching for counterfactuals
  • Similarity, knn

Understanding the Data

Levels of explanations:

  • Understanding a model
  • Explaining a prediction
  • Understanding the data

Prototypes and Criticisms

  • A prototype is a data instance that is representative of all the data
  • A criticism is a data instance that is not well represented by the prototypes

Example

Source: Christoph Molnar. "Interpretable Machine Learning." 2019


Example: Prototypes and Criticisms?

Example


Example: Prototypes and Criticisms

Example

Source: Christoph Molnar. "Interpretable Machine Learning." 2019


Example: Prototypes and Criticisms

Example

Source: Christoph Molnar. "Interpretable Machine Learning." 2019

Note: The number of digits is different in each set since the search was conducted globally, not per group.


Methods: Prototypes and Criticisms

Clustering of data (à la k-means)

  • k-medoids returns actual instances as centers for each cluster
  • MMD-critic identifies both prototypes and criticisms
  • see book for details

Identify globally or per class
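
For illustration, a minimal sketch that stands in for the methods above: pick the instance closest to each k-means center as a prototype and the instances farthest from all prototypes as criticisms (a simplification of k-medoids/MMD-critic, not the actual algorithms):

```python
# Minimal sketch: prototypes as the real instances closest to cluster centers,
# criticisms as the instances worst represented by any prototype.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.metrics import pairwise_distances

X = load_digits().data
k = 10

centers = KMeans(n_clusters=k, random_state=0).fit(X).cluster_centers_
prototype_idx = pairwise_distances(centers, X).argmin(axis=1)  # closest real instance per center
prototypes = X[prototype_idx]

# Criticisms: instances far away from every prototype
distance_to_nearest_prototype = pairwise_distances(X, prototypes).min(axis=1)
criticism_idx = np.argsort(-distance_to_nearest_prototype)[:5]
print("prototype indices:", prototype_idx)
print("criticism indices:", criticism_idx)
```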


Discussion: Prototypes and Criticisms

  • Easy to inspect data, useful for debugging outliers
  • Generalizes to different kinds of data and problems
  • Easy to implement algorithm
  • Need to choose the number of prototypes and criticisms upfront
  • Uses all features, not just features important for prediction

Influential Instance

Data debugging: What data most influenced the training?

Example

Source: Christoph Molnar. "Interpretable Machine Learning." 2019


Influential Instances

Data debugging: What data most influenced the training? Is the model skewed by few outliers?

Approach:

  • Given training data with $n$ instances...
  • ... train model $f$ with all $n$ instances
  • ... train model $g$ with $n-1$ instances
  • If $f$ and $g$ differ significantly, the omitted instance was influential
    • Difference can be measured e.g. in accuracy or difference in parameters

Note: Instead of understanding a single model, comparing multiple models trained on different data
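
For illustration, a minimal (and deliberately brute-force) sketch of the leave-one-out approach above, measuring influence as the change in validation accuracy; the model and dataset are placeholder choices:

```python
# Minimal sketch: leave-one-out influence of each training instance (expensive!).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(
    *load_breast_cancer(return_X_y=True), random_state=0)

baseline = LogisticRegression(max_iter=5000).fit(X_train, y_train).score(X_val, y_val)

influence = []
for i in range(len(X_train)):
    keep = np.arange(len(X_train)) != i  # drop training instance i
    g = LogisticRegression(max_iter=5000).fit(X_train[keep], y_train[keep])
    influence.append(abs(baseline - g.score(X_val, y_val)))  # accuracy change = influence

print("most influential training instances:", np.argsort(influence)[-5:])
```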


Influential Instances Discussion

Retraining for every data point is simple but expensive

For some class of models, influence of data points can be computed without retraining (e.g., logistic regression), see book for details

Hard to generalize to taking out multiple instances together

Useful model-agnostic debugging tool for models and data

Christoph Molnar. "Interpretable Machine Learning: A Guide for Making Black Box Models Explainable." 2019


Three Concepts

Feature importance: How much does the model rely on a feature, across all predictions?

Feature influence: How much does a specific prediction rely on a feature?

Influential instance: How much does the model rely on a single training data instance?


Summary: Understanding the Data

Understand the characteristics of the data used to train the model

Many data exploration and data debugging techniques:

  • Criticisms and prototypes
  • Influential instances
  • many others...

Breakout: Debugging with Explanations

In groups, discuss which explainability approaches may help and why. Tagging group members, write to #lecture.

Algorithm bad at recognizing some signs in some conditions: Stop Sign with Bounding Box

Graduate appl. system seems to rank applicants from HBCUs low: Cheyney University founded in 1837 is the oldest HBCU

Left Image: CC BY-SA 4.0, Adrian Rosebrock


"Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead."


Accuracy vs Explainability Conflict?

Accuracy vs Explainability Sketch

Graphic from the DARPA XAI BAA (Explainable Artificial Intelligence)


Faithfulness of Ex-Post Explanations


CORELS’ model for recidivism risk prediction

IF age between 18-20 and sex is male THEN predict arrest
ELSE IF age between 21-23 and 2-3 prior offenses THEN predict arrest
ELSE IF more than three priors THEN predict arrest
ELSE predict no arrest

Simple, interpretable model with comparable accuracy to proprietary COMPAS model

Rudin, Cynthia. "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead." Nature Machine Intelligence 1.5 (2019): 206-215. (Preprint)


"Stop explaining ..."

Hypotheses:

  • It is a myth that there is necessarily a trade-off between accuracy and interpretability (when meaningful features are available)
  • Explainable ML methods provide explanations that are not faithful to what the original model computes
  • Explanations often do not make sense, or do not provide enough detail to understand what the black box is doing
  • Black box models are often not compatible with situations where information outside the database needs to be combined with a risk assessment
  • Black box models with explanations can lead to an overly complicated decision pathway that is ripe for human error

Rudin, Cynthia. "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead." Nature Machine Intelligence 1.5 (2019): 206-215. (Preprint)


Prefer Interpretable Models over Post-Hoc Explanations

  • Interpretable models provide faithful explanations
    • post-hoc explanations may provide limited insights or illusion of understanding
    • interpretable models can be audited
  • Inherently interpretable models in many cases have similar accuracy
  • Larger focus on feature engineering, more effort, but insights into when and why the model works
  • Less research on interpretable models and some methods computationally expensive

ProPublica Controversy

ProPublica Article

Notes: "ProPublica’s linear model was not truly an “explanation” for COMPAS, and they should not have concluded that their explanation model uses the same important features as the black box it was approximating."

ProPublica Controversy

IF age between 18-20 and sex is male THEN
  predict arrest
ELSE IF age between 21-23 and 2-3 prior offenses THEN
  predict arrest
ELSE IF more than three priors THEN
  predict arrest
ELSE
  predict no arrest

Rudin, Cynthia. "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead." Nature Machine Intelligence 1, no. 5 (2019): 206-215.


Drawbacks of Interpretable Models

Intellectual property protection harder

  • may need to sell model, not license as service
  • who owns the models and who is responsible for their mistakes?

Gaming possible; "security by obscurity" not a defense

Expensive to build (feature engineering effort, debugging, computational costs)

Limited to fewer factors, may discover fewer patterns, lower accuracy


Summary

  • Interpretability useful for many scenarios: user feedback, debugging, fairness audits, science, ...
  • Defining and measuring interpretability
    • Explaining the model
    • Explaining predictions
    • Understanding the data
  • Inherently interpretable models: sparse regressions, shallow decision trees
  • Providing ex-post explanations of opaque models: global and local surrogates, dependence plots and feature importance, anchors, counterfactual explanations, criticisms, and influential instances
  • Consider implications on user interface design
  • Gaming and manipulation with explanations

Further Readings