# 2. Interpretability

Various definitions of interpretability:  
  1. the degree to which a human can understand the cause of a decision.  
  2. the degree to which a human can consistently predict the model's result.

## 2.1 The Importance of Interpretability

In predictive modelling, we have to make a trade-off:
  - Do we simply want to know what is predicted, or do we also want to know why the prediction was made, possibly paying for that interpretability with a drop in accuracy? The answer depends on the situation.  
  
People are naturally curious when they encounter unexpected events. Closely related to learning is the human desire to find meaning in the world.  
- The machine learning model itself becomes a source of knowledge, instead of the data. Interpretability makes it possible to tap into this additional knowledge captured by the model.  
- By default, most machine learning models pick up biases from the training data. Interpretability can be a useful debugging tool for detecting those biases.  
- The process of integrating machines and algorithms into our daily lives demands interpretability to increase social acceptance.  

Traits that are easier to check when a machine learning model can explain its decisions (Doshi-Velez and Kim 2017):  
- Fairness: Making sure the predictions are unbiased and do not discriminate against protected groups.  
- Privacy: Ensuring that sensitive information in the data is protected.  
- Reliability or Robustness: Ensuring that small changes in the input do not lead to big changes in the prediction.  
- Causality: Checking that only causal relationships are picked up.  
- Trust: It is easier for humans to trust a system that explains its decisions compared to a black box.  
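
The reliability trait above can be probed empirically. The following is a minimal sketch, not a method from the source: `predict` is a hypothetical stand-in model, and `eps`, `trials`, and `max_prediction_shift` are illustrative names. It perturbs an input slightly and reports the largest change in the prediction that was observed.

```python
import random

def predict(x):
    """Hypothetical stand-in model: a fixed linear scorer (assumption)."""
    weights = [0.5, -1.0, 2.0]
    return sum(w * v for w, v in zip(weights, x))

def max_prediction_shift(predict_fn, x, eps=0.01, trials=200, seed=0):
    """Largest absolute prediction change seen when every feature of x
    is perturbed by uniform noise of at most eps."""
    rng = random.Random(seed)
    base = predict_fn(x)
    worst = 0.0
    for _ in range(trials):
        noisy = [v + rng.uniform(-eps, eps) for v in x]
        worst = max(worst, abs(predict_fn(noisy) - base))
    return worst

shift = max_prediction_shift(predict, [1.0, 2.0, 3.0])
print(f"largest observed shift: {shift:.4f}")
```

For this linear stand-in the shift can never exceed `eps` times the sum of the absolute weights (0.035 here). A random search like this only gives a lower bound on the worst case for a real model.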
  
When we do not need interpretability:  
- when the model has no significant impact
- when the problem is well studied
- when interpretability would let users manipulate (game) the system, which can happen if the objectives of the model's creator and its users are mismatched

## 2.2 Taxonomy of Interpretability Methods

* Intrinsic or post hoc?
   * Intrinsic interpretability means selecting and training a machine learning model that is considered intrinsically interpretable (e.g. a short decision tree).
   * Post hoc interpretability means selecting and training a black box model (e.g. a neural network) and applying interpretability methods after training (e.g. measuring feature importance).
* Interpretation methods can also be grouped by the kind of result they produce:
   * Feature summary statistic
   * Feature summary visualization: some feature summaries can only be visualized and cannot meaningfully be placed in a table (e.g. the partial dependence of a feature).
   * Model internals (e.g. learned weights): the interpretation of intrinsically interpretable models falls under this category.
   * Data point: this category includes all methods that return data points (existing or newly created) to make a model interpretable. Methods that output new data points work well for images and text but are less useful for tabular data with hundreds of features.
   * Intrinsically interpretable model: a black box model can be approximated (globally or locally) with an interpretable model, which is then interpreted through its internal parameters or feature summary statistics.
* Model-specific or model-agnostic?
   * Model-specific interpretation tools are limited to specific model classes. The interpretation of intrinsically interpretable models is always model-specific.
   * Model-agnostic tools can be used on any machine learning model and are usually post hoc. These agnostic methods usually operate by analysing feature input and output pairs.
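
To make the idea of a post hoc, model-agnostic method that works only on input and output pairs concrete, here is a minimal sketch of permutation feature importance, which returns a feature summary statistic. The toy `model`, the synthetic data, and all function names are assumptions for illustration. The method only ever calls the prediction function, so it never looks inside the model.

```python
import random

def model(row):
    """Hypothetical black box (assumption): depends only on feature 0."""
    return 3.0 * row[0]

def mse(predict_fn, X, y):
    """Mean squared error of predict_fn on the rows of X."""
    return sum((predict_fn(r) - t) ** 2 for r, t in zip(X, y)) / len(y)

def permutation_importance(predict_fn, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(predict_fn, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, column)]
            increases.append(mse(predict_fn, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y = [model(r) for r in X]  # noise-free targets, so the baseline error is 0
importances = permutation_importance(model, X, y)
print(importances)  # feature 0 gets a large value, feature 1 gets 0.0
```

Because the toy model ignores feature 1, shuffling that column leaves every prediction unchanged and its importance is exactly zero.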

## 2.3 Scope of Interpretability

* Algorithm transparency
   * Algorithm transparency is about how the algorithm learns a model from the data and what kind of relationships it is capable of picking up.
   * Algorithms for linear models are well studied and understood, so they score high in transparency. Methods whose inner workings are not well understood are less transparent.
* Global, Holistic Model Interpretability
   * This level of interpretability is about understanding how the model makes the decisions, based on a holistic view of its features and each of the learned components like weights, parameters, and structures.
   * Global model interpretability is very hard to achieve in practice.
* Global Model Interpretability on a Modular Level
   * While global model interpretability is usually out of reach, there is a better chance to understand at least some models on a modular level.
   * Not every model can be interpreted at this level. Even in a linear model, the interpretation of a single weight holds only under the assumption that all other input features stay fixed.
* Local Interpretability for a Single Prediction
   * You can zoom in on a single instance and examine what kind of prediction the model makes for this input, and why it made this decision.
   * Local explanations can be more accurate than global explanations.
* Local Interpretability for a Group of Predictions
   * Global methods can be applied by taking the group of instances, pretending it is the complete dataset, and using the global methods on this subset.
   * The single explanation methods can be used on each instance and listed or aggregated afterwards for the whole group.
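
The local and group levels above can be sketched with a toy linear model. This is an illustrative assumption, not a method from the source: `WEIGHTS`, `BIAS`, and the helper names are hypothetical. Each feature's contribution is measured relative to the prediction at the mean input, and per-instance explanations are then aggregated over a group.

```python
WEIGHTS = [2.0, -1.0]  # hypothetical learned weights (assumption)
BIAS = 0.5

def predict(x):
    """Linear model: bias plus weighted sum of the features."""
    return BIAS + sum(w * v for w, v in zip(WEIGHTS, x))

def local_contributions(x, feature_means):
    """Per-feature contribution to predict(x), relative to the
    prediction at the mean input (single-prediction explanation)."""
    return [w * (v - m) for w, v, m in zip(WEIGHTS, x, feature_means)]

def group_summary(group, feature_means):
    """Mean absolute contribution per feature across a group of instances."""
    rows = [local_contributions(x, feature_means) for x in group]
    return [sum(abs(r[j]) for r in rows) / len(rows)
            for j in range(len(WEIGHTS))]

data = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
means = [sum(col) / len(data) for col in zip(*data)]  # [2.0, 2.0]

single = local_contributions(data[0], means)  # [-2.0, 2.0]
summary = group_summary(data, means)          # aggregated over the group
print(single, summary)
```

The contributions for one instance sum to the difference between its prediction and the prediction at the mean input, which is what makes this decomposition easy to read for linear models.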

## 2.4 Evaluating Interpretability

There is no real consensus on what interpretability in machine learning is, nor is it clear how to measure it.  

* Approaches for evaluating interpretability quality (Doshi-Velez and Kim (2017) propose three main levels):  
     * Application level evaluation (real task): Put the explanation into the product and have the end user test it. It is conducted with domain experts.
     * Human level evaluation (simple task): A simplified application level evaluation. It is conducted with laypersons.
     * Function level evaluation (proxy task): Requires no humans. This works best when the class of models used has already been evaluated by someone else in a human level evaluation.

## 2.5 Properties of Explanations 

- Properties of Explanation Methods  
   * Expressive Power:  
           The structure of the explanation, e.g. If-Then rules, decision trees, or something else.
   * Translucency:  
           How much the explanation method relies on looking into the machine learning model (e.g. its parameters). Methods relying on intrinsically interpretable models are highly translucent, whereas methods that rely only on manipulating inputs and observing the predictions have zero translucency.
           The advantage of high translucency is that the method can rely on more information to generate explanations. The advantage of low translucency is that the explanation method is more portable.
   * Portability:  
           The range of machine learning models with which the explanation method can be used. Methods with low translucency have higher portability. Surrogate models might have the highest portability.
   * Algorithmic Complexity:
           The computational complexity of the method that generates the explanation.
        
- Properties of Individual Explanations  
   * Accuracy:
           How well an explanation predicts unseen data.
   * Fidelity:
           How well the explanation approximates the prediction of the black box model. High fidelity is one of the most important properties of an explanation. Accuracy and fidelity are closely related. If the black box model has high accuracy and the explanation has high fidelity, the explanation also has high accuracy.
   * Consistency:
           How similar the explanations are between models that have been trained on the same task and that produce similar predictions. 
   * Stability:
           How similar the explanations are for similar instances.
   * Comprehensibility:
           How well humans understand the explanations.
   * Certainty:
           How well the explanation reflects the certainty of the machine learning model.
   * Degree of Importance:
           How well the explanation reflects the importance of features or parts of the explanation.
   * Novelty:
           How well the explanation reflects whether the data instance to be explained comes from a "new" region far removed from the distribution of the training data. The higher the novelty, the more likely it is that the model will have low certainty due to lack of data.
   * Representativeness:
           How many instances an explanation covers.
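
The relationship between accuracy and fidelity can be illustrated with a small classification example. This is a sketch with made-up labels: `y_true`, `black_box`, and `explanation` are hypothetical data, and `agreement` is an illustrative helper. Fidelity compares the explanation with the black box, while accuracy compares it with the true labels.

```python
def agreement(a, b):
    """Fraction of positions at which two label sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

y_true      = [1, 0, 1, 1, 0, 1, 0, 0]  # hypothetical ground truth
black_box   = [1, 0, 1, 0, 0, 1, 0, 1]  # hypothetical black box predictions
explanation = [1, 0, 1, 0, 0, 1, 1, 1]  # hypothetical surrogate predictions

fidelity = agreement(explanation, black_box)        # 0.875
accuracy = agreement(explanation, y_true)           # 0.625
black_box_accuracy = agreement(black_box, y_true)   # 0.75

print(fidelity, accuracy, black_box_accuracy)
```

Here the explanation follows the black box closely but is less accurate on the true labels, partly because it inherits the black box's own mistakes. High fidelity therefore yields high accuracy only when the black box itself is accurate.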

## 2.6 Human-friendly Explanations

As an explanation for an event, humans prefer short explanations (just 1 or 2 causes) that contrast the current situation with a situation in which the event would not have happened. Abnormal causes make especially good explanations.

### 2.6.1 What is an explanation?

- Explanations are social interactions between the explainer and the explainee (receiver of the explanation), and therefore the social context has a huge influence on the actual content of the explanation.
- An explanation is the answer to a why-question (Miller 2017).
- The term “explanation” means the social and cognitive process of explaining, but it’s also the product of these processes. The explainer can be a human or a machine.

### 2.6.2 What is a "good" explanation?

* Explanations are contrastive (Lipton 2016)
        Humans usually don’t want a complete explanation for a prediction; they want to compare how it differs from another instance’s prediction (which can also be an artificial one).
* Explanations are selected
        Make the explanation very short, give only 1 to 3 reasons, even if the world is more complex.
* Explanations are social
        Be mindful of the social setting of your machine learning application and of the target audience.
* Explanations focus on the abnormal
        If one of the input features for a prediction was abnormal in any sense (like a rare category of a categorical feature) and the feature influenced the prediction, it should be included in an explanation, even if other ‘normal’ features have the same influence on the prediction as the abnormal one.
* Explanations are truthful
        The explanation should predict the event as truthfully as possible, which is sometimes called fidelity in the context of machine learning.
* Good explanations are consistent with prior beliefs of the explainee
        This property is hard to build into machine learning and would probably drastically compromise predictive accuracy.
* Good explanations are general and probable
        Generality is easily measured by a feature’s support, which is the number of instances for which the explanation applies over the total number of instances.
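
The support measure described above can be computed directly. This is a minimal sketch in which the `houses` data and the `rule` are hypothetical examples, not taken from the source.

```python
def support(rule, instances):
    """Number of instances the explanation applies to, divided by
    the total number of instances."""
    return sum(1 for x in instances if rule(x)) / len(instances)

# Hypothetical dataset and explanation rule (assumptions for illustration).
houses = [
    {"size": 120, "garden": True},
    {"size": 80,  "garden": True},
    {"size": 150, "garden": False},
    {"size": 110, "garden": True},
    {"size": 60,  "garden": False},
]

def rule(house):
    """Explanation: 'the house is large and has a garden'."""
    return house["size"] > 100 and house["garden"]

coverage = support(rule, houses)
print(coverage)  # 0.4, because the rule applies to 2 of the 5 houses
```

An explanation with higher support applies to more instances and is therefore more general.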

(Reference: https://christophm.github.io/interpretable-ml-book/interpretability.html)