##### Master Degree in Computer Science and Data Science for Economics

# Explainability

### Alfio Ferrara

## Introduction

By **explainability** or **explainable AI** we mean the need for a description of the **reasons behind the machine's behavior** that is **understandable in human terms**.

Such explanations may be **local** or **global**. Local explanations aim to clarify why a machine learning model made a specific decision for a particular input, focusing on individual predictions. In contrast, global explanations seek to describe the overall behavior and decision-making logic of the model across all inputs. While local explanations provide detailed insights into single cases, global explanations offer a broader understanding of the model's general rules and patterns.

There are several approaches to explainable AI (XAI). Here we focus on a few examples.

## SHAP-like methods

SHAP (SHapley Additive exPlanations) is a method used to explain the output of machine learning models by assigning each feature a contribution value for a particular prediction. It is based on cooperative game theory: each feature is seen as a "player" in a cooperative game and the model’s prediction is the "payout" to be fairly divided. SHAP measures how the prediction changes when a feature is added to a subset (coalition) of the other features, and averages this change over the possible subsets. SHAP-like methods follow similar principles, aiming to provide consistent and interpretable explanations by fairly attributing importance to input features.
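To make the averaging over coalitions concrete, the following is a minimal sketch of an exact Shapley value computation in plain Python. It is not the `shap` library's implementation: the function `shapley_values`, the `toy_model`, and the choice of representing an "absent" feature by a baseline value are all assumptions made for illustration. The exact computation enumerates every subset, so it is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input x (illustrative sketch).

    Assumption: a feature that is absent from a coalition is replaced
    by its value in `baseline`.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Model output when only the features in `coalition` keep their real value.
        z = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight of a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: linear terms plus an interaction between features 0 and 1.
    return 2.0 * z[0] + 1.0 * z[1] + 3.0 * z[0] * z[1] - 0.5 * z[2]


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

In this toy case the contributions are approximately `[5.0, 5.0, -2.0]`: the interaction term is split equally between features 0 and 1, and the contributions sum to the difference between the prediction for `x` and the prediction for the baseline.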

An example using SHAP can be found in [L8.3-shap_example](./L8.3-shap_example.ipynb).

See
> Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in neural information processing systems, 30.

and the code at [SHAP (SHapley Additive exPlanations)](https://github.com/shap/shap)

## Saliency maps

A saliency map is a visualization technique used to highlight the most important parts of an input that influence a model’s prediction, typically in tasks involving images or text. It shows which regions or features the model "pays attention to" when making a decision. Saliency maps are usually computed by taking the gradient of the model’s output with respect to the input; this gradient indicates how sensitive the prediction is to small changes in each part of the input. In images, for example, brighter areas in the saliency map correspond to pixels that have a stronger impact on the output, helping users understand and interpret the model’s focus.
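The gradient-based idea can be sketched on a tiny example. The code below is only an illustration under stated assumptions: the 3x3 "image", the `toy_score` model and its `WEIGHTS` are hypothetical, and the gradient is approximated with central finite differences, whereas deep learning frameworks would normally obtain it by automatic differentiation.

```python
import math


def saliency_map(score, image, eps=1e-5):
    """Approximate |d score / d pixel| for every pixel with central differences."""
    rows, cols = len(image), len(image[0])
    saliency = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            plus = [row[:] for row in image]
            minus = [row[:] for row in image]
            plus[r][c] += eps
            minus[r][c] -= eps
            # Sensitivity of the score to a small change in this pixel.
            saliency[r][c] = abs(score(plus) - score(minus)) / (2 * eps)
    return saliency


# Hypothetical model: a sigmoid over a weighted sum of pixels.
# Only the middle column has non-zero weights, so only it can be salient.
WEIGHTS = [
    [0.0, 1.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 1.0, 0.0],
]


def toy_score(image):
    z = sum(
        w * p
        for w_row, p_row in zip(WEIGHTS, image)
        for w, p in zip(w_row, p_row)
    )
    return 1.0 / (1.0 + math.exp(-z))


image = [
    [0.2, 0.1, 0.3],
    [0.0, 0.5, 0.1],
    [0.4, 0.2, 0.0],
]
saliency = saliency_map(toy_score, image)
for row in saliency:
    print(["%.3f" % v for v in row])
```

The largest value appears at the center pixel, which has the largest weight, while the pixels the toy model ignores get a saliency of zero.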

An example using saliency maps can be found in [L8.4-saliency-clip](./L8.4-saliency-clip.ipynb).

> Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034.

## Concept Activation Vectors (CAV)

Concept Activation Vectors (CAVs) are a method used to interpret neural network decisions by connecting them to human-understandable concepts. Instead of focusing on individual features or pixels, CAVs aim to quantify how much a specific high-level concept (e.g., “stripes” or “roundness”) influences the model’s prediction. The process involves collecting examples that represent a concept, then training a linear classifier to distinguish the activations of these examples from those of random inputs within the network’s internal representation space. The vector normal to the decision boundary of this classifier is the CAV. By measuring the directional derivative of the model’s output along this vector, one can determine how sensitive the model is to the concept. In essence, this answers the question: “Would the prediction change if this concept were more or less present?”
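The two steps, fitting a linear classifier in activation space and taking a directional derivative along its normal vector, can be sketched as follows. Everything here is a simplified assumption for illustration: the two-dimensional "activations" are made-up numbers, `train_cav` uses a small hand-written logistic regression, `class_logit` is a hypothetical stand-in for the part of the network above the chosen layer, and the derivative is approximated by a finite difference.

```python
import math


def train_cav(concept_acts, random_acts, epochs=500, lr=0.1):
    """Fit logistic regression (concept = 1, random = 0); return its unit normal vector."""
    dim = len(concept_acts[0])
    data = [(a, 1.0) for a in concept_acts] + [(a, 0.0) for a in random_acts]
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for a, y in data:
            z = sum(wi * ai for wi, ai in zip(w, a)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for k in range(dim):
                grad_w[k] += (p - y) * a[k]
            grad_b += p - y
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    norm = math.sqrt(sum(wi * wi for wi in w))
    # The weight vector is normal to the decision boundary: this is the CAV.
    return [wi / norm for wi in w]


def concept_sensitivity(logit, activation, cav, eps=1e-5):
    """Directional derivative of the logit along the CAV (forward difference)."""
    shifted = [a + eps * v for a, v in zip(activation, cav)]
    return (logit(shifted) - logit(activation)) / eps


# Made-up layer activations: concept examples are shifted along the first axis.
concept_acts = [[2.0, 0.1], [2.5, -0.3], [1.8, 0.4], [2.2, -0.1], [2.1, 0.2], [1.9, -0.2]]
random_acts = [[0.1, 0.2], [-0.3, -0.4], [0.4, 0.3], [0.0, -0.1], [-0.2, 0.1], [0.2, -0.3]]


def class_logit(act):
    # Hypothetical mapping from layer activations to the logit of one class.
    return 3.0 * act[0] - act[1] ** 2


cav = train_cav(concept_acts, random_acts)
sensitivity = concept_sensitivity(class_logit, [1.0, 0.5], cav)
print(cav, sensitivity)
```

A positive sensitivity means that moving the activation in the direction of the concept increases the class logit for this input. The TCAV score in the paper cited below aggregates the sign of this quantity over many inputs of a class.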

An example of CAV can be found in [L8.5-concept-activation-vectors](./L8.5-concept-activation-vectors.ipynb).

> Kim, B., Wattenberg, M., Gilmer, J., Cai, C., Wexler, J., & Viegas, F. (2018, July). Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning (pp. 2668-2677). PMLR.