In [None]:
# These lines are only for rendering in the docs, and are hidden through Jupyter tags
# Do not run this cell if you're running the notebook separately

import plotly.io as pio

pio.renderers.default = "notebook_connected"

# Global Explanations for Adaptive Gradient Boosting Models

__Pega__

__2025-10-17__

This notebook demonstrates how to analyze and visualize global explanations for Adaptive Gradient Boosting models in Pega Adaptive Decision Manager (ADM). Global explanations provide insights into which predictors have the most influence on model predictions and how different predictor values correlate with the model's scores.

**Important:** The SHAP explanation datasets used in this notebook are available starting with **Pega Infinity '25**. In earlier versions, the explanation files are not written to the repository.

The explanation files contain SHAP (SHapley Additive exPlanations) contributions for a sample of model executions. This notebook shows how these contributions are aggregated to provide global explanations for Gradient Boosting models. In Pega Infinity '25, a Global Explanations report can also be generated directly from [Prediction Studio](https://docs.pega.com/bundle/platform/page/platform/decision-management/gradient-boosting-explanations.html).
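As a rough illustration of what "aggregating contributions" can mean, the sketch below averages per-execution SHAP contributions into one signed and one absolute figure per predictor. The sample data, the helper `aggregate_contributions`, and the exact formulas are assumptions made for illustration; the aggregation performed by Pega or `pdstools` may differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-execution SHAP contributions: one dict per scored sample.
samples = [
    {"Income": 0.30, "Age": -0.10, "Channel": 0.05},
    {"Income": 0.10, "Age": 0.20, "Channel": -0.05},
    {"Income": -0.20, "Age": 0.10, "Channel": 0.00},
]


def aggregate_contributions(rows):
    """Return the mean and mean absolute contribution per predictor."""
    by_predictor = defaultdict(list)
    for row in rows:
        for predictor, value in row.items():
            by_predictor[predictor].append(value)
    return {
        predictor: {
            "contribution": mean(values),
            "contribution_abs": mean(abs(v) for v in values),
        }
        for predictor, values in by_predictor.items()
    }


global_explanation = aggregate_contributions(samples)

# Rank predictors by mean absolute contribution, largest first.
ranked = sorted(
    global_explanation.items(),
    key=lambda item: item[1]["contribution_abs"],
    reverse=True,
)
for predictor, stats in ranked:
    print(f"{predictor}: mean={stats['contribution']:+.3f}, mean_abs={stats['contribution_abs']:.3f}")
```

The signed mean shows the average direction of a predictor's effect, while the absolute mean shows how strongly it moves scores regardless of direction; a predictor can have a signed mean near zero and still be influential.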

**Note on earlier versions:** In versions prior to Infinity '25, Feature Importance (also known as Predictor Importance) is already available for Gradient Boosting models and uses SHAP values. In Infinity '25, we enhance this by providing detailed insights into the correlation between predictor _values_ and the model's score (for example, high income may correlate with a high contribution to the propensity score for a platinum credit card offer) rather than just the importances of predictors. See: https://docs.pega.com/bundle/platform/page/platform/decision-management/view-summarized-reports-adm.html


## Aggregate data exported from Infinity

**Prerequisite**: You have already exported explanation files from Infinity.

Parameters:
- `data_folder`: the folder containing the exported explanation files.
- `model_name` (optional): the model rule to check for explanations. If omitted, any file in the folder is picked up.
- `from_date` (optional): defaults to today minus 7 days.
- `to_date` (optional): defaults to today.

The aggregated data will be stored in the `.tmp/aggregated_data` directory.
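The date defaults described above can be pictured with a small helper. `resolve_date_window` is a hypothetical function written for this illustration and is not part of `pdstools`; it only mirrors the documented defaults.

```python
import datetime


def resolve_date_window(from_date=None, to_date=None, today=None):
    """Hypothetical helper that fills in the documented default date range."""
    today = today or datetime.date.today()
    if to_date is None:
        to_date = today  # documented default: today
    if from_date is None:
        from_date = today - datetime.timedelta(days=7)  # documented default: today - 7 days
    return from_date, to_date


# A fixed `today` keeps the example reproducible.
window = resolve_date_window(today=datetime.date(2025, 3, 28))
print(window)
```

If your export is older than a week, as with the sample file used in this notebook, pass `from_date` and `to_date` explicitly so the files fall inside the window.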

In [None]:
from pdstools.explanations import Explanations

import datetime
import polars as pl
import logging
# logging.basicConfig(level=logging.INFO) # Uncomment to see progress for large files


explanations = Explanations(
    # data_folder='../../data/explanations/', # Uncomment this line and provide folder location of exported explanations data
    data_file="https://raw.githubusercontent.com/pegasystems/pega-datascientist-tools/master/data/explanations/AdaptiveBoostCT_20250328064604.parquet", # Remove this argument when running locally with exported data
    model_name='AdaptiveBoostCT',
    from_date=datetime.date(2024,3,28),
    to_date=datetime.date(2025,3,28)
)

## Simple plotting of contributions

These methods plot the contributions for the overall model or for a specific context.

The first plot shows the `top-n` predictors with their contributions. The remaining plots show each predictor in the `top-n` list: numeric predictor values are binned into at most 10 bins, while categorical predictors show their `top-k` categories with their contributions.
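To make the binning of numeric predictors concrete, here is a minimal sketch that groups values into at most 10 bins and averages the contribution per bin. It assumes equal-width bins and made-up data; the binning strategy actually used when the explanation data is aggregated is not specified here and may differ (for example, quantile-based bins).

```python
from collections import defaultdict


def bin_contributions(values, contributions, max_bins=10):
    """Average contributions per equal-width bin, using at most `max_bins` bins."""
    lo, hi = min(values), max(values)
    n_bins = max(1, min(max_bins, len(set(values))))
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 when all values are equal
    sums, counts = defaultdict(float), defaultdict(int)
    for value, contribution in zip(values, contributions):
        idx = min(int((value - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        sums[idx] += contribution
        counts[idx] += 1
    # Empty bins are skipped; each entry is (lower bound, upper bound, mean contribution).
    return [
        (lo + i * width, lo + (i + 1) * width, sums[i] / counts[i])
        for i in sorted(counts)
    ]


# Hypothetical predictor values and their SHAP contributions.
ages = [18, 22, 25, 31, 38, 44, 47, 52, 60, 66, 71, 80]
age_contributions = [-0.30, -0.25, -0.20, -0.10, 0.00, 0.05, 0.10, 0.15, 0.20, 0.20, 0.25, 0.30]

binned = bin_contributions(ages, age_contributions)
for lower, upper, avg in binned:
    print(f"[{lower:5.1f}, {upper:5.1f}): {avg:+.3f}")
```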

### Explanations for overall model

Call `explanations.plot.contributions()` without selecting any context from the interactive context picker. The resulting plots aggregate the data over all contexts.

Parameters:
- `top_n`: Number of top predictors to plot.
- `top_k`: Number of top predictor values for symbolic predictors to plot.
- `remaining`: If `True`, the remaining predictors will be plotted as a single bar.
- `missing`: If `True`, the missing values will be plotted as a separate bar.
- `descending`: If `True`, the predictors are sorted in descending order of their contributions, i.e. the least contributing predictors are plotted first.
- `contribution_calculation`: Method used to calculate contributions. Options include `contribution`, `contribution_abs`, and `contribution_weighted`. The default is `contribution`, the average contribution to predictions.
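The interplay of `top_n` and `remaining` can be sketched as follows. The helper `top_n_with_remaining` and its data are hypothetical, and summing the leftover predictors into one bar is an assumption of this sketch; the library may combine them differently.

```python
def top_n_with_remaining(contributions, top_n=3, remaining=True):
    """Keep the `top_n` predictors by absolute contribution; optionally fold the rest into one entry."""
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    top, rest = ranked[:top_n], ranked[top_n:]
    if remaining and rest:
        # Assumption: the leftover predictors are summed into a single "remaining" bar.
        top.append(("remaining", sum(value for _, value in rest)))
    return top


# Hypothetical aggregated contributions per predictor.
predictor_contributions = {
    "Income": 0.30,
    "Age": -0.25,
    "Channel": 0.10,
    "Region": 0.04,
    "Tenure": -0.02,
}

bars = top_n_with_remaining(predictor_contributions, top_n=3, remaining=True)
for name, value in bars:
    print(f"{name:>10}: {value:+.2f}")
```

Ranking by absolute value keeps strongly negative predictors such as `Age` in the top list, which a ranking by signed value would push to the bottom.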

In [None]:
_, plots = explanations.plot.contributions(top_n=3, top_k=5, remaining=True)

### Explanations for selected context

Call `explanations.filter.interactive()` to display the interactive context picker. This allows you to select a specific context from the list of available contexts.

The context picker helps filter the data when the list of contexts is very large. Fine-tune your selection by using the comboboxes on the left side of the context picker. The matching contexts are displayed on the right, from which you can select specific context keys.

Run `explanations.plot.contributions()` after selecting a context from the interactive context picker. This will plot the contributions for the selected context.

__NOTE__: Plots are generated for a single context only, i.e. a context must be selected from the list.
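Conceptually, narrowing the list of contexts amounts to matching a partial selection of context keys against every available context. The sketch below shows that idea in plain Python; `filter_contexts` and the sample contexts are hypothetical and do not reflect how the picker is implemented.

```python
# Hypothetical contexts, using the usual Pega context keys.
contexts = [
    {"pyChannel": "Web", "pyDirection": "Inbound", "pyIssue": "Sales", "pyGroup": "Cards", "pyName": "Gold"},
    {"pyChannel": "Web", "pyDirection": "Inbound", "pyIssue": "Sales", "pyGroup": "Cards", "pyName": "Platinum"},
    {"pyChannel": "Email", "pyDirection": "Outbound", "pyIssue": "Retention", "pyGroup": "Loans", "pyName": "Refinance"},
]


def filter_contexts(all_contexts, **selected):
    """Return the contexts whose keys match every selected key/value pair."""
    return [
        context
        for context in all_contexts
        if all(context.get(key) == value for key, value in selected.items())
    ]


web_contexts = filter_contexts(contexts, pyChannel="Web")
platinum = filter_contexts(contexts, pyChannel="Web", pyName="Platinum")
print(len(web_contexts), len(platinum))
```

Each additional key narrows the candidate list further until a single context remains, which is the state the plotting call expects.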

In [None]:
explanations.filter.interactive()

In [None]:
context_header, plots = explanations.plot.contributions(top_n=3, top_k=5)

You can also set the context manually by passing a dictionary with the context keys and values.

In [None]:
explanations.filter.set_selected_context(
    {
        "pyChannel": "PegaBatch",
        "pyDirection": "E2E Test",
        "pyGroup": "E2E Test",
        "pyIssue": "Batch",
        "pyName": "P2",
    }
)

In [None]:
context_header, plots = explanations.plot.contributions(top_n=3, top_k=5)

## Advanced Data Exploration
For more advanced data exploration, you can work directly with the aggregate. It provides more flexibility in how the data is loaded and processed, allowing you to inspect the data before plotting.

In [None]:
aggregate = explanations.aggregate # load the aggregated data

### Inspect data for overall model

Get the `top_n` predictors and their contributions for the overall model.

In [None]:
df_overall = aggregate.get_predictor_contributions(top_n = 3, remaining=False)
df_overall


We can inspect the most influential values (`top_k`) of the predictors we picked.
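As a plain-Python picture of what selecting the `top_k` values per predictor involves, the sketch below keeps the values with the largest absolute contribution for each predictor. The rows and the helper `top_k_values` are hypothetical, and ranking by absolute contribution is an assumption of this sketch rather than a statement about the library's ordering.

```python
from collections import defaultdict

# Hypothetical (predictor, value, contribution) rows from an aggregate.
rows = [
    ("Channel", "Web", 0.12),
    ("Channel", "Email", -0.08),
    ("Channel", "SMS", 0.02),
    ("Region", "North", -0.15),
    ("Region", "South", 0.05),
    ("Region", "East", 0.01),
]


def top_k_values(value_rows, top_k=2):
    """Keep the `top_k` values per predictor, ranked by absolute contribution."""
    grouped = defaultdict(list)
    for predictor, value, contribution in value_rows:
        grouped[predictor].append((value, contribution))
    return {
        predictor: sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)[:top_k]
        for predictor, pairs in grouped.items()
    }


top_values = top_k_values(rows, top_k=2)
for predictor, pairs in top_values.items():
    print(predictor, pairs)
```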

In [None]:
top_n_predictors = df_overall.select(pl.col('predictor_name')).unique().to_series().to_list()
aggregate.get_predictor_value_contributions(
    predictors=top_n_predictors, 
    top_k = 2, 
    remaining=False
)

### Inspect data by selected context

Let's repeat the same steps, this time inspecting a selected context instead of the entire model.

In [None]:
import random
context_info = random.choice(aggregate.get_unique_contexts_list())
print('Selected random context: \n')
for key, value in context_info.items():
    print(f'{key}: {value}')
df_by_context = aggregate.get_predictor_contributions(
    context=context_info, 
    top_n=3, 
    remaining=False)
df_by_context


In [None]:
top_n_predictors = df_by_context.select(pl.col('predictor_name')).unique().to_series().to_list()
aggregate.get_predictor_value_contributions(
    predictors=top_n_predictors, 
    top_k=2, 
    context=context_info, 
    remaining=False)