# Impact Analyzer

This script reproduces the Impact Analyzer numbers from the data in VBD Actuals / Scenario Planner Actuals.

The direct data export from Impact Analyzer gives exactly the same numbers as those driving the plots. The VBD data is larger and lets you specify time ranges or selections other than those available in the Pega UI.

Caveats:

1. Impact Analyzer only looks at *active* actions. This notion of active / not active is not in the VBD data, or at least is not currently used by this script.

This script is a work in progress. It currently reproduces only the impression counts. Value, lift, uncertainties, etc. still need to be added.

In [None]:
from pdstools import ImpactAnalyzer


The raw input data looks like this. The wide PDC format gives all the data per experiment.

In [None]:
import requests

# Load the sample data directly from the GitHub repository. For a local file, you can
# just create the ImpactAnalyzer object without a custom reader.

sample_pdc_url = 'https://github.com/pegasystems/pega-datascientist-tools/raw/master/data/ia/CDH_Metrics_ImpactAnalyzer.json'

def github_reader(src):
    response = requests.get(src)
    response.raise_for_status()
    return response.json()

input_as_tbl = ImpactAnalyzer.from_pdc(
    sample_pdc_url, reader=github_reader, return_wide_df=True
).collect()

input_as_tbl
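
Judging from `github_reader` above, a custom reader appears to be any callable that takes the source and returns the parsed JSON. As a minimal sketch under that assumption, the hypothetical `local_json_reader` below follows the same contract for a file on disk, which can be handy for inspecting an export before handing it to `ImpactAnalyzer`. The helper name and the sample payload are illustrative only and are not part of pdstools.

```python
import json
import tempfile
from pathlib import Path


def local_json_reader(src):
    """Hypothetical reader: parse a local JSON file and return its content.

    Mirrors the contract of github_reader (source in, parsed JSON out).
    """
    with open(src, encoding="utf-8") as f:
        return json.load(f)


# Illustrative payload only; a real PDC export has a different structure.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.json"
    sample_path.write_text(json.dumps({"example": [1, 2, 3]}), encoding="utf-8")
    parsed = local_json_reader(sample_path)

print(parsed)
```

Whether `from_pdc` accepts such a reader for local paths is an assumption here; as noted in the cell above, a local file normally needs no custom reader at all.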

In [None]:
ia = ImpactAnalyzer.from_pdc(
    sample_pdc_url, reader=github_reader
)
ia.ia_data.head(10).collect().to_pandas().style

In [None]:
import requests
import zipfile
import io
import tempfile
from pathlib import Path


def load_from_github(zip_url):
    with tempfile.TemporaryDirectory() as temp_dir:
        response = requests.get(zip_url)
        response.raise_for_status()

        with zipfile.ZipFile(io.BytesIO(response.content)) as zip_file:
            zip_file.extractall(temp_dir)

        # Find all JSON files (handles nested directories)
        temp_path = Path(temp_dir)
        json_files = list(temp_path.rglob("*.json"))

        return ImpactAnalyzer.from_pdc(json_files)


ia = load_from_github(
    "https://github.com/pegasystems/pega-datascientist-tools/raw/master/data/ia/impact_analyzer_data_20251202_151201.zip"
)

All the control groups, with counts aggregated over all channels:

In [None]:
ia.summarize_control_groups().collect()

All the experiments, split by channel:

In [None]:
ia.summarize_experiments("Channel").collect()

There are convenient summarization functions that pivot the lift metrics overall or per channel.

In [None]:
ia.overall_summary().collect()

In [None]:
ia.summary_by_channel().collect()
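
Lift in this kind of experiment is commonly expressed as the relative difference between the success rate of the test group and that of a control group. The sketch below works through that arithmetic in plain Python, aggregating over channels first. The counts, the group names, and the helpers `success_rate` and `relative_lift` are made up for illustration; the exact definitions used by `overall_summary` and `summary_by_channel` may differ.

```python
def success_rate(accepts, impressions):
    """Accepts per impression; None when there are no impressions."""
    return accepts / impressions if impressions else None


def relative_lift(test_rate, control_rate):
    """Relative difference of test over control (assumed definition)."""
    if test_rate is None or not control_rate:
        return None
    return (test_rate - control_rate) / control_rate


# Made-up counts per (channel, group); not taken from the sample data.
counts = [
    {"Channel": "Web", "Group": "Test", "Impressions": 1000, "Accepts": 60},
    {"Channel": "Web", "Group": "Control", "Impressions": 500, "Accepts": 25},
    {"Channel": "Email", "Group": "Test", "Impressions": 2000, "Accepts": 60},
    {"Channel": "Email", "Group": "Control", "Impressions": 1000, "Accepts": 25},
]

# Aggregate over channels first, then compute rates and lift on the totals.
totals = {}
for row in counts:
    agg = totals.setdefault(row["Group"], {"Impressions": 0, "Accepts": 0})
    agg["Impressions"] += row["Impressions"]
    agg["Accepts"] += row["Accepts"]

test_rate = success_rate(totals["Test"]["Accepts"], totals["Test"]["Impressions"])
control_rate = success_rate(
    totals["Control"]["Accepts"], totals["Control"]["Impressions"]
)
overall_lift = relative_lift(test_rate, control_rate)
print(f"test={test_rate:.4f} control={control_rate:.4f} lift={overall_lift:.4f}")
```

Computing the lift on aggregated totals rather than averaging per-channel lifts keeps channels with few impressions from dominating the overall number.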

There is also some (basic) support for plotting.

In [None]:
ia.plot.overview(facet="Channel")

In [None]:
ia.plot.trend(facet="Channel", every="1w")