# Predictions Overview

This small notebook reports on and analyses Prediction Studio data for Predictions. The underlying data comes from the Data-DM-Snapshot table, which populates the Prediction Studio screen with Prediction Performance, Lift, CTR, etc.

The notebook accepts data exported from PDC, which has a slightly altered format, as well as data exported directly from the pyGetSnapshot dataset in Pega.

For a description of the datamart tables see https://docs-previous.pega.com/decision-management/87/database-tables-monitoring-models.

Disclaimer: this is not a canned, robust and customer-facing notebook (yet). It's mostly used internally to validate Prediction data. Column names and file formats may need some more review to make it more robust.

In [None]:
from pathlib import Path
import polars as pl
import json
from pdstools import readDSExport, Prediction

data_export = "<YOUR DATA FILE HERE>"
data_export = "/Users/perdo/Downloads/NAB_ADM_Result_Data/Data-DM-Snapshot_pyGetSnapshot_20240612T051004_GMT.zip"
# data_export = "/Users/perdo/Library/CloudStorage/OneDrive-SharedLibraries-PegasystemsInc/AI Chapter Data Sets - Documents/Customers/NFCU/PDC/Models_data_20240122T103548.244 GMT.json"

prediction = None
if data_export.endswith(".parquet"):
    # TODO - this should be similar to the JSON import and is by convention
    predictions_raw_data = pl.read_parquet(Path(data_export).expanduser())
    prediction = Prediction(predictions_raw_data.lazy())
elif data_export.endswith(".json"):
    # Assume this is an export from the PDC screen
    with open(Path(data_export).expanduser()) as f:
        # The PDC export wraps the records in a top-level "pxResults" list
        predictions_raw_data = pl.from_dicts(json.load(f)["pxResults"])
    prediction = Prediction.from_pdc(predictions_raw_data.lazy())
elif data_export.endswith(".zip"):
    # Assume a direct export from the dataset
    predictions_raw_data = readDSExport(data_export).collect()
    prediction = Prediction(predictions_raw_data.lazy())
else:
    # Fail early instead of hitting a NameError on the next line
    raise ValueError(f"Unsupported file type: {data_export}")

predictions_raw_data.head()  # .to_pandas().style

A quick glance at the available data, selecting just the last day of the snapshots.

In [None]:
prediction.summary_by_channel().collect().to_pandas().style

In [None]:
import plotly.express as px
px.line(
    prediction.summary_by_channel(keep_trend_data=True)
    .collect()
    .filter(pl.col("isMultiChannelPrediction").not_())
    .filter(pl.col("Channel") != "Unknown")
    .sort(["SnapshotTime"]),
    x="SnapshotTime",
    y="Performance",
    color="Channel",
    title="Prediction Performance",
)

In [None]:
px.line(
    prediction.summary_by_channel(keep_trend_data=True)
    .collect()
    .filter(pl.col("isMultiChannelPrediction").not_())
    .filter(pl.col("Channel") != "Unknown")
    .sort(["SnapshotTime"]),
    x="SnapshotTime",
    y="Lift",
    color="Channel",
    title="Prediction Lift",
).update_yaxes(tickformat=",.2%")
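
The `tickformat` string follows the d3-format syntax, which is modeled on Python's format specification mini-language. Python's built-in `format` can therefore give a rough preview of how the Lift axis labels might render. The values below are invented, and small differences between d3 and Python formatting are possible.

```python
# Invented lift values, expressed as fractions
lift_values = [0.0123, 0.25, 12.5]

# ",.2%" means: thousands separator, two decimals, shown as a percentage
labels = [format(value, ",.2%") for value in lift_values]
print(labels)
```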