# Predictions Overview

This small notebook reports on and analyses Prediction Studio data for Predictions. The underlying data comes from the Data-DM-Snapshot table, which populates the Prediction Studio screen with Prediction Performance, Lift, CTR, etc.

This notebook is intended to work with data exported from PDC, which has a slightly altered format, as well as data exported directly from the pyGetSnapshot dataset in Pega. Note that the loading cell currently rejects PDC JSON files.

For a description of the datamart tables see https://docs-previous.pega.com/decision-management/87/database-tables-monitoring-models.

Disclaimer: this is not (yet) a polished, robust, customer-facing notebook. It is mostly used internally to validate Prediction data. Column names and file formats may need further review before it can be considered robust.

In [1]:
import polars as pl

In [None]:
from pathlib import Path
import polars as pl
from pdstools import readDSExport, Prediction

# Path to the export: a .parquet file or a .zip dataset export
data_export = "<Your Export Here>"

if data_export.endswith(".parquet"):
    predictions_raw_data = pl.read_parquet(Path(data_export).expanduser())
elif data_export.endswith(".zip"):
    # Assuming a direct export from the dataset
    predictions_raw_data = readDSExport(data_export).collect()
elif data_export.endswith(".json"):
    raise NotImplementedError("Import of PDC JSON data not supported")
else:
    # Fail early instead of hitting a NameError further down
    raise ValueError(f"Unrecognized export file type: {data_export}")

# Prediction is constructed from a lazy frame
prediction = Prediction(predictions_raw_data.lazy())

predictions_raw_data.head().to_pandas().style
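
The loading cell above branches on the file extension. As a minimal, stdlib-only sketch of that dispatch, the helper below maps a path to a loader label and fails loudly for anything else. The name `detect_export_kind` and the labels in `SUPPORTED` are hypothetical and not part of pdstools; unlike the `endswith` checks in the cell, this version is also case-insensitive.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to a loader label (not a pdstools API).
SUPPORTED = {".parquet": "parquet", ".zip": "dataset-export"}


def detect_export_kind(path_str: str) -> str:
    """Return a loader label for the export file, based on its suffix."""
    suffix = Path(path_str).suffix.lower()
    if suffix == ".json":
        # Mirrors the notebook: PDC JSON exports are rejected.
        raise NotImplementedError("Import of PDC JSON data not supported")
    if suffix not in SUPPORTED:
        raise ValueError(f"Unrecognized export file type: {path_str!r}")
    return SUPPORTED[suffix]
```

Calling `detect_export_kind(data_export)` before loading would make the unedited placeholder path raise a `ValueError` rather than fail later with an undefined variable.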

Peek at the internal data

In [None]:
prediction.predictions.head().collect().to_pandas().style

Summary by Channel, over all time

In [None]:
prediction.summary_by_channel().collect().to_pandas().style

Quick glance at the available data aggregated by day.

In [5]:
prediction_summary_by_channel = (
    prediction.summary_by_channel(by_period="1d")
    .with_columns(Prediction=pl.format("{} ({})", pl.col.Channel, pl.col.ModelName))
    .collect()
)
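
To illustrate what a per-day summary by channel contains, here is a small stdlib-only sketch. It is not how pdstools implements `summary_by_channel`; the row layout, the function name `summarize_by_day` and the assumption that each row already holds the counts for its own time window are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows: (timestamp, channel, positives, negatives).
rows = [
    (datetime(2024, 1, 1, 9), "Web", 5, 95),
    (datetime(2024, 1, 1, 17), "Web", 15, 85),
    (datetime(2024, 1, 2, 9), "Web", 10, 190),
    (datetime(2024, 1, 1, 12), "Email", 2, 198),
]


def summarize_by_day(rows):
    """Sum counts per (day, channel) and derive response count and CTR."""
    totals = defaultdict(lambda: [0, 0])
    for ts, channel, pos, neg in rows:
        key = (ts.date(), channel)  # truncate the timestamp to a day bucket
        totals[key][0] += pos
        totals[key][1] += neg
    return {
        key: {
            "ResponseCount": pos + neg,
            # CTR is undefined when there are no responses at all
            "CTR": pos / (pos + neg) if (pos + neg) else None,
        }
        for key, (pos, neg) in sorted(totals.items())
    }


daily = summarize_by_day(rows)
```

Each key of `daily` corresponds to one point on a per-day line chart for one channel.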

In [None]:
import plotly.express as px

px.line(
    prediction_summary_by_channel.filter(pl.col("isMultiChannelPrediction").not_())
    .filter(pl.col("Channel") != "Unknown")
    .sort(["Period"]),
    x="Period",
    y="Performance",
    color="Prediction",
    title="Prediction Performance",
)

In [None]:
px.line(
    prediction_summary_by_channel.filter(pl.col("isMultiChannelPrediction").not_())
    .filter(pl.col("Channel") != "Unknown")
    .sort(["Period"]),
    x="Period",
    y="Lift",
    color="Prediction",
    title="Prediction Lift",
).update_yaxes(tickformat=",.2%")
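
The Lift axis above is shown as a percentage. As a rough sketch of what such a number can represent, the functions below compute a click-through rate and the relative lift of a test group over a control group. This is an assumption about the metric, using the common definition (CTR_test - CTR_control) / CTR_control; the function names are hypothetical and the notebook itself does not confirm the exact formula used by Prediction Studio.

```python
def ctr(positives, negatives):
    """Click-through rate: positives as a fraction of all responses."""
    total = positives + negatives
    return positives / total if total else None


def lift(ctr_test, ctr_control):
    """Relative lift of the test group over the control group (assumed definition)."""
    if not ctr_control:
        # No meaningful lift without a non-zero control CTR
        return None
    return (ctr_test - ctr_control) / ctr_control


test_ctr = ctr(12, 88)      # 12 positives out of 100 responses
control_ctr = ctr(10, 90)   # 10 positives out of 100 responses
relative_lift = lift(test_ctr, control_ctr)
```

With these inputs the test group converts at 12% against 10% for control, so the relative lift is about 0.2, which a percentage axis would display as roughly 20%.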

In [None]:
px.line(
    prediction_summary_by_channel.filter(pl.col("isMultiChannelPrediction").not_())
    .filter(pl.col("Channel") != "Unknown")
    .sort(["Period"]),
    x="Period",
    y="CTR",
    facet_row="Prediction",
    color="Prediction",
    title="Prediction CTR",
).update_yaxes(tickformat=",.3%", matches=None).for_each_annotation(
    lambda a: a.update(text="")
)

In [None]:
px.line(
    prediction_summary_by_channel.filter(pl.col("isMultiChannelPrediction").not_())
    .filter(pl.col("Channel") != "Unknown")
    .sort(["Period"]),
    x="Period",
    y="ResponseCount",
    facet_row="Prediction",
    color="Prediction",
    title="Prediction Responses",
).update_yaxes(matches=None).for_each_annotation(lambda a: a.update(text=""))