## Link to article

This notebook is included in the documentation, where the interactive Plotly charts show up. See:
https://pegasystems.github.io/cdh-datascientist-tools/Python/articles/vf_analysis.html

In [1]:
# These lines are only for rendering in the docs, and are hidden through Jupyter tags
# Do not run if you're running the notebook separately

import plotly.io as pio
pio.renderers.default='notebook_connected'

import sys
sys.path.append('../../../')

# Value Finder analysis
Every Value Finder simulation populates a dataset, the pyValueFinder dataset. This dataset contains much more information than is currently presented on screen.

The data held in this dataset can be analysed to uncover insights into your decision framework. 

CDH tools has been updated to provide a notebook for some pre-configured analysis of the Value Finder dataset. This analysis can be used to supplement your Value Finder simulation whilst we add these features formally to the product.

In the data folder we’ve stored a copy of such a dataset, generated from an (internal) demo
application (CDHSample).

This page shows an example of how the data can be used for additional analyses.

First, let’s look at the results as presented in Pega:

![Pega value finder screen](pegarun_8_6_0.png)

For the sample provided, the relevant action setting is 1.2%. There are 10,000 customers: 3491 without actions, 555 with only irrelevant actions and 5954 with at least one relevant action.

Now, let's import our class, read the data and recreate this view and supplement it with some advanced analysis of the pyValueFinder dataset. Just like with the ADMDatamart class, you can supply your own path and filename as such:
```python
ValueFinder(path = 'path-to-data', filename="Data-Insights_pyValueFinder_timestamp_GMT.zip")
```

If only a path is supplied, it will automatically look for the latest file. 
It is also possible to supply a dataframe as the 'df' argument directly, in which case it will use that instead. 
Lastly, cdhtools now also ships a sample dataset, available as `datasets.SampleValueFinder()`, which is what we'll be using.

In [1]:
from cdhtools import ValueFinder, datasets
import polars as pl
vf = datasets.SampleValueFinder()

File found through URL
Importing: https://raw.githubusercontent.com/pegasystems/cdh-datascientist-tools/master/data/Data-Insights_pyValueFinder_20210824T112615_GMT.zip
Data import took 5.25 seconds
Transforming to polars... Took: 0.02 seconds
Generating: Customer Summary... Took: 0.0 seconds
Generating: Counts per stage... Took: 0.0 seconds


As we can see, it has found a file on the GitHub repo and imports it straight from there. It also prints out some extra information about some calculations, which can be suppressed by supplying the keyword 'verbose=False'. 

Since there is only one dataset, the data is simply stored in the attribute `df`. The dataset is heavily filtered for performance reasons, so the data will look like this:

In [2]:
vf.df.head(5)

| pyStage | pyIssue | pyGroup | pyChannel | pyDirection | CustomerID | pyName | pyWorkID | pyModelPropensity | pyPropensity | FinalPropensity |
|---|---|---|---|---|---|---|---|---|---|---|
| str | str | str | str | str | str | str | str | f64 | f64 | f64 |
| "Applicability" | "Sales" | "DepositAccount..." | "SMS" | "Outbound" | "Customer-1" | "StudentCheckin..." | "Opp_NBA_AlDF_S..." | 0.269231 | 0.269231 | 0.278077 |
| "Applicability" | "Usage" | "Mobilebanking" | "SMS" | "Outbound" | "Customer-100" | "GetTheUMobileA..." | "Opp_NBA_AlDF_S..." | 0.5 | 0.5 | 0.713095 |
| "Applicability" | "Collections" | "Recommendation..." | "SMS" | "Outbound" | "Customer-1000" | "SetupAutopayTo..." | "Opp_NBA_AlDF_S..." | 0.5 | 0.5 | 0.421306 |
| "Applicability" | "Sales" | "DepositAccount..." | "SMS" | "Outbound" | "Customer-10000..." | "StudentCheckin..." | "Opp_NBA_AlDF_S..." | 0.269231 | 0.269231 | 0.244777 |
| "Applicability" | "Sales" | "Bundles" | "SMS" | "Outbound" | "Customer-1001" | "StudentChoice" | "Opp_NBA_AlDF_S..." | 0.15 | 0.15 | 0.2483 |


This is already enough information to generate the same pie chart as shown in the platform, but to replicate the same values, we would need to compute the propensity threshold. In this case, the `0.052` quantile of the propensity distribution seems to produce the same counts as in the platform. The platform shows only the final pie chart after arbitration, but it is also possible to view the same pie chart after each engagement policy stage. Simply call the `plotPieCharts()` function on the data:

In [2]:
vf.plotPieCharts(0.052, verbose=False)

By hovering over the rightmost pie chart, you can see that the numbers exactly match those shown in the Value Finder simulation. What's more, we don't just show the counts in the final arbitration stage, but also the counts in the eligibility, applicability, and suitability stages. This view shows you how customers move from having at least one relevant action to having only irrelevant actions or no actions as the engagement policies are applied, and so which stage of your policies has the most impact.
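
The counting behind one such pie chart can be sketched in plain Python. This is only an illustration under stated assumptions, not the cdhtools implementation: the helper names `quantile` and `classify_customers`, the toy data, the use of `>=` for "relevant", and taking the quantile over all offered propensities are all assumptions made for this example.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a list of numbers, with 0 <= q <= 1."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])


def classify_customers(all_customers, offers, threshold):
    """Count customers per group for one stage, given a propensity threshold."""
    # Highest propensity among the actions each customer still has in this stage.
    best = {}
    for customer, propensity in offers:
        best[customer] = max(propensity, best.get(customer, 0.0))

    counts = {
        "At least one relevant action": 0,
        "Only irrelevant actions": 0,
        "Without actions": 0,
    }
    for customer in all_customers:
        if customer not in best:
            counts["Without actions"] += 1
        elif best[customer] >= threshold:  # assumption: ties count as relevant
            counts["At least one relevant action"] += 1
        else:
            counts["Only irrelevant actions"] += 1
    return counts


# Hypothetical toy data: (customer, propensity) pairs left after one stage.
all_customers = ["C1", "C2", "C3", "C4", "C5"]
offers = [
    ("C1", 0.30), ("C1", 0.02),
    ("C2", 0.01),
    ("C3", 0.20), ("C3", 0.15),
    ("C4", 0.005),
]

# Turn a quantile into a propensity threshold, then count the three groups.
threshold = quantile([propensity for _, propensity in offers], 0.5)
counts = classify_customers(all_customers, offers, threshold)
print(round(threshold, 3), counts)
```

Customer `C5` has no remaining actions and therefore ends up in "Without actions", while a customer needs only one action at or above the threshold to count as having a relevant action.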

Now, of course, whether a customer is well served depends heavily on what we consider to be well served. After the application of the eligibility engagement policies we choose the relevant action setting. This is set at the 5th percentile of the propensities. We can plot what that looks like as follows, where the dotted line is that threshold:

In [4]:
vf.plotPropensityThreshold()

These different propensities represent the raw propensities from the models (pyModelPropensity), the propensities which may be overridden by the random control group (pyPropensity) and the final propensity from a prediction (FinalPropensity). In a prediction, Thompson Sampling may have been applied, smoothing the final distribution.
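
To get a feel for why sampling spreads out the final propensities, here is a generic textbook-style sketch of Thompson Sampling with a Beta distribution. It is not necessarily how Pega computes `FinalPropensity`: the function name `thompson_sample`, the `evidence` parameter, and the Beta parameterisation are assumptions made purely for illustration.

```python
import random
from statistics import mean, pstdev


def thompson_sample(propensity, evidence, rng):
    """Draw a sampled propensity from a Beta distribution centred on `propensity`.

    `evidence` is a hypothetical count of responses behind the estimate:
    the less evidence, the wider the spread of the sampled values.
    """
    alpha = propensity * evidence
    beta = (1.0 - propensity) * evidence
    return rng.betavariate(alpha, beta)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
model_propensity = 0.1

# The same model propensity, sampled many times with little and with much evidence.
low_evidence = [thompson_sample(model_propensity, 10, rng) for _ in range(2000)]
high_evidence = [thompson_sample(model_propensity, 1000, rng) for _ in range(2000)]

print("low evidence:  mean", round(mean(low_evidence), 3), "spread", round(pstdev(low_evidence), 3))
print("high evidence: mean", round(mean(high_evidence), 3), "spread", round(pstdev(high_evidence), 3))
```

Both sets of samples centre on the model propensity, but the low-evidence samples are spread much more widely, which is the kind of effect that turns spikes in a propensity histogram into a smoother curve.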

We can also look at the propensity distributions across the different stages. Simply call `plotPropensityDistribution()`.

In [5]:
vf.plotPropensityDistribution()

The propensities are widely spread, which makes the selection of the relevant action setting an important choice: it determines whether customers are considered as having at least one relevant action or only irrelevant actions.

While we can create this pie chart for one threshold, we can also do this for a range of them. To do this, simply supply three arguments to the `plotPieCharts()` function: `start`, `stop` and `step`. These correspond to a range of propensity *quantiles* for which we want to compute the counts. In the background, this will generate the aggregated counts per stage, which we can plot as such:

In [3]:
vf.plotPieCharts(start=0.01, stop=0.5, step=0.01, verbose=False)

Note the slider at the bottom: playing around with this, you can easily see how choosing a different threshold changes the view of a customer. This makes intuitive sense: if you consider an action 'good' from a lower propensity threshold, then more customers will be well served than if you consider an action 'good' from a higher propensity threshold.
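
The effect of moving the slider can be sketched with a small sweep over quantiles. This is a simplified illustration, not the cdhtools code: the helper names and toy data are hypothetical, and for brevity the quantiles are taken over each customer's best propensity rather than over the full propensity distribution.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a list of numbers, with 0 <= q <= 1."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])


def count_groups(best_by_customer, n_customers, threshold):
    """Return (relevant, only irrelevant, without actions) for one threshold."""
    relevant = sum(1 for p in best_by_customer.values() if p >= threshold)
    irrelevant = len(best_by_customer) - relevant
    without = n_customers - len(best_by_customer)
    return relevant, irrelevant, without


# Hypothetical toy data: best remaining propensity per customer in one stage.
best_by_customer = {"C1": 0.30, "C2": 0.01, "C3": 0.20, "C4": 0.005, "C5": 0.08}
n_customers = 7  # two customers have no actions left at all

# A grid of quantiles, playing the role of start, stop and step.
quantile_grid = [i / 10 for i in range(1, 10, 2)]  # 0.1, 0.3, 0.5, 0.7, 0.9

sweep = []
for q in quantile_grid:
    threshold = quantile(list(best_by_customer.values()), q)
    sweep.append((q, *count_groups(best_by_customer, n_customers, threshold)))

for q, relevant, irrelevant, without in sweep:
    print(f"quantile {q:.1f}: relevant={relevant} irrelevant={irrelevant} without={without}")
```

As the quantile rises, the number of customers with at least one relevant action can only stay the same or fall, while the number of customers without actions does not depend on the threshold at all.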

While this is a nice 'slice' of the distribution at a given threshold, we can also show a bit more information. Call `plotDistributionPerThreshold()` to show this same distribution, but with the threshold on the x-axis. By default, it uses the quantiles, but if you set the `target` parameter to `'Propensity'`, the x-axis shows the propensity values instead.

In [11]:
vf.plotDistributionPerThreshold(verbose=False)

All thresholds already computed.


| Quantile | pyStage | At least one relevant action | Only irrelevant actions | Without actions |
|---|---|---|---|---|
| 0.11 | Arbitration | 5332 | 1177 | 3491 |
| 0.11 | Suitability | 5647 | 916 | 3437 |
| 0.11 | Eligibility | 6491 | 767 | 2742 |
| 0.11 | Applicability | 5913 | 890 | 3197 |
| 0.31 | Applicability | 4679 | 2124 | 3197 |
| 0.31 | Arbitration | 4317 | 2192 | 3491 |
| 0.31 | Eligibility | 5257 | 2001 | 2742 |
| 0.31 | Suitability | 4584 | 1979 | 3437 |
| 0.21 | Arbitration | 4528 | 1981 | 3491 |
| 0.21 | Applicability | 5149 | 1654 | 3197 |


In [2]:
vf.plotDistributionPerThreshold(target='Propensity', verbose=False)

100%|██████████| 10/10 [00:00<00:00, 282.33it/s]


Another area to consider is how your action distribution changes through the stages. Simply call the `plotFunnelChart()` function for an overview of this funnel effect throughout each stage. As a rule of thumb, it is not a good sign if only a few actions remain in each stage. If certain actions are completely filtered out from one stage to the next, it may also be a warning of aggressive filtering. In this case, let's also use the `query` functionality to only look at actions in the `'Sales'` issue.

In [9]:
vf.plotFunnelChart('Action', query=pl.col('pyIssue')=='Sales')

Of course this is quite a lot of information. If instead we want to look at the distribution of *issues* over each stage, simply supply the `level` parameter as `'Issue'`:

In [10]:
vf.plotFunnelChart('Issue')

Lastly, it may also be interesting to look at the distribution of groups over the different stages. Here, let's again filter on the `'Sales'` issue only.

In [11]:
vf.plotFunnelChart('Group', query=pl.col('pyIssue')=='Sales')
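
The aggregation behind these funnel charts can be approximated in a few lines of plain Python. This is a rough sketch under stated assumptions, not the cdhtools implementation: the helpers `rows_for`, `funnel_counts` and `dropped_between`, the action names and the toy rows are hypothetical, and only the column names (`pyStage`, `pyIssue`, `pyName`) come from the dataset shown earlier.

```python
from collections import Counter

# Stage order as described in this article.
STAGES = ["Eligibility", "Applicability", "Suitability", "Arbitration"]


def rows_for(stage, issue, name, n):
    """Build n identical toy rows (hypothetical data, one row per customer offer)."""
    return [{"pyStage": stage, "pyIssue": issue, "pyName": name} for _ in range(n)]


rows = (
    rows_for("Eligibility", "Sales", "ActionA", 2)
    + rows_for("Eligibility", "Sales", "ActionB", 1)
    + rows_for("Eligibility", "Usage", "ActionC", 1)
    + rows_for("Applicability", "Sales", "ActionA", 2)
    + rows_for("Applicability", "Sales", "ActionB", 1)
    + rows_for("Applicability", "Usage", "ActionC", 1)
    + rows_for("Suitability", "Sales", "ActionA", 1)
    + rows_for("Suitability", "Usage", "ActionC", 1)
    + rows_for("Arbitration", "Sales", "ActionA", 1)
)


def funnel_counts(rows, level, keep=lambda row: True):
    """Count rows per stage and per value of `level`, after an optional filter."""
    counts = {stage: Counter() for stage in STAGES}
    for row in rows:
        if keep(row):
            counts[row["pyStage"]][row[level]] += 1
    return {stage: dict(counter) for stage, counter in counts.items()}


def dropped_between(counts, earlier, later):
    """Values present in the earlier stage that vanish completely in the later one."""
    return sorted(set(counts[earlier]) - set(counts[later]))


# Actions within the Sales issue only, then issues across all rows.
sales_actions = funnel_counts(rows, "pyName", keep=lambda row: row["pyIssue"] == "Sales")
issues = funnel_counts(rows, "pyIssue")

for stage in STAGES:
    print(stage, sales_actions[stage])
print("Dropped before Suitability:", dropped_between(sales_actions, "Applicability", "Suitability"))
```

A helper like `dropped_between` makes the warning sign mentioned above explicit: any action it returns was filtered out entirely between two stages.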