# Prediction Insights

## Overview

Howso Engine enables powerful predictions with complete attribution and detailed explanations to make learning from
and debugging your data and predictions as easy as possible. For more information on predictions with Howso Engine,
check out the [predictions user guide](https://docs.howso.com/user_guide/basics/predictions.html).

In [1]:
import pandas as pd
from pmlb import fetch_data

from howso.engine import Trainee
from howso.utilities import infer_feature_attributes

## Setup

The [basic workflow guide](https://docs.howso.com/user_guide/basics/basic_workflow.html) covers the steps in this section in more detail. This recipe focuses on the insights.

### Step 1: Load Data and Create Trainee

In [2]:
df = fetch_data("iris", local_cache_dir="data")
features = infer_feature_attributes(df)

t = Trainee(features=features)

The following parameters from configuration file will override the Amalgam parameters set in the code: {'trace'}


### Step 2: Train, Analyze, and React

In [3]:
t.train(df)
t.analyze()
t.react_into_features(similarity_conviction=True)

### Step 3: React to a Case

We supply a case using the `contexts` and `context_features` parameters. The context values are treated as the
independent variables when making the prediction. The `action_features` are the dependent variables, whose predicted
values are returned along with any requested details.
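
The pairing between the context values and `context_features` is positional, which a small sketch can make explicit. It uses only plain Python and the values from this recipe; the `case` name is illustrative and not part of the Howso API.

```python
# Values and feature names are paired by position:
# the i-th value belongs to the i-th feature name.
context_features = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
context_values = [3.0, 2.0, 3.0, 1.0]

# Guard against a length mismatch before pairing.
if len(context_features) != len(context_values):
    raise ValueError("each context feature needs exactly one value")

# `case` is an illustrative name, not a Howso object.
case = dict(zip(context_features, context_values))
print(case)
```

Printing the mapping before reacting is a cheap way to confirm that each value is attached to the intended feature.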

In [4]:
reaction = t.react(
    contexts=[3.0, 2.0, 3.0, 1.0],
    context_features=["sepal-length", "sepal-width", "petal-length", "petal-width"],
    action_features=["target"],
    details={
        "influential_cases": True,
        "similarity_conviction": True,
        "feature_contributions": True,
        "feature_residuals": True,
        "robust_influences": True,
        "robust_residuals": True,
        "local_case_feature_residual_convictions": True,
        "categorical_action_probabilities": True,
    }
)
influential_cases = pd.DataFrame(reaction["details"]["influential_cases"][0])
influential_cases = influential_cases[sorted(influential_cases.columns)]

## Insights

### Insight 1: Prediction

The predicted value(s) are accessed using the `action` key, while the explanations can be accessed using the
`details` key.

In [5]:
list(reaction["details"].keys())

['influential_cases',
 'feature_contributions',
 'action_features',
 'feature_residuals',
 'local_case_feature_residual_convictions',
 'similarity_conviction',
 'categorical_action_probabilities',
 'directional_feature_contributions']

In [6]:
reaction["action"]

| | target |
|---|---|
| 0 | 1 |


For categorical action features, the prediction can be further understood with the `categorical_action_probabilities`
detail.

In [7]:
reaction["details"]["categorical_action_probabilities"][0]

{'target': {'1': 0.8815330011979926, '0': 0.11846699880200744}}
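
As a minimal sketch of reading this detail, the most probable class can be picked from the returned dictionary and the probabilities checked to sum to 1. The dictionary is copied from the output above; `predicted_class` and `total` are illustrative names.

```python
# Probabilities copied from the output above.
probabilities = {"target": {"1": 0.8815330011979926, "0": 0.11846699880200744}}

target_probs = probabilities["target"]

# The class with the highest probability matches the predicted action.
predicted_class = max(target_probs, key=target_probs.get)

# The class probabilities should sum to (approximately) 1.
total = sum(target_probs.values())

print(predicted_class)
print(round(total, 6))
```

Note that the class labels are string keys in this dictionary, while the `action` output shows the predicted value as `1`.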

### Insight 2: Attribution

Howso provides complete attribution for any and all predictions. To determine which cases were used to make a prediction, use the `influential_cases` detail.

In [8]:
influential_cases

| | .influence_weight | .session | .session_training_index | petal-length | petal-width | sepal-length | sepal-width | target |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.143901 | f2903e23-3a9e-4311-9d66-bd40dfca1113 | 126 | 3.3 | 1.0 | 4.9 | 2.4 | 1 |
| 1 | 0.14119 | f2903e23-3a9e-4311-9d66-bd40dfca1113 | 83 | 3.5 | 1.0 | 5.0 | 2.0 | 1 |
| 2 | 0.139901 | f2903e23-3a9e-4311-9d66-bd40dfca1113 | 99 | 3.3 | 1.0 | 5.0 | 2.3 | 1 |
| 3 | 0.129369 | f2903e23-3a9e-4311-9d66-bd40dfca1113 | 127 | 3.0 | 1.1 | 5.1 | 2.5 | 1 |
| 4 | 0.118467 | f2903e23-3a9e-4311-9d66-bd40dfca1113 | 57 | 1.3 | 0.3 | 4.5 | 2.3 | 0 |
| 5 | 0.109472 | f2903e23-3a9e-4311-9d66-bd40dfca1113 | 20 | 3.9 | 1.4 | 5.2 | 2.7 | 1 |
| 6 | 0.109396 | f2903e23-3a9e-4311-9d66-bd40dfca1113 | 140 | 3.7 | 1.0 | 5.5 | 2.4 | 1 |
| 7 | 0.108305 | f2903e23-3a9e-4311-9d66-bd40dfca1113 | 5 | 3.8 | 1.1 | 5.5 | 2.4 | 1 |


The `.influence_weight` column is inversely correlated with distance and tells us how influential each of these cases
was during the prediction.
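
One way to get a feel for the weights is to sum them per target class. In this example the weights sum to roughly 1, and the per-class totals line up with the `categorical_action_probabilities` shown earlier (about 0.88 for class 1 and 0.12 for class 0). The sketch below uses values copied from the table above; it is an observation about this particular output, not a statement of how the engine computes probabilities in general.

```python
from collections import defaultdict

# (.influence_weight, target) pairs copied from the influential cases above.
influential = [
    (0.143901, 1), (0.14119, 1), (0.139901, 1), (0.129369, 1),
    (0.118467, 0), (0.109472, 1), (0.109396, 1), (0.108305, 1),
]

# Accumulate the influence weight contributed by each target class.
weight_by_class = defaultdict(float)
for weight, target in influential:
    weight_by_class[target] += weight

# Normalize in case the rounded weights do not sum to exactly 1.
total_weight = sum(weight_by_class.values())
class_share = {target: w / total_weight for target, w in weight_by_class.items()}

print(class_share)
```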

The `.session` and `.session_training_index` features can be used to derive additional insights from the influential
cases. For example, the `influential_cases` detail can be combined with `similarity_conviction` to highlight which
influential cases were anomalous.

In [9]:
inf_case_indices = influential_cases[[".session", ".session_training_index"]].values.tolist()

anom_df = t.get_cases(
    case_indices=inf_case_indices,
    features=["sepal-length", "sepal-width", "petal-length", "petal-width", "target", "similarity_conviction"]
)

anom_df.sort_values(by="similarity_conviction")

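| | sepal-length | sepal-width | petal-length | petal-width | target | similarity_conviction |
|---|---|---|---|---|---|---|
| 4 | 4.5 | 2.3 | 1.3 | 0.3 | 0 | 0.243127 |
| 1 | 5.0 | 2.0 | 3.5 | 1.0 | 1 | 0.572244 |
| 3 | 5.1 | 2.5 | 3.0 | 1.1 | 1 | 0.651609 |
| 5 | 5.2 | 2.7 | 3.9 | 1.4 | 1 | 0.715024 |
| 0 | 4.9 | 2.4 | 3.3 | 1.0 | 1 | 0.875392 |
| 2 | 5.0 | 2.3 | 3.3 | 1.0 | 1 | 0.938972 |
| 6 | 5.5 | 2.4 | 3.7 | 1.0 | 1 | 1.123037 |
| 7 | 5.5 | 2.4 | 3.8 | 1.1 | 1 | 1.326493 |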
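
To highlight the anomalous influential cases programmatically, the sorted output can be filtered against a cutoff. The sketch below works on values copied from the output above; `ANOMALY_THRESHOLD` is a hypothetical cutoff chosen for illustration, not a value recommended by Howso.

```python
# (row index, target, similarity_conviction) copied from the output above.
cases = [
    (4, 0, 0.243127), (1, 1, 0.572244), (3, 1, 0.651609), (5, 1, 0.715024),
    (0, 1, 0.875392), (2, 1, 0.938972), (6, 1, 1.123037), (7, 1, 1.326493),
]

# Hypothetical cutoff, chosen for illustration only.
ANOMALY_THRESHOLD = 0.5

# Keep the cases whose similarity conviction falls below the cutoff.
flagged = [
    (index, target)
    for index, target, conviction in cases
    if conviction < ANOMALY_THRESHOLD
]

print(flagged)
```

With this cutoff, the only flagged case is row 4, which is also the only influential case with target 0.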


### Insight 3: Explanation

Howso also provides detailed explanations about each prediction. Commonly used explanations are:

- Feature Contributions for measuring feature importance
- Residuals for measuring the uncertainty of the local data
- Conviction for measurements that are relative to the expected value

Feature contributions will help determine how much each context feature contributed to the prediction, on the scale
of the action feature.

In [10]:
reaction["details"]["feature_contributions"][0]

{'petal-width': 0.28871181572340304,
 'petal-length': 0.2931029454393371,
 'sepal-length': 0.04494568151237635,
 'sepal-width': 0.15656876988315238}
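
Sorting the returned dictionary makes the relative importance easier to read. This is a small sketch using the values copied from the output above; `ranked` is an illustrative name.

```python
# Feature contributions copied from the output above.
feature_contributions = {
    "petal-width": 0.28871181572340304,
    "petal-length": 0.2931029454393371,
    "sepal-length": 0.04494568151237635,
    "sepal-width": 0.15656876988315238,
}

# Sort from the largest contribution to the smallest.
ranked = sorted(feature_contributions.items(), key=lambda item: item[1], reverse=True)

for name, value in ranked:
    print(f"{name:<14}{value:.3f}")
```

For this prediction the two petal features contribute the most and `sepal-length` contributes the least.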

Residuals can characterize the uncertainty of the data around the prediction.

In [11]:
reaction["details"]["feature_residuals"][0]

{'petal-width': 0.25238915042846627,
 'petal-length': 0.6211629574817258,
 'target': 0.12239583333333334,
 'sepal-length': 0.5305952186638815,
 'sepal-width': 0.3249440776128855}

Conviction refers to a number of statistics that are defined as:

$$
\text{Conviction} = \frac{\text{Expected Surprisal}}{\text{Measured Surprisal}}
$$

Since Howso provides numerous surprisal measures, there are numerous types of conviction as well, such as similarity
conviction, used above, and residual conviction. Conviction is useful for characterizing data in a scale-invariant
manner.

As in the attribution example above, similarity conviction can help determine how anomalous a case is.

In [12]:
reaction["details"]["similarity_conviction"][0]

0.21021952799848467

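A similarity conviction close to 1 indicates a case that is about as anomalous as expected, and lower values indicate
a more anomalous case. The case we reacted to has a similarity conviction of about 0.21, well below 1, so it is fairly
unusual relative to the trained data. If we react to an even more anomalous case, we observe a still lower similarity
conviction.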

In [13]:
t.react(
    contexts=[11.0, 0.01, 3.4890217, 0.2],
    context_features=["sepal-length", "sepal-width", "petal-length", "petal-width"],
    action_features=["target"],
    details={
        "similarity_conviction": True,
    }
)["details"]["similarity_conviction"][0]


0.07295393004744691

We can also inspect the residual conviction of the features within the local data, which is a scale-invariant
alternative for characterizing the uncertainty around the prediction.

In [14]:
reaction["details"]["local_case_feature_residual_convictions"][0]

{'petal-width': 1.165407938554282,
 'petal-length': 0.5050790283384394,
 'target': 0.38584119647954107,
 'sepal-length': 0.20595329130908402,
 'sepal-width': 0.3767108186938781}
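
Because raw residuals depend on each feature's scale while residual convictions are unitless, it can help to view the two side by side. The sketch below joins the two dictionaries copied from the outputs above and sorts by conviction; it only arranges the returned values and makes no claim about how they are computed.

```python
# Values copied from the `feature_residuals` output above.
feature_residuals = {
    "petal-width": 0.25238915042846627,
    "petal-length": 0.6211629574817258,
    "target": 0.12239583333333334,
    "sepal-length": 0.5305952186638815,
    "sepal-width": 0.3249440776128855,
}

# Values copied from the `local_case_feature_residual_convictions` output above.
residual_convictions = {
    "petal-width": 1.165407938554282,
    "petal-length": 0.5050790283384394,
    "target": 0.38584119647954107,
    "sepal-length": 0.20595329130908402,
    "sepal-width": 0.3767108186938781,
}

# Join on feature name and sort by conviction, lowest first.
rows = sorted(
    ((name, feature_residuals[name], residual_convictions[name]) for name in feature_residuals),
    key=lambda row: row[2],
)

print(f"{'feature':<14}{'residual':>10}{'conviction':>12}")
for name, residual, conviction in rows:
    print(f"{name:<14}{residual:>10.3f}{conviction:>12.3f}")
```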