# Conditioned Insights

## Overview

Howso Engine enables powerful prediction insights through a multitude of prediction statistics views. These prediction statistics encompass many of the commonly-used prediction performance metrics such as accuracy or mean squared error. Howso also includes tools to condition the statistics for detailed analyses between subgroups. An understanding of how your dataset's predictability changes under a variety of conditions provides out of the box insights on your data without the need to build multiple machine learning models.

For this recipe, we will use the `Adult` dataset and explore how the prediction performance differs by `sex`.

In [1]:
import os

import plotly.io as pio
from pmlb import fetch_data

from howso.engine import Trainee
from howso.utilities import infer_feature_attributes

pio.renderers.default = os.getenv("HOWSO_RECIPE_RENDERER", "notebook")

# Section 1: Load, Train, Analyze

The [basic workflow guide](https://docs.howso.com/user_guide/basics/basic_workflow.html) goes into more specifics about the individual details of this section. This recipe will focus more on the insights.

In [2]:
df = fetch_data('adult', local_cache_dir="../../data/adult")

# Subsample the data to ensure the example runs quickly
df = df.sample(2000)
df = df.replace({"sex": {0: "female", 1: "male"}})

features = infer_feature_attributes(df)

# Split out the last row for a prediction set and drop the Action Feature
test_case = df.iloc[[-1]].copy()

# Set the sex to be female
test_case.at[0, 'sex'] = 'female'

test_case = test_case.drop('target', axis=1)

In [3]:
t = Trainee(features=features)

action_features = ['target']
context_features = features.get_names(without=action_features)

t.train(df)

t.analyze(context_features=context_features, action_features=action_features)

# Prediction Stats

Howso provides a variety of prediction stats based on the variable type, whether it is continuous or nominal. Prediction stats for nominal variables include accuracy, precision, recall, and [Matthews Correlation Coefficient](https://en.wikipedia.org/wiki/Phi_coefficient), while for continuous variables they include mean absolute error (MAE), root mean squared error, r2, and spearman coefficient. MAE also appears for nominal values, however they are a form of weighted accuracy using [categorical action probabilities](https://docs.howso.com/getting_started/terminology.html#categorical-action-probabilities). These prediction stats give insight into the predictive performance of the Trainee as well as insights into the data itself. 

The [bias_mitigation.ipynb](https://github.com/howsoai/howso-engine-recipes/blob/main/recipes/2-Workflows/bias_mitigation/bias_mitigation.ipynb) recipe highlights looking for bias in the data through the analysis of individual features. This recipe demonstrates how insights like overall data bias can be explored.

### Global Prediction Stats


Global prediction stats provide an overall view of the Trainee's accuracy. This can be a great introductory look into the usability of the data and Trainee. Most machine learning models are evaluated on similiar global metrics. 

Global statistics show the aggregate performance of all of the cases in your data while local statisitics only show the aggregate performance of the cases in the region you specified. 

For more information on the Global statistics, see the [Global vs Local](https://docs.howso.com/user_guide/concepts/global_vs_local.html) documentation.

In [4]:
global_stats = t.react_aggregate(
    prediction_stats_action_feature = action_features[0],
    details = {"prediction_stats": True}
)

global_stats['target'].dropna()

accuracy     0.849000
mae          0.224365
recall       0.753986
precision    0.786435
mcc          0.539446
Name: target, dtype: float64

The various prediction stats for the action feature are shown above. For the rest of this recipe, we will focus on Accuracy, however all of the prediction stats will be avaiable during any of the following methods. 

We can see that Trainee's overall performance through this accuracy, however we might be interested in specific cases or regions in the data, which can be explored using local prediction stats. 

### Local Prediction Stats

In the cell above, we showed the global prediction stats, but now let's look at the performance for our individual test case. In addition to global prediction stats, Howso has the unique ability to provide local prediction stats. These prediction stats are more finely tuned to the exact case(s) in which you are interested. This can be extremely valuable for workflows including data exploration and it provides more nuanced performance metrics. The more variance there is in the performance of dataset from region to region, the more powerful this ability is.

In [5]:
results = t.react(
    test_case,
    context_features=context_features,
    action_features=action_features,
    details = {
        "prediction_stats": True
    }
)

local_stats = results['details']['prediction_stats']['target']
local_stats.dropna()

accuracy     0.733333
mae          0.348199
recall       0.527950
precision    0.555556
mcc          0.078811
Name: target, dtype: float64

We can see how the local prediction stats differ from the global stats. Using local metrics, we can drill into specific cases. For example, in our test case which is manually set to `female`, we can get insights into how well the Trainee fits similiar cases. In the local space, this does not gurantee all of the other cases in the local space are also female, however it increases the chances that they are.

If we see the local accuracy is significantly lower than the global accuracy, the Trainee does not have the same predictive power within this region of the data, compared to it's overall performance. This could indicate a need for new and/or different training data to improve the Trainee's performance in this region.

# Conditioned Prediction Stats

Zooming back out to global prediction stats, Howso also has the ability to condition the prediction stats by providing conditions for the context set and action set. Similar to context features and action featuers, the context set is the set queried to make predictions and the action set is the set that the predictions are made for. In other words, the data from the context set is the known information you are using to make a prediction and the action set is unknown information you are trying to predict.

This conditioning gives us the ability to segment the context and action sets to give us more detailed insights in the relationship as well as the Trainee's predictive performance between context sets.

### Action Condition Only

By conditioning on an action condition only, prediction stats for a certain segment are returned by holding out each case from the action set respectively and making predictions in a Leave One Out (LOO) fashion. Thus, using this method, the context set consists of every case except the actual case being predicted at the time. After each case from the action set is predicted, the case is returned to the context set, making it available to be queried for other predictions.

For example, if we specify that the action condition is all females, then when we get return the prediction stats, the prediction stats are only for females in the dataset. The prediction stats for each female case is calcluated, aggregated, and then returned. However, for each female case, any other case can be queried when making a prediction. This is exactly how a standard Howso prediction is calculated.

In [6]:
male_accuracy = t.react_aggregate(
    details = {
        "prediction_stats": True,
        "action_condition": {'sex': 'male'}
    }
)['target']
female_accuracy = t.react_aggregate(
    details = {
        "prediction_stats": True,
        "action_condition": {'sex': 'female'}
    }
)['target']

print(f"Female accuracy: {female_accuracy['accuracy']:.0%}")
print(f"Male accuracy: {male_accuracy['accuracy']:.0%}")

Female accuracy: 91%
Male accuracy: 80%


If we see a difference in the model performance between males and females, this may indicate that there may be further bias that needs to be investigated and/or this Trainee is not suitable for inference on both sexes.

If this Trainee and data were used to make decisions on loan acceptance based on income, these differences in performance may indicate that this Trainee and data may be suitable for predictions on males but not females. This can lead to actions like gathering better data that captures the characteristics of females better.

### Action Condition and Context Condition

Another way to see how various groups can differ is by specifying both action and context conditions. Unlike when only the action condition is specified, specifying both conditions will hold out `ALL` of the action set. After a case from the action set is used to make a prediction, unlike when just the action condition is specified, that case will continue to be held out. Thus, using this method, the cases from the context set and action set remain separate during the entire process.

In [7]:
male_conditioned_accuracy = t.react_aggregate(
    details = {
        "prediction_stats": True,
        "context_condition": {'sex': 'female'},
        "action_condition": {'sex': 'male'}
    }
)['target']['accuracy'].round(2)
female_conditioned_accuracy = t.react_aggregate(
    details = {
        "prediction_stats": True,
        "context_condition": {'sex': 'male'},
        "action_condition": {'sex': 'female'}
    }
)['target']['accuracy'].round(2)

print(f"Female accuracy: {female_conditioned_accuracy:.0%}")
print(f"Male accuracy: {male_conditioned_accuracy:.0%}")

Female accuracy: 87%
Male accuracy: 77%


This method may reveal extra insights that conditioning on the action set may miss. Since the entire action set is being held out, the prediction stats are based purely on the context set. For example, we may have a situation that when only cases that are males are used to predict cases that are females, the performance drops greatly from the global predicition stats. This can provide further indications about the difference in the underlying data and relationships among females vs males. A difference in the performance when the holdouts are reversed may indicate that the data overall suits one segment vs another. 