# 🪙 Counterfactuals

Counterfactuals show what input we would need to get a desired output.  
In our case, we want to find out how a song's properties would have to change for the song to be popular in Turkey.  
We will use TrustyAI to test exactly this and to see how large those changes would need to be.
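
To make the idea concrete, here is a minimal sketch of a counterfactual search on a toy problem. It is not the algorithm TrustyAI uses: `toy_model`, `find_counterfactual`, and all the numbers are hypothetical and only illustrate the goal of finding the smallest change that reaches a target output.

```python
import random


def toy_model(features):
    """Hypothetical stand-in for a real model: returns a score between 0 and 1."""
    return 0.6 * features["danceability"] + 0.4 * features["energy"]


def find_counterfactual(original, domains, threshold, trials=5000, seed=0):
    """Random search for the in-domain input closest to `original`
    whose score reaches `threshold`. Returns None if nothing qualifies."""
    rng = random.Random(seed)
    best, best_distance = None, float("inf")
    for _ in range(trials):
        # Sample a candidate inside the allowed range of every feature.
        candidate = {name: rng.uniform(low, high) for name, (low, high) in domains.items()}
        if toy_model(candidate) < threshold:
            continue  # This candidate does not reach the desired output.
        # Keep the qualifying candidate that changes the original the least.
        distance = sum(abs(candidate[name] - original[name]) for name in original)
        if distance < best_distance:
            best, best_distance = candidate, distance
    return best


original = {"danceability": 0.2, "energy": 0.3}
domains = {"danceability": (0.0, 1.0), "energy": (0.0, 1.0)}
counterfactual = find_counterfactual(original, domains, threshold=0.5)

for name in original:
    print(f"{name}: {original[name]:.2f} -> {counterfactual[name]:.2f}")
```

The original input scores below the threshold, and the search returns a nearby input that scores at or above it. A real explainer searches far more efficiently, but the ingredients are the same: a model, allowed ranges for each input, and a goal.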

In [None]:
!pip -q install "onnx" "onnxruntime" "numpy==1.26.4"

In [2]:
import pickle
import pandas as pd
import numpy as np
import onnxruntime as rt

In [None]:
import warnings

# Ignore UserWarnings
warnings.filterwarnings("ignore", category=UserWarning)

Let's start by choosing a country we want the song to be popular in.  
We also pick the probability threshold above which we say there's a good chance that our song will be popular in that country.  

In [None]:
PRED_COUNTRY = "TR"
POPULAR_THRESHOLD = 0.3

We then load our model, as well as our pre- and post-processing artifacts.  

In [None]:
onnx_session = rt.InferenceSession("convert-keras-to-onnx/onnx_model.onnx", providers=rt.get_available_providers())
onnx_input_name = onnx_session.get_inputs()[0].name
onnx_output_name = onnx_session.get_outputs()[0].name

with open('preprocess-data/scaler.pkl', 'rb') as handle:
    scaler = pickle.load(handle)

with open('preprocess-data/label_encoder.pkl', 'rb') as handle:
    label_encoder = pickle.load(handle)

### Data

Then we pick a song we want to try to make popular in that country.  
We will also process the song properties a bit, such as scaling them, just like we did when training the model. This makes sure the input is in a form the model understands. 
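
As an illustration of what scaling does, the sketch below applies min-max scaling by hand. This assumes the saved scaler maps each feature to the 0 to 1 range, which the notebook does not state. The helper names and the training values are hypothetical.

```python
def fit_min_max(rows):
    """Record the (min, max) of every feature seen in the training rows."""
    names = rows[0].keys()
    return {n: (min(r[n] for r in rows), max(r[n] for r in rows)) for n in names}


def scale(row, bounds):
    """Map each feature to 0..1 using the bounds learned from training data.
    Assumes max > min for every feature."""
    return {n: (row[n] - low) / (high - low) for n, (low, high) in bounds.items()}


# Hypothetical training data: the bounds come from here, not from the new song.
training_rows = [
    {"tempo": 60.0, "loudness": -20.0},
    {"tempo": 120.0, "loudness": -10.0},
    {"tempo": 180.0, "loudness": 0.0},
]
bounds = fit_min_max(training_rows)

new_song = {"tempo": 90.0, "loudness": -5.0}
scaled_song = scale(new_song, bounds)
print(scaled_song)
```

The key point is that the bounds are fitted once on the training data and then reused for every new song, which is why the notebook loads a saved scaler instead of fitting a new one.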

In [None]:
song_properties = pd.read_parquet('../99-data_prep/song_properties.parquet')
favorite_song = song_properties.loc[song_properties["name"]=="Not Like Us"]
favorite_song

In [None]:
song_properties = favorite_song[['is_explicit', 'duration_ms', 'danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo']]
song_properties.T

In [None]:
scaled_feature = scaler.transform(song_properties)[0]
feature_values = {
    "is_explicit": scaled_feature[0],
    "duration_ms": scaled_feature[1],
    "danceability": scaled_feature[2],
    "energy": scaled_feature[3],
    "key": scaled_feature[4],
    "loudness": scaled_feature[5],
    "mode": scaled_feature[6],
    "speechiness": scaled_feature[7],
    "acousticness": scaled_feature[8],
    "instrumentalness": scaled_feature[9],
    "liveness": scaled_feature[10],
    "valence": scaled_feature[11],
    "tempo": scaled_feature[12]
}

feature_df = pd.DataFrame([feature_values])
feature_df.T

We also set the names of the outputs. These are the same as the country codes.

In [None]:
output_names = label_encoder.classes_
output_names

### Counterfactual analysis

Now that we have all of this in place, we will set up our counterfactual analysis.  
First we need to create a predict function (if your model takes and returns pandas DataFrames by default, this is not needed).  
Then we will create a TrustyAI "Model". This just wraps our predict function, and TrustyAI uses it to try out different input values.  
Finally, we will define a TrustyAI "domain" for each of our inputs. A domain tells TrustyAI the range of values that input is allowed to take.
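
The three pieces can be sketched without ONNX or TrustyAI, which may help when testing the logic in isolation. Everything here is a hypothetical stand-in: `fake_probabilities` replaces the real model, `make_predictor` plays the role of the predict function, and `in_domain` only mimics what a domain constraint means.

```python
def fake_probabilities(row):
    """Hypothetical stand-in for the real model: one probability per country."""
    tr = min(1.0, max(0.0, 0.5 * row[0] + 0.1 * row[1]))
    return {"TR": tr, "SE": 1.0 - tr}


def make_predictor(country, threshold):
    """Build a predict function that reduces the model output to one boolean
    per row: does `country` reach `threshold`?"""
    def predict(batch):
        return [{country: fake_probabilities(row)[country] >= threshold} for row in batch]
    return predict


def in_domain(row, domains):
    """Check that every value lies inside its allowed (low, high) range."""
    return all(low <= value <= high for value, (low, high) in zip(row, domains))


predict = make_predictor("TR", threshold=0.3)
domains = [(0.0, 1.0), (0.0, 1.0)]
batch = [[0.2, 0.5], [0.8, 0.5]]

print(predict(batch))
print([in_domain(row, domains) for row in batch])
```

Unlike this sketch, the predict function in this notebook scores only the first row of its input.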

In [None]:
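def pred(x):
    # x is a 2D array of feature rows; this wrapper scores only the first row.
    row = x[0]
    # Build one named input per feature, each shaped (1, 1) and cast to float32.
    inputs = {name: np.asarray([[row[i]]]).astype(np.float32) for i, name in enumerate(feature_df.columns)}
    # run() returns a list with one array; squeeze it to one probability per country.
    probabilities = np.squeeze(onnx_session.run([onnx_output_name], inputs))
    by_country = {output_names[i]: probabilities[i] for i in range(probabilities.shape[0])}
    print(f"Predicted probability is: {by_country[PRED_COUNTRY]}")
    # Reduce the output to a single boolean: does the target country reach the threshold?
    is_popular = bool(by_country[PRED_COUNTRY] >= POPULAR_THRESHOLD)
    return pd.DataFrame([{PRED_COUNTRY: is_popular}])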

In [None]:
pred(feature_df.to_numpy())

In [12]:
from trustyai.model import Model

model = Model(pred, output_names=[PRED_COUNTRY])

In [13]:
from trustyai.model.domain import feature_domain
_domains = {
        "is_explicit": (0.0, 1.0),
        "duration_ms": (0.0, 1.0),
        "danceability": (0.0, 1.0),
        "energy": (0.0, 1.0),
        "key": (0.0, 1.0),
        "loudness": (0.0, 1.0),
        "mode": (0.0, 1.0),
        "speechiness": (0.0, 1.0),
        "acousticness": (0.0, 1.0),
        "instrumentalness": (0.0, 1.0),
        "liveness": (0.0, 1.0),
        "valence": (0.0, 1.0),
        "tempo": (0.0, 1.0)
}
# Build one TrustyAI domain per feature, in the same order as the model inputs.
domains = [feature_domain(_domains[key]) for key in feature_values]

In [14]:
from trustyai.model import output
goal = [output(name=PRED_COUNTRY, dtype="bool", value=True)]

With the model, the domains, and the goal in place, we can start searching through possible inputs for one that gives us the output we want.  

In [None]:
from trustyai.explainers import CounterfactualExplainer

STEPS=50
explainer = CounterfactualExplainer(steps=STEPS)
explanation = explainer.explain(inputs=feature_df, goal=goal, model=model, feature_domains=domains)

In [None]:
model(explanation.proposed_features_dataframe.to_numpy())

Now that it has finished running, we can see how much we would need to change our original input (remember the song we chose at the start) for the song to become popular in our country.  

In [None]:
explanation.as_dataframe()

In [None]:
df = explanation.as_dataframe()
df[df.difference != 0.0]
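
Because the explainer was given the scaled features, the values it proposes are in the scaled space too. To read a change in real units, it has to be mapped back. The sketch below does this by hand for one feature, assuming min-max scaling and hypothetical tempo bounds. If the pickled scaler is a scikit-learn scaler, its `inverse_transform` method does the same for all features at once.

```python
def unscale(value, low, high):
    """Invert min-max scaling: map a 0..1 value back to the original range."""
    return low + value * (high - low)


# Hypothetical bounds and values, for illustration only.
tempo_bounds = (60.0, 180.0)
original_scaled = 0.25
proposed_scaled = 0.5

original_bpm = unscale(original_scaled, *tempo_bounds)
proposed_bpm = unscale(proposed_scaled, *tempo_bounds)
print(f"tempo: {original_bpm:.0f} BPM -> {proposed_bpm:.0f} BPM")
```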

In [None]:
if not df[df.difference != 0.0].empty:
    explanation.plot()
else:
    print(f"The country {PRED_COUNTRY} did not reach the probability {POPULAR_THRESHOLD} in {STEPS} steps")