# Learning to Rank Explanations Demo 2022

## Overview

In this notebook, we will explore how to explain the scores of a Learning to Rank model using OmniXAI.

#### Key Takeaways
- How to install and get started with ml4ir as a script
- Explaining the rank scores using OmniXAI

#### Learning to Rank
The goal of Learning to Rank (LTR) is to come up with a ranking function that generates an optimal ordering of a list of documents. In this notebook, we will learn a simple **pointwise ranking function** using a **listwise loss**: the function predicts a ranking score for each record, and the loss is computed over all records of a given query. These scores can then be used at inference time to determine the optimal ordering.
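
To make the "pointwise function, listwise loss" idea concrete, here is a minimal sketch of a softmax cross-entropy loss over the records of a single query. It is an illustration in plain Python, not ml4ir's implementation (which may also handle padding and masking); the scores and click labels are made-up values.

```python
import math

def softmax(scores):
    """Normalize one query's raw scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(scores, clicks):
    """Listwise loss for one query: negative log-probability of the clicked records."""
    probs = softmax(scores)
    return -sum(c * math.log(p) for c, p in zip(clicks, probs))

# Hypothetical raw scores for three records of one query; the second was clicked.
clicks = [0, 1, 0]
loss_good = softmax_cross_entropy([0.2, 1.5, -0.3], clicks)  # clicked record scored highest
loss_bad = softmax_cross_entropy([1.5, 0.2, -0.3], clicks)   # clicked record scored lower

print(loss_good < loss_bad)  # True
```

Each record gets its own score, but the loss only becomes small when the clicked record outscores the other records of the same query, which is what makes the loss listwise.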

#### Per Query Valid Explanations 

We explore per-query valid explanations using OmniXAI's ValidityRankingExplainer.

Reference for algorithm: Singh, J., Khosla, M., & Anand, A. (2020). Valid Explanations for Learning to Rank Models. ArXiv, abs/2004.13972.
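
As a rough intuition for what "validity" means here, the sketch below greedily picks the features whose values alone best reproduce the full model's ordering for one query, measured with Kendall's tau. This is a simplified toy, not the paper's exact procedure and not OmniXAI's implementation: the linear `toy_scorer`, the feature names `a`, `b`, `c`, the zero baseline used for masking, and the stopping rule are all assumptions made for illustration.

```python
from itertools import combinations

def kendall_tau(a, b):
    """Kendall's tau between two equal-length score lists (assumes no ties)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        product = (a[i] - a[j]) * (b[i] - b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (len(a) * (len(a) - 1) / 2)

# Hypothetical stand-in for the trained ranker: a fixed weighted sum of features.
WEIGHTS = {"a": 3.0, "b": 1.0, "c": 0.1}

def toy_scorer(rows):
    return [sum(WEIGHTS[name] * row[name] for name in WEIGHTS) for row in rows]

def mask_except(rows, keep, baseline=0.0):
    """Keep only the features in `keep`; replace all others with a constant baseline."""
    return [{name: (value if name in keep else baseline) for name, value in row.items()}
            for row in rows]

def greedy_explanation(rows, features, k):
    """Greedily add the feature that most improves tau against the full-model ranking."""
    original = toy_scorer(rows)
    selected, best_tau = [], float("-inf")
    for _ in range(k):
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        tau, feature = max(
            (kendall_tau(original, toy_scorer(mask_except(rows, selected + [f]))), f)
            for f in candidates
        )
        if tau <= best_tau:  # stop once no candidate improves validity
            break
        selected.append(feature)
        best_tau = tau
    return selected, best_tau

# Made-up feature values for the four records of one query.
rows = [
    {"a": 0.9, "b": 0.1, "c": 0.5},
    {"a": 0.4, "b": 0.8, "c": 0.2},
    {"a": 0.7, "b": 0.3, "c": 0.9},
    {"a": 0.1, "b": 0.6, "c": 0.4},
]
selected, tau = greedy_explanation(rows, ["a", "b", "c"], k=3)
print(selected, tau)  # ['a'] 1.0
```

In this toy query, feature `a` alone already reproduces the full ordering, so the search stops after one feature.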

## Install ml4ir and omnixai

In [18]:
!pip install ml4ir -q


In [34]:
!pip install -q git+https://github.com/salesforce/OmniXAI@ranking_validity_explainer


## Installing visualization libraries

In [35]:
!pip install --upgrade -q plotly nbformat


## Look at the data

In [1]:
import pandas as pd

df_train = pd.read_csv("../ml4ir/applications/ranking/tests/data/csv/train/file_0.csv")
df_train.head(7)

Unnamed: 0,query_id,query_text,rank,text_match_score,page_views_score,quality_score,clicked,domain_id,domain_name,name_match
0,query_2,MHS7A7RJB1Y4BJT,2,0.47373,0.0,0.0,0,2,domain_2,1
1,query_2,MHS7A7RJB1Y4BJT,1,1.06319,0.205381,0.30103,1,2,domain_2,1
2,query_5,KNJNWV,6,1.368108,0.030636,0.0,0,0,domain_0,0
3,query_5,KNJNWV,3,1.370628,0.041261,0.30103,0,0,domain_0,0
4,query_5,KNJNWV,4,1.3667,0.082535,0.30103,0,0,domain_0,0
5,query_5,KNJNWV,1,1.333836,0.042572,0.30103,1,0,domain_0,0
6,query_5,KNJNWV,5,1.325021,0.046478,0.0,0,0,domain_0,1


## Define the FeatureConfig

**YAML File** -> configs/activate_2020/feature_config.yaml



| Feature          | Type    | TFRecord Type | Usage                                    |
| ---------------- | -------- | ------------- | ---------------------------------------- |
| query_text       | Text     | Context       | Character Embeddings -> biLSTM Encoding  |
| domain_name      | Text     | Context       | VocabLookup -> Categorical Embedding     |
| text_match_score | Numeric  | Sequence      | float                                    |
| page_views_score | Numeric  | Sequence      | float                                    |
| quality_score    | Numeric  | Sequence      | float                                    |

## Define the ModelConfig

In [2]:
print(open("configs/activate_2020/model_config.yaml").read())

architecture_key: dnn
layers:
  - type: dense
    name: first_dense
    units: 256
    activation: relu
  - type: dropout
    name: first_dropout
    rate: 0.3
  - type: dense
    name: second_dense
    units: 64
    activation: relu
  - type: dense
    name: final_dense
    units: 1
    activation: null



## Using ml4ir as a script

In [26]:
!python ../ml4ir/applications/ranking/pipeline.py \
--data_format csv \
--data_dir ../ml4ir/applications/ranking/tests/data/csv \
--feature_config configs/activate_2020/feature_config.yaml \
--model_config configs/activate_2020/model_config.yaml \
--execution_mode train_inference_evaluate \
--loss_key softmax_cross_entropy \
--num_epochs 3 \
--models_dir ../models/explain_demo_2022 \
--logs_dir ../logs/explain_demo_2022 \
--run_id activate_demo

INFO: 2022-08-19 13:04:47.748 
Logging initialized. Saving logs to : ../logs/explain_demo_2022/activate_demo
INFO: 2022-08-19 13:04:47.748 
Run ID: activate_demo
DEBUG: 2022-08-19 13:04:47.748 
CLI args: 
{
    "data_dir": "../ml4ir/applications/ranking/tests/data/csv",
    "data_format": "csv",
    "tfrecord_type": "sequence_example",
    "feature_config": "configs/activate_2020/feature_config.yaml",
    "model_file": null,
    "model_config": "configs/activate_2020/model_config.yaml",
    "loss_key": "softmax_cross_entropy",
    "metrics_keys": [
        "MRR",
        "ACR"
    ],
    "monitor_metric": "new_MRR",
    "monitor_mode": "max",
    "num_epochs": 3,
    "batch_size": 128,
    "compute_intermediate_stats": true,
    "execution_mode": "train_inference_evaluate",
    "random_state": 123,
    "run_id": "activate_demo",
    "run_group": "general",
    "run_notes": "",
    "models_dir": "../models/explain_demo_2022",
    "logs_dir": "../logs/explain_demo_2022",
    "checkpoint_

INFO: 2022-08-19 13:04:48.539 
1 files found under ../ml4ir/applications/ranking/tests/data/csv/tfrecord/train
2022-08-19 13:04:48.539799: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-08-19 13:04:48.556077: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fe1c70ac8b0 executing computations on platform Host. Devices:
2022-08-19 13:04:48.556094: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
INFO: 2022-08-19 13:04:50.732 
Created TFRecordDataset from SequenceExample protobufs from 1 files : ['../ml4ir/applications/ranking/tests/data/csv/tfr
INFO: 2022-08-19 13:04:50.733 
1 files found under ../ml4ir/applications/ranking/tests/data/csv/validation
INFO: 2022-08-19 13:04:50.733 
Reading 1 files from [../ml4ir/applications/ranking/tests/data/csv/validation/file_0.csv, ..
INFO: 2022-08-19 13:04:50.733 
Loading dataframe

INFO: 2022-08-19 13:04:55.149 
Training Model
INFO: 2022-08-19 13:04:55.152 
Starting Epoch : 1
INFO: 2022-08-19 13:04:55.152 
{}
INFO: 2022-08-19 13:04:55.157 
lr=0.01
Epoch 1/3
2022-08-19 13:04:56.812904: W tensorflow/core/grappler/optimizers/implementation_selector.cc:310] Skipping optimization due to error while loading function libraries: Invalid argument: Functions '__inference___backward_standard_lstm_6398_6997' and '__inference___backward_standard_lstm_6398_6997_specialized_for_training_Adam_gradients_gradients_bidirectional_forward_lstm_StatefulPartitionedCall_grad_StatefulPartitionedCall_at___inference_keras_scratch_graph_9065' both implement 'lstm_22e9a62a-dad5-470e-8e5d-8b1cfde2c471' but their signatures do not match.
2022-08-19 13:04:57.319085: I tensorflow/core/profiler/lib/profiler_session.cc:184] Profiler session started.
INFO: 2022-08-19 13:04:57.319 
[epoch: 1 | batch: 0] {'batch': 0, 'size': 128, 'loss': 1.257792, 'old_MRR': 0.8084635, 'new_MRR': 0.5754498, 'old_ACR'

2022-08-19 13:05:25.315810: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] remapper failed: Invalid argument: MutableGraphView::MutableGraphView error: node 'cond/then/_2/concat' has self cycle fanin 'cond/then/_2/concat'.
2022-08-19 13:05:25.317670: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] arithmetic_optimizer failed: Invalid argument: The graph couldn't be sorted in topological order.
2022-08-19 13:05:25.319032: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:697] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2022-08-19 13:05:25.320207: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:697] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2022-08-19 13:05:25.326250: W tensorflow/core/common_runtime/process_function_library_runtime.cc:675] Ignoring multi-device function optimization failure: Invalid argument: The g

2022-08-19 13:05:27.786986: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] remapper failed: Invalid argument: MutableGraphView::MutableGraphView error: node 'cond_4/then/_54/concat' has self cycle fanin 'cond_4/then/_54/concat'.
2022-08-19 13:05:27.789266: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] arithmetic_optimizer failed: Invalid argument: The graph couldn't be sorted in topological order.
2022-08-19 13:05:27.791024: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:697] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2022-08-19 13:05:27.792559: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:697] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2022-08-19 13:05:27.798265: W tensorflow/core/common_runtime/process_function_library_runtime.cc:675] Ignoring multi-device function optimization failure: Invalid argument:

## Now, the model is saved and ready for inference

In [39]:
MODEL_DIR = '../models/explain_demo_2022/activate_demo'

In [40]:
import logging
import os

import tensorflow as tf
from ml4ir.base.config.keys import TFRecordTypeKey  # used below when building the FeatureConfig
from ml4ir.base.io.local_io import LocalIO
from ml4ir.base.io.file_io import FileIO
from ml4ir.base.features.feature_config import FeatureConfig, SequenceExampleFeatureConfig
from ml4ir.base.model.relevance_model import RelevanceModel

In [41]:
# Set up file I/O handler
file_io: FileIO = LocalIO()

# Set up logger
logger = logging.getLogger()

tf.get_logger().setLevel("INFO")
tf.autograph.set_verbosity(3)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

feature_config: SequenceExampleFeatureConfig = FeatureConfig.get_instance(
    tfrecord_type=TFRecordTypeKey.SEQUENCE_EXAMPLE,
    feature_config_dict=file_io.read_yaml("configs/activate_2020/feature_config.yaml"),
    logger=logger)
print("Training features\n-----------------")
print("\n".join(feature_config.get_train_features(key="name")))

Training features
-----------------
text_match_score
page_views_score
quality_score
query_text
domain_name
text_match_score
page_views_score
quality_score
query_text
domain_name


### Sanity check

In [42]:
from ml4ir.base.config.keys import TFRecordTypeKey
from ml4ir.base.model.relevance_model import RelevanceModel

relevance_model = RelevanceModel(
    feature_config=feature_config,
    tfrecord_type=TFRecordTypeKey.EXAMPLE,
    model_file=os.path.join(MODEL_DIR, 'final/default/'),
    logger=logger,
    output_name="relevance_score",
    file_io=file_io
)

logger.info("Is Keras model? {}".format(isinstance(relevance_model.model, tf.keras.Model)))
logger.info("Is compiled? {}".format(relevance_model.is_compiled))

Retraining is not yet supported. Model is loaded with compile=False


In [7]:
from tensorflow.keras import models as kmodels
from tensorflow import data

model = kmodels.load_model(
    os.path.join(MODEL_DIR, 'final/tfrecord/'),
    compile=False)
infer_fn = model.signatures["serving_tfrecord"]

In [44]:
from ml4ir.base.data.tfrecord_helper import get_sequence_example_proto

def predict(features_df):
    # Work on a copy so the caller's DataFrame is not modified
    features_df = features_df.copy()
    features_df["query_text"] = features_df["query_text"].fillna("")
    # Map serving-time column names to the feature names used by the model
    features_df = features_df.rename(columns={
        feature["serving_info"]["name"]: feature["name"] for feature in
        feature_config.context_features + feature_config.sequence_features
    })

    # Build one SequenceExample proto per query
    protos = features_df.groupby(["query_id", "query_text"]).apply(lambda g: get_sequence_example_proto(
            group=g,
            context_features=feature_config.context_features,
            sequence_features=feature_config.sequence_features,
        ))

    # Score each proto with the model's serving signature
    ranking_scores = protos.apply(lambda se: infer_fn(
        tf.expand_dims(
            tf.constant(se.SerializeToString()),
            axis=-1))["ranking_score"].numpy()[0])

    # With a single query, squeeze() reduces the one-row frame to a Series,
    # so the lookup below returns that query's array of scores
    predicted_scores = (ranking_scores.reset_index(name="ranking_score")
                        .set_index("query_id")
                        .squeeze())
    return predicted_scores["ranking_score"]

### Let's look at one of the queries

In [45]:
df_train[df_train["query_id"]=="query_5"]

Unnamed: 0,query_id,query_text,rank,text_match_score,page_views_score,quality_score,clicked,domain_id,domain_name,name_match
2,query_5,KNJNWV,6,1.368108,0.030636,0.0,0,0,domain_0,0
3,query_5,KNJNWV,3,1.370628,0.041261,0.30103,0,0,domain_0,0
4,query_5,KNJNWV,4,1.3667,0.082535,0.30103,0,0,domain_0,0
5,query_5,KNJNWV,1,1.333836,0.042572,0.30103,1,0,domain_0,0
6,query_5,KNJNWV,5,1.325021,0.046478,0.0,0,0,domain_0,1
7,query_5,KNJNWV,2,1.36272,0.042572,0.30103,0,0,domain_0,0


### And its corresponding model output scores

In [49]:
predict(df_train[df_train["query_id"]=="query_5"])

array([0.11998416, 0.19389412, 0.20375773, 0.17943792, 0.11195529,
       0.1909707 ], dtype=float32)
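
To read these scores as a ranking, we can sort the records by descending score. The sketch below copies the six scores and the `clicked` column for `query_5` from the outputs above, and assumes the score array is in the same row order as the table. It also checks that the scores sum to roughly 1, which is what one would expect if they come from a softmax over the query's records.

```python
# Scores copied from the model output, assumed to follow the row order of the query_5 table.
scores = [0.11998416, 0.19389412, 0.20375773, 0.17943792, 0.11195529, 0.1909707]
clicked = [0, 0, 0, 1, 0, 0]  # `clicked` column for query_5

# Sort record positions by descending score, then assign 1-based ranks.
order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
predicted_rank = [0] * len(scores)
for rank, position in enumerate(order, start=1):
    predicted_rank[position] = rank

# Reciprocal rank of the clicked record under the model's ordering.
clicked_rank = predicted_rank[clicked.index(1)]
reciprocal_rank = 1 / clicked_rank

print(predicted_rank)         # [5, 2, 1, 4, 6, 3]
print(reciprocal_rank)        # 0.25
print(round(sum(scores), 4))  # 1.0
```

Averaging this reciprocal rank over all queries gives the mean reciprocal rank (MRR) of the model's ordering.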

### Now, let's create a Tabular instance, which is the standard way to represent tabular datasets in OmniXAI

In [51]:
from omnixai.data.tabular import Tabular
training_data = Tabular(
   df_train,
   target_column='clicked',
)
training_data.to_pd()  # The Tabular instance can always be converted back to a pandas DataFrame

Unnamed: 0,query_id,query_text,rank,text_match_score,page_views_score,quality_score,clicked,domain_id,domain_name,name_match
0,query_2,MHS7A7RJB1Y4BJT,2,0.473730,0.000000,0.00000,0,2,domain_2,1
1,query_2,MHS7A7RJB1Y4BJT,1,1.063190,0.205381,0.30103,1,2,domain_2,1
2,query_5,KNJNWV,6,1.368108,0.030636,0.00000,0,0,domain_0,0
3,query_5,KNJNWV,3,1.370628,0.041261,0.30103,0,0,domain_0,0
4,query_5,KNJNWV,4,1.366700,0.082535,0.30103,0,0,domain_0,0
...,...,...,...,...,...,...,...,...,...,...
5671,query_1487,QCZ4XHLN,6,0.227694,0.000000,0.00000,0,2,domain_2,0
5672,query_1487,QCZ4XHLN,2,1.016954,0.000000,0.00000,0,2,domain_2,1
5673,query_1490,WYNFF89,2,0.474600,0.190735,0.00000,0,0,domain_0,0
5674,query_1490,WYNFF89,1,0.620355,0.143310,0.00000,1,0,domain_0,0


### Similarly for the query sample

In [48]:
sample_query = Tabular(
    df_train[df_train["query_id"]=="query_5"],
    target_column='clicked',
)
sample_query.to_pd()

Unnamed: 0,query_id,query_text,rank,text_match_score,page_views_score,quality_score,clicked,domain_id,domain_name,name_match
2,query_5,KNJNWV,6,1.368108,0.030636,0.0,0,0,domain_0,0
3,query_5,KNJNWV,3,1.370628,0.041261,0.30103,0,0,domain_0,0
4,query_5,KNJNWV,4,1.3667,0.082535,0.30103,0,0,domain_0,0
5,query_5,KNJNWV,1,1.333836,0.042572,0.30103,1,0,domain_0,0
6,query_5,KNJNWV,5,1.325021,0.046478,0.0,0,0,domain_0,1
7,query_5,KNJNWV,2,1.36272,0.042572,0.30103,0,0,domain_0,0


### Define the features that you wish to analyze. These are sequence features in our case

In [52]:
features = [f['name'] for f in feature_config.sequence_features if f['trainable']]
features

['text_match_score', 'page_views_score', 'quality_score']

## Initialize Explainer

In [55]:
from omnixai.explainers.ranking.agnostic.validity import ValidityRankingExplainer

ranking_explainer = ValidityRankingExplainer(training_data=training_data,
                                             features=features,
                                             predict_function=lambda x: predict(x.to_pd()))

## Get explanations in one call

In [56]:
explanation = ranking_explainer.explain(sample_query, # The tabular instance to be explained
                                        k=3 # The maximum number of features to consider as explanation
                                       )

### The resulting order of feature importance:

In [60]:
explanation.get_explanations()["top_features"].keys()

dict_keys(['quality_score', 'text_match_score', 'page_views_score'])

### We can determine the validity of our explanation

In [61]:
explanation.explanations['validity']['Tau']

KendalltauResult(correlation=0.9999999999999999, pvalue=0.002777777777777778)
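
For reference, Kendall's tau compares two orderings pair by pair, and the reported p-value can be reproduced by hand. The sketch below is a plain-Python illustration with hypothetical rank lists, not the SciPy code behind the result above. When two orderings of six items with no ties agree perfectly, only 2 of the 6! = 720 possible permutations are at least as extreme (perfect agreement or perfect reversal), so the exact two-sided p-value is 2/720.

```python
import math
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau for two equal-length lists without ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (len(x) * (len(x) - 1) / 2)

# Hypothetical ranks of six records: one ordering and an identical second ordering.
ranks_a = [5, 2, 1, 4, 6, 3]
ranks_b = [5, 2, 1, 4, 6, 3]

tau = kendall_tau(ranks_a, ranks_b)
p_value = 2 / math.factorial(len(ranks_a))  # exact two-sided p-value for perfect agreement

# Swapping two adjacent ranks (1 and 2) flips exactly one of the 15 pairs.
tau_one_swap = kendall_tau(ranks_a, [5, 1, 2, 4, 6, 3])

print(tau)      # 1.0
print(p_value)  # 0.002777777777777778
```

The correlation of 0.9999999999999999 reported above is presumably 1.0 up to floating-point rounding, and the p-value matches 2/720.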

### A Kendall's tau of approximately 1.0 indicates that the feature importances are a valid explanation for the ranking.  <br> We can also plot the features with importance grading:

In [33]:
fig = explanation.ipython_fig()
fig.update_layout(autosize=False, width=1800)