# Learning to Rank Demo | Activate 2020

## Overview

In this notebook, we will train a Learning to Rank model from user click data using ml4ir.

#### Key Takeaways
- How to install and get started with ml4ir as a script and a library
- Defining a ranking pipeline from scratch
- Transfer learning for ml4ir models
- Serving models trained on ml4ir

#### Learning to Rank
The goal of Learning to Rank (LTR) is to learn a ranking function that produces an optimal ordering of a list of documents. In this notebook, we will learn a simple **pointwise ranking function** trained with a **listwise loss**; it predicts a ranking score for every record of a given query. At inference, these scores determine the ordering.
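To make "pointwise scoring, then ordering" concrete, here is a minimal sketch in plain Python. It is not ml4ir code: the linear scorer and its `WEIGHTS` are hypothetical stand-ins for a trained model, and the feature names and values are borrowed from the sample data shown later in this notebook.

```python
from typing import Dict, List

# Hypothetical weights for a linear scorer; a trained model would learn these.
WEIGHTS: Dict[str, float] = {
    "text_match_score": 1.0,
    "page_views_score": 0.5,
    "quality_score": 0.25,
}


def score_record(record: Dict[str, float]) -> float:
    """Pointwise scoring: the score depends only on this record's own features."""
    return sum(WEIGHTS[name] * record[name] for name in WEIGHTS)


def rank_records(records: List[Dict[str, float]]) -> List[int]:
    """Return 1-based ranks, where the highest-scoring record gets rank 1."""
    scores = [score_record(r) for r in records]
    order = sorted(range(len(records)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(records)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


# The two records of one query, in their original order.
query_records = [
    {"text_match_score": 0.47373, "page_views_score": 0.0, "quality_score": 0.0},
    {"text_match_score": 1.06319, "page_views_score": 0.205381, "quality_score": 0.30103},
]
new_ranks = rank_records(query_records)
print(new_ranks)
```

Each record is scored on its own, and the list is only considered as a whole when the scores are sorted.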

## Contents

1. [Install ml4ir](#Install-ml4ir)
1. [Look at the Data](#Look-at-the-Data)
1. [Define the FeatureConfig](#Define-the-FeatureConfig)
1. [Define the ModelConfig](#Define-the-ModelConfig)
1. [Using ml4ir as a script](#Using-ml4ir-as-a-script)
1. [Using ml4ir as a library](#Using-ml4ir-as-a-library)
    1. [Setup](#Setup)
    1. [Load the FeatureConfig](#Load-the-FeatureConfig)
    1. [Create a RelevanceDataset](#Create-a-RelevanceDataset)
    1. [Define an InteractionModel](#Define-an-InteractionModel)
    1. [Define losses, metrics and optimizer](#Define-losses,-metrics-and-optimizer)
    1. [Define a scoring function, or the Scorer](#Define-a-scoring-function,-or-the-Scorer)
    1. [Combine it all to create a RankingModel](#Combine-it-all-to-create-a-RankingModel)
    1. [Train and Evaluate your RankingModel](#Train-and-Evaluate-your-RankingModel)
    1. [Save the trained RankingModel](#Save-the-trained-RankingModel)
1. [Let's try some Transfer Learning](#Let's-try-some-Transfer-Learning)
    1. [What does ml4ir save?](#What-does-ml4ir-save?)
    1. [Using pre-trained character embeddings](#Using-pre-trained-character-embeddings)
1. [Model Serving](#Model-Serving)
    1. [JVM Serving Logic](#JVM-Serving-Logic)
    1. [Serving your RankingModel](#Serving-your-RankingModel)

## Install ml4ir

In [1]:
!pip install ml4ir

Looking in indexes: https://pypi.python.org/simple
Collecting pyarrow<0.15.0,>=0.14.0
  Using cached pyarrow-0.14.1-cp37-cp37m-macosx_10_6_intel.whl (34.4 MB)
ERROR: tfx-bsl 0.15.3 has requirement absl-py<0.9,>=0.7, but you'll have absl-py 0.9.0 which is incompatible.
ERROR: tfx-bsl 0.15.3 has requirement apache-beam[gcp]<2.17,>=2.16, but you'll have apache-beam 2.23.0 which is incompatible.
ERROR: tensorflow-transform 0.15.0 has requirement absl-py<0.9,>=0.7, but you'll have absl-py 0.9.0 which is incompatible.
ERROR: apache-beam 2.23.0 has requirement dill<0.3.2,>=0.3.1.1, but you'll have dill 0.3.0 which is incompatible.
ERROR: apache-beam 2.23.0 has requirement pyarrow<0.18.0,>=0.15.1; python_version >= "3.0" or platform_system != "Windows", but you'll have pyarrow 0.14.1 which is incompatible.
Installing collected packages: pyarrow
  Attempting uninstall: pyarrow
    Found existing installation: pyarrow 0.17.1
    Uninstalling pyarrow-0.17.1:
      Successfully uninstalled pyarrow-0.17.1
Successfully installed pyarrow-0.14.1
You should consider upgrading via the '/Users/ashish.srinivasa/search_relevance/ml4ir/python/env/.ranking_venv3/bin/python3 -m pip install --upgrade pip' command.


## Look at the Data

In [2]:
import pandas as pd

df_train = pd.read_csv("../ml4ir/applications/ranking/tests/data/csv/train/file_0.csv")
df_train.head(7)

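|   | query_id | query_text | rank | text_match_score | page_views_score | quality_score | clicked | domain_id | domain_name | name_match |
| - | -------- | --------------- | ---- | ---------------- | ---------------- | ------------- | ------- | --------- | ----------- | ---------- |
| 0 | query_2 | MHS7A7RJB1Y4BJT | 2 | 0.47373 | 0.0 | 0.0 | 0 | 2 | domain_2 | 1 |
| 1 | query_2 | MHS7A7RJB1Y4BJT | 1 | 1.06319 | 0.205381 | 0.30103 | 1 | 2 | domain_2 | 1 |
| 2 | query_5 | KNJNWV | 6 | 1.368108 | 0.030636 | 0.0 | 0 | 0 | domain_0 | 0 |
| 3 | query_5 | KNJNWV | 3 | 1.370628 | 0.041261 | 0.30103 | 0 | 0 | domain_0 | 0 |
| 4 | query_5 | KNJNWV | 4 | 1.3667 | 0.082535 | 0.30103 | 0 | 0 | domain_0 | 0 |
| 5 | query_5 | KNJNWV | 1 | 1.333836 | 0.042572 | 0.30103 | 1 | 0 | domain_0 | 0 |
| 6 | query_5 | KNJNWV | 5 | 1.325021 | 0.046478 | 0.0 | 0 | 0 | domain_0 | 1 |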
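Before training, it can help to check how the rows group into queries. The sketch below assumes (as the column names suggest) that rows sharing a `query_id` form one ranked list and that `clicked` is the label. It uses only the standard library and inlines the seven rows shown above, so it does not read the real file.

```python
import csv
import io
from collections import defaultdict

# The query_id, rank and clicked columns of the seven rows shown above.
SAMPLE_CSV = """query_id,rank,clicked
query_2,2,0
query_2,1,1
query_5,6,0
query_5,3,0
query_5,4,0
query_5,1,1
query_5,5,0
"""

records_per_query = defaultdict(int)
clicked_ranks = defaultdict(list)
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    records_per_query[row["query_id"]] += 1
    if row["clicked"] == "1":
        clicked_ranks[row["query_id"]].append(int(row["rank"]))

# Report the list size and the rank(s) of the clicked record(s) per query.
for query_id, count in records_per_query.items():
    print(query_id, "records:", count, "clicked ranks:", clicked_ranks[query_id])
```

Because `head(7)` cuts the listing off partway through `query_5`, the counts here describe only the displayed rows, not the full file.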


## Define the FeatureConfig

**YAML File** -> configs/activate_2020/feature_config.yaml



| Feature          | Type    | TFRecord Type | Usage                                    |
| ---------------- | -------- | ------------- | ---------------------------------------- |
| query_text       | Text     | Context       | Character Embeddings -> biLSTM Encoding  |
| domain_name      | Text     | Context       | VocabLookup -> Categorical Embedding     |
| text_match_score | Numeric  | Sequence      | float                                    |
| page_views_score | Numeric  | Sequence      | float                                    |
| quality_score    | Numeric  | Sequence      | float                                    |
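The two text features go through different encodings before any embedding is applied. As a rough illustration of the first step of each path, the sketch below turns a query string into fixed-length byte IDs and maps a domain name to a vocabulary index. The helper names, the fixed length of 20 (taken from the `(None, 1, 20)` shape in the model summary further down), and the vocabulary contents are assumptions for illustration, not ml4ir's implementation.

```python
from typing import Dict, List

MAX_LENGTH = 20  # assumed fixed length, based on the (None, 1, 20) shape in the model summary


def text_to_byte_ids(text: str, max_length: int = MAX_LENGTH) -> List[int]:
    """Encode text as UTF-8 byte values, truncated or zero-padded to max_length."""
    raw = text.encode("utf-8")[:max_length]
    return list(raw) + [0] * (max_length - len(raw))


def vocabulary_lookup(value: str, vocabulary: Dict[str, int], oov_index: int = 0) -> int:
    """Map a categorical string to an integer index; unknown values share oov_index."""
    return vocabulary.get(value, oov_index)


# Hypothetical vocabulary; index 0 is reserved for out-of-vocabulary values.
domain_vocabulary = {"domain_0": 1, "domain_1": 2, "domain_2": 3}

byte_ids = text_to_byte_ids("KNJNWV")
domain_index = vocabulary_lookup("domain_2", domain_vocabulary)
unknown_index = vocabulary_lookup("domain_99", domain_vocabulary)
print(byte_ids, domain_index, unknown_index)
```

The byte IDs would then index a character embedding table whose outputs feed the biLSTM, while the vocabulary index would select a row of the categorical embedding.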

## Define the ModelConfig

In [3]:
print(open("configs/activate_2020/model_config.yaml").read())

architecture_key: dnn
layers:
  - type: dense
    name: first_dense
    units: 256
    activation: relu
  - type: dropout
    name: first_dropout
    rate: 0.3
  - type: dense
    name: second_dense
    units: 64
    activation: relu
  - type: dense
    name: final_dense
    units: 1
    activation: null
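To get a feel for the size of the network this config describes, the sketch below counts the trainable parameters of the dense stack. It is plain Python rather than ml4ir or Keras code, and `input_dim=10` is a made-up value: the real input width depends on the feature layer outputs.

```python
from typing import Dict, List

# The layers from model_config.yaml above, written out as Python dicts.
LAYERS: List[Dict] = [
    {"type": "dense", "name": "first_dense", "units": 256},
    {"type": "dropout", "name": "first_dropout", "rate": 0.3},
    {"type": "dense", "name": "second_dense", "units": 64},
    {"type": "dense", "name": "final_dense", "units": 1},
]


def count_dense_parameters(input_dim: int, layers: List[Dict]) -> Dict[str, int]:
    """Return trainable parameter counts per dense layer (weights plus biases)."""
    counts = {}
    current_dim = input_dim
    for layer in layers:
        if layer["type"] != "dense":
            continue  # dropout has no trainable parameters and keeps the width
        counts[layer["name"]] = current_dim * layer["units"] + layer["units"]
        current_dim = layer["units"]
    return counts


# input_dim=10 is a made-up value for illustration only.
parameter_counts = count_dense_parameters(input_dim=10, layers=LAYERS)
print(parameter_counts, sum(parameter_counts.values()))
```

The final layer has a single unit and no activation, so the network emits one unbounded score per record.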



## Using ml4ir as a script

In [4]:
!python ../ml4ir/applications/ranking/pipeline.py \
--data_format csv \
--data_dir ../ml4ir/applications/ranking/tests/data/csv \
--feature_config configs/activate_2020/feature_config.yaml \
--model_config configs/activate_2020/model_config.yaml \
--execution_mode train_inference_evaluate \
--loss_key rank_one_listnet \
--num_epochs 3 \
--models_dir ../models/activate_2020 \
--logs_dir ../logs/activate_2020 \
--run_id activate_demo

INFO: 2020-09-23 10:20:09.509
Logging initialized. Saving logs to : ../logs/activate_2020/activate_demo
INFO: 2020-09-23 10:20:09.510
Run ID: activate_demo
DEBUG: 2020-09-23 10:20:09.512
CLI args:
{
    "data_dir": "../ml4ir/applications/ranking/tests/data/csv",
    "data_format": "csv",
    "tfrecord_type": "sequence_example",
    "feature_config": "configs/activate_2020/feature_config.yaml",
    "model_file": "",
    "model_config": "configs/activate_2020/model_config.yaml",
    "optimizer_key": "adam",
    "loss_key": "rank_one_listnet",
    "metrics_keys": "['MRR', 'ACR']",
    "monitor_metric": "new_MRR",
    "monitor_mode": "max",
    "num_epochs": 3,
    "batch_size": 128,
    "learning_rate": 0.01,
    "learning_rate_decay": 1.0,
    "learning_rate_decay_steps": 10000000,
    "compute_intermediate_stats": true,
    "execution_mode": "train_inference_evaluate",
    "random_state": 123,
    "run_id": "activate_demo",
    "run_group": "general",
    "run

INFO: 2020-09-23 10:20:09.680
Writing SequenceExample protobufs to : ../ml4ir/applications/ranking/tests/data/csv/tfrecord/train/file_0.tfrecord
INFO: 2020-09-23 10:20:12.599
1 files found under ../ml4ir/applications/ranking/tests/data/csv/tfrecord/train
2020-09-23 10:20:12.602499: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-09-23 10:20:12.628216: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fcc713d5000 executing computations on platform Host. Devices:
2020-09-23 10:20:12.628247: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
INFO: 2020-09-23 10:20:18.191
Created TFRecordDataset from SequenceExample protobufs from 1 files : ['../ml4ir/applications/ranking/tests/data/csv/tfr
INFO: 2020-09-23 10:20:18.192
1 files found under ../ml4ir/applications/ranking/tests/data/csv/valid

INFO: 2020-09-23 10:20:34.483
Training Model
INFO: 2020-09-23 10:20:34.509
Starting Epoch : 1
INFO: 2020-09-23 10:20:34.510
{}
Epoch 1/3
2020-09-23 10:20:44.812815: W tensorflow/core/grappler/optimizers/implementation_selector.cc:310] Skipping optimization due to error while loading function libraries: Invalid argument: Functions '__inference___backward_standard_lstm_8175_8776' and '__inference___backward_standard_lstm_8175_8776_specialized_for_training_Adam_gradients_gradients_bidirectional_backward_lstm_StatefulPartitionedCall_grad_StatefulPartitionedCall_at___inference_keras_scratch_graph_10322' both implement 'lstm_82c56f19-fc4f-401f-9126-8c5765b87eeb' but their signatures do not match.
2020-09-23 10:20:46.846810: I tensorflow/core/profiler/lib/profiler_session.cc:184] Profiler session started.
INFO: 2020-09-23 10:20:46.846
[epoch: 1 | batch: 0] {'batch': 0, 'size': 128, 'loss': 2.0756447, 'old_MRR': 0.8084635, 'new_MRR': 0.5641961, 'old_ACR': 1.

2020-09-23 10:22:16.017781: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] remapper failed: Invalid argument: MutableGraphView::MutableGraphView error: node 'cond/then/_2/concat' has self cycle fanin 'cond/then/_2/concat'.
2020-09-23 10:22:16.024924: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] arithmetic_optimizer failed: Invalid argument: The graph couldn't be sorted in topological order.
2020-09-23 10:22:16.029617: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:697] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2020-09-23 10:22:16.031998: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:697] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2020-09-23 10:22:16.068047: W tensorflow/core/common_runtime/process_function_library_runtime.cc:675] Ignoring multi-device function optimization failure: Invalid argument: The g

2020-09-23 10:22:23.476485: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] remapper failed: Invalid argument: MutableGraphView::MutableGraphView error: node 'cond/then/_2/concat' has self cycle fanin 'cond/then/_2/concat'.
2020-09-23 10:22:23.483091: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] arithmetic_optimizer failed: Invalid argument: The graph couldn't be sorted in topological order.
2020-09-23 10:22:23.486109: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:697] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2020-09-23 10:22:23.490276: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:697] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2020-09-23 10:22:23.508571: W tensorflow/core/common_runtime/process_function_library_runtime.cc:675] Ignoring multi-device function optimization failure: Invalid argument: The g

## Using ml4ir as a library

<img src="images/model_framework.png" alt="ml4ir Architecture" style="width: 500px;" align="center"/>

### Setup

In [5]:
MODEL_CONFIG_PATH = "configs/activate_2020/model_config.yaml"
FEATURE_CONFIG_PATH = "configs/activate_2020/feature_config.yaml"

DATA_DIR = "../ml4ir/applications/ranking/tests/data/csv"
MODELS_DIR = '../models/activate_2020'
LOGS_DIR = '../logs/activate_2020'

MAX_SEQUENCE_SIZE = 25

In [6]:
import logging
import tensorflow as tf
import os

from tensorflow.keras.metrics import Metric
from tensorflow.keras.optimizers import Optimizer
from typing import List, Union, Type

from ml4ir.base.io.local_io import LocalIO
from ml4ir.base.io.file_io import FileIO
from ml4ir.base.config.keys import *
from ml4ir.base.data.relevance_dataset import RelevanceDataset
from ml4ir.base.features.feature_config import FeatureConfig, SequenceExampleFeatureConfig
from ml4ir.base.model.scoring.scoring_model import RelevanceScorer
from ml4ir.base.model.relevance_model import RelevanceModel
from ml4ir.base.model.scoring.interaction_model import InteractionModel, UnivariateInteractionModel
from ml4ir.base.model.losses.loss_base import RelevanceLossBase
from ml4ir.base.model.optimizer import get_optimizer
from ml4ir.applications.ranking.model.ranking_model import RankingModel
from ml4ir.applications.ranking.config.keys import LossKey, MetricKey, ScoringTypeKey
from ml4ir.applications.ranking.model.losses import loss_factory
from ml4ir.applications.ranking.model.metrics import metric_factory

In [7]:
# Set up file I/O handler
file_io: FileIO = LocalIO()
    
# Create directories for models and logs
file_io.make_directory(LOGS_DIR, clear_dir=True)
file_io.make_directory(MODELS_DIR, clear_dir=True)

# Set up logger
logger = logging.getLogger()

tf.get_logger().setLevel("ERROR")
tf.autograph.set_verbosity(3)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

### Load the FeatureConfig

In [8]:
feature_config: SequenceExampleFeatureConfig = FeatureConfig.get_instance(
    tfrecord_type=TFRecordTypeKey.SEQUENCE_EXAMPLE,
    feature_config_dict=file_io.read_yaml(FEATURE_CONFIG_PATH),
    logger=logger)
print("Training features\n-----------------")
print("\n".join(feature_config.get_train_features(key="name")))

Training features
-----------------
text_match_score
page_views_score
quality_score
query_text
domain_name


### Create a RelevanceDataset

In [9]:
ranking_dataset = RelevanceDataset(data_dir=DATA_DIR,
                            data_format=DataFormatKey.CSV,
                            feature_config=feature_config,
                            tfrecord_type=TFRecordTypeKey.SEQUENCE_EXAMPLE,
                            max_sequence_size=MAX_SEQUENCE_SIZE,
                            batch_size=128,
                            preprocessing_keys_to_fns={},
                            file_io=file_io,
                            logger=logger)
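Queries have different numbers of records, so batching them needs a fixed list length. The sketch below shows the general idea behind a `max_sequence_size` of 25: truncate or pad each query's records and keep a mask marking the real ones. This is an illustration of the concept under that assumption, with a hypothetical `pad_query` helper, and not the code ml4ir runs internally.

```python
from typing import List, Tuple

MAX_SEQUENCE_SIZE = 25  # same value as in the Setup cell


def pad_query(values: List[float], max_size: int = MAX_SEQUENCE_SIZE,
              pad_value: float = 0.0) -> Tuple[List[float], List[int]]:
    """Truncate or pad one query's per-record values and return (values, mask)."""
    kept = values[:max_size]
    padding = max_size - len(kept)
    mask = [1] * len(kept) + [0] * padding
    return kept + [pad_value] * padding, mask


# text_match_score values for the five query_5 records shown earlier.
padded, mask = pad_query([1.368108, 1.370628, 1.3667, 1.333836, 1.325021])
print(len(padded), sum(mask))
```

The mask lets the loss and metrics ignore padded slots, which is why a `mask` input appears in the model summary.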

### Define an InteractionModel

In [10]:
interaction_model: InteractionModel = UnivariateInteractionModel(
                                            feature_config=feature_config,
                                            tfrecord_type=TFRecordTypeKey.SEQUENCE_EXAMPLE,
                                            max_sequence_size=MAX_SEQUENCE_SIZE,
                                            feature_layer_keys_to_fns={},
                                            file_io=file_io,
                                        )

### Define losses, metrics and optimizer
##### Use predefined losses, metrics and optimizers or create your own!

In [11]:
# Define loss object from loss key
loss: RelevanceLossBase = loss_factory.get_loss(
                                loss_key=LossKey.RANK_ONE_LISTNET,
                                scoring_type=ScoringTypeKey.POINTWISE)
    
# Define metrics objects from metrics keys
metric_keys = [MetricKey.MRR, MetricKey.ACR]
metrics: List[Union[Type[Metric], str]] = [metric_factory.get_metric(metric_key=m) for m in metric_keys]
    
# Define optimizer
optimizer: Optimizer = get_optimizer(
                            optimizer_key=OptimizerKey.ADAM,
                            learning_rate=0.001
                        )
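If you want to create your own loss, it helps to see what a listwise loss over pointwise scores looks like. The sketch below is a plain-Python softmax cross-entropy over the records of one query, in the spirit of ListNet's top-one formulation. It is an assumption about the general technique, not the implementation behind `LossKey.RANK_ONE_LISTNET`, and all function names and values are made up.

```python
import math
from typing import List


def masked_softmax(scores: List[float], mask: List[int]) -> List[float]:
    """Softmax over the real (mask == 1) records of one query; padded slots get 0."""
    max_score = max(s for s, m in zip(scores, mask) if m)
    exps = [math.exp(s - max_score) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_softmax_cross_entropy(scores: List[float], clicks: List[int],
                                   mask: List[int]) -> float:
    """Cross-entropy between the click labels and the softmax of the scores."""
    probabilities = masked_softmax(scores, mask)
    return -sum(math.log(p) for c, p, m in zip(clicks, probabilities, mask) if m and c)


clicks = [1, 0, 0, 0]  # the first record was clicked
mask = [1, 1, 1, 0]    # the last slot is padding

# Made-up pointwise scores: one list favors the clicked record, the other does not.
loss_good = listwise_softmax_cross_entropy([2.0, 1.0, 0.0, 0.0], clicks, mask)
loss_bad = listwise_softmax_cross_entropy([0.0, 1.0, 2.0, 0.0], clicks, mask)
print(loss_good, loss_bad)
```

The loss is listwise because the softmax normalizes over all records of the query, even though each score was produced independently.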

### Define a scoring function, or the Scorer

In [12]:
scorer: RelevanceScorer = RelevanceScorer.from_model_config_file(
    model_config_file=MODEL_CONFIG_PATH,
    interaction_model=interaction_model,
    loss=loss,
    logger=logger,
    file_io=file_io,
)

### Combine it all to create a RankingModel

In [13]:
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
logging.debug("Logger is initialized...")

ranking_model: RelevanceModel = RankingModel(
                                    feature_config=feature_config,
                                    tfrecord_type=TFRecordTypeKey.SEQUENCE_EXAMPLE,
                                    scorer=scorer,
                                    metrics=metrics,
                                    optimizer=optimizer,
                                    file_io=file_io,
                                    logger=logger,
                                )

DEBUG:root:Logger is initialized...
INFO:root:Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
query_text (InputLayer)         [(None, 1)]          0                                            
__________________________________________________________________________________________________
mask (InputLayer)               [(None, None)]       0                                            
__________________________________________________________________________________________________
tf_op_layer_DecodePaddedRaw (Te [(None, 1, 20)]      0           query_text[0][0]                 
__________________________________________________________________________________________________
domain_name (InputLayer)        [(None, 1)]          0                                            
________________________________________________

### Train and Evaluate your RankingModel

In [14]:
ranking_model.fit(dataset=ranking_dataset,
                  num_epochs=3, 
                  models_dir=MODELS_DIR,
                  logs_dir=LOGS_DIR,
                  monitor_metric="new_MRR",
                  monitor_mode="max")

INFO:root:Training Model
INFO:root:Starting Epoch : 1
INFO:root:{}


Epoch 1/3


INFO:root:[epoch: 1 | batch: 0] {'batch': 0, 'size': 128, 'loss': 2.0689433, 'old_MRR': 0.8084635, 'new_MRR': 0.5875, 'old_ACR': 1.5859375, 'new_ACR': 2.3828125}


     11/Unknown - 12s 1s/step - loss: 1.9668 - old_MRR: 0.7875 - new_MRR: 0.6597 - old_ACR: 1.6491 - new_ACR: 2.1058

INFO:root:Evaluating Model
INFO:root:[batch: 0] {'batch': 0, 'size': 128, 'loss': 1.9151875, 'old_MRR': 0.79674476, 'new_MRR': 0.70351565, 'old_ACR': 1.546875, 'new_ACR': 1.9765625}
INFO:root:Completed evaluating model
INFO:root:None



Epoch 00001: val_new_MRR improved from -inf to 0.70728, saving model to ../models/activate_2020/checkpoint.tf


INFO:root:End of Epoch 1
INFO:root:{'loss': 1.9667874466289172, 'old_MRR': 0.7874729, 'new_MRR': 0.65965444, 'old_ACR': 1.6491477, 'new_ACR': 2.1058238, 'val_loss': 1.9575951316139915, 'val_old_MRR': 0.7827933, 'val_new_MRR': 0.7072782, 'val_old_ACR': 1.6491477, 'val_new_ACR': 1.9637784}




INFO:root:Starting Epoch : 2
INFO:root:{}


Epoch 2/3


INFO:root:[epoch: 2 | batch: 0] {'batch': 0, 'size': 128, 'loss': 1.9908526, 'old_MRR': 0.8084635, 'new_MRR': 0.6712984, 'old_ACR': 1.5859375, 'new_ACR': 2.0625}




INFO:root:Evaluating Model
INFO:root:[batch: 0] {'batch': 0, 'size': 128, 'loss': 1.8649473, 'old_MRR': 0.79674476, 'new_MRR': 0.70351565, 'old_ACR': 1.546875, 'new_ACR': 1.9765625}
INFO:root:Completed evaluating model
INFO:root:None
INFO:root:End of Epoch 2
INFO:root:{'loss': 1.8891352089968594, 'old_MRR': 0.7874729, 'new_MRR': 0.6892583, 'old_ACR': 1.6491477, 'new_ACR': 1.9985795, 'val_loss': 1.8839583180167458, 'val_old_MRR': 0.7827933, 'val_new_MRR': 0.70715433, 'val_old_ACR': 1.6491477, 'val_new_ACR': 1.9630681}



Epoch 00002: val_new_MRR did not improve from 0.70728


INFO:root:Starting Epoch : 3
INFO:root:{}


Epoch 3/3


INFO:root:[epoch: 3 | batch: 0] {'batch': 0, 'size': 128, 'loss': 1.9148431, 'old_MRR': 0.8084635, 'new_MRR': 0.69986975, 'old_ACR': 1.5859375, 'new_ACR': 2.015625}




INFO:root:Evaluating Model
INFO:root:[batch: 0] {'batch': 0, 'size': 128, 'loss': 1.836719, 'old_MRR': 0.79674476, 'new_MRR': 0.7074219, 'old_ACR': 1.546875, 'new_ACR': 1.96875}
INFO:root:Completed evaluating model
INFO:root:None



Epoch 00003: val_new_MRR improved from 0.70728 to 0.70976, saving model to ../models/activate_2020/checkpoint.tf


INFO:root:End of Epoch 3
INFO:root:{'loss': 1.8340462229468606, 'old_MRR': 0.7874729, 'new_MRR': 0.6983645, 'old_ACR': 1.6491477, 'new_ACR': 2.0014205, 'val_loss': 1.8384090987118809, 'val_old_MRR': 0.7827933, 'val_new_MRR': 0.70975864, 'val_old_ACR': 1.6491477, 'val_new_ACR': 1.9545455}




INFO:root:Completed training model
INFO:root:None


{'train_loss': 1.9667874466289172,
 'train_old_MRR': 0.7874729,
 'train_new_MRR': 0.65965444,
 'train_old_ACR': 1.6491477,
 'train_new_ACR': 2.1058238,
 'val_loss': 1.9575951316139915,
 'val_old_MRR': 0.7827933,
 'val_new_MRR': 0.7072782,
 'val_old_ACR': 1.6491477,
 'val_new_ACR': 1.9637784}

In [15]:
ranking_model.predict(test_dataset=ranking_dataset.test).sample(10)

|     | query_id | clicked | name_match | query_text | domain_name | rank | score | new_rank |
| --- | -------- | ------- | ---------- | ---------- | ----------- | ---- | ----- | -------- |
| 270 | b'query_141' | 1 | 0.0 | b'h8w674' | b'domain_1' | 3 | 0.173588 | 4 |
| 259 | b'query_170' | 0 | 0.0 | b'l8klni' | b'domain_0' | 5 | 0.165192 | 2 |
| 111 | b'query_1257' | 0 | 1.0 | b'3xu2meegyf3o0' | b'domain_2' | 1 | 0.426611 | 2 |
| 155 | b'query_496' | 0 | 1.0 | b'2le8fu2g4' | b'domain_1' | 2 | 0.302333 | 3 |
| 143 | b'query_1268' | 1 | 1.0 | b'zxu7dm' | b'domain_3' | 2 | 0.672312 | 1 |
| 153 | b'query_495' | 0 | 1.0 | b'v51c859u9v3g1' | b'domain_0' | 1 | 0.809847 | 1 |
| 457 | b'query_443' | 0 | 1.0 | b'ypr6v2' | b'domain_3' | 2 | 0.29977 | 2 |
| 399 | b'query_204' | 0 | 0.0 | b'q4xkfkau' | b'domain_4' | 3 | 0.289666 | 3 |
| 174 | b'query_1387' | 0 | 1.0 | b'8yy9gfscz8uwm' | b'domain_2' | 2 | 0.172914 | 1 |
| 365 | b'query_548' | 0 | 1.0 | b'5az9sk' | b'domain_3' | 3 | 0.046309 | 5 |
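The `rank` and `new_rank` columns are what the `old_` and `new_` metrics in the training logs compare. The sketch below computes both metrics from the two clicked rows in the sample above, assuming MRR is the mean reciprocal rank of the clicked record and ACR is its average rank. These definitions and the helper names are assumptions for illustration, and two rows say nothing about the metric over the full test set.

```python
from typing import List


def mean_reciprocal_rank(clicked_ranks: List[int]) -> float:
    """Average of 1/rank over the clicked records (one per query)."""
    return sum(1.0 / r for r in clicked_ranks) / len(clicked_ranks)


def average_clicked_rank(clicked_ranks: List[int]) -> float:
    """Average rank position of the clicked records."""
    return sum(clicked_ranks) / len(clicked_ranks)


# (rank, new_rank) for the two clicked rows in the sample above: query_141 and query_1268.
clicked_rows = [(3, 4), (2, 1)]
old_ranks = [old for old, _ in clicked_rows]
new_ranks = [new for _, new in clicked_rows]

old_mrr, new_mrr = mean_reciprocal_rank(old_ranks), mean_reciprocal_rank(new_ranks)
old_acr, new_acr = average_clicked_rank(old_ranks), average_clicked_rank(new_ranks)
print(old_mrr, new_mrr, old_acr, new_acr)
```

Higher MRR is better while lower ACR is better, which matches `monitor_mode="max"` being paired with `new_MRR` in the training call.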


### Save the trained RankingModel

In [16]:
ranking_model.save(models_dir=MODELS_DIR,
                   preprocessing_keys_to_fns={},
                   required_fields_only=True)

INFO:root:Final model saved to : ../models/activate_2020/final


--------

## Let's try some Transfer Learning

### What does ml4ir save?

<img src="images/ml4ir_savedmodel.png" alt="ml4ir SavedModel contents" style="width: 350px;" align="left"/>

### Using pre-trained character embeddings

In [19]:
initialize_layers_dict = {
    "query_text_bytes_embedding" : "models/activate_demo/bytes_embedding.npz"
}
freeze_layers_list = ["query_text_bytes_embedding"]
ranking_model: RelevanceModel = RankingModel(
                                    feature_config=feature_config,
                                    tfrecord_type=TFRecordTypeKey.SEQUENCE_EXAMPLE,
                                    scorer=scorer,
                                    metrics=metrics,
                                    optimizer=optimizer,
                                    initialize_layers_dict=initialize_layers_dict,
                                    freeze_layers_list=freeze_layers_list,
                                    file_io=file_io,
                                    logger=logger,
                                )

INFO:root:Model: "model_3"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
query_text (InputLayer)         [(None, 1)]          0                                            
__________________________________________________________________________________________________
mask (InputLayer)               [(None, None)]       0                                            
__________________________________________________________________________________________________
tf_op_layer_DecodePaddedRaw_3 ( [(None, 1, 20)]      0           query_text[0][0]                 
__________________________________________________________________________________________________
domain_name (InputLayer)        [(None, 1)]          0                                            
__________________________________________________________________________________

INFO:root:Setting query_text_bytes_embedding weights from models/activate_demo/bytes_embedding.npz
INFO:root:Freezing query_text_bytes_embedding layer
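Freezing a layer means its weights are excluded from gradient updates, so the pre-trained values survive further training. The toy sketch below shows that behaviour with a hand-written SGD step. The `sgd_step` helper and all numbers are hypothetical; in the notebook the embedding weights come from `bytes_embedding.npz` and the freezing is handled by the model itself.

```python
from typing import Dict, List, Set


def sgd_step(params: Dict[str, List[float]], grads: Dict[str, List[float]],
             frozen: Set[str], learning_rate: float = 0.1) -> Dict[str, List[float]]:
    """Apply one gradient step, leaving frozen layers untouched."""
    updated = {}
    for name, values in params.items():
        if name in frozen:
            updated[name] = list(values)  # frozen: keep the pre-trained weights
        else:
            updated[name] = [v - learning_rate * g for v, g in zip(values, grads[name])]
    return updated


# Toy weights standing in for a pre-trained embedding and a trainable dense layer.
params = {"query_text_bytes_embedding": [0.5, -0.25], "first_dense": [1.0, 1.0]}
grads = {"query_text_bytes_embedding": [1.0, 1.0], "first_dense": [1.0, 1.0]}

params = sgd_step(params, grads, frozen={"query_text_bytes_embedding"})
print(params)
```

Only `first_dense` moves after the step, which is the property the embedding comparison in the next cell is meant to confirm.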


In [23]:
import numpy as np

ranking_model_embeddings = ranking_model.model.get_layer("query_text_bytes_embedding").get_weights()
query_classification_embeddings_files = np.load("models/activate_demo/bytes_embedding.npz")
query_classification_embeddings = [query_classification_embeddings_files[f] for f in query_classification_embeddings_files.files]

# Every element must match, so reduce with "all" rather than "any"
assert tf.reduce_all(tf.equal(ranking_model_embeddings, query_classification_embeddings))

print(ranking_model_embeddings)

[array([[-0.03738469,  0.00727513, -0.02006867, ...,  0.01078511,
        -0.028496  , -0.04102874],
       [ 0.00512887,  0.0062821 , -0.0010671 , ...,  0.04945388,
        -0.0132054 , -0.01177131],
       [-0.00937623, -0.03438247,  0.00176773, ...,  0.046078  ,
        -0.0310035 , -0.04288797],
       ...,
       [-0.04410168,  0.0383402 ,  0.03348425, ...,  0.02123589,
         0.02240864, -0.04049417],
       [ 0.03601265,  0.04585798,  0.00272902, ..., -0.00353998,
        -0.04783431,  0.02852656],
       [ 0.04903785, -0.03518286,  0.00195389, ...,  0.03783921,
        -0.01398294,  0.0107099 ]], dtype=float32)]


---------

## Model Serving

### JVM Serving Logic [Scala]


```scala
def runQueriesAgainstDocs(
        csvDataPath: String,
        modelPath: String,
        featureConfigPath: String,
        inputTFNode: String,
        scoresTFNode: String): Iterable[(StringMapQueryContextAndDocs, SequenceExample, Array[Float])] = {
  
  val featureConfig = ModelFeaturesConfig.load(featureConfigPath)
  val sequenceExampleBuilder = StringMapSequenceExampleBuilder.withFeatureProcessors(featureConfig)
  val rankingModelConfig = ModelExecutorConfig(inputTFNode, scoresTFNode)
  val rankingModel = new SavedModelBundleExecutor(modelPath, rankingModelConfig)

  val queryContextsAndDocs = StringMapCSVLoader.loadDataFromCSV(csvDataPath, featureConfig)

  queryContextsAndDocs.map {
    case q @ StringMapQueryContextAndDocs(queryContext, docs) =>
      val sequenceExample = sequenceExampleBuilder.build(queryContext, docs)
      (q, sequenceExample, rankingModel(sequenceExample))
  }
}

val allScores: Iterable[
  (StringMapQueryContextAndDocs, SequenceExample, Array[Float])] = runQueriesAgainstDocs(
    pathFor("test_data.csv"),
    pathFor("ranking_model_bundle"),
    pathFor("feature_config.yaml"),
    "serving_tfrecord_protos",
    "ranking_score"
  )
```

### Serving your RankingModel

In [20]:
!cd ../../jvm; mvn test -Dtest=ml4ir.inference.tensorflow.TensorFlowInferenceTest

[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO] 
[INFO] ml4ir-parent                                                       [pom]
[INFO] ml4ir-inference                                                    [jar]
[INFO] 
[INFO] -------------------------< ml4ir:ml4ir-parent >-------------------------
[INFO] Building ml4ir-parent 0.0.2-SNAPSHOT                               [1/2]
[INFO] --------------------------------[ pom ]---------------------------------
[INFO] 
[INFO] -----------------------< ml4ir:ml4ir-inference >------------------------
[INFO] Building ml4ir-inference 0.0.2-SNAPSHOT                            [2/2]
[INFO] --------------------------------[ jar ]-----------


Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for ml4ir-parent 0.0.2-SNAPSHOT:
[INFO] 
[INFO] ml4ir-parent ....................................... SUCCESS [  0.004 s]
[INFO] ml4ir-inference .................................... SUCCESS [ 37.924 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  38.264 s
[INFO] Finished at: 2020-09-11T04:23:57-07:00
[INFO] ------------------------------------------------------------------------
