# Fiddler Ranking Model Quick Start Guide

Fiddler lets your teams observe your ranking models to understand their performance and catch issues like data drift before they affect your applications.

# Quickstart: Expedia Search Ranking
The dataset used here comes from Expedia. It includes shopping and purchase data as well as information on price competitiveness. The data are organized around a set of “search result impressions”: the ordered list of hotels that a user sees after searching for a hotel on the Expedia website. In addition to impressions from the existing algorithm, the data contain impressions where the hotels were randomly sorted, to avoid the position bias of the existing algorithm. The user response is provided as a click on a hotel. Source: https://www.kaggle.com/c/expedia-personalized-sort/overview

# 0. Imports

In [None]:
import pandas as pd
import lightgbm as lgb
import numpy as np
import time
import datetime
import fiddler as fdl
print(f"Running Fiddler client version {fdl.__version__}")

# 1. Connect to Fiddler and Create a Project

Before you can add information about your model to Fiddler, you'll need to connect using our API client.


---


**We need a few pieces of information to get started.**
1. The URL you're using to connect to Fiddler
2. Your authorization token

These can be found by navigating to the **Settings** page of your Fiddler environment.

In [None]:
URL = 'https://mainbuild.dev.fiddler.ai'  # Make sure to include the full URL (including https://).
TOKEN = 'IkX6tatV_HrrxqoLO4q3rcaKUAPIK2jGk-oI4X4LAQ8'
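
Pasting a token directly into a notebook, as in the cell above, makes it easy to leak when the notebook is shared. A minimal sketch of reading both values from environment variables instead follows. The variable names `FIDDLER_URL` and `FIDDLER_TOKEN` and the helper `load_fiddler_credentials` are assumptions made for this illustration; they are not defined by the Fiddler client.

```python
import os


def load_fiddler_credentials(env=None):
    """Return (url, token) read from environment variables.

    FIDDLER_URL and FIDDLER_TOKEN are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    url = env.get("FIDDLER_URL", "").strip()
    token = env.get("FIDDLER_TOKEN", "").strip()
    # The guide asks for the full URL, including the https:// scheme.
    if not url.startswith("https://"):
        raise ValueError("FIDDLER_URL must be a full URL starting with https://")
    if not token:
        raise ValueError("FIDDLER_TOKEN is not set")
    return url.rstrip("/"), token
```

With both variables exported in your shell, `URL, TOKEN = load_fiddler_credentials()` would replace the hardcoded assignments.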

Next we use these credentials to connect to the Fiddler API.

In [None]:
fdl.init(
    url=URL,
    token=TOKEN
)

Once you connect, you can create a new project by passing a unique project name to `fdl.Project` and calling its `create` method.

In [None]:
PROJECT_NAME = 'search_ranking_example_004'

project = fdl.Project(
    name=PROJECT_NAME
)

project.create()

# 2. Load a Data Sample

Now we retrieve the Expedia Dataset as a data sample for this model.

In [None]:
PATH_TO_SAMPLE_CSV = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/data/v3/expedia_data_sample.csv'

sample_df = pd.read_csv(PATH_TO_SAMPLE_CSV)
sample_df.head()

Fiddler uses this data sample to keep track of important information about your data.
  
This includes **data types**, **data ranges**, and **unique values** for categorical variables.
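
To give a feel for what such a summary contains, here is a dependency-free sketch that profiles a handful of rows. It is only an illustration of the idea, not Fiddler's internal logic; the helper `profile_columns` and the toy columns `price` and `device` are hypothetical.

```python
def profile_columns(rows):
    """Summarize each column of a list of row dicts.

    Numeric columns get a (min, max) range; all other columns get
    their sorted unique values, as a categorical variable would.
    """
    profile = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [row[col] for row in rows if row[col] is not None]
        is_numeric = bool(values) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if is_numeric:
            dtype = "float" if any(isinstance(v, float) for v in values) else "int"
            profile[col] = {"dtype": dtype, "range": (min(values), max(values))}
        else:
            profile[col] = {"dtype": "category", "unique": sorted(set(map(str, values)))}
    return profile


# Toy rows; only `srch_id` is a column name taken from this guide.
toy_rows = [
    {"srch_id": 1, "price": 120.5, "device": "mobile"},
    {"srch_id": 2, "price": 89.0, "device": "desktop"},
]
print(profile_columns(toy_rows))
```

On the notebook's own DataFrame, `sample_df.dtypes`, `sample_df.describe()`, and `sample_df[col].unique()` give comparable information.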

# 3. Onboard Model Info and Upload the Model Artifact to Fiddler

In [None]:
# Create a directory to store your model files.
import os
model_dir = "model"
# exist_ok=True lets this cell be re-run without raising FileExistsError.
os.makedirs(model_dir, exist_ok=True)

### 3.a Adding model info to Fiddler
To add a ranking model, you must specify the model task as `fdl.ModelTask.RANKING`.  

Additionally, you must provide the `group_by` argument, which names the column holding the query (search) ID. This `group_by` column should be present in the model spec either in:
- `inputs` : if it is used to build and run the model
- `metadata` : if not used by the model
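
A quick local check of this rule can catch a misconfigured spec before anything is sent to Fiddler. The sketch below assumes you have the input and metadata column lists at hand, such as the ones passed to `fdl.ModelSpec`; the helper `check_group_by` is hypothetical and not part of the Fiddler client.

```python
def check_group_by(group_by, inputs, metadata):
    """Return which column list holds the group_by column.

    Raises ValueError when the column appears in neither list.
    """
    if group_by in inputs:
        return "inputs"
    if group_by in metadata:
        return "metadata"
    raise ValueError(f"{group_by!r} must appear in inputs or metadata")


# Toy column lists for illustration; `price` is a made-up feature name.
print(check_group_by("srch_id", inputs=["srch_id", "price"], metadata=["timestamp"]))
```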

Optionally, you can set `top_k` (default is 50). This is the number of results within each query that are taken into account when computing the performance metrics in monitoring.  
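
To make the role of the top-k cutoff concrete, here is a minimal sketch of NDCG@k, one common ranking metric, computed for a single query. This guide does not state which metrics Fiddler computes or how, so treat the function names `dcg_at_k` and `ndcg_at_k` and the linear-gain formula as illustrative assumptions.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k labels, given in ranked order.

    Uses linear gain (rel / log2(rank + 1) with 1-based ranks).
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(scores, relevances, k):
    """NDCG@k for one query: rank items by score, then compare to the ideal order."""
    ranked = [
        rel
        for _, rel in sorted(zip(scores, relevances), key=lambda pair: pair[0], reverse=True)
    ]
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    # A query with no relevant item has no meaningful ideal ranking; report 0.0.
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# One query with three results; only the lowest-scored result is relevant.
scores = [0.9, 0.2, 0.5]
relevances = [0, 1, 0]
print(ndcg_at_k(scores, relevances, k=1))
print(ndcg_at_k(scores, relevances, k=3))
```

With `k=1` the relevant item falls outside the cutoff and contributes nothing, while with `k=3` it is counted at a discounted weight. That is the effect a smaller or larger cutoff has on per-query metrics.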

Unless the prediction column was part of your baseline dataset, you must provide the minimum and maximum values predictions can take in a dictionary format (see below).  

If your target is categorical (string), you need to provide the `target_class_order` argument. If your target is numerical and you don't specify this argument, Fiddler will infer it.   

`target_class_order` is the **ordered** list of possible target values: the first element should be the least relevant target level and the last element the most relevant.
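
The ordering can be read as assigning each target level a relevance grade equal to its position in the list. The sketch below illustrates that reading only; the helper `relevance_grades` and the string labels are hypothetical, and nothing here is claimed about how Fiddler stores the order internally.

```python
def relevance_grades(target_class_order):
    """Map each target level to its position in the ordered list (0 = least relevant)."""
    return {level: grade for grade, level in enumerate(target_class_order)}


# Hypothetical string targets, ordered from least to most relevant.
print(relevance_grades(["no_click", "click", "booking"]))

# The binary target used in this guide.
print(relevance_grades([0, 1]))
```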

In [None]:
model_spec = fdl.ModelSpec(
    inputs=list(sample_df.drop(columns=['binary_relevance', 'score', 'graded_relevance', 'position', 'timestamp']).columns),
    outputs=['score'],
    targets=['binary_relevance'],
    metadata=['timestamp', 'graded_relevance', 'position']
)

In [None]:
model_task = fdl.ModelTask.RANKING

task_params = fdl.ModelTaskParams(
    group_by='srch_id',
    top_k=20,
    target_class_order=[0, 1]
)

In [None]:
timestamp_column = 'timestamp'

In [None]:
MODEL_NAME = 'expedia_model'

model = fdl.Model.from_data(
    name=MODEL_NAME,
    project_id=project.id,
    source=sample_df,
    spec=model_spec,
    task=model_task,
    task_params=task_params,
    event_ts_col=timestamp_column
)

model.create()

# 4. Upload our data sample as a baseline dataset

In order to add a model artifact, we need a baseline first.
A baseline is a dataset which can be used to represent "golden data," or data which our model expects to receive in production.

We can publish the data sample from earlier to add it as a baseline.

For ranking, we need to ingest all events from a given query or search ID together. To do that, we need to transform the data to a grouped format.  
You can use the `group_by` utility function to do the transformation.
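
As a rough picture of what a grouped format looks like, the dependency-free sketch below collapses flat rows into one record per query, with every other column turned into a list. This is an assumption about the general shape, since the exact output of `fdl.utils.helpers.group_by` is not shown in this guide; the helper `group_rows` is hypothetical.

```python
def group_rows(rows, group_by_col):
    """Collapse row dicts into one dict per group, with list-valued columns.

    Groups, and the items inside each group, keep their first-seen order.
    """
    grouped = {}
    for row in rows:
        key = row[group_by_col]
        bucket = grouped.setdefault(key, {group_by_col: key})
        for col, value in row.items():
            if col != group_by_col:
                bucket.setdefault(col, []).append(value)
    return list(grouped.values())


# Three flat events belonging to two searches.
flat_rows = [
    {"srch_id": 1, "score": 0.9},
    {"srch_id": 2, "score": 0.4},
    {"srch_id": 1, "score": 0.1},
]
print(group_rows(flat_rows, "srch_id"))
```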

In [None]:
DATASET_NAME = 'expedia_dataset'

sample_df_grouped = fdl.utils.helpers.group_by(df=sample_df, group_by_col='srch_id')

model.publish(
    source=sample_df_grouped,
    environment=fdl.EnvType.PRE_PRODUCTION,
    dataset_name=DATASET_NAME
)

# 5. Upload a model artifact

### 5.a Create a Model Wrapper Script

`package.py` is the interface between Fiddler’s backend and your model. This wrapper script tells Fiddler how to load the model, how to run it, and what its inputs and outputs are.

You need to implement three parts:
- init: Load the model, and any associated files such as feature transformers.
- transform: If you use some pre-processing steps not part of the model file, transform the data into a format that the model recognizes.
- predict: Make predictions using the model.

In [None]:
%%writefile model/package.py

import pickle
from pathlib import Path
import pandas as pd

PACKAGE_PATH = Path(__file__).parent

class ModelPackage:

    def __init__(self):
        """
         Load the model file and any pre-processing files if needed.
        """
        self.output_columns = ['score']

        with open(PACKAGE_PATH / 'model.pkl', 'rb') as infile:
            self.model = pickle.load(infile)

    def transform(self, input_df):
        """
        Accepts a pandas DataFrame object containing rows of raw feature vectors.
        Use pre-processing file to transform the data if needed.
        In this example we don't need to transform the data.
        Outputs a pandas DataFrame object containing transformed data.
        """
        return input_df

    def predict(self, input_df):
        """
        Accepts a pandas DataFrame object containing rows of raw feature vectors.
        Outputs a pandas DataFrame object containing the model predictions whose column labels
        must match the output column names in model info.
        """
        transformed_df = self.transform(input_df)
        pred = self.model.predict(transformed_df)
        return pd.DataFrame(pred, columns=self.output_columns)

def get_model():
    return ModelPackage()

### 5.b Retrieving the model files

To explain a model's inner workings, we need to upload the model artifacts. We will retrieve a pre-trained model from the Fiddler repo that was trained with **lightgbm 2.3.0**.

In [None]:
import urllib.request
urllib.request.urlretrieve("https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/models/model_ranking.pkl", "model/model.pkl")

### 5.c Upload the model files to Fiddler


As a final setup step, you can upload the model artifact files using `model.add_artifact`.
   - `model_dir` is the path to the folder containing the model file(s) and the `package.py` created in step 5.a.
   - Since each model artifact uploaded to Fiddler gets deployed in its own container, the [deployment params](https://docs.fiddler.ai/reference/fdldeploymentparams) allow us to specify the compute needs and library set of the container.

In [None]:
#Uploading Model files
deployment_params = fdl.DeploymentParams(
    image_uri="md-base/python/machine-learning:1.1.0",
    cpu=100,
    memory=256,
    replicas=1,
)

model.add_artifact(
    model_dir=model_dir,
    deployment_params=deployment_params
)

# 6. Publish Events For Monitoring

### 6.a Gather and prepare Production Events
This is the production log file we are going to upload to Fiddler.

In [None]:
df_logs = pd.read_csv('https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/data/v3/expedia_logs.csv')
df_logs

In [None]:
# Shift the timestamps so that the most recent event falls at the current time.
df_logs['timestamp'] = df_logs['timestamp'] + (time.time() - df_logs['timestamp'].max())
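
The arithmetic behind this shift is easier to see on a small list. The sketch below assumes the timestamps are Unix epoch seconds, as the use of `time.time()` suggests; the helper `shift_to_now` is hypothetical.

```python
import time


def shift_to_now(timestamps, now=None):
    """Shift epoch-second timestamps so the latest one equals `now`.

    Every value moves by the same offset, so the gaps between events are preserved.
    """
    now = time.time() if now is None else now
    offset = now - max(timestamps)
    return [ts + offset for ts in timestamps]


# Fixed `now` so the result is reproducible.
print(shift_to_now([100.0, 160.0, 130.0], now=1000.0))
```

Because only a constant is added, the relative order and spacing of events are unchanged; only their position on the calendar moves.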

Again, let's group the data before sending it to Fiddler.

In [None]:
df_logs_grouped = fdl.utils.helpers.group_by(df=df_logs, group_by_col='srch_id')
df_logs_grouped

### 6.b Publish events

In [None]:
output = model.publish(df_logs_grouped)

# 7. Get insights


**You're all done!**
  
You can now head to your Fiddler environment and start getting enhanced observability into your model's performance.

<table>
    <tr>
        <td>
            <img src="https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/images/ranking_model_1.png" />
        </td>
    </tr>
</table>

--------
**Questions?**  
  
Check out [our docs](https://docs.fiddler.ai/) for a more detailed explanation of what Fiddler has to offer.

Join our [community Slack](http://fiddler-community.slack.com/) to ask any questions!

If you're still looking for answers, fill out a ticket on [our support page](https://fiddlerlabs.zendesk.com/) and we'll get back to you shortly.