# How to Debug Model Upload

## Initialize Fiddler Client
We begin by connecting to our Fiddler instance. We can establish this
connection either by passing our credentials directly or by using our
`fiddler.ini` file. More information can be found in the
[setup](https://github.com/fiddler-labs/fiddler-samples/blob/master/content_root/tutorial/00%20Setup.ipynb) section.
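
For orientation, an INI file of this kind can be read with Python's standard `configparser`. The sketch below is only an illustration: the section name `FIDDLER`, the keys `url`, `org_id`, and `auth_token` (chosen to mirror the `FiddlerApi` arguments used in this notebook), and the example values are all assumptions. Check the setup notebook for the layout your client version actually expects.

```python
import configparser

# Hypothetical contents of a fiddler.ini file. The section and key names
# are assumptions and have not been confirmed against the Fiddler client.
EXAMPLE_INI = """
[FIDDLER]
url = https://fiddler.example.com
org_id = my_org
auth_token = REPLACE_ME
"""


def read_credentials(ini_text):
    """Parse INI text and return the three connection settings as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["FIDDLER"]
    return {key: section[key] for key in ("url", "org_id", "auth_token")}


credentials = read_credentials(EXAMPLE_INI)
print(sorted(credentials))
```

Printing the parsed keys (not the values) is a quick way to confirm that the file has the fields you expect without leaking the token into notebook output.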

In [None]:
import fiddler as fdl
import logging

# Set to True to log debug messages
verbose = True

if verbose:
    logging.basicConfig(level=logging.DEBUG)

# client = fdl.FiddlerApi(url=url, org_id=org_id, auth_token=auth_token, verbose=verbose)
client = fdl.FiddlerApi(verbose=verbose)

## Create Project

Here we will create a project, a convenient container for housing the models and datasets associated with a given ML use case.

In [None]:
project_id = 'debug_model_upload'

In [None]:
# Creating our project using project_id
if project_id not in client.list_projects():
    client.create_project(project_id)

## Load Dataset
Here we will load our baseline dataset from a CSV file called `train.csv`. We will
also infer a dataset schema from the resulting dataframe.

In [None]:
import pandas as pd
df = pd.read_csv('/app/fiddler_samples/samples/datasets/winequality/train.csv')
df_schema = fdl.DatasetInfo.from_dataframe(df, max_inferred_cardinality=1000)


## Upload Dataset
To upload a model, you first need to upload a sample of the model’s inputs and
targets, plus any additional metadata that might be useful for model analysis.
This data sample helps us (among other things) infer the model schema, along
with the data type and value range of each feature.
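
To make "inferring data types and value ranges" concrete, here is a minimal standard-library sketch run on a tiny in-memory sample. It is an illustration only and not how Fiddler implements inference; the helpers `infer_type` and `summarize_columns` and the three sample rows are hypothetical.

```python
import csv
import io

# Hypothetical three-row sample standing in for train.csv.
SAMPLE_CSV = """alcohol,pH,quality
9.4,3.51,5
9.8,3.20,5
10.5,3.16,6
"""


def infer_type(values):
    """Return 'int', 'float', or 'str' for a list of raw CSV strings."""
    for name, cast in (("int", int), ("float", float)):
        try:
            for value in values:
                cast(value)
            return name
        except ValueError:
            continue
    return "str"


def summarize_columns(csv_text):
    """Map each column to its inferred type and, if numeric, its min/max."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0]:
        raw = [row[column] for row in rows]
        kind = infer_type(raw)
        info = {"type": kind}
        if kind != "str":
            numbers = [float(value) for value in raw]
            info["min"], info["max"] = min(numbers), max(numbers)
        summary[column] = info
    return summary


summary = summarize_columns(SAMPLE_CSV)
for column, info in summary.items():
    print(column, info)
```

When an upload misbehaves, comparing a summary like this against the schema you expect is a cheap way to spot a column that was read as text instead of a number.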

In [None]:
if 'wine_quality' not in client.list_datasets(project_id):
    upload_result = client.upload_dataset(
        project_id=project_id,
        dataset={'train': df}, 
        dataset_id='wine_quality')



## Create Model Schema
As you may have noticed, the dataset upload step did not ask for the model’s
features and targets, or for any other model-specific information. That’s because we
allow multiple models to be linked to a given dataset schema. Hence we require
a separate model schema step, which tells us which features are relevant to the
model and what the model task is. Here you can specify the input features, the target
column, decision columns, and metadata columns, as well as the type of model.
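
A quick sanity check before building the model schema can catch mismatched column names early. The helper `check_schema_columns` below is hypothetical and not part of the Fiddler client; it only compares plain lists of names, and the column names in the demo are illustrative.

```python
def check_schema_columns(dataset_columns, features, target):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    known = set(dataset_columns)
    missing = [name for name in features if name not in known]
    if missing:
        problems.append(f"features missing from dataset: {missing}")
    if target not in known:
        problems.append(f"target {target!r} missing from dataset")
    if target in features:
        problems.append(f"target {target!r} is also listed as a feature")
    return problems


# Hypothetical column names for illustration.
columns = ["row_id", "alcohol", "pH", "quality"]
ok = check_schema_columns(columns, ["alcohol", "pH"], "quality")
bad = check_schema_columns(columns, ["alcohol", "sulphates"], "quality")
print(ok)
print(bad)
```

In the notebook, the same check could be run with `list(df.columns)`, `feature_columns`, and `target` before the schema is created.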

In [None]:
target = 'quality'
train_input = df.drop(columns=['row_id', 'quality'])
train_target = df[target]

feature_columns = list(train_input.columns)

model_info = fdl.ModelInfo.from_dataset_info(
    dataset_info=client.get_dataset_info(project_id, 'wine_quality'),
    target=target, 
    features=feature_columns,
    display_name='debug model',
    description='this is a sklearn model from tutorial that shows how to debug model upload'
)

## Train model
Build and train your model.

In [None]:
import sklearn.linear_model
import sklearn.pipeline
import sklearn.preprocessing


regressor = sklearn.linear_model.LinearRegression()

full_model = sklearn.pipeline.Pipeline(steps=[
        ('standard_scaling', sklearn.preprocessing.StandardScaler()),
        ('model_name', regressor),
    ])

full_model.fit(train_input, train_target)
full_model.predict(train_input)


## Save model and schema
Next, we need to save the model together with any pre-processing steps applied
to the input features (for example, a categorical encoder or tokenization).
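
The reason for saving pre-processing together with the model is that the restored object should produce the same predictions on raw inputs. The sketch below illustrates a pickle round-trip check using a plain dictionary as a stand-in for a fitted pipeline; the parameter values and the `predict` helper are hypothetical and unrelated to the scikit-learn model trained above.

```python
import pickle

# Stand-in "model": scaling parameters and linear coefficients stored together,
# analogous to pickling a whole preprocessing + estimator pipeline.
model = {
    "mean": [10.0, 3.3],
    "scale": [1.0, 0.2],
    "coef": [0.5, -0.1],
    "intercept": 5.6,
}


def predict(params, row):
    """Scale the raw row, then apply the linear coefficients."""
    scaled = [(x - m) / s for x, m, s in zip(row, params["mean"], params["scale"])]
    return params["intercept"] + sum(c * z for c, z in zip(params["coef"], scaled))


# Serialize and restore in memory, then compare predictions on a raw row.
restored = pickle.loads(pickle.dumps(model))
row = [9.4, 3.51]
same = predict(model, row) == predict(restored, row)
print(same)
```

The same idea applies to the real pipeline: load `model.pkl` back and compare its predictions on a few rows with those of the in-memory model before uploading.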

In [None]:
import pathlib
import shutil
import pickle
import yaml

# Reuse the project_id defined above so the model is uploaded to the same
# project as the 'wine_quality' dataset.
model_id = 'debug_model'

# create temp dir
model_dir = pathlib.Path(model_id)
shutil.rmtree(model_dir, ignore_errors=True)
model_dir.mkdir()

# save model
with open(model_dir / 'model.pkl', 'wb') as pkl_file:
    pickle.dump(full_model, pkl_file)

# save model schema
with open(model_dir / 'model.yaml', 'w') as yaml_file:
    yaml.dump({'model': model_info.to_dict()}, yaml_file)


## Write package.py wrapper
A wrapper is needed between Fiddler and the model. This wrapper can be used to 
translate the inputs and outputs to fit what the model expects and what Fiddler 
is able to consume. More information can be found [here](https://docs.fiddler.ai/api-reference/package-py/).

In [None]:
%%writefile debug_model/package.py

import pickle
from pathlib import Path
import pandas as pd
import logging

PACKAGE_PATH = Path(__file__).parent

class SklearnModelPackage:
    is_classifier = False
    output_columns = ['predicted_quality']

    def __init__(self):
        with open(PACKAGE_PATH / 'model.pkl', 'rb') as infile:
            self.model = pickle.load(infile)

    def predict(self, input_df):
        
        # Log the dataframe exactly as the wrapper receives it
        logging.info(f'log dataframe {input_df}')
        f = self.model.predict if not self.is_classifier else self.model.predict_proba
        return pd.DataFrame(f(input_df), columns=self.output_columns)
    
def get_model():
    return SklearnModelPackage()



## Show generated files

In [None]:
!ls -l debug_model

## Test package.py locally before uploading
You may have to restart the kernel for the class to be loaded locally.
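
As an alternative to restarting the kernel, a module can be loaded directly from its file path so that each call executes the current contents of the file. The sketch below uses only `importlib`; `load_package` is a hypothetical helper, and the demo writes a tiny stand-in `package.py` to a temporary directory instead of touching `debug_model`.

```python
import importlib.util
import pathlib
import tempfile


def load_package(package_path):
    """Execute a package.py file from disk and return the module object."""
    spec = importlib.util.spec_from_file_location(
        "package_under_test", str(package_path)
    )
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Stand-in wrapper exposing the same get_model()/predict() shape.
STUB = '''
class StubPackage:
    output_columns = ['predicted_quality']

    def predict(self, rows):
        return [[0.0] for _ in rows]

def get_model():
    return StubPackage()
'''

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "package.py"
    path.write_text(STUB)
    module = load_package(path)
    model = module.get_model()
    predictions = model.predict([{"alcohol": 9.4}, {"alcohol": 10.5}])

print(hasattr(module, "get_model"), predictions)
```

Because the file is executed on every call, edits to `package.py` are picked up without relying on Python's import cache.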

In [None]:
from debug_model import package as model_class

In [None]:
model_class.get_model().predict(train_input[0:5])

## Upload model
Now that we have all the parts we need, we can upload the model to the Fiddler platform. You can use the [upload_model_package](https://docs.fiddler.ai/api-reference/python-package/#upload-model-package) method to upload the entire directory in one shot. We need the following to upload a model:
- The `path` to the directory
- The `project_id` to which the model belongs
- The `model_id`, which is the name you want to give the model. You can access the model in Fiddler via this ID from then on.
- The `dataset` to which the model is linked (optional)

In total, we will have a `model.yaml`, a `*.pkl`, and a `package.py` file within our model directory.
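
Since the directory is expected to contain those three files, a small pre-flight check can confirm it is complete before uploading. This is a sketch using only `pathlib`; `missing_package_files` is a hypothetical helper, and the demo runs against a temporary directory instead of the real `debug_model` folder.

```python
import pathlib
import tempfile


def missing_package_files(model_dir):
    """Return which of the expected files are absent from model_dir."""
    model_dir = pathlib.Path(model_dir)
    missing = []
    if not (model_dir / "model.yaml").is_file():
        missing.append("model.yaml")
    if not (model_dir / "package.py").is_file():
        missing.append("package.py")
    if not any(model_dir.glob("*.pkl")):
        missing.append("*.pkl")
    return missing


# Demo on a throwaway directory that deliberately lacks package.py.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = pathlib.Path(tmp)
    (tmp_dir / "model.yaml").write_text("model: {}\n")
    (tmp_dir / "model.pkl").write_bytes(b"")
    result = missing_package_files(tmp_dir)

print(result)
```

In the notebook, calling `missing_package_files(model_dir)` should return an empty list once the earlier cells have run.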

In [None]:
# Delete any previously uploaded model with the same ID before re-uploading
client.delete_model(project_id, model_id)
client.upload_model_package(model_dir, project_id, model_id)

## Run model
Now, let's test out our model by interfacing with the client and 
calling [run model](https://docs.fiddler.ai/api-reference/python-package/#run-model).

In [None]:
prediction_input = train_input[0: 10]
result = client.run_model(project_id, model_id, prediction_input, log_events=True)

In [None]:
result