# Model Versions

In this notebook, we present the steps for updating a model schema or version. When a model is first onboarded to Fiddler as version 1, it can later receive multiple incremental updates or iterations. Fiddler maintains the history of these updates, which is called model versioning. Users can update existing model schemas and versions and can still access older versions.

This notebook is an example of how changes can be made to a model or its schema and how Fiddler maintains them.


Fiddler is the pioneer in enterprise AI Observability, offering a unified platform that enables Data Science, MLOps, Risk, Compliance, Analytics, and LOB teams to **monitor, explain, analyze, and improve AI deployments at enterprise scale**.
Obtain contextual insights at any stage of the ML lifecycle, improve predictions, increase transparency and fairness, and optimize business revenue.

# Introduction

This notebook creates different scenarios for adding new versions for a model.

Model versions are supported in Fiddler client version 3.1.0 and above.
Make sure that your Python version is 3.10 or above.
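
The two requirements above can be checked programmatically. The following is a minimal sketch that uses only the standard library; the helper names `parse_version` and `meets_minimum` are hypothetical and not part of the Fiddler client, and the parser only handles plain dotted numeric versions.

```python
import sys
from importlib import metadata


def parse_version(text):
    """Turn '3.1.0' into (3, 1, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split('.'):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare dotted versions, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


python_ok = sys.version_info >= (3, 10)
print(f'Python >= 3.10: {python_ok}')

try:
    client_version = metadata.version('fiddler-client')
    print(f'fiddler-client >= 3.1.0: {meets_minimum(client_version, "3.1.0")}')
except metadata.PackageNotFoundError:
    print('fiddler-client is not installed')
```

Pre-release suffixes such as `rc1` are ignored by this simple parser, so treat the result as a quick sanity check rather than a full version comparison.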

In [None]:
pip install -q fiddler-client==3.1.0

In [None]:
import sys
print(sys.version)

In [None]:
import fiddler as fdl
import tempfile
import time
import numpy as np
import pandas as pd
import logging
from uuid import uuid4
from datetime import datetime, timedelta

fdl.__version__

# Set log levels

Set the log level for verbose information. The Python client is designed mostly for programmatic rather than interactive use, so set the log level appropriately for notebook-friendly output.

In [None]:
fdl.set_logging(level=logging.DEBUG)

# Connect to Fiddler

Before you can add information about your model with Fiddler, you'll need to connect using our Python client.

---

**We need a few pieces of information to get started.**
1. The URL you're using to connect to Fiddler
2. Your authorization token

The authorization token can be found by pointing your browser to your Fiddler URL and navigating to the **Settings** page.
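
As an alternative to pasting these values into the notebook, they can be read from the environment. The sketch below assumes two hypothetical environment variable names, `FIDDLER_URL` and `FIDDLER_TOKEN`; neither the names nor the helper `load_connection_settings` come from the Fiddler client.

```python
import os


def load_connection_settings(env=None):
    """Return (url, token) from a mapping, failing loudly if either is missing."""
    env = os.environ if env is None else env
    missing = [key for key in ('FIDDLER_URL', 'FIDDLER_TOKEN') if not env.get(key)]
    if missing:
        raise RuntimeError(f'Missing environment variables: {", ".join(missing)}')
    # Strip a trailing slash so the URL has one consistent form
    return env['FIDDLER_URL'].rstrip('/'), env['FIDDLER_TOKEN']


# Demonstration with a plain dict standing in for os.environ
url, token = load_connection_settings(
    {'FIDDLER_URL': 'https://example.invalid/', 'FIDDLER_TOKEN': 'placeholder-token'}
)
print(url)
```

Calling `load_connection_settings()` with no argument reads from `os.environ`, which keeps the token out of the saved notebook.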

In [None]:
URL = 'https://preprod.fiddler.ai' # UPDATE ME
TOKEN = '-B5h3iKsUBk2yrYEbamxGHcDggXZTPb7URD6lvzWkrk' # UPDATE ME

# Initialization

Initialize the connection to Fiddler. This call also validates client and server version compatibility.

In [None]:
fdl.init(url=URL, token=TOKEN)

print(f'Client version: {fdl.__version__}')
print(f'Server version: {fdl.conn.server_version}')
print(f'Organization id: {fdl.conn.organization_id}')
print(f'Organization name: {fdl.conn.organization_name}')

In [None]:
DATASET_FILE_PATH = "https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/data/v3/churn_data_sample.csv" # UPDATE ME    
PROJECT_NAME = 'konark_project_1' # UPDATE ME
DATASET_NAME = 'dataset_1' # UPDATE ME
MODEL_NAME = 'model_1' # UPDATE ME

Load the CSV file and select the input columns by dropping the output, target, and metadata columns.

In [None]:
sample_df = pd.read_csv(DATASET_FILE_PATH)
column_list = sample_df.columns

# Everything except the output, target, and metadata columns is a model input
input_columns = list(column_list.drop(['predicted_churn', 'churn', 'customer_id', 'timestamp']))

input_columns
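
The selection logic can be illustrated without pandas. This is a sketch on a made-up, shortened column list; `select_inputs` is a hypothetical helper that, like `Index.drop` with its default settings, raises `KeyError` when asked to drop a name that is not present.

```python
def select_inputs(columns, excluded):
    """Return the columns that remain after removing the excluded names."""
    unknown = [name for name in excluded if name not in columns]
    if unknown:
        raise KeyError(f'{unknown} not found in columns')
    return [name for name in columns if name not in excluded]


# Toy column list, not the full churn dataset
toy_columns = ['customer_id', 'creditscore', 'age', 'predicted_churn', 'churn', 'timestamp']
toy_inputs = select_inputs(
    toy_columns, ['predicted_churn', 'churn', 'customer_id', 'timestamp']
)
print(toy_inputs)
```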

## Utility methods

In [None]:
def _add_timestamp(df, event_ts_col: str, start: datetime = None, end: datetime = None):
    """
    Add evenly spaced timestamps (epoch milliseconds) to df, in place,
    between two datetime objects - start and end.
    Defaults to the 30 days ending now.
    """
    # Resolve defaults at call time; defaults in the signature are evaluated only once
    end = end or datetime.now()
    start = start or end - timedelta(days=30)
    start_time = start.timestamp() * 1000
    end_time = end.timestamp() * 1000
    # linspace is ascending when start <= end, so no sort is needed
    df[event_ts_col] = np.linspace(start_time, end_time, df.shape[0]).astype('int64')
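
To see what the helper produces, the same spacing can be worked out with the standard library alone. This is a sketch under stated assumptions: `evenly_spaced_ms` is a hypothetical stand-in for the `np.linspace` call, and the fixed UTC dates are chosen only to make the example reproducible.

```python
from datetime import datetime, timedelta, timezone


def evenly_spaced_ms(start, end, count):
    """Return `count` epoch-millisecond timestamps spread evenly from start to end."""
    if count <= 0:
        return []
    start_ms = start.timestamp() * 1000
    end_ms = end.timestamp() * 1000
    if count == 1:
        return [int(start_ms)]
    step = (end_ms - start_ms) / (count - 1)
    return [int(start_ms + i * step) for i in range(count)]


window_end = datetime(2024, 1, 31, tzinfo=timezone.utc)
window_start = window_end - timedelta(days=30)
stamps = evenly_spaced_ms(window_start, window_end, 4)
print(stamps)
```

With four rows over a 30-day window, consecutive timestamps are ten days apart, and the values are already in ascending order.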

## Create project

In [None]:
try:
    # Create project
    project = fdl.Project(name=PROJECT_NAME).create()
    print(f'New project created with id = {project.id}')
except fdl.Conflict:
    # Get project by name
    project = fdl.Project.from_name(name=PROJECT_NAME)
    print(f'Loaded existing project with id = {project.id}')
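
The create-or-load pattern in the cell above recurs throughout this notebook. The sketch below reproduces it against a small in-memory stand-in so the control flow can be run without a Fiddler connection; `InMemoryProjects` and the local `Conflict` class are hypothetical and only mimic the shape of the calls used above.

```python
class Conflict(Exception):
    """Stand-in for the error raised when a name already exists."""


class InMemoryProjects:
    """Toy registry that hands out integer ids for project names."""

    def __init__(self):
        self._ids = {}

    def create(self, name):
        if name in self._ids:
            raise Conflict(name)
        self._ids[name] = len(self._ids) + 1
        return self._ids[name]

    def from_name(self, name):
        return self._ids[name]


def get_or_create(store, name):
    """Try to create first; on a conflict, fall back to loading by name."""
    try:
        return store.create(name), True
    except Conflict:
        return store.from_name(name), False


store = InMemoryProjects()
first = get_or_create(store, 'demo_project')
second = get_or_create(store, 'demo_project')
print(first, second)
```

Trying the create call first and handling the conflict makes the cell safe to re-run, which matters in a notebook where cells are often executed more than once.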

In [None]:
for x in fdl.Project.list():
    print(f'Project: {x.id} - {x.name}')

## First version with no task

Create the first version of the model in the project with the `NOT_SET` task. Pre-production and production events for it are published later in this notebook.

In [None]:
version_v1 = 'v1'

model_spec = fdl.ModelSpec(
    inputs=input_columns,
    outputs=['predicted_churn'],
    targets=['churn'],
    metadata=['customer_id', 'timestamp'],
    decisions=[],
    custom_features=[],
)

try:
    model_v1 = fdl.Model.from_name(
        name=MODEL_NAME,
        project_id=project.id,
        version=version_v1
    )
    print(f'Loaded existing model with id = {model_v1.id}')
except fdl.NotFound:
    model_v1 = fdl.Model.from_data(
        source=sample_df, 
        name=MODEL_NAME, 
        version=version_v1,
        project_id=project.id,
        spec=model_spec,
        task=fdl.ModelTask.NOT_SET,
        event_ts_col='__timestamp',
        event_id_col='__event_id',
    )

    model_v1.create()
    print(f'New model created with id = {model_v1.id}')

## Second version with a task
Add a second version with the binary classification task. This version also updates the data type of the Geography input feature and the age min/max. Pre-production and production events are published later in this notebook.

In [None]:
version_v2 = 'v2'

task_params = fdl.ModelTaskParams(
    binary_classification_threshold=0.5,
    target_class_order=['no', 'yes'],
    class_weights=None,
    group_by=None,
    top_k=None,
    weighted_ref_histograms=None,
)

xai_params = fdl.XaiParams(
    custom_explain_methods=[],
    default_explain_method=None,
)

try:
    model_v2 = fdl.Model.from_name(
        name=MODEL_NAME,
        project_id=project.id,
        version=version_v2
    )
    print(f'Loaded existing model with id = {model_v2.id}')
except fdl.NotFound:
    model_v2 = model_v1.duplicate(version=version_v2)
    # Schema changes are applied to the duplicate before it is created
    model_v2.schema['geography'].data_type = fdl.DataType.STRING
    model_v2.schema['age'].min = 21
    model_v2.schema['age'].max = 55
    model_v2.task_params = task_params
    model_v2.xai_params = xai_params
    model_v2.task = fdl.ModelTask.BINARY_CLASSIFICATION
    model_v2.create()
    print(f'New model created with id = {model_v2.id}')


## Third version with schema change
Add a third version with schema changes.
Here we rename the input columns, change the age min/max, and delete an input column (`tenure`).

In [None]:
version_v3 = 'v3'

try:
    model_v3 = fdl.Model.from_name(
        name=MODEL_NAME,
        project_id=project.id,
        version=version_v3
    )
    print(f'Loaded existing model with id = {model_v3.id}')
except fdl.NotFound:
    model_v3 = model_v2.duplicate(version=version_v3)
    model_v3.schema['creditscore'].name = 'CreditScore'
    model_v3.schema['geography'].name = 'Geography'
    model_v3.schema['balance'].name = 'BalanceNew'
    model_v3.schema['numofproducts'].name = 'NumOfProducts'
    model_v3.schema['hascrcard'].name = 'HasCrCard'
    model_v3.schema['isactivemember'].name = 'IsActiveMember'
    model_v3.schema['estimatedsalary'].name = 'EstimatedSalary'
    model_v3.schema['age'].name = 'Age'
    model_v3.schema['Age'].min = 18
    model_v3.schema['Age'].max = 85
    del model_v3.schema['tenure']

    model_v3.spec.inputs = ['CreditScore', 'Geography', 'Age', 'BalanceNew', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary']
    
    model_v3.create()
    print(f'New model created with id = {model_v3.id}')
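
The duplicate-then-edit workflow can be pictured with a toy model. The sketch below is not Fiddler's implementation; `ToyColumn` and `ToyModel` are hypothetical classes that only illustrate the idea that a duplicated version carries its own copy of the schema, so edits to the new version leave the older one untouched.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyColumn:
    name: str
    min: Optional[float] = None
    max: Optional[float] = None


@dataclass
class ToyModel:
    version: str
    schema: Dict[str, ToyColumn] = field(default_factory=dict)

    def duplicate(self, version):
        """Return an independent deep copy labelled with the new version."""
        clone = copy.deepcopy(self)
        clone.version = version
        return clone


toy_v1 = ToyModel(
    'v1',
    {'age': ToyColumn('age', 18, 90), 'tenure': ToyColumn('tenure', 0, 10)},
)
toy_v2 = toy_v1.duplicate('v2')
toy_v2.schema['age'].max = 85
del toy_v2.schema['tenure']

print(sorted(toy_v1.schema), toy_v1.schema['age'].max)
print(sorted(toy_v2.schema), toy_v2.schema['age'].max)
```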

## Fourth version with schema change
Add a fourth version with a schema change, where we change the class weights and update the `BalanceNew` maximum.

In [None]:
version_v4 = 'v4'

try:
    model_v4 = fdl.Model.from_name(
        name=MODEL_NAME,
        project_id=project.id,
        version=version_v4
    )
    print(f'Loaded existing model with id = {model_v4.id}')
except fdl.NotFound as e:
    print(e.message)
    model_v4 = model_v3.duplicate(version=version_v4)
    
    model_v4.spec.inputs = ['CreditScore', 'Geography', 'Age', 'BalanceNew', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary']
    model_v4.schema['BalanceNew'].max = 250000

    task_params = fdl.ModelTaskParams(
        class_weights=[23.0, 12.0, 25.0, 12.5, 12.5, 7.5, 7.5, 0.0],
        weighted_ref_histograms=True,
    )
    
    model_v4.task_params = task_params    
    model_v4.create()
    print(f'New model created with id = {model_v4.id}')

## Fifth version with schema change
Add a fifth version that raises the `BalanceNew` maximum.

In [None]:
version_v5 = 'v5'

try:
    model_v5 = fdl.Model.from_name(
        name=MODEL_NAME,
        project_id=project.id,
        version=version_v5
    )
    print(f'Loaded existing model with id = {model_v5.id}')
except fdl.NotFound as e:
    model_v5 = model_v4.duplicate(version=version_v5)
    
    model_v5.spec.inputs = ['CreditScore', 'Geography', 'Age', 'BalanceNew', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary']
    model_v5.schema['BalanceNew'].max = 1250000    
    model_v5.create()
    print(f'New model created with id = {model_v5.id}')


## Publish pre-production events

In [None]:
for model in [model_v1, model_v2, model_v3, model_v4, model_v5]:
    try:
        fdl.Dataset.from_name(name=DATASET_NAME, model_id=model.id)
    except fdl.NotFound:
        print(f"Publishing dataset for {model.name}/{model.version}")
        job = model.publish(
            source=DATASET_FILE_PATH,
            environment=fdl.EnvType.PRE_PRODUCTION,
            dataset_name=DATASET_NAME,
        )
        job.wait()

## Publish events

In [None]:
events_df = pd.read_csv(DATASET_FILE_PATH)

for model in [model_v1, model_v2, model_v3, model_v4]:    
    print(f"Publishing events for {model.name}/{model.version}")
    
    events_df[model.event_id_col] = [str(uuid4()) for _ in range(len(events_df))]
    _add_timestamp(df=events_df, event_ts_col=model.event_ts_col)
    
    job = model.publish(source=events_df)
    job.wait()

## Update version name

In [None]:
model_v4.version = 'v4-old'
model_v4.update()

## List model versions

List all the versions of a model

In [None]:
for x in fdl.Model.list(project_id=project.id, name=MODEL_NAME):
    print(f'Model: {x.id} - {x.name} {x.version}')

## Delete model version
Delete the v5 version

In [None]:
job = model_v5.delete()
job.wait()

In [None]:
# Duplicate into a version name that may already exist ('v2');
# on fdl.Conflict, load the existing version instead
new_model = model_v4.duplicate(version='v2')

new_model.schema['Age'].min = 18
new_model.schema['Age'].max = 60
new_model.task = fdl.ModelTask.BINARY_CLASSIFICATION

try:
    new_model.create()
    print(f'New model version created with id = {new_model.id}')
except fdl.Conflict:
    new_model = fdl.Model.from_name(name=model_v4.name, project_id=project.id, version=new_model.version)
    print(f'Loaded existing model version with id = {new_model.id}')

**You're all done!**
  
Now just head to your Fiddler environment's UI and explore the model's explainability by navigating to the model and selecting the **Explain** tab on the top right.



---


**Questions?**  
  
Check out [our docs](https://docs.fiddler.ai/) for a more detailed explanation of what Fiddler has to offer.

If you're still looking for answers, fill out a ticket on [our support page](https://fiddlerlabs.zendesk.com/) and we'll get back to you shortly.