# Model Versions

In this notebook, we present the steps for updating a model schema or version. When a model is onboarded to Fiddler as version 1, it can go through multiple incremental updates or iterations. Fiddler maintains the history of those updates, which is called model versioning. Users can update existing model schemas and versions and still access the older versions.

This notebook shows how to make changes to a model or its schema and how Fiddler keeps track of them.

---

The model versioning docs are available [here](https://docs.fiddler.ai/platform-guide/monitoring-platform/model-versions).

Model versions are supported on Fiddler client version 3.1.0 and above, running on Python 3.10 and above.

You can experience Fiddler's Model Versioning in minutes by following these quick steps:

1. Connect to Fiddler: initialize the client and load a data sample
2. Create a project
3. Create the first version of the model with no task
4. Second version: add the target class, binary classification task, and a defined threshold
5. Third version: change the data type of a column and delete a column
6. Fourth version: update the schema by changing the column names
7. Fifth version: update the range (min, max) of the Age column and define the max balance
8. Update a version name
9. Delete a model version

In [None]:
!pip install -q fiddler-client  #fiddler client version needs to be 3.1.0 and above
import sys
print(sys.version)   #python version needs to be 3.10.11 and above

3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0]

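The version requirements above can also be checked programmatically. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `meets_minimum` are illustrative and not part of the Fiddler client, and the parser only handles plain dotted numeric versions.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)      # minimum Python version stated above
MIN_CLIENT = (3, 1, 0)    # minimum fiddler-client version stated above


def parse_version(text):
    """Turn '3.1.0' into (3, 1, 0); stop at the first non-numeric part."""
    parts = []
    for piece in text.split('.'):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad to three components so '3.1' compares equal to '3.1.0'.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def meets_minimum(version_text, minimum):
    """Return True if the dotted version string is at least `minimum`."""
    return parse_version(version_text) >= minimum


python_ok = sys.version_info[:2] >= MIN_PYTHON
try:
    client_version = metadata.version('fiddler-client')
    client_ok = meets_minimum(client_version, MIN_CLIENT)
except metadata.PackageNotFoundError:
    # The package is not installed in this environment.
    client_version, client_ok = None, False

print(f'Python OK: {python_ok}; fiddler-client {client_version} OK: {client_ok}')
```

If either check prints `False`, upgrade before continuing with the rest of the notebook.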

## 0. Imports

In [None]:
import fiddler as fdl
import time
import numpy as np
import pandas as pd
from datetime import datetime, timedelta

## 1. Connect to Fiddler

Before you can add information about your model with Fiddler, you'll need to connect using our Python client.

---

**We need a few pieces of information to get started.**
1. The URL you're using to connect to Fiddler
2. Your authorization token

The authorization token can be found by pointing your browser to your Fiddler URL and navigating to the **Settings** page.

In [None]:
URL = 'https://telus.fiddler.ai/' # UPDATE ME
TOKEN = 'F4U5BCyMp46ozayJGwvJc8gIHEiyO53TfARFxrT4gZE' # UPDATE ME
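
As an alternative to pasting the URL and token into the notebook, the two values can be read from environment variables so they are not saved with the notebook. This is a minimal sketch: the variable names `FIDDLER_URL` and `FIDDLER_TOKEN` and the helper `load_credentials` are assumptions made for illustration, not names defined by the Fiddler client.

```python
import os


def load_credentials(env=os.environ):
    """Return (url, token) from a mapping, or raise if either is missing."""
    url = env.get('FIDDLER_URL', '').strip()      # hypothetical variable name
    token = env.get('FIDDLER_TOKEN', '').strip()  # hypothetical variable name
    missing = [
        name
        for name, value in (('FIDDLER_URL', url), ('FIDDLER_TOKEN', token))
        if not value
    ]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return url, token


# Example with an explicit mapping so the sketch runs without real credentials.
example_url, example_token = load_credentials(
    {'FIDDLER_URL': 'https://example.fiddler.ai', 'FIDDLER_TOKEN': 'placeholder-token'}
)
print(example_url)
```

In the notebook itself, calling `load_credentials()` with no argument would read from the real environment.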

### Initialization

Initialize the connection to Fiddler. This call also validates that the client and server versions are compatible.

In [None]:
fdl.init(url=URL, token=TOKEN)

### Load a Data Sample

In [None]:
DATASET_FILE_PATH = "https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/data/v3/churn_data_sample.csv"
EVENTS_PATH = "https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/data/v3/churn_production_data.csv"

PROJECT_NAME = 'model_versioning_test' # UPDATE ME
DATASET_NAME = 'dataset_model_version' # UPDATE ME
MODEL_NAME = 'model_versioning' # UPDATE ME
STATIC_BASELINE_NAME = 'baseline_dataset'

Load the sample CSV file, rewrite the timestamp field so the rows are spread evenly over the last 30 days, and select the input columns by dropping the output, target, and metadata columns.

In [None]:
sample_df = pd.read_csv(DATASET_FILE_PATH)

# Spread timestamps linearly over the last 30 days
def update_timestamp(dataframe, number_of_days_lookback=30):
    # end time = now, in milliseconds since the epoch
    end_time = int(time.time() * 1000)
    # start time = number_of_days_lookback days ago (86,400,000 ms per day)
    start_time = end_time - (number_of_days_lookback * 86400000)
    # overwrite the timestamp column; int64 avoids overflow where the default int is 32-bit
    dataframe['timestamp'] = np.linspace(start_time, end_time, dataframe.shape[0]).astype(np.int64)
    dataframe = dataframe.sort_values(by=['timestamp'], ascending=True)
    return dataframe


sample_df = update_timestamp(sample_df, 30)
column_list  = sample_df.columns
input_columns  = list(column_list.drop(["predicted_churn","churn", "customer_id", "timestamp"]))
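
To make the timestamp arithmetic concrete, here is a small standalone sketch of the same linear spacing in plain Python, with a fixed end time instead of the current time. The function name `spread_timestamps` and the sample end time are illustrative only.

```python
MS_PER_DAY = 86_400_000  # milliseconds in one day


def spread_timestamps(row_count, end_ms, days_lookback=30):
    """Return row_count evenly spaced millisecond timestamps ending at end_ms."""
    start_ms = end_ms - days_lookback * MS_PER_DAY
    if row_count == 1:
        return [start_ms]
    step = (end_ms - start_ms) / (row_count - 1)
    return [int(start_ms + i * step) for i in range(row_count)]


# Four rows spread over 30 days: the first is 30 days before the end time,
# the last is the end time itself.
stamps = spread_timestamps(row_count=4, end_ms=1_700_000_000_000)
print(stamps)
```

The first and last values are exactly 30 days apart, and each row in between is offset by the same step.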

## 2. Create project

In [None]:
try:
    # Create project
    project = fdl.Project(name=PROJECT_NAME).create()
    print(f'New project created with id = {project.id} and name = {project.name}')
except fdl.Conflict:
    # Get project by name
    project = fdl.Project.from_name(name=PROJECT_NAME)
    print(f'Loaded existing project with id = {project.id} and name = {project.name}')

New project created with id = e56f65f7-a512-4a44-84b7-391aec69bbaa and name = model_versioning_test


## 3. First version with no task

Create the first version of the model in the project with the task set to `NOT_SET`.

<table>
    <tr>
        <td>
            <img src="https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/images/model_versions_1.png" />
        </td>
    </tr>
</table>

In [None]:
version_v1 = 'v1'

model_spec = fdl.ModelSpec(
    inputs = input_columns,
    outputs = ['predicted_churn'],
    targets = ['churn'],
    metadata = ['customer_id', 'timestamp'],
    decisions = [],
    custom_features = [],
)

try:
    model_v1 = fdl.Model.from_name(
        name = MODEL_NAME,
        project_id = project.id,
        version = version_v1
    )
    print(f'Loaded existing model with id = {model_v1.id}')
except fdl.NotFound:
    model_v1 = fdl.Model.from_data(
        source = sample_df,
        name = MODEL_NAME,
        version = version_v1,
        project_id = project.id,
        spec = model_spec,
        task = fdl.ModelTask.NOT_SET,           # this sets the modeltask as NOT SET
        event_ts_col='timestamp',               # use 'timestamp' field of data as event timestamp
    )

    model_v1.create()                           # this creates the model
    print(f'New model created with id = {model_v1.id}')

New model created with id = 259c3dbc-c02f-48f8-b910-2661cd5e4631
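
Each version in this notebook is created with the same load-or-create pattern: try to load the version by name and create it only if it is not found, which keeps the cell safe to re-run. The sketch below shows that control flow in isolation with an in-memory dictionary. The `NotFound` class, `registry`, and `get_or_create` are stand-ins invented for this illustration and are not the Fiddler client's objects.

```python
class NotFound(Exception):
    """Stand-in for a 'version does not exist' error (not the Fiddler class)."""


registry = {}  # hypothetical in-memory store keyed by (model name, version)


def load(name, version):
    """Return the stored version or raise NotFound."""
    try:
        return registry[(name, version)]
    except KeyError:
        raise NotFound(f'{name} {version}') from None


def get_or_create(name, version, build):
    """Return (version object, created flag); build only when missing."""
    try:
        return load(name, version), False
    except NotFound:
        registry[(name, version)] = build()
        return registry[(name, version)], True


first, created_first = get_or_create('model_versioning', 'v1', lambda: {'task': 'NOT_SET'})
again, created_again = get_or_create('model_versioning', 'v1', lambda: {'task': 'NOT_SET'})
print(created_first, created_again)
```

The second call finds the existing entry and does not create a duplicate, which mirrors what happens when a notebook cell is executed twice.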


### Publish events against first version

In [None]:
output = model_v1.publish(
    source=sample_df,
    environment=fdl.EnvType.PRE_PRODUCTION,
    dataset_name=STATIC_BASELINE_NAME
)

print(f'Baseline dataset is published against model, Job id: {output.id}')

production_df = pd.read_csv(EVENTS_PATH)
production_df = update_timestamp(production_df, 30)
output = model_v1.publish(production_df)
print(f'Production event dataset is published against model, Job id: {output.id}')

Baseline dataset is published against model, Job id: e529016f-7aae-43b8-a5ee-ee2dbf768473
Production event dataset is published against model, Job id: ff8a5938-bc58-47be-88d6-0b21a82e138a


## 4. Second version with a task
Add a second version with a binary classification task.

Duplicate the first version and provide the target class order, the binary classification task, and the threshold.

In [None]:
version_v2 = 'v2'

task_params = fdl.ModelTaskParams(
    binary_classification_threshold = 0.5,
    target_class_order = ['no', 'yes'],
    class_weights = None,
    group_by = None,
    top_k = None,
    weighted_ref_histograms = None,
)

try:
    model_v2 = fdl.Model.from_name(
        name=MODEL_NAME,
        project_id=project.id,
        version=version_v2
    )
    print(f'Loaded existing model with id = {model_v2.id}')
except fdl.NotFound:
    model_v2 = model_v1.duplicate(version=version_v2)
    model_v2.task_params = task_params
    model_v2.task = fdl.ModelTask.BINARY_CLASSIFICATION
    model_v2.create()
    print(f'New model created with id = {model_v2.id}')

New model created with id = f719f258-464f-4ba2-a20f-0a303111e7fc
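
To illustrate what the threshold and class order mean, the sketch below maps a predicted probability to a class label. It rests on two assumptions that are not confirmed by this notebook: that the last entry of `target_class_order` is the positive class, and that a score equal to the threshold counts as positive. The function name is illustrative.

```python
def label_from_probability(probability, threshold=0.5, class_order=('no', 'yes')):
    """Map a positive-class probability to a label.

    Assumes class_order is (negative, positive) and that scores at or
    above the threshold are labeled positive.
    """
    negative, positive = class_order
    return positive if probability >= threshold else negative


for score in (0.2, 0.5, 0.7):
    print(score, label_from_probability(score))
```

Raising the threshold makes the positive label (`yes`, churn) harder to reach, which trades recall for precision.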


### Publish events against second version

In [None]:
output = model_v2.publish(
    source=sample_df,
    environment=fdl.EnvType.PRE_PRODUCTION,
    dataset_name=STATIC_BASELINE_NAME
)
print(f'Baseline dataset is published against model, Job id: {output.id}')
output = model_v2.publish(production_df)
print(f'Production event dataset is published against model, Job id: {output.id}')

Baseline dataset is published against model, Job id: 6e2452f2-70be-497f-a8c7-de7708edc388
Production event dataset is published against model, Job id: cb2b40e1-f87d-4f4e-9923-dc8871d0b5e5





## 5. Third version with schema change
Add a third version with a schema change. Here we delete an input column (`tenure`) and change the data type of the `hascrcard` column from a numeric type to Boolean.

In [None]:
version_v3 = 'v3'

try:
    model_v3 = fdl.Model.from_name(
        name=MODEL_NAME,
        project_id=project.id,
        version=version_v3
    )
    print(f'Loaded existing model with id = {model_v3.id}')
except fdl.NotFound:
    model_v3 = model_v2.duplicate(version=version_v3)
    del model_v3.schema['tenure']                       # delete the tenure column from the schema

    # Clear min and max of the numerical column before changing its data type
    model_v3.schema['hascrcard'].min = None
    model_v3.schema['hascrcard'].max = None
    model_v3.schema['hascrcard'].data_type = fdl.DataType.BOOLEAN
    model_v3.schema['hascrcard'].categories = [True, False]

    model_v3.spec.inputs = ['creditscore', 'geography', 'age', 'balance', 'numofproducts', 'hascrcard', 'isactivemember', 'estimatedsalary']
    model_v3.create()
    print(f'New model created with id = {model_v3.id}')

New model created with id = 4d96a684-2927-46cd-b5b8-f21ecaf4010d
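
When a column is removed from the schema, the input list in the spec has to drop it as well. Instead of retyping the list by hand, it can be derived from the previous one. This is a minimal sketch in plain Python; the starting list `inputs_v2` is assumed here to be the third version's inputs plus `tenure`.

```python
# Assumed inputs of the previous version (v3 inputs plus 'tenure').
inputs_v2 = [
    'creditscore', 'geography', 'age', 'tenure', 'balance',
    'numofproducts', 'hascrcard', 'isactivemember', 'estimatedsalary',
]

dropped = {'tenure'}  # columns removed from the schema in v3

# Keep the original order and leave out the dropped columns.
inputs_v3 = [column for column in inputs_v2 if column not in dropped]
print(inputs_v3)
```

Deriving the list this way keeps the schema and the spec in step when more than one column is removed.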


### Publish events against third version

In [None]:
output = model_v3.publish(
    source=sample_df,
    environment=fdl.EnvType.PRE_PRODUCTION,
    dataset_name=STATIC_BASELINE_NAME
)
print(f'Baseline dataset is published against model, Job id: {output.id}')
output = model_v3.publish(production_df)
print(f'Production event dataset is published against model, Job id: {output.id}')

Baseline dataset is published against model, Job id: 5ebe640e-44fe-437c-b0b4-530b484c22b8
Production event dataset is published against model, Job id: f331f37a-751b-4e2c-b788-632481b13371


## 6. Fourth version with schema change
Add a fourth version with a schema change, where we rename the input columns.

In [None]:
version_v4 = 'v4'

try:
    model_v4 = fdl.Model.from_name(
        name=MODEL_NAME,
        project_id=project.id,
        version=version_v4
    )
    print(f'Loaded existing model with id = {model_v4.id}')
except fdl.NotFound:
    model_v4 = model_v3.duplicate(version=version_v4)
    # Rename the input columns
    model_v4.schema['age'].name = 'Age'
    model_v4.schema['creditscore'].name = 'CreditScore'
    model_v4.schema['geography'].name = 'Geography'
    model_v4.schema['balance'].name = 'BalanceNew'
    model_v4.schema['numofproducts'].name = 'NumOfProducts'
    model_v4.schema['hascrcard'].name = 'HasCrCard'
    model_v4.schema['isactivemember'].name = 'IsActiveMember'
    model_v4.schema['estimatedsalary'].name = 'EstimatedSalary'
    model_v4.spec.inputs = ['CreditScore', 'Geography', 'Age', 'BalanceNew', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary']

    model_v4.create()
    print(f'New model created with id = {model_v4.id}')

New model created with id = e8aa97c0-5534-406f-b35b-ac60ac898e71
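
The renames and the new input list can be kept consistent by driving both from a single mapping. The sketch below uses plain Python lists and a dictionary; `rename_map` is an illustrative name, and applying it to the schema itself would still use the assignments shown in the cell above.

```python
# Old column name -> new column name, as used for the fourth version.
rename_map = {
    'creditscore': 'CreditScore',
    'geography': 'Geography',
    'age': 'Age',
    'balance': 'BalanceNew',
    'numofproducts': 'NumOfProducts',
    'hascrcard': 'HasCrCard',
    'isactivemember': 'IsActiveMember',
    'estimatedsalary': 'EstimatedSalary',
}

inputs_v3 = [
    'creditscore', 'geography', 'age', 'balance',
    'numofproducts', 'hascrcard', 'isactivemember', 'estimatedsalary',
]

# Columns missing from the mapping keep their old name.
inputs_v4 = [rename_map.get(column, column) for column in inputs_v3]
print(inputs_v4)
```

With one mapping as the source of truth, a typo in a new name cannot appear in the schema and be missing from the input list, or the other way around.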


### Publish events against fourth version

In [None]:
output = model_v4.publish(
    source=sample_df,
    environment=fdl.EnvType.PRE_PRODUCTION,
    dataset_name=STATIC_BASELINE_NAME
)
print(f'Baseline dataset is published against model, Job id: {output.id}')
output = model_v4.publish(production_df)
print(f'Production event dataset is published against model, Job id: {output.id}')

Baseline dataset is published against model, Job id: d492916e-b336-4f79-9578-7dc18c8d99c1
Production event dataset is published against model, Job id: 980d864b-ff17-444f-8810-f3f304792058


## 7. Fifth version with schema change
Add a fifth version with a schema change, where we set the range (min, max) of the `Age` column and the max limit of the `BalanceNew` column.

<table>
    <tr>
        <td>
            <img src="https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/images/model_versions_2.png" />
        </td>
    </tr>
</table>

In [None]:
version_v5 = 'v5'

try:
    model_v5 = fdl.Model.from_name(
        name=MODEL_NAME,
        project_id=project.id,
        version=version_v5
    )
    print(f'Loaded existing model with id = {model_v5.id}')
except fdl.NotFound:
    model_v5 = model_v4.duplicate(version=version_v5)
    model_v5.schema['Age'].min = 18                     # set the min and max of the Age column
    model_v5.schema['Age'].max = 85

    model_v5.schema['BalanceNew'].max = 1250000         # set the max value of the balance column
    model_v5.create()
    print(f'New model created with id = {model_v5.id}')


New model created with id = 8ad69595-0e43-409e-aeb4-a11154ef66ca
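
To see which values would fall outside the declared ranges, a quick local check can be run before publishing. This is a minimal sketch in plain Python with made-up sample values. It assumes the bounds are inclusive and says nothing about how Fiddler itself evaluates or reports range violations.

```python
def out_of_range(values, minimum=None, maximum=None):
    """Return the values below `minimum` or above `maximum` (bounds inclusive)."""
    return [
        value
        for value in values
        if (minimum is not None and value < minimum)
        or (maximum is not None and value > maximum)
    ]


ages = [17, 18, 42, 85, 90]                 # illustrative sample values
balances = [0, 1_250_000, 1_300_000]        # illustrative sample values

print(out_of_range(ages, minimum=18, maximum=85))
print(out_of_range(balances, maximum=1_250_000))
```

Passing `None` for a bound leaves that side open, which matches the balance column here where only a maximum is set.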


### Publish events against fifth version

In [None]:
output = model_v5.publish(
    source=sample_df,
    environment=fdl.EnvType.PRE_PRODUCTION,
    dataset_name=STATIC_BASELINE_NAME
)
print(f'Baseline dataset is published against model, Job id: {output.id}')
output = model_v5.publish(production_df)
print(f'Production event dataset is published against model, Job id: {output.id}')

## 8. Update version name

In [None]:
model_v4.version = 'v4-old'     # rename version 'v4' to 'v4-old'

model_v4.update()


<table>
    <tr>
        <td>
            <img src="https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/quickstart/images/model_versions_3.png" />
        </td>
    </tr>
</table>

## 9. Delete model version
Delete version `v5`.

In [None]:
job = model_v5.delete()         # delete this version of the model; the call returns a job
job.wait()                      # block until the delete job finishes



---


**Questions?**  
  
Check out [our docs](https://docs.fiddler.ai/) for a more detailed explanation of what Fiddler has to offer.

If you're still looking for answers, fill out a ticket on [our support page](https://fiddlerlabs.zendesk.com/) and we'll get back to you shortly.