### Drift Analysis: Get the Reference Dataset from the Model Catalog

Model drift analysis requires two datasets (a reference dataset and a current one), each containing not only the features (x_i) but also the target.
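A minimal sketch of that requirement, assuming both datasets are tabular and share column names. The helper `check_drift_inputs` is hypothetical (it is not part of `drift_analysis.py`), and the target column name is whatever your model predicts. It accepts any iterable of column names, so a DataFrame's `.columns` can be passed directly.

```python
from typing import Iterable, List


def check_drift_inputs(reference_columns: Iterable[str],
                       current_columns: Iterable[str],
                       target: str) -> List[str]:
    """Hypothetical helper: return the shared feature columns.

    Raises ValueError if either dataset lacks the target column, because
    drift analysis needs the features (x_i) *and* the target in both.
    """
    ref = list(reference_columns)
    cur = list(current_columns)
    for name, cols in (("reference", ref), ("current", cur)):
        if target not in cols:
            raise ValueError(f"{name} dataset is missing target column {target!r}")
    # Features: every non-target column present in both datasets,
    # kept in the reference dataset's column order.
    cur_set = set(cur)
    features = [c for c in ref if c != target and c in cur_set]
    if not features:
        raise ValueError("no feature columns shared by the two datasets")
    return features
```

For example, `check_drift_inputs(ref_df.columns, cur_df.columns, target="target")`, where `cur_df` and the `"target"` column name are placeholders for your own current dataset and label.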

After the model has been trained, the best practice is to save it in the Model Catalog and to store, as a custom metadata entry, the URL of the dataset used to build the train/test split. We call this dataset the "reference dataset".
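To make the idea concrete, here is a plain-dict sketch of recording the URL at training time. Each item mirrors the key/value/description/category fields of a Model Catalog custom metadata entry, but this is not the ADS or OCI SDK API; the key name `reference_dataset` and the `"Other"` category are assumptions, so use whatever your training code actually writes.

```python
from typing import Dict, List

# Assumption: hypothetical key name for the reference dataset URL.
REFERENCE_KEY = "reference_dataset"


def add_reference_metadata(custom_metadata: List[Dict[str, str]], url: str) -> None:
    """Add (or overwrite) the reference dataset URL in a list of metadata items."""
    for item in custom_metadata:
        if item["key"] == REFERENCE_KEY:
            # Keep a single entry per key: update it in place.
            item["value"] = url
            return
    custom_metadata.append({
        "key": REFERENCE_KEY,
        "value": url,
        "description": "dataset used to build the train/test split",
        "category": "Other",  # assumption: pick the category that fits your catalog
    })
```

Overwriting rather than appending keeps the lookup unambiguous when the model is retrained on a new split.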

In this notebook I show how to read the reference dataset URL from the custom metadata and load the dataset as a Pandas DataFrame.

In [1]:
import pandas as pd
import numpy as np

import ads
from ads import set_auth

import logging
import warnings

from drift_analysis import get_reference_dataset_url

warnings.filterwarnings('ignore')
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.ERROR)

In [2]:
# we need ads 2.5.10 or greater
print(ads.__version__)

2.5.10
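Eyeballing the printed version works, but comparing version strings directly does not: as strings, `"2.10.0" >= "2.5.10"` is `False`. A small sketch of a numeric check follows; the helper names are hypothetical, and `packaging.version.Version` is a more complete alternative if that package is installed.

```python
MIN_ADS = "2.5.10"  # minimum ADS version required by this notebook


def version_tuple(version: str) -> tuple:
    """Turn '2.5.10' into (2, 5, 10); trailing non-digits such as 'rc1' are ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def ads_is_recent_enough(installed: str, minimum: str = MIN_ADS) -> bool:
    # Tuples compare element by element, so (2, 10, 0) > (2, 5, 10).
    return version_tuple(installed) >= version_tuple(minimum)
```

In the notebook this would be used as `assert ads_is_recent_enough(ads.__version__)`.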


In [3]:
# use Resource Principal (RP) authentication
set_auth(auth='resource_principal')
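Resource principal authentication is only available inside OCI (for example, in a Data Science notebook session). When the same code also runs on a laptop, one option is to pick the mode at runtime. This sketch assumes that the `OCI_RESOURCE_PRINCIPAL_VERSION` environment variable is set wherever resource principals are available; `choose_auth` is a hypothetical helper, and `"api_key"` relies on a local OCI config file.

```python
import os


def choose_auth() -> str:
    """Hypothetical helper: resource principal inside OCI, API key elsewhere.

    Assumption: OCI_RESOURCE_PRINCIPAL_VERSION is set in environments
    that provide resource principals.
    """
    if os.environ.get("OCI_RESOURCE_PRINCIPAL_VERSION"):
        return "resource_principal"
    return "api_key"
```

Usage: `set_auth(auth=choose_auth())`.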



### Getting the reference dataset from the Model Catalog

In [4]:
# get_reference_dataset_url() has been moved to drift_analysis.py (imported above)
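The source of `drift_analysis.py` is not shown here; the log output only shows that it loads the model artifact. The final lookup step could plausibly resemble the sketch below, which scans custom metadata items (represented as plain dicts with `key` and `value` fields) for a hypothetical `reference_dataset` key. This is an illustration, not the actual implementation.

```python
from typing import Iterable, Mapping


def find_reference_url(custom_metadata: Iterable[Mapping[str, str]],
                       key: str = "reference_dataset") -> str:
    """Hypothetical stand-in for the lookup inside get_reference_dataset_url().

    Returns the value stored under `key`, or raises KeyError if absent.
    """
    for item in custom_metadata:
        if item.get("key") == key:
            return item["value"]
    raise KeyError(f"custom metadata has no {key!r} entry")
```

Raising `KeyError` instead of returning `None` makes a model saved without the metadata fail loudly before any drift computation starts.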

In [5]:
# take the OCID of the model from the Model Catalog UI
MODEL_OCID = "ocid1.datasciencemodel.oc1.eu-frankfurt-1.amaaaaaangencdyasojemavtoshdggls4rg27i2qctcin6xz3yi3yevhnaha"

ref_url = get_reference_dataset_url(MODEL_OCID)

Start loading model.joblib from model directory /tmp/tmpowv3z5ke ...
Model is successfully loaded.
Start loading model.joblib from model directory /tmp/tmpowv3z5ke ...
Model is successfully loaded.


In [6]:
print(f"Reference Dataset url: {ref_url}")

Reference Dataset url: oci://drift_input@frqap2zhtzbe/reference.csv
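The URL follows the `oci://<bucket>@<namespace>/<object name>` form that pandas hands to `ocifs` for Object Storage paths. If you need the individual parts (for example, to check that the bucket exists), a small parser is enough. `parse_oci_url` is a hypothetical helper; object names containing `/` are kept whole.

```python
from typing import NamedTuple


class OciObject(NamedTuple):
    bucket: str
    namespace: str
    name: str


def parse_oci_url(url: str) -> OciObject:
    """Split 'oci://<bucket>@<namespace>/<object name>' into its parts."""
    prefix = "oci://"
    if not url.startswith(prefix):
        raise ValueError(f"not an oci:// URL: {url!r}")
    # Split at the first '/': everything after it is the object name.
    location, sep, name = url[len(prefix):].partition("/")
    bucket, at, namespace = location.partition("@")
    if not (sep and at and bucket and namespace and name):
        raise ValueError(f"expected oci://bucket@namespace/object, got {url!r}")
    return OciObject(bucket, namespace, name)
```

For the URL printed above, `parse_oci_url(ref_url)` yields bucket `drift_input`, namespace `frqap2zhtzbe` and object `reference.csv`.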


In [7]:
# read the dataset
ref_df = pd.read_csv(ref_url)

# have a look
ref_df.head()

| Unnamed: 0 | TravelForWork | MonthlyRate | PercentSalaryHike | CommuteLength | SalaryLevel | YearsOnJob | JobInvolvement | PerformanceRating | Gender | TrainingTimesLastYear | ... | HourlyRate | MonthlyIncome | OverTime | JobSatisfaction | EducationField | JobFunction | EducationalLevel | NumCompaniesWorked | StockOptionLevel | YearsWithCurrManager |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | infrequent | 19146 | 22 | 2 | 5640 | 2 | 2 | 4 | Male | 2 | ... | 33 | 4775 | No | 4 | Life Sciences | Software Developer | L2 | 6 | 2 | 2 |
| 1 | none | 3395 | 23 | 2 | 5678 | 23 | 2 | 4 | Male | 3 | ... | 74 | 10748 | No | 3 | Life Sciences | Software Developer | L1 | 3 | 1 | 4 |
| 2 | infrequent | 4510 | 18 | 15 | 2022 | 5 | 3 | 3 | Female | 2 | ... | 72 | 4963 | Yes | 2 | Life Sciences | Software Developer | L4 | 9 | 3 | 3 |
| 3 | none | 17071 | 16 | 25 | 6782 | 1 | 4 | 3 | Female | 2 | ... | 100 | 13194 | Yes | 2 | Life Sciences | Product Management | L3 | 4 | 0 | 0 |
| 4 | infrequent | 18725 | 23 | 10 | 1980 | 4 | 3 | 4 | Male | 4 | ... | 96 | 2075 | No | 4 | Life Sciences | Software Developer | L4 | 3 | 2 | 3 |
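The `Unnamed: 0` column most likely comes from a CSV saved together with its index (its values 0 to 4 suggest so, but that is an assumption about how `reference.csv` was written). Left in place, it would be treated as a feature during drift analysis. The sketch below reproduces the effect on a tiny in-memory CSV and shows that `index_col=0` restores it as the index.

```python
import io

import pandas as pd

# Tiny stand-in for reference.csv: written with its index, so the first
# header cell is empty and pandas names that column "Unnamed: 0".
csv_text = ",Gender,MonthlyIncome\n0,Male,4775\n1,Male,10748\n"

# Default read: the saved index comes back as an ordinary column.
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 turns that first column back into the DataFrame index.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))  # ['Unnamed: 0', 'Gender', 'MonthlyIncome']
print(list(clean.columns))       # ['Gender', 'MonthlyIncome']
```

In the notebook, the equivalent would be `pd.read_csv(ref_url, index_col=0)`, or `ref_df.drop(columns=["Unnamed: 0"])` after the fact.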
