# OCI Data Science Model Catalog - Fundamentals

- conda environment: generalml_p311_cpu_x86_64_v1
- Author: Assaf Rabinowicz
- Date: 14Jan2026 

# Notebook Description

* This notebook covers key Model Catalog topics, including:
1. Model serialization
2. Model registration
3. Loading the registered model and using it for inference
4. Retrieving model metadata from the catalog (in multiple ways)
5. Updating a registered model’s metadata (in multiple ways)
* The code relies heavily on the ADS SDK
* Model deployment is out of scope for this notebook

# Package imports and resource principal authentication

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import fetch_openml
import pandas as pd

import joblib
import tempfile

import ads
from ads.model import SklearnModel
from ads.model.datascience_model import DataScienceModel
from ads.catalog.model import ModelCatalog

In [None]:
ads.set_auth(auth="resource_principal")
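
Resource principal authentication only works when the code runs inside an OCI resource such as a notebook session. The sketch below shows one possible way to pick an auth mode before calling `ads.set_auth`. It assumes that the `OCI_RESOURCE_PRINCIPAL_VERSION` environment variable is present in resource principal environments; `choose_auth_mode` is a hypothetical helper, not part of ADS.

```python
import os


def choose_auth_mode(env=None):
    """Return an auth mode name that could be passed to ads.set_auth.

    Assumption: OCI_RESOURCE_PRINCIPAL_VERSION is set inside resources
    that support resource principals (e.g. notebook sessions).
    """
    env = os.environ if env is None else env
    if env.get("OCI_RESOURCE_PRINCIPAL_VERSION"):
        return "resource_principal"
    # Otherwise fall back to the local OCI config file (API key auth).
    return "api_key"


auth_mode = choose_auth_mode()
print(auth_mode)
# ads.set_auth(auth=auth_mode)  # uncomment where ADS is installed
```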

# Data import and model training

In [None]:
data = fetch_openml(name="adult", version=2, as_frame=True) # https://www.openml.org/search?type=data&sort=version&status=any&order=asc&exact_name=adult
df = data.frame

In [None]:
df.drop(['fnlwgt'], axis=1, inplace=True)  # dropping the 'sampling weights' column for simplicity
df['class'] = (df['class'] == '>50K').astype(int)

In [None]:
X = df.drop('class', axis=1)
y = df['class']
X = pd.get_dummies(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

In [None]:
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)  # fit() trains rfc in place; rfc is wrapped with ADS in the next section

# Serialization and Registration

* Model registration and deployment require multiple artifacts beyond model weights, including score.py, runtime.yaml, and inference_conda_env.
* Additional metadata can be added for better documentation and governance, such as input_schema.json, output_schema.json, training_conda_env, and custom parameters (e.g., accuracy).
* ADS simplifies the artifact preparation process, reducing manual effort and errors.
* ADS supports multiple ML frameworks, with the strongest native support for TensorFlow, PyTorch, scikit-learn, XGBoost, LightGBM, SparkPipelineModel, AutoMLx, and Transformers.
* For other frameworks, ADS can still simplify parts of the workflow, but some artifacts (e.g., score.py) may need to be created manually and passed to ADS as arguments.
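
To make the role of score.py more concrete, here is a minimal hand-written sketch of its general shape. It assumes the convention of a `load_model` function plus a `predict` function that returns a dictionary with a `prediction` key. The JSON "model" file, its `feature_index` and `threshold` fields, and the file name `model.json` are hypothetical stand-ins so that the sketch runs without any ML library; the file that ADS generates will differ in detail.

```python
import json
import os
import tempfile

MODEL_FILE_NAME = "model.json"  # hypothetical; a real artifact would hold e.g. model.joblib


def load_model(model_dir):
    """Deserialize the model stored in the artifact directory."""
    with open(os.path.join(model_dir, MODEL_FILE_NAME)) as f:
        return json.load(f)


def predict(data, model):
    """Score rows (lists of numbers) and return a JSON-serializable dict."""
    idx, thr = model["feature_index"], model["threshold"]
    return {"prediction": [int(row[idx] > thr) for row in data]}


# Build a throwaway artifact directory and score two rows.
artifact_dir = tempfile.mkdtemp()
with open(os.path.join(artifact_dir, MODEL_FILE_NAME), "w") as f:
    json.dump({"feature_index": 0, "threshold": 40}, f)

result = predict([[25, 1], [52, 0]], load_model(artifact_dir))
print(result)
```

Keeping loading and scoring in separate functions lets a serving process load the model once and reuse it across requests.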

In [None]:
random_forest_model = SklearnModel(estimator=rfc, artifact_dir='random-forest-model/')

* The prepare method creates all the artifacts required for a valid model registration. This is the first place where ADS simplifies the workflow.

In [None]:
random_forest_model.prepare(
    inference_conda_env="generalml_p311_cpu_x86_64_v1",
    training_conda_env="generalml_p311_cpu_x86_64_v1",
    X_sample=X_test,
    y_sample=y_test,
)

Now that we have a local folder with the prepared artifacts, we use ADS to simplify the following steps:
* Operational validation of the model (not accuracy-related)
* Model registration — pushing the local folder contents to the Model Catalog
* Optionally, deploying the model with ADS to the Model Deployment Catalog
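
As an illustration of what "operational validation" means in the first step above, the sketch below checks only that a scoring function can be called and returns a well-formed response. `smoke_check` and `always_zero` are hypothetical names, and this is not how ADS implements its own verification.

```python
import json


def smoke_check(predict_fn, sample_rows):
    """Run a predict function on sample rows and check the response shape.

    Hypothetical helper: it checks that the call succeeds, that the result
    can be serialized to JSON, and that there is one prediction per row.
    It says nothing about accuracy.
    """
    response = predict_fn(sample_rows)
    json.dumps(response)  # raises TypeError if not JSON-serializable
    predictions = response["prediction"]
    if len(predictions) != len(sample_rows):
        raise ValueError("expected one prediction per input row")
    return True


def always_zero(rows):
    # Stand-in for a real scoring function.
    return {"prediction": [0 for _ in rows]}


ok = smoke_check(always_zero, [[1, 2], [3, 4], [5, 6]])
print(ok)
```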

In [None]:
random_forest_model.summary_status()

In [None]:
random_forest_model.verify(X_test.iloc[:20], auto_serialize_data=True)

In [None]:
random_forest_model.schema_input = None  # Saving the schema is optional. Here the schema is large, so it is not saved.
random_forest_model.schema_output = None
model_id = random_forest_model.save(display_name="Adults Income - random forest model")

In [None]:
random_forest_model.summary_status()

* Deployment is also simplified via ADS
* Deployment is out of scope for this tutorial

In [None]:
#random_forest_model.deploy(display_name="Adults Income - random forest model")

# Inference

After saving a model in the Model Catalog, users can reuse it in two ways:
1. Load the model from the Model Catalog to a local folder and run predictions locally.
2. Deploy the model to a serving environment and invoke it via an endpoint (recommended for production use).

This notebook demonstrates the first option.

## Fetching the model and scoring

In [None]:
downloaded_model = SklearnModel.from_model_catalog(model_id, artifact_dir='downloaded-random_forest/', ignore_conda_error=True)

In [None]:
downloaded_model.predict(X_train.iloc[0].values.reshape(1, -1), local=True) # using ADS

In [None]:
downloaded_model_artifact = joblib.load('downloaded-random_forest/model.joblib')  # Using the raw model directly
downloaded_model_artifact.predict(X_train.iloc[0].values.reshape(1, -1))

# Viewing model metadata

* Two modules enable fetching metadata from the Model Catalog:
1. ads.model.datascience_model.DataScienceModel
2. ads.catalog.model.ModelCatalog
* With both modules, results can be filtered by relevant metadata, such as description and tags.
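
As a rough illustration of metadata-based filtering, the sketch below filters an already-fetched listing on the client side. It assumes each listed entry exposes `display_name`, `description`, and `freeform_tags` attributes; the `SimpleNamespace` records and the `filter_models` helper are hypothetical stand-ins, not ADS objects or functions.

```python
from types import SimpleNamespace


def filter_models(records, description_contains=None, **tags):
    """Keep records whose description and tags match all given criteria."""
    matches = []
    for r in records:
        description = (r.description or "").lower()
        if description_contains and description_contains.lower() not in description:
            continue
        record_tags = r.freeform_tags or {}
        if any(record_tags.get(k) != v for k, v in tags.items()):
            continue
        matches.append(r)
    return matches


# Hypothetical records standing in for a catalog listing.
records = [
    SimpleNamespace(display_name="rf-v1", description="income prediction",
                    freeform_tags={"status": "draft"}),
    SimpleNamespace(display_name="rf-v2", description="income prediction",
                    freeform_tags={"status": "post-review"}),
    SimpleNamespace(display_name="churn", description="churn prediction",
                    freeform_tags={"status": "post-review"}),
]

selected = filter_models(records, description_contains="income", status="post-review")
print([r.display_name for r in selected])
```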

## Using ads.model.datascience_model.DataScienceModel

In [None]:
ds_model = (
    DataScienceModel()
    .with_display_name("Adults Income - random forest model")
    # Optional builder calls:
    # .with_compartment_id()
    # .with_project_id()
    # .with_description()
    # .with_freeform_tags(tag1="", tag2="")
    # .with_artifact("/path/to/the/model/artifacts/")
)
print(ds_model)

In [None]:
models = DataScienceModel.list(display_name="Adults Income - random forest model")
for m in models:
    print(f"Display name: {m.display_name}, OCID: {m.id}")

In [None]:
model_info = models[0]
print(model_info)

## Similar code with ads.catalog.model.ModelCatalog

In [None]:
catalog = ModelCatalog()
models = catalog.list_models(display_name="Adults Income - random forest model")
for m in models:
    print(f"Display name: {m.display_name}, OCID: {m.id}")

In [None]:
model_info = catalog.get_model(models[0].id)
print(model_info)

# Updating model metadata

* There are two ways to update metadata parameters:
1. Update the catalog metadata directly using DataScienceModel or ModelCatalog APIs
2. Load the model locally using SklearnModel.from_model_catalog, modify the metadata, and push the updates back to the catalog

## Direct metadata update

In [None]:
(DataScienceModel.from_id(model_id)
 .with_description("predicting high adult income (above 50K)")
 .with_freeform_tags(status="post-review")
 .update())

In [None]:
catalog = ModelCatalog()
catalog.update_model(
    model_id=model_id,
    description="predicting high adult income (above 50K)",
    freeform_tags={"project": "IncomePrediction"}
)
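
The two direct updates above set different tag dictionaries. Assuming an update replaces the whole freeform-tag dictionary rather than merging into it (an assumption worth checking against your ADS version), the second call would drop the `status` tag set by the first. One way to avoid that is to merge tags on the client side before updating; `merge_tags` is a hypothetical helper.

```python
def merge_tags(existing, updates):
    """Return a new tag dict with updates applied on top of existing tags."""
    merged = dict(existing or {})
    merged.update(updates)
    return merged


current = {"status": "post-review"}  # tags already stored on the model
new_tags = merge_tags(current, {"project": "IncomePrediction"})
print(new_tags)
```

The merged dictionary can then be passed as `freeform_tags` to whichever update call is used.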

## Loading the model and then updating

In [None]:
model = SklearnModel.from_model_catalog(
    model_id=model_id,
    artifact_dir=tempfile.mkdtemp(),
    ignore_conda_error=True,
    force_overwrite=True
)

model.update(
    description="predicting high adult income (>50k)",
    freeform_tags={"project": "IncomePrediction"}
)