# OCI Data Science Model Catalog - Fundamentals

- conda environment: generalml_p311_cpu_x86_64_v1
- Author: Assaf Rabinowicz
- Date: 14Jan2026 

# Notebook Description

* This notebook covers key Model Catalog topics, including:
1. Model serialization
2. Model registration
3. Loading the registered model and using it for inference
4. Retrieving model metadata from the catalog (in multiple ways)
5. Updating a registered model’s metadata (in multiple ways)
* The code heavily relies on the ADS SDK
* Model deployment is out of scope for this notebook

# Package imports and resource principal authentication

In [1]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import fetch_openml
import pandas as pd

import joblib
import tempfile

import ads
from ads.model import SklearnModel
from ads.model.datascience_model import DataScienceModel
from ads.catalog.model import ModelCatalog

In [2]:
ads.set_auth(auth="resource_principal")

# Data import and model training

In [8]:
data = fetch_openml(name="adult", version=2, as_frame=True) # https://www.openml.org/search?type=data&sort=version&status=any&order=asc&exact_name=adult
df = data.frame

In [9]:
df.drop(['fnlwgt'], axis=1, inplace=True)  # dropping 'sampling weights' column for simplification
df['class'] = (df['class'] == '>50K').astype(int)

In [20]:
X = df.drop('class', axis=1)
y = df['class']
X = pd.get_dummies(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

In [21]:
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)  # fit() trains rfc in place; rfc is wrapped by SklearnModel below

# Serialization and Registration

* Model registration and deployment require artifacts beyond the model weights, most importantly score.py (the inference entry point) and runtime.yaml (which records the inference conda environment).
* Additional metadata can be added for better documentation and governance, such as input_schema.json, output_schema.json, training_conda_env, and custom parameters (e.g., accuracy).
* ADS simplifies the artifact preparation process, reducing manual effort and errors.
* ADS supports multiple ML frameworks, with the strongest native support for TensorFlow, PyTorch, scikit-learn, XGBoost, LightGBM, SparkPipelineModel, AutoMLx, and Transformers.
* For other frameworks, ADS can still simplify parts of the workflow, but some artifacts (e.g., score.py) may need to be created manually and passed to ADS as arguments.
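Whichever framework is used, two files are mandatory. The introspection results shown later in this notebook include a "Mandatory Files Check" requiring `score.py` and `runtime.yaml` to sit at the top level of the artifact directory. A minimal local sketch of that rule follows; the helper `missing_mandatory_files` is hypothetical and not part of ADS.

```python
import tempfile
from pathlib import Path

# Files that must sit at the top level of the artifact directory
# (mirrors the "Mandatory Files Check" in the catalog's introspection results).
MANDATORY_FILES = ("score.py", "runtime.yaml")


def missing_mandatory_files(artifact_dir: str) -> list:
    """Return the mandatory files absent from the top level of artifact_dir.

    Hypothetical helper for illustration only; it is not an ADS API.
    Files in subdirectories do not count, matching the "top level" rule.
    """
    root = Path(artifact_dir)
    return [name for name in MANDATORY_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "score.py").write_text("def load_model(): ...\n")
        # runtime.yaml has not been written yet, so it is reported as missing.
        print(missing_mandatory_files(tmp))  # ['runtime.yaml']
```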

In [22]:
random_forest_model = SklearnModel(estimator=rfc, artifact_dir='random-forest-model/')



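* The `prepare()` method generates every artifact a valid registration needs (score.py, runtime.yaml, the serialized model.joblib, and the input/output schemas, as listed in the output below). This is the first place where ADS's simplification comes into play.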
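Of the generated files, `runtime.yaml` carries the environment information that the catalog checks. The sketch below re-implements three of the runtime checks listed in the introspection results later in this notebook (artifact version 3.0, inference Python 3.6 or higher, a non-empty inference environment path). It is a simplified illustration under stated assumptions: the dotted key names are copied from those check descriptions and stored in a flat dict, whereas the real file is nested YAML, and `check_runtime` is a hypothetical helper.

```python
# Flattened view of the runtime.yaml fields that the introspection test inspects.
# Assumption: dotted keys follow the check descriptions; the real file nests them in YAML.
runtime = {
    "MODEL_ARTIFACT_VERSION": "3.0",
    "MODEL_DEPLOYMENT.INFERENCE_PYTHON_VERSION": "3.11",
    "MODEL_DEPLOYMENT.INFERENCE_ENV_PATH": (
        "oci://service-conda-packs@id19sfcrra6z/service_pack/cpu/"
        "General_Machine_Learning_for_CPUs_on_Python_3.11/1.0/generalml_p311_cpu_x86_64_v1"
    ),
}


def check_runtime(cfg: dict) -> dict:
    """Return {check_name: passed} for a few runtime.yaml rules (hypothetical helper)."""
    py_version = str(cfg.get("MODEL_DEPLOYMENT.INFERENCE_PYTHON_VERSION", "0"))
    # Compare versions numerically: as strings, "3.11" would sort before "3.6".
    major_minor = tuple(int(part) for part in py_version.split(".")[:2])
    return {
        "runtime_version": str(cfg.get("MODEL_ARTIFACT_VERSION")) == "3.0",
        "runtime_env_python": major_minor >= (3, 6),
        "runtime_env_path": bool(cfg.get("MODEL_DEPLOYMENT.INFERENCE_ENV_PATH")),
    }


print(check_runtime(runtime))
```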

In [23]:
random_forest_model.prepare(inference_conda_env="generalml_p311_cpu_x86_64_v1",
                            training_conda_env="generalml_p311_cpu_x86_64_v1",
                            X_sample=X_test,
                            y_sample=y_test)




algorithm: RandomForestClassifier
artifact_dir:
  /home/datascience/code/model catalog/random-forest-model:
  - - .model-ignore
    - score.py
    - model.joblib
    - output_schema.json
    - runtime.yaml
    - input_schema.json
framework: scikit-learn
model_deployment_id: null
model_id: null

Now that we have a local folder with the prepared artifacts, we use ADS to simplify the following steps:
* Operational validation of the model (checking that it loads and predicts, not its accuracy)
* Model registration: pushing the local folder contents to the Model Catalog
* Optionally, deploying the model with ADS as an OCI Model Deployment
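Operational validation hinges on `score.py` honoring a fixed contract, which the introspection results later in this notebook spell out: `load_model()` and `predict()` must be defined, the only required argument of `predict()` must be named `data`, and every other argument needs a default. A minimal `ast`-based sketch of those checks follows; `check_score_py` and the toy `SCORE_PY` source are illustrative assumptions, not the file ADS generates.

```python
import ast

# A toy score.py used only to exercise the checks; the file ADS generates is longer.
SCORE_PY = '''
import joblib

def load_model(model_file_name="model.joblib"):
    return joblib.load(model_file_name)

def predict(data, model=None):
    if model is None:
        model = load_model()
    return {"prediction": model.predict(data).tolist()}
'''


def check_score_py(source: str) -> dict:
    """Apply the score.py contract checks to source code without executing it (hypothetical helper)."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return {"score_syntax": False}
    functions = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    results = {
        "score_syntax": True,
        "score_load_model": "load_model" in functions,
        "score_predict": "predict" in functions,
    }
    predict = functions.get("predict")
    if predict is not None:
        args = predict.args.posonlyargs + predict.args.args
        # Defaults apply to the trailing positional parameters, so required ones lead.
        required = args[: len(args) - len(predict.args.defaults)]
        required_kwonly = [
            a for a, d in zip(predict.args.kwonlyargs, predict.args.kw_defaults) if d is None
        ]
        results["score_predict_data"] = [a.arg for a in required] == ["data"]
        results["score_predict_arg"] = len(required) == 1 and not required_kwonly
    return results


print(check_score_py(SCORE_PY))
```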

In [24]:
random_forest_model.summary_status()

| Step | Status | Details | Actions Needed |
|---|---|---|---|
| initiate | Done | Initiated the model | |
| prepare() | Done | Generated runtime.yaml | |
| prepare() | Done | Generated score.py | |
| prepare() | Done | Serialized model | |
| prepare() | Done | Populated metadata(Custom, Taxonomy and Provenance) | |
| verify() | Available | Local tested .predict from score.py | |
| save() | Available | Conducted Introspect Test | |
| save() | Available | Uploaded artifact to model catalog | |
| deploy() | UNKNOWN | Deployed the model | |
| predict() | Not Available | Called deployment predict endpoint | |


In [25]:
random_forest_model.verify(X_test.iloc[:20], auto_serialize_data=True)

Start loading model.joblib from model directory /home/datascience/code/model catalog/random-forest-model ...
Model is successfully loaded.



{'prediction': [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1]}
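With `auto_serialize_data=True`, the input is serialized before it reaches `predict()` in `score.py` (the warning residue above shows `score.py` deserializing it with `pd.read_json`), and the result comes back as a JSON-style dict. The stdlib sketch below mimics that round trip. Assumptions: `ThresholdModel` is a hypothetical stand-in for the random forest, the single feature `age` and the JSON records layout are chosen for illustration, and this is not the code ADS generates.

```python
import json


class ThresholdModel:
    """Hypothetical stand-in for the trained classifier: predicts 1 when age > 40."""

    def predict(self, rows):
        return [int(row["age"] > 40) for row in rows]


def predict(data, model=ThresholdModel()):
    """score.py-style entry point: deserialize JSON input, predict, return a JSON-friendly dict."""
    rows = json.loads(data)
    return {"prediction": model.predict(rows)}


# Client side: serialize a small batch as JSON records before calling predict().
batch = [{"age": 25}, {"age": 52}, {"age": 38}]
payload = json.dumps(batch)
print(predict(payload))  # {'prediction': [0, 1, 0]}
```

Keeping the wire format as JSON is what lets the same `predict()` serve both the local `verify()` call and a deployed endpoint.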

In [29]:
random_forest_model.schema_input = None  # saving the schema is optional; here it is large (one entry per one-hot-encoded column), so it is not saved
random_forest_model.schema_output = None
model_id = random_forest_model.save(display_name="Adults Income - random forest model")

Start loading model.joblib from model directory /home/datascience/code/model catalog/random-forest-model ...
Model is successfully loaded.
['.model-ignore', 'score.py', 'model.joblib', 'test_json_output.json', 'output_schema.json', 'runtime.yaml', 'input_schema.json']



In [87]:
random_forest_model.summary_status()

| Step | Status | Details | Actions Needed |
|---|---|---|---|
| initiate | Done | Initiated the model | |
| prepare() | Done | Generated runtime.yaml | |
| prepare() | Done | Generated score.py | |
| prepare() | Done | Serialized model | |
| prepare() | Done | Populated metadata(Custom, Taxonomy and Provenance) | |
| verify() | Done | Local tested .predict from score.py | |
| save() | Done | Conducted Introspect Test | |
| save() | Done | Uploaded artifact to model catalog | |
| deploy() | UNKNOWN | Deployed the model | |
| predict() | Not Available | Called deployment predict endpoint | |


* Deployment can also be done through ADS (the `deploy()` call is left commented out below)
* Deployment is out of scope for this notebook

In [None]:
#random_forest_model.deploy(display_name="Adults Income - random forest model")

# Inference

After saving a model in the Model Catalog, users can reuse it in two ways:
1. Load the model from the Model Catalog to a local folder and run predictions locally.
2. Deploy the model to a serving environment and invoke it via an endpoint (recommended for production use).

This notebook demonstrates the first option.

## Fetching the model and scoring

In [31]:
downloaded_model = SklearnModel.from_model_catalog(model_id, artifact_dir='downloaded-random_forest/', ignore_conda_error=True)




In [58]:
downloaded_model.predict(X_train.iloc[0].values.reshape(1, -1), local=True) # using ADS

Start loading model.joblib from model directory /home/datascience/code/model catalog/downloaded-random_forest ...
Model is successfully loaded.



{'prediction': [0]}

In [62]:
downloaded_model_artifact = joblib.load('downloaded-random_forest/model.joblib')  # using the raw scikit-learn model directly
downloaded_model_artifact.predict(X_train.iloc[[0]])  # a one-row DataFrame keeps the feature names used in training




array([0])

# Viewing model metadata

* Two modules enable fetching metadata from the Model Catalog:
1. ads.model.datascience_model.DataScienceModel
2. ads.catalog.model.ModelCatalog
* Both let you filter the results by relevant metadata, such as the display name (used below), description, and tags.
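Once a list of models has been returned, narrowing it further by description or free-form tags is plain Python. A stdlib sketch over hypothetical summary records follows; the dict layout, the second model and its tags are invented for illustration and do not reproduce the exact ADS return types.

```python
# Hypothetical model summaries, loosely shaped like catalog list results.
summaries = [
    {"display_name": "Adults Income - random forest model",
     "description": "predicting high adult income (>50k)",
     "freeform_tags": {"project": "IncomePrediction"}},
    {"display_name": "Churn - baseline",
     "description": "customer churn baseline",
     "freeform_tags": {"project": "Churn", "status": "post-review"}},
]


def filter_models(records, text=None, **tags):
    """Keep records whose description contains `text` (case-insensitive)
    and whose free-form tags include every key/value pair in `tags`."""
    selected = []
    for rec in records:
        description = (rec.get("description") or "").lower()
        if text is not None and text.lower() not in description:
            continue
        rec_tags = rec.get("freeform_tags") or {}
        if all(rec_tags.get(k) == v for k, v in tags.items()):
            selected.append(rec)
    return selected


matches = filter_models(summaries, text="income", project="IncomePrediction")
print([m["display_name"] for m in matches])  # ['Adults Income - random forest model']
```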

## Using ads.model.datascience_model.DataScienceModel

In [51]:
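# Builds a local model specification only; nothing is fetched from the catalog here.
# Compartment and project default to the notebook session's values, as the output shows.
ds_model = (
    DataScienceModel()
    .with_display_name("Adults Income - random forest model")
    # .with_compartment_id("<compartment_ocid>")
    # .with_project_id("<project_ocid>")
    # .with_description("<description>")
    # .with_freeform_tags(tag1="", tag2="")
    # .with_artifact("/path/to/the/model/artifacts/")
)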
print(ds_model)


kind: datascienceModel
spec:
  compartmentId: ocid1.compartment.oc1..aaaaaaaaenvaxcmsbmrio4gieevntz7ryuji6quq65rnbwjqtweahitw4dza
  displayName: Adults Income - random forest model
  projectId: ocid1.datascienceproject.oc1.eu-frankfurt-1.amaaaaaaeicj2tia3noqgbegva53whrsznt2oy7txmxjcm4lggskw7n7i2sq
type: dataScienceModel



In [52]:
models = DataScienceModel.list(display_name="Adults Income - random forest model")
for m in models:
    print(f"Display name: {m.display_name}, OCID: {m.id}")

Display name: Adults Income - random forest model, OCID: ocid1.datasciencemodel.oc1.eu-frankfurt-1.amaaaaaaeicj2tiakvfv3ews3o6hpwfckkjtohrleljqaciphkuxa3tbc6qq


In [69]:
model_info=models[0]
print(model_info)


kind: datascienceModel
spec:
  artifact: ocid1.datasciencemodel.oc1.eu-frankfurt-1.amaaaaaaeicj2tiakvfv3ews3o6hpwfckkjtohrleljqaciphkuxa3tbc6qq.zip
  compartmentId: ocid1.compartment.oc1..aaaaaaaaenvaxcmsbmrio4gieevntz7ryuji6quq65rnbwjqtweahitw4dza
  customMetadataList:
    data:
    - category: Training Environment
      description: The URI of the training conda environment.
      has_artifact: false
      key: CondaEnvironmentPath
      value: oci://service-conda-packs@id19sfcrra6z/service_pack/cpu/General_Machine_Learning_for_CPUs_on_Python_3.11/1.0/generalml_p311_cpu_x86_64_v1
    - category: Training Environment
      description: The conda environment where the model was trained.
      has_artifact: false
      key: CondaEnvironment
      value: oci://service-conda-packs@id19sfcrra6z/service_pack/cpu/General_Machine_Learning_for_CPUs_on_Python_3.11/1.0/generalml_p311_cpu_x86_64_v1
    - category: Training Environment
      description: The slug name of the training conda enviro
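Each entry in `customMetadataList` is a key/value record tagged with a category. The sketch below works on a plain-dict copy of a few of those records (values taken from this notebook's outputs) and shows a lookup by key and a grouping by category. The helper names are hypothetical, and the code operates on plain dicts rather than on ADS metadata objects.

```python
from collections import defaultdict

# Plain-dict copy of a few custom metadata records shown in this notebook's outputs.
custom_metadata = [
    {"key": "SlugName", "value": "generalml_p311_cpu_x86_64_v1", "category": "Training Environment"},
    {"key": "ModelSerializationFormat", "value": "joblib", "category": "Training Profile"},
    {"key": "ModelFileName", "value": "model.joblib", "category": "Other"},
    {"key": "ClientLibrary", "value": "ADS", "category": "Other"},
]


def metadata_value(records, key, default=None):
    """Return the value stored under `key`, or `default` if no record has that key."""
    return next((r["value"] for r in records if r["key"] == key), default)


def keys_by_category(records):
    """Group metadata keys by their category, preserving record order."""
    grouped = defaultdict(list)
    for r in records:
        grouped[r["category"]].append(r["key"])
    return dict(grouped)


print(metadata_value(custom_metadata, "SlugName"))  # generalml_p311_cpu_x86_64_v1
print(keys_by_category(custom_metadata))
```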

## Similar code with ads.catalog.model.ModelCatalog

In [10]:
catalog = ModelCatalog()
models = catalog.list_models(display_name="Adults Income - random forest model")
for m in models:
    print(f"Display name: {m.display_name}, OCID: {m.id}")


Display name: Adults Income - random forest model, OCID: ocid1.datasciencemodel.oc1.eu-frankfurt-1.amaaaaaaeicj2tiakvfv3ews3o6hpwfckkjtohrleljqaciphkuxa3tbc6qq


In [11]:
model_info = catalog.get_model(models[0].id)
print(model_info)

| Field | Value |
|---|---|
| display_name | Adults Income - random forest model |
| description | predicting high adult income (higher than >50k) |
| freeform_tags | `{'project': 'IncomePredic` |

# Updating model metadata

* There are two ways to update metadata parameters:
1. Update the catalog metadata directly using DataScienceModel or ModelCatalog APIs
2. Load the model locally using SklearnModel.from_model_catalog, modify the metadata, and push the updates back to the catalog

## Direct metadata update

In [12]:
(DataScienceModel.from_id(model_id)
 .with_description("predicting high adult income (higher than >50k)")
 .with_freeform_tags(status="post-review")
 .update())


kind: datascienceModel
spec:
  artifact: ocid1.datasciencemodel.oc1.eu-frankfurt-1.amaaaaaaeicj2tiakvfv3ews3o6hpwfckkjtohrleljqaciphkuxa3tbc6qq.zip
  compartmentId: ocid1.compartment.oc1..aaaaaaaaenvaxcmsbmrio4gieevntz7ryuji6quq65rnbwjqtweahitw4dza
  customMetadataList:
    data:
    - category: Training Profile
      description: The model serialization format.
      has_artifact: false
      key: ModelSerializationFormat
      value: joblib
    - category: Training Environment
      description: The conda environment where the model was trained.
      has_artifact: false
      key: CondaEnvironment
      value: oci://service-conda-packs@id19sfcrra6z/service_pack/cpu/General_Machine_Learning_for_CPUs_on_Python_3.11/1.0/generalml_p311_cpu_x86_64_v1
    - category: Training Environment
      description: The slug name of the training conda environment.
      has_artifact: false
      key: SlugName
      value: generalml_p311_cpu_x86_64_v1
    - category: Training Environment
      desc

In [5]:
catalog = ModelCatalog()
catalog.update_model(
    model_id=model_id,
    description="predicting high adult income (higher than >50k)",
    freeform_tags={"project": "IncomePrediction"}
)




| Field | Value |
|---|---|
| display_name | Adults Income - random forest model |
| description | predicting high adult income (higher than >50k) |
| freeform_tags | `{'project': 'IncomePrediction'}` |
| defined_tags | `{'Default_Tags': {'CostTrackingCompartment': 'Specialists', 'CreatedBy': 'ocid1.datasciencenotebooksession.oc1.eu-frankfurt-1.amaaaaaaeicj2tia5kesm5xrcumc5fpc7kflmawra64gborapmu2w2dnxfgq', 'AutoStop': 'Yes'}}` |
| schema_input | `{'schema': [], 'version': '1.1'}` |
| schema_output | `{'schema': [], 'version': '1.1'}` |
| metadata_custom | `{'data': [{'key': 'ModelSerializationFormat', 'value': 'joblib', 'description': 'The model serialization format.', 'category': 'Training Profile', 'has_artifact': False}, {'key': 'CondaEnvironment', 'value': 'oci://service-conda-packs@id19sfcrra6z/service_pack/cpu/General_Machine_Learning_for_CPUs_on_Python_3.11/1.0/generalml_p311_cpu_x86_64_v1', 'description': 'The conda environment where the model was trained.', 'category': 'Training Environment', 'has_artifact': False}, {'key': 'SlugName', 'value': 'generalml_p311_cpu_x86_64_v1', 'description': 'The slug name of the training conda environment.', 'category': 'Training Environment', 'has_artifact': False}, {'key': 'ModelArtifacts', 'value': '.model-ignore, score.py, model.joblib, runtime.yaml', 'description': 'The list of files located in artifacts folder.', 'category': 'Training Environment', 'has_artifact': False}, {'key': 'ClientLibrary', 'value': 'ADS', 'description': None, 'category': 'Other', 'has_artifact': False}, {'key': 'EnvironmentType', 'value': 'data_science', 'description': 'The conda environment type, can be published or datascience.', 'category': 'Training Environment', 'has_artifact': False}, {'key': 'CondaEnvironmentPath', 'value': 'oci://service-conda-packs@id19sfcrra6z/service_pack/cpu/General_Machine_Learning_for_CPUs_on_Python_3.11/1.0/generalml_p311_cpu_x86_64_v1', 'description': 'The URI of the training conda environment.', 'category': 'Training Environment', 'has_artifact': False}, {'key': 'ModelFileName', 'value': 'model.joblib', 'description': 'The model file name.', 'category': 'Other', 'has_artifact': False}]}` |
| metadata_taxonomy | `{'data': [{'key': 'Framework', 'value': 'scikit-learn', 'has_artifact': False}, {'key': 'FrameworkVersion', 'value': '1.5.2', 'has_artifact': False}, {'key': 'Hyperparameters', 'value': {'bootstrap': 'True', 'ccp_alpha': '0.0', 'class_weight': 'None', 'criterion': 'gini', 'max_depth': 'None', 'max_features': 'sqrt', 'max_leaf_nodes': 'None', 'max_samples': 'None', 'min_impurity_decrease': '0.0', 'min_samples_leaf': '1', 'min_samples_split': '2', 'min_weight_fraction_leaf': '0.0', 'monotonic_cst': 'None', 'n_estimators': '100', 'n_jobs': 'None', 'oob_score': 'False', 'random_state': 'None', 'verbose': '0', 'warm_start': 'False'}, 'has_artifact': False}, {'key': 'Algorithm', 'value': 'RandomForestClassifier', 'has_artifact': False}, {'key': 'ArtifactTestResults', 'value': {'score_py': {'key': 'score_py', 'category': 'Mandatory Files Check', 'description': 'Check that the file "score.py" exists and is in the top level directory of the artifact directory', 'error_msg': "The file 'score.py' is missing.", 'success': True}, 'runtime_yaml': {'category': 'Mandatory Files Check', 'description': 'Check that the file "runtime.yaml" exists and is in the top level directory of the artifact directory', 'error_msg': "The file 'runtime.yaml' is missing.", 'success': True}, 'score_syntax': {'category': 'score.py', 'description': 'Check for Python syntax errors', 'error_msg': 'There is Syntax error in score.py: ', 'success': True}, 'score_load_model': {'category': 'score.py', 'description': 'Check that load_model() is defined', 'error_msg': 'Function load_model is not present in score.py.', 'success': True}, 'score_predict': {'category': 'score.py', 'description': 'Check that predict() is defined', 'error_msg': 'Function predict is not present in score.py.', 'success': True}, 'score_predict_data': {'category': 'score.py', 'description': 'Check that the only required argument for predict() is named "data"', 'error_msg': "The predict function in score.py must have a formal argument named 'data'.", 'success': True}, 'score_predict_arg': {'category': 'score.py', 'description': 'Check that all other arguments in predict() are optional and have default values', 'error_msg': "All formal arguments in the predict function must have default values, except that 'data' argument.", 'success': True}, 'runtime_version': {'category': 'runtime.yaml', 'description': 'Check that field MODEL_ARTIFACT_VERSION is set to 3.0', 'error_msg': 'In runtime.yaml, the key MODEL_ARTIFACT_VERSION must be set to 3.0.', 'success': True}, 'runtime_env_python': {'category': 'conda_env', 'description': 'Check that field MODEL_DEPLOYMENT.INFERENCE_PYTHON_VERSION is set to a value of 3.6 or higher', 'error_msg': 'In runtime.yaml, the key MODEL_DEPLOYMENT.INFERENCE_PYTHON_VERSION must be set to a value of 3.6 or higher.', 'success': True, 'value': '3.11'}, 'runtime_env_path': {'category': 'conda_env', 'description': 'Check that field MODEL_DEPLOYMENT.INFERENCE_ENV_PATH is set', 'error_msg': 'In runtime.yaml, the key MODEL_DEPLOYMENT.INFERENCE_ENV_PATH must have a value.', 'success': True, 'value': 'oci://service-conda-packs@id19sfcrra6z/service_pack/cpu/General_Machine_Learning_for_CPUs_on_Python_3.11/1.0/generalml_p311_cpu_x86_64_v1'}, 'runtime_path_exist': {'category': 'conda_env', 'description': 'Check that the file path in MODEL_DEPLOYMENT.INFERENCE_ENV_PATH is correct.', 'error_msg': 'In runtime.yaml, the key MODEL_DEPLOYMENT.INFERENCE_ENV_PATH does not exist.', 'success': True}}, 'has_artifact': False}, {'key': 'UseCaseType', 'value': None, 'has_artifact': False}]}` |
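When updating free-form tags, it is safest not to assume the update merges new tags into existing ones: OCI update requests typically carry the full tag map, so passing only the new tags may drop the others (worth confirming in your own tenancy). A defensive pattern is to merge on the client before calling the update. A minimal sketch; `merge_tags` is a hypothetical helper, and the dicts stand in for tags read from the catalog and tags to add.

```python
from typing import Optional


def merge_tags(existing: Optional[dict], updates: dict) -> dict:
    """Return a new tag dict with `updates` layered over `existing` (hypothetical helper).

    Neither input is mutated; keys in `updates` win on conflict.
    """
    return {**(existing or {}), **updates}


# Stand-ins for tags read from the catalog and tags we want to add.
current_tags = {"project": "IncomePrediction"}
new_tags = {"status": "post-review"}

merged = merge_tags(current_tags, new_tags)
print(merged)  # {'project': 'IncomePrediction', 'status': 'post-review'}
# The merged dict is what would then be passed as freeform_tags to the update call.
```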


## Loading the model and then updating

In [86]:
model = SklearnModel.from_model_catalog(
    model_id=model_id,
    artifact_dir=tempfile.mkdtemp(),
    ignore_conda_error=True,
    force_overwrite=True
)

model.update(
    description="predicting high adult income (>50k)",
    freeform_tags={"project": "IncomePrediction"}
)




algorithm: NoneType
artifact_dir:
  /tmp/tmpnuwjxnv4:
  - - output_schema.json
    - score.py
    - test_json_output.json
    - runtime.yaml
    - model.joblib
    - .model-ignore
    - input_schema.json
framework: scikit-learn
model_deployment_id: null
model_id: ocid1.datasciencemodel.oc1.eu-frankfurt-1.amaaaaaaeicj2tiakvfv3ews3o6hpwfckkjtohrleljqaciphkuxa3tbc6qq