## Export, Save and Register the Model

# 🫡 Save the Model

We need to save this model so that it can be used from other locations, such as other notebooks or the model server. To do that, we upload it to S3-compatible storage.

>NOTE: Do not run all the cells in one shot without first changing the cluster-specific variables.

In [None]:
!pip -q install boto3 botocore model-registry=="0.2.9"




‼️⚠️ ¡IMPORTANT! ⚠️‼️

Add your user name below. It is needed both for the Model Registry entry and for the path the model is saved under.

In [None]:
## Change this before continuing
user = "USER_REPLACE_ME"

# Later cells (S3 upload and model registration) read this name as `model_path_s3`
model_path_s3 = f"{user}-yolo-rps"
print(model_path_s3)
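
Because a leftover placeholder would end up in both the S3 prefix and the registry entry, it can help to fail early. The following is a minimal sketch, not part of the original notebook: the helper name `require_replaced` and the value `"user1"` are made up for illustration.

```python
def require_replaced(value, placeholder="USER_REPLACE_ME"):
    """Return the stripped value, or raise if it is empty or still the placeholder."""
    cleaned = value.strip()
    if not cleaned or cleaned == placeholder:
        raise ValueError(
            f"Set a real value instead of {placeholder!r} before continuing."
        )
    return cleaned


# "user1" is a hypothetical user name used only for this example
checked_user = require_replaced("user1")
example_model_path = f"{checked_user}-yolo-rps"
print(example_model_path)
```

Calling `require_replaced(user)` right after setting `user` would stop the notebook before anything is uploaded under the wrong name.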

In [None]:
import os

HOME = os.getcwd()
print(f"Current Working Directory: {HOME}")
destination_dir = os.path.join(HOME, "models")

# Define models directory
models_dir = os.path.join(destination_dir, "yolo", "1")

# Define full model paths
YOLO_Rock_Paper = os.path.join(models_dir, "yolo-rps.pt")
YOLO_Rock_Paper_ONNX = os.path.join(models_dir, "yolo-rps.onnx")

# Print paths for verification
print(f"YOLO model path: {YOLO_Rock_Paper}")
print(f"YOLO ONNX model path: {YOLO_Rock_Paper_ONNX}")

In [None]:
import os
import boto3
import botocore

# Fetch AWS credentials from environment variables
aws_access_key_id = os.environ.get('AWS_ACCESS_KEY_ID')
aws_secret_access_key = os.environ.get('AWS_SECRET_ACCESS_KEY')
endpoint_url = os.environ.get('AWS_S3_ENDPOINT')
region_name = os.environ.get('AWS_DEFAULT_REGION')
bucket_name = os.environ.get('AWS_S3_BUCKET')

# Validate that all required AWS variables are set
if not all([aws_access_key_id, aws_secret_access_key, endpoint_url, region_name, bucket_name]):
    raise ValueError("One or more AWS connection variables are missing. "
                     "Please check your S3 bucket connection.")

# Create an S3 session
session = boto3.session.Session(
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key
)

# Initialize S3 resource
s3_resource = session.resource(
    's3',
    config=botocore.client.Config(signature_version='s3v4'),
    endpoint_url=endpoint_url,
    region_name=region_name
)

# Select the bucket
bucket = s3_resource.Bucket(bucket_name)


def upload_directory_to_s3(local_directory, s3_prefix):
    """
    Uploads only the .onnx files found under `local_directory` and keeps
    their path relative to that directory beneath `s3_prefix`.

    :param local_directory: The local directory containing the model files.
    :param s3_prefix: The destination prefix (key prefix) in the S3 bucket.
    :return: The number of .onnx files uploaded.
    """
    num_files = 0
    local_directory = os.path.abspath(local_directory)  # Get absolute path

    # Ensure the local directory exists
    if not os.path.isdir(local_directory):
        raise ValueError(f"Local directory '{local_directory}' does not exist.")

    # Walk through directory and upload only .onnx files
    for root, _, files in os.walk(local_directory):
        for filename in files:
            if filename.endswith(".onnx"):  # Only upload .onnx files
                file_path = os.path.join(root, filename)
                
                # Path of the file relative to the directory being uploaded
                relative_path = os.path.relpath(file_path, local_directory)

                # Build the S3 key under the prefix, always with forward slashes
                s3_key = os.path.join(s3_prefix, relative_path).replace(os.path.sep, '/')
                
                print(f"Uploading {file_path} -> {s3_key}")
                bucket.upload_file(file_path, s3_key)
                num_files += 1

    return num_files


def list_objects(prefix):
    """
    Lists objects stored in the S3 bucket under the given prefix.

    :param prefix: The S3 prefix to list objects from.
    """
    objects = bucket.objects.filter(Prefix=prefix)
    for obj in objects.all():
        print(obj.key)
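
To see which key a local file ends up under, the path handling inside `upload_directory_to_s3` can be tried on its own, without touching S3. This is a small sketch under stated assumptions: the helper name `s3_key_for` and the example paths are hypothetical, and the logic mirrors the `relpath` plus `join` steps used above.

```python
import os


def s3_key_for(file_path, local_directory, s3_prefix):
    """Map a local file to its S3 key, keeping its path relative to local_directory."""
    relative_path = os.path.relpath(file_path, local_directory)
    # S3 keys use forward slashes regardless of the local OS separator
    return os.path.join(s3_prefix, relative_path).replace(os.path.sep, "/")


# Hypothetical paths, used only for illustration
local_dir = os.path.join("models", "yolo", "1")
local_file = os.path.join(local_dir, "yolo-rps.onnx")
example_key = s3_key_for(local_file, local_dir, "models/user1-yolo-rps-v1/1")
print(example_key)
```

Only the part of the path below `local_directory` is kept, so the local `models/yolo/1` layout does not leak into the bucket.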


In [None]:
# List objects in the "models" directory on S3
list_objects("models")

# Define the local models directory correctly
local_models_directory = os.path.join(destination_dir, "yolo", "1")

# Ensure the local directory exists before proceeding
if not os.path.isdir(local_models_directory):
    raise ValueError(f"The directory '{local_models_directory}' does not exist. "
                     "Did you finish training the model in the previous notebook?")

# Upload only ONNX files from the local model directory to this user's prefix in S3
num_files = upload_directory_to_s3(local_models_directory, f"models/{model_path_s3}-v1/1")

# Validate that files were uploaded
if num_files == 0:
    raise ValueError("No files uploaded. Did you finish training and "
                     "saving the model to the 'models' directory? "
                     f"Check for '{YOLO_Rock_Paper_ONNX}'.")

print(f"Successfully uploaded {num_files} ONNX files to S3.")
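
The upload count only says how many files were sent. To confirm they are actually listed in the bucket, the keys under the prefix can be compared with the keys you expect. The sketch below is an assumption-laden illustration: `missing_keys` is a made-up helper, and `FakeObjects` is a stand-in so the example runs without S3 access. With the real `bucket` from above, `bucket.objects.filter(Prefix=...)` yields objects with a `.key` attribute in the same way.

```python
from types import SimpleNamespace


def missing_keys(bucket, expected_keys, prefix):
    """Return the expected keys that are not listed under the prefix."""
    listed = {obj.key for obj in bucket.objects.filter(Prefix=prefix)}
    return sorted(set(expected_keys) - listed)


class FakeObjects:
    """Stand-in for `bucket.objects`, holding a fixed list of keys (illustration only)."""

    def __init__(self, keys):
        self._keys = keys

    def filter(self, Prefix):
        return [SimpleNamespace(key=k) for k in self._keys if k.startswith(Prefix)]


# Hypothetical bucket contents and expectations
fake_bucket = SimpleNamespace(
    objects=FakeObjects(["models/user1-yolo-rps-v1/1/yolo-rps.onnx"])
)
expected = [
    "models/user1-yolo-rps-v1/1/yolo-rps.onnx",
    "models/user1-yolo-rps-v1/1/other.onnx",
]
still_missing = missing_keys(fake_bucket, expected, "models/user1-yolo-rps-v1")
print(still_missing)
```

An empty list means every expected key was found under the prefix.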


# 🤩 Kubeflow Registry

We need a metadata registry for storing information such as version, author, and model location of the models we are building.

We use the Kubeflow Model Registry as the canonical source for this information.

Here are some reasons to use a registry (_from the Kubeflow website_):

- Track models available on storage: once the model is stored, it can then be tracked in the Kubeflow Model Registry for managing its lifecycle. The Model Registry can catalog, list, index, share, record, organize this information. This allows the Data Scientist to compare different versions and revert to previous versions if needed.

- Track and compare performance: View key metrics like accuracy, recall, and precision for each model version. This helps identify the best-performing model for deployment.

- Create lineage: Capture the relationships between data, code, and models. This enables the Data Scientist to understand the origin of each model and reproduce specific experiments.

- Collaborate: Share models and experiment details with the MLOps Engineer for deployment preparation. This ensures a seamless transition from training to production.

An instance of the registry is available in your dev environment as well. 


In [None]:
from model_registry import ModelRegistry
from model_registry.exceptions import StoreError

‼️⚠️ ¡IMPORTANT! ⚠️‼️

Add your user name and the cluster domain (apps.xxx) that were shared with you before continuing. We need them for the model registry URL.

In [None]:
USERNAME = "xxx"
CLUSTER_DOMAIN = "xxxx"

# Model Registry Configuration
MODEL_REGISTRY_URL = f"https://registry-rest.{CLUSTER_DOMAIN}"
AUTHOR_NAME = USERNAME

# Initialize Model Registry Connection
registry = ModelRegistry(
    server_address=MODEL_REGISTRY_URL,
    port=443,
    author=AUTHOR_NAME,
    is_secure=False
)

In [None]:
# Model Registration Details
registered_model_name = model_path_s3
version = "v2"  # Version label recorded in the registry

# Strip the scheme from the S3 endpoint so it can be used in the model URI
s3_endpoint_url = endpoint_url.replace("https://", "")

# Path of the model in S3 (must match the prefix used for the upload above)
model_path = f"models/{model_path_s3}-v1/1"

# Full S3 URI for the model
s3_model_uri = f"s3://{s3_endpoint_url}/{model_path}/yolo-rps.onnx"

# Register model (if not already registered)
try:
    rm = registry.register_model(
        registered_model_name,
        s3_model_uri,  # Location of the ONNX file in S3
        model_format_name="onnx",
        model_format_version="1",
        version=version,
        description="Yolo-v11 trained with Rock Paper Scissors dataset",
        metadata={
            "license": "apache-2.0"
        }
    )
    print(f"Model and version registered successfully as:\n{rm}")

except StoreError:
    rmver = registry.get_model_version(registered_model_name, version)
    print(f"Model and version already exist:\n{rmver}")
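
The model URI above is assembled by stripping `https://` from the endpoint. If the endpoint might instead use `http://` or carry a trailing slash, a slightly more tolerant variant can be sketched as follows. The helper name `build_model_uri` and the example host are assumptions for illustration. The resulting `s3://<endpoint>/<path>/<file>` layout simply follows the convention used in the cell above.

```python
def build_model_uri(endpoint_url, model_path, filename):
    """Build an s3:// URI from an endpoint (with or without scheme), a path and a file name."""
    # Drop any "scheme://" prefix and a trailing slash from the endpoint
    host = endpoint_url.split("://", 1)[-1].rstrip("/")
    return f"s3://{host}/{model_path.strip('/')}/{filename}"


# Hypothetical endpoint and user name, used only for this example
example_uri = build_model_uri(
    "https://minio.example.com",
    "models/user1-yolo-rps-v1/1",
    "yolo-rps.onnx",
)
print(example_uri)
```

Keeping the path argument identical to the upload prefix is what makes the registered URI point at the file that was uploaded.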