# End-to-End: Serve a scikit-learn model with OpenVINO on Red Hat OpenShift AI

This notebook performs the full workflow:
1. Upload an existing **`retail_sales_model.joblib`** to **MinIO**
2. Convert it to **ONNX** (with `skl2onnx`)
3. Upload the ONNX using an **OVMS-compatible versioned layout**
4. Deploy an **OpenVINO Model Server (OVMS)** InferenceService on **OpenShift AI (KServe)**
5. Wait for readiness and **run a test inference** via the **V2 Inference Protocol**

✅ *This notebook does **not** set a namespace; it uses the current OpenShift project context.*

----
### Prerequisites
- Access to a running **MinIO** endpoint reachable from this environment
- Your trained scikit-learn pipeline saved as `retail_sales_model.joblib`
- OpenShift cluster with **Red Hat OpenShift AI** and **KServe** installed
- Permissions to create Secrets/ServiceAccounts/InferenceServices in your current project


In [None]:
# Install dependencies if missing
try:
    import minio  # noqa: F401
except Exception:
    %pip install --quiet minio
try:
    import joblib  # noqa: F401
except Exception:
    %pip install --quiet joblib
try:
    import skl2onnx  # noqa: F401
except Exception:
    %pip install --quiet skl2onnx onnx
try:
    import sklearn  # noqa: F401
except Exception:
    %pip install --quiet scikit-learn
try:
    import kubernetes  # noqa: F401
except Exception:
    %pip install --quiet kubernetes
try:
    import yaml  # noqa: F401
except Exception:
    %pip install --quiet pyyaml
try:
    import requests  # noqa: F401
except Exception:
    %pip install --quiet requests

print('Dependencies ready.')

## Configure connection and paths
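Set these environment variables or edit the values below. `MINIO_ENDPOINT`, `MINIO_ACCESS_KEY`, and `MINIO_SECRET_KEY` have no defaults and must be provided. Give `MINIO_ENDPOINT` as `host:port` without a scheme, because the Secret created later prepends `http://` or `https://` itself. `MINIO_SECURE=false` is fine for a quick start; use `true` with TLS in production.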
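
Endpoints are often copied as full URLs, so it can help to normalize them before they reach the configuration. The following is a minimal sketch, not part of the original notebook: `split_endpoint` is a hypothetical helper, and the host names are placeholders.

```python
from urllib.parse import urlsplit

def split_endpoint(raw: str, default_secure: bool = False):
    """Return (host_port, secure) for an endpoint given with or without a scheme.

    Hypothetical helper for illustration; it is not part of the minio package.
    """
    raw = raw.strip()
    if "://" not in raw:
        # Already a bare host[:port]; keep the caller's TLS preference.
        return raw.rstrip("/"), default_secure
    parts = urlsplit(raw)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"Unsupported scheme in endpoint: {parts.scheme!r}")
    if parts.path not in ("", "/"):
        raise ValueError("Endpoint must not contain a path")
    # The scheme decides whether TLS is used.
    return parts.netloc, parts.scheme == "https"

print(split_endpoint("minio.example.local:9000"))
print(split_endpoint("https://minio.example.local"))
```

The two return values could be exported as `MINIO_ENDPOINT` and `MINIO_SECURE` before the configuration cell runs.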

In [None]:
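import os
from dataclasses import dataclass, field

@dataclass
class Config:
    model_local_path: str
    model_name: str
    minio_endpoint: str  # host:port, no scheme
    minio_access_key: str = field(repr=False)  # hidden from the cell output
    minio_secret_key: str = field(repr=False)  # hidden from the cell output
    minio_bucket: str
    minio_secure: bool
    s3_prefix: str  # e.g., 'retail'

cfg = Config(
    model_local_path=os.getenv('MODEL_LOCAL_PATH', '/mnt/data/retail_sales_model.joblib'),
    model_name=os.getenv('MODEL_NAME', 'retail-sales'),
    minio_endpoint=os.getenv('MINIO_ENDPOINT'),
    minio_access_key=os.getenv('MINIO_ACCESS_KEY'),
    minio_secret_key=os.getenv('MINIO_SECRET_KEY'),
    minio_bucket=os.getenv('MINIO_BUCKET', 'models'),
    minio_secure=os.getenv('MINIO_SECURE', 'false').lower() == 'true',
    s3_prefix=os.getenv('S3_PREFIX', 'retail')
)

# Fail early if a required setting was neither exported nor edited in above
missing = [n for n in ('minio_endpoint', 'minio_access_key', 'minio_secret_key') if not getattr(cfg, n)]
assert not missing, f"Missing required settings: {missing}"
cfg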

## Upload the original `.joblib` to MinIO

In [None]:
from pathlib import Path
from minio import Minio

model_path = Path(cfg.model_local_path)
assert model_path.exists(), f"Model file not found at {model_path}. Place retail_sales_model.joblib there or set MODEL_LOCAL_PATH."

s3 = Minio(cfg.minio_endpoint, access_key=cfg.minio_access_key, secret_key=cfg.minio_secret_key, secure=cfg.minio_secure)
if not s3.bucket_exists(cfg.minio_bucket):
    s3.make_bucket(cfg.minio_bucket)
    print(f"Created bucket '{cfg.minio_bucket}'")
else:
    print(f"Bucket '{cfg.minio_bucket}' exists")

joblib_key = f"{cfg.s3_prefix}/{model_path.name}"
s3.fput_object(cfg.minio_bucket, joblib_key, str(model_path), content_type='application/octet-stream')
print('Uploaded joblib to s3://%s/%s' % (cfg.minio_bucket, joblib_key))

## Convert scikit-learn pipeline to ONNX
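This assumes your `.joblib` file contains either the pipeline itself or a dict that holds it under a `model` key. The conversion declares a single float input tensor of shape `[None, n_features]`, so it also assumes every input feature is numeric.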
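
If the pipeline was trained on a DataFrame with mixed column types (for example through a `ColumnTransformer`), a single float tensor is not enough, and skl2onnx is usually given one named input per column instead. The following is a rough sketch under that assumption: `column_spec` and `to_initial_types` are hypothetical helpers, the column names are made up, and the dtype mapping is deliberately simplified (anything that is not a float or integer dtype is treated as a string).

```python
def column_spec(dtypes: dict) -> list:
    """Map {column_name: dtype_string} to (name, kind) pairs.

    kind is one of 'float', 'int64', 'string'. Simplified on purpose.
    """
    spec = []
    for name, dtype in dtypes.items():
        d = str(dtype).lower()
        if d.startswith("float"):
            kind = "float"
        elif d.startswith("int") or d.startswith("uint"):
            kind = "int64"
        else:
            kind = "string"  # object, category, str, ...
        spec.append((name, kind))
    return spec

def to_initial_types(spec: list) -> list:
    """Turn (name, kind) pairs into skl2onnx initial_types, one [None, 1] input per column."""
    # Imported lazily so column_spec can be used without skl2onnx installed.
    from skl2onnx.common.data_types import (
        FloatTensorType, Int64TensorType, StringTensorType,
    )
    types = {"float": FloatTensorType, "int64": Int64TensorType, "string": StringTensorType}
    return [(name, types[kind]([None, 1])) for name, kind in spec]

# Illustrative columns only; replace with the training DataFrame's dtypes.
spec = column_spec({"price": "float64", "units": "int64", "store": "object"})
print(spec)
```

With a training DataFrame `X` (not defined in this notebook), the spec could be built from `{c: str(t) for c, t in X.dtypes.items()}` and the result of `to_initial_types` passed as `initial_types`. A model converted this way expects one request input per column, so the test request at the end of the notebook would need to change accordingly.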

In [None]:
import joblib
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

loaded = joblib.load(model_path)
pipeline = loaded.get('model') if isinstance(loaded, dict) and 'model' in loaded else loaded
print(type(pipeline))

# Infer input feature width if available
n_features = getattr(pipeline, 'n_features_in_', None)
if n_features is None:
    # Fallback: let the user set FEATURE_COUNT env or default to 10
    n_features = int(os.getenv('FEATURE_COUNT', '10'))
print('Using feature count:', n_features)

initial_types = [("input", FloatTensorType([None, n_features]))]
onnx_model = convert_sklearn(pipeline, initial_types=initial_types)

onnx_dir = Path('ovms_model') / '1'
onnx_dir.mkdir(parents=True, exist_ok=True)
onnx_path = onnx_dir / 'model.onnx'
with open(onnx_path, 'wb') as f:
    f.write(onnx_model.SerializeToString())
print('Saved ONNX at', onnx_path)

## Upload ONNX to MinIO (OVMS layout)
OVMS expects a versioned directory structure: `<model_root>/<version>/model.onnx`.
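
The version directory is a positive integer, and publishing a new model means adding a higher-numbered directory next to the existing ones. A small sketch of how the object keys could be derived is shown here; `ovms_object_key` and `next_version` are hypothetical helpers, not part of MinIO or OVMS.

```python
def ovms_object_key(model_root: str, version: int, filename: str = "model.onnx") -> str:
    """Build '<model_root>/<version>/<filename>' for a versioned model layout."""
    if isinstance(version, bool) or not isinstance(version, int) or version < 1:
        raise ValueError("version must be a positive integer")
    return f"{model_root.strip('/')}/{version}/{filename}"

def next_version(existing_keys, model_root: str) -> int:
    """Return 1 + the highest numeric version directory found under model_root."""
    root = model_root.strip("/") + "/"
    versions = []
    for key in existing_keys:
        if key.startswith(root):
            first = key[len(root):].split("/", 1)[0]
            if first.isdigit():
                versions.append(int(first))
    return max(versions, default=0) + 1

keys = ["retail/openvino/1/model.onnx", "retail/openvino/2/model.onnx", "retail/model.joblib"]
print(ovms_object_key("retail/openvino", next_version(keys, "retail/openvino")))
```

In this notebook the key list could come from `s3.list_objects(cfg.minio_bucket, prefix=..., recursive=True)`, taking `object_name` from each result.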

In [None]:
openvino_root_key = f"{cfg.s3_prefix}/openvino"  # e.g., retail/openvino
target_key = f"{openvino_root_key}/1/model.onnx"
s3.fput_object(cfg.minio_bucket, target_key, str(onnx_path), content_type='application/octet-stream')
print('Uploaded ONNX to s3://%s/%s' % (cfg.minio_bucket, target_key))

## Create Secret, ServiceAccount, and OVMS InferenceService (KServe)
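This uses the Kubernetes Python client. It assumes either that the Workbench runs in-cluster with a service account holding the permissions listed in the prerequisites, or that a local kubeconfig is available (for example after `oc login`).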

**Note:** No namespace is set explicitly; resources are created in the current project (namespace) configured for this environment.

In [None]:
import time
from kubernetes import client, config

# Attempt to load in-cluster or local kubeconfig
try:
    config.load_incluster_config()
    print('Loaded in-cluster kube config')
except Exception:
    config.load_kube_config()
    print('Loaded local kube config')

core = client.CoreV1Api()
custom = client.CustomObjectsApi()

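# Determine current namespace: service account file in-cluster, else the active kubeconfig context
ns = None
try:
    with open('/var/run/secrets/kubernetes.io/serviceaccount/namespace') as f:
        ns = f.read().strip()
except OSError:
    pass
if not ns:
    try:
        _, active_context = config.list_kube_config_contexts()
        ns = active_context['context'].get('namespace')
    except Exception:
        pass
ns = ns or 'default'  # last resort when neither source names a namespace
print('Using namespace:', ns)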

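# 1) Secret with S3 creds + endpoint URL used by KServe storage initializer
secret_name = 's3-credentials'
secret_body = client.V1Secret(
    metadata=client.V1ObjectMeta(
        name=secret_name,
        # KServe's documented S3 secret format carries the endpoint in annotations
        annotations={
            'serving.kserve.io/s3-endpoint': cfg.minio_endpoint,
            'serving.kserve.io/s3-usehttps': '1' if cfg.minio_secure else '0'
        }
    ),
    type='Opaque',
    string_data={
        'AWS_ACCESS_KEY_ID': cfg.minio_access_key,
        'AWS_SECRET_ACCESS_KEY': cfg.minio_secret_key,
        'AWS_ENDPOINT_URL': ('https://' if cfg.minio_secure else 'http://') + cfg.minio_endpoint
    }
)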
try:
    core.create_namespaced_secret(ns, secret_body)
    print('Created Secret', secret_name)
except client.exceptions.ApiException as e:
    if e.status == 409:
        core.patch_namespaced_secret(secret_name, ns, secret_body)
        print('Patched existing Secret', secret_name)
    else:
        raise

# 2) ServiceAccount that mounts the secret
sa_name = 'minio-s3-sa'
sa_body = client.V1ServiceAccount(
    metadata=client.V1ObjectMeta(name=sa_name),
    secrets=[client.V1ObjectReference(name=secret_name)]
)
try:
    core.create_namespaced_service_account(ns, sa_body)
    print('Created ServiceAccount', sa_name)
except client.exceptions.ApiException as e:
    if e.status == 409:
        core.patch_namespaced_service_account(sa_name, ns, sa_body)
        print('Patched existing ServiceAccount', sa_name)
    else:
        raise

# 3) InferenceService for OVMS
model_root = f"s3://{cfg.minio_bucket}/{openvino_root_key}"
isvc_name = f"{cfg.model_name}-ovms"
isvc_spec = {
  'apiVersion': 'serving.kserve.io/v1beta1',
  'kind': 'InferenceService',
  'metadata': {'name': isvc_name},
  'spec': {
    'predictor': {
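      'model': {
        'modelFormat': {'name': 'onnx'},  # required field of the KServe model spec
        'runtime': 'ovms',  # must match a ServingRuntime available in this namespace
        'protocolVersion': 'v2',
        'storageUri': model_root
      },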
      'serviceAccountName': sa_name
    }
  }
}

group = 'serving.kserve.io'
version = 'v1beta1'
plural = 'inferenceservices'

try:
    custom.create_namespaced_custom_object(group, version, ns, plural, isvc_spec)
    print('Created InferenceService', isvc_name)
except client.exceptions.ApiException as e:
    if e.status == 409:
        custom.patch_namespaced_custom_object(group, version, ns, plural, isvc_name, isvc_spec)
        print('Patched existing InferenceService', isvc_name)
    else:
        raise

# Wait for Ready and fetch URL
def get_isvc_status():
    obj = custom.get_namespaced_custom_object(group, version, ns, plural, isvc_name)
    return obj.get('status', {})

print('Waiting for InferenceService to be Ready...')
url = None
for _ in range(60):  # 60 polls x 10 s sleep = up to ~10 minutes
    st = get_isvc_status()
    conditions = st.get('conditions', [])
    if any(c.get('type') == 'Ready' and c.get('status') == 'True' for c in conditions):
        url = st.get('url')
        break
    time.sleep(10)

print('InferenceService URL:', url)
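
When the loop above times out, `url` stays `None` and the reason is only visible in the status conditions. The helpers below are a minimal sketch for summarizing them; the function names are hypothetical, and the sample status is made up for illustration rather than captured from a cluster.

```python
def ready_condition(status: dict):
    """Return the 'Ready' condition dict from an InferenceService status, or None."""
    for cond in status.get("conditions", []):
        if cond.get("type") == "Ready":
            return cond
    return None

def describe_not_ready(status: dict) -> str:
    """Summarize every condition whose status is not 'True'."""
    lines = []
    for cond in status.get("conditions", []):
        if cond.get("status") != "True":
            reason = cond.get("reason", "")
            message = cond.get("message", "")
            lines.append(f"{cond.get('type')}: {cond.get('status')} {reason} {message}".strip())
    return "\n".join(lines) or "no failing conditions reported"

# Made-up status for illustration only.
sample_status = {
    "conditions": [
        {"type": "PredictorReady", "status": "False",
         "reason": "ExampleReason", "message": "example message"},
        {"type": "Ready", "status": "False"},
    ]
}
print(describe_not_ready(sample_status))
```

After the wait loop, `if url is None: print(describe_not_ready(get_isvc_status()))` would show which condition is blocking readiness.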

## Test inference (V2 protocol)
This probes the model metadata to discover input names and sends a dummy request with zeros.

> Adjust `payload` to your real feature vector and dtypes for production.
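
For real feature vectors, the request body follows the same structure with the batch flattened in row-major order. A minimal sketch, assuming a single numeric input tensor as produced by the conversion step; `build_v2_request` is a hypothetical helper and the feature values are placeholders.

```python
def build_v2_request(rows, input_name="input", datatype="FP32"):
    """Build a V2 inference request body for a batch of equal-length numeric rows."""
    if not rows:
        raise ValueError("rows must contain at least one feature vector")
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows must have the same number of features")
    flat = [float(v) for row in rows for v in row]  # row-major flattening
    return {
        "inputs": [{
            "name": input_name,
            "shape": [len(rows), width],
            "datatype": datatype,
            "data": flat,
        }]
    }

# Placeholder values: two samples with three features each.
request_body = build_v2_request([[1, 2, 3], [4, 5, 6]])
print(request_body["inputs"][0]["shape"])
```

The result can be sent with `requests.post(infer_url, json=request_body, timeout=60)`, with `input_name` taken from the model metadata.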

In [None]:
import requests

assert url, 'InferenceService URL not found. Check the events/logs of the service.'
model_endpoint = url.rstrip('/')
name = cfg.model_name + '-ovms'

# Try to get metadata
meta = requests.get(f"{model_endpoint}/v2/models/{name}", timeout=30)
if meta.status_code != 200:
    # Some deployments register the model under the bare model name, without the '-ovms' suffix
    alt_name = cfg.model_name
    meta = requests.get(f"{model_endpoint}/v2/models/{alt_name}", timeout=30)
    name = alt_name if meta.status_code == 200 else name

print('Metadata status:', meta.status_code)
if meta.ok:
    print('Metadata:', meta.json())

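# Determine input signature (defaults match the ONNX conversion above)
input_name = 'input'
datatype = 'FP32'
shape = [1, n_features]
try:
    j = meta.json()
    if 'inputs' in j and j['inputs']:
        input_name = j['inputs'][0].get('name', input_name)
        datatype = j['inputs'][0].get('datatype', datatype)
        shape = j['inputs'][0].get('shape', shape)
except Exception:
    pass

# Metadata reports dynamic dimensions (such as the batch size) as -1; send a batch of one
shape = [1 if int(d) < 0 else int(d) for d in shape]

# The flattened data must contain one value per tensor element
n_values = 1
for d in shape:
    n_values *= d
dummy = [0.0] * n_values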
payload = {
  'inputs': [{
      'name': input_name,
      'shape': shape,
      'datatype': datatype,
      'data': dummy
  }]
}

infer_url = f"{model_endpoint}/v2/models/{name}/infer"
resp = requests.post(infer_url, json=payload, timeout=60)
print('Infer status:', resp.status_code)
print('Response:', resp.text[:1000])
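
The raw response text is enough for a smoke test, but the predictions are easier to use once they are grouped per output. A minimal sketch of parsing a V2 response body follows; `parse_v2_outputs` is a hypothetical helper, and the sample body (including the output name `variable`) is illustrative, not an actual response from this model.

```python
def parse_v2_outputs(body: dict) -> dict:
    """Map each output name to its data, reshaped into rows when the output is 2-D."""
    results = {}
    for out in body.get("outputs", []):
        data = out.get("data", [])
        shape = out.get("shape", [])
        if len(shape) == 2 and shape[1] > 0 and len(data) == shape[0] * shape[1]:
            cols = shape[1]
            # Rebuild rows from the row-major flattened data
            data = [data[i:i + cols] for i in range(0, len(data), cols)]
        results[out["name"]] = data
    return results

# Illustrative body only.
sample_body = {
    "model_name": "retail-sales-ovms",
    "outputs": [
        {"name": "variable", "shape": [2, 1], "datatype": "FP32", "data": [10.5, 12.0]}
    ],
}
print(parse_v2_outputs(sample_body))
```

In this notebook it would be called as `parse_v2_outputs(resp.json())` once `resp.ok` is true.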