# Train a scikit-learn model and upload it to MinIO

This notebook trains a simple model with **scikit-learn**, saves it to disk with `joblib`, and uploads the artifact to a **MinIO** bucket using the MinIO Python SDK (`minio`).

### Prerequisites
- A running MinIO instance (endpoint + credentials)
- A bucket name to upload the model to
- Internet access to install packages if they are not already present in the environment

### How to use
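1. Set the MinIO connection settings through the environment variables `MINIO_ENDPOINT`, `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY`, `MINIO_BUCKET`, and `MINIO_SECURE`, which the configuration cell reads.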
2. Run the cells in order.
3. Your trained model will be saved locally and uploaded to the MinIO bucket.


In [1]:
# Install dependencies only if they are missing from the image
try:
    import sklearn, joblib  # noqa: F401
except ImportError:
    %pip install --quiet scikit-learn joblib

try:
    import minio  # noqa: F401
except ImportError:
    %pip install --quiet minio

print('Dependencies ready.')

Note: you may need to restart the kernel to use updated packages.
Dependencies ready.

In [None]:
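import os
from dataclasses import dataclass, field

@dataclass
class MinIOConfig:
    endpoint: str
    access_key: str
    secret_key: str = field(repr=False)  # keep the secret out of the cell output
    bucket: str
    secure: bool = False  # set True if your MinIO uses TLS

cfg = MinIOConfig(
    endpoint=os.getenv('MINIO_ENDPOINT'),  # host[:port], e.g. 'minio.minio.svc:9000' or the Route host
    access_key=os.getenv('MINIO_ACCESS_KEY'),
    secret_key=os.getenv('MINIO_SECRET_KEY'),
    bucket=os.getenv('MINIO_BUCKET', 'artifacts'),
    secure=os.getenv('MINIO_SECURE', 'false').lower() == 'true',
)

# Fail early if a required variable is unset (os.getenv returns None)
required = {'MINIO_ENDPOINT': cfg.endpoint, 'MINIO_ACCESS_KEY': cfg.access_key, 'MINIO_SECRET_KEY': cfg.secret_key}
missing = [name for name, value in required.items() if not value]
if missing:
    raise ValueError(f'Missing environment variables: {missing}')
cfg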
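
The MinIO Python client expects `endpoint` as `host[:port]` with no URL scheme, so a value copied from a browser (for example a Route URL starting with `https://`) needs trimming first. A minimal sketch of that cleanup follows; `split_endpoint` is a hypothetical helper written for this notebook, not part of the MinIO SDK.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_endpoint(raw: str, default_secure: bool = False) -> Tuple[str, bool]:
    """Return (host[:port], secure) from a bare endpoint or a full URL.

    Hypothetical helper for illustration; not part of the MinIO SDK.
    """
    raw = raw.strip()
    if "://" not in raw:
        # Already in host[:port] form; keep the caller's TLS preference.
        return raw.rstrip("/"), default_secure
    parts = urlsplit(raw)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # netloc keeps the port and drops the scheme and any path.
    return parts.netloc, parts.scheme == "https"


print(split_endpoint("https://minio.example.com/"))
print(split_endpoint("minio.minio.svc:9000"))
```

The returned pair could be passed to `MinIOConfig` as `endpoint` and `secure`, so that a single variable such as `MINIO_ENDPOINT` decides both.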

In [None]:
from datetime import datetime, timezone
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import joblib
from pathlib import Path

# Train a small model on the Iris dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
clf = LogisticRegression(max_iter=200)
clf.fit(X_train, y_train)
preds = clf.predict(X_test)
acc = accuracy_score(y_test, preds)
print(f'Accuracy: {acc:.3f}')

# Save the model locally, bundled with its accuracy and a UTC timestamp
artifact_dir = Path('artifacts')
artifact_dir.mkdir(exist_ok=True)
model_path = artifact_dir / 'iris_logreg.joblib'
timestamp = datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z')
joblib.dump({'model': clf, 'accuracy': float(acc), 'timestamp': timestamp}, model_path)
print(f'Model saved to: {model_path}')

In [None]:
from minio import Minio

client = Minio(
    endpoint=cfg.endpoint,
    access_key=cfg.access_key,
    secret_key=cfg.secret_key,
    secure=cfg.secure,
)

# Ensure bucket exists
if not client.bucket_exists(cfg.bucket):
    client.make_bucket(cfg.bucket)
    print(f"Created bucket '{cfg.bucket}'")
else:
    print(f"Bucket '{cfg.bucket}' already exists")

# Upload the artifact under a key prefix named after the local directory
prefix = f"{artifact_dir.name}/"
object_name = f"{prefix}{model_path.name}"
result = client.fput_object(
    bucket_name=cfg.bucket,
    object_name=object_name,
    file_path=str(model_path),
    content_type='application/octet-stream',
)
print('Uploaded:', result.object_name)

# List objects to confirm
print('Objects in bucket:')
for obj in client.list_objects(cfg.bucket, prefix=prefix, recursive=True):
    print('-', obj.object_name)
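
Listing shows that an object exists, but not that it arrived complete. A small sketch of a stricter check compares the local file size with the size MinIO reports through `stat_object`. Here `upload_matches` is a hypothetical helper, and `client` is assumed to be the `Minio` instance created in the cell above.

```python
from pathlib import Path


def upload_matches(client, bucket: str, object_name: str, local_path) -> bool:
    """Return True if the stored object's size equals the local file's size.

    Hypothetical helper; `client` is assumed to be a minio.Minio instance
    (anything exposing a compatible stat_object method works).
    """
    stat = client.stat_object(bucket_name=bucket, object_name=object_name)
    return stat.size == Path(local_path).stat().st_size
```

In this notebook it could be called as `upload_matches(client, cfg.bucket, object_name, model_path)`. A size match does not prove the bytes are identical; comparing checksums would be a stronger test.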

## Notes
- If you're running in a Red Hat OpenShift AI workbench, you can set the environment variables in the workbench settings or in a cell that runs before the configuration cell.
- For production, enable TLS on MinIO and set `MINIO_SECURE=true` so that `cfg.secure` is `True`.
- If you expose the S3 API through an OpenShift Route, make sure the Route's TLS termination matches the client's `secure` setting and that the router forwards the `Host` header unchanged, because S3 request signatures cover it (passthrough termination is usually the safest choice for S3).
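
The first note can be sketched as a cell that runs before the configuration cell. The values below are placeholders for illustration, not real endpoints or credentials; `setdefault` leaves anything already defined in the workbench settings untouched.

```python
import os

# Placeholder values for illustration only; replace them with your own.
defaults = {
    "MINIO_ENDPOINT": "minio.minio.svc:9000",
    "MINIO_BUCKET": "artifacts",
    "MINIO_SECURE": "false",
}
for name, value in defaults.items():
    # setdefault only writes the variable if it is not already defined.
    os.environ.setdefault(name, value)

# Credentials are deliberately not hard-coded here; report any that are absent.
missing = [n for n in ("MINIO_ACCESS_KEY", "MINIO_SECRET_KEY") if not os.getenv(n)]
print("Missing credentials:", missing or "none")
```

Keeping the access key and secret key out of the notebook source means they are not saved with the file or committed to version control.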
