# MinIO Object Storage Example

This notebook demonstrates how to use MinIO object storage for ML workflows.

**Prerequisites**: MinIO service running at http://sites/minio

## What You'll Learn:
- Connect to MinIO from Jupyter
- Upload datasets and models
- Download and load data
- Manage experiment artifacts
- Integrate with ML workflows

## 1. Setup and Connection

First, let's import required libraries and connect to MinIO.

In [None]:
import os
import json
import pandas as pd
import numpy as np
from datetime import datetime
from pathlib import Path

# MinIO and S3 clients
from minio import Minio
from minio.error import S3Error
import boto3
from botocore.exceptions import ClientError

# Load environment variables
from dotenv import load_dotenv
load_dotenv('../.env')

print('📦 Libraries imported successfully!')

In [None]:
# MinIO connection configuration
MINIO_ENDPOINT = os.getenv('MINIO_ENDPOINT', 'sites:80/minio-api')
MINIO_ACCESS_KEY = os.getenv('MINIO_ACCESS_KEY', 'minioadmin')
MINIO_SECRET_KEY = os.getenv('MINIO_SECRET_KEY', 'minioadmin123')
MINIO_SECURE = os.getenv('MINIO_SECURE', 'false').lower() == 'true'

print(f'🔗 Connecting to MinIO at: {MINIO_ENDPOINT}')
print(f'🔒 Secure connection: {MINIO_SECURE}')

In [None]:
# Create MinIO client
minio_client = Minio(
    endpoint=MINIO_ENDPOINT,
    access_key=MINIO_ACCESS_KEY,
    secret_key=MINIO_SECRET_KEY,
    secure=MINIO_SECURE
)

# Create boto3 client (S3-compatible); the URL scheme follows MINIO_SECURE
s3_scheme = 'https' if MINIO_SECURE else 'http'
s3_client = boto3.client(
    's3',
    endpoint_url=f'{s3_scheme}://{MINIO_ENDPOINT}',
    aws_access_key_id=MINIO_ACCESS_KEY,
    aws_secret_access_key=MINIO_SECRET_KEY
)

print('✅ MinIO clients created successfully!')
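
The `Minio` client expects `endpoint` as `host[:port]` with no scheme, and as far as we know it rejects values that contain a path, so a default such as `sites:80/minio-api` may need adjusting for your deployment. The sketch below uses a hypothetical helper, `split_endpoint`, which is not part of the minio library. It shows one way to derive the host and the `secure` flag from a full URL and to fail early when a path is present.

```python
from urllib.parse import urlsplit


def split_endpoint(url: str) -> tuple[str, bool]:
    """Return (host[:port], secure) for a value such as 'https://minio.example.com:9000'.

    Hypothetical helper for illustration; not part of the minio package.
    """
    # Values without a scheme are prefixed with '//' so urlsplit treats them as a host.
    parts = urlsplit(url if '://' in url else f'//{url}')
    if parts.path not in ('', '/'):
        raise ValueError(f'endpoint must not contain a path: {parts.path!r}')
    return parts.netloc, parts.scheme == 'https'
```

The two return values could then be passed as `endpoint=` and `secure=` when constructing the client.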

## 2. List Available Buckets

Let's see what buckets are available in our MinIO instance.

In [None]:
# List all buckets
try:
    buckets = minio_client.list_buckets()
    print('📁 Available buckets:')
    for bucket in buckets:
        print(f'  • {bucket.name} (created: {bucket.creation_date})')
except S3Error as e:
    print(f'❌ Error listing buckets: {e}')

## 3. Create Sample Dataset

Let's create a sample dataset and upload it to MinIO.

In [None]:
# Create sample dataset
np.random.seed(42)
sample_data = {
    'feature_1': np.random.randn(1000),
    'feature_2': np.random.randn(1000),
    'feature_3': np.random.randn(1000),
    'target': np.random.choice([0, 1], 1000)
}

df = pd.DataFrame(sample_data)
print('📊 Sample dataset created:')
print(f'Shape: {df.shape}')
print(f'Columns: {list(df.columns)}')
df.head()

In [None]:
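# Save dataset locally first (create the target directory if it is missing)
dataset_file = '../data/sample_dataset.csv'
os.makedirs(os.path.dirname(dataset_file), exist_ok=True)
df.to_csv(dataset_file, index=False)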
print(f'💾 Dataset saved locally: {dataset_file}')

# Get file size
file_size = os.path.getsize(dataset_file)
print(f'📏 File size: {file_size / 1024:.2f} KB')

## 4. Upload Dataset to MinIO

Now let's upload our dataset to the 'datasets' bucket.

In [None]:
# Upload dataset to MinIO
bucket_name = 'datasets'
object_name = 'sample_dataset.csv'

try:
    # Upload file
    minio_client.fput_object(
        bucket_name=bucket_name,
        object_name=object_name,
        file_path=dataset_file
    )
    print(f'✅ Dataset uploaded successfully to {bucket_name}/{object_name}')
except S3Error as e:
    print(f'❌ Error uploading dataset: {e}')
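
The upload above assumes the `datasets` bucket already exists. If it might not, a small guard built on the client's `bucket_exists` and `make_bucket` calls can create it first. `ensure_bucket` is a hypothetical helper name used here for illustration.

```python
def ensure_bucket(client, name: str) -> bool:
    """Create the bucket if it is missing; return True when it was created.

    `client` is expected to be a minio.Minio instance (or anything exposing
    bucket_exists and make_bucket with the same meaning).
    """
    if client.bucket_exists(name):
        return False
    client.make_bucket(name)
    return True
```

Calling `ensure_bucket(minio_client, bucket_name)` before `fput_object` would make the upload cell safe to run on a fresh MinIO instance.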

## 5. List Objects in Bucket

Let's see what's in our datasets bucket now.

In [None]:
# List objects in datasets bucket
try:
    objects = minio_client.list_objects(bucket_name, recursive=True)
    print(f'📋 Objects in {bucket_name} bucket:')
    
    total_size = 0
    for obj in objects:
        size_kb = obj.size / 1024
        total_size += obj.size
        print(f'  • {obj.object_name} ({size_kb:.2f} KB, {obj.last_modified})')
    
    print(f'\n📊 Total size: {total_size / 1024:.2f} KB')
except S3Error as e:
    print(f'❌ Error listing objects: {e}')

## 6. Download and Load Dataset

Let's download the dataset from MinIO and load it back into a DataFrame.

In [None]:
# Download dataset from MinIO
downloaded_file = '../data/downloaded_dataset.csv'

try:
    minio_client.fget_object(
        bucket_name=bucket_name,
        object_name=object_name,
        file_path=downloaded_file
    )
    print(f'✅ Dataset downloaded to: {downloaded_file}')
    
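    # Load downloaded dataset; round-trip float parsing keeps values bit-identical
    # to the originals so the equality check below is meaningful
    df_downloaded = pd.read_csv(downloaded_file, float_precision='round_trip')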
    print(f'📊 Downloaded dataset shape: {df_downloaded.shape}')
    
    # Verify data integrity
    if df.equals(df_downloaded):
        print('✅ Data integrity verified - datasets match!')
    else:
        print('⚠️ Data integrity check failed - datasets differ')
        
except S3Error as e:
    print(f'❌ Error downloading dataset: {e}')
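
Comparing DataFrames checks the parsed values. A complementary check is to compare the files byte for byte by hashing them. The sketch below uses only the standard library, and `sha256_of_file` is a hypothetical helper name.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

With the variables from this notebook, `sha256_of_file(dataset_file) == sha256_of_file(downloaded_file)` should hold if the object was stored and retrieved unchanged.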

## 7. Train a Simple Model

Let's train a simple model and save it to MinIO.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
import joblib

# Prepare data
X = df[['feature_1', 'feature_2', 'feature_3']]
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(f'🎯 Training set size: {X_train.shape[0]}')
print(f'🎯 Test set size: {X_test.shape[0]}')

In [None]:
# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f'🎯 Model accuracy: {accuracy:.4f}')
print('\n📊 Classification Report:')
print(classification_report(y_test, y_pred))

## 8. Save Model to MinIO

Now let's save our trained model to the 'models' bucket.

In [None]:
# Save model locally first
model_file = '../models/random_forest_model.pkl'
os.makedirs('../models', exist_ok=True)

joblib.dump(model, model_file)
print(f'💾 Model saved locally: {model_file}')

# Upload model to MinIO
model_bucket = 'models'
model_object = f'random_forest_model_{datetime.now().strftime("%Y%m%d_%H%M%S")}.pkl'

try:
    minio_client.fput_object(
        bucket_name=model_bucket,
        object_name=model_object,
        file_path=model_file
    )
    print(f'✅ Model uploaded to {model_bucket}/{model_object}')
except S3Error as e:
    print(f'❌ Error uploading model: {e}')

## 9. Save Experiment Results

Let's save experiment metadata to the 'experiments' bucket.

In [None]:
# Create experiment results
experiment_results = {
    'experiment_id': f'exp_{datetime.now().strftime("%Y%m%d_%H%M%S")}',
    'timestamp': datetime.now().isoformat(),
    'model_type': 'RandomForestClassifier',
    'parameters': {
        'n_estimators': 100,
        'random_state': 42
    },
    'metrics': {
        'accuracy': float(accuracy),
        'train_size': len(X_train),
        'test_size': len(X_test)
    },
    'dataset': 'sample_dataset.csv',
    'model_file': model_object
}

print('📋 Experiment results:')
print(json.dumps(experiment_results, indent=2))

In [None]:
# Save experiment results
results_file = '../data/experiment_results.json'

with open(results_file, 'w') as f:
    json.dump(experiment_results, f, indent=2)

# Upload to MinIO
experiments_bucket = 'experiments'
results_object = f"experiment_{experiment_results['experiment_id']}.json"

try:
    minio_client.fput_object(
        bucket_name=experiments_bucket,
        object_name=results_object,
        file_path=results_file
    )
    print(f'✅ Experiment results uploaded to {experiments_bucket}/{results_object}')
except S3Error as e:
    print(f'❌ Error uploading results: {e}')
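
For small payloads such as this JSON document, the temporary file can be skipped by streaming the bytes with the client's `put_object` call. The sketch below assumes `client` is a `minio.Minio` instance. `upload_json` is a hypothetical helper name, not a library function.

```python
import io
import json


def upload_json(client, bucket: str, object_name: str, payload: dict) -> int:
    """Serialize `payload` to JSON and upload it from memory; return the byte count."""
    data = json.dumps(payload, indent=2).encode('utf-8')
    client.put_object(
        bucket,
        object_name,
        io.BytesIO(data),      # put_object reads from a file-like object
        length=len(data),      # the size must be given when it is known
        content_type='application/json',
    )
    return len(data)
```

In this notebook the call would look like `upload_json(minio_client, experiments_bucket, results_object, experiment_results)`.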

## 10. Load Model from MinIO

Finally, let's demonstrate loading a model back from MinIO.

In [None]:
# List available models
print('🤖 Available models in MinIO:')

try:
    model_objects = minio_client.list_objects(model_bucket, recursive=True)
    model_list = []
    
    for obj in model_objects:
        model_list.append(obj.object_name)
        size_kb = obj.size / 1024
        print(f'  • {obj.object_name} ({size_kb:.2f} KB)')
        
except S3Error as e:
    print(f'❌ Error listing models: {e}')
    model_list = []

In [None]:
# Download and load the latest model
if model_list:
    latest_model = sorted(model_list)[-1]  # Timestamped names sort chronologically, so the last one is newest
    downloaded_model_file = '../models/downloaded_model.pkl'
    
    try:
        minio_client.fget_object(
            bucket_name=model_bucket,
            object_name=latest_model,
            file_path=downloaded_model_file
        )
        
        # Load the model
        loaded_model = joblib.load(downloaded_model_file)
        
        print(f'✅ Model loaded from MinIO: {latest_model}')
        
        # Test the loaded model
        test_predictions = loaded_model.predict(X_test[:5])
        print(f'🧪 Test predictions: {test_predictions}')
        
        # Verify model equivalence
        original_pred = model.predict(X_test[:5])
        if np.array_equal(test_predictions, original_pred):
            print('✅ Model integrity verified - predictions match!')
        else:
            print('⚠️ Model integrity check failed')
            
    except S3Error as e:
        print(f'❌ Error loading model: {e}')
else:
    print('❌ No models found in MinIO')
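
Sorting by name works here because every model name carries a timestamp suffix. If the bucket may also hold objects with other naming schemes, selecting by the `last_modified` attribute of the listed objects is a possible alternative. `latest_object_name` below is a hypothetical helper.

```python
def latest_object_name(objects):
    """Return the name of the most recently modified object, or None if there is none.

    `objects` is any iterable of items exposing `object_name` and `last_modified`,
    such as the result of minio_client.list_objects(...).
    """
    # Entries without a modification time are skipped.
    candidates = [obj for obj in objects if obj.last_modified is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda obj: obj.last_modified).object_name
```

For example, `latest_object_name(minio_client.list_objects(model_bucket, recursive=True))` would pick the newest upload regardless of its name.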

## 11. Generate Presigned URLs

Presigned URLs let anyone who holds the link download an object until the link expires, without needing MinIO credentials.

In [None]:
from datetime import timedelta

# Generate presigned URL for dataset (valid for 1 hour)
try:
    dataset_url = minio_client.presigned_get_object(
        bucket_name=bucket_name,
        object_name=object_name,
        expires=timedelta(hours=1)
    )
    
    print('🔗 Presigned URLs (valid for 1 hour):')
    print(f'📊 Dataset: {dataset_url[:100]}...')
    
    if model_list:
        model_url = minio_client.presigned_get_object(
            bucket_name=model_bucket,
            object_name=latest_model,
            expires=timedelta(hours=1)
        )
        print(f'🤖 Model: {model_url[:100]}...')
        
except S3Error as e:
    print(f'❌ Error generating presigned URLs: {e}')
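
To see when a link stops working, the expiry can be read back from the URL itself. The sketch below assumes the URL carries the standard Signature Version 4 query parameters `X-Amz-Date` and `X-Amz-Expires`. `presigned_expiry` is a hypothetical helper and the URL in the usage note is made up.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_expiry(url: str) -> datetime:
    """Return the UTC time at which a SigV4 presigned URL expires."""
    query = parse_qs(urlsplit(url).query)
    # X-Amz-Date is the signing time in UTC, e.g. 20240101T120000Z.
    signed_at = datetime.strptime(query['X-Amz-Date'][0], '%Y%m%dT%H%M%SZ')
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    # X-Amz-Expires is the validity window in seconds.
    return signed_at + timedelta(seconds=int(query['X-Amz-Expires'][0]))
```

Calling `presigned_expiry(dataset_url)` on the URL generated above should return a time roughly one hour after the cell was run.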

## 🎉 Summary

In this notebook, we demonstrated:

- ✅ **Connected** to MinIO object storage
- ✅ **Uploaded** datasets and models
- ✅ **Downloaded** and verified data integrity
- ✅ **Trained** and stored ML models
- ✅ **Saved** experiment metadata
- ✅ **Generated** presigned URLs for sharing

### Next Steps:
- Integrate MinIO with your ML pipeline
- Set up automated model versioning
- Configure bucket policies for team access
- Implement data lifecycle management

### MinIO Console:
Visit http://sites/minio to explore your data visually!