# Model Serialization and Persistence

This notebook demonstrates how to save and load cuML models with pickle and joblib, and how to export trained models for cross-platform deployment.

## Single GPU Model Serialization

All single-GPU cuML estimators support serialization using standard Python libraries. This section demonstrates:

1. **Training a model** on synthetic data
2. **Saving the model** using pickle and joblib
3. **Loading the model** for future use

Trained single-GPU models can also be used for distributed inference on Dask clusters, as shown in the [Distributed Model Serialization](#distributed-model-serialization) section.

In [None]:
from cuml.cluster import KMeans
from cuml.datasets import make_blobs

# Generate synthetic dataset for clustering
X, y = make_blobs(
    n_samples=50, n_features=10, centers=5, cluster_std=0.4, random_state=0
)
# Initialize and fit KMeans model
kmeans = KMeans(n_clusters=5).fit(X)

**Recommendation:** Use pickle protocol 5 (available in Python 3.8 and later) when saving models that hold large arrays. Protocol 5 adds support for out-of-band buffers, which lets libraries such as NumPy avoid extra memory copies when serializing large arrays.

In [None]:
import pickle

# Save the fitted model to disk
with open("kmeans_model.pkl", "wb") as output_file:
    pickle.dump(kmeans, output_file, protocol=5)
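
To make the protocol 5 recommendation more concrete, the following sketch uses only the standard library to show the out-of-band buffer mechanism that protocol 5 introduced. The `payload` bytearray is a hypothetical stand-in for a large array held by a model. The sketch illustrates the mechanism only; it does not show whether or how a particular cuML estimator uses it.

```python
import pickle

# Hypothetical stand-in for a large array held by a fitted model.
payload = bytearray(b"\x01" * 100_000)

# In-band: the buffer contents are copied into the pickle stream.
in_band = pickle.dumps(pickle.PickleBuffer(payload), protocol=5)

# Out-of-band: the stream holds only a placeholder, and the buffer itself
# is handed to the callback instead of being copied into the stream.
buffers = []
out_of_band = pickle.dumps(
    pickle.PickleBuffer(payload), protocol=5, buffer_callback=buffers.append
)

# Loading needs the same buffers, supplied in the same order.
restored = pickle.loads(out_of_band, buffers=buffers)

print("in-band stream size:", len(in_band))
print("out-of-band stream size:", len(out_of_band))
print("round trip matches:", bytes(restored) == bytes(payload))
```

When writing to a file with `pickle.dump(..., protocol=5)` and no `buffer_callback`, as in the cell above, buffers are stored in-band, so the file remains self-contained.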

**Important:** A pickled model can only be restored with the same cuML version that was used to train and save it. If you need to load models across different cuML versions, consider using the [scikit-learn conversion](#converting-between-cuml-and-scikit-learn-models) approach instead.

In [None]:
# Load the model from disk
with open("kmeans_model.pkl", "rb") as input_file:
    kmeans_loaded_model = pickle.load(input_file)

# Display the loaded model's cluster centers
kmeans_loaded_model.cluster_centers_
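
Because a pickled model is tied to the library version that produced it, one possible safeguard is to store the version string in the same file and check it before unpickling the model. The sketch below is a minimal illustration under stated assumptions: `save_with_version` and `load_with_version` are hypothetical helpers (not part of cuML), the model is a plain dict stand-in, and the version strings are made up.

```python
import os
import pickle
import tempfile


def save_with_version(model, path, library_version):
    """Write the version string first, then the model, as two pickles."""
    with open(path, "wb") as f:
        pickle.dump(library_version, f, protocol=5)
        pickle.dump(model, f, protocol=5)


def load_with_version(path, current_version):
    """Read the version first and only unpickle the model if it matches."""
    with open(path, "rb") as f:
        saved_version = pickle.load(f)
        if saved_version != current_version:
            raise RuntimeError(
                f"Model was saved with version {saved_version}, "
                f"but version {current_version} is installed."
            )
        return pickle.load(f)


# Hypothetical stand-ins: a plain dict instead of a fitted estimator, and
# made-up version strings instead of the installed library version.
stand_in_model = {"cluster_centers": [[0.0, 1.0], [2.0, 3.0]]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_with_version.pkl")
    save_with_version(stand_in_model, path, "1.0.0")

    # Matching version: the model is returned.
    restored = load_with_version(path, "1.0.0")

    # Mismatched version: the helper raises before touching the model.
    try:
        load_with_version(path, "2.0.0")
        mismatch_detected = False
    except RuntimeError:
        mismatch_detected = True

print(restored == stand_in_model, mismatch_detected)
```

With a real cuML model, the version argument would presumably be `cuml.__version__`. Writing the version as a separate pickle ahead of the model means the check runs before any attempt to unpickle the estimator itself.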

## Using joblib for Model Serialization

joblib is an alternative to pickle that is optimized for Python objects carrying large NumPy arrays, which makes it a common choice for machine learning models. It offers:

- **Efficient handling of large arrays** inside the serialized object
- **Optional compression** through the `compress` argument of `joblib.dump`, useful for models with many parameters
- **Memory mapping** of NumPy arrays through the `mmap_mode` argument of `joblib.load` (uncompressed files only)

**Note:** While pickle and joblib files are often compatible, we recommend using the same library for both saving and loading to ensure reliability.

In [None]:
import joblib

joblib.dump(kmeans, "kmeans_model.joblib")

Then reload the model with joblib.

In [None]:
kmeans_loaded_model = joblib.load("kmeans_model.joblib")
kmeans_loaded_model.cluster_centers_

## Distributed Model Serialization

When working with distributed cuML models using Dask, the distributed estimator wrappers in `cuml.dask` are not designed to be pickled directly. Instead, cuML provides a specialized workflow:

### Workflow Steps

1. **Extract the combined model**: Use `get_combined_model()` to extract a single-GPU version of the trained distributed model
2. **Serialize the combined model**: Save the extracted model using pickle or joblib (same as any cuML model)
3. **Flexible inference**: Use the saved model in multiple ways:
   - **Single-GPU inference**: Load directly for single-GPU predictions
   - **Distributed inference**: Use `ParallelPostFit` from [Dask-ML](https://ml.dask.org/meta-estimators.html) to distribute inference across a Dask cluster

This approach allows you to choose the optimal resources for both training and inference phases.

In [None]:
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Set up Dask cluster
cluster = LocalCUDACluster()
client = Client(cluster)

In [None]:
from cuml.dask.datasets import make_blobs
from cuml.dask.cluster import KMeans as DistributedKMeans

# Get number of workers
n_workers = client.scheduler_info()["n_workers"]

# Generate distributed dataset
X, y = make_blobs(
    n_samples=5000,
    n_features=30,
    centers=5,
    cluster_std=0.4,
    random_state=0,
    # 5 parts per worker to demonstrate distributed inference
    n_parts=n_workers * 5,
)

# Initialize and train the distributed KMeans model
distributed_kmeans = DistributedKMeans(n_clusters=5).fit(X)

Now we can save it with pickle like before, but we have to _combine_ it into a non-distributed model first.

In [None]:
# Extract single-GPU model and save it
combined_kmeans = distributed_kmeans.get_combined_model()

with open("kmeans_model.pkl", "wb") as output_file:
    pickle.dump(combined_kmeans, output_file, protocol=5)

And we can reload this model just like before.

In [None]:
# Load the single-GPU model
with open("kmeans_model.pkl", "rb") as input_file:
    combined_kmeans_loaded_model = pickle.load(input_file)

# Display the first 3 rows of the loaded model's cluster centers
combined_kmeans_loaded_model.cluster_centers_[:3]

## Converting Between cuML and scikit-learn Models

Many cuML estimators provide `as_sklearn()` and `from_sklearn()` methods for seamless conversion between cuML and scikit-learn formats.

### Use Cases

- **Cross-platform deployment**: Train on GPU systems, deploy on CPU-only machines
- **Maximum compatibility**: Use standard scikit-learn serialization tools
- **Hybrid workflows**: Mix cuML and scikit-learn in the same pipeline
- **Legacy integration**: Convert existing scikit-learn models to cuML for GPU acceleration

This approach eliminates the need to install cuML on deployment machines while maintaining model compatibility.

In [None]:
import pickle

from cuml.cluster import KMeans
from cuml.datasets import make_blobs
from cuml.metrics.cluster import adjusted_rand_score

# Generate synthetic dataset for clustering
X, y = make_blobs(
    n_samples=1000, n_features=20, centers=5, cluster_std=0.5, random_state=42
)

# Train cuML KMeans
kmeans = KMeans(n_clusters=5, random_state=42).fit(X)

# Make predictions with cuML model
predictions = kmeans.predict(X)
score = adjusted_rand_score(y, predictions)
print(f"cuML KMeans ARI score: {score:.4f}")
print(f"cuML KMeans cluster centers shape: {kmeans.cluster_centers_.shape}")

We can convert this cuML model into a native scikit-learn estimator using the `as_sklearn()` method. This enables standard scikit-learn serialization and deployment on any Python environment.

In [None]:
# Convert cuML model to scikit-learn model
kmeans_sklearn = kmeans.as_sklearn()
print(f"Converted to scikit-learn model: {type(kmeans_sklearn)}")

# Save scikit-learn model to disk (the context manager closes the file)
with open("kmeans_model_sklearn.pkl", "wb") as output_file:
    pickle.dump(kmeans_sklearn, output_file, protocol=5)
print("scikit-learn KMeans model saved with pickle")

The pickled scikit-learn model can be loaded and run in any Python environment that has a compatible version of scikit-learn installed. Neither cuML nor a GPU is required.

In [None]:
from cupy import asnumpy

# Load scikit-learn model and verify prediction quality
with open("kmeans_model_sklearn.pkl", "rb") as input_file:
    kmeans_loaded_sklearn = pickle.load(input_file)
sklearn_predictions = kmeans_loaded_sklearn.predict(asnumpy(X))
sklearn_score = adjusted_rand_score(y, sklearn_predictions)
print(f"Loaded sklearn KMeans ARI score: {sklearn_score:.4f}")

You can also reconstruct a cuML model from a scikit-learn model using `from_sklearn()`. This is particularly useful for:

- **Pre-trained models**: Convert existing scikit-learn models for GPU acceleration
- **Performance optimization**: Run faster inference on GPU hardware
- **Hybrid workflows**: Switch between CPU and GPU execution as needed

In [None]:
# Re-construct the cuML model from the scikit-learn model
kmeans_from_sklearn = KMeans.from_sklearn(kmeans_loaded_sklearn)
predictions = kmeans_from_sklearn.predict(X)
score = adjusted_rand_score(y, predictions)
print(f"Re-constructed cuML KMeans ARI score: {score:.4f}")

## Exporting Random Forest Models for CPU-Only Deployment

You can export cuML Random Forest models for deployment on machines without NVIDIA GPUs using the [Treelite](https://github.com/dmlc/treelite) library.

### Benefits

- **CPU-only deployment**: Run trained models on any machine
- **Optimized inference**: Treelite provides highly optimized CPU inference
- **Small footprint**: No cuML or GPU dependencies required
- **Production ready**: Efficient serialization and fast loading

### Export Process

1. **Convert to Treelite format**: Use `as_treelite()` to transform your cuML Random Forest model
2. **Serialize the model**: Call `.serialize()` to create a portable checkpoint file
3. **Deploy anywhere**: Install Treelite on the target machine and load the model for inference

In [None]:
import numpy as np
from cuml.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

# Load and prepare iris dataset
X, y = load_iris(return_X_y=True)
X, y = X.astype(np.float32), y.astype(np.int32)

# Train Random Forest model
random_forest = RandomForestClassifier(
    max_depth=3, random_state=0, n_estimators=10
).fit(X, y)

# Export cuML RF model as Treelite checkpoint
treelite_checkpoint_path = "./checkpoint.tl"
random_forest.as_treelite().serialize(treelite_checkpoint_path)

### Deployment Steps

1. **Copy the checkpoint file**: Transfer `checkpoint.tl` to your target machine
2. **Install Treelite**: Run `pip install treelite` or `conda install -c conda-forge treelite`
   - No NVIDIA GPUs required
   - No cuML installation needed
3. **Load and use the model**: Run the code below on the target machine

In [None]:
import treelite

# Load the Treelite model (checkpoint file has been copied over)
treelite_checkpoint_path = "./checkpoint.tl"
treelite_model = treelite.Model.deserialize(treelite_checkpoint_path)

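# Make predictions using Treelite. Here X is the feature array from the
# training cell; on a separate machine, load your own input data with the
# same feature layout before calling predict.
predictions = treelite.gtil.predict(treelite_model, X, pred_margin=True)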
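
Before loading a checkpoint that was copied from another machine, an optional precaution is to confirm that the file arrived intact by comparing checksums computed on both sides. The sketch below is one way to do that with the standard library. `sha256_of_file` is a hypothetical helper, and the file written here contains stand-in bytes rather than a real Treelite checkpoint.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Stand-in for the exported checkpoint; a real one would come from
# the export cell earlier in this section.
stand_in_bytes = b"stand-in checkpoint bytes"

with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint_path = os.path.join(tmp_dir, "checkpoint.tl")
    with open(checkpoint_path, "wb") as f:
        f.write(stand_in_bytes)

    # Compute this on the training machine and again on the target
    # machine, then compare the two strings.
    checkpoint_digest = sha256_of_file(checkpoint_path)

print(checkpoint_digest)
```

If the digest computed on the target machine differs from the one computed on the training machine, the file was likely truncated or altered in transit and should be copied again.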