# Pickling Models for Persistence

This notebook demonstrates how to pickle single-GPU and multi-GPU cuML models for persistence.

## Single GPU Model Pickling

All single-GPU estimators are pickleable. The following example creates a synthetic dataset, trains a model, and pickles the fitted model for storage. Trained single-GPU models can also be used for distributed inference on a Dask cluster, as described in the [Distributed Model Pickling](#distributed-model-pickling) section below.

In [None]:
from cuml.datasets import make_blobs

# Generate synthetic dataset for clustering
X, y = make_blobs(
    n_samples=50,
    n_features=10,
    centers=5,
    cluster_std=0.4,
    random_state=0
)

In [None]:
from cuml.cluster import KMeans as cuKMeans

# Initialize and fit KMeans model
kmeans = cuKMeans(n_clusters=5).fit(X)

We recommend using pickle protocol 5, which is more efficient, especially for models that hold large arrays.
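
Protocol 5 (PEP 574) adds support for out-of-band buffers, which let the pickler hand large memory buffers to the caller instead of copying them into the pickle stream. The sketch below uses only the standard library, with a `bytearray` standing in for a large array. The payload and variable names are illustrative, and whether a given cuML estimator takes this path depends on how it serializes its arrays.

```python
import pickle

# Stand-in for a large array buffer held by a fitted model (hypothetical data)
data = bytearray(b"\x01" * 1_000_000)

# Protocol 4 copies the whole payload into the pickle stream
in_band = pickle.dumps(data, protocol=4)

# Protocol 5 can pass large buffers to a callback instead of copying them
buffers = []
header = pickle.dumps(
    pickle.PickleBuffer(data), protocol=5, buffer_callback=buffers.append
)

print(len(in_band))  # a little over one million bytes
print(len(header))   # tiny: the payload lives in `buffers`, not in the stream

# The same buffers must be supplied again when loading
restored = pickle.loads(header, buffers=buffers)
assert bytes(restored) == bytes(data)
```

When writing to a file with `pickle.dump(..., protocol=5)` and no `buffer_callback`, buffers are still stored inside the file, so the result remains a single self-contained pickle.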

In [None]:
import pickle

# Save the fitted model to disk
with open("kmeans_model.pkl", "wb") as f:
    pickle.dump(kmeans, f, protocol=5)

In [None]:
# Load the model from disk
with open("kmeans_model.pkl", "rb") as f:
    kmeans_loaded_model = pickle.load(f)

In [None]:
# Display the loaded model's cluster centers
kmeans_loaded_model.cluster_centers_

## Using joblib for Model Serialization

joblib is a popular alternative to pickle for serializing machine learning models. It is optimized for objects that hold large NumPy arrays, supports optional compression, and can memory-map uncompressed arrays when loading, which can make it faster than plain pickle for array-heavy models. This makes it a good fit for cuML models with many fitted parameters.


Note that files written by pickle and joblib are often mutually loadable, but we do not recommend relying on that.
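
As a minimal sketch of the joblib workflow, the example below writes an object with `joblib.dump` and reads it back with `joblib.load`, so that the same library handles both directions. The dictionary of NumPy arrays is a hypothetical stand-in for a fitted estimator, and the file name and `compress=3` setting are illustrative choices.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical stand-in for a fitted model's learned attributes
fitted_state = {"cluster_centers_": np.zeros((5, 10), dtype=np.float32)}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "fitted_state.joblib")

    # compress takes an integer level from 0 (off) to 9 (smallest files)
    joblib.dump(fitted_state, path, compress=3)

    # Read the file back with joblib, the same library that wrote it
    loaded_state = joblib.load(path)

print(loaded_state["cluster_centers_"].shape)  # (5, 10)
```

Compressed files cannot be memory-mapped, so leave `compress` at its default of 0 if you plan to pass `mmap_mode` to `joblib.load`.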

In [None]:
import joblib

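# Save the fitted model with joblib, then load it back with joblib
joblib.dump(kmeans, "kmeans_model.joblib")
kmeans_loaded_model = joblib.load("kmeans_model.joblib")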
kmeans_loaded_model.cluster_centers_

## Distributed Model Pickling

When working with distributed cuML models using Dask, the distributed estimator wrappers in `cuml.dask` are not designed to be pickled directly. Instead, cuML provides a specialized workflow for persisting distributed models:

1. **Extract the combined model**: After training a distributed model, call its `get_combined_model()` method to extract a single-GPU version of the trained model.
2. **Serialize the combined model**: The extracted single-GPU model can be serialized with pickle or joblib, just like any other single-GPU cuML model.
3. **Flexible inference**: The saved model can be used for inference in multiple ways:
   - **Single-GPU inference**: Load the model directly for single-GPU predictions
   - **Distributed inference**: Use the `ParallelPostFit` wrapper from [Dask-ML](https://ml.dask.org/meta-estimators.html) to distribute inference across a Dask cluster

This approach lets you size the resources for training and for inference independently.

In [None]:
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Set up Dask cluster
cluster = LocalCUDACluster()
client = Client(cluster)
client

In [None]:
from cuml.dask.datasets import make_blobs

# Get number of workers
n_workers = len(client.scheduler_info()["workers"])

# Generate distributed dataset
X, y = make_blobs(
    n_samples=5000,
    n_features=30,
    centers=5,
    cluster_std=0.4,
    random_state=0,
    n_parts=n_workers * 5
)

# Persist data in memory
X = X.persist()
y = y.persist()

In [None]:
from cuml.dask.cluster import KMeans as cuDaskKMeans

# Initialize and train the distributed KMeans model
distributed_kmeans = cuDaskKMeans(n_clusters=5).fit(X)

In [None]:
# Extract single-GPU model and save it
combined_kmeans = distributed_kmeans.get_combined_model()
with open("kmeans_model.pkl", "wb") as f:
    pickle.dump(combined_kmeans, f, protocol=5)

In [None]:
# Load the single-GPU model
with open("kmeans_model.pkl", "rb") as f:
    combined_kmeans_loaded_model = pickle.load(f)

In [None]:
# Display the loaded model's cluster centers
combined_kmeans_loaded_model.cluster_centers_

## Converting Between cuML and scikit-learn Models

Many cuML estimators provide `as_sklearn()` and `from_sklearn()` methods that convert directly to and from native scikit-learn estimators. This is useful for serialization scenarios that require maximum compatibility, and for hybrid deployment workflows in which you train models on GPU-accelerated systems and then run inference in CPU-only environments without installing cuML on the target machine.

In [None]:
from cuml.cluster import KMeans as cuKMeans
from cuml.datasets import make_blobs
from cuml.metrics.cluster import adjusted_rand_score
import pickle
import numpy as np

# Generate synthetic dataset for clustering
X, y = make_blobs(
    n_samples=1000,
    n_features=20,
    centers=5,
    cluster_std=0.5,
    random_state=42
)

# Train cuML KMeans
kmeans = cuKMeans(n_clusters=5, random_state=42).fit(X)

# Make predictions with cuML model
cu_predictions = kmeans.predict(X)
cu_ari_score = adjusted_rand_score(y, cu_predictions)
print(f"cuML KMeans ARI score: {cu_ari_score:.4f}")
print(f"cuML KMeans cluster centers shape: {kmeans.cluster_centers_.shape}")

In [None]:
# Convert cuML model to scikit-learn model
kmeans_sklearn = kmeans.as_sklearn()
print(f"Converted to scikit-learn model: {type(kmeans_sklearn)}")

# Save scikit-learn model to disk
with open("kmeans_model_sklearn.pkl", "wb") as f:
    pickle.dump(kmeans_sklearn, f, protocol=5)
print("scikit-learn KMeans model saved with pickle")


In [None]:
from cupy import asnumpy

# Load scikit-learn model and verify prediction quality
with open("kmeans_model_sklearn.pkl", "rb") as f:
    kmeans_loaded_sklearn = pickle.load(f)
sklearn_predictions = kmeans_loaded_sklearn.predict(asnumpy(X))
sklearn_ari_score = adjusted_rand_score(y, sklearn_predictions)
print(f"Loaded sklearn KMeans ARI score: {sklearn_ari_score:.4f}")


You can also construct a cuML model from a scikit-learn model. This is especially useful if you are working with a pre-trained model and want to run faster inference on GPUs.

In [None]:
# Re-construct the cuML model from the scikit-learn model
kmeans_from_sklearn = cuKMeans.from_sklearn(kmeans_loaded_sklearn)
cu_predictions = kmeans_from_sklearn.predict(X)
print("Re-constructed cuML KMeans ARI Score: ", adjusted_rand_score(y, cu_predictions))

## Exporting cuML Random Forest Models for Inference on Machines Without GPUs

You can export cuML Random Forest models and run predictions on machines without NVIDIA GPUs using the [Treelite](https://github.com/dmlc/treelite) library. This enables you to train on GPU-accelerated systems and deploy on CPU-only infrastructure.

### Export Process

1. **Convert to Treelite format**: Use `as_treelite()` to transform your cuML Random Forest model
2. **Serialize the model**: Call `.serialize()` to create a portable checkpoint file
3. **Deploy anywhere**: Install Treelite on the target machine and load the model for inference

Treelite provides optimized CPU inference with efficient serialization, making it ideal for production deployment.

In [None]:
from cuml.ensemble import RandomForestClassifier as cuRandomForestClassifier
from sklearn.datasets import load_iris
import numpy as np

# Load and prepare iris dataset
X, y = load_iris(return_X_y=True)
X, y = X.astype(np.float32), y.astype(np.int32)

# Train Random Forest model
random_forest = cuRandomForestClassifier(max_depth=3, random_state=0, n_estimators=10).fit(X, y)

# Export cuML RF model as Treelite checkpoint
treelite_checkpoint_path = "./checkpoint.tl"
random_forest.as_treelite().serialize(treelite_checkpoint_path)

Next, copy the generated checkpoint file `checkpoint.tl` to the machine on which you'd like to run predictions.

On the target machine, install Treelite by running `pip install treelite` or `conda install -c conda-forge treelite`. The machine needs neither an NVIDIA GPU nor a cuML installation.

You can now load the model from the checkpoint by running the following on the target machine:

In [None]:
import treelite

# Load the Treelite model (checkpoint file has been copied over)
treelite_checkpoint_path = "./checkpoint.tl"
treelite_model = treelite.Model.deserialize(treelite_checkpoint_path)

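# Make predictions using Treelite. X is the feature array from the training
# cell above; on the target machine, load your own float32 data instead.
predictions = treelite.gtil.predict(treelite_model, X, pred_margin=True)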
print(predictions)