# Pickling Models for Persistence

This notebook demonstrates how to pickle both single-GPU and multi-GPU cuML models for persistence.

In [None]:
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

## Single GPU Model Pickling

All single-GPU estimators are pickleable. The following example creates a synthetic dataset, trains a model on it, and pickles the fitted model for storage. Trained single-GPU models can also be used for distributed inference on a Dask cluster, as described in the `Distributed Model Pickling` section below.

In [None]:
from cuml.datasets import make_blobs

# Generate synthetic dataset for clustering
X, y = make_blobs(
    n_samples=50,
    n_features=10,
    centers=5,
    cluster_std=0.4,
    random_state=0
)

In [None]:
from cuml.cluster import KMeans as cuKMeans

# Initialize and fit KMeans model
kmeans_model = cuKMeans(n_clusters=5)
kmeans_fitted = kmeans_model.fit(X)

We recommend using pickle protocol 5, which is more efficient for large arrays and therefore for models that hold them.
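
One reason protocol 5 helps with large arrays is its support for out-of-band buffers (PEP 574), available from Python 3.8. The sketch below uses only the standard library, with a plain `bytearray` standing in for a model's array data. That stand-in is an assumption for illustration; it does not show how cuML serializes its arrays internally.

```python
import pickle

# Stand-in for a large model array (assumption: real models hold array
# objects rather than a bare bytearray).
payload = bytearray(1_000_000)

# In-band: the buffer contents are copied into the pickle stream.
in_band = pickle.dumps(pickle.PickleBuffer(payload), protocol=5)

# Out-of-band: the stream only records a placeholder, and the buffer itself
# is handed to buffer_callback instead of being copied into the stream.
buffers = []
out_of_band = pickle.dumps(
    pickle.PickleBuffer(payload), protocol=5, buffer_callback=buffers.append
)

# The same buffers must be supplied again, in the same order, when loading.
restored = pickle.loads(out_of_band, buffers=buffers)

print(f"in-band stream: {len(in_band)} bytes")
print(f"out-of-band stream: {len(out_of_band)} bytes, buffers: {len(buffers)}")
```

With out-of-band buffers the pickle stream stays small, and the consumer decides how the large buffers are transported or stored.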

In [None]:
import pickle

# Save the fitted model to disk (the context manager closes the file handle)
with open("kmeans_model.pkl", "wb") as f:
    pickle.dump(kmeans_fitted, f, protocol=5)

In [None]:
# Load the model from disk
with open("kmeans_model.pkl", "rb") as f:
    kmeans_loaded = pickle.load(f)

In [None]:
# Display the loaded model's cluster centers
kmeans_loaded.cluster_centers_

## Using joblib for Model Serialization

joblib is another popular library for serializing Python objects, particularly NumPy arrays and scikit-learn models. It is often faster than pickle for objects that contain large arrays, and it supports optional compression. cuML models can also be serialized using joblib.

The following example demonstrates how to use joblib to save and load cuML models:


In [None]:
from joblib import dump, load
from cuml.ensemble import RandomForestClassifier as cuRF
from cuml.datasets.classification import make_classification
from cuml.model_selection import train_test_split
from cuml.metrics import accuracy_score
import numpy as np

# Generate synthetic dataset for classification
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_classes=2,
    random_state=42
)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train Random Forest model
rf_model = cuRF(n_estimators=100, max_depth=10, random_state=42)
rf_fitted = rf_model.fit(X_train, y_train)

# Make predictions and evaluate
predictions = rf_fitted.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Model accuracy: {accuracy:.4f}")


In [None]:
# Save the fitted model using joblib
model_filename = "rf_model.joblib"
dump(rf_fitted, model_filename)
print(f"Model saved to {model_filename}")


In [None]:
# Load the model from disk using joblib
loaded_model = load(model_filename)
print("Model loaded successfully")

# Verify the loaded model works by making predictions
loaded_predictions = loaded_model.predict(X_test)
loaded_accuracy = accuracy_score(y_test, loaded_predictions)
print(f"Loaded model accuracy: {loaded_accuracy:.4f}")

# Verify predictions are identical
predictions_match = np.array_equal(predictions, loaded_predictions)
print(f"Predictions match: {predictions_match}")


### Advantages of joblib over pickle

- **Optional compression**: `dump` accepts a `compress` argument, which can produce smaller files at the cost of extra CPU time
- **Faster serialization**: Particularly efficient for large NumPy arrays and scikit-learn-style models
- **Memory efficient**: Uncompressed files containing NumPy arrays can be memory-mapped on load instead of being read fully into memory
- **Cross-platform use**: Files can be moved between operating systems, but because joblib builds on pickle, they should be loaded with the same Python and library versions that wrote them

### When to use joblib vs pickle

- Use **joblib** when working with machine learning models, especially those with large arrays (like Random Forest models with many trees)
- Use **pickle** for general Python object serialization or when you need maximum compatibility
- Both work well with cuML models, so the choice often comes down to personal preference and specific use case requirements
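
To get a feel for what compression can buy, the following sketch compares an uncompressed pickle with a gzip-compressed one using only the standard library. The `model_state` dictionary is a hypothetical stand-in for a fitted model, so the sizes it prints say nothing about real cuML models. joblib exposes the same trade-off through the `compress` argument of `dump`.

```python
import gzip
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model: repetitive numeric state
# compresses well, which real model parameters may or may not do.
model_state = {
    "n_clusters": 5,
    "centers": [[float(i % 7)] * 10 for i in range(5000)],
}

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "model.pkl")
    gz_path = os.path.join(tmp, "model.pkl.gz")

    # Plain pickle, no compression
    with open(raw_path, "wb") as f:
        pickle.dump(model_state, f, protocol=5)

    # Same pickle stream written through a gzip file object
    with gzip.open(gz_path, "wb") as f:
        pickle.dump(model_state, f, protocol=5)

    raw_size = os.path.getsize(raw_path)
    gz_size = os.path.getsize(gz_path)

    # Loading mirrors saving: open through gzip, then unpickle
    with gzip.open(gz_path, "rb") as f:
        restored = pickle.load(f)

print(f"uncompressed: {raw_size} bytes, gzip: {gz_size} bytes")
```

Compression trades CPU time for disk space, so it is worth measuring both on a representative model before settling on a format.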


## Distributed Model Pickling

The distributed estimator wrappers in the `cuml.dask` module are not intended to be pickled directly. Instead, the Dask cuML estimators provide a `get_combined_model()` method, which returns the trained single-GPU model for pickling. The combined model can be used for inference on a single GPU, and the `ParallelPostFit` wrapper from the [Dask-ML](https://ml.dask.org/meta-estimators.html) library can be used to perform distributed inference on a Dask cluster.

In [None]:
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Set up Dask cluster
cluster = LocalCUDACluster()
client = Client(cluster)
client

In [None]:
from cuml.dask.datasets import make_blobs

# Get number of workers
n_workers = len(client.scheduler_info()["workers"])

# Generate distributed dataset
X, y = make_blobs(
    n_samples=5000,
    n_features=30,
    centers=5,
    cluster_std=0.4,
    random_state=0,
    n_parts=n_workers * 5
)

# Persist data in memory
X = X.persist()
y = y.persist()

In [None]:
from cuml.dask.cluster import KMeans as cuDaskKMeans

# Initialize distributed KMeans model
dask_kmeans_model = cuDaskKMeans(n_clusters=5)

In [None]:
# Fit the distributed model
# dask_kmeans_fitted = dask_kmeans_model.fit(X)

In [None]:
import pickle

# Extract single-GPU model and save it
# single_gpu_kmeans = dask_kmeans_fitted.get_combined_model()
# pickle.dump(single_gpu_kmeans, open("kmeans_model.pkl", "wb"))

In [None]:
# Load the single-GPU model
# single_gpu_kmeans_loaded = pickle.load(open("kmeans_model.pkl", "rb"))

In [None]:
# Display the loaded model's cluster centers
# single_gpu_kmeans_loaded.cluster_centers_
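
Conceptually, distributed inference with a combined model means applying the same `predict` to each data partition independently. The sketch below illustrates that idea with standard-library threads and a toy model. `NearestCenterModel` and `predict_by_partition` are hypothetical names, and this is not how `ParallelPostFit` or Dask are implemented.

```python
from concurrent.futures import ThreadPoolExecutor


class NearestCenterModel:
    """Hypothetical stand-in for a combined single-GPU model with predict()."""

    def __init__(self, centers):
        self.centers = centers

    def predict(self, rows):
        # Assign each row to the index of its closest center
        def closest(row):
            dists = [
                sum((a - b) ** 2 for a, b in zip(row, center))
                for center in self.centers
            ]
            return dists.index(min(dists))

        return [closest(row) for row in rows]


def predict_by_partition(model, partitions, max_workers=2):
    """Apply model.predict to every partition, preserving partition order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(model.predict, partitions))


model = NearestCenterModel(centers=[[0.0, 0.0], [10.0, 10.0]])
partitions = [
    [[0.1, -0.2], [9.5, 10.3]],
    [[10.1, 9.9], [0.4, 0.3], [-0.5, 0.2]],
]
labels = predict_by_partition(model, partitions)
print(labels)
```

Because each partition is predicted independently, the fitted model only needs to be serializable once and can then be reused for every partition.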

## Exporting cuML Random Forest models for inference on machines without GPUs

Starting with cuML version 21.06, you can export cuML Random Forest models and run predictions with them on machines without NVIDIA GPUs. The [Treelite](https://github.com/dmlc/treelite) package defines an efficient exchange format that lets you portably move the cuML Random Forest models to other machines. We will refer to the exchange format as 'checkpoints.'

Here are the steps to export the model:

1. Call `as_treelite().serialize()` to obtain the checkpoint file from the cuML Random Forest model.

In [None]:
from cuml.ensemble import RandomForestClassifier as cuRF
from sklearn.datasets import load_iris
import numpy as np

# Load and prepare iris dataset
X, y = load_iris(return_X_y=True)
X, y = X.astype(np.float32), y.astype(np.int32)

# Train Random Forest model
rf_model = cuRF(max_depth=3, random_state=0, n_estimators=10)
rf_fitted = rf_model.fit(X, y)

# Export cuML RF model as Treelite checkpoint
checkpoint_path = "./checkpoint.tl"
rf_fitted.as_treelite().serialize(checkpoint_path)

2. Copy the generated checkpoint file `checkpoint.tl` to another machine on which you'd like to run predictions.

3. On the target machine, install Treelite by running `pip install treelite` or `conda install -c conda-forge treelite`. The machine does not need to have NVIDIA GPUs and does not need to have cuML installed.

4. You can now load the model from the checkpoint by running the following on the target machine:

In [None]:
import treelite

# Load the Treelite model (checkpoint file has been copied over)
checkpoint_path = "./checkpoint.tl"
tl_model = treelite.Model.deserialize(checkpoint_path)

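# Make predictions using Treelite
# (X is the float32 feature array from step 1; on the target machine,
# supply your own data with the same feature layout)
out_prob = treelite.gtil.predict(tl_model, X, pred_margin=True)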
print(out_prob)