# cuML on GPU and CPU

cuML is a suite of fast, GPU-accelerated machine learning algorithms with a scikit-learn-style API, designed for data science and analytical tasks. Starting with version 23.10, cuML can also run on CPU systems without code changes, which makes it easier to use in the following ways:

- Allow users to prototype on systems without GPUs.
- Allow library integrations without dispatching or boilerplate code.
- Allow users to train on one type of system and run inference on the other, for a subset of estimators that will grow with each version.
- Provide compatibility with the open source GPU/CPU PyData ecosystem.

Most cuML estimators can run on both CPU and GPU systems, and a subset of them supports exporting models between GPU and CPU systems:

[table to be inserted]

As a result, the same code can run on both GPU and CPU systems:



In [1]:
import cuml
import pandas
from cuml.linear_model import LinearRegression as cuLinearRegression
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

n_samples = 2**20
n_features = 399

random_state = 23

# Generate a synthetic regression problem in host memory
X, y = make_regression(n_samples=n_samples,
                       n_features=n_features,
                       random_state=random_state)

X = pandas.DataFrame(X)
y = pandas.DataFrame(y)[0]

# Hold out 20% of the rows for prediction
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    random_state=random_state)

ols_cuml = cuLinearRegression(fit_intercept=True,
                              algorithm='eig')

ols_cuml.fit(X_train, y_train)
predict_cuml = ols_cuml.predict(X_test)

print(predict_cuml)




## Installation

For GPU systems, cuML still follows the [RAPIDS requirements], and installation is unchanged. The cuML package and wheels are universal and can run in both GPU and CPU modes. On CPU systems, cuML can be installed from conda/mamba like other packages:

```bash
mamba install -c rapidsai cuml-cpu
```

This installs the CPU package, which requires neither CUDA nor a GPU. Usage and imports do not need to change.
pip wheels are expected with version 23.12.


This package can run most cuML estimators, as mentioned in the section above.
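Before running the examples, it can help to confirm that a `cuml` module is visible to the current Python environment. The following is a minimal sketch that uses only the standard library; the helper name `cuml_available` is hypothetical and not part of cuML. It only checks that the module can be found and does not tell the GPU and CPU packages apart.

```python
import importlib.util


def cuml_available() -> bool:
    """Return True if a top-level `cuml` module can be located.

    Hypothetical helper for illustration; it does not import cuML
    and does not check whether a GPU is present.
    """
    return importlib.util.find_spec("cuml") is not None


if cuml_available():
    print("cuml is importable in this environment")
else:
    print("cuml was not found; install it as described above")
```

A check like this avoids a `ModuleNotFoundError` at the first `import cuml` when the environment has not been set up yet.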

## Managing Execution Platform

Besides letting the same code run on CPU systems, cuML lets users control which device executes each part of the code. So, in addition to the first example, which can run on a CPU system with `cuml-cpu`, a system with the full cuML installation can also execute in CPU mode.

First we get some data:

In [None]:
import cuml
from cuml.neighbors import NearestNeighbors
from cuml.datasets import make_regression, make_blobs
from cuml.model_selection import train_test_split

# todo: use sklearn.get_dataset
X_blobs, y_blobs = make_blobs(n_samples=2000, 
                              n_features=20)
X_train_blobs, X_test_blobs, y_train_blobs, y_test_blobs = train_test_split(X_blobs, 
                                                                            y_blobs, 
                                                                            test_size=0.2, shuffle=True)

X_reg, y_reg = make_regression(n_samples=2000, 
                               n_features=20)
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, 
                                                                    y_reg, 
                                                                    test_size=0.2, 
                                                                    shuffle=True)

Don't have a GPU with enough memory at your disposal right now? You can still prototype and run estimators in CPU mode:

In [None]:
from cuml.common.device_selection import using_device_type

nn = NearestNeighbors()
with using_device_type('cpu'):
    nn.fit(X_train_blobs)
    nearest_neighbors = nn.kneighbors(X_test_blobs)

Need to train your estimator with a special feature or hyperparameter only available in the paired CPU library? Initialize the cuML model with it and train on CPU.

In [None]:
from cuml.manifold import UMAP

umap_model = UMAP(angular_rp_forest=True) # `angular_rp_forest` hyperparameter only available in UMAP library
with using_device_type('cpu'):
    umap_model.fit(X_train_blobs) # will run the UMAP library with the hyperparameter
with using_device_type('gpu'):
    transformed = umap_model.transform(X_test_blobs) # will run the cuML implementation of UMAP

### Mechanisms to control execution

The GPU (device) is the default execution platform:

In [None]:
from cuml.common.device_selection import using_device_type
from cuml.common.device_selection import set_global_device_type, get_global_device_type

initial_device_type = get_global_device_type()
print('default execution device:', initial_device_type)

Estimator training and inference inside a `using_device_type` context run on the selected execution platform:

In [None]:
for param in ['cpu', 'host', 'gpu', 'device']:
    with using_device_type(param):
        print('using_device_type({}):'.format(param), get_global_device_type())

The execution platform can also be set globally with the `set_global_device_type` function.

In [None]:
set_global_device_type('gpu')
print('new device type:', get_global_device_type())
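To make the difference between the scoped and the global setting concrete, here is a small pure-Python sketch of the pattern. It assumes that the context manager restores the previous setting on exit and that `'cpu'`/`'host'` and `'gpu'`/`'device'` are aliases for the same platforms. All names (`using_device`, `set_device_type`, `get_device_type`, `_ALIASES`, `_state`) are hypothetical stand-ins and not cuML's implementation.

```python
from contextlib import contextmanager

# Hypothetical stand-ins for illustration; NOT cuML internals.
# Assumption: 'cpu'/'host' and 'gpu'/'device' name the same platforms.
_ALIASES = {"cpu": "host", "host": "host", "gpu": "device", "device": "device"}
_state = {"device_type": "device"}  # device (GPU) assumed as the default


def get_device_type():
    """Return the currently selected platform ('host' or 'device')."""
    return _state["device_type"]


def set_device_type(name):
    """Set the platform globally; it stays set until changed again."""
    _state["device_type"] = _ALIASES[name]


@contextmanager
def using_device(name):
    """Select a platform for the enclosed block, then restore the old one."""
    previous = get_device_type()
    set_device_type(name)
    try:
        yield
    finally:
        # Restore even if the block raised an exception
        _state["device_type"] = previous


print(get_device_type())      # device
with using_device("cpu"):
    print(get_device_type())  # host
print(get_device_type())      # device
```

The scoped form keeps a temporary CPU fallback from leaking into later code, while the global setter suits a script that should run entirely on one platform.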

## Cross-Device Training and Inference

While ML training workflows almost always benefit from the superior speed of GPUs, small-scale applications with limited traffic and loose latency requirements may be able to perform inference on CPU. Note that this feature only works with models that implement pickle serialization and GPU-to-CPU transfers.

To train a model on GPU but deploy it on CPU, first train the estimator on the device and save it to disk:

In [None]:
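import pickle

from cuml.linear_model import LinearRegression

lin_reg = LinearRegression()
with using_device_type('gpu'):
    lin_reg.fit(X_train_reg, y_train_reg)

# Save the trained estimator to disk and close the file afterwards
with open("lin_reg.pkl", "wb") as f:
    pickle.dump(lin_reg, f)
del lin_reg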

Then, on the server, load the estimator and run inference on the host:

In [None]:
with open("lin_reg.pkl", "rb") as f:
    recovered_lin_reg = pickle.load(f)
with using_device_type('cpu'):
    predictions = recovered_lin_reg.predict(X_test_reg)
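The save and load steps above can be wrapped in two small helpers so that files are always closed, even on error. This is a sketch using only the standard library; `save_model` and `load_model` are hypothetical names, and the dictionary below is a placeholder for any picklable estimator rather than a real model.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Pickle `model` to `path`, closing the file when done."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a pickled object from `path`.

    Only unpickle files from a trusted source: unpickling can
    execute arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


# Placeholder object standing in for a trained estimator
placeholder_model = {"coef": [0.5, -1.25], "intercept": 2.0}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.pkl")
    save_model(placeholder_model, model_path)
    restored = load_model(model_path)

print(restored == placeholder_model)
```

With a real estimator, the training machine would call `save_model` and the server would call `load_model` before predicting inside a `using_device_type('cpu')` context.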

## Conclusions