# Introduction to Ray 🩻

## What you will learn in this course 🧐🧐

[Ray](https://www.ray.io/) is an open-source library for distributed machine learning. It is compatible with Kubernetes and can speed up your training by spreading the work across several processes or machines. In this course, you will learn:

- How to use Ray locally
- How to install Ray on a Kubernetes Cluster
- Ray's main components
  - Ray ML
  - Ray Core
  - Ray Cluster
- How to use Ray with scikit-learn
- How to monitor your Ray Cluster

> This course was written using Ray version 2.9.0. Note that Ray is updated very frequently, given how fast the field of AI is moving.

## Install Ray locally

Let's start small by installing Ray locally in a virtual environment to get the hang of things (we'll need this local installation later anyway, in order to interact with a Kubernetes cluster running Ray).

First let's create a local virtual environment using conda:

```shell
conda create -n ray python=3.10
```

Now let's activate this environment:

```shell
conda activate ray
```


### Ray with Scikit-Learn locally

In what follows, we'll cover how to use Ray to train scikit-learn models.

Start by installing Ray, scikit-learn and joblib in our `ray` virtual environment:

```shell
pip install -U "ray[train]" "ray[data]" scikit-learn joblib
```

Then let's run the following code, which performs a randomized hyperparameter search without Ray:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

digits = load_digits()

# Hyperparameter values that the search samples from.
param_space = {
    'C': np.logspace(-6, 6, 30),
    'gamma': np.logspace(-8, 8, 30),
    'tol': np.logspace(-4, -1, 30),
    'class_weight': [None, 'balanced'],
}
model = SVC(kernel='rbf')

# 300 sampled candidates, each evaluated with 5-fold cross-validation.
search = RandomizedSearchCV(model, param_space, cv=5, n_iter=300, verbose=10)
search.fit(digits.data, digits.target)
```

```text
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
[CV 1/5; 1/300] START C=0.0007880462815669912, class_weight=None, gamma=1.2689610031679235e-07, tol=0.001082636733874054
[CV 1/5; 1/300] END C=0.0007880462815669912, class_weight=None, gamma=1.2689610031679235e-07, tol=0.001082636733874054;, score=0.100 total time=   0.1s
[CV 2/5; 1/300] START C=0.0007880462815669912, class_weight=None, gamma=1.2689610031679235e-07, tol=0.001082636733874054
[CV 2/5; 1/300] END C=0.0007880462815669912, class_weight=None, gamma=1.2689610031679235e-07, tol=0.001082636733874054;, score=0.100 total time=   0.1s
[CV 3/5; 1/300] START C=0.0007880462815669912, class_weight=None, gamma=1.2689610031679235e-07, tol=0.001082636733874054
[CV 3/5; 1/300] END C=0.0007880462815669912, class_weight=None, gamma=1.2689610031679235e-07, tol=0.001082636733874054;, score=0.203 total time=   0.1s
[CV 4/5; 1/300] START C=0.0007880462815669912, class_weight=None, gamma=1.2689610031679235e-07, tol=0.001082636733874
```

[Notebook output: rich display of the fitted `RandomizedSearchCV` object (estimator `SVC()`, `n_iter=300`, `cv=5`, `refit=True`, `verbose=10`, `pre_dispatch='2*n_jobs'`) and of its `SVC` estimator (`kernel='rbf'`, `C` ≈ 1.61, `degree=3`, `coef0=0.0`, `cache_size=200`; the `gamma` and `tol` values are truncated in the source).]
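
The first line of the output above reports 1500 fits. As a quick sanity check, that number can be derived from the values in `param_space`, `n_iter` and `cv`. This is a minimal sketch; the variable names are illustrative and not part of scikit-learn.

```python
# Grid sizes taken from param_space: three 30-point log grids
# and two class_weight options.
grid_sizes = {"C": 30, "gamma": 30, "tol": 30, "class_weight": 2}

# Number of distinct hyperparameter combinations available.
total_combinations = 1
for size in grid_sizes.values():
    total_combinations *= size

n_iter = 300  # candidates sampled by RandomizedSearchCV
cv = 5        # cross-validation folds per candidate

total_fits = n_iter * cv
sampled_fraction = n_iter / total_combinations

print(total_combinations)         # 54000
print(total_fits)                 # 1500
print(f"{sampled_fraction:.2%}")  # 0.56%
```

Each of these fits is independent of the others, which is what makes this kind of search a good candidate for parallel execution.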


Now let's run the same search with Ray as the joblib backend:

```python
import joblib
import numpy as np
from ray.util.joblib import register_ray
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

digits = load_digits()

# Same search space and model as in the previous cell.
param_space = {
    'C': np.logspace(-6, 6, 30),
    'gamma': np.logspace(-8, 8, 30),
    'tol': np.logspace(-4, -1, 30),
    'class_weight': [None, 'balanced'],
}
model = SVC(kernel='rbf')
search = RandomizedSearchCV(model, param_space, cv=5, n_iter=300, verbose=10)

# Register Ray as a joblib backend, then run the fits on it.
register_ray()
with joblib.parallel_backend('ray'):
    search.fit(digits.data, digits.target)
```

```text
2025-06-20 10:07:51,332 INFO util.py:154 -- Missing packages: ['ipywidgets']. Run `pip install -U ipywidgets`, then restart the notebook server for rich notebook output.
2025-06-20 10:07:51,339 INFO ray_backend.py:74 -- Starting local ray cluster
2025-06-20 10:07:53,127 INFO worker.py:1917 -- Started a local Ray instance.

Fitting 5 folds for each of 300 candidates, totalling 1500 fits
(PoolActor pid=98213) [CV 2/5; 1/300] START C=489.3900918477499, class_weight=None, gamma=0.0009236708571873865, tol=0.00032903445623126676
(PoolActor pid=98213) [CV 2/5; 1/300] END C=489.3900918477499, class_weight=None, gamma=0.0009236708571873865, tol=0.00032903445623126676;, score=0.950 total time=   0.1s
(PoolActor pid=98207) [CV 1/5; 5/300] START C=10.82636733874054, class_weight=balanced, gamma=85.31678524172814, tol=0.03039195382313198
(PoolActor pid=98207) [CV 1/5; 5/300] END C=10.82636733874054, class_weight=balanced, gamma=85.31678524172814, tol=0.03039195382313198;, score=0.100 total time=   0.2s
(PoolActor pid=98212) [CV 5/5; 46/300] START C=1e-06, class_weight=None, gamma=1.8873918221350996, tol=0.06210169418915616 [repeated 153x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/
```

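Calling `register_ray()` registers Ray as a joblib backend. Running the fit inside the `with joblib.parallel_backend('ray'):` block then starts a local Ray cluster (see the `Starting local ray cluster` line in the log), and the cross-validation fits run on its worker processes, which appear as `PoolActor` in the output.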
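
The backend switch can be sketched with joblib alone. This is a hedged illustration and not the course's code: `square` and `run_squares` are hypothetical helpers, and joblib's built-in `threading` backend stands in for Ray so that the snippet runs without a cluster.

```python
import joblib
from joblib import Parallel, delayed


def square(x):
    # Stand-in for one cross-validation fit (hypothetical workload).
    return x * x


def run_squares(backend_name, values):
    # Inside the context manager, Parallel calls that do not pick a
    # backend themselves are dispatched to the selected one.
    with joblib.parallel_backend(backend_name):
        return Parallel()(delayed(square)(v) for v in values)


# "threading" ships with joblib, so this runs without Ray.
# After calling ray.util.joblib.register_ray(), passing "ray" instead
# would route the same calls to a Ray cluster.
results = run_squares("threading", range(8))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

scikit-learn runs its cross-validation fits through joblib, which is why `RandomizedSearchCV` picks up the Ray backend without any change to the estimator itself. By default, `joblib.parallel_backend` also requests all available workers (`n_jobs=-1`), so no `n_jobs` argument is needed in the search.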

## Resources 📚📚

[Ray with scikit-learn](https://docs.ray.io/en/latest/ray-more-libs/joblib.html)