# A Guided Tour of Ray Core: JobLib

[*Distributed scikit-learn*](https://docs.ray.io/en/latest/joblib.html) provides a drop-in replacement for the [`joblib`](https://joblib.readthedocs.io/en/latest/) backend that [`scikit-learn`](https://scikit-learn.org/stable/) uses for parallelism, so the same code can run its parallel work on Ray.


---

First, let's start Ray…

In [1]:
import logging
import ray

ray.init(
    ignore_reinit_error=True,
    logging_level=logging.ERROR,
)

print(f"Dashboard URL: http://{ray.get_dashboard_url()}")

Dashboard URL: http://127.0.0.1:8265


## JobLib example

Set up the imports for this example:

In [2]:
from ray.util.joblib import register_ray
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
import numpy as np
import joblib

Now register Ray as the parallel [*joblib*](https://scikit-learn.org/stable/modules/generated/sklearn.utils.parallel_backend.html) backend for `scikit-learn`, so that it runs work on Ray actors instead of local processes.
This makes it easy to scale an existing application from a single node to a cluster.

See: <https://docs.ray.io/en/master/joblib.html>

In [3]:
register_ray()
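`scikit-learn` hands its parallel work to whichever joblib backend is active, and `register_ray()` simply adds `"ray"` to the list of backend names joblib knows about. The sketch below shows the same pattern with `joblib.Parallel` directly; the helpers `square` and `parallel_squares` are hypothetical names used only for illustration. It defaults to joblib's built-in `"threading"` backend so it runs without a Ray cluster.

```python
import joblib


def square(x):
    """A trivial task to fan out across workers (illustrative only)."""
    return x * x


def parallel_squares(values, backend="threading"):
    """Square each value in parallel under the named joblib backend.

    "threading" and "loky" ship with joblib itself; "ray" only becomes
    available after ray.init() and ray.util.joblib.register_ray().
    """
    with joblib.parallel_backend(backend):
        # n_jobs=-1 asks the active backend for as many workers as it offers
        return joblib.Parallel(n_jobs=-1)(
            joblib.delayed(square)(v) for v in values
        )


print(parallel_squares(range(5)))  # [0, 1, 4, 9, 16]
```

With Ray initialized and registered as above, `parallel_squares(range(5), backend="ray")` should return the same list, with the tasks executed by Ray actors; only the backend name changes.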

Next, load a copy of the hand-written *digits* dataset from the UCI Machine Learning Repository.
See: <https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html>

In [4]:
digits = load_digits()
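A quick look at what was loaded: `load_digits()` returns a `Bunch` whose `data` holds one flattened 8×8 grayscale image per row, whose `images` holds the same pixels in 2-D form, and whose `target` holds the digit label. This check is illustrative and not part of the original workflow.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# Each row is an 8x8 image flattened into 64 pixel intensities
print(digits.data.shape)    # (1797, 64)
print(digits.images.shape)  # (1797, 8, 8)

# Labels are the digits 0 through 9, so this is a 10-class problem
print(np.unique(digits.target))  # [0 1 2 3 4 5 6 7 8 9]
```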

We'll define the hyper-parameter space for training a *support vector machine* (SVM) model:

In [5]:
param_space = {
    "C": np.logspace(-6, 6, 30),
    "gamma": np.logspace(-8, 8, 30),
    "tol": np.logspace(-4, -1, 30),
    "class_weight": [None, "balanced"],
}

model = SVC(kernel="rbf")
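Because a randomized search samples from this space rather than enumerating it, it helps to see how large the full grid would be. The arithmetic below counts the discrete combinations (30 values each for `C`, `gamma`, and `tol`, times 2 for `class_weight`) and the fraction of them that 10 sampled candidates cover. `n_iter = 10` mirrors the search configured next.

```python
import math

import numpy as np

param_space = {
    "C": np.logspace(-6, 6, 30),
    "gamma": np.logspace(-8, 8, 30),
    "tol": np.logspace(-4, -1, 30),
    "class_weight": [None, "balanced"],
}

# Size of the full Cartesian grid: 30 * 30 * 30 * 2
n_combinations = math.prod(len(values) for values in param_space.values())
print(n_combinations)  # 54000

# Share of the grid that a 10-candidate random search actually tries
n_iter = 10
print(f"{n_iter / n_combinations:.4%}")  # 0.0185%
```

This is why a randomized search is preferred over an exhaustive grid here: 54,000 candidates times 5 folds would be 270,000 fits.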

Then use a randomized search to optimize these hyper-parameters. See: <https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html>

We'll use 5 cross-validation splits and 10 iterations, for a total of 5 × 10 = 50 "fits". This is enough to show the `joblib` work being parallelized, although in practice you would probably use more iterations.
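The 50-fit total can be reproduced with the same building blocks the search uses: `RandomizedSearchCV` draws its candidates with `ParameterSampler`, and for a classifier an integer `cv=5` means 5 stratified folds. The `random_state=0` below is an assumption added so the sampled candidates are reproducible; the search in this notebook leaves it unset.

```python
import numpy as np
from sklearn.model_selection import ParameterSampler, StratifiedKFold

param_space = {
    "C": np.logspace(-6, 6, 30),
    "gamma": np.logspace(-8, 8, 30),
    "tol": np.logspace(-4, -1, 30),
    "class_weight": [None, "balanced"],
}

n_iter, n_splits = 10, 5

# Draw the candidate settings (random_state=0 is an assumption for reproducibility)
candidates = list(ParameterSampler(param_space, n_iter=n_iter, random_state=0))

# Each candidate is fit once per fold
cv = StratifiedKFold(n_splits=n_splits)
total_fits = len(candidates) * cv.get_n_splits()
print(total_fits)  # 50
```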

In [6]:
clf = RandomizedSearchCV(model, param_space, cv=5, n_iter=10, verbose=10)
clf

RandomizedSearchCV(cv=5, error_score=nan,
                   estimator=SVC(C=1.0, break_ties=False, cache_size=200,
                                 class_weight=None, coef0=0.0,
                                 decision_function_shape='ovr', degree=3,
                                 gamma='scale', kernel='rbf', max_iter=-1,
                                 probability=False, random_state=None,
                                 shrinking=True, tol=0.001, verbose=False),
                   iid='deprecated', n_iter=10, n_jobs=None,
                   param_distributions={'C': array([1.000...
       0.00032903, 0.00041753, 0.00052983, 0.00067234, 0.00085317,
       0.00108264, 0.00137382, 0.00174333, 0.00221222, 0.00280722,
       0.00356225, 0.00452035, 0.00573615, 0.00727895, 0.00923671,
       0.01172102, 0.01487352, 0.01887392, 0.02395027, 0.03039195,
       0.0385662 , 0.04893901, 0.06210169, 0.07880463, 0.1       ])},
                   pre_dispatch='2*n_jobs', random_state=None, re

Run the cross-validation fits (i.e., the random search for hyper-parameter optimization) using Ray to parallelize the backend processes:

In [7]:
with joblib.parallel_backend("ray"):
    search = clf.fit(digits.data, digits.target)

Fitting 5 folds for each of 10 candidates, totalling 50 fits


[Parallel(n_jobs=-1)]: Using backend RayBackend with 8 concurrent workers.


[2m[36m(pid=104)[0m [CV] tol=0.00020433597178569417, gamma=6.723357536499335, class_weight=balanced, C=2.592943797404667e-06 
[2m[36m(pid=98)[0m [CV] tol=0.00032903445623126676, gamma=7.278953843983146e-05, class_weight=None, C=4.175318936560401 


[2m[36m(pid=98)[0m [CV]  tol=0.00032903445623126676, gamma=7.278953843983146e-05, class_weight=None, C=4.175318936560401, score=0.928, total=   0.2s
[2m[36m(pid=98)[0m [CV] tol=0.00020433597178569417, gamma=6.723357536499335, class_weight=balanced, C=2.592943797404667e-06 
[2m[36m(pid=100)[0m [CV] tol=0.00032903445623126676, gamma=3.562247890262444e-08, class_weight=None, C=1268.9610031679233 
[2m[36m(pid=102)[0m [CV] tol=0.00032903445623126676, gamma=7.278953843983146e-05, class_weight=None, C=4.175318936560401 
[2m[36m(pid=102)[0m 


[2m[36m(pid=105)[0m [CV] tol=0.00020433597178569417, gamma=6.723357536499335, class_weight=balanced, C=2.592943797404667e-06 
[2m[36m(pid=101)[0m [CV] tol=0.00032903445623126676, gamma=7.278953843983146e-05, class_weight=None, C=4.175318936560401 
[2m[36m(pid=99)[0m [CV] tol=0.00032903445623126676, gamma=7.278953843983146e-05, class_weight=None, C=4.175318936560401 


[2m[36m(pid=103)[0m [CV] tol=0.00032903445623126676, gamma=7.278953843983146e-05, class_weight=None, C=4.175318936560401 
[2m[36m(pid=100)[0m [CV]  tol=0.00032903445623126676, gamma=3.562247890262444e-08, class_weight=None, C=1268.9610031679233, score=0.958, total=   0.4s
[2m[36m(pid=100)[0m [CV] tol=0.1, gamma=303.9195382313195, class_weight=None, C=0.2395026619987486 
[2m[36m(pid=102)[0m [CV]  tol=0.00032903445623126676, gamma=7.278953843983146e-05, class_weight=None, C=4.175318936560401, score=0.983, total=   0.3s
[2m[36m(pid=99)[0m [CV]  tol=0.00032903445623126676, gamma=7.278953843983146e-05, class_weight=None, C=4.175318936560401, score=0.983, total=   0.2s
[2m[36m(pid=101)[0m [CV]  tol=0.00032903445623126676, gamma=7.278953843983146e-05, class_weight=None, C=4.175318936560401, score=0.950, total=   0.3s
[2m[36m(pid=101)[0m [CV] tol=0.00032903445623126676, gamma=3.562247890262444e-08, class_weight=None, C=1268.9610031679233 
[2m[36m(pid=99)[0m [CV] tol=0.0

[Parallel(n_jobs=-1)]: Done   2 tasks      | elapsed:    1.5s


[2m[36m(pid=101)[0m [CV]  tol=0.00032903445623126676, gamma=3.562247890262444e-08, class_weight=None, C=1268.9610031679233, score=0.894, total=   0.4s
[2m[36m(pid=101)[0m [CV] tol=0.1, gamma=303.9195382313195, class_weight=None, C=0.2395026619987486 


[2m[36m(pid=104)[0m [CV]  tol=0.00020433597178569417, gamma=6.723357536499335, class_weight=balanced, C=2.592943797404667e-06, score=0.100, total=   1.5s
[2m[36m(pid=104)[0m [CV] tol=0.00020433597178569417, gamma=6.723357536499335, class_weight=balanced, C=2.592943797404667e-06 
[2m[36m(pid=103)[0m [CV] tol=0.011721022975334805, gamma=28072162.039411698, class_weight=balanced, C=0.0007880462815669912 
[2m[36m(pid=98)[0m [CV]  tol=0.00020433597178569417, gamma=6.723357536499335, class_weight=balanced, C=2.592943797404667e-06, score=0.100, total=   1.3s
[2m[36m(pid=98)[0m [CV] tol=0.011721022975334805, gamma=28072162.039411698, class_weight=balanced, C=0.0007880462815669912 
[2m[36m(pid=102)[0m [CV] tol=0.011721022975334805, gamma=28072162.039411698, class_weight=balanced, C=0.0007880462815669912 


[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    2.3s


[2m[36m(pid=100)[0m [CV]  tol=0.1, gamma=303.9195382313195, class_weight=None, C=0.2395026619987486, score=0.100, total=   1.3s
[2m[36m(pid=100)[0m [CV] tol=0.1, gamma=303.9195382313195, class_weight=None, C=0.2395026619987486 
[2m[36m(pid=105)[0m [CV]  tol=0.00020433597178569417, gamma=6.723357536499335, class_weight=balanced, C=2.592943797404667e-06, score=0.100, total=   1.8s
[2m[36m(pid=105)[0m [CV] tol=0.011721022975334805, gamma=28072162.039411698, class_weight=balanced, C=0.0007880462815669912 


[2m[36m(pid=99)[0m [CV]  tol=0.00020433597178569417, gamma=6.723357536499335, class_weight=balanced, C=2.592943797404667e-06, score=0.097, total=   1.6s
[2m[36m(pid=99)[0m [CV] tol=0.1, gamma=303.9195382313195, class_weight=None, C=0.2395026619987486 


[2m[36m(pid=101)[0m [CV]  tol=0.1, gamma=303.9195382313195, class_weight=None, C=0.2395026619987486, score=0.100, total=   1.8s


[2m[36m(pid=102)[0m [CV]  tol=0.011721022975334805, gamma=28072162.039411698, class_weight=balanced, C=0.0007880462815669912, score=0.097, total=   1.6s
[2m[36m(pid=104)[0m [CV]  tol=0.00020433597178569417, gamma=6.723357536499335, class_weight=balanced, C=2.592943797404667e-06, score=0.097, total=   1.8s
[2m[36m(pid=104)[0m [CV] tol=0.00032903445623126676, gamma=3.562247890262444e-08, class_weight=None, C=1268.9610031679233 


[Parallel(n_jobs=-1)]: Done  16 tasks      | elapsed:    4.4s


[2m[36m(pid=99)[0m [CV]  tol=0.1, gamma=303.9195382313195, class_weight=None, C=0.2395026619987486, score=0.103, total=   1.2s
[2m[36m(pid=103)[0m [CV]  tol=0.011721022975334805, gamma=28072162.039411698, class_weight=balanced, C=0.0007880462815669912, score=0.097, total=   2.1s
[2m[36m(pid=103)[0m [CV] tol=0.0028072162039411755, gamma=28072162.039411698, class_weight=balanced, C=3290.344562312671 


[2m[36m(pid=98)[0m [CV]  tol=0.011721022975334805, gamma=28072162.039411698, class_weight=balanced, C=0.0007880462815669912, score=0.100, total=   2.2s
[2m[36m(pid=98)[0m [CV] tol=0.0028072162039411755, gamma=28072162.039411698, class_weight=balanced, C=3290.344562312671 
[2m[36m(pid=104)[0m [CV]  tol=0.00032903445623126676, gamma=3.562247890262444e-08, class_weight=None, C=1268.9610031679233, score=0.922, total=   0.5s
[2m[36m(pid=104)[0m [CV] tol=0.00032903445623126676, gamma=3.562247890262444e-08, class_weight=None, C=1268.9610031679233 
[2m[36m(pid=100)[0m [CV]  tol=0.1, gamma=303.9195382313195, class_weight=None, C=0.2395026619987486, score=0.100, total=   1.8s
[2m[36m(pid=100)[0m [CV] tol=0.1, gamma=303.9195382313195, class_weight=None, C=0.2395026619987486 


[2m[36m(pid=104)[0m [CV]  tol=0.00032903445623126676, gamma=3.562247890262444e-08, class_weight=None, C=1268.9610031679233, score=0.911, total=   0.3s
[2m[36m(pid=104)[0m [CV] tol=0.00032903445623126676, gamma=3.562247890262444e-08, class_weight=None, C=1268.9610031679233 
[2m[36m(pid=105)[0m [CV]  tol=0.011721022975334805, gamma=28072162.039411698, class_weight=balanced, C=0.0007880462815669912, score=0.095, total=   2.0s
[2m[36m(pid=105)[0m [CV] tol=0.04893900918477494, gamma=85.31678524172814, class_weight=balanced, C=148735.21072935118 


[2m[36m(pid=104)[0m [CV]  tol=0.00032903445623126676, gamma=3.562247890262444e-08, class_weight=None, C=1268.9610031679233, score=0.972, total=   0.3s
[2m[36m(pid=104)[0m [CV] tol=0.011721022975334805, gamma=28072162.039411698, class_weight=balanced, C=0.0007880462815669912 
[2m[36m(pid=103)[0m [CV]  tol=0.0028072162039411755, gamma=28072162.039411698, class_weight=balanced, C=3290.344562312671, score=0.100, total=   0.8s


[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    5.4s


[2m[36m(pid=100)[0m [CV]  tol=0.1, gamma=303.9195382313195, class_weight=None, C=0.2395026619987486, score=0.100, total=   0.8s
[2m[36m(pid=100)[0m [CV] tol=0.0028072162039411755, gamma=28072162.039411698, class_weight=balanced, C=3290.344562312671 


[2m[36m(pid=98)[0m [CV]  tol=0.0028072162039411755, gamma=28072162.039411698, class_weight=balanced, C=3290.344562312671, score=0.100, total=   1.1s
[2m[36m(pid=98)[0m [CV] tol=0.0028072162039411755, gamma=28072162.039411698, class_weight=balanced, C=3290.344562312671 
[2m[36m(pid=105)[0m [CV]  tol=0.04893900918477494, gamma=85.31678524172814, class_weight=balanced, C=148735.21072935118, score=0.103, total=   0.8s
[2m[36m(pid=105)[0m [CV] tol=0.011721022975334805, gamma=48939.00918477499, class_weight=None, C=1e-06 


[2m[36m(pid=104)[0m [CV]  tol=0.011721022975334805, gamma=28072162.039411698, class_weight=balanced, C=0.0007880462815669912, score=0.100, total=   0.7s
[2m[36m(pid=104)[0m [CV] tol=0.0028072162039411755, gamma=28072162.039411698, class_weight=balanced, C=3290.344562312671 


[2m[36m(pid=100)[0m [CV]  tol=0.0028072162039411755, gamma=28072162.039411698, class_weight=balanced, C=3290.344562312671, score=0.100, total=   0.8s
[2m[36m(pid=100)[0m [CV] tol=0.04893900918477494, gamma=85.31678524172814, class_weight=balanced, C=148735.21072935118 
[2m[36m(pid=99)[0m [CV] tol=0.007278953843983146, gamma=0.0009236708571873865, class_weight=balanced, C=489.3900918477499 
[2m[36m(pid=98)[0m [CV]  tol=0.0028072162039411755, gamma=28072162.039411698, class_weight=balanced, C=3290.344562312671, score=0.100, total=   0.7s
[2m[36m(pid=98)[0m [CV] tol=0.04893900918477494, gamma=85.31678524172814, class_weight=balanced, C=148735.21072935118 
[2m[36m(pid=101)[0m [CV] tol=0.06210169418915616, gamma=174332.88221999872, class_weight=None, C=0.2395026619987486 


[2m[36m(pid=99)[0m [CV]  tol=0.007278953843983146, gamma=0.0009236708571873865, class_weight=balanced, C=489.3900918477499, score=0.961, total=   0.2s
[2m[36m(pid=99)[0m [CV] tol=0.06210169418915616, gamma=174332.88221999872, class_weight=None, C=0.2395026619987486 
[2m[36m(pid=102)[0m [CV] tol=0.06210169418915616, gamma=174332.88221999872, class_weight=None, C=0.2395026619987486 
[2m[36m(pid=105)[0m [CV]  tol=0.011721022975334805, gamma=48939.00918477499, class_weight=None, C=1e-06, score=0.100, total=   0.7s
[2m[36m(pid=105)[0m [CV] tol=0.011721022975334805, gamma=48939.00918477499, class_weight=None, C=1e-06 
[2m[36m(pid=104)[0m [CV]  tol=0.0028072162039411755, gamma=28072162.039411698, class_weight=balanced, C=3290.344562312671, score=0.103, total=   0.7s
[2m[36m(pid=104)[0m [CV] tol=0.04893900918477494, gamma=85.31678524172814, class_weight=balanced, C=148735.21072935118 


[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    7.0s


[2m[36m(pid=100)[0m [CV]  tol=0.04893900918477494, gamma=85.31678524172814, class_weight=balanced, C=148735.21072935118, score=0.100, total=   0.8s
[2m[36m(pid=100)[0m [CV] tol=0.04893900918477494, gamma=85.31678524172814, class_weight=balanced, C=148735.21072935118 
[2m[36m(pid=101)[0m [CV]  tol=0.06210169418915616, gamma=174332.88221999872, class_weight=None, C=0.2395026619987486, score=0.100, total=   0.7s


[2m[36m(pid=98)[0m [CV]  tol=0.04893900918477494, gamma=85.31678524172814, class_weight=balanced, C=148735.21072935118, score=0.100, total=   0.9s
[2m[36m(pid=98)[0m [CV] tol=0.011721022975334805, gamma=48939.00918477499, class_weight=None, C=1e-06 
[2m[36m(pid=102)[0m [CV]  tol=0.06210169418915616, gamma=174332.88221999872, class_weight=None, C=0.2395026619987486, score=0.103, total=   0.8s
[2m[36m(pid=105)[0m [CV]  tol=0.011721022975334805, gamma=48939.00918477499, class_weight=None, C=1e-06, score=0.100, total=   0.8s
[2m[36m(pid=99)[0m [CV]  tol=0.06210169418915616, gamma=174332.88221999872, class_weight=None, C=0.2395026619987486, score=0.100, total=   0.8s
[2m[36m(pid=99)[0m [CV] tol=0.06210169418915616, gamma=174332.88221999872, class_weight=None, C=0.2395026619987486 
[2m[36m(pid=104)[0m [CV]  tol=0.04893900918477494, gamma=85.31678524172814, class_weight=balanced, C=148735.21072935118, score=0.100, total=   0.8s
[2m[36m(pid=104)[0m [CV] tol=0.0117210229

[Parallel(n_jobs=-1)]: Done  41 out of  50 | elapsed:    7.7s remaining:    1.7s


[2m[36m(pid=100)[0m [CV]  tol=0.04893900918477494, gamma=85.31678524172814, class_weight=balanced, C=148735.21072935118, score=0.100, total=   0.7s
[2m[36m(pid=100)[0m [CV] tol=0.011721022975334805, gamma=48939.00918477499, class_weight=None, C=1e-06 
[2m[36m(pid=98)[0m [CV]  tol=0.011721022975334805, gamma=48939.00918477499, class_weight=None, C=1e-06, score=0.100, total=   0.6s
[2m[36m(pid=98)[0m [CV] tol=0.007278953843983146, gamma=0.0009236708571873865, class_weight=balanced, C=489.3900918477499 
[2m[36m(pid=99)[0m [CV]  tol=0.06210169418915616, gamma=174332.88221999872, class_weight=None, C=0.2395026619987486, score=0.100, total=   0.6s


[2m[36m(pid=104)[0m [CV]  tol=0.011721022975334805, gamma=48939.00918477499, class_weight=None, C=1e-06, score=0.103, total=   0.6s
[2m[36m(pid=104)[0m [CV] tol=0.007278953843983146, gamma=0.0009236708571873865, class_weight=balanced, C=489.3900918477499 
[2m[36m(pid=98)[0m [CV]  tol=0.007278953843983146, gamma=0.0009236708571873865, class_weight=balanced, C=489.3900918477499, score=0.950, total=   0.2s
[2m[36m(pid=98)[0m [CV] tol=0.06210169418915616, gamma=174332.88221999872, class_weight=None, C=0.2395026619987486 
[2m[36m(pid=104)[0m [CV]  tol=0.007278953843983146, gamma=0.0009236708571873865, class_weight=balanced, C=489.3900918477499, score=0.978, total=   0.1s
[2m[36m(pid=104)[0m [CV] tol=0.007278953843983146, gamma=0.0009236708571873865, class_weight=balanced, C=489.3900918477499 


[Parallel(n_jobs=-1)]: Done  47 out of  50 | elapsed:    8.3s remaining:    0.5s


[2m[36m(pid=104)[0m [CV]  tol=0.007278953843983146, gamma=0.0009236708571873865, class_weight=balanced, C=489.3900918477499, score=0.983, total=   0.2s
[2m[36m(pid=100)[0m [CV]  tol=0.011721022975334805, gamma=48939.00918477499, class_weight=None, C=1e-06, score=0.100, total=   0.6s
[2m[36m(pid=100)[0m [CV] tol=0.007278953843983146, gamma=0.0009236708571873865, class_weight=balanced, C=489.3900918477499 


[2m[36m(pid=100)[0m [CV]  tol=0.007278953843983146, gamma=0.0009236708571873865, class_weight=balanced, C=489.3900918477499, score=0.986, total=   0.2s
[2m[36m(pid=98)[0m [CV]  tol=0.06210169418915616, gamma=174332.88221999872, class_weight=None, C=0.2395026619987486, score=0.100, total=   0.6s


[Parallel(n_jobs=-1)]: Done  50 out of  50 | elapsed:    8.6s finished
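Because the backend is chosen by name, the same search can run under a different joblib backend with no other code changes, which is a simple way to compare wall-clock times. The helper `time_search` below is a hypothetical sketch of that idea. It uses a deliberately small search (`n_iter=3`, `cv=2`) and a fixed `random_state=0`, both assumptions made so it finishes quickly and reproducibly, and it runs under the built-in `"threading"` backend; pass `"ray"` instead once Ray is initialized and registered. Results go into `demo_search` so the notebook's `search` is not overwritten.

```python
import time

import joblib
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC


def time_search(backend, n_iter=3, cv=2, random_state=0):
    """Fit a small randomized SVM search under `backend`; return (seconds, search)."""
    data = load_digits()
    param_space = {
        "C": np.logspace(-6, 6, 30),
        "gamma": np.logspace(-8, 8, 30),
        "tol": np.logspace(-4, -1, 30),
        "class_weight": [None, "balanced"],
    }
    fitted = RandomizedSearchCV(
        SVC(kernel="rbf"), param_space,
        cv=cv, n_iter=n_iter, random_state=random_state,
    )
    start = time.perf_counter()
    # Only this context manager decides where the fits run
    with joblib.parallel_backend(backend):
        fitted.fit(data.data, data.target)
    return time.perf_counter() - start, fitted


demo_elapsed, demo_search = time_search("threading")
print(f"{demo_elapsed:.1f}s", demo_search.best_params_)
```

Timings from a search this small mostly reflect scheduling overhead, so a fair comparison between backends needs a workload closer to the 50-fit search above.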


What is the best set of hyper-parameters found so far?

In [8]:
search.best_params_

{'tol': 0.007278953843983146,
 'gamma': 0.0009236708571873865,
 'class_weight': 'balanced',
 'C': 489.3900918477499}
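`best_params_` reports only the winning candidate. The fitted search's `cv_results_` dictionary also holds each candidate's `mean_test_score` and `rank_test_score`, which shows how the others fared. The helper `top_candidates` below is a hypothetical sketch for listing the top-ranked entries of any fitted search. So that the snippet runs on its own, it builds a small stand-in `GridSearchCV` named `grid_demo`; in the notebook you would pass `search` instead.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def top_candidates(fitted_search, k=3):
    """Return up to k (rank, mean_test_score, params) tuples, best first."""
    results = fitted_search.cv_results_
    ranks = results["rank_test_score"]
    # Rank 1 is the best candidate; a stable sort keeps ties in candidate order
    order = np.argsort(ranks, kind="stable")[:k]
    return [
        (int(ranks[i]), float(results["mean_test_score"][i]), results["params"][i])
        for i in order
    ]


# Small stand-in search so this snippet is self-contained
data = load_digits()
grid_demo = GridSearchCV(SVC(), {"C": [0.1, 1.0, 10.0]}, cv=3)
grid_demo.fit(data.data, data.target)

for rank, score, params in top_candidates(grid_demo, k=2):
    print(rank, round(score, 3), params)
```

The first entry always matches `best_params_` and `best_score_`, since scikit-learn picks the best index as the first candidate with rank 1.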

Finally, shut down Ray:

In [9]:
ray.shutdown()