## Setup

In [None]:
%%capture
! pip install "ray[rllib, serve, tune]==2.2.0"
! pip install "pyarrow==10.0.0"
! pip install "tensorflow>=2.9.0"
! pip install "transformers>=4.24.0"
! pip install "pygame==2.1.2" "gym==0.25.0"

In [2]:
import ray
ray.init()

2023-03-18 08:06:40,198	INFO worker.py:1529 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265


Python version: 3.9.16
Ray version: 2.2.0
Dashboard: http://127.0.0.1:8265


## Data Processing with Ray Datasets

The following simple example creates a distributed `Dataset` on your local Ray Cluster from a Python data structure. Specifically, you’ll create a dataset from a Python dictionary containing a string `name` and an integer-valued `data` for 10,000 entries:

In [3]:
items = [{"name": str(i), "data": i} for i in range(10000)]
ds = ray.data.from_items(items)
ds.show(5)

{'name': '0', 'data': 0}
{'name': '1', 'data': 1}
{'name': '2', 'data': 2}
{'name': '3', 'data': 3}
{'name': '4', 'data': 4}


Great, now you have some rows, but what can you do with that data? The `Dataset` API bets heavily on functional programming, as this paradigm is well suited for data transformations.

Even though Python 3 made a point of hiding some of its functional programming capabilities, you’re probably familiar with functionality such as `map`, `filter`, `flat_map`, and others. If not, it’s easy enough to pick up: `map` takes each element of your dataset and transforms it into something else, in parallel; `filter` removes data points according to a Boolean filter function; and the slightly more elaborate `flat_map` first maps values similarly to `map`, but then it also “flattens” the result. For instance, if `map` produced a list of lists, `flat_map` would flatten out the nested lists and give you just a list. Equipped with these three functional API calls, let’s see how easily you can transform your dataset `ds`:

In [4]:
#We map each row of ds to only keep the square value of its data entry.
squares = ds.map(lambda x: x["data"] ** 2)

#Then we filter the squares to keep only even numbers (a total of five thousand elements).
evens = squares.filter(lambda x: x % 2 == 0)
evens.count()

#We then use flat_map to augment the remaining values with their respective cubes.
cubes = evens.flat_map(lambda x: [x, x**3])

#To take a total of 10 values means to leave Ray and return a Python list with 
#these values that we can print.
sample = cubes.take(10)
print(sample)

Map: 100%|██████████| 200/200 [00:02<00:00, 78.04it/s] 
Filter: 100%|██████████| 200/200 [00:00<00:00, 403.82it/s]
Flat_Map: 100%|██████████| 200/200 [00:00<00:00, 329.68it/s]

[0, 0, 4, 64, 16, 4096, 36, 46656, 64, 262144]
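
To see what the three transformations do without Ray in the picture, here is a minimal plain-Python sketch of the same chain on only 10 rows. It uses the built-in `map` and `filter` plus `itertools.chain.from_iterable` as a stand-in for `flat_map`; it runs sequentially and only illustrates the semantics, not how Ray distributes the work.

```python
from itertools import chain

# Ten rows shaped like the dataset above (the tutorial uses 10,000).
rows = [{"name": str(i), "data": i} for i in range(10)]

# map: keep only the square of each row's "data" entry.
squares = list(map(lambda row: row["data"] ** 2, rows))

# filter: keep only the even squares.
evens = list(filter(lambda x: x % 2 == 0, squares))

# flat_map: map each value to [x, x**3], then flatten the nested lists.
cubes = list(chain.from_iterable([x, x**3] for x in evens))

print(cubes)  # [0, 0, 4, 64, 16, 4096, 36, 46656, 64, 262144]
```

The printed list matches the first 10 values that `cubes.take(10)` returned above.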





The drawback of `Dataset` transformations is that each step gets executed synchronously. In this example that is a nonissue, but for complex tasks that, for example, mix reading files and processing data, you would want an execution that can overlap individual tasks. `DatasetPipeline` does exactly that. Let’s rewrite the previous example into a pipeline:

In [5]:
#You can turn a Dataset into a pipeline by calling .window() on it.
pipe = ds.window()

#Pipeline steps can be chained to yield the same result as before.
result = pipe\
    .map(lambda x: x["data"] ** 2)\
    .filter(lambda x: x % 2 == 0)\
    .flat_map(lambda x: [x, x**3])
result.show(10)

2023-03-18 08:20:49,252	INFO dataset.py:3693 -- Created DatasetPipeline with 20 windows: 7390b min, 8000b max, 7944b mean
2023-03-18 08:20:49,255	INFO dataset.py:3703 -- Blocks per window: 10 min, 10 max, 10 mean
2023-03-18 08:20:49,262	INFO dataset.py:3725 -- ✔️  This pipeline's per-window parallelism is high enough to fully utilize the cluster.
2023-03-18 08:20:49,266	INFO dataset.py:3742 -- ✔️  This pipeline's windows likely fit in object store memory without spilling.

0
0
4
64
16
4096
36
46656
64
262144
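
The windowing idea can be sketched with plain Python generators. This is only an illustration under stated assumptions: `windows`, `process`, and `pipeline` are hypothetical helper names, the window size of 500 rows is chosen so that 10,000 rows split into 20 windows like in the log above, and the sketch processes windows lazily one after another rather than overlapping stages the way Ray does.

```python
from itertools import islice


def windows(items, size):
    """Yield consecutive chunks ("windows") of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def process(window):
    """Apply the square -> keep evens -> [x, x**3] chain to one window."""
    squares = (x ** 2 for x in window)
    evens = (x for x in squares if x % 2 == 0)
    for x in evens:
        yield x
        yield x ** 3


def pipeline(items, size):
    """Process one window at a time; later windows are touched only on demand."""
    for window in windows(items, size):
        yield from process(window)


# Only the first window is consumed to produce these ten values.
first_ten = list(islice(pipeline(range(10000), 500), 10))
print(first_ten)  # [0, 0, 4, 64, 16, 4096, 36, 46656, 64, 262144]
```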





## Model Training

Moving on to the next set of libraries, let’s look at the distributed training capabilities of Ray. For that, you have access to two libraries. One is dedicated to reinforcement learning specifically; the other one has a different scope and is aimed primarily at supervised learning tasks.

### Reinforcement learning with Ray RLlib

Let’s start with _Ray RLlib_ for reinforcement learning (RL). This library is powered by the modern ML frameworks TensorFlow and PyTorch, and you can choose which one to use. Both frameworks seem to converge more and more conceptually, so you can pick the one you like most without losing much in the process.

One of the easiest ways to run examples with RLlib is to use the command-line tool `rllib`, which we already installed implicitly when we ran `pip install "ray[rllib]"`.

We’ll look at a fairly classic control problem of balancing a pole on a cart. Imagine you have a pole like the one in the figure below, fixed at a joint of a cart, and subject to gravity. The cart is free to move along a frictionless track, and you can manipulate the cart by giving it a push from the left or the right with a fixed force. If you do this well enough, the pole will remain in an upright position. For each time step in which the pole doesn’t fall over, we get a reward of 1. Collecting a high reward is our goal, and the question is whether we can teach a reinforcement learning algorithm to do this for us.

![2](https://user-images.githubusercontent.com/62965911/226094441-f0ea908f-e6ec-44a1-81e5-24719c7279c6.png)

Specifically, we want to train a reinforcement learning agent that can carry out two actions, namely, push to the left or to the right, observe what happens when interacting with the environment in that way, and learn from the experience by maximizing the reward.
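
The interaction loop described here (act, observe, collect reward) can be sketched in a few lines of plain Python. Everything in this sketch is an assumption made for illustration: `ToyBalanceEnv` is a hypothetical toy environment with an integer "tilt", not the CartPole physics, and the two policies are hand-written rather than learned. It only shows how an episode accumulates a reward of 1 per time step until the episode ends.

```python
import random


class ToyBalanceEnv:
    """Hypothetical stand-in for a balancing task (NOT CartPole physics).

    The state is an integer tilt. Each step, the chosen push and a random
    disturbance change the tilt. The episode ends when the tilt leaves
    [-limit, limit] or after max_steps steps.
    """

    def __init__(self, limit=5, max_steps=500, seed=0):
        self.limit = limit
        self.max_steps = max_steps
        self.rng = random.Random(seed)

    def reset(self):
        self.tilt = 0
        self.steps = 0
        return self.tilt

    def step(self, action):
        push = 1 if action == 1 else -1  # 0 = push left, 1 = push right
        self.tilt += push + self.rng.choice([-1, 0, 1])
        self.steps += 1
        done = abs(self.tilt) > self.limit or self.steps >= self.max_steps
        return self.tilt, 1.0, done  # reward of 1 for every time step taken


def run_episode(env, policy):
    """Play one episode and return the total reward collected."""
    observation, total, done = env.reset(), 0.0, False
    while not done:
        observation, reward, done = env.step(policy(observation))
        total += reward
    return total


_policy_rng = random.Random(1)


def random_policy(observation):
    """Ignore the observation and push in a random direction."""
    return _policy_rng.choice([0, 1])


def corrective_policy(observation):
    """Push against the current tilt."""
    return 0 if observation > 0 else 1


random_total = run_episode(ToyBalanceEnv(), random_policy)
corrective_total = run_episode(ToyBalanceEnv(), corrective_policy)
print("random policy reward:    ", random_total)
print("corrective policy reward:", corrective_total)
```

In this toy setting the corrective policy keeps the tilt within bounds for all 500 steps, while the random policy usually ends the episode much earlier. A reinforcement learning algorithm has to discover such a policy from the rewards alone.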

To tackle this problem with Ray RLlib, we can use a so-called _tuned_ example, which is a preconfigured algorithm that runs well for a given problem. You can run a tuned example with a single command. RLlib comes with many such examples, and you can list them all with `rllib example list`.

In [7]:
! rllib example list

                                 RLlib Examples
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Example ID                      ┃ Description                                ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ atari-a2c                       │ Runs grid search over several Atari games  │
│                                 │ on A2C.                                    │
│ atari-dqn                       │ Run grid search on Atari environments with │
│                                 │ DQN.                                       │
│ atari-duel-ddqn                 │ Run grid search on Atari environments w

One of the available examples is `cartpole-ppo`, a tuned example that uses the PPO algorithm to solve the cart–pole problem, specifically, the `CartPole-v1` environment from OpenAI Gym.

```yaml
cartpole-ppo:
    env: CartPole-v1  # [1]
    run: PPO  # [2]
    stop:
        episode_reward_mean: 150  # [3]
        timesteps_total: 100000
    config:  # [4]
        framework: tf
        gamma: 0.99
        lr: 0.0003
        num_workers: 1
        observation_filter: MeanStdFilter
        num_sgd_iter: 6
        vf_loss_coeff: 0.01
        model:
            fcnet_hiddens: [32]
            fcnet_activation: linear
            vf_share_layers: true
        enable_connectors: True
```

1. The `CartPole-v1` environment simulates the problem we just described.
2. Use a powerful RL algorithm called Proximal Policy Optimization, or PPO.
3. Once we reach a reward of 150, stop the experiment.
4. PPO needs some RL-specific configuration to make it work for this problem.
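
The `stop` block of the configuration lists two criteria. A minimal sketch of how such criteria can be checked, assuming they are combined as "stop as soon as any listed metric reaches its threshold" (`should_stop` is a hypothetical helper written for this illustration, not part of RLlib, and the result dictionaries are made-up examples):

```python
def should_stop(result, stop):
    """Return True as soon as any metric in `stop` reaches its threshold."""
    return any(
        result.get(metric, float("-inf")) >= threshold
        for metric, threshold in stop.items()
    )


# The two stopping criteria from the cartpole-ppo configuration.
stop = {"episode_reward_mean": 150, "timesteps_total": 100000}

# Hypothetical intermediate results of a training run.
still_training = {"episode_reward_mean": 108.29, "timesteps_total": 20000}
reward_reached = {"episode_reward_mean": 151.0, "timesteps_total": 28000}
budget_exhausted = {"episode_reward_mean": 90.0, "timesteps_total": 100000}

print(should_stop(still_training, stop))    # False
print(should_stop(reward_reached, stop))    # True
print(should_stop(budget_exhausted, stop))  # True
```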

The details of this configuration file don’t matter much at this point, so don’t get distracted by them. The important part is that you specify the `CartPole-v1` environment and sufficient RL-specific configuration to ensure the training procedure works. Running this configuration doesn’t require any special hardware and finishes in a matter of minutes.

In [6]:
! rllib example run cartpole-ppo

== Status ==
Current time: 2023-03-18 08:34:37 (running for 00:03:28.54)
Memory usage on this node: 2.9/12.7 GiB 
Using FIFO scheduling algorithm.
Resources requested: 2.0/2 CPUs, 0/0 GPUs, 0.0/7.36 GiB heap, 0.0/3.68 GiB objects
Result logdir: /root/ray_results/cartpole-ppo
Number of trials: 1/1 (1 RUNNING)
+-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+
| Trial name                  | status   | loc              |   iter |   total time (s) |    ts |   reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |
|-----------------------------+----------+------------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------|
| PPO_CartPole-v1_3bb16_00000 | RUNNING  | 172.28.0.12:7624 |      5 |          168.401 | 20000 |   108.29 |                  500 |                   13 |        

Your local Ray checkpoint folder is _~/ray\_results_ by default. For the training configuration we used, your _<checkpoint-path>_ should be of the form _~/ray\_results/cartpole-ppo/PPO\_CartPole-v1\_\<experiment\_id>_. During the training procedure, your intermediate and final model checkpoints get generated into this folder.
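
If you prefer to locate the most recent checkpoint programmatically, a small `pathlib` sketch could look like the following. It assumes the directory layout described above, with `checkpoint_<number>` folders inside each trial folder. `latest_checkpoint` is a hypothetical helper, and the demo builds a throwaway directory tree instead of touching your real results folder.

```python
import tempfile
from pathlib import Path


def latest_checkpoint(experiment_dir):
    """Return the highest-numbered checkpoint_* directory, or None if there is none."""
    candidates = [
        path
        for path in Path(experiment_dir).glob("*/checkpoint_*")
        if path.is_dir()
    ]
    # checkpoint_000007 -> 7, so the numeric suffix decides the order.
    return max(candidates, key=lambda p: int(p.name.split("_")[-1]), default=None)


# Demo on a temporary directory that mimics <experiment>/<trial>/checkpoint_*.
with tempfile.TemporaryDirectory() as tmp:
    trial_dir = Path(tmp) / "PPO_CartPole-v1_demo"
    for number in (1, 7, 3):
        (trial_dir / f"checkpoint_{number:06d}").mkdir(parents=True)
    found_name = latest_checkpoint(tmp).name
    empty_result = latest_checkpoint(trial_dir / "checkpoint_000001")

print(found_name)  # checkpoint_000007
```

For a real run you would pass something like `Path.home() / "ray_results" / "cartpole-ppo"` as `experiment_dir`.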

To measure the performance of your trained RL algorithm, you can now evaluate it _from checkpoint_ by copying the command that the example training run printed when it finished.

Running this command will print evaluation results, namely, the rewards achieved by your trained RL algorithm on the `CartPole-v1` environment.

In [8]:
! rllib evaluate /root/ray_results/cartpole-ppo/PPO_CartPole-v1_3bb16_00000_0_2023-03-18_08-31-09/checkpoint_000007 --algo PPO

2023-03-18 08:40:10,744	INFO algorithm.py:1005 -- Ran round 1 of parallel evaluation (1/1 episodes done)
Episode #23: reward: 500.0

### Distributed training with Ray Train

Ray RLlib is dedicated to reinforcement learning, but what do you do if you need to train models for other types of machine learning, like supervised learning? You can use another Ray library for distributed training in this case: _Ray Train_.

## Hyperparameter Tuning with Ray Tune

In [9]:
from ray import tune
import math
import time


#Simulate an expensive training function that depends on two hyperparameters, x and y, read from a config.
def training_function(config):
    x, y = config["x"], config["y"]
    time.sleep(10)
    score = objective(x, y)
    #After sleeping for 10 seconds to simulate training and computing the objective, the 
    #score is reported to tune.
    tune.report(score=score)


#The objective computes the mean of the squares of x and y and returns the square root 
#of this term. This type of objective is fairly common in ML.
def objective(x, y):
    return math.sqrt((x**2 + y**2)/2)


#Use tune.run to initialize hyperparameter optimization on our training_function.
result = tune.run(
    training_function,
    config={
        #A key part is to provide a parameter space for x and y for tune to search over.
        "x": tune.grid_search([-1, -.5, 0, .5, 1]),
        "y": tune.grid_search([-1, -.5, 0, .5, 1])
    })

print(result.get_best_config(metric="score", mode="min"))

Current time: 2023-03-18 08:52:36
Running for: 00:02:15.24
Memory: 1.5/12.7 GiB

| Trial name | status | loc | x | y | iter | total time (s) | score |
|---|---|---|---|---|---|---|---|
| training_function_e994b_00000 | TERMINATED | 172.28.0.12:13138 | -1.0 | -1.0 | 1 | 10.193 | 1.0 |
| training_function_e994b_00001 | TERMINATED | 172.28.0.12:13186 | -0.5 | -1.0 | 1 | 10.05 | 0.790569 |
| training_function_e994b_00002 | TERMINATED | 172.28.0.12:13138 | 0.0 | -1.0 | 1 | 10.0499 | 0.707107 |
| training_function_e994b_00003 | TERMINATED | 172.28.0.12:13186 | 0.5 | -1.0 | 1 | 10.0483 | 0.790569 |
| training_function_e994b_00004 | TERMINATED | 172.28.0.12:13138 | 1.0 | -1.0 | 1 | 10.0472 | 1.0 |
| training_function_e994b_00005 | TERMINATED | 172.28.0.12:13186 | -1.0 | -0.5 | 1 | 10.0501 | 0.790569 |
| training_function_e994b_00006 | TERMINATED | 172.28.0.12:13138 | -0.5 | -0.5 | 1 | 10.0503 | 0.5 |
| training_function_e994b_00007 | TERMINATED | 172.28.0.12:13186 | 0.0 | -0.5 | 1 | 10.0493 | 0.353553 |
| training_function_e994b_00008 | TERMINATED | 172.28.0.12:13138 | 0.5 | -0.5 | 1 | 10.0502 | 0.5 |
| training_function_e994b_00009 | TERMINATED | 172.28.0.12:13186 | 1.0 | -0.5 | 1 | 10.0474 | 0.790569 |


Trial name,date,done,episodes_total,experiment_id,experiment_tag,hostname,iterations_since_restore,node_ip,pid,score,time_since_restore,time_this_iter_s,time_total_s,timestamp,timesteps_since_restore,timesteps_total,training_iteration,trial_id,warmup_time
training_function_e994b_00000,2023-03-18_08-50-35,True,,a2865e213e9242f5a4c2741709618e0a,"0_x=-1,y=-1",0738217da70e,1,172.28.0.12,13138,1.0,10.193,10.193,10.193,1679129435,0,,1,e994b_00000,0.0200734
training_function_e994b_00001,2023-03-18_08-50-38,True,,96496a0a28f24fc7904fb5b942aa64f1,"1_x=-0.5000,y=-1",0738217da70e,1,172.28.0.12,13186,0.790569,10.05,10.05,10.05,1679129438,0,,1,e994b_00001,0.00650549
training_function_e994b_00002,2023-03-18_08-50-45,True,,a2865e213e9242f5a4c2741709618e0a,"2_x=0,y=-1",0738217da70e,1,172.28.0.12,13138,0.707107,10.0499,10.0499,10.0499,1679129445,0,,1,e994b_00002,0.0200734
training_function_e994b_00003,2023-03-18_08-50-48,True,,96496a0a28f24fc7904fb5b942aa64f1,"3_x=0.5000,y=-1",0738217da70e,1,172.28.0.12,13186,0.790569,10.0483,10.0483,10.0483,1679129448,0,,1,e994b_00003,0.00650549
training_function_e994b_00004,2023-03-18_08-50-55,True,,a2865e213e9242f5a4c2741709618e0a,"4_x=1,y=-1",0738217da70e,1,172.28.0.12,13138,1.0,10.0472,10.0472,10.0472,1679129455,0,,1,e994b_00004,0.0200734
training_function_e994b_00005,2023-03-18_08-50-59,True,,96496a0a28f24fc7904fb5b942aa64f1,"5_x=-1,y=-0.5000",0738217da70e,1,172.28.0.12,13186,0.790569,10.0501,10.0501,10.0501,1679129459,0,,1,e994b_00005,0.00650549
training_function_e994b_00006,2023-03-18_08-51-05,True,,a2865e213e9242f5a4c2741709618e0a,"6_x=-0.5000,y=-0.5000",0738217da70e,1,172.28.0.12,13138,0.5,10.0503,10.0503,10.0503,1679129465,0,,1,e994b_00006,0.0200734
training_function_e994b_00007,2023-03-18_08-51-09,True,,96496a0a28f24fc7904fb5b942aa64f1,"7_x=0,y=-0.5000",0738217da70e,1,172.28.0.12,13186,0.353553,10.0493,10.0493,10.0493,1679129469,0,,1,e994b_00007,0.00650549
training_function_e994b_00008,2023-03-18_08-51-15,True,,a2865e213e9242f5a4c2741709618e0a,"8_x=0.5000,y=-0.5000",0738217da70e,1,172.28.0.12,13138,0.5,10.0502,10.0502,10.0502,1679129475,0,,1,e994b_00008,0.0200734
training_function_e994b_00009,2023-03-18_08-51-19,True,,96496a0a28f24fc7904fb5b942aa64f1,"9_x=1,y=-0.5000",0738217da70e,1,172.28.0.12,13186,0.790569,10.0474,10.0474,10.0474,1679129479,0,,1,e994b_00009,0.00650549


2023-03-18 08:52:36,761	INFO tune.py:762 -- Total run time: 136.82 seconds (135.23 seconds for the tuning loop).


{'x': 0, 'y': 0}


Notice how the output of this run is structurally similar to what you saw in the RLlib example. That’s no coincidence, as RLlib (like many other Ray libraries) uses Ray Tune under the hood. While the experiment is running, the status table shows `PENDING` trials that wait for execution, as well as `RUNNING` and `TERMINATED` trials. Tune takes care of selecting, scheduling, and executing your training runs automatically.

Specifically, this Tune example finds the best possible choices of parameters `x` and `y` for a `training_function` with a given `objective` we want to minimize. Even though the objective function might look a little intimidating at first, since we compute the sum of squares of `x` and `y`, all values will be non-negative. That means the smallest value is obtained at `x=0` and `y=0`, which evaluates the objective function to `0`.
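
You can check this claim without Tune by evaluating the objective on the same grid in plain Python. This sketch reuses the `objective` function and the five candidate values from the cell above and simply picks the combination with the smallest score.

```python
import math
from itertools import product


def objective(x, y):
    """Square root of the mean of the squares of x and y."""
    return math.sqrt((x**2 + y**2) / 2)


values = [-1, -0.5, 0, 0.5, 1]

# Evaluate the objective for every (x, y) combination on the grid.
scores = {(x, y): objective(x, y) for x, y in product(values, repeat=2)}

best = min(scores, key=scores.get)
print(len(scores))   # 25
print(best)          # (0, 0)
print(scores[best])  # 0.0
```

The scores also match the trial table: for example, `objective(-0.5, -1)` rounds to `0.790569`.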

We do a so-called _grid search_ over all possible parameter combinations. As we explicitly pass in 5 possible values for both `x` and `y`, that’s a total of 25 combinations that get fed into the training function. Since we instruct `training_function` to sleep for 10 seconds, testing all combinations of hyperparameters sequentially would take more than 4 minutes in total (25 × 10 seconds = 250 seconds). Because Ray parallelizes this workload, the experiment finishes faster than that: the run logged above took about 137 seconds, with trials alternating between two worker processes. How long it takes for you depends on how many CPUs are available where you run it.
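
A rough back-of-the-envelope estimate makes the effect of parallelism concrete. The sketch below assumes that every trial takes exactly 10 seconds and that a fixed number of trials run at the same time; `estimated_wall_time` is a hypothetical helper, and real runs add scheduling and startup overhead on top.

```python
import math


def estimated_wall_time(num_trials, seconds_per_trial, concurrent_trials):
    """Estimate the wall time when trials run in rounds of `concurrent_trials`."""
    rounds = math.ceil(num_trials / concurrent_trials)
    return rounds * seconds_per_trial


num_trials = 5 * 5  # 5 values for x times 5 values for y

estimates = {
    workers: estimated_wall_time(num_trials, 10, workers)
    for workers in (1, 2, 8)
}
print(estimates)  # {1: 250, 2: 130, 8: 40}
```

The estimate of 130 seconds for two concurrent trials is close to the 135.23 seconds that the log above reports for the tuning loop.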

## Model Serving with Ray Serve

The last of Ray’s high-level libraries we’ll discuss specializes in model serving and is simply called Ray Serve. To see an example of it in action, you need a trained ML model to serve. Luckily, nowadays, you can find many interesting models on the internet that have already been trained for you. For instance, Hugging Face has a variety of models available for you to download directly in Python. The model we’ll use is a language model called GPT-2 that takes text as input and produces text to continue or complete the input. For example, you can prompt a question and GPT-2 will try to complete it.

Serving such a model is a good way to make it accessible. You may not know how to load and run a TensorFlow model on your computer, but you do know how to ask a question in plain English. Model serving hides the implementation details of a solution and lets users focus on providing inputs and understanding outputs of a model.

To proceed, make sure to run `pip install transformers` to install the Hugging Face library that has the model we want to use. With that we can now import and start an instance of Ray’s serve library, load and deploy a GPT-2 model, and ask it for the meaning of life, like so:

In [None]:
from ray import serve
from transformers import pipeline
import requests


#Start serve locally.
serve.start()


#The @serve.deployment decorator turns a function with a request parameter into a serve deployment.
@serve.deployment
def model(request):
    #Loading language_model inside the model function for every request is inefficient, 
    #but it’s the quickest way to show you a deployment.
    language_model = pipeline("text-generation", model="gpt2")
    query = request.query_params["query"]
    #Ask the model to continue our query; max_length=100 caps the result at 100 tokens (prompt included), not characters.
    return language_model(query, max_length=100)


#Formally deploy the model so that it can start receiving requests over HTTP.
model.deploy()
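
The inefficiency mentioned in the comment above comes from repeating an expensive load on every request. One generic way to avoid that is to cache the loaded object, sketched here in plain Python with `functools.lru_cache`. The loader and the "model" are hypothetical stand-ins, not the Hugging Face pipeline; in Ray Serve itself, a class-based deployment that loads the model once in its constructor is a common way to get the same effect.

```python
import functools
import time

load_calls = []  # records every time the expensive loader actually runs


@functools.lru_cache(maxsize=None)
def get_model():
    """Hypothetical expensive loader; the cache makes it run only once."""
    load_calls.append(1)
    time.sleep(0.1)  # stand-in for downloading and loading model weights
    return lambda query: f"{query} ..."  # stand-in for a text generator


def handle(query):
    """Handle one request by reusing the cached model."""
    return get_model()(query)


answers = [handle(q) for q in ("first", "second", "third")]
print(answers)
print("model loaded", len(load_calls), "time(s)")  # model loaded 1 time(s)
```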

In [16]:
query = "What's the meaning of life?"
#Use the indispensable requests library to get a response for any question you might have.
#Passing the query via params lets requests URL-encode spaces and punctuation for us.
response = requests.get("http://localhost:8000/model", params={"query": query})
print(response.text)

[{"generated_text": "What's the meaning of life?\n\nThe meaning of life is the idea that \"being alive\" isn't just a \"real life experience\". There's a lot of life around you, to be human. Life can seem strange at first and confusing at first, but it's the same when you know it is happening. When you have your life, you can be alive. And you can stay in it.\n\nHow are you feeling now?\n\nIt feels like I'm at"}]
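
The response body is a JSON list with one dictionary per generated sequence. The following standard-library sketch shows how a query string gets URL-encoded and how the generated continuation can be extracted from such a body. The `sample_body` is an abbreviated, hand-built stand-in for the real response, so no running server is needed.

```python
import json
from urllib.parse import urlencode

query = "What's the meaning of life?"

# URL-encode the query parameter before appending it to the endpoint.
url = "http://localhost:8000/model?" + urlencode({"query": query})
print(url)  # http://localhost:8000/model?query=What%27s+the+meaning+of+life%3F

# Abbreviated stand-in for the JSON body returned by the deployment.
sample_body = json.dumps(
    [{"generated_text": query + "\n\nThe meaning of life is ..."}]
)

# Parse the body and strip the prompt to keep only the continuation.
generated = json.loads(sample_body)[0]["generated_text"]
continuation = generated[len(query):].strip()
print(continuation)  # The meaning of life is ...
```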
