# An Overview of Ray

Python is likely the most popular language for data science today, and it's certainly
the one I find the most useful for my daily work.
By now it's over 30 years old, but it still has a growing and active community.
The rich PyData ecosystem is an essential part of a data scientist's toolbox.
How can you make sure to scale out your workloads while still leveraging the tools you need?
That's a difficult problem, especially since communities can't be forced to just toss their toolbox,
or programming language.
That means distributed computing tools for data science have to be built for their existing community.

In this chapter you'll get a first glimpse at what Ray can do for you.
We will discuss the three layers that make up Ray, namely its core engine, its high-level libraries and its ecosystem.
Throughout the chapter we'll show you first code examples to give you a feel for Ray,
but we defer any in-depth treatment of Ray's APIs and components to later chapters.
You can view this chapter as an overview of the whole book as well.

![Ray Layers](https://raw.githubusercontent.com/maxpumperla/learning_ray/main/notebooks/images/chapter_01/ray_layers.png)

## A distributed computing framework
At its core, Ray is a distributed computing framework.
We'll provide you with just the basic terminology here, and talk about Ray's architecture in depth in chapter 2.
In short, Ray sets up and manages clusters of computers so that you can run distributed tasks on them.
A Ray cluster consists of nodes that are connected to each other via a network.
You program against the so-called _driver_, the program root, which lives on the _head node_.
The driver can run _jobs_, that is, collections of tasks that are run on the nodes in the cluster.
Specifically, the individual tasks of a job are run on _worker_ processes on _worker nodes_.

![Ray cluster](https://raw.githubusercontent.com/maxpumperla/learning_ray/main/notebooks/images/chapter_01/simple_cluster.png)

What's interesting is that a Ray cluster can also be a _local cluster_, i.e. a cluster
consisting just of your own computer.
In this case, there's just one node, namely the head node, which has the driver process and some worker processes.
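
As a rough analogy in plain Python (this is not Ray's API), you can picture the driver as the code that hands tasks to a pool of workers and collects their results.
The sketch below uses a thread pool from the standard library as a stand-in. Ray itself runs tasks in separate worker processes, possibly on other machines, and the names `task` and `workers` are purely illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def task(x):
    """A tiny unit of work that a worker can run independently."""
    return x * x


# The "driver" part: submit tasks to the pool, then gather the results.
with ThreadPoolExecutor(max_workers=4) as workers:
    futures = [workers.submit(task, i) for i in range(8)]
    results = [future.result() for future in futures]

print(results)
```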

With that knowledge at hand, it's time to get your hands dirty and run your first local Ray cluster.
Installing Ray on any of the major operating systems should work seamlessly using `pip`:

```
pip install "ray[rllib, tune, serve]"
```

With a simple `pip install ray` you would have installed just the very basics of Ray.
Since we want to explore some advanced features, we installed the "extras" `rllib`, `tune` and `serve`,
which we'll discuss in a bit.
Depending on your system configuration you may not need the quotation marks in the above installation command.

Next, go ahead and start a Python session.
You could use the `ipython` interpreter, which I find to be the most suitable environment
for following along with simple examples.
The choice is up to you, but in any case please remember to use Python version `3.7` or later.
In your Python session you can now easily import and initialize Ray as follows:

In [1]:
# tag::init[]
import ray
ray.init()
# end::init[]

2022-10-04 15:37:21,645	INFO worker.py:1518 -- Started a local Ray instance.


Python version: 3.9.13
Ray version: 2.0.0


## Data Processing with Ray Data
The first high-level library of Ray we talk about is called "Ray Data".
This library contains a data structure aptly called `Dataset`, a multitude of connectors for loading data from
various formats and systems, an API for transforming such datasets, a way to build data processing pipelines
with them, and many integrations with other data processing frameworks.
The `Dataset` abstraction builds on the powerful [Arrow framework](https://arrow.apache.org/).

To use Ray Data, you need to install Arrow for Python, for instance by running `pip install pyarrow`.
We'll now discuss a simple example that creates a distributed `Dataset` on your local Ray cluster from a Python
data structure. Specifically, you'll create a dataset from a list of Python dictionaries, each containing a
string `name` and an integer-valued `data`, for `10000` entries:

In [2]:
# tag::ray_data_load[]
import ray

items = [{"name": str(i), "data": i} for i in range(10000)]
ds = ray.data.from_items(items)   # <1>
ds.show(5)  # <2>
# end::ray_data_load[]

{'name': '0', 'data': 0}
{'name': '1', 'data': 1}
{'name': '2', 'data': 2}
{'name': '3', 'data': 3}
{'name': '4', 'data': 4}


In [3]:
ds

Dataset(num_blocks=200, num_rows=10000, schema={name: string, data: int64})

Great, now you have some distributed rows, but what can you do with that data?
The `Dataset` API bets heavily on functional programming, as it is very well suited for data transformations.
Even though Python 3 made a point of hiding some of its functional programming capabilities, you're probably
familiar with functionality such as `map`, `filter` and others.
If not, it's easy enough to pick up.
`map` takes each element of your dataset and transforms it into something else, in parallel.
`filter` removes data points according to a boolean filter function.
And the slightly more elaborate `flat_map` first maps values similarly to `map`, but then also "flattens" the result.
For instance, if `map` would produce a list of lists, `flat_map` would flatten out the nested lists and give
you just a list.
Equipped with these three functional API calls, let's see how easily you can transform your dataset `ds`:

In [4]:
# tag::ray_data_transform[]
squares = ds.map(lambda x: x["data"] ** 2)  # <1>

evens = squares.filter(lambda x: x % 2 == 0)  # <2>
evens.count()

cubes = evens.flat_map(lambda x: [x, x**3])  # <3> ([1,1],[2,8],...)
sample = cubes.take(10)  # <4>
print(sample)
# end::ray_data_transform[]

Map: 100%|██████████| 200/200 [00:01<00:00, 159.38it/s]
Filter: 100%|██████████| 200/200 [00:00<00:00, 975.71it/s]
Flat_Map: 100%|██████████| 200/200 [00:00<00:00, 1014.48it/s]

[0, 0, 4, 64, 16, 4096, 36, 46656, 64, 262144]





In [9]:
cubes

Dataset(num_blocks=200, num_rows=10000, schema=<class 'int'>)
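
If you want to double-check what the three transformations do, you can replay the same chain with Python's built-ins on a single machine.
This is only a sketch of the semantics, without any parallelism and without Ray. Here `flat_map` is emulated with `itertools.chain.from_iterable`.

```python
from itertools import chain, islice

items = [{"name": str(i), "data": i} for i in range(10000)]

# map: transform every row into its squared "data" value
squares = map(lambda row: row["data"] ** 2, items)

# filter: keep only the even squares
evens = filter(lambda x: x % 2 == 0, squares)

# flat_map: map each value to a list, then flatten the lists into one stream
nested = map(lambda x: [x, x ** 3], evens)
cubes = chain.from_iterable(nested)

sample = list(islice(cubes, 10))
print(sample)
```

The printed sample should agree with the first ten elements that `cubes.take(10)` returned above.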

The drawback of `Dataset` transformations is that each step gets executed synchronously.
In the above example this is a non-issue, but for complex tasks that e.g. mix reading files and processing data,
you want an execution that can overlap the individual tasks.
`DatasetPipeline` does exactly that.
Let's rewrite the last example into a pipeline.

In [6]:
# tag::ray_data_pipeline[]
pipe = ds.window()  # <1>
result = pipe\
    .map(lambda x: x["data"] ** 2)\
    .filter(lambda x: x % 2 == 0)\
    .flat_map(lambda x: [x, x**3])  # <2>
result.show(10)
# end::ray_data_pipeline[]

2022-10-04 15:42:24,805	INFO dataset.py:3276 -- Created DatasetPipeline with 20 windows: 7390b min, 8000b max, 7944b mean
2022-10-04 15:42:24,807	INFO dataset.py:3286 -- Blocks per window: 10 min, 10 max, 10 mean
2022-10-04 15:42:24,816	INFO dataset.py:3325 -- ✔️  This pipeline's windows likely fit in object store memory without spilling.
Stage 1:   5%|▌         | 1/20 [00:00<00:15,  1.21it/s]
Stage 0:  10%|█         | 2/20 [00:00<00:07,  2.41it/s]

0
0
4
64
16
4096
36
46656
64
262144





In [7]:
pipe

DatasetPipeline(num_windows=20, num_stages=1)

In [10]:
result

DatasetPipeline(num_windows=20, num_stages=4)
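
To get a feel for what a window is, here's a plain-Python sketch that pushes the data through the same three steps one window at a time.
It only mimics the window-by-window flow, not the overlapping execution of stages that `DatasetPipeline` provides. The helpers `windows` and `run_pipeline` are hypothetical, and the window size of 500 rows assumes that the 20 windows reported in the log are evenly sized.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def windows(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive chunks ("windows") of at most `size` rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def run_pipeline(rows: Iterable[dict], window_size: int) -> Iterator[int]:
    """Square, keep evens, then emit x and x**3, one window at a time."""
    for window in windows(rows, window_size):
        squares = (row["data"] ** 2 for row in window)
        evens = (x for x in squares if x % 2 == 0)
        for x in evens:
            yield x
            yield x ** 3


items = [{"name": str(i), "data": i} for i in range(10000)]
num_windows = len(list(windows(items, 500)))
first_ten = list(islice(run_pipeline(items, window_size=500), 10))
print(num_windows, first_ten)
```

Because the generator is lazy, asking for the first ten results only touches the first window, which is the kind of saving you hope for when windows are expensive to load.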

## Reinforcement Learning with Ray RLlib
We'll look at a fairly classical control problem of balancing a pendulum.
Imagine you have a pendulum like the one in the following figure, fixed at a single point and subject to gravity.
You can manipulate that pendulum by giving it a push from the left or the right.
If you exert just the right amount of force, the pendulum might remain in an upright position.
That's our goal - and the question is whether we can teach a reinforcement learning algorithm to do so for us.

![Pendulum problem](https://raw.githubusercontent.com/maxpumperla/learning_ray/main/notebooks/images/chapter_01/pendulum.png)

Specifically, we want to train a reinforcement learning agent that can push to the left or right,
thereby acting on its environment (manipulating the pendulum) to reach the "upright position" goal
for which it will be rewarded.
To tackle this problem with Ray RLlib, store the following content in a file called `pendulum.yml`:

```yaml
pendulum-ppo:
  env: Pendulum-v1  # <1>
  run: PPO  # <2>
  checkpoint_freq: 5  # <3>
  stop:
    episode_reward_mean: -800  # <4>
  config:
    lambda: 0.1  # <5>
    gamma: 0.95
    lr: 0.0003
    num_sgd_iter: 6
```

The details of this configuration file don't matter much at this point, so don't get distracted by them.
The important part is that you specify the built-in `Pendulum-v1` environment and sufficient RL-specific
configuration to ensure the training procedure works.
The configuration is a simplified version of one of Ray's
[tuned examples](https://github.com/ray-project/ray/tree/master/rllib/tuned_examples).
We chose this one because it doesn't require any special hardware and finishes in a matter of minutes.
If your computer is powerful enough, you can try to run the tuned example as well, which should yield much better
results.
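
If you're curious how the `stop` entry of `pendulum.yml` can be read, here's a small sketch that mirrors the file as a Python dictionary.
It assumes the usual interpretation of a stopping criterion: training ends once the reported metric reaches or exceeds the threshold. The helper `should_stop` is hypothetical and simplified, not RLlib's implementation.

```python
# The YAML file from above, written as a nested Python dictionary.
experiment = {
    "pendulum-ppo": {
        "env": "Pendulum-v1",
        "run": "PPO",
        "checkpoint_freq": 5,
        "stop": {"episode_reward_mean": -800},
        "config": {"lambda": 0.1, "gamma": 0.95, "lr": 0.0003, "num_sgd_iter": 6},
    }
}


def should_stop(result: dict, stop: dict) -> bool:
    """Assumed rule: stop as soon as any listed metric reaches its threshold."""
    return any(result[name] >= threshold for name, threshold in stop.items())


stop = experiment["pendulum-ppo"]["stop"]
# Pendulum rewards are negative, so -750 is better than the -800 threshold.
print(should_stop({"episode_reward_mean": -1200.0}, stop))
print(should_stop({"episode_reward_mean": -750.0}, stop))
```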
To train this pendulum example you can now simply run:

```shell
rllib train -f pendulum.yml
```

If you want, you can check the output of this Ray program and see how the different metrics evolve during
the training procedure.
Assuming the training program finished, we can now check how well it worked.
To visualize the trained pendulum you need to install one more Python library with `pip install pyglet`.
The only other thing you need to figure out is where Ray stored your training progress.
When you run `rllib train` for an experiment, Ray will create a unique experiment ID for you and store
results in a sub-folder of `~/ray_results` by default.
For the training configuration we used, you should see a folder with results that looks
like `~/ray_results/pendulum-ppo/PPO_Pendulum-v1_<experiment_id>`.
During the training procedure intermediate model checkpoints get generated in the same folder.
For instance, I have a folder on my machine called:

```shell
 ~/ray_results/pendulum-ppo/PPO_Pendulum-v1_20cbf_00000_0_2021-09-24_15-20-03/\
  checkpoint_000029/checkpoint-29
```
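
If you'd rather not browse the results folder by hand, a few lines of Python can pick the highest-numbered checkpoint directory for you.
This is a sketch that assumes the `checkpoint_<number>` naming shown above. The helper `latest_checkpoint` is hypothetical, and the demo runs on a throwaway directory instead of your real `~/ray_results`.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

CHECKPOINT_DIR = re.compile(r"checkpoint_(\d+)$")


def latest_checkpoint(trial_dir: Path) -> Optional[Path]:
    """Return the checkpoint sub-folder with the largest number, if any."""
    candidates = []
    for child in trial_dir.iterdir():
        match = CHECKPOINT_DIR.match(child.name)
        if child.is_dir() and match:
            candidates.append((int(match.group(1)), child))
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]


# Demo: create a few fake checkpoint folders and find the newest one.
with tempfile.TemporaryDirectory() as tmp:
    trial = Path(tmp)
    for number in (5, 10, 29):
        (trial / f"checkpoint_{number:06d}").mkdir()
    newest_name = latest_checkpoint(trial).name

print(newest_name)
```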

Once you've figured out the experiment ID and chosen a checkpoint ID (as a rule of thumb the larger the ID, the
better the results), you can evaluate the training performance of your pendulum training run like this
(we'll explain what `rollout` means in this context in chapters 3 and 4):

```shell
rllib rollout \
  ~/ray_results/pendulum-ppo/PPO_Pendulum-v1_<experiment_id>/checkpoint_000<cp-id>/checkpoint-<cp-id> \
  --run PPO --env Pendulum-v1 --steps 2000
```

You should see an animation of a pendulum controlled by an agent that looks like the figure of the pendulum
from earlier.
Since we opted for a quick training procedure instead of maximizing performance, you should see the agent
struggle with the pendulum exercise.
We could have done much better, and if you're interested in scanning Ray's tuned examples for the `Pendulum-v1`
environment, you'll find an abundance of solutions to this exercise.
The point of this example was to show you how simple it can be to train and evaluate
reinforcement learning tasks with RLlib, using just two command line calls to `rllib`.

## Distributed training with Ray Train

Ray RLlib is dedicated to reinforcement learning, but what do you do if you need to train models for
other types of machine learning, like supervised learning?
You can use another Ray library for distributed training in this case, called _Ray Train_.
At this point, we haven't built up enough knowledge of frameworks such as `TensorFlow` to give you a
concrete and informative example for Ray Train.
It also doesn't make sense right now to dive into deep learning or explain what Train is, for that matter.
We'll discuss this in chapter 6, when the time comes.
But we can at least roughly sketch what a distributed training "wrapper" for an ML model would look like.
A schematic procedure for running distributed deep learning with Ray Train looks as follows.
Note that even this sketch needs TensorFlow installed (`pip install tensorflow`), because the trainer uses the `tensorflow` backend.

In [1]:
# tag::ray_train_sketch[]
from ray.train import Trainer


def training_function():  # <1>
    pass


trainer = Trainer(backend="tensorflow", num_workers=4)  # <2>
trainer.start()

results = trainer.run(training_function)  # <3>
trainer.shutdown()
# end::ray_train_sketch[]

2022-10-04 18:02:02,352	INFO worker.py:1518 -- Started a local Ray instance.
2022-10-04 18:02:03,605	INFO trainer.py:160 -- GPUs are detected in your Ray cluster, but GPU training is not enabled for Ray Train. To enable GPU training, make sure to set `use_gpu` to True when instantiating your Trainer.
2022-10-04 18:02:03,607	INFO trainer.py:247 -- Trainer logs will be logged in: /home/serendipita/ray_results/train_2022-10-04_18-02-03


ModuleNotFoundError: TensorFlow isn't installed. To install TensorFlow, run 'pip install tensorflow'.

## Hyperparameter Tuning with Ray Tune
Naming things is hard, but the Ray team hit the spot with _Ray Tune_, which you can use to tune all
sorts of parameters.
Specifically, it was built to find good hyperparameters for machine learning models.
The typical setup is as follows:

- You want to run an extremely computationally expensive training function. In ML it's not uncommon
  to run training procedures that take days, if not weeks, but let's say you're dealing with just a couple of minutes.
- As a result of training, you compute a so-called objective function. Usually you either want to maximize
  your gains or minimize your losses in terms of performance of your experiment.
- The tricky bit is that your training function might depend on certain parameters,
  hyperparameters, that influence the value of your objective function.
- You may have a hunch what individual hyperparameters should be, but tuning them all can be difficult.
  Even if you can restrict these parameters to a sensible range, it's usually prohibitive to test a wide
  range of combinations. Your training function is simply too expensive.

What can you do to efficiently sample hyperparameters and get "good enough" results on your objective?
The field concerned with solving this problem is called _hyperparameter optimization_ (HPO), and Ray Tune has
an enormous suite of algorithms for tackling it.
Let's look at a first example of Ray Tune used for the situation we just explained.
The focus is yet again on Ray and its API, and not on a specific ML task (which we simply simulate for now).

In [2]:
# tag::ray_tune[]
from ray import tune
import math
import time


def training_function(config):  # <1>
    x, y = config["x"], config["y"]
    time.sleep(10)
    score = objective(x, y)
    tune.report(score=score)  # <2>


def objective(x, y):
    return math.sqrt((x**2 + y**2)/2)  # <3>


result = tune.run(  # <4>
    training_function,
    config={
        "x": tune.grid_search([-1, -.5, 0, .5, 1]),  # <5>
        "y": tune.grid_search([-1, -.5, 0, .5, 1])
    })

print(result.get_best_config(metric="score", mode="min"))
# end::ray_tune[]



| Trial name | status | loc | x | y | iter | total time (s) | score |
|---|---|---|---|---|---|---|---|
| training_function_c047f_00000 | TERMINATED | 192.168.43.126:8425 | -1.0 | -1.0 | 1 | 10.0501 | 1.0 |
| training_function_c047f_00001 | TERMINATED | 192.168.43.126:8465 | -0.5 | -1.0 | 1 | 10.0523 | 0.790569 |
| training_function_c047f_00002 | TERMINATED | 192.168.43.126:8467 | 0.0 | -1.0 | 1 | 10.0564 | 0.707107 |
| training_function_c047f_00003 | TERMINATED | 192.168.43.126:8469 | 0.5 | -1.0 | 1 | 10.0478 | 0.790569 |
| training_function_c047f_00004 | TERMINATED | 192.168.43.126:8471 | 1.0 | -1.0 | 1 | 10.0468 | 1.0 |
| training_function_c047f_00005 | TERMINATED | 192.168.43.126:8473 | -1.0 | -0.5 | 1 | 10.0477 | 0.790569 |
| training_function_c047f_00006 | TERMINATED | 192.168.43.126:8475 | -0.5 | -0.5 | 1 | 10.0502 | 0.5 |
| training_function_c047f_00007 | TERMINATED | 192.168.43.126:8477 | 0.0 | -0.5 | 1 | 10.0482 | 0.353553 |
| training_function_c047f_00008 | TERMINATED | 192.168.43.126:8478 | 0.5 | -0.5 | 1 | 10.056 | 0.5 |
| training_function_c047f_00009 | TERMINATED | 192.168.43.126:8480 | 1.0 | -0.5 | 1 | 10.0526 | 0.790569 |


Result for training_function_c047f_00000:
  date: 2022-10-04_18-03-39
  done: false
  experiment_id: ac4d03a18b5444ad8f642b618f1f53ef
  hostname: serendipita-IdeaPad-L340-15IRH-Gaming
  iterations_since_restore: 1
  node_ip: 192.168.43.126
  pid: 8425
  score: 1.0
  time_since_restore: 10.050139665603638
  time_this_iter_s: 10.050139665603638
  time_total_s: 10.050139665603638
  timestamp: 1664924619
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: c047f_00000
  warmup_time: 0.005379438400268555
  
Result for training_function_c047f_00000:
  date: 2022-10-04_18-03-39
  done: true
  experiment_id: ac4d03a18b5444ad8f642b618f1f53ef
  experiment_tag: 0_x=-1,y=-1
  hostname: serendipita-IdeaPad-L340-15IRH-Gaming
  iterations_since_restore: 1
  node_ip: 192.168.43.126
  pid: 8425
  score: 1.0
  time_since_restore: 10.050139665603638
  time_this_iter_s: 10.050139665603638
  time_total_s: 10.050139665603638
  timestamp: 1664924619
  timesteps_since_restore: 0
  training_iterati

2022-10-04 18:03:59,951	INFO tune.py:758 -- Total run time: 34.95 seconds (32.87 seconds for the tuning loop).


{'x': 0, 'y': 0}
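
As a sanity check, you can evaluate the same 5×5 grid in plain Python, without Ray Tune and without the simulated ten-second training step.
The sketch below enumerates all combinations with `itertools.product` and picks the configuration with the smallest score. It reuses `objective` from the example above, while `grid`, `configs` and `best` are illustrative names.

```python
import math
from itertools import product


def objective(x, y):
    return math.sqrt((x**2 + y**2) / 2)


grid = {
    "x": [-1, -.5, 0, .5, 1],
    "y": [-1, -.5, 0, .5, 1],
}

# Every combination of one x value with one y value: 5 * 5 = 25 configurations.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]

# Pick the configuration that minimizes the objective.
best = min(configs, key=lambda config: objective(**config))
print(len(configs), best)
```

This works here because the objective is cheap to evaluate. With a real training function, each of the 25 evaluations could take minutes or days, which is exactly where running trials in parallel pays off.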


## Model Serving with Ray Serve

The last of Ray's high-level libraries we'll discuss specializes in model serving and is simply called _Ray Serve_.
To see an example of it in action, you need a trained ML model to serve.
Luckily, nowadays you can find many interesting models on the internet that have already been trained for you.
For instance, _Hugging Face_ has a variety of models available for you to download directly in Python.
The model we'll use is a language model called _GPT-2_ that takes text as input and produces text to
continue or complete the input.
For example, you can prompt a question and GPT-2 will try to complete it.

Serving such a model is a good way to make it accessible.
You may not know how to load and run a TensorFlow model on your computer, but you do know how
to ask a question in plain English.
Model serving hides the implementation details of a solution and lets users focus on providing
inputs and understanding outputs of a model.

To proceed, make sure to run `pip install transformers` to install the Hugging Face library
that has the model we want to use.
With that we can now import and start an instance of Ray's `serve` library, load and deploy a GPT-2
model and ask it for the meaning of life, like so:

In [None]:
# tag::ray_serve[]
from ray import serve
from transformers import pipeline
import requests

serve.start()  # <1>


@serve.deployment  # <2>
def model(request):
    language_model = pipeline("text-generation", model="gpt2")  # <3>
    query = request.query_params["query"]
    return language_model(query, max_length=100)  # <4>


model.deploy()  # <5>

query = "What's the meaning of life?"
response = requests.get(f"http://localhost:8000/model?query={query}")  # <6>
print(response.text)
# end::ray_serve[]
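
The query in the request above contains spaces, an apostrophe and a question mark, all of which have a special meaning or are not allowed in URLs.
A minimal sketch of building the same URL with explicit encoding, using only the standard library, could look like this. The endpoint is the one from the example, and `base_url` and `decoded` are illustrative names.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base_url = "http://localhost:8000/model"
query = "What's the meaning of life?"

# urlencode escapes characters that are not safe in a query string.
url = f"{base_url}?{urlencode({'query': query})}"
print(url)

# Decoding the query string again recovers the original question.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded)
```

With `requests` you can get the same effect by passing the parameters separately, as in `requests.get(base_url, params={"query": query})`.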