# Ray Core: Design Patterns, Anti-patterns and Best Practices
© 2023, Anyscale. All Rights Reserved

Ray has a myriad of design patterns and anti-patterns for [tasks](https://docs.ray.io/en/latest/ray-core/tasks/patterns/index.html#task-patterns) and [actors](https://docs.ray.io/en/latest/ray-core/actors/patterns/index.html). 

These patterns suggest best practices for writing distributed applications. By contrast, the anti-patterns are advice and admonitions that help you avoid pitfalls while using Ray. 

In this tutorial we'll explore a few of these design patterns and anti-patterns, along with tips and tricks for first-time Ray users.

## Learning objectives

In this tutorial, you'll learn about:
 * Some design patterns and anti-patterns
 * Tips and tricks for avoiding pitfalls when using Ray APIs
 
We won't exhaustively cover all the patterns and anti-patterns. Rather, we offer you a glimpse of some common pitfalls. For advanced patterns, read the docs on [Design patterns and anti-patterns](https://docs.ray.io/en/latest/ray-core/patterns/index.html#task-patterns).

In [None]:
import logging
import math
import random
import time
from typing import List, Tuple

import numpy as np
import ray

In [None]:
if ray.is_initialized():
    ray.shutdown()
ray.init(logging_level=logging.ERROR)

### Fetching Cluster Information

Several methods return information about the cluster:

| Method | Brief Description |
| :----- | :---------------- |
| [`ray.get_gpu_ids()`](https://ray.readthedocs.io/en/latest/package-ref.html#ray.get_gpu_ids) | IDs of the GPUs available to the current worker |
| [`ray.nodes()`](https://ray.readthedocs.io/en/latest/package-ref.html#ray.nodes) | Nodes in the cluster |
| [`ray.cluster_resources()`](https://ray.readthedocs.io/en/latest/package-ref.html#ray.cluster_resources) | Total resources in the cluster, whether in use or not |
| [`ray.available_resources()`](https://ray.readthedocs.io/en/latest/package-ref.html#ray.available_resources) | Resources not currently in use |

You can see the full list of methods in the [Ray Core](https://docs.ray.io/en/latest/ray-core/api/core.html#core-api) API documentation.

In [None]:
print(f"""
ray.get_gpu_ids():          {ray.get_gpu_ids()}
ray.nodes():                {ray.nodes()}
ray.cluster_resources():    {ray.cluster_resources()}
ray.available_resources():  {ray.available_resources()}
""")

In [None]:
ray.nodes()[0]['Resources']['CPU']

## Tips and Tricks and Patterns and Anti-patterns for first-time users
Because Ray's core APIs are simple and flexible, first-time users can trip over certain API calls and usage patterns. These short tips and tricks will help guard you against unexpected results. Below we briefly explore a handful of API calls and their best practices.

### Use @ray.remote and @ray.method to return multiple values
Often, you may wish to return more than a single value from a Ray task or from a Ray actor's method.

Let's look at some examples of how to do it.

In [None]:
@ray.remote(num_returns=3)
def tuple3(id: str, lst: List[float]) -> Tuple[str, int, float]:
    one = id.capitalize()
    two = random.randint(5, 10)
    three = sum(lst)
    return (one, two, three)

# Returns three object references, one for each returned value
x_ref, y_ref, z_ref = tuple3.remote("ray rocks!", [2.2, 4.4, 6.6])

# Fetch the values behind the list of references
x, y, z = ray.get([x_ref, y_ref, z_ref])
print(f'{x}, {y}, {z:.2f}')

A slight variation of the above example is to pack all values into a single return value and then unpack them.

In [None]:
@ray.remote(num_returns=1)
def tuple3_packed(id: str, lst: List[float]) -> Tuple[str, int, float]:
    one = id.capitalize()
    two = random.randint(5, 10)
    three = sum(lst)
    return (one, two, three)

# Returns one object reference with three values in it
xyz_ref = tuple3_packed.remote("ray rocks!", [2.2, 4.4, 6.6])

# Fetch from a single object ref and unpack into three values
x, y, z = ray.get(xyz_ref)
print(f'({x}, {y}, {z:.2f})')

Let's do the same for a Ray actor method, except here
we use the `@ray.method(num_returns=3)` decorator on
the actor's method.

In [None]:
@ray.remote
class TupleActor:
    @ray.method(num_returns=3)
    def tuple3(self, id: str, lst: List[float]) -> Tuple[str, int, float]:
        one = id.capitalize()
        two = random.randint(5, 10)
        three = sum(lst)
        return (one, two, three)
    
# Create an instance of an actor
actor = TupleActor.remote()
x_ref, y_ref, z_ref = actor.tuple3.remote("ray rocks!", [2.2, 4.4, 5.5])
x, y, z = ray.get([x_ref, y_ref, z_ref])
print(f'({x}, {y}, {z:.2f})')   

### Anti-pattern: Calling ray.get in a loop harms parallelism

With Ray, every `.remote()` call is asynchronous: it returns immediately with a future, in the form of an object reference (`ObjectRef`). This is key to achieving massive parallelism, because it allows a developer to launch many remote tasks, each returning an object reference. Whenever needed, the value behind the reference is fetched with `ray.get()`. Because `ray.get()` is a blocking call, where and how often you use it can affect the performance of your Ray application.

**TLDR**: Avoid calling `ray.get()` in a loop since it’s a blocking call; use `ray.get()` only for the final result.
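
The effect is not specific to Ray. As a rough analogy that runs without a cluster, the sketch below uses the standard library's `concurrent.futures`, where `Future.result()` blocks much like `ray.get()`. The helper names (`slow_square`, `blocking_in_loop`, `collect_at_end`, `timed`) and the 0.05-second sleep are illustrative assumptions and are not part of Ray.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x: int) -> int:
    # Stand-in for real work; the 0.05 s sleep is an arbitrary choice.
    time.sleep(0.05)
    return x * x


def blocking_in_loop(pool: ThreadPoolExecutor, n: int) -> list:
    # Anti-pattern: wait for each result before submitting the next task,
    # so only one task is ever in flight.
    return [pool.submit(slow_square, x).result() for x in range(n)]


def collect_at_end(pool: ThreadPoolExecutor, n: int) -> list:
    # Pattern: submit everything first, then collect the results afterwards.
    futures = [pool.submit(slow_square, x) for x in range(n)]
    return [f.result() for f in futures]


def timed(fn, *args):
    # Return the function's result together with its elapsed wall-clock time.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


with ThreadPoolExecutor(max_workers=8) as pool:
    loop_results, loop_secs = timed(blocking_in_loop, pool, 8)
    end_results, end_secs = timed(collect_at_end, pool, 8)

print(f"blocking in loop: {loop_secs:.2f}s, collecting at end: {end_secs:.2f}s")
```

Both functions return the same values; only the point at which the caller blocks differs, which is what the Ray cells in this section demonstrate.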


<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Ray_Core/ray-get-loop.png" height="70%" width="70%">

In [None]:
@ray.remote
def do_some_work(x):
    # Assume doing some computation
    time.sleep(0.5)
    return math.exp(x)

#### Bad usage
We use `ray.get` inside a list comprehension, so it blocks on each `.remote()` call, waiting until the task finishes and the value
is materialized and fetched from the Ray object store.

In [None]:
%%time
results = [ray.get(do_some_work.remote(x)) for x in range(25)]
results[:5]

#### Good usage
We delay `ray.get` until after all the tasks have been invoked and their references returned. That is, we don't block on each call; instead, we block once, outside the list comprehension.


In [None]:
%%time
results = ray.get([do_some_work.remote(x) for x in range(25)])
results[:5]

### Anti-pattern: Over-parallelizing with too fine-grained tasks harms speedup

Ray APIs are general and simple to use. As a result, a newcomer's natural instinct is to parallelize all tasks, including tiny ones, and the per-task overhead adds up. In short, if the Ray remote tasks do very little computation, they may take longer to execute than their serial Python equivalents.

**TLDR**: Where possible, batch tiny Ray tasks into chunks to reap the benefits of distributing them.

In [None]:
# Using regular Python task that returns double of the number
def tiny_task(x):
    time.sleep(0.00001)
    return 2 * x

Run this as a regular sequential Python task.

In [None]:
start_time = time.time()
results = [tiny_task(x) for x in range(100000)]
end_time = time.time()
print(f"Ordinary function call takes {end_time - start_time:.2f} seconds")

In [None]:
results[:5], len(results)

Now convert this into a Ray remote task.

In [None]:
@ray.remote
def remote_tiny_task(x):
    time.sleep(0.00001)
    return 2 * x

In [None]:
start_time = time.time()
result_ids = [remote_tiny_task.remote(x) for x in range(100000)]
results = ray.get(result_ids)
end_time = time.time()
print(f"Parallelizing Ray tasks takes {end_time - start_time:.2f} seconds")

In [None]:
results[:5], len(results)

Surprisingly, Ray didn’t improve the execution time. In fact, the Ray program is much slower than the sequential program! 

_What's going on?_ 

Well, the issue here is that every task invocation has a non-trivial overhead (e.g., scheduling, inter-process communication, updating the system state), and this overhead dominates the actual time it takes to execute the task.

_What can we do to remedy it?_

One way to mitigate this is to make the remote tasks "larger" in order to amortize the invocation overhead. Here we achieve this by aggregating the tiny tasks into chunks of 1000.
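
A back-of-the-envelope model makes the amortization visible. The sketch below is only an estimate: the per-task overhead of 0.5 ms and the 8 workers are assumed figures chosen for illustration, not measurements of Ray, and `estimate_seconds` is a hypothetical helper.

```python
import math


def estimate_seconds(n_items: int, work_per_item: float, overhead_per_task: float,
                     batch_size: int, n_workers: int) -> float:
    """Estimate wall-clock time when items are grouped into tasks of batch_size.

    Each task pays a fixed overhead once, and the total cost is assumed to
    spread evenly over n_workers.
    """
    n_tasks = math.ceil(n_items / batch_size)
    total_cost = n_tasks * overhead_per_task + n_items * work_per_item
    return total_cost / n_workers


# Assumed figures, for illustration only.
N_ITEMS = 100_000
WORK_PER_ITEM = 0.00001      # seconds; the nominal sleep in tiny_task
OVERHEAD_PER_TASK = 0.0005   # seconds; an assumed per-task invocation cost
N_WORKERS = 8

serial = N_ITEMS * WORK_PER_ITEM
one_item_per_task = estimate_seconds(N_ITEMS, WORK_PER_ITEM, OVERHEAD_PER_TASK, 1, N_WORKERS)
batched = estimate_seconds(N_ITEMS, WORK_PER_ITEM, OVERHEAD_PER_TASK, 1000, N_WORKERS)

print(f"serial: {serial:.3f}s")
print(f"one item per task: {one_item_per_task:.3f}s")
print(f"batches of 1000: {batched:.3f}s")
```

Under these assumptions the overhead term dominates when every item is its own task, and nearly disappears once 1000 items share a single invocation. Real timings will differ, since the actual overhead depends on the cluster and the workload.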

**Better approach**: Use batching or chunking


In [None]:
@ray.remote
def mega_work(start, end):
    return [tiny_task(x) for x in range(start, end)]

In [None]:
start_time = time.time()

# Launch 100 batches of 1000 items each
result_ids = [mega_work.remote(x*1000, (x+1)*1000) for x in range(100)]
# Fetch the finished results
results = ray.get(result_ids)
end_time = time.time()

print(f"Parallelizing Ray tasks as batches takes {end_time - start_time:.2f} seconds")

A huge difference in execution time!

Restructuring many small tasks into batches, or chunks, that run as larger Ray remote tasks, as demonstrated above, achieves a significant performance gain.
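
The chunk boundaries above were computed by hand for exactly 100 batches of 1000 items. As a minimal sketch, a small helper can generate the `(start, end)` pairs for any item count, including one that is not a multiple of the chunk size. `chunk_ranges` is a hypothetical name introduced here, not a Ray API.

```python
from typing import Iterator, Tuple


def chunk_ranges(n_items: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) pairs that cover range(n_items) in chunks.

    The last chunk is shorter when n_items is not a multiple of chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_items, chunk_size):
        yield start, min(start + chunk_size, n_items)


# Same split as the batched example: 100 chunks of 1000 items.
bounds = list(chunk_ranges(100_000, 1000))
print(len(bounds), bounds[0], bounds[-1])

# An uneven split: the final chunk holds the remainder.
uneven = list(chunk_ranges(10, 4))
print(uneven)
```

Each pair could then be passed to the batched task, for example `mega_work.remote(start, end)`.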

### Pattern: Using ray.wait to limit the number of pending tasks

| Name | Argument Type |  Description |
| :--- | :---     |  :---------- |
| `ray.get()`     | `ObjectRef` or `List[ObjectRef]`   | Returns the value for an object ref, or a list of values for a list of object refs. This is a synchronous (i.e., blocking) operation. |
| `ray.wait()`    | `List[ObjectRef]`  | From a list of object refs, returns (1) the list of refs whose objects are ready, and (2) the list of refs whose objects are not ready yet. By default, it returns one ready object ref at a time. With `num_returns=<value>`, it waits until that many refs are ready (or until the optional `timeout` expires) and returns them; their tasks have finished and their values are materialized and available in the object store. |


As we noted above, an idiomatic way of using `ray.get()` is to delay fetching objects until you need them. Another way is to use it together with `ray.wait()`, fetching only values that are already available, or materialized, in the object store. This is a way to [pipeline the execution](https://docs.ray.io/en/latest/ray-core/tips-for-first-time.html#tip-4-pipeline-data-processing), especially when you want to process the results of completed Ray tasks as soon as they are available.

|<img src="https://technical-training-assets.s3.us-west-2.amazonaws.com/Ray_Core/core-data-pipeline.png" height="40%" width="60%">|
|:--|
|Execution timeline in both cases: when using `ray.get()` to wait for all results to become available before processing them, and using `ray.wait()` to start processing the results as soon as they become available.|


If we use `ray.get()` on the results of multiple tasks we will have to wait until the last one of these tasks finishes. This can be an issue if tasks take widely different amounts of time.

To illustrate this issue, consider the following example where we run six `transform_images()` tasks in parallel, with each task taking a time uniformly distributed between 0 and 4 seconds. Next, assume the results of these tasks are processed by `classify_images()`, which takes 1 second per result. The expected running time is then (1) the time it takes to execute the slowest of the `transform_images()` tasks, plus (2) 6 seconds, which is the time it takes to execute `classify_images()` on all six results.

Let's look at a simple example.

In [None]:
from PIL import Image, ImageFilter

In [None]:
random.seed(42)

In [None]:
import time
import random
import ray

@ray.remote
def transform_images(x):
    imarray = np.random.rand(x, x , 3) * 255
    img = Image.fromarray(imarray.astype('uint8')).convert('RGBA')
    
    # Blur the image with the specified radius
    img = img.filter(ImageFilter.GaussianBlur(radius=20))
    
    time.sleep(random.uniform(0, 4)) # Replace this with extra work you need to do.
    return img

def predict(image):
    size = image.size[0]
    if size == 16 or size == 32:
        return 0
    elif size == 64 or size == 128:
        return 1
    elif size == 256:
        return 2
    else:
        return 3

def classify_images(images):
    preds = []
    for image in images:
        pred = predict(image)
        time.sleep(1)
        preds.append(pred)
    return preds

def classify_images_inc(images):
    preds = [predict(img) for img in images]
    time.sleep(1)
    return preds

SIZES = [16, 32, 64, 128, 256, 512]

#### Not using ray.wait and no pipelining

In [None]:
start = time.time()
# Transform the images first and then get the images
images = ray.get([transform_images.remote(image) for image in SIZES])

# After all images are transformed, classify them
predictions = classify_images(images)
print(f"Duration without pipelining: {round(time.time() - start, 2)} seconds; predictions: {predictions}")

#### Using ray.wait and pipelining

In [None]:
start = time.time()
result_images_refs = [transform_images.remote(image) for image in SIZES] 
predictions = []

# Loop until all tasks are finished
while len(result_images_refs):
    done_image_refs, result_images_refs = ray.wait(result_images_refs, num_returns=1)
    preds = classify_images_inc(ray.get(done_image_refs))
    predictions.extend(preds)
print(f"Duration with pipelining: {round(time.time() - start, 2)} seconds; predictions: {predictions}")

**Notice**: Here the difference is modest. However, with many compute-intensive tasks, the difference can grow to an order of magnitude.
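
The size of the gap can be estimated with a small timeline model. The sketch below is a simplification under stated assumptions: the six completion times are hypothetical (each under 4 seconds, one per entry in `SIZES`), each result takes 1 second to process, and the overhead of `ray.wait()` and `ray.get()` is ignored. The function names are illustrative.

```python
from typing import List


def time_without_pipelining(ready_times: List[float], secs_per_result: float) -> float:
    # Wait for the slowest task, then process every result one after another.
    return max(ready_times) + secs_per_result * len(ready_times)


def time_with_pipelining(ready_times: List[float], secs_per_result: float) -> float:
    # Process each result as soon as it is ready and the driver is free.
    clock = 0.0
    for ready in sorted(ready_times):
        clock = max(clock, ready) + secs_per_result
    return clock


# Hypothetical completion times, in seconds, for six tasks.
ready_times = [0.5, 1.2, 2.0, 2.7, 3.1, 3.9]

print(f"without pipelining: {time_without_pipelining(ready_times, 1.0):.1f}s")
print(f"with pipelining:    {time_with_pipelining(ready_times, 1.0):.1f}s")
```

With pipelining, processing overlaps with the tasks still running, so the driver only idles while it waits for the first result. The saving grows when completion times are spread widely.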

For a large number of tasks in flight, use `ray.get()` and `ray.wait()` to implement pipelined processing of completed tasks.

**TLDR**: Use pipelined execution to process results returned from finished Ray tasks using `ray.get()` and `ray.wait()`.

#### Exercise for Pipelining:
 * Extend or add more images of sizes: 1024, 2048, ...
 * Increase the number of returns to 2, 3, or 4 in the call to `ray.wait()`
 * Process the images
 
 Is there a difference in processing time between serial and pipelining?

### Anti-pattern: Passing the same large argument by value repeatedly harms performance

When passing a large argument (>100KB) by value to a task, Ray will implicitly store the argument in the object store and the worker process will fetch the argument to the local object store from the caller’s object store before running the task. If we pass the same large argument to multiple tasks, Ray will end up storing multiple copies of the argument in the object store since Ray doesn’t do deduplication.

Instead of passing the large argument by value to multiple tasks, we should use `ray.put()` to store the argument to the object store once and get an ObjectRef, then pass the argument reference to tasks. This way, we make sure all tasks use the same copy of the argument, which is faster and uses less object store memory.

**TLDR**: Avoid passing the same large argument by value to multiple tasks; use `ray.put()` and pass by reference instead.
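
To get a feel for the memory involved, here is a rough calculation for the 5000 x 5000 array and the 10 tasks used in this section's example. It counts only the raw array bytes on a single node and ignores serialization overhead, so treat the figures as estimates. `array_megabytes` is a hypothetical helper.

```python
# float64 values occupy 8 bytes each; np.random.rand returns float64 arrays.
BYTES_PER_FLOAT64 = 8


def array_megabytes(rows: int, cols: int) -> float:
    """Size of a dense float64 array in megabytes (1 MB = 10**6 bytes)."""
    return rows * cols * BYTES_PER_FLOAT64 / 10**6


n_tasks = 10
one_copy_mb = array_megabytes(5000, 5000)

# Passing by value: up to one stored copy per task invocation.
by_value_mb = one_copy_mb * n_tasks
# Passing a reference from ray.put(): one stored copy shared by all tasks.
by_reference_mb = one_copy_mb

print(f"one copy: {one_copy_mb:.0f} MB")
print(f"by value: {by_value_mb:.0f} MB, by reference: {by_reference_mb:.0f} MB")
```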

In [None]:
@ray.remote
def do_work(a):
    # do some work with the large object a
    return np.sum(a)

#### Bad usage

In [None]:
np.random.seed(42)

start = time.time()
a = np.random.rand(5000, 5000)

# Pass the big array by value to each remote task; each call
# stores its own copy of the same data in the object store
result_ids = [do_work.remote(a) for x in range(10)]

results = math.fsum(ray.get(result_ids))
print(f" results = {results:.2f} and duration = {time.time() - start:.3f} sec")

**Better approach**: Put the value in the object store and only send the reference

In [None]:
start = time.time()
# Put the big array into the object store once
a_id_ref = ray.put(a)

# Now pass the object reference instead of the array
result_ids = [do_work.remote(a_id_ref) for x in range(10)]
results = math.fsum(ray.get(result_ids))
print(f" results = {results:.2f} and duration = {time.time() - start:.3f} sec")

### Exercise 

1. Try with different array sizes. Does it make a difference in processing time?
2. Have a go at this [Tree of Actors](https://docs.ray.io/en/latest/ray-core/patterns/tree-of-actors.html) design pattern we covered in the last lesson.

### Recap
In this short tutorial, we got a glimpse of design patterns, anti-patterns, and tips and tricks. It is by no means comprehensive, but we touched upon some methods we have seen in the previous lessons. With those methods, we explored additional arguments to the `@ray.remote` and `@ray.method` decorators, such as the number of return values (`num_returns`).

More importantly, we walked through some tips and tricks for pitfalls that many developers new to Ray can easily stumble into. Although the examples were short and simple, the lessons behind the cautionary tales are an important part of the learning process.

### Homework 

There is an advanced and comprehensive list of all [Ray design patterns and anti-design patterns](https://docs.ray.io/en/latest/ray-core/patterns/index.html#design-patterns-anti-patterns) that you can explore at home after the class.

### Additional Resources on Best Practices
 * [User Guides for Ray Clusters](https://docs.ray.io/en/latest/cluster/vms/user-guides/index.html)
 * [Best practices for deploying large clusters](https://docs.ray.io/en/latest/cluster/vms/user-guides/large-cluster-best-practices.html)
 * [Launching an On-Premise Cluster](https://docs.ray.io/en/latest/cluster/vms/user-guides/launching-clusters/on-premises.html)
 * [Configuring Autoscaling](https://docs.ray.io/en/latest/cluster/vms/user-guides/configuring-autoscaling.html)

In [None]:
ray.shutdown()