# Advanced Ray Tutorial - Exercise Solutions

First, import everything we'll need and start Ray:

In [None]:
import ray, time, sys
import numpy as np
sys.path.append("../..")
from util.printing import pd, pnd  # convenience methods for printing results.

In [None]:
from ray.util.spark import setup_ray_cluster, shutdown_ray_cluster

setup_ray_cluster(
  num_worker_nodes=2,
  num_cpus_per_node=4,
  collect_log_to_path="/dbfs/path/to/ray_collected_logs"
)
ray.init()

## Exercise 1 in 01: Ray Tasks Revisited

You were asked to convert the regular Python code to Ray code. Here are the three cells appropriately modified.

The imports and `ray.init()` were already run at the top of this notebook, so we start with the remote function definition.

In [None]:
@ray.remote
def slow_square(n):
    time.sleep(n)
    return n*n

In [None]:
start = time.time()
refs = [slow_square.remote(n) for n in range(4)]
squares = ray.get(refs)
duration = time.time() - start

In [None]:
assert squares == [0, 1, 4, 9]
# The four tasks now run in parallel, so the total is close to the longest sleep (3 seconds):
assert duration < 4.1, f'duration = {duration}'
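
As a rough check on the `4.1`-second threshold, the arithmetic can be sketched in plain Python. This assumes there are enough CPUs for all four tasks to run at once and ignores Ray's scheduling overhead; the variable names are illustrative only.

```python
# slow_square(n) sleeps for n seconds, for n in range(4).
sleep_times = list(range(4))

# Without Ray, the calls run one after another, so the sleeps add up.
serial_estimate = sum(sleep_times)

# With Ray, the tasks run concurrently, so the longest sleep dominates.
parallel_estimate = max(sleep_times)

print(f"serial estimate:   {serial_estimate} seconds")
print(f"parallel estimate: {parallel_estimate} seconds")
```

The serial estimate is above `4.1` and the parallel estimate is below it, which is why the assertion only passes once the work is distributed.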

## Exercise 2 in 01: Ray Tasks Revisited

You were asked to use `ray.wait()` with a shorter timeout, `2.5` seconds. First, we need to redefine the remote functions from that lesson in this notebook:

In [None]:
@ray.remote
def make_array(n):
    time.sleep(n/10.0)
    return np.random.standard_normal(n)

@ray.remote
def add_arrays(a1, a2):
    time.sleep(a1.size/10.0)
    return np.add(a1, a2)

In [None]:
start = time.time()
array_refs = [make_array.remote(n*10) for n in range(5)]
added_array_refs = [add_arrays.remote(ref, ref) for ref in array_refs]

arrays = []
waiting_refs = list(added_array_refs)  # Assign a working list to the full list of refs
while len(waiting_refs) > 0:            # Loop until all tasks have completed
    # Call ray.wait with:
    #   1. the list of refs we're still waiting on,
    #   2. num_returns: return as soon as TWO of them complete (or ONE, if only one is left),
    #   3. timeout: wait at most 2.5 seconds before returning whatever has completed.
    return_n = 2 if len(waiting_refs) > 1 else 1
    ready_refs, remaining_refs = ray.wait(waiting_refs, num_returns=return_n, timeout=2.5)
    print('Returned {:3d} completed tasks. (elapsed time: {:6.3f})'.format(len(ready_refs), time.time() - start))
    new_arrays = ray.get(ready_refs)
    arrays.extend(new_arrays)
    for array in new_arrays:
        print(f'{array.size}: {array}')
    waiting_refs = remaining_refs  # Reset this list; don't include the completed refs in the list again!
    
print(f"\nall arrays: {arrays}")
pd(time.time() - start, prefix="Total time:")

With a timeout of `2.5` seconds, some calls to `ray.wait()` time out before two tasks finish, so they return only one completed task. The tasks all run in parallel: the task created for `n` sleeps `n` seconds in `make_array` and another `n` seconds in `add_arrays`, so, ignoring scheduling overhead, the five results become ready at roughly 0, 2, 4, 6, and 8 seconds. The first call returns two results at about 2 seconds, well within its timeout. The second and third calls each see only one more result before their deadlines pass, so they time out and return one item. The last call waits on the single remaining task and returns as soon as it finishes. If you use a shorter timeout, you'll see more timeouts, where zero or one items are returned.

Try `1.5` seconds and then `0.5` seconds. The shorter the timeout, the more iterations time out with one item or, when no task finishes inside the window, with zero items.
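
To reason about the timeout behaviour without starting a cluster, the loop can be modelled in plain Python. This is only an idealised sketch, not Ray code: `simulate_wait_loop`, `finish_times`, and `batch` are hypothetical names, and the model assumes the results become ready at exactly 0, 2, 4, 6, and 8 seconds (from the `sleep` calls in the cell above) with no scheduling overhead. Real runs may differ, especially when a task finishes close to a deadline.

```python
def simulate_wait_loop(finish_times, timeout, batch=2):
    """Idealised model of the wait loop; returns (elapsed, n_ready) per iteration."""
    if timeout <= 0:
        raise ValueError("timeout must be positive in this model")
    pending = sorted(finish_times)
    now = 0.0
    log = []
    while pending:
        # Ask for `batch` results, or fewer if fewer tasks remain.
        want = min(batch, len(pending))
        deadline = now + timeout
        # The call returns when the `want`-th pending task finishes,
        # or at the deadline, whichever comes first.
        returns_at = min(max(now, pending[want - 1]), deadline)
        # Assumption: a task finishing exactly at the deadline counts as ready.
        ready = [t for t in pending[:want] if t <= returns_at]
        pending = pending[len(ready):]
        now = returns_at
        log.append((now, len(ready)))
    return log


finish_times = [0.0, 2.0, 4.0, 6.0, 8.0]
for timeout in (10.0, 2.5):
    print(f"timeout = {timeout}")
    for elapsed, n_ready in simulate_wait_loop(finish_times, timeout):
        print(f"  returned {n_ready} completed tasks at about {elapsed} seconds")
```

Calling `simulate_wait_loop(finish_times, 1.5)` or `simulate_wait_loop(finish_times, 0.5)` gives a similar estimate for the shorter timeouts. Treat the output as a guide to what to expect, not a prediction of the exact elapsed times Ray will print.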

## Exercise 3 in 01: Ray Tasks Revisited

You were asked to convert the code to use Ray, especially `ray.wait()`.

In [None]:
@ray.remote
def slow_square(n):
    time.sleep(n)
    return n*n

start = time.time()
refs = [slow_square.remote(n) for n in range(4)]
squares = []
waiting_refs = refs
while len(waiting_refs) > 0:
    # With the defaults (num_returns=1, timeout=None), each call blocks until one more task finishes.
    finished_refs, waiting_refs = ray.wait(waiting_refs)  # The second list becomes the new waiting_refs.
    squares.extend(ray.get(finished_refs))
duration = time.time() - start

In [None]:
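# ray.wait() yields results in completion order, which matches submission order here
# because each task sleeps longer than the one before it.
assert squares == [0, 1, 4, 9]
assert duration < 4.1, f'duration = {duration}'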