# Exercise 7 - Process Tasks in Order of Completion

**GOAL:** The goal of this exercise is to show how to use `ray.wait` to process tasks in the order that they finish.

See the documentation for `ray.wait` at https://ray.readthedocs.io/en/latest/package-ref.html?highlight=ray.wait#ray.wait.

## Concepts for this exercise - `ray.wait`

After launching a number of tasks, you may want to process the results in the order in which the tasks finish rather than the order in which they were launched. To do so, we build on exercise 6 and use `ray.wait` to retrieve each result as soon as it is ready.

We are able to use `ray.wait` because **the two lists returned by `ray.wait` maintain the ordering of the input list**. That is, if `f` is a remote function, the code
```python
    results = ray.wait([f.remote(i) for i in range(100)], num_returns=10)
```
will return `(ready_list, remain_list)`, and the `ObjectID`s in each of those lists will be ordered by the argument passed to `f` above.

In [1]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import ray
import time

ray.init(num_cpus=5, ignore_reinit_error=True)  # include_webui=False

# Sleep a little to improve the accuracy of the timing measurements used below,
# because some workers may still be starting up in the background.
time.sleep(2.0)

2021-02-02 17:18:44,522	INFO services.py:1173 -- View the Ray dashboard at http://127.0.0.1:8265


In [2]:
@ray.remote
def f():
    time.sleep(np.random.uniform(0, 5))
    return time.time()

**EXERCISE:** Change the code below to use `ray.wait` to get the results of the tasks in the order that they complete.

**NOTE:** It would be a simple modification to maintain a pool of 10 experiments and to start a new experiment whenever one finishes.
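The pool pattern mentioned in the note can be sketched as follows. This is a minimal illustration, not the exercise solution: so that it runs without a Ray cluster, it swaps in the standard library's `concurrent.futures`, where `wait(..., return_when=FIRST_COMPLETED)` plays roughly the role of `ray.wait(..., num_returns=1)` and `executor.submit` stands in for `f.remote`. The names `experiment` and `run_with_pool` are hypothetical and do not appear elsewhere in this notebook.

```python
import random
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def experiment(index):
    # Hypothetical stand-in for the remote function f: sleep briefly, then
    # report which experiment finished and when.
    time.sleep(random.uniform(0.0, 0.05))
    return index, time.time()


def run_with_pool(num_experiments, pool_size):
    """Keep up to pool_size experiments in flight until all have run."""
    results = []
    next_index = 0
    with ThreadPoolExecutor(max_workers=pool_size) as executor:
        in_flight = set()
        # Fill the pool with the first batch of experiments.
        while next_index < min(pool_size, num_experiments):
            in_flight.add(executor.submit(experiment, next_index))
            next_index += 1
        while in_flight:
            # Block until at least one experiment finishes. This is the
            # step that ray.wait(..., num_returns=1) would perform in Ray.
            done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
            for future in done:
                results.append(future.result())
                # Start a replacement so the pool stays full while
                # experiments remain to be launched.
                if next_index < num_experiments:
                    in_flight.add(executor.submit(experiment, next_index))
                    next_index += 1
    return results


results = run_with_pool(num_experiments=25, pool_size=10)
print('Processed {} experiments.'.format(len(results)))
```

Each pass through the loop handles whichever experiments have just finished and immediately launches replacements, so no more than `pool_size` experiments are ever running at once.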

In [3]:
start_time = time.time()

remaining_result_ids = [f.remote() for _ in range(10)]

# Get the results.
results = []
while len(remaining_result_ids) > 0:
    # EXERCISE: Instead of simply waiting for the first result from
    # remaining_result_ids, use ray.wait to get the first one to finish.
    result_ids, remaining_result_ids = ray.wait(remaining_result_ids, num_returns=1)
    result = ray.get(result_ids[0])
    results.append(result)
    print('Processing result which finished after {} seconds.'
          .format(result - start_time))

end_time = time.time()
duration = end_time - start_time

Processing result which finished after 0.9700706005096436 seconds.
Processing result which finished after 1.679121494293213 seconds.
Processing result which finished after 2.673194646835327 seconds.
Processing result which finished after 2.9632163047790527 seconds.
Processing result which finished after 3.7022712230682373 seconds.
Processing result which finished after 4.604336261749268 seconds.
Processing result which finished after 4.847353935241699 seconds.
Processing result which finished after 5.472399711608887 seconds.
Processing result which finished after 6.5434792041778564 seconds.
Processing result which finished after 7.087517976760864 seconds.


**VERIFY:** Run some checks to verify that the changes you made to the code were correct. Some of the checks should fail when you initially run the cells. After completing the exercises, the checks should pass.

In [4]:
assert results == sorted(results), ('The results were not processed in the '
                                    'order that they finished.')

print('Success! The example took {} seconds.'.format(duration))

Success! The example took 7.088518857955933 seconds.


In [1]:
# use Dask futures
from distributed import as_completed, Client, LocalCluster
import time
import numpy as np


def f(_):
    time.sleep(np.random.uniform(0, 5))
    return time.time()

client = Client(LocalCluster(n_workers=4, threads_per_worker=1))
client

Client: Scheduler: tcp://127.0.0.1:64199, Dashboard: http://127.0.0.1:8787/status
Cluster: Workers: 4, Cores: 4, Memory: 19.28 GB


In [2]:
start_time = time.time()

# Get the results.
results = []
remaining_result_ids = as_completed(client.map(f, range(10)))
for future in remaining_result_ids:
    result = future.result()
    results.append(result)
    print('Processing result which finished after {} seconds.'
          .format(result - start_time))

end_time = time.time()
duration = end_time - start_time

Processing result which finished after 0.4310309886932373 seconds.
Processing result which finished after 1.0090818405151367 seconds.
Processing result which finished after 1.837120771408081 seconds.
Processing result which finished after 2.0781400203704834 seconds.
Processing result which finished after 3.8248324394226074 seconds.
Processing result which finished after 4.0518715381622314 seconds.
Processing result which finished after 4.259101867675781 seconds.
Processing result which finished after 4.440119743347168 seconds.
Processing result which finished after 6.056450605392456 seconds.
Processing result which finished after 6.633112668991089 seconds.


In [3]:
assert results == sorted(results), ('The results were not processed in the '
                                    'order that they finished.')

print('Success! The example took {} seconds.'.format(duration))

Success! The example took 6.644113540649414 seconds.
