## 7. Pattern: Pipeline data processing and waiting for results

After launching a number of tasks, you may want to know which ones have finished executing without blocking on all of them. You can do this with `ray.wait()`, which returns two lists: the object refs that are ready and the object refs that are not yet ready.

|<img src="https://assets-training.s3.us-west-2.amazonaws.com/ray-core/ray-core/pipeline-data-processing.png" width="50%" loading="lazy">|
|:--|
|(Top panel) Execution timeline when using `ray.get()` to wait for all results before processing them. (Bottom panel) Execution timeline when using `ray.wait()` to process each result as soon as it becomes available.|
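
The gap between the two panels can be estimated with some back-of-the-envelope arithmetic. The sketch below assumes that all tasks start together at time 0 on enough workers, that the caller processes results one at a time, and that the completion times are made-up numbers rather than measurements. The helper names are hypothetical.

```python
def finish_time_wait_all(task_times, process_time):
    """Top panel: ray.get() on everything first, then process results one by one."""
    return max(task_times) + len(task_times) * process_time


def finish_time_pipelined(task_times, process_time):
    """Bottom panel: process each result as soon as it is ready (serially)."""
    clock = 0
    for t in sorted(task_times):
        # An item can be processed only once its result exists AND the previous item is done
        clock = max(clock, t) + process_time
    return clock


task_times = [1, 2, 3, 8]  # hypothetical task completion times, in seconds
process_time = 1           # hypothetical per-result processing time, in seconds
print(finish_time_wait_all(task_times, process_time))   # 12
print(finish_time_pipelined(task_times, process_time))  # 9
```

Pipelining hides the processing of early results behind the slowest task. The caller is no longer idle until `max(task_times)` and then busy for `n * process_time`.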

Let's modify `expensive_square` so that each task sleeps for a random 1 to 9 seconds:

```python
import time
import numpy as np
import ray

@ray.remote
def expensive_square(x):
    time.sleep(np.random.randint(1, 10))  # sleep 1-9 seconds (upper bound is exclusive)
    return x**2
```

```python
expensive_compute = []

# Launch 15 tasks; each .remote() call returns an ObjectRef immediately
for i in range(15):
    expensive_compute.append(expensive_square.remote(i))

expensive_compute
```

Now process each result as soon as it becomes available:

```python
# Block until one object ref is ready (num_returns defaults to 1)
ready_refs, not_ready_refs = ray.wait(expensive_compute)

# Process each new item as soon as it becomes available
while not_ready_refs:
    print(f"{ready_refs[0]} is ready; result: {ray.get(ready_refs[0])}")
    print(f"{len(not_ready_refs)} items not ready... \n")

    ready_refs, not_ready_refs = ray.wait(not_ready_refs)  # wait for the next item

    assert len(ready_refs) == 1, f"len(ready_refs) should be 1, got {len(ready_refs)} instead"

print(f"I'm the last item: {ready_refs[0]}; result: {ray.get(ready_refs[0])}")
```
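
If you want to try the same "process whatever is ready" loop without a Ray cluster, Python's standard `concurrent.futures.wait(..., return_when=FIRST_COMPLETED)` offers a close analogue, because it also returns a `(done, not_done)` pair. This is an analogy built on threads, not Ray's API. `slow_square` is a hypothetical stand-in for `expensive_square` with much shorter delays. One difference is that `FIRST_COMPLETED` can report several finished futures at once, whereas the Ray loop above asserts that it gets exactly one ready ref per call.

```python
import random
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def slow_square(x):
    # Hypothetical stand-in for expensive_square: a short random delay instead of 1-9 s
    time.sleep(random.uniform(0.01, 0.05))
    return x**2


def process_as_completed(values):
    """Handle each result as soon as its task finishes, not after all of them."""
    results = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        not_done = {pool.submit(slow_square, v) for v in values}
        while not_done:
            # Like ray.wait(): split into (done, not_done) without waiting for everything
            done, not_done = wait(not_done, return_when=FIRST_COMPLETED)
            for fut in done:  # FIRST_COMPLETED may report several finished futures
                results.append(fut.result())
    return results


print(process_as_completed(range(10)))  # squares, in completion order
```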

<div class="alert alert-info">
Read more about <strong><a href="https://docs.ray.io/en/latest/ray-core/tips-for-first-time.html#tip-4-pipeline-data-processing" target="_blank">pipeline data processing</a></strong>.
</div>

### 7.1 Batch Processing Pattern

A program can also wait for a batch of `ObjectRef`s to be ready before processing them. Consider this scenario:

```python
expensive_compute = []

for i in range(15):
    expensive_compute.append(expensive_square.remote(i))

expensive_compute
```

```python
BATCH_SIZE = 3

# Block until BATCH_SIZE object refs are ready
ready_refs, not_ready_refs = ray.wait(expensive_compute, num_returns=BATCH_SIZE)

# Process each batch as soon as it becomes available
while not_ready_refs:
    print(f"{ready_refs} are ready; results: {ray.get(ready_refs)}")
    print(f"{len(not_ready_refs)} items not ready... \n")
    # num_returns may not exceed the number of refs passed in, so shrink the final batch
    ready_refs, not_ready_refs = ray.wait(
        not_ready_refs, num_returns=min(BATCH_SIZE, len(not_ready_refs))
    )

print(f"Last batch {ready_refs}; result: {ray.get(ready_refs)}")
```
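
The batch loop has to account for a final batch that is smaller than `BATCH_SIZE` whenever the number of tasks is not a multiple of it. The sketch below shows that edge case with a stdlib analogue, `concurrent.futures.as_completed`, which yields futures in completion order. It is not Ray's API. `slow_square` and `completed_in_batches` are hypothetical helpers.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(x):
    # Hypothetical stand-in for expensive_square with a short random delay
    time.sleep(random.uniform(0.01, 0.05))
    return x**2


def completed_in_batches(futures, batch_size):
    """Yield lists of up to batch_size results, in the order tasks finish."""
    batch = []
    for fut in as_completed(futures):
        batch.append(fut.result())
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # the final batch may be smaller than batch_size
        yield batch


with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(slow_square, i) for i in range(7)]
    for batch in completed_in_batches(futures, batch_size=3):
        print(f"batch of {len(batch)}: {batch}")
```

With 7 tasks and a batch size of 3, the batches have sizes 3, 3 and 1.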

### 7.2 Note: fetching too many objects at once with `ray.get()` causes failure

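Calling `ray.get()` on too many objects at once can fail with a **heap out-of-memory** error or an **object store out-of-space** error, because Ray has to fetch all of the objects to the caller at the same time.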

```python
# Anti-pattern: launches 1,000,000 tasks and fetches every result to the caller at once
object_refs = [expensive_square.remote(i) for i in range(1_000_000)]

all_results_at_once = ray.get(object_refs)
all_results_at_once
```

Instead, fetch and process one batch at a time.
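
Here is a minimal sketch of batched fetching. It assumes you only need an aggregate, such as a sum, rather than every individual result. `fetch_in_batches` and `fake_get` are hypothetical helpers. The `get` argument must map a list of refs to a list of values, which is how `ray.get` behaves when given a list. The demo uses an offline stand-in so that it runs without Ray.

```python
def fetch_in_batches(refs, get, batch_size, reduce_batch=sum):
    """Fetch refs batch_size at a time and fold each batch into a running total.

    `get` should map a list of refs to a list of values, as ray.get does for a list.
    """
    total = 0
    for start in range(0, len(refs), batch_size):
        values = get(refs[start:start + batch_size])  # fetch one batch only
        total += reduce_batch(values)  # reduce it before fetching the next batch
    return total


def fake_get(batch):
    # Offline stand-in for ray.get: treat each "ref" as an int and return its square
    return [r**2 for r in batch]


print(fetch_in_batches(list(range(10)), fake_get, batch_size=3))  # 285
```

With Ray, the call would look like `fetch_in_batches(object_refs, ray.get, batch_size=1_000)`, where the batch size is an arbitrary illustrative value. This loop keeps only one batch of fetched values alive at a time instead of the full result list.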

<div class="alert alert-info">
Read more about this <strong><a href="https://docs.ray.io/en/latest/ray-core/patterns/ray-get-too-many-objects.html#anti-pattern-fetching-too-many-objects-at-once-with-ray-get-causes-failure" target="_blank">anti-pattern</a></strong>.
</div>