## 2. Chaining Tasks and Passing Data

Let's say we now want to execute a graph of two tasks:
1. Square a value using `expensive_square`
2. Add 1 to the `expensive_square` result, by using `remote_add`

This can be achieved without fetching an intermediate result.

Anti-pattern: fetching the intermediate result to the driver with `ray.get` before submitting the second task:

In [None]:
import time

import ray


@ray.remote
def remote_add(a, b):
    return a + b

@ray.remote
def expensive_square(x):
    time.sleep(5)
    return x**2

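# 1st task: ray.get blocks and copies the result back to the driver
square_ref = expensive_square.remote(2)
square_value = ray.get(square_ref)

# 2nd task: the fetched value is shipped out again as an argument
sum_ref = remote_add.remote(1, square_value)
sum_value = ray.get(sum_ref)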

Chain the tasks by passing the `ObjectRef` directly to the second task:

In [None]:
square_ref = expensive_square.remote(2)
sum_ref = remote_add.remote(1, square_ref)
sum_value = ray.get(sum_ref)

This way, Ray does not fetch the intermediate result to the "driver" process, which matters most when the returned object is large.

The term "driver" refers to the process that initiated the connection to the cluster, which in this case is the Python process running this notebook.

Under the hood, Ray still resolves the `ObjectRef` before the second task runs. Effectively, it does something like the following to make the argument available to the second task:

```python
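# Conceptual sketch only: Ray resolves top-level ObjectRef arguments
# before the task body runs; you do not write this check yourself.
def remote_add(a, b):
    if isinstance(b, ray.ObjectRef):
        b = ray.get(b)
    return a + b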
```

The benefit of this approach is that the data needs to be transferred at most once, directly between the first and second task, instead of going through the driver process. To read more about this, see [Passing object arguments](https://docs.ray.io/en/latest/ray-core/objects.html#passing-object-arguments).

Also note that you can bypass this behavior by wrapping (nesting) the object ref in a container object (e.g., a tuple, list, or dict). The task then receives the `ObjectRef` itself rather than the resolved value:

In [None]:
ref = expensive_square.remote(1)
out_ref = remote_add.remote([ref], [ref])
ray.get(out_ref)  # the two lists are concatenated, so this is a list of two ObjectRefs