# Learn Ray Basics

In [1]:
# !pip install -q ray

In [69]:
import re
import ray

In [4]:
ray.init(num_cpus=4, num_gpus=0)

{'node_ip_address': '192.168.13.128',
 'raylet_ip_address': '192.168.13.128',
 'redis_address': '192.168.13.128:6379',
 'object_store_address': '/tmp/ray/session_2021-12-27_12-37-38_097566_67762/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2021-12-27_12-37-38_097566_67762/sockets/raylet',
 'webui_url': None,
 'session_dir': '/tmp/ray/session_2021-12-27_12-37-38_097566_67762',
 'metrics_export_port': 58033,
 'node_id': '5aa5d433c337fdbfcdee3ca1fcbe1bdbd24f301c159c0add6430ad2d'}

---

# Why ```ray.wait([worker.remote() for i in range(4)])``` fails

When a remote function **returns multiple values**, calling ```ray.wait``` on the list of its results fails. Why?

In [35]:
@ray.remote(num_returns=2)
def worker():
    return 'X', 'Y' # <---- worker.remote() returns a list of two object references because of "num_returns=2"

In [46]:
ray.wait([worker.remote() for i in range(4)])

TypeError: wait() expected a list of ray.ObjectRef, got list containing <class 'list'>

## Reason

***```[worker.remote() for i in range(4)]```*** creates a ```List[List[ObjectID]]```: a **list of lists** in which each inner list holds the object references to 'X' and 'Y' returned by one worker call.


```
jobs = [
    [object_ref to 'X', object_ref to 'Y'],   # <--- Worker 1 returns a list of ['X', 'Y']
    [object_ref to 'X', object_ref to 'Y'],   # <--- Worker 2 returns a list of ['X', 'Y']
    [object_ref to 'X', object_ref to 'Y'],   # <--- Worker 3 returns a list of ['X', 'Y']
    [object_ref to 'X', object_ref to 'Y'],   # <--- Worker 4 returns a list of ['X', 'Y']
]
```

In [47]:
# --------------------------------------------------------------------------------
# Verify [worker.remote() for i in range(4)] creates List[List[ObjectID]].
# --------------------------------------------------------------------------------
jobs = [worker.remote() for i in range(4)]
for instance in jobs:
    print(f"Instance type is {type(instance)}. Content is \n{instance}\n")

Instance type is <class 'list'>. Content is 
[ObjectRef(7513710212de102affffffffffffffffffffffff0100000001000000), ObjectRef(7513710212de102affffffffffffffffffffffff0100000002000000)]

Instance type is <class 'list'>. Content is 
[ObjectRef(eb7cccec83cc166cffffffffffffffffffffffff0100000001000000), ObjectRef(eb7cccec83cc166cffffffffffffffffffffffff0100000002000000)]

Instance type is <class 'list'>. Content is 
[ObjectRef(21711c35be2858f6ffffffffffffffffffffffff0100000001000000), ObjectRef(21711c35be2858f6ffffffffffffffffffffffff0100000002000000)]

Instance type is <class 'list'>. Content is 
[ObjectRef(5e4556eab3523b9cffffffffffffffffffffffff0100000001000000), ObjectRef(5e4556eab3523b9cffffffffffffffffffffffff0100000002000000)]



```ray.wait(jobs)``` fails because ```jobs``` is a **```List[List[ObjectID]]```**, whereas the function signature expects a **```List[ObjectID]```**. (```ObjectID``` is the older name of ```ObjectRef```, the name used in the error message.)

* [ray.wait(object_ids, num_returns=1, timeout=None, worker=<ray.worker.Worker object>)](https://docs.ray.io/en/stable/api.html?highlight=wait#ray.wait)

> ### Parameters:
> * **object_ids (List[ObjectID])** – List of object IDs for objects that may or may not be ready. Note that these IDs must be unique.
> * **num_returns (int)** – The number of object IDs that should be returned.
> * **timeout (float)** – The maximum amount of time in seconds to wait before returning.

In [43]:
ray.wait(jobs)

TypeError: wait() expected a list of ray.ObjectRef, got list containing <class 'list'>

## Workaround

Transform ```List[List[ObjectID]]``` into ```List[ObjectID]``` by flattening it. However, flattening loses the grouping of each call's ```(X, Y)``` pair as a unit.

In [44]:
sum(jobs, start=[])

[ObjectRef(5497aa04f981e4a1ffffffffffffffffffffffff0100000001000000),
 ObjectRef(5497aa04f981e4a1ffffffffffffffffffffffff0100000002000000),
 ObjectRef(909a212b104ea2f1ffffffffffffffffffffffff0100000001000000),
 ObjectRef(909a212b104ea2f1ffffffffffffffffffffffff0100000002000000),
 ObjectRef(6f4f08f301901921ffffffffffffffffffffffff0100000001000000),
 ObjectRef(6f4f08f301901921ffffffffffffffffffffffff0100000002000000),
 ObjectRef(ff1eb204d30f6c0fffffffffffffffffffffffff0100000001000000),
 ObjectRef(ff1eb204d30f6c0fffffffffffffffffffffffff0100000002000000)]
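
One way to keep track of which call each flattened reference came from is to build a lookup table while flattening. The sketch below is plain Python and uses strings as stand-ins for object references, so it runs without Ray. ```flatten_with_origin``` is a hypothetical helper, not part of Ray.

```python
from itertools import chain


def flatten_with_origin(jobs):
    """Flatten a list of lists and remember where each item came from.

    Returns (flat, origin), where origin maps each item to
    (index of the inner list, position inside that list).
    """
    flat = list(chain.from_iterable(jobs))
    origin = {
        item: (job_index, position)
        for job_index, inner in enumerate(jobs)
        for position, item in enumerate(inner)
    }
    return flat, origin


# Stand-ins for the object references of two calls (hypothetical values).
jobs = [["ref_x0", "ref_y0"], ["ref_x1", "ref_y1"]]
flat, origin = flatten_with_origin(jobs)
print(flat)              # ['ref_x0', 'ref_y0', 'ref_x1', 'ref_y1']
print(origin["ref_y1"])  # (1, 1)
```

With real references, the flat list could be passed to ```ray.wait``` and each ready reference looked up in ```origin```. This assumes the references are hashable, which should hold for ```ObjectRef``` but is worth verifying on the Ray version in use.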

# Solution - Relearn the Concept

1. Understand that **calling a ray.remote function returns object reference(s)**. The idea that **ray.remote returns a job ID** is wrong.
2. Think **ray.wait(object_references)**, NOT **ray.wait(job_ids)**.

Tell the ```ray.remote``` function to return a single value, **NOT multiple values**.

In [51]:
@ray.remote(num_returns=1)
def fixed_worker():
    return 'X', 'Y' # <--- The tuple is stored as one object, so .remote() returns a single object reference due to "num_returns=1"

In [66]:
references_to_future_objects = [fixed_worker.remote() for i in range(4)]

while len(references_to_future_objects):
    references_to_available_objects, references_to_future_objects = ray.wait(references_to_future_objects)
    for reference in references_to_available_objects:
        #print(dir(reference))
        print("{job_id} is done. Result is {result}".format(
            job_id=reference.task_id(), result=ray.get(reference))
        )

TaskID(3a4bb9b95c2938a4ffffffffffffffffffffffff01000000) is done. Result is ('X', 'Y')
TaskID(9e8c0eaa9bab673cffffffffffffffffffffffff01000000) is done. Result is ('X', 'Y')
TaskID(30c04f84db70b40cffffffffffffffffffffffff01000000) is done. Result is ('X', 'Y')
TaskID(b6b1bf9bcf8721b5ffffffffffffffffffffffff01000000) is done. Result is ('X', 'Y')


---

# Using put/get

* [Put and Get](https://docs.ray.io/en/stable/tutorial.html#put-and-get)

> ```ray.put(object)``` takes a Python object and copies it to the local object store in the node where it is executed. The local object is **immutable**.
> 
> ```ray.put(object)``` returns a reference that identifies the object now held in the object store. If we save it in a variable, e.g. ```ref = ray.put(x)```, a remote function can take ```ref``` as an argument and operate on the corresponding **actual** object. When the reference is passed directly as an argument, there is **NO NEED to call ```ray.get(ref)``` in the remote function** to use the object instance.
> 
> For objects like arrays, we can use shared memory and avoid copying the object. If the remote object does not live on the node where the worker calls ray.get(ref), then the remote object will be transferred first to the node.

In [82]:
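@ray.remote(num_returns=1)
def runner(instances): # <--- caller passes a reference; the function receives the actual list
    result = []
    for obj in instances: # <--- Just use the object without executing ray.get().
        # Escape the dot so that only a literal ".com" matches.
        match = re.search(r"http://(.*)\.com", obj, re.IGNORECASE)
        if match:  # Skip strings that do not match instead of raising AttributeError.
            result.append(match.group(1))

    return result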

In [83]:
urls = [
    "http://gmail.com",
    "http://facebook.com"
]
reference_to_urls = ray.put(urls)

ray.get(runner.remote(reference_to_urls))  # Passing the reference to the object

['gmail', 'facebook']
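
The extraction step inside ```runner``` can be checked on its own, without Ray. The sketch below is a plain-Python version with a hypothetical helper name, ```extract_names```. It escapes the dot before ```com``` and skips strings that do not match; both are assumptions about the intended behaviour rather than something the notebook states.

```python
import re

# The pattern used by runner, with the dot before "com" escaped
# so that only a literal ".com" matches.
PATTERN = re.compile(r"http://(.*)\.com", re.IGNORECASE)


def extract_names(urls):
    """Return the captured part of every URL that matches PATTERN."""
    names = []
    for url in urls:
        match = PATTERN.search(url)
        if match:  # Ignore strings without a match instead of failing.
            names.append(match.group(1))
    return names


print(extract_names(["http://gmail.com", "http://facebook.com"]))  # ['gmail', 'facebook']
print(extract_names(["http://telecom"]))                           # []
```

With an unescaped dot, ```http://telecom``` would match and yield ```tel```, because ```.``` matches any character.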

## Stop/disconnect

In [84]:
ray.shutdown()