### Activity 7: Communicating Ray Actors

This is a short exercise to demonstrate how actors can communicate through remote object IDs (OIDs).
We are going to break the actors of the ImageNet classification [Example 24](../../examples/24_ex_ray_actors.ipynb) into
two actors: one that transforms the image into a ResNet50-compatible tensor and one that takes
the tensor as input and returns the classification.

You have been given two class files that have been written to be instantiated as Ray actors:
  * [rayresnet50_normalize](./rayresnet50_normalize.py)
  * [rayresnet50_classify](./rayresnet50_classify.py)

To complete the exercise you need to populate the following driver code.  Then answer the questions.

Data is from https://github.com/EliSchwartz/imagenet-sample-images.

Note: check your output to make sure that the predictions match the input file. This classifier should be over 90% correct. You need to be careful to match the returned OIDs with files. **Include the cell output in the submitted notebook**.
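
One way to keep predictions matched to files is to store each filename next to the handle returned for it, rather than relying on list positions lining up. The sketch below illustrates only that bookkeeping. It uses the standard library's `concurrent.futures` as a stand-in for Ray (a future plays the role of an OID), and `fake_classify` and the file names are hypothetical placeholders, not the real classifier or data:

```python
from concurrent.futures import ThreadPoolExecutor


def fake_classify(name):
    # Hypothetical placeholder for the remote normalize + classify calls.
    return f"label_for_{name}"


# Made-up directory listing that mixes images with other files.
files = ["notes.txt", "a.JPEG", "b.JPEG", "README.md", "c.JPEG"]

pending = []  # (filename, handle) pairs, appended in submission order
with ThreadPoolExecutor(max_workers=2) as pool:
    for name in files:
        if name.endswith(".JPEG"):
            pending.append((name, pool.submit(fake_classify, name)))

    # Each result is read through the handle stored with its own filename.
    results = {name: handle.result() for name, handle in pending}

for name, label in results.items():
    print(f"Filename {name}: predictions {label}")
```

Because skipped files never get a handle, indexing a list of handles by the position of a file in the directory listing would pair results with the wrong names; keeping the pair together avoids that.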

In [None]:
from rayresnet50_normalize import RRN50Normalize
from rayresnet50_classify import RRN50Classify
import ray
import time
import os

num_actors=4

# script to drive parallel program
ray.init(num_cpus=num_actors, ignore_reinit_error=True)

### instantiate 4 normalization actors
normalize_actors = [RRN50Normalize.remote() for _ in range(num_actors)]

### instantiate 4 classification actors
classify_actors = [RRN50Classify.remote() for _ in range(num_actors)]

directory = 'data/'
files = os.listdir(directory)

start_time = time.time()  # Get the current time

oids = list()

for i in range(len(files)):
    if files[i].endswith(".JPEG"):
        file_path = os.path.join(directory, files[i])

        ### call remote to normalize image into tensor
        tensor_oid = normalize_actors[i % num_actors].normalize_image.remote(file_path)

        ### call remote to classify tensor
        classify_oid = classify_actors[i % num_actors].classify_image.remote(tensor_oid)

        ### store the oids needed to complete the computation,
        ### paired with the filename so that skipped files cannot shift the match
        oids.append((files[i], classify_oid))

preds = list()

### collect results for each file in a variable preds
for filename, classify_oid in oids:
    try:
        preds.append(ray.get(classify_oid))
        print(f"Filename {filename}: predictions {preds[-1]}")
    except Exception as err:
        # report the failure instead of silently dropping the file
        print(f"Filename {filename}: failed with {err!r}")

end_time = time.time()  # Get the current time again

execution_time = end_time - start_time
print("Execution time: ", execution_time, " seconds")
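
The `i % num_actors` indexing in the driver can be traced by hand. The sketch below uses a hypothetical directory listing (the file names are made up) and plain Python, no Ray, to show which actor pair each image would land on:

```python
from collections import Counter

num_actors = 4

# Hypothetical directory listing; the non-JPEG entry is skipped by the driver.
files = ["a.JPEG", "b.JPEG", "notes.txt", "c.JPEG", "d.JPEG", "e.JPEG"]

# Same rule as the driver: the index i counts every directory entry.
assignment = {
    name: i % num_actors
    for i, name in enumerate(files)
    if name.endswith(".JPEG")
}
load = Counter(assignment.values())

for name, actor in assignment.items():
    print(f"{name} -> normalize_actors[{actor}] -> classify_actors[{actor}]")
print("files per actor pair:", dict(sorted(load.items())))
```

Because `i` counts every directory entry, including the ones skipped by the `.JPEG` filter, the images are not necessarily spread evenly: in this made-up listing, actor pair 2 receives no work.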

### Questions

* Question 1: Does the computation for a single input file (normalization and classification) run in serial or parallel?  If serially, how is the dependency enforced?  

    **Answer**: The computation for a single file, involving normalization and classification, is executed serially within a parallel framework. This serial execution is enforced by data dependency: the classification of an image cannot begin until its normalization is complete. Ray enforces this through the OID: `tensor_oid` is passed as an argument to `classify_image.remote`, and the classification method does not start until the tensor it refers to is available. We utilize Ray to parallelize these operations across multiple files, allowing different files to be normalized and classified simultaneously by different actors. Thus, while each file is processed serially, multiple files undergo this process in parallel.

* Question 2: Does the computation of different files run in serial or parallel?  If parallel, explain why they are independent.  

    **Answer**: The computation of different files runs in parallel, not in serial. By creating multiple `normalize_actors` and `classify_actors`, we distribute the tasks of normalizing and classifying images across different CPU cores. Each actor handles a subset of the files independently, allowing simultaneous processing. The independence of these tasks is inherent, as each image's normalization and classification do not depend on the results of other images, making them suitable for parallel execution.

* Question 3: Your computation needs to collect return identifiers for the classification objects. It is not necessary to collect the OIDs of the normalization function in the driver code. Why?  

    **Answer**: In Ray, the OID returned by one actor method can be passed directly as an argument to another actor method, so the driver does not need to collect or fetch the intermediate result. Here each `tensor_oid` is handed straight to `classify_image.remote`, and Ray resolves it to the tensor inside the classification actor. Thus, only the final classification results (the OIDs from `classify_actors`) need to be collected by the driver for further processing.

* Question 4: At any given point in time, how many actors are running and what are they doing?  

    **Answer**: At any given time, up to eight actors (four normalization actors and four classification actors) can be running concurrently. The normalization actors are responsible for normalizing images into tensors, while the classification actors classify these tensors. The number of active actors depends on the number of files being processed and their distribution across the actors, given the round-robin scheduling (`i % num_actors`).

* Question 5: Is this implementation faster or slower than doing the normalization and classification in one actor?  Can you think of a situation in which it would be faster to do them together?  (By situation, I mean data properties or target hardware system on which this would be preferable.)  

    **Answer**: This implementation, which uses separate actors for normalization and classification, can be faster because the two stages are pipelined across files: while a classification actor works on one image, its normalization actor can already prepare the next. The gain is largest when both stages are computationally intensive and there are enough cores to run all eight actors at once. However, if the tensors passed between the stages are large or inter-actor communication overhead is high, combining them into one actor could be faster, particularly on systems with limited processing cores or slower inter-process communication capabilities.
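
The trade-off in Question 5 can be explored with a back-of-the-envelope model. This is only a sketch under stated assumptions: every actor handles one call at a time, all actors can run at once, files are split evenly, and moving a tensor between actors costs nothing. The function names and the timings `t_norm` and `t_cls` are hypothetical, not measurements of this exercise:

```python
import math


def pipelined_time(n_files, pairs, t_norm, t_cls):
    # Each normalize/classify pair acts as a two-stage pipeline: the first
    # file pays for both stages, every later file adds only the slower stage.
    per_pair = math.ceil(n_files / pairs)
    return t_norm + t_cls + (per_pair - 1) * max(t_norm, t_cls)


def fused_time(n_files, actors, t_norm, t_cls):
    # A fused actor runs both stages back to back for each of its files.
    per_actor = math.ceil(n_files / actors)
    return per_actor * (t_norm + t_cls)


# Hypothetical workload: 1000 files, stage times in seconds.
n_files, t_norm, t_cls = 1000, 0.25, 0.75

print("4 normalize + 4 classify actors:", pipelined_time(n_files, 4, t_norm, t_cls))
print("4 fused actors:", fused_time(n_files, 4, t_norm, t_cls))
print("8 fused actors:", fused_time(n_files, 8, t_norm, t_cls))
```

In this idealized model the split version beats four fused actors, but eight fused actors (the same total actor count) do better still, since no actor waits on the slower stage. Real runs also pay for tensor transfer and per-actor startup, which the model leaves out, so measured timings are the only reliable answer.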
