### Exercise 9: Communicating Ray Actors (solutions code at bottom)

This short exercise demonstrates how actors can communicate through remote object IDs (OIDs).
We are going to break the actors of the ImageNet classification [Example XX](RayImageNet.ipynb) into
two actors: one that transforms the image into a ResNet50-compatible tensor and one that takes
the tensor as input and returns the classification.

You have been given two class files that have been written to be instantiated as Ray actors:
  * [rayresnet50_normalize](../rayresnet50_normalize.py)
  * [rayresnet50_classify](../rayresnet50_classify.py)

To complete the exercise you need to populate the following driver code.  Then answer the questions.

Data is from https://github.com/EliSchwartz/imagenet-sample-images.

Note: check your output to make sure that the predictions match the input file. This classifier should be over 90% correct. You need to be careful to match the returned OIDs with their files. **Include the cell output in the submitted notebook**.
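One way to keep results matched to files is to store each handle next to the filename it was created for, rather than relying on list positions. The sketch below only illustrates that bookkeeping: it uses `concurrent.futures` from the standard library as a stand-in for Ray, and the `classify` function and the file names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def classify(path):
    # Hypothetical stand-in for a remote classification call.
    return f"label-for-{path}"


# Hypothetical directory listing that contains one non-image file.
names = ["a.JPEG", "notes.txt", "b.JPEG"]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Keep (filename, handle) pairs so a result can never be
    # attributed to the wrong file, even when some entries are skipped.
    pending = [
        (name, pool.submit(classify, name))
        for name in names
        if name.endswith(".JPEG")
    ]
    results = {name: handle.result() for name, handle in pending}

for name, label in results.items():
    print(f"Filename {name}: predictions {label}")
```

With Ray, the same pairing would hold an OID in place of each future and call `ray.get` on it when collecting.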

In [1]:
from rayresnet50_normalize import RRN50Normalize
from rayresnet50_classify import RRN50Classify
import ray
import time
import os

num_actors=4

# script to drive parallel program
ray.init(num_cpus=num_actors, ignore_reinit_error=True)

### TODO instantiate 4 normalization actors

### TODO instantiate 4 classification actors


directory = '../../data/imagenet1000'
files = os.listdir(directory)

start_time = time.time()  # Get the current time

for i in range(len(files)):
    if files[i].endswith(".JPEG"):
        file_path = os.path.join(directory, files[i])

        ### TODO call remote to normalize image into tensor
        
        ### TODO call remote to classify tensor
        
        ### TODO store the oids needed to complete the computation
        
for i in range(len(files)):
    try:
        ### TODO collect results for each file in a variable preds
        # preds = ray.get(....)
        # print(f"Filename {files[i]}: predictions {preds}")
        pass  # placeholder so the cell parses until the TODO is filled in
    except:
        pass

end_time = time.time()  # Get the current time again

execution_time = end_time - start_time
print("Execution time: ", execution_time, " seconds")

### Questions

Answer these questions inline in this markdown cell.

* Question 1: Does the computation for a single input file (normalization and classification) run in serial or parallel?  If serially, how is the dependency enforced?

Serial. The classification call takes the OID of the normalization result as its argument, so it cannot run until normalization has finished.
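The ordering can be illustrated without Ray. The sketch below is an analogy, not the exercise's code: `concurrent.futures` futures stand in for OIDs, and `normalize` and `classify` are hypothetical placeholders. The second stage waits on the handle explicitly; with Ray, the runtime resolves an OID passed as an argument before the actor method runs.

```python
from concurrent.futures import ThreadPoolExecutor

events = []  # records the order in which the two stages run


def normalize(path):
    events.append(("normalize", path))
    return f"tensor({path})"


def classify(tensor_future):
    # Explicit wait on the upstream handle (the stand-in for an OID).
    tensor = tensor_future.result()
    events.append(("classify", tensor))
    return f"class-of-{tensor}"


# Separate pools mirror the separate normalization and classification actors.
with ThreadPoolExecutor(max_workers=1) as norm_pool:
    with ThreadPoolExecutor(max_workers=1) as class_pool:
        tensor_future = norm_pool.submit(normalize, "cat.JPEG")
        # The handle is passed along without the driver ever fetching the tensor.
        label_future = class_pool.submit(classify, tensor_future)
        label = label_future.result()

print(events)
print(label)
```

The driver only ever asks for `label`; the intermediate tensor moves between the two stages through the handle.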

* Question 2: Does the computation of different files run in serial or parallel?  If parallel, explain why they are independent. 

Parallel. The driver launches the remote calls for every file without waiting for any result, and no file's computation depends on another file's output.

* Question 3: Your computation needs to collect return identifiers for the classification objects. It is not necessary to collect the OIDs of the normalization function in the driver code. Why?

The normalization OIDs are passed directly to the classification calls as arguments. The driver only needs the classification OIDs to get the return values.

* Question 4: At any given point in time, how many actors are running and what are they doing?

Eight actors: four normalize images and four classify tensors.
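A minimal sketch of how round-robin indexing with `i % num_actors` spreads files over the four normalize/classify actor pairs. The file names are hypothetical, and the sketch only computes the assignment; it does not start any actors.

```python
num_actors = 4

# Hypothetical file names standing in for the directory listing.
files = [f"img{i}.JPEG" for i in range(10)]

# Map each actor-pair index to the files it would receive.
assignment = {}
for i, name in enumerate(files):
    assignment.setdefault(i % num_actors, []).append(name)

for pair, names in sorted(assignment.items()):
    print(f"actor pair {pair}: {names}")
```

Assuming default (non-concurrent) Ray actors, each actor handles its queued calls one at a time, so files that land on the same pair are processed in sequence while the four pairs work side by side.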

* Question 5: Is this implementation faster or slower than doing the same work in one actor?  Can you think of a situation in which it would be faster to decompose the code?  (By situation, I mean properties of the data or the target hardware system for which this would be preferable.)

Faster if the two stages could run on different hardware, e.g. with classification on a GPU.  Other ideas are OK.

In [2]:
from rayresnet50_normalize import RRN50Normalize
from rayresnet50_classify import RRN50Classify
import ray
import time
import os

num_actors=4

# script to drive parallel program
ray.init(num_cpus=num_actors, ignore_reinit_error=True)

# one normalization actor and one classification actor per slot
normactors = [RRN50Normalize.remote() for _ in range(num_actors)]
classactors = [RRN50Classify.remote() for _ in range(num_actors)]


directory = '../../data/imagenet1000'
files = os.listdir(directory)
croids = [None] * len(files)

start_time = time.time()  # Get the current time

for i in range(len(files)):
    if files[i].endswith(".JPEG"):
        file_path = os.path.join(directory, files[i])
        nroid = normactors[i%num_actors].normalize_image.remote(file_path)
        # why doesn't the driver need to collect the nroid?
        croids[i] = classactors[i%num_actors].classify_image.remote(nroid)
        
for i in range(len(files)):
    if croids[i] is None:
        continue  # not a .JPEG file, so no remote call was launched
    preds = ray.get(croids[i])
    print(f"Filename {files[i]}: predictions {preds}")

end_time = time.time()  # Get the current time again

execution_time = end_time - start_time
print("Execution time: ", execution_time, " seconds")

ModuleNotFoundError: No module named 'rayresnet50_normalize'