# Color extraction from images with Lithops4Ray

In this tutorial we explain how to use Lithops4Ray to extract colors and the [HSV](https://en.wikipedia.org/wiki/HSL_and_HSV) color range from images stored in IBM Cloud Object Storage. To experiment with this tutorial, you can take any public image dataset and upload it to your bucket in IBM Cloud Object Storage. For example, you can download images from the [Stanford Dogs Dataset](http://vision.stanford.edu/aditya86/ImageNetDogs/). We also provide an upload [script](https://github.com/project-codeflare/data-integration/blob/main/scripts/upload_to_ibm_cos.py) that can be used to upload local images to IBM Cloud Object Storage.

Our code uses the `colorthief` package, which needs to be installed in the Ray cluster on both the head and worker nodes. You can edit the `cluster.yaml` file and add

   `- pip install colorthief`

to the `setup_commands` section. This ensures that the required package is installed automatically when the Ray cluster starts.

In [None]:
import lithops
import ray

We write a function that extracts the dominant color from a single image. When the function is invoked, the Lithops framework injects a reserved parameter `obj` that points to the data stream of the image. More information on the reserved `obj` parameter can be found [here](https://github.com/lithops-cloud/lithops/blob/master/docs/data_processing.md#processing-data-from-a-cloud-object-storage-service).

In [None]:
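def extract_color(obj):
    # Imported inside the function so it resolves on the worker that runs it.
    from colorthief import ColorThief
    # Lithops injects `obj`; its data stream is a file-like view of the image.
    body = obj.data_stream
    # Higher `quality` values sample fewer pixels: faster but less precise.
    dominant_color = ColorThief(body).get_color(quality=10)
    return dominant_color, obj.key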
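
A function that takes the reserved `obj` parameter can be exercised locally before running it against object storage. The following is a minimal sketch, assuming only the two attributes used in this tutorial (`data_stream` and `key`); `object_size` and `fake_obj` are hypothetical names introduced for illustration and are not part of Lithops.

```python
import io
from types import SimpleNamespace


def object_size(obj):
    # Hypothetical function with the same shape as extract_color:
    # read the stream, then return a value together with the object key.
    data = obj.data_stream.read()
    return len(data), obj.key


# Stand-in for the object Lithops would inject. Only the two attributes
# used in this tutorial (data_stream and key) are mimicked here.
fake_obj = SimpleNamespace(
    data_stream=io.BytesIO(b"not a real image"),
    key="images/example.jpg",
)

size, key = object_size(fake_obj)
print(size, key)
```

The same stand-in could be passed to `extract_color` if the stream holds real image bytes and `colorthief` is installed locally.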


We now write a Ray task that returns the image name and the hue of its dominant color in HSV space, scaled to the range 0 to 180. Instead of calling the `extract_color` function directly, the task goes through the data object, so Lithops runs `extract_color` behind the scenes only at the moment the result is needed.

In [None]:
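@ray.remote
def identify_colorspace(data):
    import colorsys
    # Asking for the result is what makes Lithops run extract_color for this image.
    color, name = data.result()

    # colorsys documents floats in [0, 1]; ColorThief returns 0-255 integers.
    r, g, b = (channel / 255.0 for channel in color)
    hsv = colorsys.rgb_to_hsv(r, g, b)
    # Scale the hue from [0, 1) to [0, 180).
    val = hsv[0] * 180
    return name, val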
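
To make the hue scaling concrete, here is a small standalone sketch that uses only the standard library. The helper name `hue_0_180` is an assumption made for this illustration; it normalizes the 0-255 channels before calling `colorsys.rgb_to_hsv` and then applies the same factor of 180 as the task above.

```python
import colorsys


def hue_0_180(rgb):
    # Normalize 0-255 channels to the 0-1 floats that colorsys documents.
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, _saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    # colorsys returns the hue in [0, 1); scale it as the task above does.
    return hue * 180


red = round(hue_0_180((255, 0, 0)))
green = round(hue_0_180((0, 255, 0)))
blue = round(hue_0_180((0, 0, 255)))
print(red, green, blue)
```

On this scale the three primaries are spaced 60 units apart, so nearby values indicate similar hues.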

Now let's tie everything together in a main method. Using Lithops removes the boilerplate code otherwise required to list data in object storage. Lithops also inspects the data source with its internal data partitioner and creates a lazy execution plan in which each entry maps the `extract_color` function to a single image. Moreover, Lithops creates a single authentication token that is shared by all tasks, instead of having each task authenticate separately. Ray controls the parallelism: when a Ray task runs, it calls Lithops to execute `extract_color` directly in the context of the calling task. As a result, the code can access object storage data without additional coding effort from the user.

In [None]:
if __name__ == '__main__':

    ray.init(ignore_reinit_error=True)

    fexec = lithops.LocalhostExecutor(log_level=None)
    # Replace the bucket and prefix with your own IBM Cloud Object Storage location.
    my_data = fexec.map(extract_color, 'cos://gvernikuseast/500flowers/')

    # Submit one Ray task per image.
    results = [identify_colorspace.remote(d) for d in my_data]

    for res in results:
        name, hue = ray.get(res)
        print(f"Image: {name}, dominant color HSV range: {hue}")
    ray.shutdown()
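
The lazy execution pattern described above can be illustrated with the standard library alone. This is a conceptual sketch only, not how Lithops or Ray implement it: `LazyCall`, `lazy_map`, `name_length`, and `consume` are hypothetical names, and a thread pool stands in for Ray's scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


class LazyCall:
    """Hypothetical stand-in for one entry of a lazy execution plan."""

    def __init__(self, func, item):
        self.func = func
        self.item = item
        self.executed = False

    def result(self):
        # The wrapped function runs only now, in whichever worker calls result().
        self.executed = True
        return self.func(self.item)


def lazy_map(func, items):
    # Build the plan: one deferred call per item, nothing is executed yet.
    return [LazyCall(func, item) for item in items]


def name_length(name):
    # Placeholder for per-object work such as extract_color.
    return name, len(name)


def consume(entry):
    # Plays the role of the Ray task: triggers the deferred call on the worker.
    name, length = entry.result()
    return f"{name}: {length}"


plan = lazy_map(name_length, ["a.jpg", "bb.jpg", "ccc.jpg"])
nothing_ran_yet = not any(entry.executed for entry in plan)

# A thread pool stands in for Ray's scheduler in this sketch.
with ThreadPoolExecutor(max_workers=2) as pool:
    outputs = list(pool.map(consume, plan))

print(nothing_ran_yet, outputs)
```

Building the plan is cheap because no per-item work happens until a worker asks for a result, which mirrors how the per-image function is deferred until the Ray task runs.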