# <font color='red'>**KANDINSKY**</font>: Clustering and Quantization   
  
## **Prep the Photographs**  

*Shaurya Agarwal*

![K-Means and Friends](./images/Data_Analysis_BG_1200.png)

### <font color='green'>__Support for Google Colab__  </font>  
    
Open this notebook in Colab using the following button:  
  
<a href="https://colab.research.google.com/github/shauryashaurya/kandinsky/blob/master/01-K-Means-and-friends-prep-the-photos.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>  
  
<font color='green'>Uncomment and execute the cell below to set up and run this notebook on Google Colab.</font>

In [1]:
# # SETUP FOR COLAB: select all the lines below and uncomment (CTRL+/ on Windows)
# # Image conversion methods use OpenCV 

# # Note: on PyPI, cv2 is published as opencv-python and skimage as scikit-image
# ! pip install --upgrade --no-cache-dir jax opencv-python numpy pandas scikit-image scikit-learn

# ! mkdir ./data
# ! mkdir ./data/cgi
# ! mkdir ./data/edtf

## Setup, imports etc.

In [2]:
import concurrent.futures
import csv
import os
import cv2
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D
from skimage import color
# 
import jax.numpy as jnp
from jax import random, jit

To render Matplotlib plots inline in a JupyterLab notebook, directly below the code cells that produce them, use the ```%matplotlib inline``` magic command.  
For interactive plots (e.g., zooming and rotating 3D plots), use the ```%matplotlib widget``` magic instead.  
```%matplotlib widget``` requires the ```ipympl``` package; if it's missing, install it with ```pip install --upgrade --no-cache-dir ipympl```.  

In [3]:
%matplotlib widget

In [4]:
# hack to make plotly plots show up in the notebook
# pio.renderers.default = "notebook"

# A look at our Images

To easily display images in the notebook, use this code:
```
![Image Title](Image Path)
```

In [10]:
# if you *HAD* to do it this way, otherwise just use markdown...or HTML
# from IPython.display import Image as IPImage

# def display_image(image_path):
#     """Display an image using IPython display."""
#     display(IPImage(filename=image_path))

In [11]:
# blows up the size of the notebook
# display_image(image_path)

![sample image](./data/edtf/001.png)

![sample image](./data/edtf/002.png)

![sample image](./data/edtf/003.png)

![sample image](./data/edtf/004.png)

![sample image](./data/edtf/005.png)

![sample image](./data/edtf/006.png)

![sample image](./data/edtf/007.png)

![sample image](./data/edtf/008.png)

![sample image](./data/edtf/009.png)

![sample image](./data/edtf/010.png)

![sample image](./data/edtf/011.png)

![sample image](./data/edtf/012.png)

## Pre-processing images to generate data points

Let's load an image and check the shape of its pixel data.

In [5]:
image = cv2.imread("./data/cgi/" + "01x25.png")

image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image_rgb.shape
img2 = image_rgb.reshape((-1, image_rgb.shape[2]))
img2.shape

(157500, 3)

So 157,500 pixels in total, each with 3 values: red, green and blue.
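
As a minimal sketch of what the reshape above does, the snippet below builds a tiny synthetic image (the 2x3 size and the pixel values are made up for illustration) and flattens it the same way. The two spatial axes collapse into one, and each row of the result is one pixel.

```python
import numpy as np

# Synthetic 2x3 RGB image standing in for a real photograph (values are made up).
height, width = 2, 3
image_rgb = np.arange(height * width * 3, dtype=np.uint8).reshape(height, width, 3)

# Collapse the two spatial axes into one; keep the channel axis as-is.
pixels = image_rgb.reshape((-1, image_rgb.shape[2]))

print(image_rgb.shape)
print(pixels.shape)

# Row i of `pixels` is the pixel at (i // width, i % width) in the image.
row, col = 1, 2
assert np.array_equal(pixels[row * width + col], image_rgb[row, col])

# The flattening is reversible as long as the original height and width are known.
restored = pixels.reshape(height, width, 3)
```

The flattened array does not store the image dimensions, so they need to be kept separately if the pixels are later mapped back onto the image.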

### References:  
* Look at the color conversion codes here: https://docs.opencv.org/4.9.0/de/d25/imgproc_color_conversions.html  
* OpenCV Image transformation Enumerations: https://docs.opencv.org/3.4.0/d7/d1b/group__imgproc__misc.html
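
For 8-bit images, OpenCV stores hue as degrees divided by 2 and scales the other HSV/HLS channels to 0..255, and its HLS output is ordered H, L, S. The sketch below mimics that scaling using only the standard-library `colorsys` module. The two helper functions are hypothetical illustrations, not part of OpenCV, and their rounding is not guaranteed to match `cv2.cvtColor` for every pixel.

```python
import colorsys


def rgb8_to_hsv8(r, g, b):
    """Hypothetical helper: 8-bit RGB -> (H, S, V) using OpenCV-style 8-bit scaling."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # colorsys returns all three values in [0, 1]; hue is stored as degrees / 2.
    return round(h * 180) % 180, round(s * 255), round(v * 255)


def rgb8_to_hls8(r, g, b):
    """Hypothetical helper: 8-bit RGB -> (H, L, S). Note the order: L comes before S."""
    h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    return round(h * 180) % 180, round(l * 255), round(s * 255)


pure_blue = (0, 0, 255)
hsv_blue = rgb8_to_hsv8(*pure_blue)
hls_blue = rgb8_to_hls8(*pure_blue)
print("HSV:", hsv_blue)
print("HLS:", hls_blue)
```

The channel order matters when labelling CSV columns: a file produced from an HLS conversion has lightness in its second column and saturation in its third.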

Save the image's pixel values as CSV data, one file per color space; we'll use these files for clustering later.
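
To make the CSV layout concrete, here is a small round-trip sketch: it writes a two-pixel array with `np.savetxt`, using a header row and an empty `comments` prefix, and reads it back with `np.loadtxt`. The file name and pixel values are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Two made-up RGB pixels standing in for a flattened image.
pixels = np.array([[255, 0, 0], [0, 128, 64]], dtype=np.uint8)
header = ["R", "G", "B"]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "example_RGB.csv")

    # comments="" keeps savetxt from prefixing the header line with "# ".
    np.savetxt(csv_path, pixels, delimiter=",", header=",".join(header), comments="")

    with open(csv_path, encoding="utf-8") as f:
        first_line = f.readline().strip()

    # Skip the header row; values come back as float64.
    loaded = np.loadtxt(csv_path, delimiter=",", skiprows=1)

print(first_line)
print(loaded.dtype, loaded.shape)
```

With the default format, `np.savetxt` writes every value in scientific notation as a float. For integer pixel data, passing `fmt="%d"` is one option for producing smaller files, at the cost of truncating any fractional values.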

In [15]:
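# 1. Convert an image to the given color space and save its pixels to CSV.
# image_to_color_spaces_parallel() calls this once per color space.
def convert_and_save(image_path, conversion_func, suffix, header):
    # Load the image; cv2.imread returns None (no exception) if the file can't be read
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    # OpenCV loads pixels in BGR order; convert to RGB
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)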

    # Convert the image using the provided conversion function
    converted_image = conversion_func(image_rgb)

    # Flatten the image array to list pixels
    pixels = converted_image.reshape((-1, converted_image.shape[2]))
    # print('suffix: ', suffix, ' sample pixels: ', pixels[:3])

    # Save to CSV
    base_path = image_path.rsplit(".", 1)[0]
    output_path = f"{base_path}_{suffix}.csv"

    # Using NumPy to directly save to CSV
    np.savetxt(
        output_path,
        pixels,
        delimiter=",",
        header=",".join(header),
        comments="",
        encoding="utf-8",
    )

In [16]:
# 2. Convert an image to several color spaces, calling convert_and_save() (step 1) for each.
# Note: despite the name, the conversions currently run sequentially (see the TODO below).
def image_to_color_spaces_parallel(image_path):
    
    # Define conversion functions
    def to_rgb(image_rgb):
        return image_rgb

    def to_xyz(image_rgb):
        return cv2.cvtColor(image_rgb, cv2.COLOR_RGB2XYZ)

    def to_lab(image_rgb):
        return cv2.cvtColor(image_rgb, cv2.COLOR_RGB2Lab)

    def to_hsv(image_rgb):
        return cv2.cvtColor(image_rgb, cv2.COLOR_RGB2HSV)

    # TODO: HSL is not supported in skimage.color; build support using other libraries later
    def to_hsl(image_rgb):
        # OpenCV calls this space HLS and returns channels in H, L, S order
        return cv2.cvtColor(image_rgb, cv2.COLOR_RGB2HLS)

    # Conversion specifications (function, suffix, header)
    conversions = [
        (to_rgb, "RGB", ["R", "G", "B"]),
        (to_xyz, "XYZ", ["X", "Y", "Z"]),
        (to_lab, "Lab", ["L*", "a*", "b*"]),
        (to_hsv, "HSV", ["H", "S", "V"]),
        (to_hsl, "HSL", ["H", "L", "S"]),  # header follows OpenCV's H, L, S channel order
    ]
    for func, suff, header in conversions:
        convert_and_save(image_path, func, suff, header)

    # # [TODO] Use ThreadPoolExecutor to parallelize conversions
    # with concurrent.futures.ThreadPoolExecutor() as executor:
    # 	futures = [executor.submit(convert_and_save, image_path, func, suffix, header)
    # 		for func, suffix, header in conversions]

    # # Wait for all futures to complete
    # concurrent.futures.wait(futures)
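
The TODO above could be approached along the following lines. This is a sketch only: `fake_convert_and_save` is a hypothetical stand-in that just computes the output path, so the example runs without OpenCV or any image files. Whether threads actually speed up the real conversions has not been measured here.

```python
import concurrent.futures


def fake_convert_and_save(image_path, suffix):
    """Hypothetical stand-in for convert_and_save: returns the CSV path it would write."""
    base_path = image_path.rsplit(".", 1)[0]
    return f"{base_path}_{suffix}.csv"


suffixes = ["RGB", "XYZ", "Lab", "HSV", "HSL"]
image_path = "./data/cgi/01x25.png"

with concurrent.futures.ThreadPoolExecutor() as executor:
    # Submit one task per color space.
    futures = [
        executor.submit(fake_convert_and_save, image_path, suffix)
        for suffix in suffixes
    ]
    # result() blocks until the task finishes and re-raises any exception
    # from the worker, so failures are not silently dropped.
    output_paths = [future.result() for future in futures]

for path in output_paths:
    print(path)
```

Collecting results with `result()` rather than only calling `concurrent.futures.wait()` means that an unreadable image or a failed write surfaces as an exception in the calling cell.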

In [18]:
# Save sample images in a variety of color models
images = ["01x25.png", "02x25.png", "03x25.png", "04x25.png", "05x25.png", "06x25.png"]
for image in images:
    print("pre-processing image: ", image)
    image_path = "./data/cgi/" + image
    image_to_color_spaces_parallel(image_path)

pre-processing image:  01x25.png
pre-processing image:  02x25.png
pre-processing image:  03x25.png
pre-processing image:  04x25.png
pre-processing image:  05x25.png
pre-processing image:  06x25.png


In [19]:
# Do this for Eight-Down-Toofaan-Mail photographs as well
images = ['001.png','002.png','003.png','004.png','005.png','006.png','007.png','008.png','009.png','010.png','011.png','012.png']
for image in images:
    print("pre-processing image: ", image)
    image_path = "./data/edtf/" + image
    image_to_color_spaces_parallel(image_path)

pre-processing image:  001.png
pre-processing image:  002.png
pre-processing image:  003.png
pre-processing image:  004.png
pre-processing image:  005.png
pre-processing image:  006.png
pre-processing image:  007.png
pre-processing image:  008.png
pre-processing image:  009.png
pre-processing image:  010.png
pre-processing image:  011.png
pre-processing image:  012.png
