# Images as Functions

In computer vision, images are more than mere pixels with colors; they are representations of a function $f(x, y)$. Here, 'x' signifies the horizontal rightward direction, 'y' denotes the vertical downward direction, and $f(x, y)$ corresponds to the pixel value, typically in grayscale or RGB, for a given pixel coordinate pair. This convention will be consistently followed throughout this repository unless otherwise specified.
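
As a minimal sketch of this convention (the toy array and the helper `f` are illustrative and not part of the repository), note that NumPy indexes an image as `arr[row, column]`, so $f(x, y)$ is read as `arr[y, x]`:

```python
import numpy as np

# A toy 3x4 grayscale "image": 3 rows (height) by 4 columns (width).
arr = np.arange(12, dtype=np.uint8).reshape(3, 4)


def f(x, y):
    """Return the pixel value at column x (rightward) and row y (downward)."""
    return arr[y, x]  # NumPy indexes as [row, column], i.e. [y, x]


height, width = arr.shape
print(f(0, 0))                   # top-left pixel (the origin)
print(f(width - 1, 0))           # top-right pixel
print(f(width - 1, height - 1))  # bottom-right pixel
```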

In [None]:
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from skimage import color, io

%matplotlib inline

Here are two example images, one read with `skimage.io` and one opened with `Pillow`, both visualized using `matplotlib`. The horizontal axis (x-axis) runs from left to right, while the vertical axis (y-axis) runs from top to bottom. The image's origin is therefore at the **top-left corner**, which might seem counterintuitive at first.

In [None]:
img1_path = "./input/ps0-1-a-1.tiff"
img2_path = "./input/ps0-1-a-2.tiff"

# gray-image: MxN
# RGB-image: MxNx3
# RGBA-image: MxNx4
arr1 = io.imread(img1_path)

with Image.open(img2_path) as pil_im2:
    arr2 = np.asarray(pil_im2)
    # arr2 = np.asarray(pil_im2.convert('L')) RGB -> GRAY. Notice the grayscale range is 0~255 with this function.


print(f"The left (car) image {img1_path} has shape {arr1.shape} and its order is RGB.") # HxWx3, RGB
print(f"The right (boat) image {img2_path} has shape {arr2.shape} and its order is {pil_im2.mode}.") # HxWx3, RGB

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(6, 4)) # figsize(Width, height) in inches.

ax[0].imshow(arr1)
ax[1].imshow(arr2)

fig.tight_layout()
plt.show()

# with Image.open(img2_path) as pil_im2:
#     arr2 = np.asarray(pil_im2.convert('L')) # RGB -> GRAY
#     h2, w2 = arr2.shape
#     center2 = (int(h2/2), (w2/2)) # y, x
#     # arr2 is an immutable array. If written, the error below is shown:
#     # ValueError: assignment destination is read-only
#     output = np.zeros_like(arr2)
#     output[:, :] = arr2[:, :]



## Color Channels

Now, let's explore the images further to gain a deeper understanding of what a digital image really is by doing the following:

- Plot monochrome image by converting RGB to grayscale image.
- Plot monochrome image by selecting R channel.
- Plot monochrome image by selecting G channel.
- Plot monochrome image by selecting B channel.
- Swap the red (R) and blue (B) channels.
- Swap the green (G) and blue (B) channels.

Notice that the pixel range for `RGB to Gray` image is `0` to `1`.
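
To make the 0 to 1 range concrete, the sketch below reproduces with NumPy alone the weighted sum documented for `skimage.color.rgb2gray` (0.2125 R + 0.7154 G + 0.0721 B). The helper name `rgb_to_gray` and the random test image are assumptions for illustration. For `uint8` input the values are first divided by 255, which is why the result is a float image in [0, 1].

```python
import numpy as np

# Luminance weights documented for skimage.color.rgb2gray.
WEIGHTS = np.array([0.2125, 0.7154, 0.0721])


def rgb_to_gray(rgb_uint8):
    """Convert an HxWx3 uint8 RGB image to an HxW float image in [0, 1]."""
    rgb_float = rgb_uint8.astype(np.float64) / 255.0
    return rgb_float @ WEIGHTS  # weighted sum over the channel axis


rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

gray = rgb_to_gray(rgb)
print(gray.shape, gray.dtype, gray.min(), gray.max())

# Channel selection and channel swaps are plain indexing on the last axis.
red = rgb[:, :, 0]
swapped_rb = rgb[:, :, ::-1]       # R <-> B
swapped_gb = rgb[:, :, [0, 2, 1]]  # G <-> B
```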

In [None]:
with Image.open(img1_path) as im:
    arr = np.asarray(im)

fontsize = 12

fig, ax = plt.subplots(
    nrows=2, ncols=3, figsize=(9, 6)
)  # figsize(Width, height) in inches.

ax[0, 0].imshow(color.rgb2gray(arr), cmap="gray", vmin=0, vmax=1)
ax[0, 0].set_title("RGB to Gray", fontsize=fontsize)
ax[0, 0].axis('off')  # Remove scale axis

ax[0, 1].imshow(arr[:, :, 0], cmap="gray", vmin=0, vmax=255)  # R channel
ax[0, 1].set_title("Red", fontsize=fontsize)
ax[0, 1].axis('off')

ax[1, 0].imshow(arr[:, :, 1], cmap="gray", vmin=0, vmax=255)  # G channel
ax[1, 0].set_title("Green", fontsize=fontsize)
ax[1, 0].axis('off')

ax[1, 1].imshow(arr[:, :, 2], cmap="gray", vmin=0, vmax=255)  # B channel
ax[1, 1].set_title("Blue", fontsize=fontsize)
ax[1, 1].axis('off')

# img[:,:,::-1] will create a view with swapped channels, `img` will stay unchanged.
ax[0, 2].imshow(arr[:, :, ::-1]) # Swap the red and blue channel. Or `[:, :, [2, 1, 0]]`
ax[0, 2].set_title("Swap R and B", fontsize=fontsize)
ax[0, 2].axis('off')

ax[1, 2].imshow(arr[:, :, [0, 2, 1]])  # Swap the green and blue channel
ax[1, 2].set_title("Swap G and B", fontsize=fontsize)
ax[1, 2].axis('off')

fig.tight_layout()
plt.show()


### Discussion

Let's analyze the R, G, and B monochrome images separately. The "green" image appears brightest because the outdoor scene is dominated by trees and grass. The car, which is blue, appears dark in the "red" image and bright in the "blue" image. Trees and grass, in turn, appear darker in both the "blue" and "red" images because green vegetation absorbs much of the blue and red light and reflects mostly green.

In summary, computer vision algorithms perform better when images capture more information or details. Consequently, these algorithms excel with natural objects in outdoor scenes and also perform well with artificial objects under indoor lighting conditions. However, it's important to note that RGB and RGB-derived grayscale images cover a wide spectrum of light, making them versatile for various scenarios.

## Radiometric Operation: Calculate Mean and Standard Deviation

Let's compute the mean and standard deviation (STD) for two images. The mean provides insight into the average brightness of an image, while STD indicates the degree of brightness variation.

* Begin by determining the minimum and maximum pixel values of the boat image. Calculate the mean and standard deviation as well. It's essential to describe the methodology employed for these computations.

* Next, perform the following operations on all pixels:
  - Subtract the mean value.
  - Divide the result by the standard deviation.
  - Adjust the scaling factor: multiply by 10 if the image intensity ranges from 0 to 255, or by 0.05 if the range is from 0.0 to 1.0.
  - Finally, add the mean value back to the adjusted pixel values.
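
The steps above can be sketched as a small helper. This is an illustration on synthetic data rather than the notebook's exact code: the function name `rescale_spread`, the `scale` argument, and the final clipping to [0, 255] are assumptions.

```python
import numpy as np


def rescale_spread(channel, scale=10.0):
    """Subtract the mean, divide by the std, multiply by `scale`, add the mean back."""
    values = channel.astype(np.float64)  # avoid uint8 overflow during arithmetic
    mean, std = values.mean(), values.std()
    if std == 0:
        return channel.copy()  # a constant image has no spread to rescale
    result = mean + scale * (values - mean) / std
    # Clip before casting so out-of-range values saturate instead of wrapping.
    return np.clip(result, 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
channel = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
out = rescale_spread(channel, scale=10.0)

print(f"before: mean={channel.mean():.1f}, std={channel.std():.1f}")
print(f"after:  mean={out.mean():.1f}, std={out.std():.1f}")
```

The output keeps roughly the same mean, while its standard deviation is close to the chosen `scale`.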



In [None]:
arr = io.imread(img2_path)

green = arr[:, :, 1]
minimum = np.min(green)
maximum = np.max(green)
mean = np.mean(green)
std = np.std(green)
print(f"Characteristic of the green channel of the boat image: \nMin: {minimum}. Max: {maximum}. Mean: {mean}. Std: {std}.")

normal = (mean+10*(green-mean)/std)
normal = normal.astype(np.uint8)

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(6, 3))

ax[0].imshow(green, cmap="gray", vmin=0, vmax=255)
ax[0].set_title("Original Green", fontsize=fontsize)
ax[0].axis('off')

ax[1].imshow(normal, cmap="gray", vmin=0, vmax=255)
ax[1].set_title("Standardized Green", fontsize=fontsize)
ax[1].axis('off')

# Save the 'green' and 'normal' image.
plt.imsave("./images/green_channel_original.png", green, cmap="gray", vmin=0, vmax=255)
plt.imsave("./images/green_channel_standardized.png", normal, cmap="gray", vmin=0, vmax=255)

fig.tight_layout()
plt.show()


### Discussion

The green channel has a mean pixel value of 124.3 and a standard deviation (STD) of 77, indicating a well-exposed image that is neither too bright nor too dark, and it looks pleasing on screen. After the rescaling, however, the image appears almost uniformly gray. The operation keeps the mean at 124.3, which is close to the midpoint of the 8-bit range (0 to 255), but shrinks the standard deviation from 77 to 10. Nearly all pixel values therefore collapse into a narrow band around mid-gray, and most of the contrast is lost.
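
A quick check with the statistics quoted above (mean 124.3, STD 77) shows why the result looks gray: even the extreme input values 0 and 255 land close to the mean after the rescaling. The variable names below are illustrative.

```python
mean, std, scale = 124.3, 77.0, 10.0

# Apply mean + scale * (value - mean) / std to the darkest and brightest inputs.
darkest = mean + scale * (0 - mean) / std
brightest = mean + scale * (255 - mean) / std

print(f"0   -> {darkest:.1f}")
print(f"255 -> {brightest:.1f}")
```

With these numbers the whole 0 to 255 range is compressed into roughly 108 to 141, a narrow band of mid-gray values.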

## Geometric Operation: Shifting the Image

In this section, we apply a geometric operation to the image to observe its resulting appearance.

* Perform a leftward shift of 1 pixel on the green channel.

* Subtract the shifted green channel from the original green channel, ensuring that pixel values remain within the valid range (what do negative numbers for pixels mean anyway?).
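
On the question of negative values: with `uint8` arrays NumPy does not produce negative numbers at all, because the subtraction wraps around modulo 256. Below is a minimal sketch with made-up pixel values. Casting to a signed type first and clipping afterwards is one possible way to keep the values valid, not the only one.

```python
import numpy as np

original = np.array([[10, 200, 50]], dtype=np.uint8)
shifted = np.array([[20, 100, 50]], dtype=np.uint8)

# Unsigned subtraction wraps around: 10 - 20 becomes 246, not -10.
wrapped = original - shifted

# Cast to a signed type so the true difference is representable.
signed = original.astype(np.int16) - shifted.astype(np.int16)

# Keep only the positive differences, then return to uint8.
clipped = np.clip(signed, 0, 255).astype(np.uint8)

print(wrapped.tolist())  # [[246, 100, 0]]
print(signed.tolist())   # [[-10, 100, 0]]
print(clipped.tolist())  # [[0, 100, 0]]
```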

In [None]:
arr = io.imread(img1_path)

green = arr[:, :, 1]

shift_pixel = 1

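shifted = np.zeros_like(green)
shifted[:, :-shift_pixel] = green[:, shift_pixel:]  # shift left; the last column stays 0

# Subtract in a signed type: uint8 arithmetic would wrap around instead of going negative.
sub_shift_green = green.astype(np.int16) - shifted.astype(np.int16)
# Clamp negative differences to 0 and convert back to uint8 for display.
sub_shift_green = np.clip(sub_shift_green, 0, 255).astype(np.uint8)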


fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(6, 3))

ax[0].imshow(green, cmap="gray", vmin=0, vmax=255)
ax[0].set_title("Original green", fontsize=fontsize)
ax[0].axis('off')

ax[1].imshow(sub_shift_green, cmap="gray", vmin=0, vmax=255)
ax[1].set_title("Original minus shifted green", fontsize=fontsize)
ax[1].axis('off')

fig.tight_layout()
plt.show()


## Noise

To assess the impact of noise on the original colored image, Gaussian noise is incrementally added to the green and blue channels separately. The sigma parameter is increased until the noise becomes visibly noticeable. This experiment aims to observe how the appearance of colored images changes when noise is introduced into these channels individually. The following questions are explored: which channel proves more resilient to noise, which one results in a better visual outcome, and why?
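
A minimal sketch of the experiment on a synthetic image follows. The helper name `add_channel_noise`, the use of `np.random.default_rng`, and the clipping step are assumptions for illustration.

```python
import numpy as np


def add_channel_noise(img, channel, sigma, rng):
    """Return a copy of an HxWx3 uint8 image with Gaussian noise on one channel."""
    noisy = img.astype(np.float64)  # astype returns a copy, so img is untouched
    noisy[:, :, channel] += rng.normal(0.0, sigma, size=img.shape[:2])
    # Clip before casting so values saturate at 0 and 255 instead of wrapping.
    return np.clip(noisy, 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
img = np.full((32, 32, 3), 128, dtype=np.uint8)  # flat mid-gray test image

for sigma in (15, 30, 45):  # increase sigma until the noise is clearly visible
    noisy_green = add_channel_noise(img, channel=1, sigma=sigma, rng=rng)
    noisy_blue = add_channel_noise(img, channel=2, sigma=sigma, rng=rng)
    print(sigma, noisy_green[:, :, 1].std().round(1), noisy_blue[:, :, 2].std().round(1))
```

Drawing fresh noise from the original image at each sigma avoids accumulating rounding and clipping errors from one step to the next.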

In [None]:
def gen_2d_gauss_noise(mean, std, height, width):
    return np.random.normal(mean, std, size=(height, width))

In [None]:
arr = io.imread(img1_path)
height, width, _ = arr.shape

# Work on copies so the original image stays unchanged.
green_output = arr.copy()
blue_output = arr.copy()

noise = gen_2d_gauss_noise(0, 15, height, width)

# Convert int to float
green_output = green_output.astype(np.float64)
blue_output = blue_output.astype(np.float64)
green_output[:, :, 1] += noise # Add noise to Green channel
blue_output[:, :, 2] += noise # Add noise to Blue channel
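# Clip to [0, 255] before casting back to uint8; a plain cast would wrap out-of-range values.
green_output = np.clip(green_output, 0, 255).astype(np.uint8)
blue_output = np.clip(blue_output, 0, 255).astype(np.uint8)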

fig, ax = plt.subplots(nrows=2, ncols=3, figsize=(9, 6))

ax[0, 0].imshow(green_output, vmin=0, vmax=255)
ax[0, 0].set_title("Noisex1 on Green", fontsize=fontsize)
ax[0, 0].axis('off')
ax[1, 0].imshow(blue_output, vmin=0, vmax=255)
ax[1, 0].set_title("Noisex1 on Blue", fontsize=fontsize)
ax[1, 0].axis('off')

green_output = green_output.astype(np.float64)
blue_output = blue_output.astype(np.float64)
green_output[:, :, 1] += noise
blue_output[:, :, 2] += noise
green_output = np.clip(green_output, 0, 255).astype(np.uint8)  # clip, then cast
blue_output = np.clip(blue_output, 0, 255).astype(np.uint8)
ax[0, 1].imshow(green_output, vmin=0, vmax=255)
ax[0, 1].set_title("Noisex2 on Green", fontsize=fontsize)
ax[0, 1].axis('off')
ax[1, 1].imshow(blue_output, vmin=0, vmax=255)
ax[1, 1].set_title("Noisex2 on Blue", fontsize=fontsize)
ax[1, 1].axis('off')

green_output = green_output.astype(np.float64)
blue_output = blue_output.astype(np.float64)
green_output[:, :, 1] += noise
blue_output[:, :, 2] += noise
green_output = np.clip(green_output, 0, 255).astype(np.uint8)  # clip, then cast
blue_output = np.clip(blue_output, 0, 255).astype(np.uint8)
ax[0, 2].imshow(green_output, vmin=0, vmax=255)
ax[0, 2].set_title("Noisex3 on Green", fontsize=fontsize)
ax[0, 2].axis('off')
ax[1, 2].imshow(blue_output, vmin=0, vmax=255)
ax[1, 2].set_title("Noisex3 on Blue", fontsize=fontsize)
ax[1, 2].axis('off')

fig.tight_layout()
plt.show()


### Discussion

The outcome is striking: the blue channel exhibits greater resilience to Gaussian noise.
Now, let's delve into the reasons behind this phenomenon.

The robustness of the blue channel to Gaussian noise can be attributed to several
factors:

- **Human Visual Perception:** The human eye is more sensitive to changes in luminance
  (brightness) than to changes in chrominance (color). Since the blue channel
  predominantly carries chrominance information, adding Gaussian noise to it affects
  color perception less compared to the green channel.

- **Noise Level:** The green channel often contains more detail and variation than the blue
  channel in typical images. Therefore, when noise is added, the green channel is more
  likely to reveal noticeable artifacts and degrade image quality.

- **Color Composition:** In many natural scenes, blue objects or regions are relatively less
  prevalent than green ones. This means that alterations in the blue channel's color are
  often less noticeable to viewers than similar changes in the green channel.

- **Color Balance:** Human vision is more forgiving of slight color imbalances in the blue
  channel compared to the green channel. This is because our brains are accustomed to
  tolerating minor variations in non-dominant colors.
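
The luminance argument can be checked numerically. The sketch below uses the luminance weights documented for `skimage.color.rgb2gray` (0.2125, 0.7154, 0.0721) and measures how much the grayscale image changes when the same noise is added to the green or to the blue channel of a synthetic image. It is a rough proxy for perceived brightness, not a model of human vision.

```python
import numpy as np

WEIGHTS = np.array([0.2125, 0.7154, 0.0721])  # R, G, B luminance weights

rng = np.random.default_rng(0)
img = np.full((64, 64, 3), 0.5)               # flat mid-gray image in [0, 1]
noise = rng.normal(0.0, 0.05, size=(64, 64))  # the same noise for both trials

noisy_green = img.copy()
noisy_green[:, :, 1] += noise
noisy_blue = img.copy()
noisy_blue[:, :, 2] += noise

# Standard deviation of the luminance change caused by each perturbation.
lum_change_green = ((noisy_green - img) @ WEIGHTS).std()
lum_change_blue = ((noisy_blue - img) @ WEIGHTS).std()

print(f"green: {lum_change_green:.4f}, blue: {lum_change_blue:.4f}")
print(f"ratio: {lum_change_green / lum_change_blue:.1f}")
```

Because the luminance is linear in the channels, the ratio equals 0.7154 / 0.0721, about 9.9. Under this measure the same noise disturbs brightness roughly ten times more in the green channel than in the blue channel.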


## Rescaling and Normalizing Images

The most important step is to rescale the image to the range 0 to 1 before processing. A floating-point result can be a problem to display, so rescale it back to an unsigned integer type for visualization.
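
A minimal sketch of that round trip, assuming an 8-bit input (the helper names `to_float01` and `to_uint8` are hypothetical):

```python
import numpy as np


def to_float01(img_uint8):
    """Rescale an 8-bit image to floats in [0, 1]."""
    return img_uint8.astype(np.float32) / 255.0


def to_uint8(img_float):
    """Map a float image in [0, 1] back to 8-bit for display or saving."""
    clipped = np.clip(img_float, 0.0, 1.0)  # guard against out-of-range results
    return np.round(clipped * 255.0).astype(np.uint8)


img = np.array([[0, 64, 128, 255]], dtype=np.uint8)
restored = to_uint8(to_float01(img))
print(restored.tolist())  # [[0, 64, 128, 255]]
```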

In [None]:

def rescale_and_clip_raster(arr: np.ndarray, out_range=(0, 254), percentiles=1):
    # arr: CxHxW raster (band-first).
    # NO_DATA_VALUES is not defined in this notebook; it is assumed to be a dict
    # mapping dtype names (e.g. "uint16") to that dtype's nodata value.
    nodata = NO_DATA_VALUES[str(arr.dtype)]
    assert arr.ndim == 3 and arr.shape[1] > 3 and arr.shape[2] > 3
    # As in the original check, the raster is expected to contain nodata pixels.
    assert np.any(arr == nodata)

    # Mask out nodata pixels so they do not skew the percentiles.
    valid_arr = np.ma.masked_equal(arr, nodata)

    # np.percentile does not honor masks, so pass only the unmasked values.
    minimum, maximum = np.percentile(
        valid_arr.compressed(), (percentiles, 100 - percentiles)
    )

    # TODO: minimum must be within out_range.
    if minimum <= out_range[0]:
        minimum = out_range[0]

    # Clip every band to [minimum, maximum] without modifying the input array.
    clipped = np.clip(arr, minimum, maximum)

    # Linear map: minimum -> out_range[0], maximum -> out_range[1].
    a = (out_range[1] - out_range[0]) / (maximum - minimum)
    b = out_range[1] - a * maximum

    return a * clipped + b



In [None]:
from skimage.color import hsv2rgb, rgb2hsv


def normalize_saturation(img, desired_mean=0.325, desired_std=0.15):
    # maybe just change the mean but not std for saturation
    # img: HxWxC (RGB)
    assert img.ndim == 3 and img.shape[2] == 3
    assert np.isfinite(img).all() == True
    assert np.all(img >= 0.0) == True
    assert np.all(img <= 1.0) == True

    # Hue: float 0-1
    # Saturation: float 0-1
    # Value: float 0-1
    hsv_arr = rgb2hsv(img)

    assert np.isfinite(hsv_arr).all() == True
    assert np.all(hsv_arr >= 0.0) == True
    assert np.all(hsv_arr <= 1.0) == True

    current_mean = np.mean(hsv_arr[:, :, 1])
    current_std =  np.std(hsv_arr[:, :, 1])

    # Calculate the z-scores of the data
    z_scores = (hsv_arr[:, :, 1] - current_mean) / current_std

    # Scale the z-scores to the desired mean and standard deviation
    hsv_arr[:, :, 1] = z_scores * desired_std + desired_mean

    hsv_arr[:, :, 1][hsv_arr[:, :, 1] > 1.0] = 1.0
    hsv_arr[:, :, 1][hsv_arr[:, :, 1] < 0.0] = 0.0
    assert np.isfinite(hsv_arr).all() == True
    assert np.all(hsv_arr >= 0.0) == True
    assert np.all(hsv_arr <= 1.0) == True

    # Check the new mean and standard deviation of the transformed array
    # new_mean = np.mean(hsv_arr[:, :, 1])
    # new_std = np.std(hsv_arr[:, :, 1])

    rgb_arr = hsv2rgb(hsv_arr)

    rgb_arr[rgb_arr>1.0] = 1.0
    rgb_arr[rgb_arr<0.0] = 0.0

    assert np.isfinite(rgb_arr).all() == True
    assert np.all(rgb_arr >= 0.0) == True
    assert np.all(rgb_arr <= 1.0) == True

    return rgb_arr
    
def normalize_luminance(img, desired_mean=0.5, desired_std=0.18):
    # img: HxWxC (RGB)
    assert img.ndim == 3 and img.shape[2] == 3
    assert np.isfinite(img).all() == True
    assert np.all(img >= 0.0) == True
    assert np.all(img <= 1.0) == True

    # Hue: float 0-1
    # Saturation: float 0-1
    # Value: float 0-1
    hsv_arr = rgb2hsv(img)

    assert np.isfinite(hsv_arr).all() == True
    assert np.all(hsv_arr >= 0.0) == True
    assert np.all(hsv_arr <= 1.0) == True

    current_mean = np.mean(hsv_arr[:, :, 2])
    current_std =  np.std(hsv_arr[:, :, 2])

    # Calculate the z-scores of the data
    z_scores = (hsv_arr[:, :, 2] - current_mean) / current_std

    # Scale the z-scores to the desired mean and standard deviation
    hsv_arr[:, :, 2] = z_scores * desired_std + desired_mean

    hsv_arr[:, :, 2][hsv_arr[:, :, 2] > 1.0] = 1.0
    hsv_arr[:, :, 2][hsv_arr[:, :, 2] < 0.0] = 0.0
    assert np.isfinite(hsv_arr).all() == True
    assert np.all(hsv_arr >= 0.0) == True
    assert np.all(hsv_arr <= 1.0) == True

    # Check the new mean and standard deviation of the transformed array
    # new_mean = np.mean(hsv_arr[:, :, 2])
    # new_std = np.std(hsv_arr[:, :, 2])

    rgb_arr = hsv2rgb(hsv_arr)

    rgb_arr[rgb_arr>1.0] = 1.0
    rgb_arr[rgb_arr<0.0] = 0.0

    assert np.isfinite(rgb_arr).all() == True
    assert np.all(rgb_arr >= 0.0) == True
    assert np.all(rgb_arr <= 1.0) == True

    return rgb_arr

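# StereoTransform is not defined in this notebook; it is assumed to be a base
# transform class provided by the surrounding project.
class HsvNormalize(StereoTransform):
    """Scale pixel values to [0, 1] by dividing by max_pixel_value, then normalize
    the saturation (S) and value (V) channels in HSV space.

    Args:
        mean (float): desired mean of the value (V) channel
        std  (float): desired standard deviation of the value (V) channel
        max_pixel_value (float): maximum possible pixel value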

    Targets:
        left, right

    Image types:
        uint8, float32
    """

    def __init__(self, mean=0.5, std=0.18, max_pixel_value=2**8-1, always_apply=False, p=1.0):
        super(HsvNormalize, self).__init__(always_apply, p)
        self.mean = mean
        self.std = std
        self.max_pixel_value = max_pixel_value

    def apply(self, image, **params):
        # image: HxWxC. Bit-depth: 8, 11 or 12. Datatype: uint8 or uint16
        assert image.ndim == 3 and image.shape[2] == 3
        assert np.isfinite(image).all() == True
        assert np.all(image >= 0.0) == True

        float_img = image / self.max_pixel_value
        float_img = float_img.astype(np.float32)
        float_img[float_img>1.0] = 1.0
        float_img[float_img<0.0] = 0.0
        return normalize_luminance(normalize_saturation(float_img), desired_mean=self.mean, desired_std=self.std)

    def get_transform_init_args_names(self):
        return ("mean", "std", "max_pixel_value")

# StereoTransform is assumed to be defined elsewhere in the project.
class WhuNormalize(StereoTransform):
    """Scale pixel values to [0, 1] by dividing by max_pixel_value, then normalize
    the value (V) channel in HSV space.

    Args:
        mean (float): desired mean of the value (V) channel
        std  (float): desired standard deviation of the value (V) channel
        max_pixel_value (float): maximum possible pixel value

    Targets:
        left, right

    Image types:
        uint8, float32
    """

    def __init__(self, mean=0.5, std=0.18, max_pixel_value=2**8-1, always_apply=False, p=1.0):
        super(WhuNormalize, self).__init__(always_apply, p)
        self.mean = mean
        self.std = std
        self.max_pixel_value = max_pixel_value

    def apply(self, image, **params):
        # image: HxWxC. Bit-depth: 8, 11 or 12. Datatype: uint8 or uint16
        assert image.ndim == 3 and image.shape[2] == 3
        assert np.isfinite(image).all() == True
        assert np.all(image >= 0.0) == True

        float_img = image / self.max_pixel_value
        float_img = float_img.astype(np.float32)
        float_img[float_img>1.0] = 1.0
        float_img[float_img<0.0] = 0.0
        return normalize_luminance(float_img, desired_mean=self.mean, desired_std=self.std)

    def get_transform_init_args_names(self):
        return ("mean", "std", "max_pixel_value")