# Colorization

Authors:

- Rohan Rele (rsr132)
- Aakash Raman (abr103)
- Alex Eng (ame136)
- Adarsh Patel (aap237)

This project was completed for Professor Wes Cowan's Fall 2019 offering of the CS 520: Intro to Artificial Intelligence course, taught at Rutgers University, New Brunswick.

# Problem Statement

In this project, we tackle the problem of colorizing black-and-white photos. That is, given an input image in which every pixel has a single numerical value representing lightness, ranging from black to white, we seek to output an image in which every pixel has three numerical values corresponding to the Red, Green, and Blue (RGB) color channels.

The challenge is that grayscale images contain less information than RGB images, so mapping from the former to the latter will certainly involve some perceptual and numerical loss in conversion. To measure this loss, we start with color images, convert them to grayscale, attempt to colorize them, and then compare the results with the original ground-truth color images. In this way, we pose a supervised machine learning problem: we attempt to predict the "true" coloring of an image in a setting where that "true" coloring is known.

To that end, we build and train a neural network to colorize a black-and-white photo.

# Process Representation

## Color spaces

Consider a **color image** with $n \times m$ pixel dimensions. We consider its numerical representation as an $n \times m \times 3$ tensor, which can be thought of as an $n \times m$ matrix in which each entry corresponds to one pixel and is itself a vector of three values: one for each of the R, G, and B channels that "color" that pixel.

For example:

$$I_{rgb} = \begin{bmatrix}
    \begin{bmatrix} r_{0,0} & g_{0,0} & b_{0,0} \end{bmatrix} &
    \begin{bmatrix} r_{0,1} & g_{0,1} & b_{0,1} \end{bmatrix} &
    \dots &
    \begin{bmatrix} r_{0,m-1} & g_{0,m-1} & b_{0,m-1} \end{bmatrix} \\
    \vdots & \vdots & \ddots & \vdots \\
    \begin{bmatrix} r_{n-1,0} & g_{n-1,0} & b_{n-1,0} \end{bmatrix} &
    \begin{bmatrix} r_{n-1,1} & g_{n-1,1} & b_{n-1,1} \end{bmatrix} &
    \dots &
    \begin{bmatrix} r_{n-1,m-1} & g_{n-1,m-1} & b_{n-1,m-1} \end{bmatrix}
    \end{bmatrix}$$
    

where $r_{i,j}, g_{i,j}, b_{i,j} \in [0,255]$ each represent the $(i,j)$-th pixel's color along the red, green, and blue channels. The reader is likely familiar with the following two colors in RGB:

$$(r=0, g=0, b=0) \rightarrow \text{black}$$
$$(r=255, g=255, b=255) \rightarrow \text{white}$$
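As a minimal sketch of this representation, the snippet below builds a tiny color image as an $n \times m \times 3$ array with `numpy`. The image size, the variable name `img_rgb`, and the choice of `uint8` storage are assumptions made for illustration.

```python
import numpy as np

# A tiny 2 x 3 color image: shape (n, m, 3), one [r, g, b] triple per pixel.
n, m = 2, 3
img_rgb = np.zeros((n, m, 3), dtype=np.uint8)  # all pixels start as black (0, 0, 0)

img_rgb[0, 1] = [255, 255, 255]  # set pixel (0, 1) to white
img_rgb[1, 2] = [255, 0, 0]      # set pixel (1, 2) to pure red

print(img_rgb.shape)   # (2, 3, 3)
print(img_rgb[0, 0])   # [0 0 0]
print(img_rgb[0, 1])   # [255 255 255]
```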


By contrast, a **grayscale image** of the same pixel dimensions can intuitively be thought of as an $n \times m \times 1$ tensor, since each pixel contains only one value, its lightness.

For example:

$$I_{gray} = \begin{bmatrix}
        p_{0,0} & p_{0,1} & \dots & p_{0,m-1} \\
        \vdots & \vdots & \ddots & \vdots \\
        p_{n-1,0} & p_{n-1,1} & \dots & p_{n-1,m-1}
        \end{bmatrix}$$


where $p_{i,j} \in [0,255]$ represents the $(i,j)$-th pixel's lightness. For example, we have:

$$(p=0) \rightarrow \text{black}$$
$$(p=255) \rightarrow \text{white}$$
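The grayscale case can be sketched the same way. The example below is only illustrative: it builds a small black-to-white ramp as an $n \times m \times 1$ array and shows that repeating the single channel three times yields the RGB encoding of the same gray image (the names `img_gray` and `gray_as_rgb` are hypothetical).

```python
import numpy as np

n, m = 2, 4
# Lightness ramps from black (0) to white (255) along each row.
row = np.linspace(0, 255, m).astype(np.uint8)
img_gray = np.tile(row, (n, 1))[:, :, np.newaxis]  # shape (n, m, 1)

# A gray pixel in RGB has r == g == b, so repeating the channel gives an RGB tensor.
gray_as_rgb = np.repeat(img_gray, 3, axis=2)        # shape (n, m, 3)

print(img_gray.shape, gray_as_rgb.shape)  # (2, 4, 1) (2, 4, 3)
print(img_gray[0, :, 0])                  # [  0  85 170 255]
```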


Finally, we introduce the CIE **$Lab$ color space** to which an image can be mapped, which, like the RGB space, encodes three numeric values per pixel:

1. **L:** lightness metric, which is similar to the lightness in grayscale except over the range $[0,100]$
2. **a:** green-red metric, which can be constrained to the range $[-128, 127]$, over which negative numbers correspond to green, and positive numbers correspond to red
3. **b:** blue-yellow metric, which can be constrained to the range $[-128, 127]$, over which negative numbers correspond to blue, and positive numbers correspond to yellow
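To make the three channels concrete, here is a rough sketch of the standard sRGB to CIE $Lab$ conversion written directly in `numpy`. It assumes sRGB primaries and the D65 reference white; the helper name `rgb_to_lab` is hypothetical, and in practice a library routine such as `skimage.color.rgb2lab` would normally be used instead.

```python
import numpy as np

# Linear-sRGB -> XYZ matrix and D65 reference white (assumed standard values).
_M = np.array([[0.4124564, 0.3575761, 0.1804375],
               [0.2126729, 0.7151522, 0.0721750],
               [0.0193339, 0.1191920, 0.9503041]])
_WHITE = np.array([0.95047, 1.0, 1.08883])

def rgb_to_lab(rgb_uint8):
    """Convert an (..., 3) array of 0-255 sRGB values to CIE Lab (illustrative helper)."""
    c = np.asarray(rgb_uint8, dtype=np.float64) / 255.0
    # Undo the sRGB gamma curve to get linear light.
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ, normalized by the reference white.
    xyz = lin @ _M.T / _WHITE
    d = 6.0 / 29.0
    f = np.where(xyz > d ** 3, np.cbrt(xyz), xyz / (3 * d ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0          # lightness
    a = 500.0 * (f[..., 0] - f[..., 1])   # green (-) to red (+)
    b = 200.0 * (f[..., 1] - f[..., 2])   # blue (-) to yellow (+)
    return np.stack([L, a, b], axis=-1)

for name, rgb in [("black", [0, 0, 0]), ("white", [255, 255, 255]),
                  ("red", [255, 0, 0]), ("green", [0, 255, 0]),
                  ("blue", [0, 0, 255]), ("yellow", [255, 255, 0])]:
    print(name, np.round(rgb_to_lab(rgb), 1))
```

Running it should show $L$ near 0 for black and near 100 for white, a positive $a$ for red, a negative $a$ for green, a negative $b$ for blue, and a positive $b$ for yellow, matching the sign conventions listed above.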

In this way, a $Lab$ image can represent the spectrum of human-visible color just as an RGB image can, and it can be encoded in a similar way. For example:

$$I_{Lab} = \begin{bmatrix}
    \begin{bmatrix} L_{0,0} & a_{0,0} & b_{0,0} \end{bmatrix} &
    \begin{bmatrix} L_{0,1} & a_{0,1} & b_{0,1} \end{bmatrix} &
    \dots &
    \begin{bmatrix} L_{0,m-1} & a_{0,m-1} & b_{0,m-1} \end{bmatrix} \\
    \vdots & \vdots & \ddots & \vdots \\
    \begin{bmatrix} L_{n-1,0} & a_{n-1,0} & b_{n-1,0} \end{bmatrix} &
    \begin{bmatrix} L_{n-1,1} & a_{n-1,1} & b_{n-1,1} \end{bmatrix} &
    \dots &
    \begin{bmatrix} L_{n-1,m-1} & a_{n-1,m-1} & b_{n-1,m-1} \end{bmatrix}
    \end{bmatrix}$$
    

One advantage of considering color images in $Lab$ as opposed to RGB is that **the space of colors that must be predicted in $Lab$ is significantly smaller.**

This is because, although there are three channels, only the $ab$ channels contain each pixel's actual color information. The $L$ channel merely determines how bright to make that particular color, and it is already known from the grayscale input. Then, to cover the entire color space in $Lab$, we only need to consider the $ab$ plane: with each of $a$ and $b$ taking one of the 256 integer values in $[-128, 127]$, this gives $256^2 = 65536$ possible values per pixel. This is a vast improvement over RGB, where three channels with 256 values each give $256^3 = 16777216$ possible values per pixel.

For this problem, because we are considering a machine learning approach that predicts a color for every pixel, the dramatically smaller $Lab$ color space is computationally preferable to the RGB space.

## Color mappings

With these color spaces in hand, we can define our desired color mappings more rigorously than just saying "go from black and white to color."

We begin with an image with its true coloring, and map it to its RGB tensor form. Then, we convert the image's RGB tensor form to its $Lab$ tensor form. From here, it is simple to convert the $Lab$ tensor to a grayscale tensor by dropping the $ab$ channels and retaining the $L$ channel. The resulting tensor essentially encodes a black-and-white image, and it is the input to our neural network.

Note: The difference between the grayscale lightness range $[0,255]$ and the $Lab$ $L$ channel range $[0,100]$ is immaterial for our purposes, as both describe the same black-to-white scale and differ only in how it is scaled. Using the latter scale is convenient, as it keeps the input values in a smaller range.

Then, our neural network will attempt to predict the color values of the image based only on its grayscale information. In the language of our color spaces, the neural network will predict a pair of $ab$ channel values per pixel from the $L$ channel values it receives as input. This process will be described at length later.

Assuming the network has output a tensor with predicted $ab$ channel values, we can then construct a color image by recombining those values with the image's original $L$ channel to produce a final image tensor in $Lab$. Finally, we may convert the $Lab$ tensor to an RGB tensor in order to output and save the resulting image.
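The channel bookkeeping described above can be sketched with plain `numpy` slicing. This is only an illustration under stated assumptions: the $Lab$ tensor is synthetic random data, and `predict_ab` is a hypothetical placeholder that stands in for the neural network by returning zeros.

```python
import numpy as np

def predict_ab(L):
    """Placeholder for the network (assumption): predicts a = b = 0 for every pixel."""
    return np.zeros(L.shape[:2] + (2,), dtype=np.float64)

rng = np.random.default_rng(0)
n, m = 4, 5
# Synthetic Lab tensor: L in [0, 100], a and b in [-128, 127].
img_lab = np.concatenate([rng.uniform(0, 100, (n, m, 1)),
                          rng.uniform(-128, 127, (n, m, 2))], axis=2)

L = img_lab[:, :, :1]                            # drop ab, keep L: shape (n, m, 1)
ab_pred = predict_ab(L)                          # predicted color: shape (n, m, 2)
lab_pred = np.concatenate([L, ab_pred], axis=2)  # recombined Lab: shape (n, m, 3)

print(L.shape, ab_pred.shape, lab_pred.shape)    # (4, 5, 1) (4, 5, 2) (4, 5, 3)
```

Slicing with `:1` rather than `0` keeps the trailing channel axis, so the input retains the $n \times m \times 1$ shape used for grayscale tensors earlier.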

In summary, our image conversion process can be represented as a sequence of functions mapping between the aforementioned color spaces as follows:

$$\text{Image} \rightarrow I_{rgb} \rightarrow I_{Lab} \rightarrow I_{L} \xrightarrow{NN} I^{*}_{ab} \rightarrow I^{*}_{Lab} \rightarrow I^{*}_{rgb} \rightarrow \text{Image}^{*}$$

where the mappings to the left of the neural network are simple image conversions accomplished in Python using the packages `numpy`, `PIL`, and `skimage`; the middle mapping is a lengthy composition of neural network operations; and the mappings to the right are likewise simple image conversions that recover the predicted colorization.
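The whole chain can be sketched end to end with `skimage.color.rgb2lab` and `skimage.color.lab2rgb`. This is a minimal sketch, not the project's implementation: the function `colorize`, the random test image, and both stand-in predictors are assumptions for illustration. One stand-in predicts no color at all, and the other is an "oracle" that returns the true $ab$ channels, which is useful for checking that the conversions round-trip.

```python
import numpy as np
from skimage.color import rgb2lab, lab2rgb

def colorize(img_rgb, predict_ab):
    """Run I_rgb -> I_Lab -> I_L -> (predictor) -> I*_Lab -> I*_rgb."""
    lab = rgb2lab(img_rgb)                            # I_rgb -> I_Lab
    L = lab[:, :, :1]                                 # I_Lab -> I_L
    ab_pred = predict_ab(L)                           # stand-in for the network
    lab_pred = np.concatenate([L, ab_pred], axis=2)   # I*_Lab
    rgb_pred = lab2rgb(lab_pred)                      # floats in [0, 1]
    return np.round(rgb_pred * 255).astype(np.uint8)  # I*_rgb as 8-bit values

# Synthetic stand-in for a loaded image (assumption: 8 x 8 random colors).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
true_ab = rgb2lab(img)[:, :, 1:]

# Predicting a = b = 0 everywhere yields a gray image (r, g, b nearly equal).
gray_out = colorize(img, lambda L: np.zeros(L.shape[:2] + (2,)))
# An oracle that returns the true ab channels should recover the original image.
oracle_out = colorize(img, lambda L: true_ab)

print(np.abs(oracle_out.astype(int) - img.astype(int)).max())
```

With a real image, the array `img` would instead come from something like `np.asarray(PIL.Image.open(path).convert("RGB"))`, and the result could be saved with `PIL.Image.fromarray`.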

# Training Data

## Pre-processing

# Model: Convolutional Neural Network

## Intuition

## CNN architecture

## CNN implementation

# Model Evaluation

## Numerical error: loss function

## Perceptual error

# Model Training

## Stochastic gradient descent

## Back propagation

## Performance complexity

## Overfitting

# CNN Model Trials

# CNN Model Assessment

## Results analysis

## Colorization errors

## Brand new input data

## Future directions

# Bonus: Reconstructing damaged images