# Colorization

Authors:

- Rohan Rele (rsr132)
- Aakash Raman (abr103)
- Alex Eng (ame136)
- Adarsh Patel (aap237)

This project was completed for Professor Wes Cowan's Fall 2019 offering of the CS 520: Intro to Artificial Intelligence course, taught at Rutgers University, New Brunswick.

# Problem Statement

In this project, we tackle the problem of colorizing black-and-white photos. That is, given an input image in which every pixel has a single numerical value representing lightness (ranging from black to white), we seek to output an image in which every pixel has three numerical values corresponding to the Red, Green, and Blue (RGB) color channels.

The challenge is that grayscale images contain less information than RGB images, so the mapping from the former to the latter is underdetermined, and any colorization will involve some perceptual and numerical error. To measure this error, we start with color images, convert them to grayscale, attempt to colorize them, and then compare the result with the original ground-truth color images. In this way, we pose a supervised machine learning problem in which we attempt to predict the "true" coloring of an image when we know what that "true" coloring ought to be.

To that end, we build and train a neural network to colorize a black-and-white photo.

# Process Representation

## Color spaces

Consider a **color image** with $n \times m$ pixel dimensions. We consider its numerical representation as an $n \times m \times 3$ tensor, which can be thought of as an $n \times m$ matrix in which each entry corresponds to one pixel and is itself a vector of length 3: one value for each of the R, G, and B channels to "color" that pixel.

For example:

$$I_{rgb} = \begin{bmatrix}
    \begin{bmatrix} r_{0,0} & g_{0,0} & b_{0,0} \end{bmatrix} &
    \begin{bmatrix} r_{0,1} & g_{0,1} & b_{0,1} \end{bmatrix} &
    \dots &
    \begin{bmatrix} r_{0,m-1} & g_{0,m-1} & b_{0,m-1} \end{bmatrix} \\
    \vdots & \vdots & \ddots & \vdots \\
    \begin{bmatrix} r_{n-1,0} & g_{n-1,0} & b_{n-1,0} \end{bmatrix} &
    \begin{bmatrix} r_{n-1,1} & g_{n-1,1} & b_{n-1,1} \end{bmatrix} &
    \dots &
    \begin{bmatrix} r_{n-1,m-1} & g_{n-1,m-1} & b_{n-1,m-1} \end{bmatrix}
    \end{bmatrix}$$
    

where $r_{i,j}, g_{i,j}, b_{i,j} \in [0,255]$ each represent the $(i,j)$-th pixel's color along the red, green, and blue channels. The reader is likely familiar with the following two colors in RGB:

$$(r=0, g=0, b=0) \rightarrow \text{black}$$
$$(r=255, g=255, b=255) \rightarrow \text{white}$$


By contrast, a **grayscale image** of the same pixel dimensions can be thought of as an $n \times m \times 1$ tensor, since each pixel contains only one value, its lightness.

For example:

$$I_{gray} = \begin{bmatrix}
        p_{0,0} & p_{0,1} & \dots & p_{0,m-1} \\
        \vdots & \vdots & \ddots & \vdots \\
        p_{n-1,0} & p_{n-1,1} & \dots & p_{n-1,m-1}
        \end{bmatrix}$$


where $p_{i,j} \in [0,255]$ represents the $(i,j)$-th pixel's lightness. For example, we have:

$$(p=0) \rightarrow \text{black}$$
$$(p=255) \rightarrow \text{white}$$
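
As a minimal sketch of the two representations, the snippet below builds a tiny color tensor and a tiny grayscale tensor with `numpy`. The variable names and the $2 \times 3$ image size are illustrative assumptions, not part of the project code.

```python
import numpy as np

n, m = 2, 3  # tiny illustrative image: 2 rows, 3 columns

# Color image: one (r, g, b) triple per pixel, each value in [0, 255].
I_rgb = np.zeros((n, m, 3), dtype=np.uint8)   # every pixel starts black
I_rgb[0, 0] = (255, 255, 255)                 # set the top-left pixel to white

# Grayscale image: one lightness value per pixel, also in [0, 255].
I_gray = np.zeros((n, m, 1), dtype=np.uint8)  # every pixel starts black
I_gray[0, 0] = 255                            # set the top-left pixel to white

print(I_rgb.shape)   # (2, 3, 3)
print(I_gray.shape)  # (2, 3, 1)
```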

## Color mappings

With these color spaces defined, we can state our desired color mapping more rigorously than just saying "go from black and white to color."

We begin with an image in its true coloring and map it to its RGB tensor form, which we then convert to grayscale. Our neural network then attempts to predict the color values of the image from its grayscale information alone. In the language of our color spaces, the network predicts a 3-tuple of R, G, and B channel values for each pixel based on that pixel's input grayscale value. This process will be described at length later. Finally, the resulting tensor is parsed and saved as an image.

In summary, our image conversion process can be represented as a sequence of functions mapping between the aforementioned color spaces as follows:

$$\text{Image} \rightarrow I_{rgb} \rightarrow I_{gray} \xrightarrow{NN} I^{*}_{rgb} \rightarrow \text{Image}^{*}$$

where the steps to the left of the neural network are simple image conversions, the middle step ($NN$) is a lengthy composition of neural network operations, and the step to its right is again a simple image conversion that recovers the predicted colorization as an image.
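
The composition $I_{rgb} \rightarrow I_{gray} \rightarrow I^{*}_{rgb}$ can be sketched as ordinary function composition. This is only an illustration under stated assumptions: the grayscale weights below are the common ITU-R BT.601 luma weights (the source does not say which conversion is used), and `placeholder_network` is a hypothetical stand-in that merely copies lightness into all three channels, not the trained network.

```python
import numpy as np

# Assumed luma weights (ITU-R BT.601); the source does not specify them.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def rgb_to_gray(I_rgb):
    """I_rgb -> I_gray: (n, m, 3) uint8 array to (n, m) uint8 array."""
    gray = I_rgb.astype(np.float64) @ LUMA_WEIGHTS
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

def placeholder_network(I_gray):
    """Hypothetical stand-in for the NN step: copy lightness into R, G, B."""
    return np.repeat(I_gray[..., np.newaxis], 3, axis=-1)

def colorize(I_rgb, network=placeholder_network):
    """Compose the mapping I_rgb -> I_gray -> I*_rgb."""
    return network(rgb_to_gray(I_rgb))

# A 2 x 2 test image: red, white, black, blue.
I_rgb = np.array([[[255, 0, 0], [255, 255, 255]],
                  [[0, 0, 0],   [0, 0, 255]]], dtype=np.uint8)
I_star = colorize(I_rgb)
print(I_star.shape)  # (2, 2, 3)
```

Passing the network in as an argument keeps the simple conversions separate from the learned step, so a trained model could later replace the placeholder without touching the rest of the pipeline.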

# Training Data

The input data is a volume of multiple color images. 

<!-- TODO: add information about what the images are (nature, animals, etc.) and how many images -->

To extend the above mappings to a volume of multiple images, we consider a 4D tensor whose entries are indexed by $(i, j, k, l)$, where $i$ indexes the image in the list of input images, $j$ indexes the channel of pixel information (so $j$ takes 3 values for RGB images), $k$ indexes position along the length of the image, and $l$ indexes position along its width. We may then pass this entire 4D tensor through our network to encode the information of all images in the input.
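
A minimal sketch of assembling such a tensor with `numpy` follows. It assumes every image has the same pixel dimensions and starts in the $n \times m \times 3$ layout from the color-space section; the helper name `stack_images` and the random test images are hypothetical.

```python
import numpy as np

def stack_images(images):
    """Stack equally sized (n, m, 3) arrays into a (count, 3, n, m) tensor."""
    batch = np.stack(images, axis=0)          # shape (count, n, m, 3)
    return np.transpose(batch, (0, 3, 1, 2))  # shape (count, 3, n, m)

# Two random 4 x 5 "images" standing in for real input data.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(4, 5, 3)).astype(np.uint8) for _ in range(2)]

T_imgs = stack_images(images)
print(T_imgs.shape)  # (2, 3, 4, 5)
```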

## Pre-processing

Given a 4D input tensor $T_{imgs}$, we carry out the conversions that precede the neural network in the color mapping sequence described above to pre-process each image in the input volume. The necessary vectorizations and color space conversions are accomplished in Python, using the `PIL` package for image file parsing and `numpy` for image tensor operations.
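
One possible shape for this step, for a single image, is sketched below. It is not the project's actual pre-processing code: the helper name `preprocess` is hypothetical, and the image is created in memory rather than read from disk so that the sketch runs on its own.

```python
import numpy as np
from PIL import Image

def preprocess(image):
    """Return (I_rgb, I_gray) numpy arrays for one PIL image."""
    I_rgb = np.asarray(image.convert("RGB"), dtype=np.uint8)  # shape (n, m, 3)
    I_gray = np.asarray(image.convert("L"), dtype=np.uint8)   # shape (n, m)
    return I_rgb, I_gray

# In practice the image would be opened from a file with Image.open;
# here a small solid red image is built in memory instead.
image = Image.new("RGB", (4, 3), color=(255, 0, 0))  # width 4, height 3

I_rgb, I_gray = preprocess(image)
print(I_rgb.shape, I_gray.shape)  # (3, 4, 3) (3, 4)
```

Note that `PIL` reports image size as (width, height), while the resulting `numpy` array is indexed as (row, column), so the two axes appear swapped.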

<!-- TODO: add some code for pre-processing -->

# Model: Neural Network

## Intuition

## NN architecture

Our NN architecture can be described by the diagram below:

## NN implementation

# Model Evaluation

## Numerical error: loss function

We use the following loss function to measure the error of a coloring, summing the squared per-channel error over every pixel:

$$\text{Loss}_t = \sum_{P_{i,j}} \left[ (r^{I'}_{i,j} - r^{I_t}_{i,j})^2 + (g^{I'}_{i,j} - g^{I_t}_{i,j})^2 + (b^{I'}_{i,j} - b^{I_t}_{i,j})^2 \right]$$

where the sum runs over every pixel $P_{i,j}$; $r^{I'}_{i,j}, g^{I'}_{i,j}$, and $b^{I'}_{i,j}$ are that pixel's true coloring, and $r^{I_t}_{i,j}, g^{I_t}_{i,j}$, and $b^{I_t}_{i,j}$ are that pixel's coloring in the current state $t$.
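
This loss can be computed directly on two image tensors, as in the sketch below. The function name `squared_error_loss` and the two small test arrays are illustrative assumptions.

```python
import numpy as np

def squared_error_loss(I_true, I_pred):
    """Sum of squared per-channel differences over every pixel."""
    # Cast to float first: subtracting uint8 arrays would wrap around.
    diff = I_true.astype(np.float64) - I_pred.astype(np.float64)
    return float(np.sum(diff ** 2))

# A 1 x 2 image: the prediction is off by 5 in one channel and 10 in another.
I_true = np.array([[[255, 0, 0], [0, 255, 0]]], dtype=np.uint8)
I_pred = np.array([[[250, 0, 0], [0, 255, 10]]], dtype=np.uint8)

print(squared_error_loss(I_true, I_pred))  # 125.0
```

The two nonzero differences contribute $5^2 + 10^2 = 125$; every other channel matches and contributes nothing.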

## Perceptual error

# Model Training

## Back propagation

## Performance complexity

## Overfitting

# NN Model Testing

# NN Model Assessment

## Results analysis

## Colorization errors

## Brand new input data

## Future directions