# Deep Learning major homework (nagyHF), Milestone II
## Team - balit_learning : Gurubi Barnabás - DXEXVR, Mátyás Gergely - IL21NI, Horváth Ákos - DKILK6

## Before reading this project report
The code snippets below work and have been tested, but you cannot run them directly from this _ipynb_, because the project has several dependencies, including a large dataset and a predetermined directory layout.

If you follow these steps, however, you can run it:
* Download this _ipynb_ and put it in a directory (referred to as the root from now on)
* Download the following dataset: https://drive.google.com/open?id=1MYsSbdCOKQ8yJYQWhTPE9DGe_rcCaT4P and extract it into a directory named "floydhub_dataset"
* Create a directory named "Test" and put the test pictures in it, then create a "result" directory as well

After these steps you can train the model.

## Problem: Colorizing black and white pictures (with user interactions)

As we mentioned in our previous milestone, we would like to create a neural network that can colorize black and white pictures. In addition, we would like to let the user change the resulting (colorized) image by specifying the color of certain points in the original (black and white) image.

At the moment we have a dataset of ~70k images. (We wrote a script for this in the previous milestone; this time we only upload the pictures in a .zip.)

### The change in the data
We have used another image dataset, since we found a relatively good network in an article that uses 256x256 pictures for colorization. This dataset contains approximately 9k images, which we considered suitable for our first model.
_Note:_ We also renamed the pictures to numbers (1, 2, 3...) to make them easier to load and use; we describe this decision in detail later.

## Required imports for this project

In [None]:
import keras
from keras.applications.inception_resnet_v2 import InceptionResNetV2
from keras.preprocessing import image
from keras.engine import Layer
from keras.applications.inception_resnet_v2 import preprocess_input
from keras.layers import Conv2D, UpSampling2D, InputLayer, Conv2DTranspose, Input, Reshape, merge, concatenate
from keras.layers import Activation, Dense, Dropout, Flatten
from keras.layers.normalization import BatchNormalization
from keras.callbacks import TensorBoard 
from keras.models import Sequential, Model
from keras.layers.core import RepeatVector, Permute
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from skimage.color import rgb2lab, lab2rgb, rgb2gray, gray2rgb
from skimage.transform import resize
from skimage.io import imsave
import numpy as np
import os
import random
import tensorflow as tf
from keras.callbacks import EarlyStopping
from keras.callbacks import ModelCheckpoint
from keras.models import load_model
import copy

## The network

We found the basic network in an article. We decided to start from it, as the authors got relatively nice results with it.
It consists of three main parts:
* The __encoder__
* The __fusion layer__
* The __decoder__

### Encoder
The encoder is made up of 2D convolutional layers. It receives the 256x256 grayscale image and does the feature extraction: it starts by extracting basic shapes (e.g. lines, curves) and ends up with somewhat more complex patterns. At the end of the encoder we have 256 feature maps of size 32x32, each containing some pattern.

### Fusion layer
The article at this point uses a trick to improve the generalization ability of the network. The idea is to also push the grayscale picture through the Inception-ResNet-v2 network, which is one of the most powerful classifiers nowadays (it classifies pictures into 1000 classes), and to use its 1000-element prediction as an additional input.

With this method the network can learn the typical colors of some basic patterns (e.g. leaves and trees are green, clouds are white).

The article calls this the fusion layer, because it combines this prediction, which the model receives as a second input, with the 32x32 feature maps of the encoder.
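
The fusion step can be illustrated without Keras. The sketch below uses plain Python lists and toy sizes; the function name `fuse` and the toy dimensions are assumptions made only for illustration. In the model itself the grid is 32x32, the encoder output has 256 channels and the embedding has 1000 entries.

```python
def fuse(feature_map, embedding):
    """Append the same embedding vector to the channels of every grid cell.

    feature_map: nested list of shape (height, width, channels)
    embedding:   flat list of length d
    returns:     nested list of shape (height, width, channels + d)
    """
    return [[list(cell) + list(embedding) for cell in row] for row in feature_map]


# Toy example: a 2x2 grid with 3 channels, fused with a 4-entry embedding.
features = [[[0.1, 0.2, 0.3] for _ in range(2)] for _ in range(2)]
embedding = [1.0, 0.0, 0.0, 0.0]
fused = fuse(features, embedding)

print(len(fused), len(fused[0]), len(fused[0][0]))  # 2 2 7
```

With the real sizes, each of the 32x32 cells ends up with 256 + 1000 = 1256 channels, which a 1x1 convolution then reduces back to 256.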

### Decoder
The last part is supposed to build images of the original size (256x256) from the 32x32 feature maps. At the end of the decoder we have 256x256 images with two channels (a and b from the Lab color space); these are the final outputs of the network.
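
The spatial sizes mentioned above can be checked with a small calculation. This is only a sketch: the helper names `conv_same` and `upsample` are hypothetical, and the list of strides is read off the model definition in this notebook (eight encoder convolutions with `padding='same'`, three of them with stride 2, and three 2x upsampling steps in the decoder).

```python
import math


def conv_same(size, stride=1):
    """Spatial size after a convolution with padding='same'."""
    return math.ceil(size / stride)


def upsample(size, factor=2):
    """Spatial size after an upsampling step."""
    return size * factor


# Strides of the eight encoder convolutions, in order
encoder_strides = [2, 1, 2, 1, 2, 1, 1, 1]

size = 256
for stride in encoder_strides:
    size = conv_same(size, stride)
encoder_size = size

# The decoder contains three 2x upsampling steps
for _ in range(3):
    size = upsample(size)
decoder_size = size

print(encoder_size, decoder_size)  # 32 256
```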

### Model with functional API
We have built our model with the Keras functional API because of the complexity of the model. This API makes it easier to create models with, for example, multiple inputs and outputs or shared layers. In this project we have to merge the prediction of the Inception-ResNet-v2 model with the output of the convolutional encoder, so we have to deal with multiple inputs. We use this API to create the model with its three parts: the encoder, the fusion and the decoder. The API may also be useful in the future, if we would like to use our model on videos instead of pictures.

![alt text](structure.png "Structure")


In [None]:
# Fusion input
embed_input = Input(shape=(1000,), name='embed_input')

# Encoder
encoder_input = Input(shape=(256, 256, 1,), name='encoder_input')
encoder_output = Conv2D(64, (3,3), activation='relu', padding='same', strides=2)(encoder_input)
encoder_output = Conv2D(128, (3,3), activation='relu', padding='same')(encoder_output)
encoder_output = Conv2D(128, (3,3), activation='relu', padding='same', strides=2)(encoder_output)
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same')(encoder_output)
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same', strides=2)(encoder_output)
encoder_output = Conv2D(512, (3,3), activation='relu', padding='same')(encoder_output)
encoder_output = Conv2D(512, (3,3), activation='relu', padding='same')(encoder_output)
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same')(encoder_output)

# Fusion
fusion_output = RepeatVector(32 * 32)(embed_input) 
fusion_output = Reshape(([32, 32, 1000]))(fusion_output)
fusion_output = concatenate([encoder_output, fusion_output], axis=3) 
fusion_output = Conv2D(256, (1, 1), activation='relu', padding='same')(fusion_output) 

# Decoder
decoder_output = Conv2D(128, (3,3), activation='relu', padding='same')(fusion_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(64, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(32, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = Conv2D(16, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = Conv2D(2, (3, 3), activation='tanh', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)

model = Model(inputs=[encoder_input, embed_input], outputs=decoder_output)

## Running the images through _inception resnet v2_

This function resizes the given images to 299x299, since Inception-ResNet-v2 uses this input size, and runs them through the network.
It returns the network's predictions.

In [None]:
def create_inception_embedding(grayscaled_rgb):
    grayscaled_rgb_resized = []
    for i in grayscaled_rgb:
        i = resize(i, (299, 299, 3), mode='constant')
        #print(i.shape)
        grayscaled_rgb_resized.append(i)
    grayscaled_rgb_resized = np.array(grayscaled_rgb_resized)
    grayscaled_rgb_resized = preprocess_input(grayscaled_rgb_resized)
    with inception.graph.as_default():
        embed = inception.predict(grayscaled_rgb_resized)
    return embed

### Loading in the InceptionResNetV2

In [None]:
#Load weights
inception = InceptionResNetV2(weights='imagenet', include_top=True)
inception.graph = tf.get_default_graph()

### Running out of memory while loading the pictures

We realized that we could not load all 70k pictures at the same time, because they would consume too much memory. Although the pictures take up only 750 MB on disk, once loaded into 256x256x3 tensors (256x256 pixels and 3 color channels) each of them needs 256 * 256 * 3 * 4 bytes = 786 432 bytes = ~0.8 MB (since we use 4-byte floats). So loading 70k pictures at ~0.8 MB each would need ~56 000 MB = 56 GB of memory. It is almost impossible to have that much memory (especially when training on a GPU), so we load only one batch of pictures at a time. We wrote a new class to handle the picture loading: __DataGenerator__
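
The estimate above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `dataset_memory_bytes` is hypothetical, and the 4 bytes per value is the assumption stated in the text (32-bit floats).

```python
def dataset_memory_bytes(num_images, height=256, width=256, channels=3,
                         bytes_per_value=4):
    """Memory needed to hold all images as dense float tensors."""
    return num_images * height * width * channels * bytes_per_value


per_image = dataset_memory_bytes(1)
total = dataset_memory_bytes(70_000)

print(per_image)              # 786432
print(round(total / 1e9, 1))  # 55.1
```

The exact figure is about 55 GB; the ~56 GB above comes from rounding each picture up to 0.8 MB. Note that NumPy's default `float` is 64-bit, so arrays created with `dtype=float` would need twice as much.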

### Training with generator:

We found out that Keras offers a way to train the network with batches of data. The _Model.fit_generator()_ method trains the network with data that is generated __batch-by-batch__ by a Python generator or a _keras.utils.Sequence_ object. We implemented the latter in a class called __DataGenerator__.

Before we explain the class, we shall discuss another part of the code, which we created for the same reason as the generator class.

### Using indexes to access the data
The main problem is the previously mentioned size of the dataset. If we wanted to load the data in order to split it into train, validation and test sets, we would immediately run out of memory. Instead, we assign __indexes__ to the data. If we then split the indexes, the separation of the data is solved: we can make partitions from the indexes (_train, validation, test_).

The least complicated way to assign ids to the files is to give each file its id as its name, so we did this in advance. This can easily be done with shell scripts.
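
The renaming can also be scripted in Python. The following is only a sketch under stated assumptions, not the script we actually used: the function name `rename_to_ids` and its parameters are hypothetical, the first id of 1000 is chosen to match the id range used in the next cell, and no file in the directory is assumed to already have a name starting with `tmp_rename_`.

```python
import os


def rename_to_ids(directory, first_id=1000, extension=".jpg"):
    """Rename every file with the given extension to <id><extension>.

    Returns the list of assigned ids, in the sorted order of the old names.
    """
    names = sorted(n for n in os.listdir(directory)
                   if n.lower().endswith(extension))

    # First pass: move everything to temporary names, so that a new
    # numeric name can never overwrite a file that is not yet renamed.
    temp_names = []
    for i, name in enumerate(names):
        temp = "tmp_rename_{}{}".format(i, extension)
        os.rename(os.path.join(directory, name),
                  os.path.join(directory, temp))
        temp_names.append(temp)

    # Second pass: assign consecutive ids.
    ids = []
    for offset, temp in enumerate(temp_names):
        new_id = first_id + offset
        os.rename(os.path.join(directory, temp),
                  os.path.join(directory, str(new_id) + extension))
        ids.append(new_id)
    return ids


# Example call (commented out, because it renames files on disk):
# all_id = rename_to_ids("floydhub_dataset")
```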

In [None]:
#The range of existing file names (therefore ids)
all_id = list(range(1000, 10293))

#The split proportions
valid_split = 0.1
test_split = 0.1

#The split indexes
v_index = int(len(all_id)*(1-valid_split-test_split))
t_index = int(len(all_id)*(1-test_split))

#Splitting the id sets
train_ids = all_id[:v_index]
valid_ids = all_id[v_index:t_index]
test_ids = all_id[t_index:]

#Printing the length of the id sets
print("Length of train set: " + str(len(train_ids)))
print("Length of validation set: " + str(len(valid_ids)))
print("Length of test set: " + str(len(test_ids)))

#Storing them for later usage
partition = {'train': train_ids, 'validation': valid_ids, 'test': test_ids}

## DataGenerator class
This class does the input data generation in batches.

The class stores the crucial parameters of the generation (e.g. batch size, list of ids and therefore filenames, shuffle option). The most important methods are __getitem__ and __data_generation__, since these are responsible for generating the data.

Comments describe the behaviour of the class in detail.

In [None]:
class DataGenerator(keras.utils.Sequence):
    # Initialization
    def __init__(self, list_IDs, batch_size=32, shuffle=True):
        self.batch_size = batch_size
        self.list_IDs = list_IDs
        self.shuffle = shuffle
        self.on_epoch_end()
        
    # Denotes the number of batches per epoch
    def __len__(self):
        return int(np.floor(len(self.list_IDs) / self.batch_size))

    # Generate one batch of data
    def __getitem__(self, index):
        # Generate indexes of the batch
        indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]

        # Find list of IDs
        list_IDs_temp = [self.list_IDs[k] for k in indexes]

        # Generate data
        X, y = self.__data_generation(list_IDs_temp)
        
        return X, y

    # Updates indexes after each epoch
    def on_epoch_end(self):
        self.indexes = np.arange(len(self.list_IDs))
        if self.shuffle == True:
            np.random.shuffle(self.indexes)

    # Generates data containing batch_size samples
    def __data_generation(self, list_IDs_temp):
        # Initialization (X for input gray images, Y for output ab channel images, I for resnet prediction)
        X = np.empty((self.batch_size, *(256,256,1)), dtype=float)
        Y = np.empty((self.batch_size, *(256,256,2)), dtype=float)
        I = np.empty((self.batch_size, 1000))

        # Generate data
        for i, ID in enumerate(list_IDs_temp):
            # Loading image
            img = img_to_array(load_img('floydhub_dataset/' + str(ID) + '.jpg'))
            # Convert to grayscale
            grayscaled_rgb = gray2rgb(rgb2gray(img))
            grayscaled_rgb = grayscaled_rgb.reshape((1,*(grayscaled_rgb.shape)))
            # Setting the right domain for Lab converting 
            img = 1.0/255*img
            # Convert image to Lab
            img = rgb2lab(img)
            # Separating the image
            gray = img[:,:,0]
            ab = img[:,:,1:] / 128
            
            # Inception resnet prediction input
            I[i,] = create_inception_embedding(grayscaled_rgb)
            
            # Image input
            X[i,] = gray.reshape((256,256,1))

            # Output
            Y[i,] = ab

        # Returning with the multiple input and the output
        return {'encoder_input': X, 'embed_input': I}, Y
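
The way the class cuts the id list into batches can be shown in isolation. The sketch below mirrors the index arithmetic of `__len__`, `__getitem__` and `on_epoch_end` with plain Python lists; the function name `batch_id_lists` and the `seed` parameter are hypothetical and exist only for this illustration.

```python
import random


def batch_id_lists(ids, batch_size, shuffle=True, seed=None):
    """Return the list of id batches that one epoch would iterate over."""
    # Positions into the id list, optionally shuffled once per epoch
    order = list(range(len(ids)))
    if shuffle:
        random.Random(seed).shuffle(order)

    # Only full batches are produced (floor division)
    num_batches = len(ids) // batch_size
    return [
        [ids[k] for k in order[b * batch_size:(b + 1) * batch_size]]
        for b in range(num_batches)
    ]


batches = batch_id_lists(list(range(1000, 1010)), batch_size=4, shuffle=False)
print(batches)  # [[1000, 1001, 1002, 1003], [1004, 1005, 1006, 1007]]
```

Because only full batches are produced, the last two ids in this example are left out of the epoch. With shuffling turned on, different ids are left out in each epoch.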


## Training of the network
After all these preparations we can finally start the training of the network (with the generator).

We used early stopping in the training of the model.
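
The idea behind early stopping with a patience value can be sketched in a few lines. This is a simplified illustration and not the Keras implementation; the function name `stopping_epoch` and the loss values are made up for the example.

```python
def stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training stops, or None.

    Training stops once the validation loss has failed to improve on
    its best value for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: the best value is reached at epoch 2
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38]
print(stopping_epoch(losses, patience=3))  # 5
```

Together with a checkpoint that keeps only the best weights, this means the saved model is the one from the best epoch, not from the last one.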

### About the batch_size and steps_per_epoch
We already have the _train_ and _validation_ sets. The __batch_size__ can be chosen freely, but once it is chosen, __steps_per_epoch__ should be approximately _train set size / batch_size_. For example, with a training set of 1000 images and a batch_size of 100, steps_per_epoch should be 1000/100 = 10.

The __validation_steps__ parameter plays the same role for the _valid_generator_ and the validation set.
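
As a small worked example, the step counts can be computed as follows. The helper name `steps_for` is hypothetical. The set sizes 7434 and 929 follow from the id split above, while the batch size of 50 is assumed here purely for illustration; it happens to be consistent with the step counts used in the training cell.

```python
def steps_for(num_samples, batch_size):
    """Number of full batches that fit into the set."""
    return num_samples // batch_size


print(steps_for(1000, 100))  # 10
print(steps_for(7434, 50))   # 148
print(steps_for(929, 50))    # 18
```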

In [None]:
# Early stopping
patience=10
early_stopping=EarlyStopping(patience=patience, verbose=1)
checkpointer=ModelCheckpoint(filepath='weights.hdf5', save_best_only=True, verbose=1)


# The generator which generates the training data
training_generator = DataGenerator(partition['train'], batch_size=1)
# The generator which generates the validation data
valid_generator = DataGenerator(partition['validation'], batch_size=100)


# Train model      
model.compile(optimizer='rmsprop', loss='mse')
model.fit_generator(training_generator, validation_data=valid_generator,
                    epochs=100, steps_per_epoch=148, validation_steps=18,
                   callbacks=[checkpointer, early_stopping])

## Testing the network and evaluating the results

### Testing
For Milestone 2 we load the test images from a directory called "Test" instead of using the test split created above, because we would like to test the model on specific, hand-picked pictures that cover different cases.

### Evaluation
To evaluate the network at this stage of the project we only use the Keras __Model.evaluate()__ method, which returns the loss (and any other metrics) of the model. In addition, we of course inspect the results visually.
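
Since the model is compiled with the `mse` loss, the number reported by the evaluation is a mean squared error between the predicted and the true ab values, both scaled by 1/128. The sketch below shows this calculation on a handful of made-up values; the helper names `mse` and `normalize_ab` are hypothetical, and the flat lists stand in for the real image tensors.

```python
def normalize_ab(values):
    """Scale ab values by 1/128, the same scaling used for the targets."""
    return [v / 128 for v in values]


def mse(y_true, y_pred):
    """Mean squared error over two equally long flat lists."""
    assert len(y_true) == len(y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up example values
true_ab = normalize_ab([64.0, -32.0, 0.0, 128.0])  # [0.5, -0.25, 0.0, 1.0]
pred_ab = [0.5, 0.0, 0.0, 0.5]

print(mse(true_ab, pred_ab))  # 0.078125
```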

In [None]:
# Test images
tests = []
for filename in os.listdir('Test/'):
    tests.append(img_to_array(load_img('Test/'+filename)))
tests = np.array(tests, dtype=float)
gray_me = gray2rgb(rgb2gray(1.0/255*tests))
color_me_embed = create_inception_embedding(gray_me)
X_test = rgb2lab(1.0/255*tests)[:,:,:,0]
X_test = X_test.reshape(X_test.shape+(1,))
Y_test = rgb2lab(1.0/255*tests)[:,:,:,1:]
Y_test = Y_test / 128

print(model.evaluate([X_test, color_me_embed], Y_test))

In [None]:
color_me = []
for filename in os.listdir('Test/'):
    color_me.append(img_to_array(load_img('Test/'+filename)))
color_me = np.array(color_me, dtype=float)
gray_me = gray2rgb(rgb2gray(1.0/255*color_me))
color_me_embed = create_inception_embedding(gray_me)
color_me = rgb2lab(1.0/255*color_me)[:,:,:,0]
color_me = color_me.reshape(color_me.shape+(1,))


# Test model
output = model.predict([color_me, color_me_embed])
output = output * 128

# Output colorizations
for i in range(len(output)):
    cur = np.zeros((256, 256, 3))
    cur[:,:,0] = color_me[i][:,:,0]
    cur[:,:,1:] = output[i]
    imsave("result/img_"+str(i)+".png", lab2rgb(cur))

## Training and testing the network
We trained the network on circa 7400 images and validated it on roughly 900 images, for 20 epochs. The process can be seen in this picture:

![alt text](training.png "Training")

It took roughly 3 and a half hours on an Nvidia GTX 1070 GPU.

### The test results can be seen in the "colorizing_results" folder

## Summary
With this first model the results are not outstanding, but they are definitely worth mentioning: the model has started to color the pictures, although it probably needs more data and more training time, both of which we can probably provide in the future.

The model is quite large: copied to the GPU it takes approximately 7.2 GB, so decreasing its size is a reasonable goal for the future.

## User interactions

In this version of the program the user interaction part has not been implemented yet. First we would like to see a good result from the plain colorization.

In the final version we will feed the generated user inputs (see Milestone I.) to the network during training, hoping that this way it learns how to use the user inputs it is given later.