## 1. Exploring VGG16 Convolutional Neural Network <a id="1"></a> 

[VGG16](https://neurohive.io/en/popular-networks/vgg16/) is a well-established convolutional neural network (CNN) trained on over a million ImageNet images to classify them into 1000 object categories. VGG16 is publicly available through Keras, a high-level API for TensorFlow. We can readily import it, inspect its architecture, run some images through it, and see what kind of object the network thinks each one is. The following code blocks explore the functionality of VGG16. Note: to run the code in this notebook, you must have TensorFlow, Keras, and other supporting modules installed in your virtual environment. See the README for installation instructions.

In [1]:
from keras.applications.vgg16 import VGG16

model = VGG16()
model.summary()

Using TensorFlow backend.


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________

In [2]:
import numpy as np
from keras.preprocessing.image import img_to_array, load_img
from keras.applications.vgg16 import preprocess_input, decode_predictions

# Try using VGG16 directly on an image.
# The image shows a single-family home; the network's top guess is a mobile home, which is not too far off.
# (The variable is named img rather than image so it does not shadow the keras image module.)

img = load_img('House3.jpg', target_size=(224, 224))  # VGG16 expects a 224x224 RGB input
img = img_to_array(img)                    # shape (224, 224, 3)
img = np.expand_dims(img, axis=0)          # add a batch dimension: (1, 224, 224, 3)
img = preprocess_input(img)                # RGB -> BGR and subtract the ImageNet channel means
y_hat = model.predict(img)                 # shape (1, 1000): one probability per class
label = decode_predictions(y_hat)[0][0]    # top-1 (class id, class name, probability)
print(label)

('n03776460', 'mobile_home', 0.59778005)


## 2. Manipulating VGG16 Network Layers <a id="2"></a>

The next two blocks contain code that removes the fully-connected layers and the softmax layer from the VGG16 network, and then inspects the resulting 7x7x512 tensor after a 224x224 color image is fed into the network. VGG16 requires a 224x224 input when its fully-connected layers are included; without them the input size is flexible (note the `None` dimensions in the summary below), but 224x224 is the size that produces the 7x7x512 output.

Although not required for this project, we also experimented with adding layers to VGG16. That code is in [Section 8](#8) below.
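The 7x7x512 shape can be checked by hand. The sketch below (the helper name `vgg16_conv_output_shape` is ours, not a Keras function) assumes the standard VGG16 layout seen in the summaries: 3x3 convolutions with 'same' padding that keep the spatial size, and five 2x2, stride-2 max-pooling layers (block1_pool to block5_pool) that each halve it.

```python
def vgg16_conv_output_shape(height, width, num_blocks=5, last_filters=512):
    """Return (h, w, channels) of the block5_pool output for a height x width input.

    Assumes VGG16's layout: 'same'-padded 3x3 convolutions (spatial size unchanged)
    and one 2x2, stride-2 'valid' max-pooling layer at the end of each of 5 blocks.
    """
    for _ in range(num_blocks):
        height //= 2   # each pooling layer halves the spatial size (rounding down)
        width //= 2
    return height, width, last_filters

h, w, c = vgg16_conv_output_shape(224, 224)
print((h, w, c))   # (7, 7, 512): 224 -> 112 -> 56 -> 28 -> 14 -> 7
print(h * w * c)   # 25088 features per image
```

The same arithmetic shows why the summary with `include_top=False` reports `None` dimensions: a larger input (say 448x448) simply yields a larger grid (14x14x512).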

In [3]:
# Use include_top=False to drop the three fully-connected layers (fc1, fc2 and the softmax 'predictions' layer)

vgg_model = VGG16(weights='imagenet', include_top=False)
vgg_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         (None, None, None, 3)     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 128)   147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, None, None, 128)   0         
__________

In [4]:
import numpy as np
from keras.preprocessing.image import img_to_array, load_img
from keras.applications.vgg16 import preprocess_input

# Run the same image through VGG16 without the prediction layers
# (vgg_model was built with include_top=False in the previous cell)

img = load_img('House3.jpg', target_size=(224, 224))
img = img_to_array(img)
img = np.expand_dims(img, axis=0)
img = preprocess_input(img)
y_hat = vgg_model.predict(img)    # output of the last pooling layer, block5_pool

print(y_hat.shape)                # (1, 7, 7, 512) for a 224x224 input
print(y_hat)

(1, 7, 7, 512)
[[[[ 0.          0.          0.         ...  0.          0.
     0.        ]
   [ 0.          0.          0.         ...  0.          0.
     0.        ]
   [ 0.          0.          0.         ...  0.          0.
     0.        ]
   ...
   [ 0.          0.          0.         ...  0.          0.
     0.        ]
   [ 0.          0.          0.         ...  0.          0.
     0.        ]
   [ 0.          0.          0.         ...  0.          0.
     0.        ]]

  [[ 0.         29.363773    0.         ...  0.         29.04313
     0.        ]
   [ 0.          0.          0.         ...  0.          0.
     0.        ]
   [ 0.          0.          0.         ...  0.          0.
     0.        ]
   ...
   [ 0.          0.          0.         ...  0.          0.
     0.        ]
   [ 0.          0.          0.         ...  0.          0.
     0.        ]
   [ 0.          0.          0.         ...  0.          0.
     0.        ]]

  [[ 0.          0.          0.0690162

## 3. CNN Image Batch Processing<a id="3"></a>

The next several blocks of code process all property images. There are ~700,000 interior and exterior images in total, from ~62,000 properties in the Boston area. For each image, the code extracts the non-zero entries of the output tensor, records their indices, and writes them into a dictionary along with the property's MLS number (MLSNUM) and image number (IMGNUM).

The dictionaries from all images are stored in a list, which is in turn written into a CSV file.

In [5]:
# Recursively record only the non-zero entries of the tensor, keyed by their indices, as a nested dictionary
# Doing this saves a lot of space when writing the feature vectors into a database

def nonzeros(arr, filtered):
    for index, item in enumerate(arr):
        if type(item) is np.ndarray:
            filtered[index] = nonzeros(item, dict())
        else:
            if item != 0:
                filtered[index] = item
    return filtered
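The nested dictionaries above work, but the same sparse idea can be expressed without recursion using `np.nonzero`, which returns one index array per axis. The sketch below is an alternative encoding, not the format this project stores; `sparse_encode` and `sparse_decode` are hypothetical helper names.

```python
import numpy as np

def sparse_encode(tensor):
    """Return (indices, values) for the non-zero entries of a feature tensor.

    indices has shape (n_nonzero, tensor.ndim); values has shape (n_nonzero,).
    """
    idx = np.nonzero(tensor)                 # tuple of index arrays, one per axis
    return np.stack(idx, axis=1), tensor[idx]

def sparse_decode(indices, values, shape, dtype=np.float32):
    """Rebuild the dense tensor, filling every omitted position with zero."""
    arr = np.zeros(shape, dtype=dtype)
    arr[tuple(indices.T)] = values
    return arr

# Small synthetic tensor standing in for a (1, 7, 7, 512) VGG16 output
rng = np.random.default_rng(0)
t = np.maximum(rng.normal(size=(1, 7, 7, 8)), 0).astype(np.float32)  # ReLU-like: many zeros
indices, values = sparse_encode(t)
restored = sparse_decode(indices, values, t.shape)
print(np.array_equal(restored, t))   # True
```

Keeping `dtype=np.float32` in the decoder matches the dtype VGG16 produces, so the dense tensor is rebuilt exactly.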

In [7]:
# This block processes a folder of images to be analyzed and stores each image's
# feature vector (the non-zero entries) in a dictionary.
# A folder called 'temp' containing only 14 images is processed here to validate the code.

import os
import re
import numpy as np
import pandas as pd
from keras.preprocessing import image
from keras.preprocessing.image import img_to_array
from keras.applications.vgg16 import preprocess_input

# File names look like '<8-digit MLSNUM>_img_<IMGNUM>.jpg'. Use a raw string so '\d'
# is not treated as a string escape, and escape the '.' before 'jpg'.
p = re.compile(r"(\d{8})_img_(\d+)\.jpg")

# Create an array to store image features
data_arr = []
failcount = 0

for root, dirs, files in os.walk('C:\\temp'):
    for name in files:
        match = p.match(name)
        if match:
            try:
                img_dict = {
                    'MLSNUM': match.group(1),
                    'IMGNUM': match.group(2)
                }
                path = os.path.join(root, name)
                img = image.load_img(path, target_size=(224, 224))
                img_data = img_to_array(img)
                img_data = np.expand_dims(img_data, axis=0)
                img_data = preprocess_input(img_data)
                
                features = vgg_model.predict(img_data)              # features is a numpy ndarray
                img_dict['FEATURES'] = nonzeros(features, dict())   # store the non-zero entries into a nested dictionary
                
                data_arr.append(img_dict)                   # store {'MLSNUM', 'IMGNUM', 'FEATURES'} as one list element
            except OSError:
                failcount += 1

print('There were ', failcount, ' images that failed to load.')

# make the data array into a data frame so we can look at it
data_df = pd.DataFrame(data_arr)

print(data_df.shape)
data_df

There were  0  images that failed to load.
(14, 3)


|   | FEATURES | IMGNUM | MLSNUM |
| - | -------- | ------ | ------ |
| 0 | {0: {0: {0: {4: 2.9935026, 28: 51.557457, 82: ... | 0 | 71524024 |
| 1 | {0: {0: {0: {17: 0.6998304, 25: 24.099962, 55:... | 1 | 71524024 |
| 2 | {0: {0: {0: {6: 9.023701, 12: 29.83715, 20: 1.... | 10 | 71524024 |
| 3 | {0: {0: {0: {12: 4.6816025, 25: 17.246347, 26:... | 11 | 71524024 |
| 4 | {0: {0: {0: {5: 5.2803173, 30: 54.870575, 58: ... | 12 | 71524024 |
| 5 | {0: {0: {0: {4: 18.096224, 20: 10.33395, 25: 1... | 13 | 71524024 |
| 6 | {0: {0: {0: {2: 8.998453, 4: 22.723763, 46: 15... | 2 | 71524024 |
| 7 | {0: {0: {0: {40: 26.316399, 55: 1.3048272, 63:... | 3 | 71524024 |
| 8 | {0: {0: {0: {0: 2.428958, 2: 1.8430979, 5: 4.1... | 4 | 71524024 |
| 9 | {0: {0: {0: {1: 48.76469, 28: 30.675917, 37: 2... | 5 | 71524024 |


In [12]:
# Write data_arr into a CSV file; DictWriter stores the nested FEATURES dict as its str() representation

import csv

with open('temp.csv', 'w', encoding='utf8', newline='') as output_file:
    fc = csv.DictWriter(output_file, fieldnames=data_arr[0].keys())
    fc.writeheader()
    fc.writerows(data_arr)
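Because `csv.DictWriter` writes non-string values with `str()`, the FEATURES column comes back from the file as a string and has to be parsed again, for example with `ast.literal_eval`. The sketch below shows that round trip on one hypothetical row held in memory (`io.StringIO` stands in for `temp.csv`). One caution: in NumPy 2.0 and later, `repr()` of a NumPy scalar includes its type (e.g. `np.float32(2.9935026)`), and `str()` of a dict uses `repr()` for its values, so a FEATURES string written from float32 values under NumPy 2 would not parse with `literal_eval`; converting values with `float()` before storing avoids this.

```python
import csv
import io
from ast import literal_eval

# One hypothetical row in the same shape as data_arr (nested dict of non-zero features)
rows = [
    {'MLSNUM': '71524024', 'IMGNUM': '0', 'FEATURES': {0: {0: {0: {4: 2.9935026}}}}},
]

buf = io.StringIO()                      # in-memory stand-in for temp.csv
writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
writer.writeheader()
writer.writerows(rows)                   # the nested dict is written via str()

buf.seek(0)
for rec in csv.DictReader(buf):
    print(type(rec['FEATURES']).__name__)           # str
    restored_features = literal_eval(rec['FEATURES'])  # string -> nested dict again
    print(restored_features[0][0][0][4])             # 2.9935026
```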

In [None]:
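# Write the data into a database
# Connection settings are read from the standard libpq environment variables
# instead of hard-coding the host and password in the notebook.

import os
import psycopg2

conn = psycopg2.connect(
    host=os.environ['PGHOST'],
    dbname='fulldata',
    port=5432,
    user=os.environ['PGUSER'],          # libpq expects "user", not "username"
    password=os.environ['PGPASSWORD'],
)
cur = conn.cursor()

for _, condo in data_df.iterrows():
    # Placeholders must be comma-separated and the SQL must not be wrapped in extra quotes.
    # psycopg2 cannot adapt a dict directly, so FEATURES is stored as its string representation.
    cur.execute(
        "INSERT INTO photos (mlsnum, imgnum, features) VALUES (%s, %s, %s)",
        (condo.MLSNUM, condo.IMGNUM, str(condo.FEATURES)),
    )
conn.commit()   # commit once after all inserts

cur.close()
conn.close()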

## 4. Computing "Similarity" Between Two Images<a id="4"></a>

The next several blocks of code compute the "similarity" or "distance" between two image tensors obtained from VGG16 feature extraction. 

Algorithmically, the following steps are used to compute similarity:
1. A user's input image goes through the VGG16 convolutional layers, and the CNN outputs a 7x7x512 feature tensor (a numpy array).
2. The feature tensor of one image stored in our database is retrieved as a string; the string is converted back to a dictionary, and the dictionary is converted back to a numpy array.
3. The numpy array from step 1 is subtracted from the numpy array from step 2.
4. The numpy norm() function is applied to the resulting array from step 3 to obtain the distance.
5. Steps 2 to 4 are repeated for every image stored in the database.
6. The top 5 properties (the 5 smallest distance scores) are selected and displayed to the user.
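Steps 2 to 6 above can be vectorized with NumPy once the stored tensors have been decoded (for example with `backToArr`) and stacked into one array. The sketch below assumes that in-memory array; `top_k_matches` is a hypothetical helper, not code from this project. Holding all ~700,000 tensors at once would take about 700,000 × 25,088 × 4 bytes ≈ 70 GB as float32, so in practice the database would have to be scanned in batches.

```python
import numpy as np

def top_k_matches(query, db_features, db_ids, k=5):
    """Rank stored images by Euclidean distance to the query feature tensor.

    query:       array of shape (1, 7, 7, 512) (any shape matching one stored tensor)
    db_features: array of shape (n_images, 7, 7, 512)
    db_ids:      n_images identifiers, e.g. (MLSNUM, IMGNUM) tuples
    Returns the k closest (id, distance) pairs, smallest distance first.
    """
    diffs = db_features - query                                   # broadcast over images
    dists = np.linalg.norm(diffs.reshape(len(db_features), -1), axis=1)
    order = np.argsort(dists)[:k]                                 # k smallest distances
    return [(db_ids[i], float(dists[i])) for i in order]

# Tiny synthetic database: 6 "images" with (2, 2, 3) feature tensors
rng = np.random.default_rng(1)
demo_db = rng.random((6, 2, 2, 3)).astype(np.float32)
demo_ids = [f"img_{i}" for i in range(6)]
best = top_k_matches(demo_db[2:3], demo_db, demo_ids, k=3)
print(best[0])   # ('img_2', 0.0): an identical image is the closest match
```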

In [8]:
# Function to convert a nested dictionary back to its corresponding numpy array (restores all omitted zeros)

import numpy as np

def backToArr(feature_dict):
    arr = np.zeros((1, 7, 7, 512))   # float64 by default

    for row in feature_dict[0]:
        for col in feature_dict[0][row]:
            for elem in feature_dict[0][row][col]:
                arr[0, row, col, elem] = feature_dict[0][row][col][elem]
    return arr

In [9]:
# Compute norm(diff) from dictionary directly 

import pandas as pd
import numpy as np
from keras.preprocessing import image
from keras.preprocessing.image import img_to_array, load_img
from keras.applications.vgg16 import preprocess_input
from keras.applications.vgg16 import decode_predictions
from numpy.linalg import norm

user_input = '71524024_img_0.jpg'
img = image.load_img(user_input, target_size=(224, 224))
img_data = img_to_array(img)
img_data = np.expand_dims(img_data, axis=0)
img_data = preprocess_input(img_data)
                
features = vgg_model.predict(img_data)

arr = backToArr(data_df['FEATURES'].iloc[0])

norm(features - arr)

0.0
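The distance above is computed by passing 4-D arrays straight to `norm()`. With its default arguments (`ord=None`, `axis=None`), `numpy.linalg.norm` flattens the input and returns its 2-norm, so the result is the ordinary Euclidean distance between the two tensors viewed as 25,088-dimensional vectors. A quick check on random stand-in tensors:

```python
import numpy as np

rng = np.random.default_rng(0)
t1 = rng.random((1, 7, 7, 512))   # stand-ins for two VGG16 feature tensors
t2 = rng.random((1, 7, 7, 512))

d_tensor = np.linalg.norm(t1 - t2)                       # norm of the 4-D difference
d_flat = np.sqrt(np.sum((t1.ravel() - t2.ravel()) ** 2))  # explicit Euclidean distance
print(np.isclose(d_tensor, d_flat))   # True
print(np.linalg.norm(t1 - t1))        # 0.0 for identical tensors
```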

## 5. Precision Loss During Data Conversion and Transmission<a id="5"></a>

The previous code block demonstrates that the norm() function works properly even when the input numpy array is multi-dimensional: for two identical images, norm() outputs a distance of 0, as it should.

However, when our algorithm is run in full, a minor precision loss appears after data conversion and transmission, so the distance between two identical images is not exactly 0 but some small number. In the following tables, a double arrow, "&rarr;&rarr;", represents a VGG16 feature extraction step; a single arrow, "&rarr;", represents a single data conversion step.

A user's input image yields a numpy array:

<font size=3>user input:  image &rarr;&rarr; numpy array</font>

<u>But how a database feature tensor is handled affects precision!</u> We tested the following 3 scenarios:


| <font size=2.5>Scenario</font> | <font size=2.5>Conversion Process</font> | <font size=2.5>Distance Computed by norm()</font> |
| ------------------------------ | ----------------------------------------------- | --------------------------------------: |
| <font size=2.5>Feature tensor used directly, without converting to string <br/> and without importing to database</font> | <font size=3>image &rarr;&rarr; numpy array &rarr; dict &rarr; numpy array</font> | <font size=3>$0.0$</font> |
| <font size=2.5>Feature tensor stored as dict is converted to string, then back to dict, <br/>then to numpy array</font> | <font size=3>image &rarr;&rarr; numpy array &rarr; dict <span style="color:red">&rarr; string &rarr; dict</span> &rarr; numpy array</font> | <font size=3>$2.28 \times 10^{-5}$</font> |
| <font size=2.5>As in the previous scenario, then the numpy array is converted to a Python list using tolist();<br/>(requires user input to also be converted to list)</font> | <font size=3>image &rarr;&rarr; numpy array &rarr; dict <span style="color:red">&rarr; string &rarr; dict</span> &rarr; numpy array <span style="color:blue">&rarr; list</span></font> | <font size=3>$5.29 \times 10^{-11}$ as printed, but this is norm() of the squared differences; the distance itself is $\sqrt{5.20 \times 10^{-10}} \approx 2.28 \times 10^{-5}$</font> |

<br/>
    
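The table above indicates that converting the output numpy array to a string causes a small loss of precision, so the computed distance is no longer exactly 0. Converting to a list with tolist() does not recover that precision: the $5.29 \times 10^{-11}$ figure comes from applying norm() to the vector of squared differences, whereas the sum of those squared differences ($5.20 \times 10^{-10}$) is the squared distance, whose square root is the same $2.28 \times 10^{-5}$ as in the string-conversion scenario. When the data was imported into the database and retrieved, a much larger precision loss is observed: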

| <font size=2.5>Scenario</font> | <font size=2.5>Conversion Process</font> | <font size=2.5>Distance Computed by norm()</font> |
| ------------------------------ | ----------------------------------------------- | --------------------------------------: |
| <font size=2.5>Retrieve feature tensor from the database (stored as string), convert it back to dict, then to numpy array</font> | <font size=3>image &rarr;&rarr; numpy array &rarr; dict <span style="color:red">&rarr; string &rarr; database &rarr; string &rarr; dict </span>&rarr; numpy array</font> | <font size=3>$12$ to $20$ (depending on image)</font> |

<br/>

In essence, a round trip through the database caused a significant loss of precision. Since the in-memory string round trip alone accounts for a distance of only about $2.28 \times 10^{-5}$, the additional loss comes from the database step itself (writing the string or reading it back), which is the step worth inspecting.

<font size=2.5>**However, as far as the current application is concerned, the loss of precision does NOT alter the result: the most similar image still has the lowest distance score among all images, because the precision loss acts uniformly and consistently on all images.**</font>

For example, an input image will give a distance score of ~ 15 with respect to an identical image stored in the database, whereas all other dissimilar images' distance scores are in the range of thousands.

The following code blocks reproduce the precision loss caused by string conversion and test whether converting the arrays to Python lists reduces it.

In [10]:
# Simulate the real process when retrieving feature tensor as a string from database
# Compute norm(diff) by first converting features to string then converting back to dictionary 

import pandas as pd
import numpy as np
from keras.preprocessing import image
from keras.preprocessing.image import img_to_array, load_img
from keras.applications.vgg16 import preprocess_input
from keras.applications.vgg16 import decode_predictions
from numpy.linalg import norm
from ast import literal_eval

a = str(data_df['FEATURES'].iloc[0])
b = literal_eval(a)
arr = backToArr(b)

img = image.load_img('71524024_img_0.jpg', target_size=(224, 224))
img_data = img_to_array(img)
img_data = np.expand_dims(img_data, axis=0)
img_data = preprocess_input(img_data)
                
features = vgg_model.predict(img_data)

# We lost some precision during conversions
norm(features - arr)

2.2809222259403666e-05

In [11]:
# What if we convert the output numpy array into a python list?

#c = backToArr(data_arr[0]['FEATURES']).tolist()[0]
c = arr.tolist()[0]
d = []

for i in range(len(c)):
    for j in range(len(c[i])):
        for k in range(len(c[i][j])):
            d.append(c[i][j][k])

e = features.tolist()[0]
f = []

for i in range(len(e)):
    for j in range(len(e[i])):
        for k in range(len(e[i][j])):
            f.append(e[i][j][k])

print(len(d))
print(len(f))

g = []
for i in range(len(d)):
    g.append((d[i] - f[i])*(d[i] - f[i]))

print(len(g))

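# Note: sum(g) is the squared Euclidean distance; its square root (~2.2809e-05) equals the
# distance from the previous cell, so converting to lists does not change the precision.
# norm(g) is the norm of the vector of squared differences, not a distance between the images.
print(sum(g))
print(norm(g))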

25088
25088
25088
5.20260620078875e-10
5.2910455076166615e-11
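One plausible explanation for the ~2.28e-5 distance, which is our hypothesis and not something the cells above verify: `model.predict` returns float32 activations, `str()` writes each float32 as the shortest decimal that identifies it *as a float32*, and `literal_eval` reads that decimal back as a 64-bit Python float. `backToArr` stores those values in `np.zeros`, which is float64 by default, so each one differs very slightly from the original float32 value. Rebuilding in float32 instead (e.g. `np.zeros(..., dtype=np.float32)` in `backToArr`, an untested change for the real pipeline) should round every value back. The sketch below illustrates this on synthetic values; it does not address the larger loss seen after the database round trip.

```python
import numpy as np
from ast import literal_eval

# Stand-in for VGG16 activations, which model.predict returns as float32
rng = np.random.default_rng(0)
acts32 = rng.uniform(1, 50, size=1000).astype(np.float32)

# Mimic str(dict): each float32 is written as its shortest round-trip decimal,
# then literal_eval parses those decimals as 64-bit Python floats
text = "[" + ", ".join(str(v) for v in acts32) + "]"
parsed = literal_eval(text)

# Rebuilding into float64 (np.zeros' default dtype, as in backToArr)
rebuilt64 = np.array(parsed, dtype=np.float64)
print(np.linalg.norm(acts32 - rebuilt64) > 0)   # True: tiny but non-zero distance

# Rebuilding into float32 rounds each value back to the original float32
rebuilt32 = np.array(parsed, dtype=np.float32)
print(np.linalg.norm(acts32 - rebuilt32))       # 0.0
```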


## 6. Conclusion<a id="6"></a>

* VGG16 can be used not only for image classification (supervised learning) but also, as demonstrated in this project, as a pretrained feature extractor for unsupervised tasks such as image similarity search
* An extracted feature tensor can be regarded as a high-dimensional vector (i.e., a point in a high-dimensional space), and similar images are expected to have element-wise similar feature tensors
* The numpy norm() function is a convenient way to compute the "distance" between two images, since it accepts a multi-dimensional tensor directly and returns the Euclidean norm of all its entries, without our having to reshape the tensor into a 1-D vector
* Serialization between different data types can cause a slight loss of precision:
   * Converting the numpy array to a list does not recover it: the apparent improvement in [Section 5](#5) came from applying norm() to the squared differences rather than computing the distance
   * The loss of precision does not affect the matching process, because it is small (a distance of ~15 for an identical image) compared with distances in the thousands for dissimilar images
* Computation time is very long if an input image has to be compared with every feature tensor stored in the database (~700,000 of them), far exceeding the 2-second response-time rule for a fully functional app. Some suggestions for improving this:
   * Use PCA to reduce the dimension of the feature tensor (at the moment every image has 7x7x512 = 25,088 features)
   * Apply some type of clustering to the images stored in our database so that not all feature tensors have to be retrieved at query time
   * Upgrade the hardware infrastructure

## 7. Future Directions<a id="7"></a>

#### Reduction of Dimensionality
Feature extraction with the VGG16 network outputs 7x7x512 = 25,088 features for a given image. As we saw in [Section 2](#2), the output tensor is sparse, i.e., most of its entries are 0, which suggests that many of the features are redundant. It would be computationally more efficient to transform the high-dimensional (25,088-dimensional) feature space into a lower-dimensional one.

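Principal Component Analysis (PCA) is one of the most widely used unsupervised techniques for reducing the dimensionality of a feature space. scikit-learn offers it as `sklearn.decomposition.PCA`; the older `matplotlib.mlab.PCA` class has been removed from recent Matplotlib releases.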
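As a minimal sketch of what PCA would do here, the block below computes it directly with NumPy's SVD on synthetic stand-in data; `pca_fit_transform` is a hypothetical helper, and `sklearn.decomposition.PCA(n_components=50).fit_transform(X)` should give an equivalent projection (possibly with flipped component signs). For ~700,000 images a full SVD would not fit in memory, so `sklearn.decomposition.IncrementalPCA`, which fits in mini-batches, would be the more realistic choice.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project the rows of X onto their top principal components, computed via SVD.

    X: (n_images, n_features) array, e.g. flattened 7x7x512 = 25,088-dim VGG16 features.
    Returns the (n_images, n_components) projection plus the mean and components,
    which are needed to project new (query) images the same way.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                                   # PCA works on centered data
    # Rows of Vt are the principal directions, ordered by decreasing variance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, mean, components

# Synthetic example: 100 "images" with 25,088 features each
rng = np.random.default_rng(0)
X_demo = rng.random((100, 25088)).astype(np.float32)
Z, pca_mean, pca_components = pca_fit_transform(X_demo, n_components=50)
print(Z.shape)                 # (100, 50)

# A query image is projected with the same mean and components before comparing distances
query_demo = rng.random((1, 25088)).astype(np.float32)
query_reduced = (query_demo - pca_mean) @ pca_components.T
print(query_reduced.shape)     # (1, 50)
```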

#### Further Investigate the Issue of Precision Loss
The tolist() experiment in [Section 5](#5) does not actually improve precision, so the next step is to find where the database round trip introduces the larger loss (a distance of 12 to 20 for identical images), and whether rebuilding the array in float32 avoids it, without compromising runtime.

#### Enhance Visual Appeal of the App
Because of limited time, much of which went into investigating the precision loss, the current app is not visually polished. We plan to improve its visual appeal in the near future.

#### Reduce Computational Tasks for an Input Image
In its current full-scale implementation, our app computes the distance between the input image's feature tensor and every feature tensor stored in our database (~700,000 of them), which takes a long time. If we could first segment out just the relevant images, we could save a lot of time. For example, if the input image is a kitchen, we would compute distances only to the kitchen images in our database. This requires classifying or clustering the database images in advance. [Places 365](http://places2.csail.mit.edu/) from MIT specializes in classifying images of places and could be applied for this purpose.
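A minimal sketch of that category-first search, assuming each stored image already has a scene label (for example one produced offline by a Places365 classifier; that step is not shown, and the labels below are hypothetical). `search_within_category` is a hypothetical helper: it restricts the distance computation to images sharing the query's label before ranking them.

```python
import numpy as np

def search_within_category(query, query_label, db_features, db_labels, k=5):
    """Rank only the stored images whose label matches query_label.

    db_features: (n_images, ...) array of stored feature tensors
    db_labels:   n_images scene labels (e.g. 'kitchen'), assumed precomputed offline
    Returns (indices into db_features, distances), closest first.
    """
    db_labels = np.asarray(db_labels)
    candidates = np.flatnonzero(db_labels == query_label)     # same-category images only
    subset = db_features[candidates].reshape(len(candidates), -1)
    dists = np.linalg.norm(subset - query.reshape(1, -1), axis=1)
    order = np.argsort(dists)[:k]
    return candidates[order], dists[order]

# Six tiny synthetic "feature tensors" with hypothetical labels
rng = np.random.default_rng(0)
scene_db = rng.random((6, 2, 2, 3))
scene_labels = ['kitchen', 'bath', 'kitchen', 'bath', 'kitchen', 'bath']
idx, dists = search_within_category(scene_db[4], 'kitchen', scene_db, scene_labels, k=2)
print(idx[0], dists[0])   # 4 0.0
```

Only half of the database is compared here; with many scene categories the saving would be proportionally larger, at the cost of missing matches if a stored image is mislabeled.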

## 8. Miscellaneous Exercises<a id="8"></a>

The next several blocks of code were used during initial exploration or were written in anticipation of the next phase of the image-processing pipeline, but in the end they were not used in the development of the application. Feel free to play around with the code.

In [None]:
# Code for downloading images by looping through the URLs stored in the last column
# (column index 38) of the master CSV file

import csv
import urllib.request   # Python 3 location of urlretrieve

with open("XX.csv", newline='') as csvfile:
    i = 0
    for row in csv.reader(csvfile):   # csv.reader handles quoted fields and line endings
        url = row[38].strip()
        if url:
            urllib.request.urlretrieve(url, "img" + row[0] + ".jpg")
            i += 1
        else:
            print("No result for index " + row[0] + " with mls number " + row[1])

In [None]:
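# Use VGG16 as a feature extractor: drop its original fully-connected and softmax layers
# (include_top=False) and stack two new, untrained 128-unit Dense layers on top

from keras.applications.vgg16 import VGG16
from keras.layers import Dense, Flatten, Input
from keras.models import Model

image_input = Input(shape=(224, 224, 3))   # fixed input size so Flatten has a known size
model2 = VGG16(input_tensor=image_input, weights='imagenet', include_top=False)

x = model2.output
x = Flatten(name='flatten')(x)
x = Dense(128, activation='relu', name='fc1')(x)
out = Dense(128, activation='relu', name='fc2')(x)
custom_vgg_model = Model(image_input, out)

custom_vgg_model.summary()   # summary() prints the table itself and returns None

#prediction = Dense(num_classes, activation='softmax')(out)
#print(custom_vgg_model.output.shape)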

In [None]:
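# VGG16 as a feature extractor: keep every VGG16 layer except the final softmax layer,
# and use the output of the 4096-unit 'fc2' layer as the feature vector

from keras.applications.vgg16 import VGG16
from keras.layers import Input
from keras.models import Model

image_input = Input(shape=(224, 224, 3))   # include_top=True requires a 224x224x3 input
model3 = VGG16(input_tensor=image_input, weights='imagenet', include_top=True)

out = model3.get_layer('fc2').output

custom_vgg_model2 = Model(image_input, out)

custom_vgg_model2.summary()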