# Capstone
## *Show Me the Monet!*
### *Or how I learned to stop worrying, and apply neural networks to artistic recognition.*

The goal of the capstone was to develop a method for classifying paintings as either Monets or other paintings in the dataset. The data used for this analysis is stored in an S3 bucket.  
  
The focus of this final project was to gain an understanding of neural networks, a topic not covered during my time at General Assembly. This required a substantial amount of learning about neural networks and how they are used for image classification.

To begin, I needed to decide on a setup, and settled on Keras running on top of Theano. I had some experience with Theano from Daniel Nouri's fantastic [facial analysis](http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/) work. An additional goal for the project was to determine which key features the model identifies when classifying Monets versus other works.

A stretch objective of this project was to use the neural network to manipulate photos so that they appear to be works by Monet. This may require additional work after the end of the course.

In [None]:
from keras.models import Sequential        # Linear stack of layers
import numpy as np


from keras.layers import (                 # Building blocks of the network

                          Convolution2D,   # Learns filters that slide over windows of the 2D input
                          MaxPooling2D,    # Downsamples: keeps the largest value in each window
                          Activation,      # Applies a nonlinearity (e.g. ReLU) element-wise

                          Dropout,         # Reduces overfitting by randomly zeroing a fraction p of inputs during training
                          Flatten,         # Collapses multi-dimensional feature maps into a 1D vector
                          Dense            # Fully connected layer

                          )
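As a rough illustration of what `MaxPooling2D` does, here is a minimal pure-Python sketch of non-overlapping max pooling on a single 2D channel. It assumes the stride equals the pool size and that rows or columns that do not fill a whole window are dropped, which is how a side of 148 pixels becomes 49 with a 3x3 pool in the model summary further down. The helper name `max_pool_2d` is made up for this illustration.

```python
def max_pool_2d(channel, size):
    """Non-overlapping max pooling on one 2D channel (a list of equal-length rows).

    Stride equals the window size; partial windows at the right/bottom edges are dropped.
    """
    out_rows = len(channel) // size
    out_cols = len(channel[0]) // size
    pooled = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            window = [
                channel[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(window))  # keep only the strongest response in the window
        pooled.append(row)
    return pooled


image = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 1, 1],
]
print(max_pool_2d(image, 2))  # [[4, 5], [6, 7]]
```

Pooling shrinks each side by roughly the pool size while keeping the strongest activations, which is why it follows each convolution in the model.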

Below is a quick demonstration of how the image preprocessing works. Each photo fed through the generator below has a random combination of transformations applied to it.

In [None]:
# Data Cleaning and Preparation

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img


d_gen = ImageDataGenerator(
                            rotation_range=0.15,      # Random rotation range in degrees (0.15 gives almost no rotation)
                            width_shift_range=0.15,   # Horizontal shift as a fraction of image width
                            height_shift_range=0.15,  # Vertical shift as a fraction of image height
                            shear_range=0.15,         # Shear intensity
                            zoom_range=0.15,          # Range for random zoom (a float or a [lower, upper] pair)
                            fill_mode='nearest',      # How pixels outside the boundary are filled
                            horizontal_flip=True,     # Randomly flip samples horizontally
                            vertical_flip=True,       # Randomly flip samples vertically
                            rescale=1./255,           # Multiply pixel values by this factor, mapping 0-255 to 0-1
#                             dim_ordering=K.image_dim_ordering()
                          )
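To make the flip and rescale options concrete, here is a pure-Python sketch of what `horizontal_flip`, `vertical_flip`, and `rescale` do to a tiny single-channel grid of pixel values. It is only an illustration under simplifying assumptions: the real `ImageDataGenerator` works on NumPy arrays and also applies rotation, shifts, shear, and zoom, which are omitted here. Each flip is applied with probability 0.5, as Keras does. The function `augment` and its `rng` argument are hypothetical.

```python
import random


def augment(pixels, rng, horizontal_flip=True, vertical_flip=True, rescale=1.0 / 255):
    """Hypothetical helper: randomly flip a 2D grid of pixel values, then rescale it."""
    out = [list(row) for row in pixels]  # copy so the input is not modified
    if horizontal_flip and rng.random() < 0.5:
        out = [row[::-1] for row in out]  # mirror left <-> right
    if vertical_flip and rng.random() < 0.5:
        out = out[::-1]  # mirror top <-> bottom
    return [[value * rescale for value in row] for row in out]


grid = [[0, 51], [102, 255]]
rng = random.Random(0)  # seeded so repeated runs give the same augmentations
for _ in range(3):
    print(augment(grid, rng))
```

Because every epoch sees a differently flipped and rescaled copy of each painting, the network cannot simply memorize pixel positions.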

In [108]:
monet1 = load_img('Claude_Monet,_Impression,_soleil_levant.jpg') # PIL image
monet_array = img_to_array(monet1)                           # Convert the PIL image to a 3D NumPy array
monet_array = monet_array.reshape((1,) + monet_array.shape)  # Add a leading batch axis of size 1

Show the purdy pictures.

In [109]:
# t = 0
# for batch in d_gen.flow(monet_array, batch_size=1,
#                         save_to_dir='mod', save_prefix='monet',
#                         save_format='jpeg'):
#     t += 1
#     if t >= 20: # Stop after 20 manipulated images have been saved
#         break

# Creation of the Model

*This is where the model is created. So far the model is a fairly simple stack of convolution, ReLU, and max-pooling blocks; the plan is to build it out further.*

In [22]:


model = Sequential()

model.add(Convolution2D(32,3,3, input_shape=(3,150,150)))
model.add(Activation('relu'))
model.add(Dropout(0.2))  
model.add(MaxPooling2D(pool_size=(3,3)))

model.add(Convolution2D(64,2,2))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Convolution2D(128,2,2))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Convolution2D(256,3,3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3,3)))
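The output shapes reported by `model.summary()` follow from simple arithmetic. The sketch below (plain Python, no Keras) assumes "valid" convolutions with stride 1, so each side shrinks by `kernel - 1`, and pooling with stride equal to the pool size, so a side of `n` becomes `(n - p) // p + 1`. It counts convolution weights as `kernel * kernel * in_channels * filters + filters` (one bias per filter). The block list mirrors the four convolution blocks above; the helper names are made up for this illustration.

```python
def conv_out(side, kernel):
    """'valid' convolution, stride 1: no padding, so the output shrinks by kernel - 1."""
    return side - kernel + 1


def pool_out(side, pool):
    """Max pooling with stride == pool size; incomplete windows are dropped."""
    return (side - pool) // pool + 1


# (filters, conv kernel, pool size) for each block, copied from the model above
blocks = [(32, 3, 3), (64, 2, 2), (128, 2, 2), (256, 3, 3)]

side, channels = 150, 3  # input_shape=(3, 150, 150)
for filters, kernel, pool in blocks:
    params = kernel * kernel * channels * filters + filters  # weights + one bias per filter
    side = conv_out(side, kernel)
    print("conv -> (%d, %d, %d), %d params" % (filters, side, side, params))
    side = pool_out(side, pool)
    channels = filters
    print("pool -> (%d, %d, %d)" % (channels, side, side))

flattened = channels * side * side
print("Flatten ->", flattened)
```

The first line printed, `(32, 148, 148), 896 params`, matches the first row of the summary, and the final `Flatten` size (2304 under these assumptions) is the input width of the first `Dense` layer.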


In [23]:

# model = Sequential()
# model.add(Convolution2D(32,3,3, input_shape=(3,100,100)))
# model.add(Activation('relu'))
# model.add(MaxPooling2D(pool_size=(3,3)))

# model.add(Convolution2D(64,3,3))
# model.add(Activation('relu'))
# model.add(MaxPooling2D(pool_size=(3,3)))
          
# model.add(Convolution2D(128,2,2))
# model.add(Activation('relu'))
# model.add(MaxPooling2D(pool_size=(2,2)))

# Build the model

In [24]:
model.add(Flatten())
model.add(Dense(256, activation='relu'))   # Rectified Linear Unit
model.add(Dropout(0.5))                 # Drop 50% of the Neurons
model.add(Dense(128, activation='sigmoid'))  # Yay, sigmoids
model.add(Dropout(0.5))                 # Drop 50% of the Neurons
model.add(Dense(1, activation='sigmoid'))


model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
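The final `Dense(1, activation='sigmoid')` layer squashes a raw score into a probability, and `binary_crossentropy` scores that probability against the 0/1 label. A minimal pure-Python sketch of both formulas follows; the function names are illustrative rather than Keras internals, and the small `eps` clip loosely mirrors how Keras keeps predictions away from exactly 0 and 1.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over a batch of labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


scores = [2.0, -1.0, 0.0]  # raw outputs before the sigmoid
labels = [1, 0, 1]
probs = [sigmoid(s) for s in scores]
print(probs)
print(binary_crossentropy(labels, probs))
```

The loss is small when a confident prediction is correct and grows sharply when a confident prediction is wrong, which is what `rmsprop` minimizes during training.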

In [25]:
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
convolution2d_9 (Convolution2D)  (None, 32, 148, 148)  896         convolution2d_input_3[0][0]      
____________________________________________________________________________________________________
activation_9 (Activation)        (None, 32, 148, 148)  0           convolution2d_9[0][0]            
____________________________________________________________________________________________________
dropout_8 (Dropout)              (None, 32, 148, 148)  0           activation_9[0][0]               
____________________________________________________________________________________________________
maxpooling2d_9 (MaxPooling2D)    (None, 32, 49, 49)    0           dropout_8[0][0]                  
___________________________________________________________________________________________

In [26]:
# Redefine the training generator (this replaces the demonstration d_gen above)
d_gen = ImageDataGenerator(rescale=1./255,
                          shear_range=0.2,
                          zoom_range=0.2,
                          horizontal_flip=True)

# For test only use a rescale function
test_d_gen = ImageDataGenerator(rescale=1./255)

In [27]:
import numpy as np

monet_gen = d_gen.flow_from_directory('/home/sambozek/Desktop/Better_Data/Test/',  # Training images (this folder is named 'Test')
                                     target_size = (150, 150),              # Resize every image to 150 x 150 pixels
                                     batch_size = 25,                       # Images per batch; larger batches give less noisy gradients
                                     class_mode='binary')                   # Binary classification problem

# bottleneck_features_train = model.predict_generator(monet_gen, 1500)
# np.save(open('bottleneck_features_train.npy', 'wb'), bottleneck_features_train)

hold_out = test_d_gen.flow_from_directory('/home/sambozek/Desktop/Better_Data/Hold_Out/',  # Validation: rescale only, no augmentation
                                          target_size = (150, 150),
                                          batch_size = 25,
                                          class_mode='binary')

# bottleneck_features_validation = model.predict_generator(hold_out, 1000)
# np.save(open('bottleneck_features_validation.npy', 'wb'), bottleneck_features_validation)

Found 500 images belonging to 2 classes.
Found 500 images belonging to 2 classes.


To avoid the `Epoch comprised more than 'samples_per_epoch' samples` warning, make sure that `samples_per_epoch` (and `nb_val_samples`) is an exact multiple of the batch size; here 125 = 5 × 25 and 250 = 10 × 25.
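A quick pre-flight check for the message quoted above: this sketch (a hypothetical helper, not part of Keras) confirms that a sample count is an exact multiple of `batch_size` and reports how many batches each epoch will draw. The numbers are the ones used in the `fit_generator` call that follows.

```python
def batches_per_epoch(n_samples, batch_size):
    """Return how many full batches cover n_samples, or raise if it does not divide evenly."""
    if n_samples % batch_size != 0:
        raise ValueError(
            "%d samples is not a multiple of batch_size=%d; try e.g. %d"
            % (n_samples, batch_size, (n_samples // batch_size + 1) * batch_size)
        )
    return n_samples // batch_size


batch_size = 25
print(batches_per_epoch(125, batch_size))  # samples_per_epoch -> 5
print(batches_per_epoch(250, batch_size))  # nb_val_samples -> 10
```

Failing loudly before a long training run is cheaper than discovering a mismatch after hundreds of epochs.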

In [30]:
model.fit_generator(monet_gen,                  # Training batches from the 'Test' directory
                   samples_per_epoch=125,      # How many samples to use per epoch
                   nb_epoch=500,                 # Number of epochs used
                   validation_data = hold_out,  # Location of the validation data
                   nb_val_samples=250,          # Validation samples used per epoch
                   verbose=1)

model.save_weights('monet_weights.h5', overwrite=True)

Epoch 1/500
...
Epoch 53/500

KeyboardInterrupt: 

In [None]:
train_data = np.load('bottleneck_features_train.npy')
# the labels below assume the features were saved class by class (generator built with shuffle=False)
train_labels = np.array([0] * 750 + [1] * 750)

validation_data = np.load('bottleneck_features_validation.npy')
validation_labels = np.array([0] * 500 + [1] * 500)

model = Sequential()
model.add(Flatten(input_shape=train_data.shape[1:]))  # the first layer must be told the feature shape
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(train_data, train_labels,
          nb_epoch=2500, batch_size=32,
          validation_data=(validation_data, validation_labels))
model.save_weights('bottleneck_fc_model.h5')
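The label arrays above only line up with the saved features if the images were read class by class. `flow_from_directory` assigns class indices by sorting the subdirectory names, and with `shuffle=False` it yields the files one class after another. A small pure-Python sketch of rebuilding the labels under that assumption; the class folder names and counts are placeholders, not the actual dataset layout.

```python
def labels_in_directory_order(counts_by_class):
    """Rebuild 0/1/... labels for features saved class by class in sorted folder order.

    counts_by_class maps a class folder name to the number of images in it.
    Class indices follow the sorted folder names, mirroring flow_from_directory.
    """
    labels = []
    for index, name in enumerate(sorted(counts_by_class)):
        labels.extend([index] * counts_by_class[name])
    return labels


# Hypothetical folder names; 'monet' sorts before 'not_monet', so Monet -> 0
labels = labels_in_directory_order({"not_monet": 750, "monet": 750})
print(labels[0], labels[-1], len(labels))  # 0 1 1500
```

Deriving the labels from the folder names, rather than hard-coding `[0] * 750 + [1] * 750`, keeps them correct if the class counts change.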

In [None]:
from keras.utils.visualize_util import plot
import pydot
plot(model, to_file='model.png')

In [None]:
monet_model = model.to_json() # Serialize the model architecture (not the weights) to a JSON string

In [None]:
print(model.history)

* Test:
    * Monet:
    * Not Monet:
* Hold Out: **No Issues**
    * Monet:
    * Not Monet:

*When the model was first run, some image files caused this error:  
**IOError: image file is truncated (39 bytes not processed)**  
According to [StackOverflow](http://bit.ly/21oXdhF), the images were likely corrupted or had unneeded trailing bytes. To find the truncated files, I used [jpeginfo](https://github.com/tjko/jpeginfo) to write a text report analyzing each JPEG; reading that report with pandas makes it easy to isolate the problem files.*


!jpeginfo -c *.jpg >> Test_monet.txt
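Lines that carry an error message contain extra words, which is presumably why the `read_csv` calls further down need `error_bad_lines=False`. A sketch that sidesteps fixed columns by keeping only the first and last whitespace-separated fields of each `jpeginfo -c` line. It assumes file names contain no spaces and that each line ends with a bracketed status such as `[OK]` or `[ERROR]`, as the `'[ERROR]'` comparison later in the notebook suggests; flagging `[WARNING]` too is an assumption, since truncated files may be reported as warnings. The sample lines are made up for illustration.

```python
def flagged_files(report_lines, bad_statuses=("[ERROR]", "[WARNING]")):
    """Return file names whose jpeginfo status (last field on the line) is in bad_statuses.

    Assumes: the first field is the file name (no spaces) and the last field is the status.
    """
    flagged = []
    for line in report_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, status = fields[0], fields[-1]
        if status in bad_statuses:
            flagged.append(name)
    return flagged


# Made-up sample lines for illustration only
sample = [
    "good.jpg  640 x 480  24bit JFIF  N  51234  [OK]",
    "broken.jpg  800 x 600  24bit JFIF  N  20480  Premature end of JPEG file  [WARNING]",
    "",
]
print(flagged_files(sample))  # ['broken.jpg']
```

On the real report, the same function can be applied to the open file, e.g. `flagged_files(open('Test_monet.txt'))`, since iterating a file yields its lines.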

In [None]:
import pandas as pd
import numpy as np

In [None]:
# test_monet = pd.read_csv("/Users/sebozek/Desktop/Test.txt",delim_whitespace=True,header=None)
# test_not_monet = pd.read_csv("/Users/sebozek/Desktop/Test.txt",delim_whitespace=True,header=None)
# val_monet = pd.read_csv("/Users/sebozek/Desktop/HoldOut.txt",delim_whitespace=True,header=None, error_bad_lines=False)
# val_not_monet = pd.read_csv("/Users/sebozek/Desktop/HoldOut_nm.txt", delim_whitespace=True, header=None,  error_bad_lines=False)

In [None]:
# dfs = [test_monet, test_not_monet, val_monet, val_not_monet]

# # More Descriptive Labels
# for df in dfs:
#     df.columns = ["img_name", "width", "x", 'height', 'bits', 'format', 'N/P', 'pixels', 'test']

In [None]:
# for df in dfs:
#     print(df.test.unique())

# print(val_not_monet.test.unique())

In [None]:
# val_not_monet['img_name'][val_not_monet.test == '[ERROR]'].values

## To Do

* Expand to try categorizing styles?
* Figure out a few hatches
* Try to get above 95% accuracy
* Experiment with different loss functions
* Vary the dropout rate and see how the results change
* Validation
    * ROC curve
    * Confusion Matrix
* What does the model think is important?
* Restricted Boltzmann Machine for Monet Replication
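For the confusion-matrix item, a minimal pure-Python sketch that thresholds sigmoid outputs at 0.5 and counts the four outcomes. It assumes label 1 is the positive class and that `y_prob` holds model probabilities (for example from `model.predict`); the numbers in the example are invented.

```python
def confusion_matrix(y_true, y_prob, threshold=0.5):
    """Count (tn, fp, fn, tp) for binary labels after thresholding probabilities."""
    tn = fp = fn = tp = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


tn, fp, fn, tp = confusion_matrix([0, 0, 1, 1, 1], [0.1, 0.7, 0.8, 0.4, 0.9])
print("tn=%d fp=%d fn=%d tp=%d" % (tn, fp, fn, tp))  # tn=1 fp=1 fn=1 tp=2
print("accuracy = %.2f" % ((tn + tp) / float(tn + fp + fn + tp)))  # accuracy = 0.60
```

Sweeping `threshold` from 0 to 1 and recording the true- and false-positive rates at each step gives the points of the ROC curve.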

In [None]:
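from keras import backend as K

# Import previous model (that worked)
model.load_weights('my_model_weights.h5')

# Build a function mapping the model input to the output of layer 1,
# so intermediate activations can be inspected
output_1 = K.function([model.layers[0].input],
                      [model.layers[1].output])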

In [None]:
import json

# model.to_json() already returns a JSON string, so write it directly
with open('model.txt', 'w') as outfile:
    outfile.write(model.to_json())

In [None]:
test_not_monet.head(1)