**Building a strong image classification model from less data**

The implementation is a slight variation of the one in https://gist.github.com/fchollet/0830affa1f7f19fd47b06d4cf89ed44d

Mainly, this kernel uses the method flow(x,y), whereas the gist above uses the method flow_from_directory(directory).
For more information, refer to https://keras.io/preprocessing/image/

The change was made so that the kernel fits the way the data is structured on Kaggle. Corresponding changes were also made in other parts of the source code.
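The practical difference is where the batches come from. flow(x,y) draws batches from NumPy arrays already held in memory, while flow_from_directory(directory) reads image files from one sub-folder per class. Kaggle's train/ folder is flat, with the class encoded in the file name (cat.0.jpg, dog.0.jpg, ...), so there are no per-class sub-folders to point flow_from_directory at. The sketch below is a hypothetical NumPy-only helper, `iterate_batches`, that mimics only the shuffle-and-batch part of flow(x,y) (no augmentation, no rescaling); it is an illustration, not Keras's implementation.

```python
import itertools

import numpy as np


def iterate_batches(x, y, batch_size=64, shuffle=True, seed=None):
    """Yield (x_batch, y_batch) pairs from in-memory arrays, indefinitely.

    Hypothetical helper: mimics the batching behaviour of
    ImageDataGenerator.flow(x, y) without any augmentation.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rng = np.random.default_rng(seed)
    n = len(x)
    while True:  # keep producing batches, reshuffling at the start of every epoch
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield x[idx], y[idx]


# Usage: 10 blank 224x224 RGB images with one-hot labels, batches of 4.
x = np.zeros((10, 224, 224, 3), dtype=np.uint8)
y = np.eye(2)[np.arange(10) % 2]
batches = list(itertools.islice(iterate_batches(x, y, batch_size=4, seed=0), 3))
print([b[0].shape[0] for b in batches])  # [4, 4, 2]: the last batch of an epoch is smaller
```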

**Perform the necessary imports.**

In [None]:
import os, cv2, re, random
import numpy as np
import pandas as pd
from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing.image import img_to_array, load_img
from keras import layers, models, optimizers
from keras import backend as K
from sklearn.model_selection import train_test_split

**Data dimensions and paths**

In [None]:

TRAIN_DIR = '../input/dogs-vs-cats-redux-kernels-edition/train/'
TEST_DIR = '../input/dogs-vs-cats-redux-kernels-edition/test/'
train_images_dogs_cats = [TRAIN_DIR+i for i in os.listdir(TRAIN_DIR)] # use this for full dataset
test_images_dogs_cats = [TEST_DIR+i for i in os.listdir(TEST_DIR)]
NO_EPOCHS=10
RESNET_WEIGHTS_PATH = '../input/keras-pretrained-models/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5'


**Helper function to sort the image files based on the numeric value in each file name.**
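A common way to write such a helper is to split each name into text and digit runs and compare the digit runs as integers, so that cat.2.jpg sorts before cat.10.jpg. The names `atoi` and `natural_keys` below are illustrative, not taken from the original kernel; applied to the notebook it would be `train_images_dogs_cats.sort(key=natural_keys)`.

```python
import re


def atoi(text):
    """Turn a run of digits into an int; leave any other text unchanged."""
    return int(text) if text.isdigit() else text


def natural_keys(text):
    """Sort key that compares embedded numbers by value, not character by character."""
    # re.split with a capturing group keeps the digit runs in the result
    return [atoi(chunk) for chunk in re.split(r'(\d+)', text)]


files = ['cat.10.jpg', 'cat.2.jpg', 'cat.1.jpg']
print(sorted(files))                    # ['cat.1.jpg', 'cat.10.jpg', 'cat.2.jpg']
print(sorted(files, key=natural_keys))  # ['cat.1.jpg', 'cat.2.jpg', 'cat.10.jpg']
```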

In [None]:
len(train_images_dogs_cats)

**Sort the training set. Use 1300 images each of cats and dogs instead of all 25000 to speed up the learning process.**
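One way to take the subset, sketched under the assumption that training files are named `<class>.<number>.jpg`: sort each class by its image number and keep the first 1300 of each. The helper name `first_n_per_class` is hypothetical; in the notebook, `train_images_dogs_cats = first_n_per_class(train_images_dogs_cats, 1300)` would leave 2600 paths.

```python
import os


def first_n_per_class(paths, n_per_class=1300):
    """Return the n_per_class lowest-numbered cat images followed by the dog images.

    Hypothetical helper; assumes file names like 'cat.0.jpg' or 'dog.12499.jpg'.
    """
    def image_number(path):
        # 'cat.123.jpg' -> 123
        return int(os.path.basename(path).split('.')[1])

    subset = []
    for label in ('cat', 'dog'):
        same_class = [p for p in paths if os.path.basename(p).startswith(label + '.')]
        subset += sorted(same_class, key=image_number)[:n_per_class]
    return subset


# Usage with made-up paths: 3 cats and 3 dogs in arbitrary order, keep 2 of each.
paths = ['train/dog.5.jpg', 'train/cat.10.jpg', 'train/cat.2.jpg',
         'train/dog.1.jpg', 'train/cat.0.jpg', 'train/dog.30.jpg']
print(first_n_per_class(paths, n_per_class=2))
# ['train/cat.0.jpg', 'train/cat.2.jpg', 'train/dog.1.jpg', 'train/dog.5.jpg']
```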

**Sort the test set**
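Test files are named by their numeric id (1.jpg, 2.jpg, ...), and os.listdir makes no guarantee about ordering, so sorting by the integer id keeps the predictions in id order, which matters whenever ids are assigned by position. A minimal sketch; the function name is hypothetical and it would be used as `test_images_dogs_cats = sort_by_numeric_id(test_images_dogs_cats)`.

```python
import os


def sort_by_numeric_id(paths):
    """Sort test paths such as '.../test/10.jpg' by their integer id."""
    return sorted(paths, key=lambda p: int(os.path.splitext(os.path.basename(p))[0]))


paths = ['test/10.jpg', 'test/2.jpg', 'test/1.jpg']
print(sort_by_numeric_id(paths))  # ['test/1.jpg', 'test/2.jpg', 'test/10.jpg']
```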

In [None]:
len(train_images_dogs_cats)

In [None]:
train_images_dogs_cats

In [None]:
from sklearn.preprocessing import LabelEncoder

**Now the images have to be represented as numbers. For this, use the OpenCV library to read and resize each image.**

**Generate labels for the supervised learning set.**

**Below is the helper function to do so.**

In [None]:
from tqdm import tqdm

In [None]:
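def prepare_data(list_of_images):
    """
    Returns two lists:
        x is a list of images resized to 224x224 (BGR channel order, as read by OpenCV)
        y is a list of labels: 0 for cat, 1 for dog
    """
    x = []  # images as arrays
    y = []  # labels

    for image in tqdm(list_of_images):
        x.append(cv2.resize(cv2.imread(image), (224, 224), interpolation=cv2.INTER_CUBIC))
        # Training files are named like 'cat.123.jpg' or 'dog.456.jpg';
        # test files ('123.jpg') carry no label and simply fall through to 1
        if os.path.basename(image).startswith('cat'):
            y.append(0)
        else:
            y.append(1)

    return x, y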

**Generate X and Y using the helper function above**

**Since K.image_data_format() is 'channels_last', the input_shape of the first Keras layer will be (img_height, img_width, 3), where 3 is the number of color channels.**
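In the channels_last layout, stacking the per-image arrays gives a 4-D batch of shape (samples, height, width, channels), and input_shape is that shape without the sample axis. Note that cv2.resize takes its target size as (width, height) but returns an array of shape (height, width, channels); with 224x224 the two orders coincide. A small NumPy illustration with blank images, where only the shapes matter:

```python
import numpy as np

# Three blank 224x224 colour images, standing in for cv2.resize outputs
images = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(3)]

batch = np.array(images)  # stack into one array
print(batch.shape)        # (3, 224, 224, 3): (samples, height, width, channels)
print(batch.shape[1:])    # (224, 224, 3): the per-image input_shape

# The same batch in 'channels_first' layout would move the channel axis forward
batch_cf = np.transpose(batch, (0, 3, 1, 2))
print(batch_cf.shape)     # (3, 3, 224, 224)
```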

In [None]:
train_images_dogs_cats[0]

In [None]:
X ,Y= prepare_data(train_images_dogs_cats)
print(K.image_data_format())

In [None]:
X[0]

In [None]:
Y[0]

In [None]:
len(X)

In [None]:
print(type(X),type(Y))

In [None]:
X = np.array(X)
Y = np.array(Y)

In [None]:
X.shape

In [None]:
np.unique(Y,return_counts=True)

**Split the data set of 2600 images into two parts: a training set and a validation set. Later, you will see that accuracy and loss on the validation set are also reported while the model is fitted on the training set.**
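train_test_split shuffles at random, so the cat/dog ratio in each part can drift slightly from 50/50. One option is to pass the integer labels as `stratify`, which keeps the class ratio identical in both parts; in the notebook that would read `train_test_split(X, Y1, test_size=0.2, random_state=7, stratify=Y)`. A sketch on toy stand-in arrays, not the notebook's images:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for X and Y: 20 "images", 10 cats (0) and 10 dogs (1)
X = np.arange(20).reshape(20, 1)
Y = np.array([0] * 10 + [1] * 10)

# stratify=Y keeps the cat/dog ratio the same in the training and validation parts
X_train, X_val, Y_train, Y_val = train_test_split(
    X, Y, test_size=0.2, random_state=7, stratify=Y)

print(len(X_train), len(X_val))  # 16 4
print(np.bincount(Y_val))        # [2 2]
```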

In [None]:
print(len(X),len(Y))

In [None]:
Y

In [None]:
Y

In [None]:
from keras.utils import to_categorical
Y1 = to_categorical(Y)

In [None]:
Y[:10]

In [None]:
Y1[:10]

In [None]:
# Split the data into two sets: 80% for training, 20% for validation
X_train, X_val, Y_train, Y_val = train_test_split(X, Y1, test_size=0.2, random_state=7)

In [None]:
len(X_val)

In [None]:
nb_train_samples = len(X_train)
nb_validation_samples = len(X_val)
batch_size = 64

**We will use the Sequential model from Keras to build the neural network. The Sequential model is used to construct simple models made of a linear stack of layers.**

**More info on Sequential model and Keras in general at https://keras.io/getting-started/sequential-model-guide/ and https://github.com/keras-team/keras**
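The stack built below is ResNet-50 with global max pooling (2048 features per image), then Dropout, then a 2-unit Dense layer with softmax. At prediction time Dropout passes its input through unchanged, so the last step reduces to softmax(features @ W + b). A NumPy sketch of that computation with random, untrained weights (illustration only, not the trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the pooled ResNet-50 output: 4 images x 2048 features
features = rng.normal(size=(4, 2048))

# Randomly initialised weights for a Dense(2) layer (illustration only)
W = rng.normal(scale=0.01, size=(2048, 2))
b = np.zeros(2)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


probs = softmax(features @ W + b)  # what Dense(2, activation='softmax') computes
print(probs.shape)                 # (4, 2): one [p(cat), p(dog)] row per image
print(probs.sum(axis=1))           # each row sums to 1
```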

In [None]:
# Use the public tf.keras API rather than the private tensorflow.python.keras package
from tensorflow.keras.models import Sequential
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, Dropout

In [None]:
model = Sequential()
model.add(ResNet50(include_top=False, pooling='max', weights=RESNET_WEIGHTS_PATH))
model.add(Dropout(0.2))
model.add(Dense(2, activation='softmax'))
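# trainable=True fine-tunes the pretrained ResNet-50 weights together with the new head;
# set it to False to keep the pretrained weights fixed and train only the Dense layer
model.layers[0].trainable = True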

In [None]:
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

In [None]:
model.summary()

In [None]:
print(X.shape,X_train.shape,X_val.shape)

In [None]:
type(Y_val)

In [None]:
print(np.unique(Y_train,return_counts=True),np.unique(Y_val,return_counts=True))

In [None]:
train_model = model.fit(X_train, Y_train,
    batch_size=batch_size,
    epochs=NO_EPOCHS,
    verbose=1,
    validation_data=(X_val, Y_val))

In [None]:
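import matplotlib.pyplot as plt
hist = train_model.history
# Older Keras versions log 'acc'/'val_acc'; newer ones log 'accuracy'/'val_accuracy'
acc_key = 'acc' if 'acc' in hist else 'accuracy'
acc = hist[acc_key]
val_acc = hist['val_' + acc_key]
epoch = range(1, len(acc) + 1)
loss = hist['loss']
val_loss = hist['val_loss']
f, ax = plt.subplots(1, 2, figsize=(16, 8))
ax[0].plot(epoch, acc, 'b', label='Training Accuracy')
ax[0].plot(epoch, val_acc, 'r', label='Validation Accuracy')
ax[0].set_xlabel('Epoch')
ax[0].legend()
ax[1].plot(epoch, loss, 'b', label='Training Loss')
ax[1].plot(epoch, val_loss, 'r', label='Validation Loss')
ax[1].set_xlabel('Epoch')
ax[1].legend()
plt.show()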



**Saving a model in Keras is as simple as this!**

**It is quite helpful for reuse.**

In [None]:
import keras

In [None]:
model.save_weights('model_weights.h5')
model.save('model_keras.h5')

In [None]:
X,_=prepare_data(test_images_dogs_cats)

In [None]:
X = np.array(X)

In [None]:
y_test=model.predict(X,verbose=1)

In [None]:
import matplotlib.pyplot as plt

In [None]:
test_images_dogs_cats[0]

In [None]:
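f, ax = plt.subplots(1, 5, figsize=(10, 5))
for i, x in enumerate(test_images_dogs_cats[:5]):
    # OpenCV loads images as BGR; convert to RGB so matplotlib shows the true colors
    ax[i].imshow(cv2.cvtColor(cv2.imread(x), cv2.COLOR_BGR2RGB))
    ax[i].axis('off')
plt.show()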

In [None]:
y_test[:,1]

In [None]:
y_final=y_test[:,1]

In [None]:
# y_final=[0 if x[0]>x[1] else 1 for x in y_test ]

# y_final[:5]

In [None]:
len(test_images_dogs_cats)

In [None]:
len(X)

In [None]:
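# Take each id from its file name ('123.jpg' -> 123) so every row matches its Kaggle id,
# regardless of the order in which os.listdir returned the test files
ids = [int(os.path.splitext(os.path.basename(p))[0]) for p in test_images_dogs_cats]
df_test = pd.DataFrame({'id': ids, 'label': y_final})
df_test.to_csv('solution1.csv', index=False)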