# Problem 3

The goal of this problem is to build a convolutional neural network (CNN) that classifies images as hot dog or not hot dog. This is the same task featured in the HBO TV show "Silicon Valley" (https://www.youtube.com/watch?v=pqTntG1RXSY). We'll use a dataset assembled by a Kaggle user (https://www.kaggle.com/dansbecker/hot-dog-not-hot-dog), which contains 498 training images and 500 test images.

There are two parts to this assignment:

1. A simple CNN is given below. Because of the small sample size, it has a very poor test set accuracy (around 55\%). Your task is to build a CNN that beats this by a large margin (at least 70\% test set accuracy).
2. Describe 3 changes that you made beyond what is given in this notebook and explain what effect they had on the test set accuracy (see below for more instructions).

### Submission

Submit this completed and executed notebook on Quercus that shows your best test set accuracy. We will run a friendly competition in class to see who can achieve the best test set accuracy (for bonus points, bragging rights and a small prize).


# Student Info

### Name: Cole Shulman
### Student Number: 1004021408

# Imports

In [10]:
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
# preprocess_input scales pixel values to [-1, 1], as InceptionResNetV2 expects
from keras.applications.inception_resnet_v2 import InceptionResNetV2, preprocess_input

from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K


# Loading Hotdog-Not-Hotdog Dataset 

In [8]:
# Download files
!wget https://briankeng.com/files/hotdog.tar.gz
!tar -xvzf hotdog.tar.gz

--2022-03-11 21:55:59--  https://briankeng.com/files/hotdog.tar.gz
Resolving briankeng.com (briankeng.com)... 192.0.78.156, 192.0.78.240
Connecting to briankeng.com (briankeng.com)|192.0.78.156|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 46732258 (45M) [application/octet-stream]
Saving to: ‘hotdog.tar.gz.2’


2022-03-11 21:56:00 (66.5 MB/s) - ‘hotdog.tar.gz.2’ saved [46732258/46732258]

hotdog/
hotdog/test/
hotdog/test/hot_dog/
hotdog/test/hot_dog/324507.jpg
hotdog/test/hot_dog/800992.jpg
hotdog/test/hot_dog/716049.jpg
hotdog/test/hot_dog/588881.jpg
hotdog/test/hot_dog/570799.jpg
hotdog/test/hot_dog/838604.jpg
hotdog/test/hot_dog/315220.jpg
hotdog/test/hot_dog/612440.jpg
hotdog/test/hot_dog/250715.jpg
hotdog/test/hot_dog/292683.jpg
hotdog/test/hot_dog/291354.jpg
hotdog/test/hot_dog/380963.jpg
hotdog/test/hot_dog/533521.jpg
hotdog/test/hot_dog/558890.jpg
hotdog/test/hot_dog/408504.jpg
hotdog/test/hot_dog/201986.jpg
hotdog/test/hot_dog/382188.jpg
hotdog/test

In [9]:
# Re-scaled dimensions of our images.
img_width, img_height = 299, 299

train_data_dir = 'hotdog/train'
test_data_dir = 'hotdog/test'

if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)
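
To make the two data layouts concrete, here is a small sketch in plain NumPy, independent of Keras. It builds a dummy image in `channels_last` order and transposes it to `channels_first`. The array `dummy_image` is a hypothetical stand-in for a decoded photo and is not part of the assignment. With the TensorFlow backend the default layout is `channels_last`, so the `else` branch above is the one normally taken.

```python
import numpy as np

img_width, img_height = 299, 299

# Hypothetical stand-in for one decoded RGB image in channels_last order
dummy_image = np.zeros((img_width, img_height, 3), dtype=np.float32)

# Move the channel axis to the front to obtain channels_first order
channels_first_image = np.transpose(dummy_image, (2, 0, 1))

print(dummy_image.shape)           # (299, 299, 3)
print(channels_first_image.shape)  # (3, 299, 299)
```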

# Frozen Layers in a Pre-Trained Model

In [11]:
# Load the ImageNet-pretrained convolutional base without its classification head
incres_base = InceptionResNetV2(include_top=False, weights="imagenet")
incres_base

<keras.engine.functional.Functional at 0x7ff07d77c190>

In [12]:
from keras.layers import GlobalAveragePooling2D

def mymodel():
    model = Sequential()
    model.add(incres_base)
    # Average each of the 1536 feature maps over its spatial dimensions
    model.add(GlobalAveragePooling2D())
    # Redundant after global pooling (the output is already 2-D), but harmless
    model.add(Flatten())
    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(0.7))
    model.add(Dense(1, activation='sigmoid'))
    # Freeze layers in the base model (i.e. only train the classifier)
    for layer in incres_base.layers:
        layer.trainable = False

    model.compile(loss='binary_crossentropy',
                  optimizer=keras.optimizers.Adam(learning_rate=0.1),
                  metrics=['accuracy'])

    return model

# Test function
mymodel().summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 inception_resnet_v2 (Functi  (None, None, None, 1536)  54336736 
 onal)                                                           
                                                                 
 global_average_pooling2d_1   (None, 1536)             0         
 (GlobalAveragePooling2D)                                        
                                                                 
 flatten_1 (Flatten)         (None, 1536)              0         
                                                                 
 dense_2 (Dense)             (None, 1024)              1573888   
                                                                 
 dropout_1 (Dropout)         (None, 1024)              0         
                                                                 
 dense_3 (Dense)             (None, 1)                
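
The output shapes and the `dense_2` parameter count in this summary can be checked by hand. The sketch below uses plain NumPy on a hypothetical feature map. The 8x8 spatial size and the batch of 2 are assumptions for illustration only, since the summary reports the spatial dimensions as `None`. It shows that global average pooling reduces each feature map to one number, that flattening the pooled result changes nothing, and that a dense layer has `inputs * units` weights plus `units` biases.

```python
import numpy as np

# Hypothetical base-model output: batch of 2, 8x8 spatial grid, 1536 channels
features = np.random.rand(2, 8, 8, 1536).astype(np.float32)

# Global average pooling: mean over the two spatial axes
pooled = features.mean(axis=(1, 2))
print(pooled.shape)  # (2, 1536)

# Flattening an already 2-D (batch, features) array leaves the shape unchanged
flattened = pooled.reshape(pooled.shape[0], -1)
print(flattened.shape)  # (2, 1536)

# Dense layer parameters = inputs * units (weights) + units (biases)
dense_2_params = 1536 * 1024 + 1024
print(dense_2_params)  # 1573888
```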



### Loading data on the fly

We load the data directly from the images on disk via the Keras helpers `ImageDataGenerator` and `flow_from_directory`. Two transformations are applied:

* Rescaling pixels to the range [-1, 1] via `preprocess_input`
* Resizing images to `img_width`x`img_height` (299x299)

During training, the images for each batch are read from disk on the fly, loaded into memory, and then transformed.
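
As a rough illustration of the pixel rescaling, here is a NumPy sketch of the arithmetic that `preprocess_input` applies for the Inception and MobileNet model families. The generators in this notebook pass `preprocess_input` rather than a `rescale` factor, so pixel values end up in [-1, 1]. The function name `scale_to_unit_range` is hypothetical; the notebook calls the Keras function directly.

```python
import numpy as np

def scale_to_unit_range(pixels):
    """Map pixel values from [0, 255] to [-1, 1] (hypothetical helper)."""
    return np.asarray(pixels, dtype=np.float32) / 127.5 - 1.0

sample = np.array([0, 127.5, 255])
scaled = scale_to_unit_range(sample)
print(scaled)  # [-1.  0.  1.]
```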

In [13]:
# You may optionally change these parameters
batch_size = 50
epochs = 10
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
test_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

# Data parameters (DO NOT MODIFY)
num_train_samples = 498
num_test_samples = 500

# Data generators (DO NOT MODIFY)
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary'
)

test_generator = test_datagen.flow_from_directory(
    test_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary'
)

Found 498 images belonging to 2 classes.
Found 500 images belonging to 2 classes.
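
The sample counts and batch size determine how many batches the evaluation function draws from each generator. The following sketch works through that arithmetic with the values from the cell above. Variable names other than `batch_size`, `num_train_samples`, and `num_test_samples` are introduced here for illustration.

```python
import math

batch_size = 50
num_train_samples = 498
num_test_samples = 500

# Integer division, as used for steps_per_epoch and steps
train_steps = num_train_samples // batch_size
test_steps = num_test_samples // batch_size

# Batches needed to cover every training image once
batches_to_cover_train = math.ceil(num_train_samples / batch_size)
# Images left over after the full-size training batches
leftover_train_images = num_train_samples - train_steps * batch_size

print(train_steps, test_steps, batches_to_cover_train, leftover_train_images)
# 9 10 10 48
```

Ten test steps of 50 images cover all 500 test images. Nine training steps draw at most 450 images, so a single epoch does not quite cover all 498 training images.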


In [14]:
def evaluate_model(runs=5):
    ''' DO NOT MODIFY THIS FUNCTION '''
    scores = [] 
    for i in range(runs):
        print('Executing run %d' % (i+1))
        model = mymodel()
        model.fit_generator(train_generator,
                            callbacks=[],
                            steps_per_epoch=num_train_samples // batch_size,
                            epochs=epochs, verbose=0)
        print(' * Evaluating model on test set')
        scores.append(model.evaluate_generator(test_generator, 
                                               steps=num_test_samples // batch_size,
                                               verbose=0))
        print(' * Test set Loss: %.4f, Accuracy: %.4f' % (scores[-1][0], scores[-1][1]))
        
    accuracies = [score[1] for score in scores]     
    return np.mean(accuracies), np.std(accuracies)
        
mean_accuracy, std_accuracy = evaluate_model(runs=5)

Executing run 1
 * Evaluating model on test set
 * Test set Loss: 0.3047, Accuracy: 0.9260
Executing run 2
 * Evaluating model on test set
 * Test set Loss: 0.2801, Accuracy: 0.9320
Executing run 3
 * Evaluating model on test set
 * Test set Loss: 0.1681, Accuracy: 0.9240
Executing run 4
 * Evaluating model on test set
 * Test set Loss: 0.4196, Accuracy: 0.9320
Executing run 5
 * Evaluating model on test set
 * Test set Loss: 0.2593, Accuracy: 0.9360


In [15]:
# You will be evaluated on your mean test set accuracy over 5 runs
print('Mean test set accuracy over 5 runs: %.4f +/- %.4f' % (mean_accuracy, std_accuracy))

Mean test set accuracy over 5 runs: 0.9300 +/- 0.0044
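
The reported mean and spread can be reproduced from the five accuracies printed during the runs. This sketch copies those values by hand, so it works from numbers rounded to four decimals. It uses the same `np.mean` and `np.std` calls as `evaluate_model`.

```python
import numpy as np

# Accuracies copied from the five runs printed above
accuracies = [0.9260, 0.9320, 0.9240, 0.9320, 0.9360]

mean_accuracy = np.mean(accuracies)
std_accuracy = np.std(accuracies)  # population std (ddof=0), NumPy's default

print('%.4f +/- %.4f' % (mean_accuracy, std_accuracy))  # 0.9300 +/- 0.0044
```

Note that `np.std` defaults to the population standard deviation. Passing `ddof=1` would give the sample standard deviation, which is slightly larger for five runs.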
