# Hotdog or Not Hotdog?

---

 Context:
 https://www.youtube.com/watch?v=pqTntG1RXSY


## Summary

The task is to predict whether an image shows a hotdog or not, using a logistic regression classifier. We compare two sets of input features: raw pixel values at fixed positions, and the embedding produced by a pre-trained neural network.

Our dataset was provided by DanB on Kaggle: https://www.kaggle.com/dansbecker/hot-dog-not-hot-dog/download 


## Part 1: Pixel Positional Logistic Regression

We start by setting this up with a basic one-dimensional array per image, built from rescaled 16 x 16 grayscale images.

## Part 2: Neural Network Embedding Logistic Regression

Next, we try the same regression, but start by feeding 150 x 150 colour images into InceptionV3, a network pre-trained on ImageNet. We run the logistic regression on the embedded output from the neural net.

---
<center><h5> Part 1: Pixel Positional Logistic Regression </h5></center>
---
### Imports
Start by importing the required libraries:

In [6]:
from matplotlib import pyplot as plt
from matplotlib import cm
import numpy as np
from skimage.io import imread
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

import cv2

---
### Loading Data

Start by heading to the source dataset on Kaggle, and downloading the zip file. 

Then unzip the file and relocate it to the same directory as your code.

Once that's done, you can load the dataset with sklearn. Because the content is not text, we set load_content to False, so sklearn only records the file paths and targets instead of reading every file into memory.

*NB: The dataset separates a training folder and a test folder by default. I've moved all the .jpgs into the relevant folder within train, and discarded the test folder.*

In [8]:
# Load the Folder with sklearn. 
folder = datasets.load_files("./hot-dog-not-hot-dog/seefood/train",
                                     load_content=False
                                    )

{'filenames': array(['./hot-dog-not-hot-dog/seefood/train/hot_dog/1308699.jpg',
       './hot-dog-not-hot-dog/seefood/train/not_hot_dog/102037.jpg',
       './hot-dog-not-hot-dog/seefood/train/not_hot_dog/126784.jpg',
       './hot-dog-not-hot-dog/seefood/train/not_hot_dog/779193.jpg',
       './hot-dog-not-hot-dog/seefood/train/not_hot_dog/181579.jpg',
       './hot-dog-not-hot-dog/seefood/train/hot_dog/1059903.jpg',
       './hot-dog-not-hot-dog/seefood/train/not_hot_dog/160301.jpg',
       './hot-dog-not-hot-dog/seefood/train/not_hot_dog/96935.jpg',
       './hot-dog-not-hot-dog/seefood/train/hot_dog/3758438.jpg',
       './hot-dog-not-hot-dog/seefood/train/hot_dog/3742819.jpg',
       './hot-dog-not-hot-dog/seefood/train/hot_dog/971944.jpg',
       './hot-dog-not-hot-dog/seefood/train/not_hot_dog/100148.jpg',
       './hot-dog-not-hot-dog/seefood/train/not_hot_dog/4176.jpg',
       './hot-dog-not-hot-dog/seefood/train/not_hot_dog/130766.jpg',
       './hot-dog-not-hot-dog/seefood/t
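
As a rough illustration of what `load_files` returns when `load_content=False`, the sketch below builds a throwaway folder with the same two sub-folder names and loads it. The temporary directory and the empty `a.jpg` / `b.jpg` files are made up for the example. Targets are indices into `target_names`, which sklearn takes from the sorted sub-folder names.

```python
import os
import tempfile
from sklearn.datasets import load_files

# Build a tiny stand-in folder tree (hypothetical files, not the real dataset).
with tempfile.TemporaryDirectory() as root:
    for label, name in [("hot_dog", "a.jpg"), ("not_hot_dog", "b.jpg")]:
        os.makedirs(os.path.join(root, label))
        open(os.path.join(root, label, name), "wb").close()

    # Only paths and targets are recorded; file contents are never read.
    demo = load_files(root, load_content=False)

# Map each file name back to its label via the integer target.
label_of = {
    os.path.basename(path): demo.target_names[target]
    for path, target in zip(demo.filenames, demo.target)
}

print(demo.target_names)  # ['hot_dog', 'not_hot_dog']
print(label_of)
```

So with this folder layout, a target of 0 means *hot_dog* and a target of 1 means *not_hot_dog*.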

---
### Setting up the data

Next, we set up our data in a usable numpy array format.

**IMAGE_SIZE** is the dimensions of the image we are resizing to (we're starting with 16)

Our **xs** will be a 2d numpy array, which contains: 

       - 498 arrays (the total number of images)
       - 256 integers / array (the pixels for each image)
       
Our **ys** will be a 1d numpy array, which contains:

       - 498 integers (0 or 1) representing whether the target for each image is *hotdog* or *not hotdog*.

In [10]:
IMAGE_SIZE = 16

# Setup arrays
xs = np.zeros((len(folder.filenames), IMAGE_SIZE ** 2))
ys = np.array(folder.target)

print(xs.shape)  # Expected: (498, 256)
print(ys.shape)  # Expected: (498,)

(498, 256)
(498,)


---
### Formatting the images and adding to arrays

Now we iterate over all the files in the loaded folder.

For each file, we use scikit-image (`imread` with `as_gray=True`) to read the image as grayscale, then OpenCV (cv2) to resize it to 16 x 16.

**lowresim** is a 2d (16 x 16) array, so we flatten it into 256 values using the reshape method.

Finally, we store the flat array at row **i** of **xs**.

In [11]:

# Iterate over the loaded folder and format images
for i, file in enumerate(folder.filenames):   
    im = imread(file, as_gray=True)
    lowresim = cv2.resize(im, dsize=(IMAGE_SIZE,IMAGE_SIZE), interpolation=cv2.INTER_CUBIC)
    reshaped = lowresim.reshape(IMAGE_SIZE ** 2)
    xs[i] = reshaped
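
To make the flattening step concrete, here is a minimal sketch on a made-up 16 x 16 array (not a real image). It shows that each flat index corresponds to one fixed (row, column) position, which is why these features are tied to pixel location.

```python
import numpy as np

IMAGE_SIZE = 16

# Stand-in for a resized grayscale image: each cell holds its own flat index.
demo_image = np.arange(IMAGE_SIZE ** 2).reshape(IMAGE_SIZE, IMAGE_SIZE)

# Same reshape call as in the loop above.
flat = demo_image.reshape(IMAGE_SIZE ** 2)

# Pixel (row, col) always lands at flat index row * IMAGE_SIZE + col.
row, col = 3, 5
print(flat.shape)                                            # (256,)
print(flat[row * IMAGE_SIZE + col] == demo_image[row, col])  # True
```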

---
### Splitting the training data and testing data

It's important that we separate our training data from the testing data, so that we leave some unseen images to test with. We'll do this with sklearn's train_test_split.

We keep x_train (the images) and y_train (the classification targets) separate from x_test and y_test. In this case we're holding back 20% of the data for testing.

*NB: Setting random_state fixes the seed used for shuffling, so the same split is produced on every run.*

In [12]:

# Split training and testing
x_train, x_test, y_train, y_test = train_test_split(xs, ys, test_size=0.20, random_state=0)

print(x_train.shape)  # Expected: (398, 256)


(398, 256)
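
The effect of random_state can be checked with a small sketch on placeholder data of the same length (498 rows). The arrays below are made up and only stand in for **xs** and **ys**.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and targets with the same number of rows as the dataset.
X = np.arange(498 * 2).reshape(498, 2)
y = np.arange(498) % 2

# Two splits with the same seed.
a_train, a_test, _, _ = train_test_split(X, y, test_size=0.20, random_state=0)
b_train, b_test, _, _ = train_test_split(X, y, test_size=0.20, random_state=0)

print(a_train.shape, a_test.shape)       # (398, 2) (100, 2)
print(np.array_equal(a_train, b_train))  # True
```

Because the test size is rounded up (20% of 498 is 99.6), 100 images go to testing and 398 to training.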


---
### Training the model

Time for the magic to happen.

More on sklearn's LogisticRegression here: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html 

In [13]:

# Train the model
logisticRegr = LogisticRegression(solver = 'lbfgs')
logisticRegr.fit(x_train, y_train)


LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

---
### Verify the score of our Model

Finally, we compute a **score** by using the model to classify the test images and comparing its predictions against the test targets.

Given the complexity and noise of the images, we get a pretty dismal score of 0.56, only 6 percentage points better than the 0.50 of a coin flip.

In [14]:

# Check the score
score = logisticRegr.score(x_test, y_test)

print(score)

0.56
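
For a scikit-learn classifier, `score` is the mean accuracy: the fraction of test images whose predicted label matches the target. The sketch below reproduces that calculation by hand on made-up labels and predictions (hypothetical values, not taken from this dataset) and compares it with the accuracy of always guessing the most common class.

```python
import numpy as np

# Hypothetical targets and predictions, for illustration only.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 1, 1, 1])

# Mean accuracy: share of predictions that match the target.
accuracy = np.mean(y_pred == y_true)

# Baseline: accuracy of always predicting the most frequent class.
majority_baseline = np.bincount(y_true).max() / len(y_true)

print(accuracy)           # 0.7
print(majority_baseline)  # 0.6
```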


---
<center><h5> Part 2: Neural Network Embedding Logistic Regression </h5></center>
---
Based on the output above, the main difficulty is that pixel positions alone carry too little information about what is in the photo. The problems include:
    
        - The background noise in the photo
        - The positioning and angle of the hotdogs

So, rather than relying on the grayscale value of each pixel, we can feed our images into a pre-trained neural network and take the network's output as our features. This should give us values that are less dependent on position, with better information on the content of the photo.
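
The position problem can be illustrated with a toy example. The sketch below uses a made-up 16 x 16 image with a single bright patch, moves the patch, and compares the flattened pixels with a crude position-agnostic summary (global mean and max). The summary is only a stand-in to show the idea. It is not what the pre-trained network computes.

```python
import numpy as np

# Hypothetical 16 x 16 "image": a bright 4 x 4 patch on a dark background.
image = np.zeros((16, 16))
image[2:6, 2:6] = 1.0

# The same patch moved 8 pixels down and 8 pixels right.
shifted = np.roll(image, shift=(8, 8), axis=(0, 1))

# Flattened pixel features: count positions that are bright in both versions.
flat_a = image.reshape(-1)
flat_b = shifted.reshape(-1)
overlap = int(np.sum((flat_a > 0) & (flat_b > 0)))

# A crude position-agnostic summary is identical for both versions.
summary_a = (image.mean(), image.max())
summary_b = (shifted.mean(), shifted.max())

print(overlap)                 # 0
print(summary_a == summary_b)  # True
```

To a model reading raw pixels the two versions share no active features, even though they show the same object.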

---
### Import the Neural Network libraries

In [15]:
from keras.layers import Dropout, Flatten, Dense, BatchNormalization, Activation
from keras.models import Model
from keras import optimizers
from keras.applications import inception_v3

import tensorflow as tf

Using TensorFlow backend.


---
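### Set up the pre-trained network

ImageNet is a large dataset of labelled images; the version commonly used for pre-training covers 1000 classes. Keras provides InceptionV3 with weights pre-trained on it. Setting include_top=False drops the final classification layers, so the network outputs feature maps rather than class scores.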

http://www.image-net.org/

In [16]:
# Initialize inception model
inception = inception_v3.InceptionV3(
    weights='imagenet',
    include_top=False,
    input_shape=(150, 150, 3)
)

---
### Set up our Model

Now, we set up a model which will feed our images into the pre-trained InceptionV3 network, and output a flattened array based on the output of the network.

In [17]:
# Setup our Model
x = Flatten()(inception.output)
model = Model(inputs=inception.input, outputs=x)



---
### Set up the data for the network

We'll take the same code as above, but refactor it to provide a larger 150 x 150 colour image.

In this case, each pixel will be an array of 3 values (RGB). We'll also need to rescale the pixel values so that each is a number between 0 and 1, rather than 0 and 255.

In [18]:
IMAGE_SIZE = 150

# Setup array for images
xs = np.zeros((len(folder.filenames), IMAGE_SIZE, IMAGE_SIZE, 3))

# Setup array for targets
ys = np.array(folder.target)

# Iterate over the loaded folder and format images
for i, file in enumerate(folder.filenames):   
    im = imread(file)
    lowresim = cv2.resize(im, dsize=(IMAGE_SIZE,IMAGE_SIZE),
                          interpolation=cv2.INTER_CUBIC)
    
    reshaped = np.divide(lowresim, 255.0) 
    xs[i] = reshaped
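
The rescaling step can be checked in isolation. This is a minimal sketch on a made-up row of 8-bit pixel values rather than a real image.

```python
import numpy as np

# Hypothetical 8-bit pixel values covering the full range.
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# Same call as in the loop above: divide by 255.0 to map into [0, 1].
scaled = np.divide(pixels, 255.0)

print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
```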



---
### Retrieve the Outputs from the Neural Net

We can now feed all of the images in **xs** into our model, and store the results in an array called **embeddings**.

**embeddings** will be a 2d array containing 498 flattened network outputs, each 18,432 numbers long.

In [19]:
# Run the images through the neural network
embeddings = model.predict(xs)

print(embeddings.shape)  # Expected: (498, 18432)
print(embeddings[0])

(498, 18432)
[0.57136744 1.3166616  0.         ... 3.3267553  0.         0.14513326]
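
The length of 18,432 is what you get from flattening a 3 x 3 x 2048 feature map. The sketch below assumes that this is the shape of the final feature map for a 150 x 150 x 3 input; it can be confirmed by printing `inception.output_shape` in the notebook.

```python
import numpy as np

# Assumed shape of the final feature map for a 150 x 150 x 3 input.
feature_map_shape = (3, 3, 2048)

# Flatten() multiplies the dimensions together into one long vector.
flat_length = int(np.prod(feature_map_shape))

print(flat_length)  # 18432
```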


---
### Split the training data and testing data

Same as in Part 1, but this time using the embedding results rather than the pixel values. 

In [20]:
# Split training and testing
x_train, x_test, y_train, y_test = train_test_split(embeddings, ys, test_size=0.20, random_state=0)

# print(x_train.shape)

---
### Train the Model and Verify Score

We'll use the same logistic regression as in Part 1, and check the score the same way.

Expected Output: 0.89

*33 percentage points more accurate than the method used in Part 1!*

In [21]:
# Train the model
logisticRegr = LogisticRegression(solver = 'lbfgs')
logisticRegr.fit(x_train, y_train)


# Check the score
score = logisticRegr.score(x_test, y_test)

print(score)

0.89
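
The improvement can be expressed either as a difference in percentage points or as a relative gain. The short calculation below uses the two scores reported in this notebook.

```python
# Scores reported above for Part 1 (pixels) and Part 2 (embeddings).
pixel_score = 0.56
embedding_score = 0.89

# Absolute difference, in percentage points.
point_gain = round((embedding_score - pixel_score) * 100)

# Relative improvement over the Part 1 score, in percent.
relative_gain = round((embedding_score / pixel_score - 1) * 100)

print(point_gain)     # 33
print(relative_gain)  # 59
```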




---
<center><h5> Outcomes </h5></center>
---

Because the embedding from the neural network provides a more contextually aware description of the contents of the image (for example, it is less sensitive to positioning), our result is dramatically improved by categorizing the images based on the embedding, rather than on location-dependent pixel values.

One caveat: the results may be skewed by the nature of the pre-training data, as ImageNet includes 'hotdog' as one of its 1000 categories, and there is no way to exclude this from the pre-trained weights used.
