<a href="https://colab.research.google.com/github/kpr-03/DeepLearning_TensorFlow/blob/main/O3_introduction_to_computer__vision_with_tensorflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Convolutional Neural Networks and Computer Vision with TensorFlow

Computer vision is the practice of writing algorithms that can discover patterns in visual data, such as the camera of a self-driving car recognizing the car in front.

## Get the data

Because convolutional neural networks work so well with images, to learn more about them, we're going to start with a dataset of images.

The images we're going to work with are from the [Food-101 dataset](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/), a collection of 101 different categories of 101,000 (1000 images per category) real-world images of food dishes.

To begin, we're only going to use two of the categories, pizza 🍕 and steak 🥩 and build a binary classifier.

> 🔑 **Note:** The data we're using has already been preprocessed (for example, the images have been moved into different subset folders). To see these preprocessing steps, check out [the preprocessing notebook](https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/extras/image_data_modification.ipynb).

We'll download the `pizza_steak` subset .zip file and unzip it.

In [None]:
import zipfile

!wget https://storage.googleapis.com/ztm_tf_course/food_vision/pizza_steak.zip

In [None]:
# unzip the downloaded file
zip_ref =zipfile.ZipFile("pizza_steak.zip")
zip_ref.extractall()
zip_ref.close()

## Inspect the data

A crucial step at the beginning of any machine learning project is becoming one with the data.

For a computer vision project, this usually means visualizing many samples of your data.


In [None]:
!ls pizza_steak

In [None]:
!ls pizza_steak/train/

In [None]:
!ls pizza_steak/train/steak

In [None]:
import os

# Walk through pizza_steak directory and list number of files
for dirpath, dirnames, filenames in os.walk("pizza_steak"):
  print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")

In [None]:
# the extra file in our pizza_steak directory is ".DS_store"
!ls -la pizza_steak

In [None]:
# Another way to find out how many images are in a directory
num_steak_images_train= len(os.listdir("pizza_steak/train/steak"))
num_steak_images_train

To visualize our images, first let's get the class names programmatically.

In [None]:
# Get the class names programmatically
import pathlib
import numpy as np
data_dir = pathlib.Path("pizza_steak/train")
class_names = np.array(sorted([item.name for item in data_dir.glob("*")])) # create a list of class names from the subdirectories
print(class_names)
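
`glob("*")` returns every entry in the folder, files included, so a stray file such as `.DS_Store` inside `train/` would end up in `class_names`. As a minimal sketch (the helper name `get_class_names` and the temporary demo folder are made up for illustration, they're not part of the dataset), we could keep directories only:

```python
import pathlib
import tempfile

def get_class_names(data_dir):
    """Return the sorted names of the immediate subdirectories of data_dir."""
    data_dir = pathlib.Path(data_dir)
    # Keep directories only, so stray files such as ".DS_Store" are ignored
    return sorted(item.name for item in data_dir.iterdir() if item.is_dir())

# Demonstrate on a throwaway folder that mimics the train/ layout
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "pizza").mkdir()
    (root / "steak").mkdir()
    (root / ".DS_Store").touch()  # hypothetical stray file
    demo_class_names = get_class_names(root)

print(demo_class_names)  # ['pizza', 'steak']
```

On the real data this would be called as `get_class_names("pizza_steak/train")`.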

In [None]:
# Let's visualize our images
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import random

def view_random_image(target_dir, target_class):
  # Set up the target directory (we'll view images from here)
  target_folder = os.path.join(target_dir, target_class)
  # Get a random image filename
  random_image = random.choice(os.listdir(target_folder))
  print(random_image)

  # Read in the image and plot it using matplotlib
  img = mpimg.imread(os.path.join(target_folder, random_image))
  plt.imshow(img)
  plt.title(target_class)
  plt.axis("off")

  print(f"Image shape: {img.shape}") # show the shape of the image

  return img


In [None]:
# view a random image from training data set
img= view_random_image(target_dir="pizza_steak/train/",
                       target_class="pizza")

In [None]:
import tensorflow as tf
tf.constant(img)

In [None]:
# View the image shape
img.shape # returns (height, width, colour channels)

In [None]:
# Get all the pixel values between 0 & 1
img/255.

## An End-to-End Example
Let's build a convolutional neural network to find patterns in our images. More specifically, we need a way to:
* Load our images
* Preprocess our images
* Build a CNN to find patterns in our images
* Compile our CNN
* Fit the CNN to our training data

In [None]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# set the random seed
tf.random.set_seed(42)

# Preprocessing data(get all of the pixel values between 0 & 1,also called scaling/normalization)
train_datagen = ImageDataGenerator(rescale=1./255)
valid_datagen = ImageDataGenerator(rescale= 1./255)

# Set up path to our data directories
train_dir = "pizza_steak/train"
test_dir = "pizza_steak/test"

# Import data from directories and turn it into batches
train_data = train_datagen.flow_from_directory(directory=train_dir,
                                               batch_size = 32,
                                               target_size=(224,224),
                                               class_mode = "binary",
                                               seed=42)

valid_data =valid_datagen.flow_from_directory(directory=test_dir,
                                               batch_size = 32,
                                               target_size=(224,224),
                                               class_mode = "binary",
                                               seed=42)

# Build a CNN model(same as the TINY VGG on the CNN explainer website)
model_1= tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters=10,
                           kernel_size=3,
                           activation="relu",
                           input_shape=(224,224,3)),
    tf.keras.layers.Conv2D(10,3,activation="relu"),
    tf.keras.layers.MaxPool2D(pool_size=2,
                              padding="valid"),
    tf.keras.layers.Conv2D(10,3,activation="relu"),
    tf.keras.layers.Conv2D(10,3,activation="relu"),
    tf.keras.layers.MaxPool2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1,activation="sigmoid")
])

# compile our CNN
model_1.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

# Fit the model
history_1 =model_1.fit(train_data,
                       epochs=5,
                       steps_per_epoch=len(train_data),
                       validation_data=valid_data,
                       validation_steps=len(valid_data))


In [None]:
# Get the model summary
model_1.summary()


## Using the same model as before

The model we're building is from the [Tensorflow Playground](
  https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.001&regularizationRate=0&noise=0&networkShape=4,2&seed=0.46097&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false).

In [None]:
# set random seed
tf.random.set_seed(42)

# Create a model to replicate the TensorFlow Playground model
model_2= tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(224,224,3)),
    tf.keras.layers.Dense(4,activation="relu"),
    tf.keras.layers.Dense(4,activation="relu"),
    tf.keras.layers.Dense(1,activation="sigmoid")
])

# Compile the model
model_2.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

#Fit the model
history_2= model_2.fit(train_data,
                       epochs=5,
                       steps_per_epoch=len(train_data),
                       validation_data=valid_data,
                       validation_steps=len(valid_data))


In [None]:
#get summary of model_2
model_2.summary()

Despite having roughly 20x more parameters than our CNN (`model_1`), `model_2` performs terribly. Let's try to improve it.
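
Where do these parameter counts come from? Here's a rough sketch that recomputes them by hand, assuming the usual formulas for `Conv2D` and `Dense` layers (weights plus one bias per filter or unit). The helper names are made up for illustration; the totals should line up with what `model_1.summary()` and `model_2.summary()` report.

```python
def conv2d_params(in_channels, filters, kernel_size=3):
    # Each filter has kernel_size * kernel_size * in_channels weights plus one bias
    return (kernel_size * kernel_size * in_channels + 1) * filters

def dense_params(in_features, units):
    # Each unit has one weight per input feature plus one bias
    return (in_features + 1) * units

# model_1 (CNN): spatial size goes 224 -> 222 -> 220 -> pool 110 -> 108 -> 106 -> pool 53
flattened_features = 53 * 53 * 10
model_1_params = (conv2d_params(3, 10)          # first Conv2D sees 3 colour channels
                  + 3 * conv2d_params(10, 10)   # remaining three Conv2D layers
                  + dense_params(flattened_features, 1))

# model_2 (dense only): the flattened image feeds straight into Dense(4)
input_features = 224 * 224 * 3
model_2_params = (dense_params(input_features, 4)
                  + dense_params(4, 4)
                  + dense_params(4, 1))

print(model_1_params)                             # 31101
print(model_2_params)                             # 602141
print(round(model_2_params / model_1_params, 1))  # 19.4
```

Almost all of `model_2`'s parameters sit in its first `Dense` layer, because every one of the 150,528 input values gets its own weight for each unit.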

In [None]:
#set the random seed
tf.random.set_seed(42)

# create the model(same as above but let's step it up a notch)
model_3 = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(224,224,3)),
    tf.keras.layers.Dense(100,activation="relu"),
    tf.keras.layers.Dense(100,activation="relu"),
    tf.keras.layers.Dense(100,activation="relu"),
    tf.keras.layers.Dense(1,activation="sigmoid")
])

# Compile the model
model_3.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

# fit the model

history_3 = model_3.fit(train_data,
                        epochs=5,
                        steps_per_epoch=len(train_data),
                        validation_data=valid_data,
                        validation_steps=len(valid_data))

In [None]:
# Get a summary of model_3
model_3.summary()

🔑 **Note:** You can think of trainable parameters as *patterns a model can learn from data*. Intuitively, you might think more is better, and in some cases it is. But here the difference comes from the two different styles of model we're using. Where a series of dense layers has many learnable parameters connected to each other, and hence a higher number of possible learnable patterns, **a convolutional neural network seeks to sort out and learn the most important patterns in an image**. So even though there are fewer learnable parameters in our convolutional neural network, these are often more helpful in deciphering between different **features** in an image.

## Binary classification: let's break it down
1. Become one with the data (visualize, visualize, visualize)
2. Preprocess the data (prepare it for our model; the main step here is scaling/normalizing)
3. Create a model (start with a baseline)
4. Fit the model
5. Evaluate the model
6. Adjust different parameters and improve the model (try to beat our baseline)
7. Repeat until satisfied (experiment, experiment, experiment)


### 1. Become one with the data

In [None]:
# visualize the data
plt.figure()
plt.subplot(1,2,1)
steak_img = view_random_image("pizza_steak/train/","steak")
plt.subplot(1,2,2)
pizza_img = view_random_image("pizza_steak/train/","pizza")

### 2. Preprocess the data (prepare it for a model)

In [None]:
# Define directory datasets path
train_dir ="pizza_steak/train/"
test_dir = "pizza_steak/test/"

Our next step is to turn our data into **batches**.

A batch is a small subset of the data. Rather than looking at all ~10,000 images at one time, a model might only look at 32 at a time.

It does this for a couple of reasons:
1. 10,000 images (or more) might not fit into the memory of your processor (GPU).
2. Trying to learn the patterns in 10,000 images in one hit could result in the model not being able to learn very well.

Why 32?

Because 32 is good for your health... https://twitter.com/ylecun/status/989610208497360896?s=20

In [None]:
# create train and test data generators and rescale the data
from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1/255.)
test_datagen = ImageDataGenerator(rescale=1/255.)

In [None]:
# Load in our image data from directories and turn them into batches
train_data =train_datagen.flow_from_directory(directory = train_dir, # target directory of images
                                              target_size=(224,224), # target size of images(height,width)
                                              class_mode="binary", # type of data you are working with
                                              batch_size=32) # size of minibatches to load data into
test_data =test_datagen.flow_from_directory(test_dir,
                                            target_size=(224,224),
                                            class_mode="binary",
                                            batch_size=32)


In [None]:
# get a sample of training data batch
images, labels = next(train_data) # get the next batch of images/labels in train_data
len(images),len(labels)

In [None]:
# how many batches are there
len(train_data)
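
`len(train_data)` is the number of batches per epoch, which is the number of images divided by the batch size, rounded up. A minimal sketch of that arithmetic, assuming 1,500 training images (an assumption for illustration; use whatever count `flow_from_directory` reports for your data) and hypothetical helper names:

```python
import math

def num_batches(num_images, batch_size=32):
    # The last batch may be smaller than batch_size, hence the ceiling
    return math.ceil(num_images / batch_size)

def last_batch_size(num_images, batch_size=32):
    # If the images divide evenly, the last batch is a full one
    remainder = num_images % batch_size
    return remainder if remainder else batch_size

num_train_images = 1500  # assumed image count, for illustration only

print(num_batches(num_train_images))      # 47
print(last_batch_size(num_train_images))  # 28
```

This is also why the final batch of an epoch can contain fewer than 32 images.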

In [None]:
# Get the first two images
images[:2],images[0].shape

In [None]:
images[7].shape

In [None]:
# view the first batch labels
labels

### 3. Create a CNN model (start with a baseline)

A baseline is a relatively simple model or existing result that you set up when beginning a machine learning experiment. As you keep experimenting, you try to beat the baseline.

🔑 **Note:** In deep learning, there is an almost infinite number of architectures you could create. So one of the best ways to get started is to start with something simple and see if it works on your data, then introduce complexity as required (e.g. look at which current model is performing best in the field of your problem).

In [None]:
# Make creating our model a little easier
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Dense,Flatten,Conv2D,MaxPool2D,Activation
from tensorflow.keras import Sequential

In [None]:
# Create the model(this will be our baseline, a 3 layer convolutional network)
model_4 = Sequential([
    Conv2D(filters=10,# filter is the number of sliding windows going across an input(higher=more complex model)
           kernel_size=3,# the size of the sliding window going across an input
           strides=1, # the size of the step the sliding window takes across an input
           padding="valid", # if 'same',output is same as input shape,if "valid",output shape gets compressed
           activation="relu",
           input_shape=(224,224,3)), # input layer(specify input shape)
    Conv2D(10,3,activation="relu"),
    Conv2D(10,3,activation="relu"),
    Flatten(),
    Dense(1,activation="sigmoid") # output layer( working with binary classification so only 1 output neuron)
])
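
To get a feel for what `kernel_size`, `strides` and `padding` do to the output shape, here's a small sketch of the size arithmetic for one spatial dimension. The function name is hypothetical, and the formulas are the commonly used ones for `"valid"` and `"same"` padding, so treat it as a sanity check rather than a replacement for `model_4.summary()`.

```python
import math

def conv_output_size(input_size, kernel_size, strides=1, padding="valid"):
    """Output height (or width) of a convolution along one spatial dimension."""
    if padding == "same":
        # Input is padded so that only the stride shrinks the output
        return math.ceil(input_size / strides)
    if padding == "valid":
        # No padding: the window has to fit entirely inside the input
        return (input_size - kernel_size) // strides + 1
    raise ValueError("padding must be 'valid' or 'same'")

# Three Conv2D layers with kernel_size=3, strides=1, padding="valid", as in model_4
size = 224
for _ in range(3):
    size = conv_output_size(size, kernel_size=3)

print(size)                                      # 218
print(conv_output_size(224, 3, padding="same"))  # 224
print(conv_output_size(224, 3, strides=2))       # 111
```

With `"valid"` padding each 3x3 convolution trims 2 pixels off the height and width, which is the "compression" mentioned in the comment above.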

In [None]:
# compile the model
model_4.compile(loss="binary_crossentropy",
                optimizer=Adam(),
                metrics=["accuracy"])

In [None]:
#get the summary of our model_4
model_4.summary()

### 4. Fit the model

In [None]:
# check the lengths of training and test data generators
len(train_data),len(test_data)

In [None]:
# fit the model
history_4 = model_4.fit(train_data, # this is a combination of labels and sample data
                        epochs=5,
                        steps_per_epoch = len(train_data),
                        validation_data =test_data,
                        validation_steps=len(test_data))

In [None]:
model_4.evaluate(test_data) # evaluate the model we just trained

In [None]:
model_1.summary()

### 5. Evaluating our model
It looks like our model is learning something. Let's evaluate it.

In [None]:
# Let's plot the training curves
import pandas as pd
pd.DataFrame(history_4.history).plot(figsize=(10,7));

In [None]:
# Plot the validation and training curves separately
def plot_loss_curves(history):
  """
  Plots separate loss and accuracy curves for training and validation metrics.
  """
  loss= history.history["loss"]
  val_loss =history.history["val_loss"]

  accuracy = history.history["accuracy"]
  val_accuracy= history.history["val_accuracy"]

  epochs=range(len(history.history["loss"])) # how many epochs did we run for?

  # Plot loss
  plt.plot(epochs,loss,label="training_loss")
  plt.plot(epochs,val_loss,label="val_loss")
  plt.title("loss")
  plt.xlabel("epochs")
  plt.legend()

  # Plot accuracy
  plt.figure()
  plt.plot(epochs,accuracy,label="training_accuracy")
  plt.plot(epochs,val_accuracy,label="val_accuracy")
  plt.title("accuracy")
  plt.xlabel("epochs")
  plt.legend()

> 🔑 **Note:** When a model's **validation loss starts to increase**, it's likely that the model is **overfitting** the training dataset. This means it's learning the patterns in the training dataset *too well*, and thus the model's ability to generalize to unseen data will be diminished.

In [None]:
# check out the loss and accuracy of model_4
plot_loss_curves(history_4)

**Note:** Ideally, the two loss curves (training and validation) will be very similar to each other (training loss and validation loss decreasing at similar rates). When there are large differences, your model may be **overfitting**.
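
The same check can be done numerically. Below is a minimal sketch that finds the epoch with the lowest validation loss and the gap between the two curves. The loss values are invented for illustration (they are not results from `model_4`) and the helper names are hypothetical; with a real run you would pass in `history_4.history` instead.

```python
def best_epoch(val_loss):
    # Index of the epoch with the lowest validation loss
    return min(range(len(val_loss)), key=lambda i: val_loss[i])

def generalization_gaps(loss, val_loss):
    # Validation loss minus training loss, per epoch
    return [round(v - t, 3) for t, v in zip(loss, val_loss)]

# Hypothetical history: training loss keeps falling, validation loss turns upward
example_history = {
    "loss":     [0.60, 0.45, 0.35, 0.25, 0.15],
    "val_loss": [0.55, 0.42, 0.40, 0.44, 0.52],
}

best = best_epoch(example_history["val_loss"])
gaps = generalization_gaps(example_history["loss"], example_history["val_loss"])

print(best)  # 2
print(gaps)
```

A validation loss that bottoms out early while the gap keeps widening is the pattern the note above describes.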

### 6. Adjust the model parameters
Fitting a machine learning model comes in 3 steps:
0. Create a baseline
1. Beat the baseline by overfitting a larger model
2. Reduce overfitting

Ways to induce overfitting:
* Increase the number of conv layers
* Increase the number of conv filters
* Add another dense layer to the output of our flattened layer

Ways to reduce overfitting:
* Add data augmentation
* Add regularization layers (such as `MaxPool2D`)
* Add more data...

> 🔑 **Note:** Reducing overfitting is also known as **regularization**.

In [None]:
# Create the model (same as the baseline, but with a MaxPool2D layer after each Conv2D layer)
model_5 = Sequential([
  Conv2D(10, 3, activation='relu', input_shape=(224, 224, 3)),
  MaxPool2D(pool_size=2), # halves the height and width of the feature maps
  Conv2D(10, 3, activation='relu'),
  MaxPool2D(),
  Conv2D(10, 3, activation='relu'),
  MaxPool2D(),
  Flatten(),
  Dense(1, activation='sigmoid')
])

In [None]:
# compile the model
model_5.compile(loss="binary_crossentropy",
                optimizer=Adam(),
                metrics=["accuracy"])

In [None]:
# Fit the model
history_5 =model_5.fit(train_data,
                       epochs=5,
                       steps_per_epoch=len(train_data),
                       validation_data=test_data,
                       validation_steps=len(test_data))

In [None]:
# Get a summary of our model with max pooling
model_5.summary()
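
To see why the pooled model ends up with far fewer values going into `Flatten`, here's a toy sketch of 2x2 max pooling written in plain Python. It's an illustration of the idea only (a hypothetical helper on a made-up 4x4 feature map), not how `MaxPool2D` is implemented.

```python
def max_pool_2d(grid, pool_size=2):
    """Non-overlapping max pooling over a 2D list (stride equals pool_size)."""
    out_height = len(grid) // pool_size
    out_width = len(grid[0]) // pool_size
    pooled = []
    for i in range(out_height):
        row = []
        for j in range(out_width):
            # Collect the values inside the current pool_size x pool_size window
            window = [grid[i * pool_size + di][j * pool_size + dj]
                      for di in range(pool_size)
                      for dj in range(pool_size)]
            row.append(max(window))  # keep only the strongest activation
        pooled.append(row)
    return pooled

# Made-up 4x4 feature map
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
]

pooled = max_pool_2d(feature_map)
print(pooled)  # [[4, 2], [2, 7]]
```

Height and width are both halved, so each feature map keeps a quarter of its values (16 become 4 here).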

In [None]:
# Plot loss curves
plot_loss_curves(history_5)

### Opening our bag of tricks and finding data augmentation

In [None]:
# Create ImageDataGenerator training instance with data augmentation
train_datagen_augmented = ImageDataGenerator(rescale=1/255.,
                                             rotation_range=20, # rotate the image randomly by up to 20 degrees (this value is in degrees, not a fraction)
                                             shear_range=0.2, # shear the image
                                             zoom_range=0.2, # zoom in or out randomly by up to 20%
                                             width_shift_range=0.2, # shift the image along the x-axis by up to 20% of its width
                                             height_shift_range=0.3, # shift the image along the y-axis by up to 30% of its height
                                             horizontal_flip=True) # randomly flip the image horizontally

# Create ImageDataGenerator without data augmentation
train_datagen = ImageDataGenerator(rescale=1/255.)

# Create ImageDataGenerator without data augmentation for the test dataset
test_datagen = ImageDataGenerator(rescale=1/255.)

> ❓ **Question:** What is data augmentation?

Data augmentation is the process of altering our training data, leading it to have more diversity and in turn allowing our models to learn more generalizable (hopefully) patterns. Altering might mean adjusting the rotation of an image, flipping it, cropping it or something similar.

Let's write some code to visualize data augmentation.
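
As a warm-up before working with real images, here's a toy sketch of two of these alterations (a horizontal flip and a shift along the x-axis) on a tiny made-up "image" stored as a nested list. The helper names are hypothetical, and filling the vacated pixels with 0 is a simplification chosen for this sketch rather than what `ImageDataGenerator` does by default.

```python
def horizontal_flip(image):
    # Reverse each row so the image is mirrored left to right
    return [row[::-1] for row in image]

def shift_right(image, pixels, fill=0):
    # Move every row to the right and pad the vacated columns with `fill`
    width = len(image[0])
    pixels = min(pixels, width)
    return [[fill] * pixels + row[:width - pixels] for row in image]

# Made-up 2x3 single-channel "image"
toy_image = [
    [1, 2, 3],
    [4, 5, 6],
]

print(horizontal_flip(toy_image))  # [[3, 2, 1], [6, 5, 4]]
print(shift_right(toy_image, 1))   # [[0, 1, 2], [0, 4, 5]]
```

The label of an image stays the same under these changes, only the pixels move.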

In [None]:
# Import data and augment it from training directory
print("Augmented training data:")
train_data_augmented = train_datagen_augmented.flow_from_directory(train_dir,
                                                                   target_size=(224,224),
                                                                   batch_size=32,
                                                                   class_mode="binary",
                                                                   shuffle=False) # for demonstration purposes only
# create non-augmented train data batches
print("Non-augmented training data:")
train_data = train_datagen.flow_from_directory(train_dir,
                                               target_size=(224,224),
                                               batch_size=32,
                                               class_mode="binary",
                                               shuffle =False)
# create non-augmented test data batches
print("Non-augmented test data:")
test_data = test_datagen.flow_from_directory(test_dir,
                                             target_size=(224,224),
                                             batch_size=32,
                                             class_mode="binary")

🔑 **Note:** Data augmentation is usually only performed on the training data. Using the `ImageDataGenerator` built-in data augmentation parameters, our images are left as they are in the directories but are modified as they're loaded into the model.

Finally...let's visualize some augmented data!!!

In [None]:
# Get sample data batches
images, labels = next(train_data)
augmented_images, augmented_labels = next(train_data_augmented) # note: labels aren't augmented, only the data (images)


In [None]:
# show the original image and augmented image
import random
random_number = random.randint(0, 31) # our batch size is 32, so valid indices are 0 to 31
print(f"showing image number: {random_number}")
plt.imshow(images[random_number])
plt.title(f"Original image")
plt.axis(False)
plt.figure()
plt.imshow(augmented_images[random_number])
plt.title(f"Augmented image")
plt.axis(False);


Now we've seen what augmented training data looks like, let's build a model (the same as before) and see how it learns on the augmented data.

In [None]:
# create the model(same as model_5)
model_6= Sequential([
    Conv2D(10,3,activation="relu",input_shape=(224,224,3)),
    MaxPool2D(pool_size=2), # halves the height and width of the feature maps
    Conv2D(10,3,activation="relu"),
    MaxPool2D(),
    Conv2D(10,3,activation="relu"),
    MaxPool2D(),
    Flatten(),
    Dense(1,activation="sigmoid")
])

# compile the model
model_6.compile(loss="binary_crossentropy",
                optimizer=Adam(),
                metrics=["accuracy"])

# Fit the model
history_6=model_6.fit(train_data_augmented,
                      epochs=5,
                      steps_per_epoch=len(train_data_augmented),
                      validation_data=test_data,
                      validation_steps=len(test_data))

In [None]:
# Check our model training curves
plot_loss_curves(history_6)

Let's shuffle our augmented training data, train another model (the same as before) on it and see what happens.

In [None]:
# Import data and augment it and shuffle from training directory
train_data_augmented_shuffled = train_datagen_augmented.flow_from_directory(train_dir,
                                                                            target_size=(224,224),
                                                                            class_mode="binary",
                                                                            batch_size=32,
                                                                            shuffle=True) # shuffle data this time


In [None]:
# Create the model(same as model_5 and model_6)
model_7 = Sequential([
    Conv2D(10,3,activation="relu",input_shape=(224,224,3)),
    MaxPool2D(),
    Conv2D(10,3,activation="relu"),
    MaxPool2D(),
    Conv2D(10,3,activation = "relu"),
    MaxPool2D(),
    Flatten(),
    Dense(1,activation="sigmoid")
])

# compile the model
model_7.compile(loss="binary_crossentropy",
                optimizer=Adam(),
                metrics=["accuracy"])

# Fit the model
history_7 = model_7.fit(train_data_augmented_shuffled, # now the augmented data is shuffled
                        epochs=5,
                        steps_per_epoch=len(train_data_augmented_shuffled),
                        validation_data= test_data,
                        validation_steps=len(test_data))

In [None]:
# check model's performance history training on augmented data
plot_loss_curves(history_7)

### 7. Repeat until satisfied
Since we've already beaten our baseline, there are a few things we could try to continue to improve our model:
* Increase the number of model layers (e.g. add more `Conv2D`/`MaxPool2D` layers)
* Increase the number of filters in each convolutional layer (e.g. from 10 to 32 or even 64)
* Train for longer
* Find an ideal learning rate
* Get more data (give the model more opportunities to learn)
* Use **transfer learning** to leverage what another model has learnt and adjust it for our own use case.




## Making a prediction with our trained model on our custom data


In [None]:
# classes we're working with
print(class_names)

In [None]:
# View our example image
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-steak.jpeg
steak = mpimg.imread("03-steak.jpeg")
plt.imshow(steak)
plt.axis(False);

In [None]:
# check the shape of our image
steak.shape

In [None]:
steak

> 🔑 **Note:** When you train a neural network and want to make a prediction with it on your own custom data, it's important that your custom data (or new data) is preprocessed into the same format as the data your model was trained on.

In [None]:
# Create a function to import an image and resize it so it can be used with our model
def load_and_prep_image(filename, img_shape=224):
  """
  Reads an image from filename, turns it into a tensor and reshapes it to
  (img_shape, img_shape, colour_channels).
  """
  # Read in the image
  img = tf.io.read_file(filename)

  # Decode the read file into a tensor with 3 colour channels
  # (our model was trained on 3-channel images)
  img = tf.image.decode_image(img, channels=3)

  # Resize the image
  img = tf.image.resize(img, size=[img_shape, img_shape])

  # Rescale the image (get all values between 0 & 1)
  img = img/255.

  return img


In [None]:
# Load in and preprocess our custom image
steak = load_and_prep_image("03-steak.jpeg")
steak

In [None]:
# Add a batch dimension, since the model expects input of shape (batch_size, 224, 224, 3)
pred = model_7.predict(tf.expand_dims(steak, axis=0))
pred

Looks like our custom image is being put through our model. However, it currently outputs a prediction probability. Wouldn't it be nice if we could visualize the image as well as the model's prediction?

In [None]:
# remind ourselves of our class names
class_names

In [None]:
# We can index the predicted class by rounding the prediction probability and indexing it on the class names
pred_class = class_names[int(tf.round(pred)[0][0])] # pred has shape (1, 1), so pull out the single value
pred_class
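
Rounding a sigmoid output is the same idea as applying a 0.5 decision threshold. Here's a minimal sketch of that step with the threshold made explicit; the probabilities are made up for illustration and `predicted_class` is a hypothetical helper, not part of the notebook's code.

```python
def predicted_class(pred_prob, class_names, threshold=0.5):
    # Probabilities above the threshold map to index 1, everything else to index 0
    index = int(pred_prob > threshold)
    return class_names[index]

# Class order follows the sorted folder names: index 0 is pizza, index 1 is steak
example_class_names = ["pizza", "steak"]

print(predicted_class(0.83, example_class_names))  # steak
print(predicted_class(0.12, example_class_names))  # pizza
```

Making the threshold a parameter is handy if you later want the model to be more cautious about predicting one of the classes.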

In [None]:
def pred_and_plot(model,filename,class_names=class_names):
  """
  Imports an image located at filename,makes a prediction with model and
  plots the image with the predicted class as the title
  """
  # Import the target image and preprocess it
  img = load_and_prep_image(filename)

  # Make a prediction
  pred = model.predict(tf.expand_dims(img,axis=0))

  # Get the predicted class
  pred_class = class_names[int(tf.round(pred)[0][0])]

  # plot the image and predicted class
  plt.imshow(img)
  plt.title(f"Prediction:{pred_class}")
  plt.axis(False);


In [None]:
# test our model on a custom image
pred_and_plot(model_7,"03-steak.jpeg")

Our model works! Let's try it on another image, this time pizza.

In [None]:
# Download another test image and make a prediction on it
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-pizza-dad.jpeg
pred_and_plot(model_7, "03-pizza-dad.jpeg", class_names)

## Multi-class Image Classification

We've just been through a series of steps with a binary classification problem (pizza vs. steak). Now we're going to step things up a notch with 10 classes of food (multi-class classification), following the same steps:

1. Become one with the data
2. Preprocess the data (get it ready for a model)
3. Create a model (start with a baseline)
4. Fit the model (overfit it to make sure it works)
5. Evaluate the model
6. Adjust different hyperparameters and improve the model (try to beat the baseline/reduce overfitting)
7. Repeat until satisfied

### 1. Import and become one with the data

In [None]:
import zipfile

# Download zip file of 10_food_classes images
# See how this data was created - https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/extras/image_data_modification.ipynb
!wget https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_all_data.zip

# Unzip the downloaded file
zip_ref = zipfile.ZipFile("10_food_classes_all_data.zip", "r")
zip_ref.extractall()
zip_ref.close()

In [None]:
import os

# Walk through 10 classes of food image data
for dirpath,dirnames,filenames in os.walk("10_food_classes_all_data"):
  print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")

In [None]:
!ls -la 10_food_classes_all_data/

In [None]:
# setup the train and test directories
train_dir ="10_food_classes_all_data/train/"
test_dir = "10_food_classes_all_data/test"

In [None]:
# Let's get the class names
import pathlib
import numpy as np
data_dir =pathlib.Path(train_dir)
class_names = np.array(sorted([item.name for item in data_dir.glob('*')]))
print(class_names)

In [None]:
# visualize visualize visualize
import random
img = view_random_image(target_dir = train_dir,
                        target_class =random.choice(class_names))

### 2. Preprocess the data (prepare it for a model)

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale
train_datagen = ImageDataGenerator(rescale=1/255.)
test_datagen = ImageDataGenerator(rescale=1/255.)

# Load data in from directories and turn it into batches
train_data = train_datagen.flow_from_directory(train_dir,
                                               target_size = (224,224),
                                               batch_size =32,
                                               class_mode = "categorical")

test_data = test_datagen.flow_from_directory(test_dir,
                                             target_size = (224,224),
                                             batch_size= 32,
                                             class_mode = "categorical")


### 3. Create a model (start with a baseline)

We've been talking a lot about the [CNN Explainer](https://poloclub.github.io/cnn-explainer/) website. How about we take their model (also on 10 classes) and use it for our problem?

We can use the same model (TinyVGG) we used for the binary classification problem for our multi-class classification problem with a couple of small tweaks.

Namely:
* Changing the output layer to have 10 output neurons (the same number as the number of classes we have).
* Changing the output layer to use `'softmax'` activation instead of `'sigmoid'` activation.
* Changing the loss function to be `'categorical_crossentropy'` instead of `'binary_crossentropy'`.
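
To make the last two tweaks more concrete, here's a small pure-Python sketch of what softmax and categorical crossentropy compute for a single sample. It uses 3 classes instead of 10 to keep it short, the scores are made up, and the functions are simplified stand-ins written for this illustration rather than the Keras implementations.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, then normalize the exponentials
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_label, probs, eps=1e-7):
    # Only the probability assigned to the true class contributes to the loss
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot_label, probs))

example_logits = [2.0, 1.0, 0.1]  # hypothetical raw scores for 3 classes
example_label = [1, 0, 0]         # one-hot label: the true class is index 0

probs = softmax(example_logits)
loss = categorical_crossentropy(example_label, probs)

print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(round(loss, 3))                # 0.417
```

Softmax turns the scores into probabilities that sum to 1 across the classes, which is why it pairs with one-hot labels (`class_mode="categorical"`) and `'categorical_crossentropy'`.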

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense

# Create our model (a clone of model_1, except to be multi-class)
model_9 = Sequential([
  Conv2D(10, 3, activation='relu', input_shape=(224, 224, 3)),
  Conv2D(10, 3, activation='relu'),
  MaxPool2D(),
  Conv2D(10, 3, activation='relu'),
  Conv2D(10, 3, activation='relu'),
  MaxPool2D(),
  Flatten(),
  Dense(10, activation='softmax') # changed to have 10 neurons (same as number of classes) and 'softmax' activation
])

# Compile the model
model_9.compile(loss="categorical_crossentropy", # changed to categorical_crossentropy
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

### 4. Fit a model

Now we've got a model suited for working with multiple classes, let's fit it to our data.

In [None]:
# Fit the model
history_9 = model_9.fit(train_data,# now 10 different classes
                        epochs=5,
                        steps_per_epoch=len(train_data),
                        validation_data = test_data,
                        validation_steps=len(test_data))

### 5. Evaluate a model

We've just trained a model on 10 different classes of food images, let's see how it went.

In [None]:
# Evaluate on the test data
model_9.evaluate(test_data)

In [None]:
# Check out the model's loss curves on the 10 classes of data
plot_loss_curves(history_9)

Woah, that's quite the gap between the training and validation loss curves.

What does this tell us?

It seems our model is **overfitting** the training set quite badly. In other words, it's getting great results on the training data but fails to generalize well to unseen data and performs poorly on the test data.

### 6. Adjust the model parameters

Due to its performance on the training data, it's clear our model is learning something. However, performing well on the training data is like going well in the classroom but failing to use your skills in real life.

Ideally, we'd like our model to perform as well on the test data as it does on the training data.

So our next steps will be to try and prevent our model from overfitting. A few ways to prevent overfitting include:

- **Get more data** - Having more data gives the model more opportunities to learn patterns, patterns which may be more generalizable to new examples.
- **Simplify model** - If the current model is already overfitting the training data, it may be too complicated of a model. This means it's learning the patterns of the data too well and isn't able to generalize well to unseen data. One way to simplify a model is to reduce the number of layers it uses or to reduce the number of hidden units in each layer.
- **Use data augmentation** - Data augmentation manipulates the training data in a way that makes it harder for the model to learn, as it artificially adds more variety to the data. If a model is able to learn patterns in augmented data, the model may be able to generalize better to unseen data.
- **Use transfer learning** - Transfer learning involves leveraging the patterns (also called pretrained weights) one model has learned as the foundation for your own task. In our case, we could use a computer vision model pretrained on a large variety of images and then tweak it slightly to be more specialized for food images.

> 🔑 **Note:** Preventing overfitting is also referred to as **regularization**.

If you've already got an existing dataset, you're most likely to try one or a combination of the last three options above first.

Since collecting more data would involve us manually taking more images of food, let's try the ones we can do from right within the notebook.

How about we simplify our model first?

To do so, we'll remove two of the convolutional layers, taking the total number of convolutional layers from four to two.



In [None]:
# Try a simplified model (remove two convolutional layers)
model_10 = Sequential([
    Conv2D(10, 3, activation="relu", input_shape=(224, 224, 3)),
    MaxPool2D(),
    Conv2D(10, 3, activation="relu"),
    MaxPool2D(),
    Flatten(),
    Dense(10, activation="softmax")
])

model_10.compile(loss="categorical_crossentropy",
                 optimizer=tf.keras.optimizers.Adam(),
                 metrics=["accuracy"])

history_10 = model_10.fit(train_data,
                          epochs=5,
                          steps_per_epoch = len(train_data),
                          validation_data=test_data,
                          validation_steps=len(test_data))
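
Fewer layers does not automatically mean fewer parameters. The back-of-the-envelope sketch below counts trainable parameters for the `model_9` and `model_10` layouts, assuming the Keras defaults (`"valid"` padding and stride 1 for `Conv2D`, pool size 2 for `MaxPool2D`). The helper functions are written for this illustration only, so it's worth confirming the totals with `model_9.summary()` and `model_10.summary()`.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after max pooling with pool size 2 and stride 2."""
    return size // pool

def conv_params(in_channels, filters=10, kernel=3):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def count_params(layers, size=224, channels=3, filters=10, num_classes=10):
    """Count trainable parameters for 'conv'/'pool' layers followed by Flatten + Dense."""
    total = 0
    for layer in layers:
        if layer == "conv":
            total += conv_params(channels, filters)
            size, channels = conv_out(size), filters
        elif layer == "pool":
            size = pool_out(size)
    flat = size * size * channels              # length of the flattened feature vector
    total += flat * num_classes + num_classes  # Dense output layer (weights + biases)
    return total

model_9_params = count_params(["conv", "conv", "pool", "conv", "conv", "pool"])
model_10_params = count_params(["conv", "pool", "conv", "pool"])
print(model_9_params, model_10_params)
```

Under these assumptions the two-convolution layout ends up with slightly more parameters than the four-convolution one, because each removed convolution would have trimmed the feature map before `Flatten`, and almost all of the parameters sit in the final `Dense` layer.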

In [None]:
# check out loss curves of model_10
plot_loss_curves(history_10)

Hmm... even with a simplified model, it looks like our model is still dramatically overfitting the training data.

What else could we try?

How about **data augmentation**?

Data augmentation makes it harder for the model to learn on the training data, which in turn hopefully makes the patterns it learns more generalizable to unseen data.
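
To get a feel for what "manipulating" an image means, here is a toy pure-Python sketch of two common transformations, a horizontal flip and a horizontal shift, applied to a tiny made-up grayscale image. It is a conceptual illustration only. Keras applies its transformations randomly, and how it fills the exposed edge after a shift depends on its `fill_mode` setting, so the zero fill here is an assumption.

```python
def horizontal_flip(image):
    """Mirror an image (a list of rows) left to right."""
    return [row[::-1] for row in image]

def shift_right(image, pixels, fill=0):
    """Shift every row right by `pixels`, padding the left edge with `fill`."""
    return [[fill] * pixels + row[:len(row) - pixels] for row in image]

# A tiny 3x4 grayscale "image" with made-up pixel values
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

flipped = horizontal_flip(image)
shifted = shift_right(image, 1)

print(flipped)
print(shifted)
```

The label of the image stays the same after either change, which is why the transformed copies can act as extra, slightly different training examples.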

To create augmented data, we'll create a new [`ImageDataGenerator`](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator) instance, this time adding some parameters such as `rotation_range` and `horizontal_flip` to manipulate our images.

In [None]:
# Create augmented data generator instance
train_datagen_augmented = ImageDataGenerator(rescale=1/255.,
                                             rotation_range=20, # note: this is an int, not a float
                                             width_shift_range=0.2,
                                             height_shift_range=0.2,
                                             zoom_range=0.2,
                                             horizontal_flip=True)

# Load the training images from the directory and augment them on the fly
train_data_augmented = train_datagen_augmented.flow_from_directory(train_dir,
                                                                   target_size=(224, 224),
                                                                   batch_size=32,
                                                                   class_mode="categorical")

Now we've got augmented data, let's see how it works with the same model as before (`model_10`).

Rather than rewrite the model from scratch, we can clone it using a handy function in TensorFlow called [`clone_model`](https://www.tensorflow.org/api_docs/python/tf/keras/models/clone_model) which can take an existing model and rebuild it in the same format.

The cloned version will not include any of the weights (patterns) the original model has learned. So when we train it, it'll be like training a model from scratch.

> 🔑 **Note:** One of the key practices in deep learning and machine learning in general is to **be a serial experimenter**. That's what we're doing here: trying something, seeing if it works, then trying something else. A good experiment setup also keeps track of the things you change. That's why we're using the same model as before but with different data. The model stays the same but the data changes, which lets us know whether augmented training data has any influence on performance.

In [None]:
# Clone the model (use the same architecture, with freshly initialized weights)
model_11 = tf.keras.models.clone_model(model_10)

# Compile the cloned model (same setup as used for model_10)
model_11.compile(loss="categorical_crossentropy",
                 optimizer=tf.keras.optimizers.Adam(),
                 metrics=["accuracy"])

# Fit the model
history_11 = model_11.fit(train_data_augmented, # use augmented data
                          epochs=5,
                          steps_per_epoch=len(train_data_augmented),
                          validation_data=test_data,
                          validation_steps=len(test_data))

You can see each epoch takes longer than with the previous model. This is because our data is being augmented on the fly on the CPU as it gets loaded onto the GPU, which in turn increases the time each epoch takes.

> **Note:** One way to improve this time taken is to use augmentation layers directly as part of the model. For example, with [`tf.keras.layers.RandomFlip`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomFlip). You can also speed up data loading with the newer [`tf.keras.utils.image_dataset_from_directory`](https://www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory) image loading API (we cover this later in the course).

How do our model's training curves look?

In [None]:
# Check out our model's performance with augmented data
plot_loss_curves(history_11)

That's looking much better: the loss curves are much closer to each other. Although our model didn't perform as well on the augmented training set, it performed much better on the validation dataset.

It even looks like if we kept it training for longer (more epochs) the evaluation metrics might continue to improve.

### 7. Repeat until satisfied

We could keep going here: restructuring our model's architecture, adding more layers, trying it out, adjusting the learning rate, trying it out, trying different methods of data augmentation, training for longer. But as you could imagine, this could take a fairly long time.

Good thing there's still one trick we haven't tried yet and that's **transfer learning**.

However, we'll save that for the next notebook, where you'll see how, rather than designing our own models from scratch, we can leverage the patterns another model has learned for our own task.

In the meantime, let's make a prediction with our trained multi-class model.

## Making a prediction with our trained model

What good is a model if you can't make predictions with it?

Let's first remind ourselves of the classes our multi-class model has been trained on and then we'll download some of our own custom images to work with.

In [None]:
# what classes has our model been trained on?
class_names

Beautiful, now let's get some of our custom images.

If you're using Google Colab, you could also upload some of your own images via the files tab.

In [None]:
# Download some custom images
# -q is for "quiet"
!wget -q https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-pizza-dad.jpeg
!wget -q https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-steak.jpeg
!wget -q https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-hamburger.jpeg
!wget -q https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-sushi.jpeg

In [None]:
def pred_and_plot(model, filename, class_names=class_names):
  """
  Imports an image located at filename, makes a prediction with model and
  plots the image with the predicted class as the title.
  """
  # Import the target image and preprocess it
  img = load_and_prep_image(filename)

  # Make a prediction (add a batch dimension first)
  pred = model.predict(tf.expand_dims(img, axis=0))

  # Get the predicted class name (multi-class vs binary output)
  if len(pred[0]) > 1:
    # Multi-class: index of the highest softmax probability
    pred_class = class_names[int(tf.argmax(pred[0]))]
  else:
    # Binary: round the single sigmoid probability to 0 or 1
    pred_class = class_names[int(tf.round(pred[0][0]))]

  # Plot the image and predicted class
  plt.imshow(img)
  plt.title(f"Prediction: {pred_class}")
  plt.axis(False);
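
How a prediction maps to a class name depends on the model's output layer: a softmax output has one probability per class, so we take the index of the largest, while a single sigmoid output is rounded to 0 or 1. Here is a minimal pure-Python sketch of that logic. The function name, class names and probabilities are hypothetical and not real model outputs.

```python
def pred_to_class_name(pred_row, class_names):
    """Map one row of model output to a class name.

    pred_row: list of probabilities for a single image
              (length 1 for a sigmoid output, length n for a softmax output).
    """
    if len(pred_row) > 1:
        # Multi-class: pick the index of the highest probability
        index = max(range(len(pred_row)), key=lambda i: pred_row[i])
    else:
        # Binary: round the single probability to 0 or 1
        index = int(round(pred_row[0]))
    return class_names[index]

# Hypothetical class names and outputs (not real model predictions)
multi = pred_to_class_name([0.05, 0.80, 0.15], ["hamburger", "pizza", "steak"])
binary = pred_to_class_name([0.91], ["pizza", "steak"])
print(multi, binary)
```

Note that taking the argmax of a single-element output would always return index 0, which is why the binary case needs rounding instead.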

In [None]:
# make our prediction using model_10
pred_and_plot(model=model_10,
              filename="03-pizza-dad.jpeg",
              class_names=class_names)

In [None]:
# make our prediction using model_10
pred_and_plot(model=model_10,
              filename="03-steak.jpeg",
              class_names=class_names)

In [None]:
# make our prediction using model_10
pred_and_plot(model=model_10,
              filename="03-sushi.jpeg",
              class_names=class_names)

In [None]:
# make our prediction using model_10
pred_and_plot(model=model_10,
              filename="03-hamburger.jpeg",
              class_names=class_names)

## Saving and loading a model

In [None]:
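# Save a model (saves architecture, weights and optimizer state)
# Note: on Keras 3 (TensorFlow 2.16+), the path needs a ".keras" or ".h5" extension
model_10.save("saved_trained_model_10")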

In [None]:
# load in a trained model and evaluate
loaded_model_10=tf.keras.models.load_model("saved_trained_model_10")
loaded_model_10.evaluate(test_data)

In [None]:
# compare our loaded model to our existing model
model_10.evaluate(test_data)
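
Rather than comparing the two outputs by eye, the `[loss, accuracy]` lists returned by `evaluate` can be compared within a small tolerance. The helper name, the numbers and the tolerance below are assumptions for illustration. In the notebook you would pass in the two `evaluate` results instead.

```python
import math

def results_match(results_a, results_b, tolerance=1e-6):
    """True if two [loss, metric, ...] lists agree within `tolerance`."""
    return len(results_a) == len(results_b) and all(
        math.isclose(a, b, abs_tol=tolerance) for a, b in zip(results_a, results_b)
    )

# Placeholder values standing in for the two `evaluate` outputs (not real results)
original_results = [1.7531, 0.3812]
loaded_results = [1.7531, 0.3812]

print(results_match(original_results, loaded_results))
```

A tolerance is used instead of `==` because floating point results can differ by tiny amounts between runs.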