<a href="https://colab.research.google.com/github/prachi-khandelwal/Dog-Vision-MultiClassification-Project/blob/master/end_to_end_dog_vision.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🐶 End-to-End Multi-Class Dog Breed Identification

This notebook builds an end-to-end multi-class image classifier using TensorFlow 2.0 and TensorFlow Hub.

## 1. Problem

Identify the breed of a dog given an image of the dog.

When I was roaming around my neighbourhood, a dog passed by and I wanted to know its breed. 😊😁

## 2. Data

The data we're using is from Kaggle's Dog Breed Identification competition.

https://www.kaggle.com/c/dog-breed-identification/data

## 3. Evaluation

The evaluation is a file with prediction probabilities for each dog breed for each test image.

https://www.kaggle.com/c/dog-breed-identification/overview/evaluation

## 4. Features

Some information about the data:
* We're dealing with images (unstructured data), so it's probably best to use deep learning / transfer learning.
* There are 120 breeds of dogs (this means there are 120 different classes).
* There are around 10k images in the training set (these images have labels).
* There are around 10k images in the test set (these images don't have labels).





In [None]:
# Unzip the data into Google Drive.
# !unzip "/content/drive/My Drive/Dog vision/dog-breed-identification.zip" -d "drive/My Drive/Dog vision"

# Get our Workspace Ready!
* Import TensorFlow 2.x ✅
* Import TensorFlow Hub ✅
* Make sure we're using a GPU


Import the necessary tools.

In [None]:
# Import TensorFlow into Colab
import tensorflow as tf
print("TensorFlow version:", tf.__version__)
# Import TensorFlow Hub
import tensorflow_hub as hub
print("TensorFlow Hub version:", hub.__version__)

In [None]:
# Check whether a GPU is available
print("GPU", "available YEP!" if tf.config.list_physical_devices("GPU") else "not available :( ")

## Getting our Data ready! (turning into tensors)
With all ML models, our data must be in numerical format. So that's what we're going to do:
turn our images into tensors (a numerical representation).

Let's start by accessing our data and checking the labels.



In [None]:
# Checkout the data labels
import pandas as pd
import numpy as np
labels_csv = pd.read_csv("drive/My Drive/Dog vision/labels.csv")
print(labels_csv.describe())
labels_csv.head()

In [None]:
labels_csv.info()

In [None]:
# Let's see the no. of images of each breed
labels_csv["breed"].value_counts()

In [None]:
# let's visualize it
ax = labels_csv["breed"].value_counts().plot.bar(figsize=(20,10));

In [None]:
# To display the Images
from IPython.display import display, Image
# Image("drive/My Drive/Dog vision/train/001513dfcb2ffafc82cccf4d8bbaba97.jpg")

## Getting Images and their labels
Let's get a list of our image file pathnames.

In [None]:

# Create filenames from ID
filenames = ["drive/My Drive/Dog vision/train/" + fname + ".jpg" for fname in labels_csv["id"]]
filenames[:10]

In [None]:
# Check whether the number of filenames matches the number of image files in the train folder
import os
if len(os.listdir("drive/My Drive/Dog vision/train/")) == len(filenames):
  print("Number of files matched. Proceed!")
else:
  print("Number of files did not match. Check the train folder!")
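
A count comparison like the one above can pass even when the wrong files are present. As a minimal sketch (the helper `find_mismatches` and the file names are made up for illustration, and the demo runs on a temporary folder rather than the real dataset), a set comparison reports exactly which files are missing or unexpected:

```python
import os
import tempfile

def find_mismatches(expected_paths, folder):
    """Return (missing, unexpected) file names for a folder.

    Hypothetical helper: compares the base names of the expected paths
    with the names actually present in `folder`.
    """
    expected = {os.path.basename(p) for p in expected_paths}
    actual = set(os.listdir(folder))
    return sorted(expected - actual), sorted(actual - expected)

# Demo on a throwaway folder with made-up IDs (not the real dataset)
with tempfile.TemporaryDirectory() as folder:
    for name in ["a1.jpg", "b2.jpg", "stray.txt"]:
        open(os.path.join(folder, name), "w").close()
    expected_paths = [os.path.join(folder, i + ".jpg") for i in ["a1", "b2", "c3"]]
    missing, unexpected = find_mismatches(expected_paths, folder)

print("Missing:", missing)
print("Unexpected:", unexpected)
```

In this demo the folder holds three files and three paths are expected, so a length check alone would pass even though one image is missing.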
  
  

### Now that we've got our file paths ready, let's prepare our labels

In [None]:
labels = labels_csv['breed'].to_numpy()  #to_numpy() converts into numpy array
labels

In [None]:
len(labels)

In [None]:
# Let's see if the number of labels matches the number of filenames
if len(labels) == len(filenames):
  print("Number of labels matches number of filenames! Proceed.")
else:
  print("Check again! Number of labels doesn't match number of filenames.")


In [None]:
# Find the unique label values
unique_breeds = np.unique(labels)
unique_breeds

In [None]:
# Turn a single label into an array of Booleans
print(labels[1])
labels[1] == unique_breeds

In [None]:
# Turn every label into a Boolean Array
boolean_labels = [label == unique_breeds for label in labels]
boolean_labels[:2]

In [None]:
len(boolean_labels)

## Turning **Boolean Labels into Integers**

In [None]:
print(labels[1]) # Original label
print(np.where(unique_breeds == labels[1])) # Index where the label occurs in unique_breeds
print(boolean_labels[1].argmax()) # argmax returns the index of the max value (the single True)
print(boolean_labels[1].astype(int)) # Converts the Boolean values into integers (1 at the label's index, 0 elsewhere)
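
To make the encoding concrete, here is a small self-contained sketch on made-up labels (three toy breeds instead of the real 120). It follows the same comparison against the sorted class list and shows that `argmax` recovers the original label:

```python
import numpy as np

# Made-up labels for illustration only
labels = np.array(["pug", "beagle", "pug", "collie"])
unique_breeds = np.unique(labels)  # sorted: beagle, collie, pug

# Compare every label against the sorted class list
boolean_labels = [label == unique_breeds for label in labels]

# Booleans become 0/1, giving one one-hot row per label
one_hot = np.array(boolean_labels).astype(int)
print(one_hot)

# argmax gives the position of the single True, which maps back to the breed
decoded = [unique_breeds[row.argmax()] for row in boolean_labels]
print(decoded)
```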

## Creating our own Validation set
Kaggle doesn't provide a validation set, so we'll create our own.

In [None]:
# Setup X and Y variables
X = filenames
y = boolean_labels

Since experimenting with 10k images might take a long time, we'll experiment with ~1,000 images at first and increase the number as needed.

In [None]:
# Set number of images
NUM_IMAGES = 1000 #@param {type:"slider", min:1000, max:10000, step:1000}

In [None]:
# Let's split our data into train and valid sets
from sklearn.model_selection import train_test_split

# Use only the first NUM_IMAGES samples and hold out 20% for validation
X_train, X_valid, y_train, y_valid = train_test_split(X[:NUM_IMAGES],
                                                      y[:NUM_IMAGES],
                                                      test_size=0.2,
                                                      random_state=42)
len(X_train), len(X_valid), len(y_train), len(y_valid)

In [None]:
# Let's peek at our training data
X_train[:5], y_train[:2]

## Pre-processing Images (Turning Them into Tensors)
To turn our images into tensors, we're going to write a function which does a few things:
1. Takes an image filepath as input.
2. Uses TensorFlow to read the image file and save it to a variable, `image`.
3. Turns our `image` (a JPEG) into a tensor.
4. Normalises our image (converts the colour channel values from 0-255 to 0-1).
5. Resizes the image to a shape of (224, 224).
6. Returns the modified image.

Before creating the function, let's peek at what an image looks like as a tensor.

In [None]:
# Read a single image into a NumPy array
from matplotlib.pyplot import imread
image = imread(filenames[42])
len(image),image

In [None]:
# Convert single image into tensor
tf.constant(image)

In [None]:
# Define image size
IMG_SIZE = 224

# Create a Function for preprocessing
def preprocess(image_path, img_size=IMG_SIZE):
  """
  Takes an image file path and turns the image into a preprocessed tensor.
  """
  # Read in the image file
  image = tf.io.read_file(image_path)

  # Turn the JPEG image into a numerical tensor with 3 colour channels (red, green and blue)
  image = tf.image.decode_jpeg(image, channels=3)

  # Normalise the colour channel values from 0-255 to 0-1
  image = tf.image.convert_image_dtype(image, tf.float32)

  # Resize the image to (img_size, img_size)
  image = tf.image.resize(image, [img_size, img_size])

  return image
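
The normalisation step can be illustrated without TensorFlow. The following is a NumPy analogue on a made-up 2x2 image, not TensorFlow's implementation: 8-bit channel values are cast to floats and divided by 255 so that they land in the range 0-1.

```python
import numpy as np

# A made-up 2x2 RGB "image" with 8-bit channel values (0-255)
image_uint8 = np.array([[[0, 128, 255], [64, 64, 64]],
                        [[255, 255, 255], [10, 20, 30]]], dtype=np.uint8)

# Cast to float32 and scale into the range 0-1
image_float = image_uint8.astype(np.float32) / 255.0

print(image_float.dtype, image_float.shape)
print(image_float.min(), image_float.max())
```

The shape is unchanged by this step; only the value range and dtype differ.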

## Turning our Data into Batches
Why turn our data into batches?

Let's say you're trying to process 10,000 images in one go. They might not all fit into memory.

That's why we process 32 images (the batch size) at a time.

To use TensorFlow effectively, we need our data in the form of tensor tuples: `(image, label)`.
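
As a quick worked example of the batch arithmetic, the sketch below uses the numbers from this notebook (a 1,000-image subset, a 20% validation split, and a batch size of 32) and assumes the final partial batch is kept rather than dropped:

```python
import math

NUM_IMAGES = 1000   # subset size used for the first experiments
TEST_SIZE = 0.2     # fraction held out for validation
BATCH_SIZE = 32     # images per batch

num_valid = int(NUM_IMAGES * TEST_SIZE)
num_train = NUM_IMAGES - num_valid

# A partial final batch still counts as a batch, hence the ceiling
train_batches = math.ceil(num_train / BATCH_SIZE)
valid_batches = math.ceil(num_valid / BATCH_SIZE)

# Size of the last validation batch (a full batch if it divides evenly)
last_valid_batch = num_valid % BATCH_SIZE or BATCH_SIZE

print(num_train, num_valid)
print(train_batches, valid_batches, last_valid_batch)
```

With these numbers the training data divides evenly into batches, while the validation data ends with a smaller final batch.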



In [None]:
# Create a simple function to return a tuple
def get_image_label(image_path, label):
  """
  Takes an image file path and its label, preprocesses the image
  and returns a tuple of (image, label).
  """
  image = preprocess(image_path)
  return image, label

In [None]:
(preprocess(X[42]),tf.constant(y[42]))

Now that we've got a way to turn our data into tuples of tensors in the form `(image, label)`, let's make a function to create batches from our data `X` and `y`.

In [None]:
# Define the batch size; 32 is a good start
BATCH_SIZE = 32

# Create a function to turn data into batches
def create_data_batches(X, y=None, batch_size=BATCH_SIZE, valid_data=False, test_data=False):
  """
  Creates batches of data out of image (X) and label (y) pairs.
  Shuffles the data if it's training data but doesn't shuffle validation data.
  Also accepts test data as input (no labels).
  """
  # If the data is test data, we probably don't have labels
  if test_data:
    print("Creating test data batches...")
    data = tf.data.Dataset.from_tensor_slices(tf.constant(X)) # Only file paths (no labels)

    # Test data has no labels, so map the image-only preprocessing function
    data_batch = data.map(preprocess).batch(batch_size)

    return data_batch

  # If the data is validation data, we don't need to shuffle it
  elif valid_data:
    print("Creating validation data batches...")
    data = tf.data.Dataset.from_tensor_slices((tf.constant(X), # File paths
                                               tf.constant(y))) # Labels

    data_batch = data.map(get_image_label).batch(batch_size)
    return data_batch

  # If the data is training data, shuffle it before processing the images:
  # shuffling file paths is cheaper than shuffling decoded images
  else:
    print("Creating training data batches...")
    # Turn file paths and labels into tensors
    data = tf.data.Dataset.from_tensor_slices((tf.constant(X),
                                               tf.constant(y)))
    # Shuffle the pathnames and labels before mapping the image preprocessing function
    data = data.shuffle(buffer_size=len(X))

    # Create (image, label) tuples; this also turns each image path into a preprocessed image
    data_batch = data.map(get_image_label).batch(batch_size)
  return data_batch
    

In [None]:
# Creating Validation and training data BATCHES
train_data = create_data_batches(X_train, y_train)
valid_data = create_data_batches(X_valid, y_valid, valid_data=True)

In [None]:
# Check out the attributes of our data Batches
train_data.element_spec, valid_data.element_spec

## Visualizing Data Batches
Our data is now in batches, which can be a little hard to comprehend, so let's visualise it, specifically 25 images.

In [None]:
import matplotlib.pyplot as plt

# Create a function to visualise a batch of images
def image_batch_visualise(image, label):
  """
  Displays 25 images from a data batch along with their labels.
  """
  # Set the figsize
  plt.figure(figsize=(10,10))
  #Set the loops to display 25 images
  for i in range(25):
     # set the axis subplots
     ax = plt.subplot(5, 5, i+1)
     # Display image
     plt.imshow(image[i])
     # Set the title over the image using label
     plt.title(unique_breeds[label[i].argmax()],{'color':'white','fontweight':'23'})
     # set the grid off
     plt.axis("off")


Now that we've created our function to visualise the images, we first need to unbatch the processed images (pull a single batch out of the dataset as NumPy arrays) before we can plot them.

In [None]:
train_images, train_labels = next(train_data.as_numpy_iterator())

In [None]:
len(train_images), len(train_labels)

In [None]:
# Let's visualise Training batch
image_batch_visualise(train_images, train_labels)


In [None]:
# Let's visualise valid batch
valid_images, valid_labels = next(valid_data.as_numpy_iterator())



In [None]:
image_batch_visualise(valid_images, valid_labels)

# Building a Model
Before building a model, we need to define a few things:
1. The input shape (our image shape, in the form of tensors) to our model.
2. The output shape (our image labels, in the form of tensors) of our model.
3. The URL of the model we're going to use from TensorFlow Hub:
https://tfhub.dev/google/imagenet/mobilenet_v2_130_224/classification/4

In [None]:
# Set up the input shape (images)
INPUT_SHAPE = [None, IMG_SIZE, IMG_SIZE, 3] # batch, height, width, colour channels

# Set up the output shape (labels)
OUTPUT_SHAPE = len(unique_breeds)

# URL of the model on TensorFlow Hub
MODEL_URL = "https://tfhub.dev/google/imagenet/mobilenet_v2_130_224/classification/4"

Now that we've got our input shape, output shape and model URL ready to go, let's put them together into a Keras deep learning model!

Knowing this, let's create a function which does the following:
* Takes the input shape, output shape and the model we've chosen as parameters.
* Defines the layers of a Keras model in a sequential manner (do this, then this, then that).
* Compiles the model (says how it should be evaluated and improved).
* Builds the model (tells it the input shape it'll be getting).
* Returns the model.





In [None]:
# Create a function which builds a Keras model
def create_model(input_shape=INPUT_SHAPE, output_shape=OUTPUT_SHAPE, model_url=MODEL_URL):
  print(" Building model with", model_url)

  # Set up the model layers
  model = tf.keras.Sequential([
          hub.KerasLayer(model_url), # Layer 1 (input layer)
          tf.keras.layers.Dense(units=output_shape,
                                activation="softmax") # Layer 2 (output layer)
  ])

  # Compile the model
  model.compile(
      loss=tf.keras.losses.CategoricalCrossentropy(),
      optimizer=tf.keras.optimizers.Adam(),
      metrics=["accuracy"]
  )

  # Build the model, telling it the input shape it'll be getting
  model.build(input_shape)

  return model
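
To see what the softmax output layer and the categorical cross-entropy loss compute, here is a minimal NumPy sketch on made-up scores for three classes. It is an illustration of the maths, not Keras's implementation, and the function names are only for this example:

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def categorical_crossentropy(one_hot_label, probs, eps=1e-7):
    """Negative log of the probability assigned to the true class."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return float(-np.sum(one_hot_label * np.log(probs)))

logits = np.array([2.0, 0.5, -1.0])  # made-up raw scores for 3 classes
label = np.array([1, 0, 0])          # one-hot label: class 0 is correct

probs = softmax(logits)
loss = categorical_crossentropy(label, probs)
print(probs, probs.sum())
print(loss)
```

The loss shrinks towards 0 as the probability assigned to the correct class approaches 1, which is why one-hot labels pair naturally with this loss.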




In [50]:
model = create_model()
model.summary()

 Building model with https://tfhub.dev/google/imagenet/mobilenet_v2_130_224/classification/4
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
keras_layer_1 (KerasLayer)   multiple                  5432713   
_________________________________________________________________
dense_1 (Dense)              multiple                  120240    
Total params: 5,552,953
Trainable params: 120,240
Non-trainable params: 5,432,713
_________________________________________________________________
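
The parameter counts in the summary above can be checked by hand. The sketch below assumes the hub classification layer outputs 1,001 values per image (an assumption inferred from the numbers, not stated in this notebook); each of the 120 Dense units then has one weight per input plus one bias:

```python
HUB_OUTPUTS = 1001      # assumed output size of the hub classification layer
NUM_BREEDS = 120        # number of classes in the Dense output layer
HUB_PARAMS = 5_432_713  # taken from the model summary

# Dense layer: one weight per (input, unit) pair plus one bias per unit
dense_params = HUB_OUTPUTS * NUM_BREEDS + NUM_BREEDS
total_params = HUB_PARAMS + dense_params

print(dense_params)   # trainable parameters
print(total_params)   # total parameters
```

Only the Dense layer's parameters are trainable here, which matches the summary: the pretrained hub layer is frozen and just the new output layer is learned.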


###  Creating Callbacks
Callbacks are helper functions used during model training to do things like save the model's progress, check its progress, and stop training early if the model stops improving.

We'll create two callbacks: one for TensorBoard, which helps track our model's progress, and another for early stopping, which prevents our model from training for too long.

### TensorBoard Callback

To set up a TensorBoard callback, we need to do 3 things:
* Load the TensorBoard notebook extension.
* Create a TensorBoard callback which is able to save logs to a directory and pass it to our model's `fit()` function.
* Visualise our model's training logs with the `%tensorboard` magic function.

In [51]:
# Load tensorBoard notebook extension
%load_ext tensorboard

In [53]:
import datetime

# Create a function to build a TensorBoard callback
def create_tensorboard_callback():
  # Create a log directory for storing TensorBoard logs
  logdir = os.path.join("drive/My Drive/Dog vision/logs",
                        # Make it so the logs get tracked whenever we run an experiment
                        datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))

  return tf.keras.callbacks.TensorBoard(logdir)
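
Timestamped log directories keep each experiment's logs separate. The following sketch shows how such a name is built; `make_logdir` and the `"logs"` base folder are hypothetical, and a fixed date is used so the result is reproducible. Note that `%S` (uppercase) is the two-digit seconds field.

```python
import datetime
import os

def make_logdir(base_dir, now=None):
    """Build a per-run log directory path from a timestamp (hypothetical helper)."""
    now = now or datetime.datetime.now()
    # Year, month, day, then hour, minute, second
    return os.path.join(base_dir, now.strftime("%Y%m%d-%H%M%S"))

# A fixed, made-up moment: 2 January 2020, 03:04:05
fixed = datetime.datetime(2020, 1, 2, 3, 4, 5)
run_dir = make_logdir("logs", fixed)
print(os.path.basename(run_dir))  # run name: 20200102-030405
```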

### Early stopping callback

The early stopping callback helps prevent our model from overfitting by stopping training when a chosen evaluation metric stops improving.

In [54]:
# create early stopping callback
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_accuracy",
                                                  patience=3)
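
The idea behind `patience` can be sketched in plain Python. This is a simplified illustration, not the Keras implementation: training stops once the monitored metric has failed to beat its best value for `patience` epochs in a row. The function name and the accuracy history are made up.

```python
def epochs_until_stop(val_accuracies, patience=3):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("-inf")
    wait = 0  # epochs since the last improvement
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation accuracies, one per epoch
history = [0.50, 0.60, 0.65, 0.64, 0.65, 0.63, 0.70]
stop_epoch = epochs_until_stop(history, patience=3)
print(stop_epoch)
```

In this made-up history the best value is reached at epoch 3, and the next three epochs fail to beat it, so training stops before the later improvement is ever seen. That is the trade-off a small `patience` makes.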

## Training a model (on a subset of the data)
Our first model is only going to train on 1,000 images, to make sure everything works.


In [57]:
NUM_EPOCHS = 100 #@param {type:'slider',min:10, max:100, step:10}


In [60]:
# Check to make sure we're still running on a GPU
print("GPU available Bravo!" if tf.config.list_physical_devices('GPU') else "Not available ಠ_ಠ  ")

GPU available Bravo!
