## Classification of Automobile parts

This notebook builds an end-to-end multiclass image classifier using TensorFlow 2.0 and TensorFlow Hub.

## 1. Problem
Identify which automobile part is shown in a given image.

## 2. Data
The data we are using is from Kaggle:
https://www.kaggle.com/datasets/mdwaquarazam/automobilepartsindentification

## 3. Evaluation
For each image in the test set, we predict a probability for each of the automobile part categories. The output to evaluate is a file containing the prediction probabilities for each test image.

## 4. Features
There are 689 images in the dataset, and we will use 50% of them to train the model. The training images have labels; the test images do not.

### Get our workspace ready

In [None]:
# Import TensorFlow into Colab
# Import TensorFlow Hub
# Make sure we are using a GPU
# Import the necessary tools

In [None]:
import tensorflow as tf
import tensorflow_hub as hub
print("TensorFlow Version is : " ,tf.__version__)
print("TensorFlow Hub Version: ", hub.__version__)

In [None]:
# Check for GPU Availability
print("GPU is available (YESSSSSSSS !!!!!)" if tf.config.list_physical_devices("GPU") else "Not available")

## Getting our data ready: turning it into **Tensors**

In [None]:
# Check out the labels of our data
import pandas as pd
data_dir = '../input/automobilepartsindentification/Automobile-parts,.Typecast.csv'
labels = pd.read_csv(data_dir)
# describe is a method, so it must be called to print the summary statistics
print(labels.describe())

In [None]:
#How many images we have per Automobile category.
labels.Type.value_counts().plot.bar(figsize = (30,20), color ='SeaGreen')


In [None]:
labels.Type.value_counts().median()

In [None]:
# Let's view images of each category in the next few cells
from IPython.display import Image


## Getting images and all their labels
Let's get the image file paths and all their labels.

In [None]:
labels.head()

In [None]:
# Create file paths from the image IDs
filename = ['../input/automobilepartsindentification/Automobile-parts/Train/'+fname for fname in labels["ID"]]

In [None]:
# Check whether the number of file names matches the number of actual image files
import os
if len(os.listdir("../input/automobilepartsindentification/Automobile-parts/Train")) == len(filename):
  print("File names match actual amount of files, proceed!!!")
else:
  print("File names do not match actual amount of files, check the target directory!!")

In [None]:
Image(filename[120])

### We have our training image file paths in a list; let's prepare our labels

In [None]:
import numpy as np
labelnames = labels["Type"].to_numpy()
#labelnames = np.array(labelnames) ## does same thing as above

In [None]:
# Check whether the number of labels matches the number of file names
if len(labelnames) == len(filename):
  print("Number of labels and number of filenames are matching, Proceed!!!")
else:
  print("Number of labels and number of filenames are not matching, please check the labels")

In [None]:
#Find Unique label values
unique_labels = np.unique(labelnames)
unique_labels

In [None]:
labelnames[0], unique_labels[0]

In [None]:
# Turn a single label into an array of booleans
print(labelnames[0])
labelnames[0] == unique_labels


In [None]:
# turn every label into Boolean array
boolean_labels = [label == unique_labels for label in labelnames]
len(boolean_labels)
print(boolean_labels)

In [None]:
# Turn a boolean array into an integer index
print(labelnames[300]) # Original label
print(np.where(unique_labels == labelnames[300])) # Index where label occurs
print(boolean_labels[300].argmax()) # Index where label occurs in boolean array
print(boolean_labels[300].astype(int)) # There will be a 1 where the sample label occurs
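
The loop above builds one boolean array per label. As an aside, NumPy broadcasting can build the whole one-hot matrix in one step, and `np.unique(..., return_inverse=True)` returns each label's integer class index directly. The labels below are made-up toy values, not the dataset's real categories.

```python
import numpy as np

# Toy labels standing in for labelnames (hypothetical values)
labelnames_demo = np.array(["Bolt", "Nut", "Bolt", "Washer"])

# return_inverse gives, for every label, its index into the sorted unique labels
unique_demo, class_index = np.unique(labelnames_demo, return_inverse=True)

# Broadcasting (N, 1) against (K,) gives an (N, K) boolean matrix, one row per label
one_hot = labelnames_demo[:, None] == unique_demo
one_hot_int = one_hot.astype(int)

# argmax on each row recovers the integer index, just like boolean_labels[i].argmax()
recovered = one_hot.argmax(axis=1)
print(unique_demo)   # ['Bolt' 'Nut' 'Washer']
print(one_hot_int)
print(recovered)     # [0 1 0 2]
```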

## Creating our own validation set
### We create our own validation set because the data from Kaggle does not come with one.

In [None]:
# Setup X and y
X = filename
y = boolean_labels

In [None]:
len(X), len(y)

### We start with about 40% of the training data and will increase it as needed.


In [None]:
NUM_IMAGES = 270 #@param {type:"slider", min:150, max:350, step:30}

In [None]:
# Let's split the data with training and validation dataset.
from sklearn.model_selection import train_test_split
# Split them into training and validation sets with a total size of NUM_IMAGES
X_train, X_val, y_train, y_val = train_test_split(X[:NUM_IMAGES], y[:NUM_IMAGES],
                                                  test_size =0.1,
                                                  random_state =42)
len(X_train), len(y_train) , len(X_val), len(y_val)
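
With `NUM_IMAGES = 270` and `test_size = 0.1`, the split sizes can be worked out by hand. For a fractional `test_size`, scikit-learn rounds the test count up and gives the remainder to training, so the validation set holds 27 images. That is worth knowing later, because a validation "batch of 30" will actually contain only 27 images. A minimal sketch of that arithmetic (`split_sizes` is a hypothetical helper, not part of scikit-learn):

```python
import math

def split_sizes(n_samples, test_size):
    """Mirror train_test_split's rounding for a float test_size: test is rounded up."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

print(split_sizes(270, 0.1))  # (243, 27)
```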

In [None]:
y_train[:2], y_val[:2]

In [None]:
X_train[:2], y_train[:2]

## Preprocessing Images - Turning them into **Tensors**

To preprocess images into Tensors, we are going to write a function that does the following:
* Take an image file path as input.
* Use TensorFlow to read the file and save it to a variable called `image`.
* Decode the image into a Tensor (we decode it as PNG with 3 color channels).
* Convert the color channel values from 0-255 to 0-1.
* Resize the image to the shape (224, 224).
* Return the modified image.


Let's see what importing an image looks like.

In [None]:
# Convert an image to a NumPy array
from matplotlib.pyplot import imread
image = imread(filename[300])
image.shape

In [None]:
image[::2]

In [None]:
# Turn image into tensor
tf.constant(image)

In [None]:
# Define the image size
IMG_SIZE = 224
# Create a function for pre-processing images
def process_image(image_path, image_size = IMG_SIZE):
  """
  Takes an image file path and turns the image into a Tensor.
  """
  # Read in an image file
  image = tf.io.read_file(image_path)

  # Decode the PNG image into a numerical Tensor with 3 color channels (Red, Green and Blue)
  image = tf.image.decode_png(image, channels =3)

  # Convert the color channel values from 0-255 to 0-1
  image = tf.image.convert_image_dtype(image, tf.float32)

  # Resize the image to our desired size (image_size, image_size), 224 x 224 by default
  image = tf.image.resize(image, size = [image_size, image_size])
  return image
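
To check shapes and value ranges without TensorFlow, here is a NumPy-only approximation of `process_image` applied to an already-decoded image. Two assumptions: for `uint8` input, `tf.image.convert_image_dtype` scales values into 0-1, which we mimic by dividing by 255; and we swap in nearest-neighbour resizing for simplicity, whereas `tf.image.resize` defaults to bilinear, so individual pixel values will differ slightly. `process_array` is a hypothetical helper name.

```python
import numpy as np

IMG_SIZE = 224

def process_array(image, image_size=IMG_SIZE):
    """Approximate process_image on a decoded uint8 (H, W, 3) array."""
    # Scale 0-255 integers to 0-1 floats
    scaled = image.astype(np.float32) / 255.0
    # Nearest-neighbour resize: pick evenly spaced source rows and columns
    h, w = scaled.shape[:2]
    rows = np.arange(image_size) * h // image_size
    cols = np.arange(image_size) * w // image_size
    return scaled[rows][:, cols]

# A random fake "photo" of a different size stands in for a real image file
fake = np.random.default_rng(0).integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
out = process_array(fake)
print(out.shape, out.dtype)  # (224, 224, 3) float32
```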


## Turning data into batches


We need to turn all the data into batches. We will process 30 images at a time to avoid running out of memory.
In order to use TensorFlow effectively, we need our data in the form of tensor tuples that look like `(image, label)`.

In [None]:
# Create a simple function to return a tuple
def get_image_label(image_path, label):
  """
  Takes an image path and its associated label, processes them and returns a tuple of the form (image, label)
  """
  image = process_image(image_path)
  return image, label


In [None]:
# demo of the above:
process_image(X[300]), tf.constant(y[300])

## Now we have a way to turn our data into Tensors of the form (image, label). Let's make a function to turn all the data (X, y) into batches

In [None]:
# Define the batch size as 30
BATCH_SIZE = 30

# Create a function to turn data into batches
def create_data_batches(X, y= None, batch_size = BATCH_SIZE, valid_data = False, test_data = False ):
  """
  Creates batches of data out of image (X) and label (y) pairs.
  Shuffles the data if it is training data, but does not shuffle validation data.
  Also accepts test data (file paths without labels) as input.
  """
  # If the data set is a test data set, we probably don't have labels.
  if test_data:
    print("Creating test data batches......")
    data = tf.data.Dataset.from_tensor_slices(tf.constant(X)) # Only file paths, no labels.
    data_batch = data.map(process_image).batch(batch_size)
    return data_batch
  # If the data set is a validation data set, we don't need to shuffle it.
  elif valid_data:
    print("Creating validation data batches....")
    data = tf.data.Dataset.from_tensor_slices((tf.constant(X),
                                               tf.constant(y)))
    data_batch = data.map(get_image_label).batch(batch_size)
    return data_batch
  else:
    print("Creating training data batches ...........")
    # Turn file paths and labels into Tensors.
    data = tf.data.Dataset.from_tensor_slices((tf.constant(X),
                                               tf.constant(y)))

    # Shuffling the file paths and labels before mapping the image processing function is faster than shuffling the images
    data = data.shuffle(buffer_size = len(X))

    # Create (image, label) tuples; this also turns each image path into a preprocessed image
    data = data.map(get_image_label)

    # Turn the training data into batches
    data_batch = data.batch(batch_size)
    return data_batch
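
`Dataset.batch` keeps a final, smaller batch by default (`drop_remainder=False`). A plain-Python sketch of that behavior shows what to expect from our split with the default `NUM_IMAGES = 270`: 243 training images give eight full batches of 30 plus a final batch of 3, and the 27 validation images fit in a single batch. `batch_sizes` is a hypothetical helper for illustration only.

```python
def batch_sizes(n_items, batch_size=30):
    """Sizes of the batches Dataset.batch would produce with drop_remainder=False."""
    sizes = [batch_size] * (n_items // batch_size)
    remainder = n_items % batch_size
    if remainder:
        sizes.append(remainder)  # the smaller final batch
    return sizes

print(batch_sizes(243))  # [30, 30, 30, 30, 30, 30, 30, 30, 3]
print(batch_sizes(27))   # [27]
```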

In [None]:
#Create training and validation data batches
train_data = create_data_batches(X_train, y_train)
val_data = create_data_batches(X_val, y_val , valid_data = True)

In [None]:
train_data, val_data

In [None]:
# Inspect the structure (shapes and dtypes) of the data batches
train_data.element_spec

### Our training data is now in batches, which makes it difficult to understand, so let's visualize it.

In [None]:
import matplotlib.pyplot as plt
# Create a function to view up to 30 images from a batch
def show_30_images(images,labels):
  """
    Shows up to 30 images along with their labels
  """
  # Setup figure
  plt.figure(figsize=(20,20))
  # Loop through up to 30 images (a batch may hold fewer, e.g. the validation batch)
  for i in range(min(30, len(images))):
    # Create subplots in a 5 x 6 grid
    ax = plt.subplot(5,6, i+1)
    # Display image
    plt.imshow(images[i])
    plt.title(unique_labels[labels[i].argmax()])
    # Turn gridlines off
    plt.axis("off")

In [None]:
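# Check whether any file is not a supported image type
# Note: imghdr is deprecated since Python 3.11 and removed in Python 3.13
from pathlib import Path
import imghdr

data_dir = "../input/automobilepartsindentification/Automobile-parts/Train/"
image_extensions = [".png", ".jpg"]  # add all your image file extensions here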

img_type_accepted_by_tf = ["bmp", "gif", "jpeg", "png"]
for filepath in Path(data_dir).rglob("*"):
    if filepath.suffix.lower() in image_extensions:
        img_type = imghdr.what(filepath)
        if img_type is None:
            print(f"{filepath} is not an image")
        elif img_type not in img_type_accepted_by_tf:
            print(f"{filepath} is a {img_type}, not accepted by TensorFlow")
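
Because `imghdr` was removed in Python 3.13, the check above fails on newer runtimes. One standard-library-only alternative is to read each file's first bytes and compare them with the well-known signatures of the formats in `img_type_accepted_by_tf` (PNG, JPEG, GIF, BMP). `detect_image_type` and `find_unsupported_images` are hypothetical helper names, and the signature list is deliberately small (WebP is recognised only so that it can be flagged).

```python
from pathlib import Path

# File signatures ("magic bytes") for a few common image formats
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
}
ACCEPTED_BY_TF = {"bmp", "gif", "jpeg", "png"}

def detect_image_type(header):
    """Return a format name for the first bytes of a file, or None if unknown."""
    for signature, name in SIGNATURES.items():
        if header.startswith(signature):
            return name
    # WebP files start with "RIFF", then 4 size bytes, then "WEBP"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    return None

def find_unsupported_images(data_dir, extensions=(".png", ".jpg", ".jpeg")):
    """List (path, detected_type) for image files whose type is not in ACCEPTED_BY_TF."""
    problems = []
    for filepath in Path(data_dir).rglob("*"):
        if filepath.is_file() and filepath.suffix.lower() in extensions:
            with open(filepath, "rb") as f:
                img_type = detect_image_type(f.read(12))
            if img_type not in ACCEPTED_BY_TF:
                problems.append((filepath, img_type))
    return problems
```

For example, `find_unsupported_images("../input/automobilepartsindentification/Automobile-parts/Train/")` would list the files to inspect or remove; a detected type of `None` means the header matched none of the known signatures.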

In [None]:
# Let's visualize the data in a training batch
train_images, train_labels = next(train_data.as_numpy_iterator())
show_30_images(train_images, train_labels)

In [None]:
# Now let's visualize our validation set
val_images, val_label = next(val_data.as_numpy_iterator())
show_30_images(val_images,val_label)

### Build a model.
Before building the model, we need to define the following:

1) The input shape (the shape of our images in the form of Tensors).

2) The output shape (the image labels in the form of Tensors) of our model.

3) The URL of the model we want to use from TensorFlow Hub: https://tfhub.dev/google/imagenet/mobilenet_v2_130_224/classification/5


In [None]:
# Setup the input and output shapes for the model
INPUT_SHAPE = [None, IMG_SIZE, IMG_SIZE, 3] #[BATCH, HEIGHT, WIDTH, COLOR]
OUTPUT_SHAPE = len(unique_labels)


In [None]:
#Setup model URL from TensorFlow HUB
MODEL_URL ="https://tfhub.dev/google/imagenet/mobilenet_v2_130_224/classification/5"

### Now that we have the inputs, outputs and model ready, let's put them together into a Keras deep learning model. Let's create a function that will:

* Take the input shape, output shape and model URL as parameters.
* Define the layers of the Keras model in sequential fashion.
* Compile the model (set its loss function, optimizer and metrics).
* Build and return the model. Steps can be found in https://www.tensorflow.org/guide/keras/sequential_model


In [None]:
# Create a function which builds a Keras model
def create_model(input_shape = INPUT_SHAPE, output_shape = OUTPUT_SHAPE, model_url = MODEL_URL):
  """
  Creates a TensorFlow Keras Sequential model
  """
  print(f"Building model with: {model_url}")

  # Setup model layers:
  model = tf.keras.Sequential([
      hub.KerasLayer(model_url), # Layer 1: pretrained MobileNetV2 from TensorFlow Hub
      tf.keras.layers.Dense(units = output_shape,
                            activation = "softmax")]) # Layer 2: output layer
  # Compile the model
  model.compile(loss = tf.keras.losses.CategoricalCrossentropy(),
                optimizer = tf.keras.optimizers.Adam(),
                metrics = ["accuracy"])

  # Build the model
  model.build(input_shape)
  return model
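
The last two pieces of `create_model`, a `softmax` output layer and the `CategoricalCrossentropy` loss, are simple formulas. Below is a NumPy sketch, assuming one-hot targets like our `boolean_labels`: softmax turns raw scores into probabilities that sum to 1 (which is why `np.sum(predictions[i])` comes out at about 1.0 later), and cross-entropy is the negative log of the probability given to the true class. Keras also clips probabilities with a small epsilon, which we imitate; other internal details may differ.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the max keeps exp() from overflowing."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred)), with clipping to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 3.0]])
y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])   # one-hot targets, like boolean_labels.astype(int)
probs = softmax(logits)
print(probs.sum(axis=1))                        # each row sums to 1
print(categorical_crossentropy(y_true, probs))  # lower is better
```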



In [None]:
model = create_model()
model.summary()

### Creating callbacks
Callbacks are helper functions a model can use during training to save its progress or to stop training if the model stops improving.
We will create two callbacks: one for TensorBoard, which helps us track the model's progress, and one for early stopping, which prevents our model from training for too long.

### Tensorboard callback
To set up the TensorBoard callback, we need to do three things:
* Load the TensorBoard notebook extension.
* Create a TensorBoard callback which saves logs to a directory, and pass it to our model's `fit()` function.
* Visualize our model's training logs with the `%tensorboard` magic function.

In [None]:
# Load the TensorBoard notebook extension
%load_ext tensorboard


In [None]:
import datetime
# Create a TensorBoard callback which saves logs to a directory; we pass it to our model's fit() function.
def create_tensorboard_callback():
  #Create a log directory to save tensorboard logs
  logdir = os.path.join("drive/MyDrive/Automobile/logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
  return tf.keras.callbacks.TensorBoard(log_dir = logdir)
  

### Create an early stopping callback
Early stopping helps prevent our model from overfitting by stopping training when a chosen evaluation metric stops improving.
https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping
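
With `monitor='val_accuracy'` and `patience=3`, training stops once validation accuracy has failed to improve for three epochs in a row. Here is a simplified plain-Python sketch of that rule; it ignores `min_delta`, `baseline` and `restore_best_weights`, and the accuracy history is made up.

```python
def stopping_epoch(history, patience=3):
    """Return the 0-based epoch at which early stopping would halt training."""
    best = float("-inf")
    wait = 0
    for epoch, value in enumerate(history):
        if value > best:      # improvement: remember it and reset the counter
            best = value
            wait = 0
        else:                 # no improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch
    return len(history) - 1   # never triggered: ran every epoch

val_accuracy = [0.52, 0.61, 0.66, 0.66, 0.65, 0.66, 0.70]  # hypothetical values
print(stopping_epoch(val_accuracy))  # 5
```

Here training halts after the sixth epoch (index 5), so the later improvement to 0.70 is never seen. That is the trade-off `patience` controls.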

In [None]:
# Create early stopping callback.
early_stopping = tf.keras.callbacks.EarlyStopping(monitor = 'val_accuracy',
                                                  patience =3)


## Training a model on a subset of data
Our first model trains on only a subset of the images to make sure everything is working.

In [None]:
NUM_EPOCHS = 100 #@param {type:"slider",min:10,max:100, step:10}


In [None]:
print("GPU is available" if tf.config.list_physical_devices("GPU") else "Not available")

### Let's create a function which trains a model
* Create a model using `create_model()`.
* Set up a TensorBoard callback using `create_tensorboard_callback()`.
* Call `fit()` on our model, passing it the training data, the validation data, the number of epochs to train for (`NUM_EPOCHS`) and the callbacks we would like to use.
* Return the model.

In [None]:
# Let's create , build and train a model.
def train_model():
  """
    Trains a model and returns it
  """
  #Create a model.
  model = create_model()

  # Create a new TensorBoard session every time we train a model.
  tensorboard = create_tensorboard_callback()
  # Fit the model to the data, passing it the callbacks we created.
  model.fit(x = train_data,
            epochs = NUM_EPOCHS,
            validation_data = val_data,
            validation_freq = 1,
            callbacks = [tensorboard, early_stopping])
  #Return the fitted model.
  return model

In [None]:
model= train_model()

In [None]:
# Make predictions on the validation dataset.
predictions = model.predict(val_data, verbose =1)
predictions

In [None]:
predictions.shape

In [None]:
predictions[2]

In [None]:
np.sum(predictions[2])

In [None]:
# Inspect a single prediction
index = 20
print(predictions[index])
print(f"Max value ( Probability of predictions): {np.max(predictions[index])}")
print(f"Sum:{np.sum(predictions[index])}")
print(f"Max Index : {np.argmax(predictions[index])}")
print(f"Predicted Label: {unique_labels[np.argmax(predictions[index])]}")


In [None]:
# Turn prediction probabilities into their respective labels
def get_pred_labels(prediction_probabilities):
  """
  Turns an array of prediction probabilities into labels
  """
  return unique_labels[np.argmax(prediction_probabilities)]
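
`get_pred_labels` decodes one probability row at a time. Because `unique_labels` is a NumPy array, it can also be indexed with a whole array of `argmax` results, which decodes every row at once and makes it easy to compute accuracy by hand as a cross-check against `model.evaluate`. Below is a sketch with made-up probabilities and class names; `get_pred_labels_batch` is a hypothetical helper. Note that on a tie, `argmax` picks the first class.

```python
import numpy as np

class_names = np.array(["Bolt", "Nut", "Washer"])   # hypothetical stand-in for unique_labels
probabilities = np.array([[0.7, 0.2, 0.1],           # hypothetical model outputs
                          [0.1, 0.3, 0.6],
                          [0.2, 0.5, 0.3],
                          [0.4, 0.4, 0.2]])
true_labels = np.array(["Bolt", "Washer", "Washer", "Nut"])

def get_pred_labels_batch(prediction_probabilities, names):
    """Decode every row of an (N, K) probability array into a label."""
    return names[np.argmax(prediction_probabilities, axis=1)]

pred = get_pred_labels_batch(probabilities, class_names)
accuracy = np.mean(pred == true_labels)
print(pred)      # ['Bolt' 'Washer' 'Nut' 'Bolt']
print(accuracy)  # 0.5
```

In the notebook, the equivalent would be comparing `get_pred_labels_batch(predictions, unique_labels)` with `np.array(val_label)` after unbatching the validation data.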

In [None]:
# Get a predicted label based on array of prediction probabilities
pred_label = get_pred_labels(predictions[20])
pred_label

In [None]:
val_data

### The validation dataset is in batches, so we need to unbatch it to get the individual images and labels.

In [None]:
# Create a function to unbatch a batched Training and Validation dataset.
def unbatchify(data):
  images_ =[]
  labels_ =[]
  #Loop through the unbatched data.
  for image, label in data.unbatch().as_numpy_iterator():
    images_.append(image)
    labels_.append(unique_labels[np.argmax(label)])
  return images_, labels_

In [None]:
# Create a function to unbatch a batched Test dataset.
def unbatchify_test(data):
  images_= []
  for image in data.unbatch().as_numpy_iterator():
    images_.append(image)
  return images_


In [None]:
# Unbatchify the validation dataset
val_images, val_label = unbatchify(val_data)

In [None]:
val_images[0] , val_label[0]

In [None]:
# Compare the predicted label with the true label for the first validation image
get_pred_labels(predictions[0]), val_label[0]

### Now we have ways to get:
* Prediction labels
* Validation labels (truth labels)
* Validation images

Let's make a function to visualize them. It will:
* Take an array of prediction probabilities, an array of truth labels, an array of images and an integer.
* Convert the prediction probabilities into the predicted label.
* Plot the predicted label, its predicted probability, the truth label and the target image in a single plot.


In [None]:
def plot_pred(prediction_probabilities, labels, images, n=26):
  """
  View the predicted label, its predicted probability, the truth label and the target image in a single plot
  """
  pred_prob, true_label, true_image = prediction_probabilities[n], labels[n], images[n]
  # Get the predicted label
  pred_label = get_pred_labels(pred_prob)
  # Plot the image and remove ticks (call the functions rather than overwriting them)
  plt.imshow(true_image)
  plt.xticks([])
  plt.yticks([])
  # Green title if the prediction is correct, red otherwise
  if pred_label == true_label:
    color = "green"
  else:
    color = "red"
  # Title shows the predicted label, the probability of the prediction and the truth label
  plt.title("{} {:2.0f}% {}".format(pred_label, np.max(pred_prob)*100, true_label), color=color)

In [None]:
plot_pred(predictions, labels = val_label, images = val_images, n=9)

Now that we have a function to visualize our model's top prediction, let's make another to visualize the model's top 10 predictions.
This function will:
* Take as input the prediction probabilities array, the ground truth labels and an integer.
* Find the predicted label using `get_pred_labels()`.
* Find the top 10:
  * Prediction probability indexes
  * Prediction probability values
  * Prediction labels
* Plot the top 10 prediction probability values and labels, coloring the true label `green`.

In [None]:
model.evaluate(val_data)
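
The top-10 view planned above is not implemented in the notebook. Here is a sketch: `top_k_predictions` does the NumPy part (sort the probabilities in descending order and keep the first `k`), and `plot_pred_conf` draws a bar chart, coloring the true label green. Both names are hypothetical. matplotlib is imported inside the plotting function so the NumPy part can be used on its own, and if there are fewer than 10 classes, `k` is capped at the number of classes.

```python
import numpy as np

def top_k_predictions(prediction_probabilities, class_names, k=10):
    """Return (indexes, values, labels) of the k most probable classes."""
    k = min(k, len(prediction_probabilities))
    top_indexes = np.argsort(prediction_probabilities)[::-1][:k]
    return top_indexes, prediction_probabilities[top_indexes], class_names[top_indexes]

def plot_pred_conf(prediction_probabilities, labels, class_names, n=1, k=10):
    """Bar chart of the top-k predictions for sample n; the true label is green."""
    import matplotlib.pyplot as plt
    _, values, top_labels = top_k_predictions(prediction_probabilities[n], class_names, k)
    positions = np.arange(len(top_labels))
    bars = plt.bar(positions, values, color="grey")
    plt.xticks(positions, labels=top_labels, rotation="vertical")
    # Color the bar of the true label green, if it made it into the top k
    for bar, label in zip(bars, top_labels):
        if label == labels[n]:
            bar.set_color("green")
```

In the notebook this could be called as `plot_pred_conf(predictions, val_label, unique_labels, n=9)`.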

### Making predictions on the test dataset.

In [None]:
#import shutil
#data_dir = "drive/MyDrive/Automobile/Test"
#shutil.rmtree(data_dir)

In [None]:
# Check whether any file is not a supported image type (this scans both Train and Test)
from pathlib import Path
import imghdr

data_dir = '../input/automobilepartsindentification/Automobile-parts'
image_extensions = [".png", ".jpg"]  # add all your image file extensions here

img_type_accepted_by_tf = ["bmp", "gif", "jpeg", "png"]
for filepath in Path(data_dir).rglob("*"):
    if filepath.suffix.lower() in image_extensions:
        img_type = imghdr.what(filepath)
        if img_type is None:
            print(f"{filepath} is not an image")
        elif img_type not in img_type_accepted_by_tf:
            print(f"{filepath} is a {img_type}, not accepted by TensorFlow")

In [None]:
# Turn the full dataset into training batches
full_data = create_data_batches(X,y)
full_data

In [None]:
# Creating a model.
full_model = create_model()

In [None]:
full_model.summary()

In [None]:
NUM_EPOCHS

In [None]:
# Create full model callback
full_model_tensorboard = create_tensorboard_callback()
# No validation set when training on all the data, so we cannot monitor validation accuracy; monitor training accuracy instead
full_model_early_stopping = tf.keras.callbacks.EarlyStopping(monitor = "accuracy",
                                                             patience =3)
# Fit the full model
full_model.fit(x= full_data,
               epochs = NUM_EPOCHS,
               callbacks = [full_model_tensorboard,full_model_early_stopping ])


In [None]:
# Load the test image file paths
test_path = "../input/automobilepartsindentification/Automobile-parts/Test/"
test_filename = [test_path + fname for fname in os.listdir(test_path)]
test_filename[:10]

In [None]:
import shutil
# Remove the Jupyter checkpoint folder so it is not treated as a test image
if (os.path.exists("../input/automobilepartsindentification/Automobile-parts/Test/.ipynb_checkpoints")):
  shutil.rmtree("../input/automobilepartsindentification/Automobile-parts/Test/.ipynb_checkpoints")

In [None]:
from pathlib import Path
import imghdr

data_dir = "../input/automobilepartsindentification/Automobile-parts/Test/"
image_extensions = [".png", ".jpg", ".jpeg"]  # add all your image file extensions here

img_type_accepted_by_tf = ["bmp", "gif", "jpeg", "png"]
for filepath in Path(data_dir).rglob("*"):
    if filepath.suffix.lower() in image_extensions:
        img_type = imghdr.what(filepath)
        if img_type is None:
            print(f"{filepath} is not an image")
        elif img_type not in img_type_accepted_by_tf:
            print(f"{filepath} is a {img_type}, not accepted by TensorFlow")

In [None]:
print(len(os.listdir(test_path)))

In [None]:
#Create test data batch
test_data_batch = create_data_batches(test_filename, test_data = True)
test_data_batch

In [None]:
# Make predictions on test data batches
test_prediction = full_model.predict(test_data_batch, verbose =1)
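
The Evaluation section asks for a file of prediction probabilities for each test image. Here is a minimal sketch of building one with the standard `csv` module. The layout is an assumption, not a format specified by the dataset: one row per image, with an `id` column (the file name) followed by one column per class. `predictions_to_csv` is a hypothetical helper.

```python
import csv
import io
import os

import numpy as np

def predictions_to_csv(filenames, probabilities, class_names):
    """Return CSV text: header 'id,<class...>' then one row of probabilities per image."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", *class_names])
    for path, row in zip(filenames, probabilities):
        writer.writerow([os.path.basename(path), *(f"{p:.6f}" for p in row)])
    return buffer.getvalue()

demo = predictions_to_csv(["Test/part_001.jpg"], np.array([[0.25, 0.75]]), ["Bolt", "Nut"])
print(demo)
```

In the notebook, `Path("predictions.csv").write_text(predictions_to_csv(test_filename, test_prediction, unique_labels))` would save the file; the output file name is hypothetical.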

In [None]:
test_prediction[0]

In [None]:
test_image = unbatchify_test(test_data_batch)

In [None]:
def predict_label(prediction_test, images_test, end_batch=30):
  """
  Plots test images with their predicted labels.
  """
  # Setup figure
  plt.figure(figsize=(20,20))
  # Plot at most end_batch images, and never more than we have
  for i in range(min(end_batch, len(images_test))):
    # Plot images in a 10 x 10 grid and remove ticks
    ax = plt.subplot(10,10, i+1)
    plt.imshow(images_test[i])
    plt.xticks([])
    plt.yticks([])
    # Title shows the predicted label (test images have no truth labels)
    plt.title("{}".format(get_pred_labels(prediction_test[i])))
    plt.axis("off")

In [None]:
predict_label(prediction_test = test_prediction, images_test = test_image, end_batch=100)