In [1]:
import os
import matplotlib.pyplot as plt
import numpy as np
import cv2
from PIL import Image

The initial dataset presented several challenges that needed to be addressed:
1) The images were of different sizes, which could affect the model's performance.
2) Some images were in GIF format and had to be removed.
3) In some images the car logo was not clearly visible, making them less useful for the brand prediction task.
4) Images showing cars in side view were also less useful, since the frontal view with the logo is more informative.
5) Images from one class were repeated in other classes, which would hurt the final model's ability to predict the car logos.
6) The dataset was not balanced across the car brands, which could bias the model's predictions.
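
Challenge 5 (images repeated across classes) can be checked mechanically. The following is a minimal sketch, not part of the original workflow: it assumes one sub-folder per class and only catches byte-identical copies, not resized or re-encoded ones. The function name `find_cross_class_duplicates` is hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def find_cross_class_duplicates(root_dir, class_names):
    """Group byte-identical files that appear under more than one class folder."""
    seen = defaultdict(list)  # content hash -> list of (class name, file path)
    for class_name in class_names:
        class_dir = os.path.join(root_dir, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            file_path = os.path.join(class_dir, file_name)
            if not os.path.isfile(file_path):
                continue
            with open(file_path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            seen[digest].append((class_name, file_path))

    # Keep only hashes that occur in at least two different classes.
    return [
        entries
        for entries in seen.values()
        if len({class_name for class_name, _ in entries}) > 1
    ]
```

Each returned group lists the classes and paths sharing the same content, so the decision about which copy to keep can still be made by hand.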




In this task, the goal was to automate the data preprocessing and eliminate unnecessary data that would not contribute to the final model's ability to predict the car brand accurately. 

To achieve this, a binary classifier ('junk' vs. 'correct') was trained to prune the dataset, resulting in a cleaner and more focused dataset. 




In [2]:
import tensorflow as tf
from tensorflow import keras
from keras.layers import Dense
from keras import Sequential
from keras.layers import MaxPooling2D, Conv2D, Dropout,Flatten
from keras.models import Model 
from keras.callbacks import ModelCheckpoint , EarlyStopping

In [60]:
dir = r'C:\Projects\e-AUTO\eAuto_photos\automation_prune'

In [61]:
classes = ['junk' , 'correct']

In [62]:
size = 128

In [63]:
training_data = []

In [64]:
for category in classes:
    path = os.path.join(dir , category)
    label = classes.index(category)

    for img in os.listdir(path):
        img_path = os.path.join(path , img)
        try:
            image = cv2.imread(img_path)
            if image is None:
                # cv2.imread returns None for files it cannot decode; skip them
                print(f"Error loading image: {img_path}")
                continue
            image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            img_resize = cv2.resize(image_rgb , (size, size))
        except Exception as e:
            print(f"Error processing image: {img_path}")
            print(f"Error message: {str(e)}")
            continue

        # Append only images that were loaded and resized successfully,
        # so a failed file never re-adds the previous image under a new label
        training_data.append([img_resize, label])


Error loading image: C:\Projects\e-AUTO\eAuto_photos\automation_prune\junk\frontfacingHyundaicar362.gif
Error loading image: C:\Projects\e-AUTO\eAuto_photos\automation_prune\junk\frontfacingNissancar503.gif


REMOVING THE IMAGES THAT END WITH .gif

In [116]:
gif = [
    r'C:\Projects\e-AUTO\eAuto_photos\original_set\Hyundai\frontfacingHyundaicar327.gif',
    r'C:\Projects\e-AUTO\eAuto_photos\original_set\Hyundai\frontfacingHyundaicar339.gif',
    r'C:\Projects\e-AUTO\eAuto_photos\original_set\Hyundai\frontfacingHyundaicar362.gif',
    r'C:\Projects\e-AUTO\eAuto_photos\original_set\Nissan\frontfacingNissancar123.gif',
    r'C:\Projects\e-AUTO\eAuto_photos\original_set\Nissan\frontfacingNissancar463.gif',
    r'C:\Projects\e-AUTO\eAuto_photos\original_set\Nissan\frontfacingNissancar503.gif',
    r'C:\Projects\e-AUTO\eAuto_photos\original_set\Renault\frontfacingRenaultcar458.gif',
    r'C:\Projects\e-AUTO\eAuto_photos\original_set\Suzuki\frontfacingsuzukicar527.gif',
    r'C:\Projects\e-AUTO\eAuto_photos\original_set\Tata\frontfacingTatamotorscar407.gif',
    r'C:\Projects\e-AUTO\eAuto_photos\original_set\Toyota\frontfacingToyotacar359.gif',
]

In [119]:
len(gif)

10

In [118]:
for img in gif:
    try:

        os.remove(img)
    except FileNotFoundError:
        print('Already removed')
        continue

Already removed
Already removed
Already removed
Already removed
Already removed
Already removed
Already removed
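
The GIF paths above were listed by hand. As an alternative, the files could be discovered by extension. This is only a sketch under the assumption that every file ending in `.gif` under the dataset root should go; `find_gifs` and `remove_files` are hypothetical helper names, not part of the notebook.

```python
from pathlib import Path


def find_gifs(root_dir):
    """Return every file under root_dir whose extension is .gif (case-insensitive)."""
    return sorted(
        path
        for path in Path(root_dir).rglob("*")
        if path.is_file() and path.suffix.lower() == ".gif"
    )


def remove_files(paths):
    """Delete each path, skipping files that are already gone; return the number deleted."""
    removed = 0
    for path in paths:
        try:
            Path(path).unlink()
            removed += 1
        except FileNotFoundError:
            print(f"Already removed: {path}")
    return removed
```

Printing the result of `find_gifs(...)` before calling `remove_files(...)` gives a chance to review the list first.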


In [65]:
import random

random.shuffle(training_data)

In [66]:
x = []
y = []

In [67]:
for features , label in training_data:
    x.append(features)
    y.append(label)

In [68]:
X = np.array(x)
Y = np.array(y)

In [69]:
X = X/255.0

In [70]:
check_point = ModelCheckpoint(
    filepath = 'best_model.v2',
    monitor = 'val_accuracy',
    save_best_only=True,
    save_weights_only=False,
    mode = 'max',
    
    
    verbose=1
)

In [71]:
# early_stopping = EarlyStopping(
#     monitor='val_accuracy',
#     patience=3,
#     mode='max',
#     verbose=1
# )

In [86]:
model = Sequential([
    Conv2D(32 , kernel_size=(3,3) , activation='relu' , input_shape = (128,128,3)),
    MaxPooling2D(pool_size=(2,2)),
    # Conv2D(32 , kernel_size=(3,3) , activation='relu'),
    # MaxPooling2D(pool_size=(2,2)),
    
    Flatten(),
    Dense(128 , activation='relu'),
    Dropout(0.3),
    Dense(64, activation = 'relu'),
    
    Dense(1 , activation='sigmoid')

])

In [87]:
model.compile(optimizer='adam' , loss='binary_crossentropy' , metrics=['accuracy'])

In [88]:
model.fit( X, Y , epochs = 30 , validation_split = 0.1  , batch_size = 64 , callbacks=[check_point])

Epoch 1/30
Epoch 1: val_accuracy did not improve from 0.91429
Epoch 2/30
Epoch 2: val_accuracy did not improve from 0.91429
Epoch 3/30
Epoch 3: val_accuracy did not improve from 0.91429
Epoch 4/30
Epoch 4: val_accuracy did not improve from 0.91429
Epoch 5/30
Epoch 5: val_accuracy did not improve from 0.91429
Epoch 6/30
Epoch 6: val_accuracy did not improve from 0.91429
Epoch 7/30
Epoch 7: val_accuracy did not improve from 0.91429
Epoch 8/30
Epoch 8: val_accuracy did not improve from 0.91429
Epoch 9/30
Epoch 9: val_accuracy did not improve from 0.91429
Epoch 10/30
Epoch 10: val_accuracy did not improve from 0.91429
Epoch 11/30
Epoch 11: val_accuracy did not improve from 0.91429
Epoch 12/30
Epoch 12: val_accuracy did not improve from 0.91429
Epoch 13/30
Epoch 13: val_accuracy did not improve from 0.91429
Epoch 14/30
Epoch 14: val_accuracy did not improve from 0.91429
Epoch 15/30
Epoch 15: val_accuracy did not improve from 0.91429
Epoch 16/30
Epoch 16: val_accuracy did not improve from 0.

<keras.callbacks.History at 0x1aee8acbdc0>

In [6]:
p_dir = r'C:\Projects\e-AUTO\eAuto_photos\eAuto_photos\photos'

In [7]:
categories = ['Ford' , 'Honday' , 'Hyundai' , 'Nissan' , 'Renault' , 'Suzuki','Tata' , 'Toyota' , 'Volkswagen']

In [8]:
for category in categories:
    path = os.path.join(p_dir , category)

    for img in os.listdir(path):
        img_path = os.path.join(path , img)
        try:
            image = cv2.imread(img_path)
            if image is None:
                # Skip unreadable files so a stale img_resize is never scored
                print(f"Error loading image: {img_path}")
                continue
            image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            img_resize = cv2.resize(image_rgb , (128,128))
        except Exception as e:
            print(f"Error processing image: {img_path}")
            print(f"Error message: {str(e)}")
            continue

        try:
            # Scale to [0, 1] and add a batch dimension, matching the training input
            img_resize = img_resize/255
            img_resize = np.expand_dims(img_resize , axis=0)
            prediction = model.predict(img_resize)
        except Exception as e:
            print(f"Error predicting image: {img_path}")
            continue

        # Scores near 0 correspond to 'junk' (index 0 in classes); delete those images
        if prediction[0][0] < 0.3:
            os.remove(img_path)


Error loading image: C:\Projects\e-AUTO\eAuto_photos\eAuto_photos\photos\Hyundai\frontfacingHyundaicar327.gif
Error loading image: C:\Projects\e-AUTO\eAuto_photos\eAuto_photos\photos\Hyundai\frontfacingHyundaicar339.gif
Error loading image: C:\Projects\e-AUTO\eAuto_photos\eAuto_photos\photos\Hyundai\frontfacingHyundaicar362.gif
Error loading image: C:\Projects\e-AUTO\eAuto_photos\eAuto_photos\photos\Nissan\frontfacingNissancar123.gif
Error loading image: C:\Projects\e-AUTO\eAuto_photos\eAuto_photos\photos\Nissan\frontfacingNissancar463.gif
Error loading image: C:\Projects\e-AUTO\eAuto_photos\eAuto_photos\photos\Nissan\frontfacingNissancar503.gif
Error loading image: C:\Projects\e-AUTO\eAuto_photos\eAuto_photos\photos\Renault\frontfacingRenaultcar458.gif
Error loading image: C:\Projects\e-AUTO\eAuto_photos\eAuto_photos\photos\Suzuki\frontfacingsuzukicar527.gif
Error loading image: C:\Projects\e-AUTO\eAuto_photos\eAuto_photos\photos\Tata\frontfacingTatamotorscar407.gif
Error loading imag
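
The pruning loop deletes files permanently, so a misclassified image cannot be recovered. One possible safeguard, sketched here as an assumption rather than what the notebook does, is to move low-scoring files into a separate folder for review. `quarantine_low_scores` and `score_fn` are hypothetical names; `score_fn` stands in for the load, resize and `model.predict` steps and returns `None` for unreadable files.

```python
import os
import shutil


def quarantine_low_scores(class_dir, quarantine_dir, score_fn, threshold=0.3):
    """Move files whose score is below threshold into quarantine_dir.

    score_fn is a hypothetical callable that takes a file path and returns
    a float in [0, 1], or None when the file cannot be read.
    """
    os.makedirs(quarantine_dir, exist_ok=True)
    moved = []
    for file_name in sorted(os.listdir(class_dir)):
        file_path = os.path.join(class_dir, file_name)
        if not os.path.isfile(file_path):
            continue
        score = score_fn(file_path)
        if score is None:
            continue  # unreadable file: leave it in place for manual review
        if score < threshold:
            shutil.move(file_path, os.path.join(quarantine_dir, file_name))
            moved.append(file_name)
    return moved
```

The 0.3 default mirrors the threshold used in the loop above. Once the quarantined images have been inspected, the folder can be deleted in one step.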

In [111]:
for category in categories:

    lst = os.listdir(os.path.join(p_dir , category))
    print(f"The no_of_images in the class {category} : {len(lst)}")

The no_of_images in the class Ford : 268
The no_of_images in the class Honday : 19
The no_of_images in the class Hyundai : 272
The no_of_images in the class Nissan : 230
The no_of_images in the class Renault : 321
The no_of_images in the class Suzuki : 233
The no_of_images in the class Tata : 191
The no_of_images in the class Toyota : 268
The no_of_images in the class Volkswagen : 153


By applying this model to the dataset, the non-relevant images were pruned, resulting in a more focused dataset for further analysis.
The number of images remaining in each class after pruning is shown in the output above.

-->After the initial automated data pruning process, some manual editing was performed to refine the dataset even further. 

-->Some images required manual cropping and zooming to focus specifically on the car logo, effectively reducing the noise in the data and improving the model's ability to learn the distinguishing features of each brand. 

-->Additionally, a few images that were deemed irrelevant or redundant were removed from the dataset during this manual editing phase. 



In [112]:
for category in categories:

    lst = os.listdir(os.path.join(p_dir , category))
    print(f"The no_of_images in the class {category} : {len(lst)}")

The no_of_images in the class Ford : 240
The no_of_images in the class Honday : 19
The no_of_images in the class Hyundai : 262
The no_of_images in the class Nissan : 223
The no_of_images in the class Renault : 307
The no_of_images in the class Suzuki : 185
The no_of_images in the class Tata : 160
The no_of_images in the class Toyota : 200
The no_of_images in the class Volkswagen : 152


Through a combination of automated data pruning and manual editing, the dataset for car brand prediction was effectively cleaned and refined. 

The challenges presented by differently sized images, GIF format, unclear logo visibility, side view images, and dataset imbalance were addressed, resulting in a more balanced, informative, and cleaner dataset. 

This prepared dataset is expected to contribute positively to the training and performance of the final model, enabling accurate predictions of car brands based on their logos.
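
The per-class counts printed above still differ considerably (19 images for Honday against 307 for Renault). One common way to compensate during training, which the notebook itself does not use, is inverse-frequency class weighting. The sketch below uses the counts from the output above; `inverse_frequency_weights` is a hypothetical helper.

```python
def inverse_frequency_weights(counts):
    """Map each class to total / (n_classes * count), so rare classes weigh more."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * count) for name, count in counts.items()}


# Counts taken from the notebook output after manual editing.
counts = {
    "Ford": 240, "Honday": 19, "Hyundai": 262, "Nissan": 223, "Renault": 307,
    "Suzuki": 185, "Tata": 160, "Toyota": 200, "Volkswagen": 152,
}
weights = inverse_frequency_weights(counts)
```

Keras `fit` accepts a `class_weight` dictionary keyed by integer class index, so these names would first need to be mapped to the indices used for the labels.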

* After this, the dataset is very small, with only a few images in each class. Hence, the data can be augmented to roughly 1000 images per class using different shear ranges, brightness levels, rotation angles, flips, etc.

* This would create a sufficient dataset for the final_model to predict the car logo.

* The images should also be standardized to the same size and the pixel values normalized, so that training proceeds smoothly and is more likely to converge to a good solution for this problem.
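
The augmentation idea in the points above can be sketched with NumPy alone. This is an illustrative assumption, not the notebook's implementation: it covers only flips, right-angle rotations and brightness scaling, and expects square images already normalized to [0, 1]. Shear and arbitrary-angle rotation would need an image library. The names `augment` and `augment_to_target` are hypothetical.

```python
import numpy as np


def augment(image, rng):
    """Return a randomly flipped, rotated and brightness-scaled copy of image.

    image is assumed to be a square float array of shape (H, H, 3) in [0, 1].
    """
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip
    # Rotate by 0, 90, 180 or 270 degrees in the image plane.
    out = np.rot90(out, k=int(rng.integers(0, 4)), axes=(0, 1))
    factor = rng.uniform(0.7, 1.3)  # brightness scaling factor
    return np.clip(out * factor, 0.0, 1.0)


def augment_to_target(images, target, seed=0):
    """Grow a list of images to `target` items by appending augmented copies."""
    rng = np.random.default_rng(seed)
    result = list(images)
    while len(result) < target:
        source = images[int(rng.integers(0, len(images)))]
        result.append(augment(source, rng))
    return result
```

Calling `augment_to_target(class_images, 1000)` per class would bring every class to the same size, which also evens out the class balance.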