**Task-03: Image Classification: Cats vs. Dogs SVM Project**


**Overview:** This project implements a Support Vector Machine (SVM) classifier for images of cats and dogs from the Kaggle dataset. It explores how SVMs handle high-dimensional image data, with an emphasis on feature extraction, model training, evaluation, and parameter optimization.

**Dataset:** The Kaggle training archive contains 25,000 labeled images of dogs and cats, which are used to train the SVM. The trained model then predicts a label for each image in test1.zip: 1 for a dog, 0 for a cat.

**Dataset link:** https://www.kaggle.com/c/dogs-vs-cats/data

In [9]:
import os
import cv2
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [14]:
# Dataset locations and class labels (list index = label: cats -> 0, dogs -> 1)
train_dir = "CatDog/dog vs cat/dataset/training_set"
test_dir = "CatDog/dog vs cat/dataset/test_set"
categories = ['cats', 'dogs']
X = []
y = []

In [15]:
for category in categories:
    train_path = os.path.join(train_dir, category)
    class_num = categories.index(category)  # cats -> 0, dogs -> 1
    for img in os.listdir(train_path):
        img_path = os.path.join(train_path, img)
        img_array = cv2.imread(img_path)  # loaded as a BGR uint8 array
        if img_array is None:  # skip files OpenCV cannot decode
            continue
        img_array = cv2.resize(img_array, (224, 224))
        X.append(img_array)
        y.append(class_num)
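The loading loop assumes every entry in a class folder is a readable image. As a hedged sketch (not the notebook's own code), the paths and labels could be gathered first and filtered by extension, so stray files such as `.DS_Store` never reach `cv2.imread`. The helper name `list_labeled_images` and the extension set are assumptions made for illustration.

```python
import os

# Hypothetical helper; the extension set is an assumption, not part of the notebook.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def list_labeled_images(root_dir, categories):
    """Return (path, label) pairs; the label is the index of the class folder."""
    pairs = []
    for label, category in enumerate(categories):
        class_dir = os.path.join(root_dir, category)
        for name in sorted(os.listdir(class_dir)):
            ext = os.path.splitext(name)[1].lower()
            if ext in IMAGE_EXTENSIONS:  # ignore anything that is not an image file
                pairs.append((os.path.join(class_dir, name), label))
    return pairs
```

Each returned path can then be passed to `cv2.imread`, with the paired label appended to `y`.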

In [16]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Data augmentation settings (note: datagen is fitted here but not used in the cells below)
datagen = ImageDataGenerator(rotation_range=20, width_shift_range=0.2, height_shift_range=0.2, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
datagen.fit(X_train)
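The split above shuffles the data with a fixed seed but does not force the class ratio to match in both parts. A minimal sketch, using made-up labels rather than the notebook's images, shows how the `stratify` argument of `train_test_split` keeps the proportion of cats and dogs the same in the training and test sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for the image data: 100 samples, 30 "cats" (0) and 70 "dogs" (1).
X_demo = np.arange(100).reshape(100, 1)
y_demo = np.array([0] * 30 + [1] * 70)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratification, both splits keep the 30/70 class ratio.
train_ratio = y_tr.mean()  # fraction of dogs in the training split
test_ratio = y_te.mean()   # fraction of dogs in the test split
```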

In [17]:
# Use a pre-trained CNN to extract features
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Preprocess the images before feeding to VGG16
# Note: preprocess_input expects RGB input, while cv2.imread returns BGR
X_train_preprocessed = np.array([preprocess_input(img) for img in X_train])
X_test_preprocessed = np.array([preprocess_input(img) for img in X_test])

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58889256/58889256 ━━━━━━━━━━━━━━━━━━━━ 25s 0us/step
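For VGG16, `preprocess_input` treats its input as RGB, reorders the channels to BGR, and subtracts the ImageNet per-channel means without rescaling. The NumPy sketch below re-creates that behaviour for illustration only; the function name `vgg16_style_preprocess` is hypothetical, and the library function remains the one to call in practice. It also shows why the channel order of the input matters: `cv2.imread` returns BGR, so an image loaded with OpenCV would normally be converted to RGB first.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by Keras' "caffe"-style preprocessing.
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_style_preprocess(rgb_image):
    """Illustrative re-implementation: RGB -> BGR, then subtract the channel means."""
    bgr = rgb_image.astype(np.float32)[..., ::-1]
    return bgr - IMAGENET_MEAN_BGR

# A single pure-red pixel in RGB order.
red_rgb = np.array([[[255, 0, 0]]], dtype=np.uint8)
out_rgb = vgg16_style_preprocess(red_rgb)

# The same pixel as OpenCV would store it (BGR), fed in without conversion.
red_bgr = red_rgb[..., ::-1]
out_unconverted = vgg16_style_preprocess(red_bgr)

# Reversing the channel axis first (the NumPy equivalent of BGR -> RGB) restores the result.
out_converted = vgg16_style_preprocess(red_bgr[..., ::-1])
```

Here `out_unconverted` differs from `out_rgb`, while `out_converted` matches it, which is the effect a BGR-to-RGB conversion would have on the real images.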


In [18]:
# Extract features using VGG16
X_train_features = base_model.predict(X_train_preprocessed)
X_test_features = base_model.predict(X_test_preprocessed)

# Flatten the features
X_train_flatten = X_train_features.reshape(X_train_features.shape[0], -1)
X_test_flatten = X_test_features.reshape(X_test_features.shape[0], -1)

200/200 ━━━━━━━━━━━━━━━━━━━━ 981s 5s/step
50/50 ━━━━━━━━━━━━━━━━━━━━ 479s 10s/step
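With `include_top=False` and 224×224 inputs, VGG16's last pooling layer yields a 7×7×512 feature map per image, so flattening produces 25,088 values per sample. The sketch below uses a zero-filled array as a stand-in for the real `predict` output to show the reshape.

```python
import numpy as np

# Stand-in for base_model.predict(...) on 4 images: (batch, height, width, channels).
fake_features = np.zeros((4, 7, 7, 512), dtype=np.float32)

# Keep the batch axis, collapse everything else into one feature vector per image.
flat = fake_features.reshape(fake_features.shape[0], -1)

print(flat.shape)  # (4, 25088)
```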


In [19]:
# Train the SVM model
model = SVC(kernel='linear', C=1)  # gamma is ignored by the linear kernel, so it is omitted
model.fit(X_train_flatten, y_train)
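The overview mentions parameter optimization, while the cell above fixes `C=1`. One common way to tune it, shown here as a sketch on synthetic features rather than the VGG16 features, is a cross-validated grid search over `C`. The grid values and the synthetic dataset are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the flattened features (assumption: 200 samples, 50 features).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=50, n_informative=10, random_state=42
)

# Candidate regularization strengths; these values are illustrative only.
param_grid = {"C": [0.01, 0.1, 1, 10]}

# 3-fold cross-validation over the grid, scored by accuracy.
search = GridSearchCV(SVC(kernel="linear"), param_grid, cv=3, scoring="accuracy")
search.fit(X_demo, y_demo)

print("Best C:", search.best_params_["C"])
print("Cross-validated accuracy:", round(search.best_score_, 3))
```

On the real features, every candidate `C` means refitting the SVM once per fold, so a small grid is advisable.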

In [20]:
# Test the SVM model
y_pred = model.predict(X_test_flatten)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.976875
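A single accuracy figure does not show which class is being confused with which. As a sketch with hand-made labels (not the notebook's predictions), `confusion_matrix` breaks the result down per class; rows are true labels and columns are predicted labels, in the order cats (0), dogs (1).

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-made example labels: 0 = cat, 1 = dog.
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred_demo = [0, 0, 0, 1, 1, 1, 1, 0]

cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1])
acc = accuracy_score(y_true_demo, y_pred_demo)

print(cm)   # [[3 1]
            #  [1 3]]
print(acc)  # 0.75
```

In the notebook, the same call would take `y_test` and `y_pred`.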
