# Task
Implement a support vector machine (SVM) to classify images of cats and dogs from the Kaggle dataset.

## Load and preprocess data

### Subtask:
Load the dataset and preprocess the images for use with the SVM. This may involve resizing, converting to grayscale, and normalizing pixel values.


**Reasoning**:
Load the dataset and preprocess the images as instructed, including resizing, converting to grayscale, and normalizing pixel values.



In [None]:
import cv2
import numpy as np
import os

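# `df` is assumed to be a DataFrame with 'image' and 'label' columns,
# loaded from the dataset CSV in an earlier cell (not shown here)
# Assuming the images are in the same directory as the CSV file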
image_dir = '.'

preprocessed_images = []
labels = []

for index, row in df.iterrows():
    image_name = row['image']
    label = row['label']
    image_path = os.path.join(image_dir, image_name)

    # Load the image
    image = cv2.imread(image_path)

    if image is not None:
        # Resize the image
        resized_image = cv2.resize(image, (100, 100))

        # Convert to grayscale
        gray_image = cv2.cvtColor(resized_image, cv2.COLOR_BGR2GRAY)

        # Normalize pixel values
        normalized_image = gray_image / 255.0

        preprocessed_images.append(normalized_image)
        labels.append(label)
    else:
        print(f"Warning: Could not load image {image_path}")

# Convert the list of images and labels to NumPy arrays
preprocessed_images = np.array(preprocessed_images)
labels = np.array(labels)

print("Preprocessing complete. Shape of preprocessed_images:", preprocessed_images.shape)
print("Shape of labels:", labels.shape)

Preprocessing complete. Shape of preprocessed_images: (0,)
Shape of labels: (0,)


## Load and preprocess data

### Subtask:
Retry loading and preprocessing the images, considering that the image files were not found in the previous attempt. This may involve checking the file paths or assuming the images are not directly available and focusing on the remaining steps if possible.
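
One way to check the file paths before retrying is to count how many of the listed filenames actually exist under the image directory. The following is a minimal sketch, assuming the filenames come from the `image` column of `df`. The helper name `find_missing_images` and the sample filenames are hypothetical and only illustrate the idea.

```python
import os
import tempfile


def find_missing_images(image_names, image_dir):
    """Return the names in image_names that are not files under image_dir.

    Hypothetical helper, not part of the original notebook.
    """
    return [name for name in image_names
            if not os.path.isfile(os.path.join(image_dir, name))]


# Self-contained demonstration using a temporary directory and made-up filenames
with tempfile.TemporaryDirectory() as demo_dir:
    # Create one empty placeholder file so that exactly one name resolves
    open(os.path.join(demo_dir, "cat.0.jpg"), "wb").close()
    demo_names = ["cat.0.jpg", "dog.0.jpg"]
    missing = find_missing_images(demo_names, demo_dir)
    print(f"{len(missing)} of {len(demo_names)} files missing:", missing)
```

In the notebook, calling `find_missing_images(df['image'], image_dir)` would list the files that need to be located before preprocessing is rerun.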


## Feature extraction

### Subtask:
Extract relevant features from the images that the SVM can use for classification. This could involve techniques like HOG (Histogram of Oriented Gradients) or other image descriptors.


**Reasoning**:
Since no images were loaded in the previous attempts, `preprocessed_images` is empty and I cannot actually extract features from it. I will still set up the HOG extraction step so that it is ready to run once the images are available: import the necessary libraries, define the HOG parameters, and loop over `preprocessed_images`, assuming it was populated in the previous steps.



In [None]:
from skimage.feature import hog
import numpy as np

# Create an empty list to store the extracted features
features = []

# Define HOG parameters (these are example values, you might need to tune them)
pixels_per_cell = (8, 8)
cells_per_block = (2, 2)
orientations = 9

# Assuming preprocessed_images is a NumPy array of shape (n_samples, height, width)
# If preprocessed_images is empty, the loop body simply never runs
for image in preprocessed_images:
    # Calculate the HOG features
    # hog() can raise ValueError if an image has unexpected dimensions (e.g. is not 2D grayscale)
    try:
        hog_features = hog(image, orientations=orientations, pixels_per_cell=pixels_per_cell,
                           cells_per_block=cells_per_block, visualize=False, feature_vector=True)

        # Append the calculated HOG features to the features list
        features.append(hog_features)
    except ValueError:
        print("Could not calculate HOG features. The image likely has incorrect dimensions.")
        break  # Stop at the first malformed image

# Convert the list of features into a NumPy array. An empty list becomes an
# empty array, so later cells can rely on array attributes such as `.size`.
features = np.array(features)
if features.size > 0:
    # Print the shape of the resulting features array to verify the extraction
    print("Shape of extracted features:", features.shape)
else:
    print("No features were extracted as preprocessed_images was empty.")


No features were extracted as preprocessed_images was empty.
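
For reference, the length of the HOG vector that these parameters would produce can be worked out without any images. The sketch below assumes 100x100 grayscale inputs (the size used in the preprocessing step) and a block layout in which blocks slide one cell at a time and partial cells at the border are dropped, which is my understanding of how scikit-image's `hog` behaves. `hog_feature_length` is a hypothetical helper, not a library function.

```python
def hog_feature_length(image_shape, pixels_per_cell, cells_per_block, orientations):
    """Number of values in a flattened HOG descriptor (hypothetical helper).

    Assumes blocks slide one cell at a time and partial border cells are dropped.
    """
    cells_y = image_shape[0] // pixels_per_cell[0]
    cells_x = image_shape[1] // pixels_per_cell[1]
    blocks_y = cells_y - cells_per_block[0] + 1
    blocks_x = cells_x - cells_per_block[1] + 1
    return blocks_y * blocks_x * cells_per_block[0] * cells_per_block[1] * orientations


# 100 // 8 = 12 cells per axis, so 11 block positions per axis
n_features = hog_feature_length((100, 100), (8, 8), (2, 2), 9)
print(n_features)  # 11 * 11 * 2 * 2 * 9 = 4356
```

Under these assumptions, `features` would have shape `(n_samples, 4356)` once the images load successfully, which is a useful sanity check for the extraction step.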


## Split data

### Subtask:
Split the preprocessed data into training and testing sets.


**Reasoning**:
Import the necessary function for splitting data and then check if the features array is not empty. If it is not empty, split the data into training and testing sets and print their shapes. If it's empty, print an informative message.



In [None]:
from sklearn.model_selection import train_test_split

# len() works whether features is still an empty list or a NumPy array
if len(features) == 0:
    print("Data splitting cannot be performed due to missing features.")
else:
    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.25, random_state=42)

    # Print the shapes of the resulting sets
    print("Shape of X_train:", X_train.shape)
    print("Shape of X_test:", X_test.shape)
    print("Shape of y_train:", y_train.shape)
    print("Shape of y_test:", y_test.shape)
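
Because no real features are available here, the split can only be illustrated on synthetic data. The sketch below is such an illustration: the random feature matrix, the balanced `cat`/`dog` labels, and the `demo_*` names are all assumptions made for the example. It also passes `stratify`, an optional argument not used in the cell above, to keep the class ratio equal in both subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins (assumptions): 40 random "feature vectors" with balanced labels
rng = np.random.default_rng(42)
demo_features = rng.random((40, 16))
demo_labels = np.array(["cat"] * 20 + ["dog"] * 20)

# stratify keeps the cat/dog ratio the same in the training and test subsets
demo_X_train, demo_X_test, demo_y_train, demo_y_test = train_test_split(
    demo_features, demo_labels, test_size=0.25, random_state=42, stratify=demo_labels
)

print("Train:", demo_X_train.shape, "Test:", demo_X_test.shape)
print("Cats in test set:", int((demo_y_test == "cat").sum()))
```

With 40 samples and `test_size=0.25`, 30 samples go to training and 10 to testing. Stratification is worth considering for the real dataset if the two classes are not equally represented.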