In [3]:
import os

def rename_files_in_folder(folder_path, prefix):
    # List regular files only (skip any subdirectories)
    files = [f for f in os.listdir(folder_path)
             if os.path.isfile(os.path.join(folder_path, f))]
    # Sort files to ensure sequential naming
    files.sort()
    # Loop through the files and rename them
    for i, filename in enumerate(files):
        # Construct new name with sequential number
        new_name = f"{prefix}{i + 1}{os.path.splitext(filename)[1]}"
        # Get full path for the current and new filename
        src = os.path.join(folder_path, filename)
        dst = os.path.join(folder_path, new_name)
        # Rename the file
        os.rename(src, dst)
    print(f"Renamed {len(files)} files in folder: {folder_path}")

def main(directory):
    # Define folder names
    folders = ['cats', 'panda', 'dogs']
    for folder in folders:
        # Construct the full path to each folder
        folder_path = os.path.join(directory, folder)
        if os.path.exists(folder_path):
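            # Rename files in the folder, using the singular class name as prefix
            # (rstrip('s') keeps 'panda' intact; folder[:-1] would yield 'pand')
            rename_files_in_folder(folder_path, folder.rstrip('s'))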
        else:
            print(f"Folder not found: {folder_path}")

if __name__ == "__main__":
    # Use a raw string to avoid unicode escape issues
    main(r'C:\Users\LAPTOPINN\Downloads\ANIMALS')


Renamed 1000 files in folder: C:\Users\LAPTOPINN\Downloads\ANIMALS\cats
Renamed 1000 files in folder: C:\Users\LAPTOPINN\Downloads\ANIMALS\panda
Renamed 1000 files in folder: C:\Users\LAPTOPINN\Downloads\ANIMALS\dogs


## DATA SET

This Python script automates the renaming of files within specific subfolders of an image dataset directory. Here's a breakdown of the script's functionality and the structure of the dataset:

### Script Functionality
- The script defines a function `rename_files_in_folder()` to sequentially rename files within a given folder. It sorts files to ensure sequential naming and renames each file by appending a sequential number to a prefix.
- The `main()` function orchestrates the renaming process for multiple subfolders within a directory. It iterates through predefined folder names ('cats', 'panda', 'dogs'), constructs the full path to each folder, and calls `rename_files_in_folder()` for renaming.
- The script execution block ensures that the `main()` function is called when the script is run directly.

### Dataset Structure
- The dataset directory specified in the script contains three subfolders: 'cats', 'panda', and 'dogs'.
- Each subfolder represents a distinct class of images corresponding to a specific animal category.
- The dataset is well-organized, with 1000 images in each subfolder (3000 in total), so the three classes are balanced for training or analysis tasks.

### Usage
To use this script:
1. Specify the directory containing the image dataset.
2. Ensure that the dataset directory structure matches the expected structure (i.e., subfolders for each animal category).
3. Run the script, which will automatically rename all files within the subfolders according to their category and a sequential number.

This automated renaming process enhances dataset organization and facilitates subsequent data processing tasks, such as training machine learning models or conducting image analysis.
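
One caveat when rerunning the script on a folder that was already renamed: lexicographic sorting places `cat10.jpg` before `cat2.jpg`, so a new name can match a file that has not been moved yet. In that case `os.rename` fails on Windows and silently overwrites the existing file on POSIX systems. A minimal sketch of one way around this, assuming a two-pass rename through temporary names (the function name `rename_sequentially` and the temporary-name scheme are illustrative and not part of the original script):

```python
import os
import uuid


def rename_sequentially(folder_path, prefix):
    """Rename regular files to <prefix><n><ext> in two passes to avoid name collisions."""
    names = sorted(
        name for name in os.listdir(folder_path)
        if os.path.isfile(os.path.join(folder_path, name))
    )
    tag = uuid.uuid4().hex  # unique marker used only for the temporary names
    staged = []
    # Pass 1: move every file to a temporary name that cannot clash with a target name.
    for index, name in enumerate(names, start=1):
        ext = os.path.splitext(name)[1]
        temp_name = f"{tag}_{index}{ext}"
        os.rename(os.path.join(folder_path, name),
                  os.path.join(folder_path, temp_name))
        staged.append((temp_name, f"{prefix}{index}{ext}"))
    # Pass 2: move the temporary names to their final sequential names.
    for temp_name, final_name in staged:
        os.rename(os.path.join(folder_path, temp_name),
                  os.path.join(folder_path, final_name))
    return [final_name for _, final_name in staged]
```

Calling `rename_sequentially(folder_path, 'cat')` on a folder that already holds `cat1.jpg`, `cat2.jpg`, and `cat10.jpg` should leave three files named `cat1.jpg` to `cat3.jpg` without losing any of them.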

In [4]:
import os
import cv2
import numpy as np
import random
from collections import Counter

# Function to augment an image
def augment_image(image):
    # Rotate image
    angle = random.randint(-30, 30)
    M = cv2.getRotationMatrix2D((image.shape[1]//2, image.shape[0]//2), angle, 1)
    rotated = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))

    # Flip image
    flipped = cv2.flip(rotated, 1)

    # Scale image
    scale = random.uniform(0.8, 1.2)
    scaled = cv2.resize(flipped, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
    
    return scaled

# Function to preprocess an image
def preprocess_image(image):
    # Convert image to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Normalize image
    normalized = cv2.normalize(gray, None, 0, 255, cv2.NORM_MINMAX)

    # Histogram equalization
    equalized = cv2.equalizeHist(normalized)

    # Denoising
    denoised = cv2.fastNlMeansDenoising(equalized, None, 30, 7, 21)

    return denoised

# Function to load, augment, and preprocess images
def load_and_prepare_images(folder):
    images = []
    labels = []
    counter=0
    for class_name in os.listdir(folder):
        class_path = os.path.join(folder, class_name)
        if os.path.isdir(class_path):
            for img_name in os.listdir(class_path):
                img_path = os.path.join(class_path, img_name)
                img = cv2.imread(img_path)
                if img is not None:
                    # Augment and preprocess image
                    augmented_img = augment_image(img)
                    preprocessed_img = preprocess_image(augmented_img)
                    images.append(preprocessed_img)
                    labels.append(class_name)
                    counter += 1
                    if counter % 50 == 0:
                        print(f"{counter} done")
    return images, labels

# Define the dataset folder path
dataset_folder = r'C:\Users\LAPTOPINN\Downloads\ANIMALS'

# Load, augment, and preprocess dataset
images, labels = load_and_prepare_images(dataset_folder)

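# Convert lists to NumPy arrays. The augmented images differ in size, so the
# result is a 1-D object array; dtype=object makes that explicit (NumPy >= 1.24
# raises an error for ragged input without it).
images = np.array(images, dtype=object)
labels = np.array(labels)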

# Print class distribution
class_distribution = Counter(labels)
print("Class distribution:", class_distribution)

# Example to show shape of images and labels
print("Images shape:", images.shape)
print("Labels shape:", labels.shape)

50 done
100 done
150 done
200 done
250 done
300 done
350 done
400 done
450 done
500 done
550 done
600 done
650 done
700 done
750 done
800 done
850 done
900 done
950 done
1000 done
1050 done
1100 done
1150 done
1200 done
1250 done
1300 done
1350 done
1400 done
1450 done
1500 done
1550 done
1600 done
1650 done
1700 done
1750 done
1800 done
1850 done
1900 done
1950 done
2000 done
2050 done
2100 done
2150 done
2200 done
2250 done
2300 done
2350 done
2400 done
2450 done
2500 done
2550 done
2600 done
2650 done
2700 done
2750 done
2800 done
2850 done
2900 done
2950 done
3000 done
Class distribution: Counter({'cats': 1000, 'dogs': 1000, 'panda': 1000})
Images shape: (3000,)
Labels shape: (3000,)


  images = np.array(images)


### AUGMENTING AND PREPROCESSING
This cell loads every image under the dataset folder, augments it, and preprocesses it with OpenCV, then collects the results with NumPy.

`augment_image()` rotates each image by a random angle between -30 and 30 degrees, flips it horizontally, and rescales it by a random factor between 0.8 and 1.2. `preprocess_image()` converts the result to grayscale, min-max normalizes it to the 0-255 range, equalizes its histogram, and denoises it. `load_and_prepare_images()` applies both functions to each image in every class subfolder and prints a progress message every 50 images.

After processing, the image and label lists are converted to NumPy arrays and the class distribution is counted with `Counter`. The run shows 1000 images per class. Because the images are never resized to a common size, `images` has shape `(3000,)`: a one-dimensional array holding 3000 separately sized images rather than a single dense array.
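
The `Images shape: (3000,)` output indicates that the images could not be stacked into one dense array. A minimal sketch of how a common size makes stacking possible, assuming a hypothetical target size of 128x128 and using a NumPy-only nearest-neighbor resize as a stand-in for `cv2.resize` (the helper names `resize_nearest` and `stack_images` are illustrative):

```python
import numpy as np


def resize_nearest(image, size):
    """Resize a 2-D grayscale image to (height, width) by nearest-neighbor sampling.

    Stand-in for cv2.resize so that this sketch needs only NumPy.
    """
    out_h, out_w = size
    in_h, in_w = image.shape
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return image[rows[:, None], cols]


def stack_images(images, size=(128, 128)):
    """Resize every image to a common size and stack them into one dense array."""
    return np.stack([resize_nearest(img, size) for img in images])


# Hypothetical stand-ins for preprocessed grayscale images of different sizes.
ragged = [np.zeros((100, 150), dtype=np.uint8), np.ones((240, 90), dtype=np.uint8)]
batch = stack_images(ragged)
print(batch.shape)  # (2, 128, 128)
```

With real data, `cv2.resize` with a suitable interpolation mode would normally replace `resize_nearest`; the point of the sketch is only that a fixed size yields an array of shape `(n_images, height, width)`.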

In [10]:
import os
import cv2

# Function to compute HOG features for an image
def compute_hog_features(image):
    # Convert image to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    
    # Resize image to reduce dimensionality
    resized_image = cv2.resize(gray, (64, 128))  # Adjust size as needed
    
    # Initialize HOG descriptor
    hog = cv2.HOGDescriptor()
    
    # Compute HOG features
    features = hog.compute(resized_image)
    
    return features.flatten()  # Flatten the feature vector

# Function to compute HOG features for all images in a folder
def compute_hog_features_in_folder(folder_path):
    hog_features_list = []
    for filename in os.listdir(folder_path):
        # Load image
        image_path = os.path.join(folder_path, filename)
        image = cv2.imread(image_path)
        if image is not None:
            # Compute HOG features
            hog_features = compute_hog_features(image)
            hog_features_list.append(hog_features)
    return hog_features_list

# Path to the folder containing subfolders with images
dataset_folder = r'C:\Users\LAPTOPINN\Downloads\ANIMALS'

# List of subfolder names
subfolders = ['cats', 'dogs', 'panda']

# Iterate over each subfolder
for subfolder in subfolders:
    print(f"Computing HOG features for images in '{subfolder}' folder:")
    
    # Path to the current subfolder
    subfolder_path = os.path.join(dataset_folder, subfolder)
    
    # Compute HOG features for all images in the current subfolder
    hog_features_list = compute_hog_features_in_folder(subfolder_path)
    
    # Print the number of images and their corresponding HOG features
    print(f"Number of images: {len(hog_features_list)}")
    for i, hog_features in enumerate(hog_features_list):
        print(f"Image {i+1}: HOG Features Shape: {hog_features.shape}")


Computing HOG features for images in 'cats' folder:
Number of images: 1000
Image 1: HOG Features Shape: (3780,)
Image 2: HOG Features Shape: (3780,)
Image 3: HOG Features Shape: (3780,)
Image 4: HOG Features Shape: (3780,)
Image 5: HOG Features Shape: (3780,)
Image 6: HOG Features Shape: (3780,)
Image 7: HOG Features Shape: (3780,)
Image 8: HOG Features Shape: (3780,)
Image 9: HOG Features Shape: (3780,)
Image 10: HOG Features Shape: (3780,)
Image 11: HOG Features Shape: (3780,)
Image 12: HOG Features Shape: (3780,)
Image 13: HOG Features Shape: (3780,)
Image 14: HOG Features Shape: (3780,)
Image 15: HOG Features Shape: (3780,)
Image 16: HOG Features Shape: (3780,)
Image 17: HOG Features Shape: (3780,)
Image 18: HOG Features Shape: (3780,)
Image 19: HOG Features Shape: (3780,)
Image 20: HOG Features Shape: (3780,)
Image 21: HOG Features Shape: (3780,)
Image 22: HOG Features Shape: (3780,)
Image 23: HOG Features Shape: (3780,)
Image 24: HOG Features Shape: (3780,)
Image 25: HOG Features

### HOG
This cell extracts Histogram of Oriented Gradients (HOG) features from every image in the dataset using OpenCV. `compute_hog_features()` converts an image to grayscale, resizes it to 64x128 pixels, computes HOG features with a default `cv2.HOGDescriptor()`, and flattens the result into a one-dimensional vector.

`compute_hog_features_in_folder()` applies this function to each readable image in a folder. The main loop runs it over the 'cats', 'dogs', and 'panda' subfolders and prints the number of images and the shape of each feature vector. Every image yields a vector of 3780 values, because all images are resized to the same 64x128 window.

HOG features of this kind are commonly used as inputs for computer vision tasks such as object detection, recognition, and classification.
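
The length of 3780 reported in the output can be checked by hand. The sketch below assumes the default parameters of `cv2.HOGDescriptor()` (a 64x128 window, 16x16 blocks, an 8x8 block stride, 8x8 cells, and 9 orientation bins); the helper name `hog_feature_length` is illustrative:

```python
def hog_feature_length(win=(64, 128), block=(16, 16), stride=(8, 8),
                       cell=(8, 8), nbins=9):
    """Length of a HOG vector for one detection window (sizes are width, height)."""
    blocks_x = (win[0] - block[0]) // stride[0] + 1   # block positions horizontally
    blocks_y = (win[1] - block[1]) // stride[1] + 1   # block positions vertically
    cells_per_block = (block[0] // cell[0]) * (block[1] // cell[1])
    return blocks_x * blocks_y * cells_per_block * nbins


print(hog_feature_length())  # 3780
```

That is 7 x 15 block positions, 4 cells per block, and 9 bins per cell: 7 * 15 * 4 * 9 = 3780.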

In [12]:
import os
import tensorflow as tf
import numpy as np

# Load the pre-trained VGG16 model
model = tf.keras.applications.VGG16(weights='imagenet', include_top=False)

# Function to extract high-level CNN features from an image
def extract_high_level_cnn_features(image_path, layer_name='block5_pool'):
    # Load and preprocess the image
    img = tf.keras.preprocessing.image.load_img(image_path, target_size=(224, 224))
    img_array = tf.keras.preprocessing.image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    img_array = tf.keras.applications.vgg16.preprocess_input(img_array)
    
    # Extract features from the specified layer
    intermediate_model = tf.keras.Model(inputs=model.input, outputs=model.get_layer(layer_name).output)
    features = intermediate_model.predict(img_array)
    
    return features.flatten()  # Flatten the feature vector

# Function to extract high-level CNN features for all images in a folder
def extract_high_level_cnn_features_in_folder(folder_path, layer_name='block5_pool'):
    cnn_features_list = []
    for filename in os.listdir(folder_path):
        # Load image
        image_path = os.path.join(folder_path, filename)
        # Extract high-level CNN features for the image
        cnn_features = extract_high_level_cnn_features(image_path, layer_name)
        cnn_features_list.append(cnn_features)
    return cnn_features_list

# Path to the folder containing subfolders with images
dataset_folder = r'C:\Users\LAPTOPINN\Downloads\ANIMALS'

# List of subfolder names
subfolders = ['cats', 'dogs', 'panda']

# Specify the layer from which to extract features
layer_name = 'block5_pool'

# Dictionary to store high-level CNN features for each subfolder
all_cnn_features = {}

# Iterate over each subfolder
for subfolder in subfolders:
    print(f"Extracting high-level CNN features for images in '{subfolder}' folder:")
    
    # Path to the current subfolder
    subfolder_path = os.path.join(dataset_folder, subfolder)
    
    # Extract high-level CNN features for all images in the current subfolder
    cnn_features_list = extract_high_level_cnn_features_in_folder(subfolder_path, layer_name)
    
    # Store the features in the dictionary
    all_cnn_features[subfolder] = cnn_features_list

    # Print the number of images and their corresponding high-level CNN features
    print(f"Number of images: {len(cnn_features_list)}")
    for i, cnn_features in enumerate(cnn_features_list):
        print(f"Image {i+1}: High-Level CNN Features Shape: {cnn_features.shape}")

# Print the dictionary containing high-level CNN features for all subfolders
print("\nHigh-Level CNN Features for All Subfolders:")
print(all_cnn_features)


Extracting high-level CNN features for images in 'cats' folder:
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1s/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1s/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1s/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 491ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 911ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 395ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 377ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 417ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 536ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 376ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 424ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 377ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 434ms/s

### CNN USING VGG16 FROM TENSORFLOW AND KERAS API
This cell uses TensorFlow's Keras API to extract deep Convolutional Neural Network (CNN) features from every image in the dataset, using the pre-trained VGG16 model. Here's a concise overview:

The VGG16 model, pre-trained on the ImageNet dataset, is loaded via TensorFlow's Keras applications module without its fully connected classification layers (`include_top=False`), so it acts as a feature extractor.

The `extract_high_level_cnn_features()` function loads an image, resizes it to 224x224, converts it to an array, adds a batch dimension, and applies VGG16's `preprocess_input()`. It then takes the output of the layer named by `layer_name` (`block5_pool` by default) and flattens it into a feature vector.

The `extract_high_level_cnn_features_in_folder()` function applies this to each file in a folder. The main loop runs over the three subfolders, stores the features in the `all_cnn_features` dictionary, and prints the shape of each feature vector.
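
As a rough check on the dimensionality of these features, the flattened length of the `block5_pool` output can be worked out by hand. The sketch assumes a 224x224 input, five 2x2 max-pooling stages that each halve the spatial size, and 512 channels in the last block, which matches the standard VGG16 architecture; the helper name is illustrative:

```python
def vgg16_block5_pool_length(height=224, width=224, pools=5, channels=512):
    """Flattened size of VGG16's last pooling output for a given input size."""
    for _ in range(pools):
        height //= 2   # each 2x2 max-pool with stride 2 halves the height
        width //= 2    # and the width
    return height * width * channels


print(vgg16_block5_pool_length())  # 25088
```

The spatial size shrinks from 224 to 7 over the five pooling stages, giving 7 * 7 * 512 = 25088 values per image, noticeably more than the 3780 values of the HOG vector.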

### Comparing HOG and deep CNN features:
- HOG features excel in capturing local gradient orientation, making them suitable for simpler tasks like object detection and tracking.
- Deep CNN features, as learned from hierarchical representations, offer rich semantic information, making them superior for complex tasks like image classification and recognition.

In this scenario, deep CNN features from VGG16 typically outperform HOG features, particularly for tasks demanding high-level image understanding. However, the choice depends on specific task requirements, including computational efficiency and interpretability.

In [13]:
import os
import cv2
import numpy as np
import random
import tensorflow as tf
from collections import Counter
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Step 1: Data Preparation with OpenCV and NumPy
def load_and_prepare_images(folder):
    images = []
    labels = []
    for class_name in os.listdir(folder):
        class_path = os.path.join(folder, class_name)
        if os.path.isdir(class_path):
            for img_name in os.listdir(class_path):
                img_path = os.path.join(class_path, img_name)
                img = cv2.imread(img_path)
                if img is not None:
                    # Apply augmentation techniques (rotate, flip, scale)
                    img = augment_image(img)
                    # Apply preprocessing techniques (normalize, histogram equalization, denoising)
                    img = preprocess_image(img)
                    images.append(img)
                    labels.append(class_name)
    return images, labels

def augment_image(image):
    # Rotate image
    angle = random.randint(-30, 30)
    M = cv2.getRotationMatrix2D((image.shape[1]//2, image.shape[0]//2), angle, 1)
    rotated = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))
    # Flip image
    flipped = cv2.flip(rotated, 1)
    # Scale image
    scale = random.uniform(0.8, 1.2)
    scaled = cv2.resize(flipped, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
    return scaled

def preprocess_image(image):
    # Skip grayscale and normalization for images fed into VGG16 model
    # Apply histogram equalization and denoising if necessary, but ensure the image stays 3-channel
    image = cv2.cvtColor(image, cv2.COLOR_BGR2YCrCb)
    # Equalize only the luma (Y) channel so that colors are preserved
    y, cr, cb = cv2.split(image)
    y = cv2.equalizeHist(y)
    image = cv2.merge((y, cr, cb))
    image = cv2.cvtColor(image, cv2.COLOR_YCrCb2BGR)
    # Denoising
    denoised = cv2.fastNlMeansDenoisingColored(image, None, 30, 7, 21)
    return denoised

# Step 2: Feature Extraction with TensorFlow's Keras API
def extract_deep_cnn_features(images):
    model = tf.keras.applications.VGG16(weights='imagenet', include_top=False)
    features = []
    for img in images:
        img = cv2.resize(img, (224, 224))  # Resize image to VGG16 input size
        # OpenCV loads images as BGR; VGG16's preprocess_input expects RGB
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img_array = tf.keras.preprocessing.image.img_to_array(img)  # Convert image to array
        img_array = np.expand_dims(img_array, axis=0)  # Expand dimensions to create batch
        img_array = tf.keras.applications.vgg16.preprocess_input(img_array)  # Preprocess input (e.g., normalization)
        # Extract features using pre-trained VGG16 model
        features.append(model.predict(img_array).flatten())
    return features

# Step 3: Dimensionality Reduction with NumPy and scikit-learn
def perform_pca(features, n_components):
    pca = PCA(n_components=n_components)
    reduced_features = pca.fit_transform(features)
    return reduced_features, pca

# Step 4: Classification using SVM with scikit-learn
def train_svm(features, labels):
    # Split data into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)
    
    # Train SVM classifier
    svm = SVC(kernel='linear')  # Use linear kernel
    svm.fit(X_train, y_train)
    
    # Predict on test set
    y_pred = svm.predict(X_test)
    
    # Evaluate performance
    print("Classification Report:")
    print(classification_report(y_test, y_pred))
    
    return svm

# Load and prepare training images
dataset_folder = r'D:\ANIMALS'
print("PREPROCESSING TRAINING IMAGES")
images, labels = load_and_prepare_images(dataset_folder)

# Extract deep CNN features
print("EXTRACTING DEEP FEATURES FROM TRAINING IMAGES")
deep_features = extract_deep_cnn_features(images)

# Perform PCA for dimensionality reduction
n_components = 60
print("REDUCING FEATURES OF TRAINING IMAGES")
reduced_features, pca = perform_pca(deep_features, n_components)

# Train SVM classifier
print("TRAINING SVM ON TRAINING IMAGES")
svm_classifier = train_svm(reduced_features, labels)

# Function to load and preprocess test images
def load_and_prepare_test_images(folder):
    images = []
    image_names = []
    for img_name in os.listdir(folder):
        img_path = os.path.join(folder, img_name)
        img = cv2.imread(img_path)
        if img is not None:
            # Preprocess image (without augmentation for test set)
            img = preprocess_image(img)
            images.append(img)
            image_names.append(img_name)
    return images, image_names

# Load and prepare test images
test_folder = r'D:\ANIMALS TEST'
print("PREPROCESSING TESTING IMAGES")
test_images, test_image_names = load_and_prepare_test_images(test_folder)

# Extract deep CNN features for test images
print("EXTRACTING FEATURES OF TESTING IMAGES")
test_features = extract_deep_cnn_features(test_images)

# Apply PCA transformation to test features
print("REDUCING FEATURES OF TESTING IMAGES")
reduced_test_features = pca.transform(test_features)

# Predict test images using the trained SVM classifier
print("PREDICTING TESTING IMAGES")
predicted_labels = svm_classifier.predict(reduced_test_features)

# Print the test results
print()
for img_name, label in zip(test_image_names, predicted_labels):
    print(f"Image: {img_name}, Predicted Label: {label}")


PREPROCESSING TRAINING IMAGES


KeyboardInterrupt: 

## Image Classification Pipeline Using SVM with Deep Features

This Python script implements an image classification pipeline that trains a Support Vector Machine (SVM) on deep features extracted from a pre-trained VGG16 model. The steps involved in this pipeline are:

### 1. Data Preparation
- Utilizing OpenCV and NumPy, the script loads training images from the dataset folder, using each subfolder name as the class label. Each image is augmented (rotation, horizontal flip, scaling) and then preprocessed (histogram equalization of the luminance channel and denoising) while remaining a 3-channel color image, as VGG16 requires.

### 2. Feature Extraction
- TensorFlow's Keras API is employed to extract deep CNN features from the preprocessed images using the pre-trained VGG16 model. These features serve as informative representations of the image content.
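
A point worth checking in this step is channel order: OpenCV loads images as BGR, while VGG16's `preprocess_input` expects RGB input, which it converts to BGR before subtracting the ImageNet channel means. The NumPy-only sketch below approximates that preprocessing to show what happens when a BGR image is passed in unchanged. The function names are illustrative, and the mean values are the commonly used ImageNet means, stated here as an assumption rather than read from the library:

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by 'caffe'-style preprocessing.
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def bgr_to_rgb(image):
    """Reverse the channel axis of an OpenCV-style BGR image."""
    return image[..., ::-1]


def vgg16_style_preprocess(rgb_image):
    """Approximate VGG16 preprocessing: RGB -> BGR, then subtract the channel means."""
    bgr = rgb_image[..., ::-1].astype(np.float32)
    return bgr - IMAGENET_BGR_MEANS


# A hypothetical pure-blue pixel as OpenCV would store it (B, G, R).
blue_bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)
correct = vgg16_style_preprocess(bgr_to_rgb(blue_bgr))
wrong = vgg16_style_preprocess(blue_bgr)  # BGR passed as if it were RGB

print(correct[0, 0])  # the blue value stays in position 0
print(wrong[0, 0])    # the blue value lands in position 2, where red is expected
```

In the real pipeline the conversion would be `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` before calling `preprocess_input`.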

### 3. Dimensionality Reduction
- The extracted deep features are subjected to Principal Component Analysis (PCA) using scikit-learn, which reduces them to 60 principal components while retaining most of their variance. This step enhances computational efficiency and mitigates the curse of dimensionality.
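
In the script, PCA is fitted on all training-folder features before `train_test_split` is applied, so the held-out 20% also influences the projection and its scores may be slightly optimistic. A minimal NumPy sketch of the alternative, fitting the projection on the training rows only and reusing it for held-out rows (the helper names, the random stand-in data, and the sizes are all hypothetical; scikit-learn's `PCA` would normally be used instead of this SVD-based version):

```python
import numpy as np


def fit_pca(X, n_components):
    """Return the feature mean and the top principal axes of X (rows are samples)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def apply_pca(X, mean, components):
    """Project X onto previously fitted principal axes."""
    return (X - mean) @ components.T


rng = np.random.default_rng(0)
features = rng.normal(size=(100, 20))   # hypothetical stand-in for deep features
train, held_out = features[:80], features[80:]

mean, components = fit_pca(train, n_components=5)    # fit on the training rows only
train_reduced = apply_pca(train, mean, components)
held_out_reduced = apply_pca(held_out, mean, components)
print(train_reduced.shape, held_out_reduced.shape)   # (80, 5) (20, 5)
```

This mirrors how the script already treats the separate test folder, where `pca.transform` is applied without refitting.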

### 4. Training SVM Classifier
- The reduced features are split into training (80%) and held-out (20%) sets with `train_test_split`. A linear Support Vector Machine (SVM) classifier is trained on the training portion using scikit-learn's `SVC`, and a classification report with per-class precision, recall, and F1-score is printed for the held-out portion.
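
The evaluation relies on scikit-learn's `classification_report`. To make the reported quantities concrete, here is a small pure-Python sketch that computes precision, recall, and F1-score per class. The function name and the label lists are hypothetical and are not results from this notebook:

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} computed from two label sequences."""
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)   # how often each label was predicted
    actual = Counter(y_true)      # how often each label actually occurs
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / actual[label] if actual[label] else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical labels, used only to illustrate the calculation.
y_true = ["cats", "cats", "dogs", "dogs", "panda", "panda"]
y_pred = ["cats", "dogs", "dogs", "dogs", "panda", "cats"]
for label, (p, r, f1) in per_class_metrics(y_true, y_pred).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Precision is the share of predictions for a class that are correct, recall is the share of that class's images that were found, and F1 is their harmonic mean.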

### 5. Testing and Prediction
- The script then loads test images from a separate folder and applies the same preprocessing, without augmentation. Deep CNN features are extracted and reduced with the PCA model fitted on the training data, and the trained SVM classifier predicts a label for each test image. No labels are loaded for these images, so this step produces predictions rather than an accuracy measure.

### Conclusion
- This image classification pipeline combines deep-learning feature extraction (VGG16) with classical machine learning techniques (PCA and a linear SVM). The recorded run was interrupted during preprocessing (`KeyboardInterrupt`), so no classification results are shown for it.
