Image Data Preprocessing:

Resizing: Images may come in various sizes, and resizing them to a uniform size is often necessary for model compatibility and efficiency.

Normalization: Normalizing pixel values helps model convergence and performance. For 8-bit images this typically means scaling pixel values from [0, 255] to a range such as [0, 1] or [-1, 1].
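
The two scalings mentioned above can be sketched as follows, assuming 8-bit input images stored as NumPy arrays. The function names are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def scale_to_unit(image):
    """Map 8-bit pixel values from [0, 255] to [0, 1]."""
    return image.astype(np.float32) / 255.0

def scale_to_symmetric(image):
    """Map 8-bit pixel values from [0, 255] to [-1, 1]."""
    return image.astype(np.float32) / 127.5 - 1.0

# Three sample pixel values: darkest, mid-range, brightest
pixels = np.array([[0, 128, 255]], dtype=np.uint8)
unit = scale_to_unit(pixels)
symmetric = scale_to_symmetric(pixels)

print(unit.min(), unit.max())
print(symmetric.min(), symmetric.max())
```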

Color Space Conversion: Converting images to different color spaces (e.g., RGB, grayscale, HSV) can sometimes improve model performance or reduce computational complexity.

Handling Missing Data: Sometimes, images may have missing data or be corrupted. Dealing with such cases might involve image inpainting or removing the corrupted images from the dataset.

Feature Engineering for Image Data:
    
Feature Extraction: This involves extracting relevant features from images that are informative for the given task. Features can include edges, textures, shapes, or more abstract representations learned through convolutional neural networks (CNNs).

Dimensionality Reduction: High-dimensional image data is computationally expensive to process and makes models more prone to overfitting. Techniques such as Principal Component Analysis (PCA) reduce the dimensionality of the feature space while preserving most of the variance in the data. t-SNE is also used, though mainly to visualize features in two or three dimensions rather than to produce model inputs.
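
A minimal PCA sketch using scikit-learn, run on random data that stands in for image feature vectors. The sample count, feature count, and n_components=10 are arbitrary assumptions for illustration; this notebook itself does not apply PCA.

```python
import numpy as np
from sklearn.decomposition import PCA

# 50 synthetic samples with 200 features each
rng = np.random.default_rng(42)
features = rng.normal(size=(50, 200))

# Project onto the 10 directions of highest variance
pca = PCA(n_components=10)
reduced = pca.fit_transform(features)

# Fraction of the total variance kept by the 10 components
kept_variance = pca.explained_variance_ratio_.sum()

print(reduced.shape)
```

In a real pipeline the PCA would normally be fitted on the training split only and then applied to the test split with `pca.transform`.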

Histogram of Oriented Gradients (HOG): HOG is a feature descriptor widely used for object detection. It divides the image into small cells, builds a histogram of gradient orientations for each cell, and normalizes these histograms over larger blocks of neighboring cells.
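
To make the size of a HOG descriptor concrete, the sketch below works through the arithmetic for the settings used later in this notebook (128x128 grayscale image, 9 orientations, 8x8-pixel cells, 2x2-cell blocks). It assumes the scikit-image layout, in which blocks slide one cell at a time; the helper name is hypothetical.

```python
def hog_feature_length(image_shape, orientations, pixels_per_cell, cells_per_block):
    """Length of the flattened HOG vector, assuming blocks step by one cell."""
    rows, cols = image_shape
    cells_r = rows // pixels_per_cell[0]
    cells_c = cols // pixels_per_cell[1]
    # Number of block positions along each axis
    blocks_r = cells_r - cells_per_block[0] + 1
    blocks_c = cells_c - cells_per_block[1] + 1
    # Each block holds one histogram per cell it covers
    per_block = cells_per_block[0] * cells_per_block[1] * orientations
    return blocks_r * blocks_c * per_block

length = hog_feature_length((128, 128), 9, (8, 8), (2, 2))
print(length)  # 15 * 15 blocks * 36 values per block = 8100
```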


In [1]:
# Import required libraries (details are available in README.md)
from sklearn.model_selection import train_test_split
import os
from skimage.feature import hog
import cv2
import pickle

In [2]:
# Compute the HOG feature vector for a single grayscale image
def extract_hog_features(image):
	# 9 orientation bins, 8x8-pixel cells, blocks of 2x2 cells
	hog_features = hog(image, orientations=9, pixels_per_cell=(8, 8),
		cells_per_block=(2, 2), visualize=False)
	return hog_features


In [3]:
# The dataset consists of images, so each image must be converted to a numeric feature vector before it can be used with machine learning models.
# The cv2 module handles loading, resizing, and grayscale conversion; each subdirectory name is used as the class label.
def load_and_extract_features(directory):
	X = []
	y = []
	for label in os.listdir(directory):
		label_dir = os.path.join(directory, label)
		# Skip stray files; only subdirectories represent classes
		if not os.path.isdir(label_dir):
			continue
		for filename in os.listdir(label_dir):
			image_path = os.path.join(label_dir, filename)
			# Load image using OpenCV (returns None if the file cannot be read)
			img = cv2.imread(image_path)
			if img is None:
				continue
			# Resize image to (128, 128)
			img_resized = cv2.resize(img, (128, 128))
			# Convert image to grayscale
			img_gray = cv2.cvtColor(img_resized, cv2.COLOR_BGR2GRAY)
			# Calculate HOG features
			hog_features = extract_hog_features(img_gray)
			X.append(hog_features)
			y.append(label)
	return X, y


In [4]:
# Get the current working directory
current_dir = os.getcwd()

# Move one level up
current_dir = os.path.dirname(current_dir)

# Move up one more level; parent_dir is two levels above the working directory
parent_dir = os.path.dirname(current_dir)

# Print the resulting directory
print("Parent Directory:", parent_dir)

Parent Directory: E:\upgrade_capston_project-main


In [5]:
# Load the full dataset and extract HOG features (the train/test split happens in the next cell)
dataset_X, dataset_y = load_and_extract_features(os.path.join(parent_dir, "datasets", "raw_dataset", "Digital images of defective and good condition tyres"))

In [6]:
# Split the dataset into training and test sets in an 80-20 ratio
X_train, X_test, y_train, y_test = train_test_split(dataset_X, dataset_y, test_size=0.2, random_state=42)
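
# Optional variation, not used in this notebook: a stratified split.
# Passing stratify=<labels> to train_test_split keeps the class proportions
# the same in both splits, which can help when classes are imbalanced.
# The tiny synthetic dataset and the demo_* names below are illustrative assumptions.

```python
from sklearn.model_selection import train_test_split

# 10 synthetic samples, 5 per class
demo_X = [[i] for i in range(10)]
demo_y = ["good"] * 5 + ["defective"] * 5

demo_X_train, demo_X_test, demo_y_train, demo_y_test = train_test_split(
    demo_X, demo_y, test_size=0.2, random_state=42, stratify=demo_y
)

# With stratification, the 2 test samples contain one sample of each class
print(sorted(demo_y_test))
```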

In [7]:
# Save the splits so they can be reused to train multiple machine learning models
with open(parent_dir+'/datasets/processed_dataset/X_train.pkl', 'wb') as f:
    pickle.dump(X_train, f)

with open(parent_dir+'/datasets/processed_dataset/X_test.pkl', 'wb') as f:
    pickle.dump(X_test, f)

with open(parent_dir+'/datasets/processed_dataset/y_train.pkl', 'wb') as f:
    pickle.dump(y_train, f)

with open(parent_dir+'/datasets/processed_dataset/y_test.pkl', 'wb') as f:
    pickle.dump(y_test, f)
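
# Sketch of reading a saved split back in a later notebook. It assumes the
# same pickle format as above; the temporary directory and the sample values
# stand in for the real processed_dataset files.

```python
import os
import pickle
import tempfile

# Small stand-in for a saved feature list
sample = [[0.1, 0.2], [0.3, 0.4]]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "X_train.pkl")
    # Write the data in binary mode, as in the cell above
    with open(path, "wb") as f:
        pickle.dump(sample, f)
    # Read it back in binary mode
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == sample)
```

# Pickle files can execute code when loaded, so only load files from a trusted source.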

In [8]:
# def display_images(directory, num_images=5):
# 	fig, axes = plt.subplots(2, num_images, figsize=(15, 5))
# 	fig.suptitle(f"Images from {directory.split('/')[-1]}", fontsize=16)
	
# 	for i, label in enumerate(os.listdir(directory)):
# 		label_dir = os.path.join(directory, label)
# 		image_files = os.listdir(label_dir)
# 		random.shuffle(image_files)
# 		for j in range(num_images):
# 			image_path = os.path.join(label_dir, image_files[j])
# 			img = cv2.imread(image_path)
# 			axes[i, j].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
# 			axes[i, j].set_title(f"{label} Image {j+1}")
# 			axes[i, j].axis('off')
# 	plt.tight_layout()
# 	plt.show()

# # Display training images
# display_images(parent_dir+"/datasets/raw_dataset/Digital images of defective and good condition tyres")