# Step 3: Feature extraction (image → vector)

Note: this step will later be implemented in src/features.py.
### Techniques:
- Color histogram → simple RGB histogram vector
- Texture features → Local Binary Patterns (LBP)
- Shape/structure features → Histogram of Oriented Gradients (HOG)
### Output: 1D fixed-length feature vector for each image.
- Save extracted features (e.g., numpy arrays) to avoid re-computation.
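A minimal sketch of the caching idea, assuming the features are held in a NumPy array with one row per image. The helper names `save_features` and `load_features`, the file name, and the placeholder values are hypothetical and not part of the project; only `np.savez_compressed` and `np.load` are real NumPy calls.

```python
import os
import tempfile

import numpy as np


def save_features(path, features, labels):
    """Store the feature matrix and its labels in one compressed .npz file."""
    np.savez_compressed(path, features=np.asarray(features), labels=np.asarray(labels))


def load_features(path):
    """Load a feature matrix and labels previously written by save_features."""
    with np.load(path) as data:
        return data["features"], data["labels"]


# Tiny synthetic example: 3 "images" with 5 features each (placeholder values)
demo_features = np.arange(15, dtype=np.float32).reshape(3, 5)
demo_labels = np.array(["paper", "glass", "metal"])

# Hypothetical cache location inside a temporary directory
cache_path = os.path.join(tempfile.mkdtemp(), "features_cache.npz")
save_features(cache_path, demo_features, demo_labels)

loaded_features, loaded_labels = load_features(cache_path)
print(loaded_features.shape, list(loaded_labels))
```

On later runs, checking whether the cache file exists before extracting features would skip the expensive step.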

> Raw images are 3D arrays (height × width × channels), but SVM and kNN require one 1D vector per sample.
> Features should capture the color, shape, and texture of trash items, which helps the classifier distinguish metal, paper, plastic, etc.

Examples of how the classifier might use each section of the vector (intuition only, not fixed rules):
+ High values in the LBP section for rough-texture patterns → possibly paper.
+ High values in the HOG section for vertical edges → possibly a bottle.
+ High values in the blue bins of the color section → possibly a plastic bottle.

In [26]:
import cv2
import numpy as np
from skimage.feature import hog
from skimage.color import rgb2gray
from skimage.feature import local_binary_pattern
import os
import pandas as pd

In [27]:
# Color histogram
def color_histogram(img, bins=32):
    # img: RGB image
    hist_r = cv2.calcHist([img], [0], None, [bins], [0, 256]) # Red
    hist_g = cv2.calcHist([img], [1], None, [bins], [0, 256]) # Green
    hist_b = cv2.calcHist([img], [2], None, [bins], [0, 256]) # Blue
    hist = np.concatenate([hist_r, hist_g, hist_b]).flatten()
    hist = hist / np.sum(hist)  # normalize
    return hist


The next function computes Histogram of Oriented Gradients (HOG) features for an image. HOG is a feature descriptor widely used in object detection and image classification because it captures shape and edge information rather than color.
It is especially useful when the structure of objects (such as the contours of waste items) matters more than their color.

In [28]:
# HOG features
def hog_features(img, pixels_per_cell=(16,16), cells_per_block=(2,2)):
    gray = rgb2gray(img)
    features = hog(gray,
                   pixels_per_cell=pixels_per_cell,
                   cells_per_block=cells_per_block,
                   block_norm='L2-Hys')
    return features



In [29]:
def lbp_features(img, P=8, R=1):
    # Convert to grayscale (float values in [0, 1])
    gray = rgb2gray(img.astype(np.float32) / 255.0)

    # Compute LBP codes; method='uniform' yields codes 0..P+1
    lbp = local_binary_pattern(gray, P, R, method='uniform')

    # Build a normalized histogram with a fixed number of bins (P + 2),
    # so every image produces a vector of the same length
    n_bins = P + 2
    hist, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins), density=True)

    return hist
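
With `method='uniform'`, `local_binary_pattern` produces codes in the range 0 to P + 1, so the histogram has at most P + 2 bins (10 for P=8). If the bin count is derived from the largest code present in an image, an image that lacks the highest code yields a shorter vector. The sketch below illustrates this with synthetic code arrays (not real LBP output); the helper name `lbp_histogram` is hypothetical.

```python
import numpy as np

P = 8
N_BINS = P + 2  # codes 0..P+1 for method='uniform'


def lbp_histogram(codes, n_bins=N_BINS):
    """Normalized histogram of LBP codes with a fixed number of bins."""
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
    return hist


# Synthetic code arrays standing in for LBP output (assumed values)
codes_a = np.array([[0, 3, 9], [4, 4, 8]])  # contains the highest code, 9
codes_b = np.array([[0, 3, 5], [4, 4, 8]])  # highest code present is 8

hist_a = lbp_histogram(codes_a)
hist_b = lbp_histogram(codes_b)

# A data-dependent bin count would differ between the two arrays
data_dependent_bins_a = int(codes_a.max() + 1)
data_dependent_bins_b = int(codes_b.max() + 1)

print(len(hist_a), len(hist_b))
print(data_dependent_bins_a, data_dependent_bins_b)
```

Because the bins have width 1, `density=True` makes each histogram sum to 1 regardless of image size.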

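| Feature type    | Vector length                                                                              | What it represents              |
| --------------- | ------------------------------------------------------------------------------------------ | ------------------------------- |
| Color histogram | 96 (32 bins × 3 channels)                                                                  | Color distribution of the image |
| HOG             | Depends on image size and HOG parameters (e.g. 25,668 for a 384 × 512 image with defaults) | Edge/shape patterns             |
| LBP             | 10 (P + 2 bins for method='uniform' with P=8)                                              | Local texture patterns          |


- Concatenated vector length = 96 + 25,668 + 10 = 25,774 for a 384 × 512 image, which matches the 25,774 columns of the final feature matrix reported at the end of this step.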
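
The vector lengths can be checked by hand. The sketch below reproduces the arithmetic for the functions defined in this notebook, assuming skimage's default of 9 HOG orientations and a 384 × 512 image (the image size is an assumption; it is not stated in the notebook). The helper names `hog_length` and `total_length` are hypothetical.

```python
def hog_length(height, width, pixels_per_cell=(16, 16),
               cells_per_block=(2, 2), orientations=9):
    """Length of a flattened HOG vector for a grayscale image."""
    # Whole cells that fit along each axis
    cells_y = height // pixels_per_cell[0]
    cells_x = width // pixels_per_cell[1]
    # Blocks slide one cell at a time
    blocks_y = cells_y - cells_per_block[0] + 1
    blocks_x = cells_x - cells_per_block[1] + 1
    if blocks_y < 1 or blocks_x < 1:
        raise ValueError("image is too small for the given cell/block sizes")
    values_per_block = cells_per_block[0] * cells_per_block[1] * orientations
    return blocks_y * blocks_x * values_per_block


def total_length(height, width, bins=32, P=8):
    """Length of the concatenated color + HOG + LBP vector."""
    color_len = 3 * bins   # one histogram per RGB channel
    lbp_len = P + 2        # uniform LBP codes 0..P+1
    return color_len + hog_length(height, width) + lbp_len


print(hog_length(384, 512))    # 23 * 31 blocks * 36 values = 25668
print(total_length(384, 512))  # 96 + 25668 + 10 = 25774
```

Since the HOG length depends on the image size, all images need the same dimensions (or a resize step) for the concatenated vectors to have a fixed length.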

In [30]:
def extract_features(img):
    color_feat = color_histogram(img)
    hog_feat = hog_features(img)
    lbp_feat = lbp_features(img)

    # Concatenate all features into a single vector
    features = np.concatenate([color_feat, hog_feat, lbp_feat])
    return features
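
Since the sections are concatenated in a fixed order (color, HOG, LBP), a vector can be split back into its parts for inspection. This is a minimal sketch, assuming 96 color values and 10 LBP values; `split_features` and the two length constants are hypothetical helpers, and the demo vector is synthetic.

```python
import numpy as np

COLOR_LEN = 96  # assumed: 32 bins x 3 channels
LBP_LEN = 10    # assumed: P + 2 bins for uniform LBP with P=8


def split_features(vec, color_len=COLOR_LEN, lbp_len=LBP_LEN):
    """Split a concatenated vector into (color, hog, lbp) sections."""
    vec = np.asarray(vec)
    if vec.ndim != 1 or vec.size <= color_len + lbp_len:
        raise ValueError("expected a 1D vector longer than color_len + lbp_len")
    color = vec[:color_len]
    hog_part = vec[color_len:vec.size - lbp_len]  # everything in between
    lbp = vec[vec.size - lbp_len:]
    return color, hog_part, lbp


# Synthetic vector: 96 color values, 36 HOG values, 10 LBP values
demo_vec = np.arange(96 + 36 + 10)
color, hog_part, lbp = split_features(demo_vec)
print(len(color), len(hog_part), len(lbp))
```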

# Output Sample

In [31]:
# --- Path to images ---
image_folder = "/Users/rodynaamr/Image_Classification_SVM_kNN/data/paper"
image_files = [f for f in os.listdir(image_folder) if f.lower().endswith(('.png', '.jpg', '.jpeg'))]

# Limit to first 10 images
image_files = image_files[:10]

# --- Extract features ---
feature_matrix = []
for file in image_files:
    path = os.path.join(image_folder, file)
    img = cv2.imread(path)
    if img is None:  # cv2.imread returns None for unreadable files
        continue
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    features = extract_features(img)
    feature_matrix.append(features)

feature_matrix = np.array(feature_matrix)
print("Sample feature vector:", feature_matrix[1])

Sample feature vector: [0.00000000e+00 8.47710544e-06 2.37358938e-04 ... 5.94228109e-02
 4.57326253e-01 4.71038818e-02]


# Save feature vectors to a CSV file

Build a single table with one row per image, labeled by waste type, using both the original and the augmented image folders.

In [32]:
# --- Paths ---
base_path = "/Users/rodynaamr/Image_Classification_SVM_kNN/data"
waste_types = ["cardboard", "glass", "metal", "paper", "plastic", "trash"]

In [33]:
# --- Loop through each waste type and their augmented versions ---
all_features = []
all_labels = []

for waste in waste_types:
    for folder_suffix in ["", "_aug"]:  # original + augmented
        folder_path = os.path.join(base_path, waste + folder_suffix)
        if not os.path.exists(folder_path):
            continue
        image_files = [f for f in os.listdir(folder_path) if f.lower().endswith(('.png', '.jpg', '.jpeg'))]

        for file in image_files:
            path = os.path.join(folder_path, file)
            img = cv2.imread(path)
            if img is None:  # cv2.imread returns None for unreadable files
                continue
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            features = extract_features(img)

            all_features.append(features)
            all_labels.append(waste)


In [36]:
# --- Convert to DataFrame ---
feature_matrix = np.array(all_features)
df = pd.DataFrame(feature_matrix)
df['label'] = all_labels  # add waste type as label

In [37]:
# --- Save to CSV ---
df.to_csv("waste_features.csv", index=False)
print("Saved features to waste_features.csv")
print("Feature matrix shape:", feature_matrix.shape)

Saved features to waste_features.csv
Feature matrix shape: (20503, 25774)
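
The shape above implies a large matrix, which matters when choosing how to store it. A rough size estimate, assuming the values are held as 64-bit floats (an assumption; casting to float32 would halve it). The helper name `matrix_bytes` is hypothetical.

```python
def matrix_bytes(n_rows, n_cols, bytes_per_value):
    """In-memory size of a dense numeric matrix, in bytes."""
    return n_rows * n_cols * bytes_per_value


rows, cols = 20503, 25774  # shape printed above

for name, size in (("float64", 8), ("float32", 4)):
    gib = matrix_bytes(rows, cols, size) / 2**30
    print(f"{name}: {gib:.2f} GiB")
```

A CSV of the same data is typically larger still, because every value is written as text.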
