# CITS4402 Project: Histograms of Oriented Gradients for Human Detection
#### By Franco Meng (23370209), Laine Mulvay (22708032)
**Date: 08-May-2025**

## Phase 1: Dataset Collection

### Human Data 

The human image data were downloaded from the PEdesTrian Attribute (PETA) dataset, which was introduced for pedestrian attribute recognition at far distance. For both training and testing, the human images were extracted from two folders in this dataset: PRID and MIT.

The images are already sized at 64 × 128 px, which is the required input size for the project. The Python function used to load the images also includes a resizing step, so any raw image with different dimensions is resized to 64 × 128 px.

The selected human images for both training and testing cover all viewing directions (front, back, and side). The PRID dataset contains multiple frames of the same individual, denoted as xxxx_a, xxxx_b, etc. To increase the variability of the training dataset, we avoid including two images of the same person, even when they were captured from different perspectives. The images from MIT are located in the JPG folder, while PRID images are in PNG format.
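The one-image-per-person rule can be sketched as below. This is a minimal illustration, not the project's actual loader: the function name `one_image_per_person` is hypothetical, and it assumes file names follow the `xxxx_a` pattern, with the person ID being everything before the last underscore.

```python
from pathlib import Path

def one_image_per_person(filenames):
    """Keep the first image (in sorted order) for each person ID.

    Assumption: names look like '0001_a.png', where the part before
    the last underscore identifies the person.
    """
    chosen = {}
    for name in sorted(filenames):
        stem = Path(name).stem               # e.g. '0001_a'
        person_id = stem.rsplit("_", 1)[0]   # e.g. '0001'
        chosen.setdefault(person_id, name)   # ignore later frames of the same person
    return list(chosen.values())

files = ["0002_a.png", "0001_b.png", "0001_a.png", "0003_a.png"]
selected = one_image_per_person(files)
print(selected)  # ['0001_a.png', '0002_a.png', '0003_a.png']
```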

## Phase 2: Feature Extraction and Model Training

The HOG feature extraction function was developed from scratch, rather than using prebuilt functions such as cv2.HOGDescriptor or skimage.feature.hog. This was done to demonstrate a full understanding of every step in the HOG pipeline and to provide flexibility for hyperparameter tuning and ablation studies. For example, cv2.HOGDescriptor does not allow users to modify the filter used for gradient computation, limiting experimentation with different smoothing techniques.

Our custom function has the following signature:

`compute_hog(image, cell_size=8, block_size=16, num_bins=9, block_stride=1, filter_="default", angle_=180)`

Parameters:

- __image__: A grayscale image of size 64 × 128 px. Resizing to this size is handled by the load_image function.

- __cell_size__: Pixel dimension of each cell. Default is 8, meaning each cell is 8 × 8 px.

- __block_size__: Pixel dimension of each block. Default is 16 px, corresponding to a block of 2 × 2 cells (i.e., 16 × 16 px).

- __num_bins__: Number of orientation bins. The default value of 9 divides the angle range (either 180° or 360°) into 9 bins.

- __block_stride__: Step size for moving the block, in cell units. A value of 1 means the block moves one cell at a time.

- __filter___: Filter used for gradient computation. Options:

- - __default__: 1D derivative filter [-1, 0, 1] for both x and y directions.

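- - __prewitt__: 3 × 3 kernel that weights the three neighboring rows (or columns) equally; compared with Sobel, it produces sharper edges.

- - __sobel__: 3 × 3 kernel with weighted coefficients (1, 2, 1) that emphasize the center row or column; it produces smoother results, making it more robust to noise.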

- __angle___: Angle range in degrees (not radians). The function uses cv2.cartToPolar to compute magnitude and angle, which is accurate to about 0.3° according to the OpenCV documentation. The default angle range is 180° (unsigned gradients), meaning angles of 180° or more are wrapped modulo 180° (e.g., 270° becomes 90°). The function also accepts an angle range of 360° (signed gradients) if needed.
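To make the default filter and the angle wrapping concrete, here is a minimal NumPy sketch. It is not the project's `compute_hog` code: it swaps `cv2.cartToPolar` for `np.hypot` and `np.arctan2`, the name `gradients_1d` is hypothetical, and leaving border pixels at zero is an assumption.

```python
import numpy as np

def gradients_1d(image, angle_range=180):
    """Apply the centered [-1, 0, 1] filter in x and y, then return
    gradient magnitude and angle (in degrees)."""
    img = image.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    # [-1, 0, 1] response: next neighbour minus previous neighbour.
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0  # signed direction, 0 to 360
    if angle_range == 180:
        angle = angle % 180.0  # unsigned: opposite directions share a bin
    return magnitude, angle

# Intensity decreasing down the rows gives a gradient pointing at 270 degrees.
ramp_down = -np.tile(np.arange(6, dtype=float)[:, None], (1, 6))
_, signed = gradients_1d(ramp_down, angle_range=360)
_, unsigned = gradients_1d(ramp_down, angle_range=180)
print(signed[2, 2], unsigned[2, 2])  # about 270 and about 90
```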



Each image is processed through this pipeline:

Raw image → Grayscale → Resize to 64×128 (if needed) → Apply filter → Compute magnitude and angle →
Create cell histograms → Normalize block histograms → Concatenate into final feature vector
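The last three pipeline stages can be sketched as follows. The source does not state the voting or normalization scheme, so this sketch assumes magnitude-weighted votes into the single nearest-lower bin (no interpolation) and plain L2 block normalization; both function names are hypothetical.

```python
import numpy as np

def cell_histograms(magnitude, angle, cell_size=8, num_bins=9, angle_range=180):
    """Magnitude-weighted orientation histogram for each cell."""
    rows = magnitude.shape[0] // cell_size
    cols = magnitude.shape[1] // cell_size
    bin_width = angle_range / num_bins
    # Hard assignment: each pixel votes into one bin (an assumption).
    bins = np.minimum((angle // bin_width).astype(int), num_bins - 1)
    hist = np.zeros((rows, cols, num_bins))
    for r in range(rows):
        for c in range(cols):
            ys = slice(r * cell_size, (r + 1) * cell_size)
            xs = slice(c * cell_size, (c + 1) * cell_size)
            hist[r, c] = np.bincount(bins[ys, xs].ravel(),
                                     weights=magnitude[ys, xs].ravel(),
                                     minlength=num_bins)
    return hist

def normalize_blocks(hist, cells_per_block=2, block_stride=1, eps=1e-6):
    """L2-normalize each overlapping block and concatenate the results."""
    rows, cols, _ = hist.shape
    features = []
    for r in range(0, rows - cells_per_block + 1, block_stride):
        for c in range(0, cols - cells_per_block + 1, block_stride):
            block = hist[r:r + cells_per_block, c:c + cells_per_block].ravel()
            features.append(block / np.sqrt(np.sum(block ** 2) + eps ** 2))
    return np.concatenate(features)

# Dummy gradient field for a 64 x 128 image (128 rows, 64 columns).
magnitude = np.ones((128, 64))
angle = np.zeros((128, 64))
features = normalize_blocks(cell_histograms(magnitude, angle))
print(features.shape)
```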

For a fixed 64 × 128 px input, the final feature vector length depends on cell_size, block_size, block_stride, and num_bins.
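The length can be worked out directly from the parameters. The helper below is a hypothetical sketch that assumes block_stride is given in cell units (as described above) and that the image and block dimensions are whole multiples of cell_size.

```python
def hog_feature_length(width=64, height=128, cell_size=8, block_size=16,
                       num_bins=9, block_stride=1):
    """Number of values in the concatenated HOG descriptor."""
    cells_x = width // cell_size
    cells_y = height // cell_size
    cells_per_block = block_size // cell_size           # cells along one block side
    blocks_x = (cells_x - cells_per_block) // block_stride + 1
    blocks_y = (cells_y - cells_per_block) // block_stride + 1
    return blocks_x * blocks_y * cells_per_block ** 2 * num_bins

print(hog_feature_length())                # 7 * 15 blocks * 4 cells * 9 bins = 3780
print(hog_feature_length(block_stride=2))  # 4 * 8 blocks * 4 cells * 9 bins = 1152
```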

Each image feature is paired with a ground truth label: 1 for human and 0 for non-human. These features and labels are then used to train a classifier.

We use LinearSVC() from scikit-learn as our classifier. No special hyperparameter tuning was applied to the SVM itself.
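A minimal training sketch with default settings is shown below. The feature matrix here is synthetic random data standing in for real HOG vectors, so the numbers it produces say nothing about the project's results.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for HOG feature vectors (not real image features).
X_pos = rng.normal(loc=1.0, size=(50, 20))    # "human" samples
X_neg = rng.normal(loc=-1.0, size=(50, 20))   # "non-human" samples
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(50, dtype=int), np.zeros(50, dtype=int)])

clf = LinearSVC()                     # default hyperparameters
clf.fit(X, y)

scores = clf.decision_function(X)     # signed distance to the hyperplane
predictions = clf.predict(X)          # 1 = human, 0 = non-human
```

The continuous `decision_function` scores, rather than the hard predictions, are what threshold-based curves such as ROC and DET are computed from.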

Both ROC curves (with AUC scores) and DET (Detection Error Tradeoff) curves were used to evaluate model performance. The DET curve plots miss rate (1 - recall) against false positives per window (FPPW). The original HOG paper used this measure to evaluate detection across many sliding windows per image. In our case each image corresponds to exactly one window, so FPPW reduces to the number of false positives divided by the number of negative images, i.e., the false positive rate.
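The points of a DET curve can be computed from classifier scores as in the sketch below. It assumes one window per image and takes FPPW as false positives divided by the number of negative windows; the function name `det_points` and the toy scores are illustrative only.

```python
import numpy as np

def det_points(labels, scores):
    """Return (fppw, miss_rate, thresholds), one point per unique score."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.unique(scores)            # sorted ascending
    n_pos = np.sum(labels == 1)
    n_neg = np.sum(labels == 0)
    fppw, miss_rate = [], []
    for t in thresholds:
        predicted_pos = scores >= t
        false_neg = np.sum(~predicted_pos & (labels == 1))
        false_pos = np.sum(predicted_pos & (labels == 0))
        miss_rate.append(false_neg / n_pos)   # 1 - recall
        fppw.append(false_pos / n_neg)        # false positives per negative window
    return np.array(fppw), np.array(miss_rate), thresholds

# Toy labels and scores, not project results.
labels = [1, 1, 1, 0, 0, 0]
scores = [2.0, 1.0, -0.5, 0.5, -1.0, -2.0]
fppw, miss, thr = det_points(labels, scores)
```

Raising the threshold trades false positives for misses: at the lowest threshold every window is accepted (FPPW of 1, miss rate of 0), and at the highest only the top-scoring window is.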

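Both ROC and DET curves were plotted on logarithmic axes. Because HOG features with the SVM classifier perform so well, the error rates lie close to zero, and a linear scale would make the differences between configurations hard to see.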

## Phase 3: Ablation Study

Our ablation study uses a sequential parameter tuning approach: we start by optimizing parameter A based on test/validation performance, then fix A at its best setting while evaluating parameter B, and so on. This is far cheaper than searching every combination of parameters, but the result depends on the order in which parameters are tuned. To guide this order, we relied on insights from the original HOG paper:

__*"The main conclusions are that for good performance, one should use fine-scale derivatives (essentially no smoothing), many orientation bins, and moderately sized, strongly normalized, overlapping descriptor blocks."*__

We structured our study accordingly, starting with filter choices (derivative smoothness), followed by orientation bins and block size.
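The sequential procedure can be sketched as a simple loop over parameters in tuning order. Everything here is illustrative: `sequential_tuning` is a hypothetical helper, the candidate values are placeholders, and `toy_score` is a made-up scoring function standing in for training and evaluating a model.

```python
def sequential_tuning(search_space, defaults, evaluate):
    """Tune one parameter at a time, keeping earlier winners fixed.

    search_space: dict mapping parameter name to candidate values,
                  listed in the order the parameters should be tuned.
    evaluate:     callable taking a parameter dict, returning a score
                  where higher is better.
    """
    best = dict(defaults)
    for name, candidates in search_space.items():   # dicts preserve insertion order
        scored = [(evaluate({**best, name: value}), value) for value in candidates]
        _, best_value = max(scored, key=lambda pair: pair[0])
        best[name] = best_value
    return best

def toy_score(params):
    # Made-up score for demonstration only, not a measured result.
    return (-abs(params["num_bins"] - 9)
            - abs(params["block_size"] - 16)
            - (params["filter_"] != "default"))

search_space = {
    "filter_": ["default", "prewitt", "sobel"],   # tuned first
    "num_bins": [6, 9, 12],                       # then orientation bins
    "block_size": [16, 24, 32],                   # then block size
}
defaults = {"filter_": "sobel", "num_bins": 6, "block_size": 32}
best = sequential_tuning(search_space, defaults, toy_score)
print(best)  # {'filter_': 'default', 'num_bins': 9, 'block_size': 16}
```

This evaluates 3 + 3 + 3 = 9 configurations instead of the 27 a full grid over the same candidates would need, at the cost of missing interactions between parameters.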