# [IAPR][iapr]: Final project - Chocolate Recognition


**Moodle group ID:** *39*  
**Kaggle challenge:** *`Classic`*
**Kaggle team name (exact):** "*xx*"  

**Author 1 (SCIPER):** *Léo Bruneau (xxxxx)*  
**Author 2 (SCIPER):** *Louis Pivron (xxxxx)*  
**Author 3 (SCIPER):** *Huckleberry Thums (xxxxx)*  

**Due date:** 21.05.2025 (11:59 pm)


## Key Submission Guidelines:
- **Before submitting your notebook, <span style="color:red;">rerun</span> it from scratch!** Go to: `Kernel` > `Restart & Run All`
- **Only groups of three will be accepted**, except in exceptional circumstances.


[iapr]: https://github.com/LTS5/iapr2025

---

## Justification of Design Choices

### Sliding Window

One of the core parts of our pipeline is the sliding window detector. We arrived at this idea after trying several classical object detection and segmentation methods. For example, we tried region growing and contour detection, but these methods performed poorly on images with non-uniform backgrounds. Given the clutter and the variety of backgrounds, contour- or region-based methods did not look viable.

Using a sliding window is much more robust to these variations. The idea is to take a window of a fixed size and slide it over the image with a fixed stride. For each position of the window, we compute the histogram of the pixels inside the window. This histogram is then compared to a set of histograms that we have pre-computed for each reference chocolate. The best-matching histogram is then used to determine the likelihood of the window containing a chocolate.

This procedure leaves us with a heatmap showing the likelihood of each pixel being part of a chocolate. By thresholding, we can obtain a binary mask of the detections. The detections then appear as 'blobs' in the heatmap. To detect these blobs we use the Laplacian of Gaussian (LoG) blob detector. Its parameters let us control the size of the blobs we want to detect and, through a threshold, how many are returned.

After the blobs have been detected, we can use the bounding boxes of these blobs to crop the original image and obtain the chocolate candidates. From these patches, we can then compute the features we want to use for classification. 

### Features

In our project we use a variety of features. Since we had trouble dealing with noisy backgrounds, our method does not rely on segmentations of the chocolates. So, the features we compute do not include any shape or contour based features. Instead, we use a combination of texture features, color features, and some basic statistics. Below we list the features we compute:

**Color based features**: color statistics such as means and standard deviations in multiple color spaces (RGB, LAB) and color histograms in these same color spaces. \
**Texture based features**: Local Binary Patterns (LBP), Haralick GLCM features, and Gabor energy.

Note that these features were computed on subdivisions of the segmented reference chocolates and on the extracted patches of the train/test images. Subdivisions were done by splitting the masked reference images into a certain number of patches. The number of patches depends on the size of the reference chocolate.

### Classification

In [None]:
# why we used a certain classifier
# use of training set (fully labeled, by hand): supervised classification
# optimization

## Technical Description

### Reference Image Processing

Some preprocessing is done on the reference images before we start segmenting and computing the features of the chocolates.

The first step is to downsample the images by a factor of 4 in order to speed up the processing time.

The next step is to detect the chocolates. Since the reference images contain one chocolate per image on a uniform background, it is relatively easy to detect the chocolate. We use Canny edge detection to detect the edges (outer contour) of the chocolate. Canny includes a Gaussian filter to smooth the image before computing the gradients. This is done to reduce noise and improve the edge detection. 

The Canny edge detection was found to work well with most of the reference images. However, in the case of the Jelly White and Comtesse chocolates, the edges were not detected properly. This is most likely because these two chocolates are uniform in color, with a color close to that of the background. To improve the contour detection for these cases, we therefore used Hough transforms to fit ellipses to the detected edges.

At this point we have the contours of the chocolates. To make sure they are closed, we perform a morphological closing, that is, a dilation followed by an erosion. Since we want filled masks of the chocolates, we then fill the holes inside the contours. Finally, we erode the masks to ensure they do not contain any background pixels.
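The clean-up steps described above can be sketched as follows. This is a minimal illustration built on `scipy.ndimage` and is not necessarily identical to the project code: the function name `edges_to_mask`, the structuring-element size, and the number of erosion iterations are assumptions chosen for the example, and the input is a synthetic contour standing in for a real Canny output.

```python
import numpy as np
from scipy import ndimage as ndi


def edges_to_mask(edges, close_size=5, erode_iter=2):
    """Turn a binary edge map into a filled, slightly shrunken object mask.

    `close_size` and `erode_iter` are illustrative values, not tuned settings.
    """
    structure = np.ones((close_size, close_size), dtype=bool)
    # Closing (dilation followed by erosion) bridges small gaps in the contour.
    closed = ndi.binary_closing(edges, structure=structure)
    # Fill the interior enclosed by the now-closed contour.
    filled = ndi.binary_fill_holes(closed)
    # Erode so that the mask stays strictly inside the object.
    return ndi.binary_erosion(filled, iterations=erode_iter)


# Synthetic example: a circular contour of radius 30 with a one-pixel gap.
yy, xx = np.mgrid[:100, :100]
dist = np.hypot(yy - 50, xx - 50)
edges = np.abs(dist - 30) < 1.0
edges[50, 80] = False  # break the contour
mask = edges_to_mask(edges)
```

Without the closing step, the gap in the contour would let the background leak into the interior, and the hole filling would leave the chocolate unfilled.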

At the end of this processing applied on each image, the result is a binary mask of the chocolate (see figure below).

These masks are used for feature extraction, described further below, and also to compute the color histograms over the whole chocolate that the sliding window detector uses.

### Sliding Window

The sliding window detector is implemented in our `sliding_window_compare` function, which systematically scans the image with a fixed-size window (default: 64×64 pixels) and stride (default: 16 pixels). At each position, a patch is extracted and its color histogram is computed using the RGB channels, with histogram parameters tuned for sufficient granularity (16 bins per channel). We use a smoothed histogram via a Gaussian kernel to improve robustness to noise and small variations.

This patch histogram is then compared to all reference chocolate histograms using the Bhattacharyya distance, a metric that quantifies the dissimilarity between probability distributions. The closest match is selected, and the complement of its distance (1 − distance) is used as a similarity score. These scores are stored in a heatmap aligned with the image grid, representing local similarity to known chocolate types.
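The scan-and-compare loop can be sketched as below. This is a simplified, NumPy-only illustration under several assumptions: the helper names are hypothetical (this is not the `sliding_window_compare` code itself), the histogram is taken as three concatenated per-channel histograms, the Gaussian smoothing is omitted, and the Bhattacharyya distance is used in its bounded form $\sqrt{1 - BC}$ so that 1 − distance lies in [0, 1].

```python
import numpy as np


def channel_histogram(patch, bins=16):
    """Concatenated per-channel histograms of an RGB uint8 patch, summing to 1."""
    hists = [np.histogram(patch[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()


def bhattacharyya_distance(p, q):
    """Bounded form sqrt(1 - BC); 0 means identical histograms."""
    bc = np.sum(np.sqrt(p * q))  # Bhattacharyya coefficient
    return np.sqrt(max(0.0, 1.0 - bc))


def similarity_heatmap(image, ref_hists, window=64, stride=16):
    """Score each window position by its best match among the references."""
    rows = (image.shape[0] - window) // stride + 1
    cols = (image.shape[1] - window) // stride + 1
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            y, x = i * stride, j * stride
            h = channel_histogram(image[y:y + window, x:x + window])
            best = min(bhattacharyya_distance(h, r) for r in ref_hists)
            heat[i, j] = 1.0 - best
    return heat


# Synthetic example: a brown 64x64 square on a light grey background.
brown = np.array([120, 60, 20], dtype=np.uint8)
image = np.full((128, 128, 3), 200, dtype=np.uint8)
image[32:96, 32:96] = brown
reference = channel_histogram(np.tile(brown, (64, 64, 1)))
heat = similarity_heatmap(image, [reference])
```

In this toy case the heatmap peaks at the grid cell whose window coincides with the square, and the score drops as the window covers more background.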

This heatmap serves as the input for the next stage of the pipeline, where we apply blob detection via the Laplacian of Gaussian (LoG) to isolate likely chocolate candidates.

### Blob Detection

The `compute_blobs` function identifies chocolate candidates by detecting local maxima in the heatmap produced by the sliding window detector. It uses the Laplacian of Gaussian (LoG) method via `skimage.feature.blob_log`, which is well-suited for detecting roughly circular blobs at multiple scales. The parameters `min_sigma`, `max_sigma`, and `thr` control the minimum and maximum expected blob size and the detection threshold, respectively.

Once raw blobs are detected, their scale (σ) is converted to an approximate radius $R = \sqrt{2}\,\sigma$. However, LoG often produces multiple detections for a single object, especially when objects are large or span several overlapping regions. To address this, the function clusters nearby blobs using DBSCAN, treating detections within `avg_R` × 1.2 pixels of each other as belonging to the same chocolate candidate. For each cluster, the center is computed as the average of member centers, and the radius is set based on the selected `merge_policy`: the average radius, the maximum, or a fixed value (`avg_R`).

The final output is a list of candidate detections, each represented as a circle (y, x, R) in heatmap coordinates, which can be mapped back to the original image for cropping or further analysis.
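The merging step can be illustrated with the sketch below. It is not the `compute_blobs` code itself: the function name `merge_blobs`, the policy names `"mean"`, `"max"` and `"fixed"`, and the `radius_factor` default (the √2·σ convention documented for `blob_log` on 2D images) are assumptions made for the example. The input rows mimic the `(y, x, sigma)` output of `skimage.feature.blob_log`.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def merge_blobs(blobs, merge_policy="mean", eps_factor=1.2,
                radius_factor=np.sqrt(2)):
    """Cluster nearby (y, x, sigma) detections into single (y, x, R) candidates."""
    blobs = np.asarray(blobs, dtype=float)
    if len(blobs) == 0:
        return []
    radii = radius_factor * blobs[:, 2]
    avg_r = radii.mean()
    # min_samples=1 makes every detection a core point, so none is marked as noise.
    labels = DBSCAN(eps=eps_factor * avg_r, min_samples=1).fit_predict(blobs[:, :2])
    merged = []
    for label in np.unique(labels):
        members = labels == label
        cy, cx = blobs[members, :2].mean(axis=0)
        if merge_policy == "mean":
            r = radii[members].mean()
        elif merge_policy == "max":
            r = radii[members].max()
        else:  # "fixed": same radius for every candidate
            r = avg_r
        merged.append((float(cy), float(cx), float(r)))
    return merged


# Two detections on the same object and one isolated detection.
raw = [(10, 10, 2.0), (11, 12, 2.0), (40, 40, 3.0)]
merged = merge_blobs(raw)
```

Here the first two detections collapse into one candidate centered between them, while the third is kept as its own candidate.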

### Patch Extraction

The `extract_crops` function extracts square patches from the original images, centered on the detected chocolate blobs. It takes as input a list of images, a list of blob detections for each image, and parameters defining the crop size, sliding window size, and stride.

For each blob, the function first converts the blob coordinates from heatmap space to image space. This is done by scaling the coordinates by the stride and offsetting by half the window size to recover the center position in the original image. A square region of size crop_size × crop_size is then extracted around this center.

To ensure consistency in input dimensions, the function pads the crop with zeros if it would otherwise extend beyond the image boundary or be smaller than the target size. This guarantees that all extracted patches are uniform in shape and suitable for downstream classification.

The output is a list of lists of crops, where each sublist corresponds to the set of chocolate candidate regions extracted from a single image.
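The coordinate conversion and zero padding described above can be sketched as follows. The helper names `heatmap_to_image_coords` and `padded_crop` are hypothetical and only illustrate the idea; they are not the `extract_crops` implementation, and the crop size in the example is arbitrary.

```python
import numpy as np


def heatmap_to_image_coords(hy, hx, window=64, stride=16):
    """Center, in image pixels, of the window that produced heatmap cell (hy, hx)."""
    cy = int(round(hy * stride + window / 2))
    cx = int(round(hx * stride + window / 2))
    return cy, cx


def padded_crop(image, cy, cx, crop_size):
    """Square crop around (cy, cx), zero-padded where it leaves the image."""
    half = crop_size // 2
    y0, x0 = cy - half, cx - half
    y1, x1 = y0 + crop_size, x0 + crop_size
    out = np.zeros((crop_size, crop_size) + image.shape[2:], dtype=image.dtype)
    # Part of the requested square that actually lies inside the image.
    sy0, sx0 = max(y0, 0), max(x0, 0)
    sy1, sx1 = min(y1, image.shape[0]), min(x1, image.shape[1])
    if sy1 > sy0 and sx1 > sx0:
        out[sy0 - y0:sy1 - y0, sx0 - x0:sx1 - x0] = image[sy0:sy1, sx0:sx1]
    return out


# A blob in the top-left heatmap cell: the crop overlaps the image border.
image = np.full((100, 120, 3), 255, dtype=np.uint8)
cy, cx = heatmap_to_image_coords(0, 0)
crop = padded_crop(image, cy, cx, crop_size=96)
```

The crop keeps the requested shape even at the border; the rows and columns that fall outside the image are left at zero.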

### Training Data Labeling

### Feature Extraction

At this point in the pipeline we have both the reference chocolate masks and labeled training patches. For the reference images, we subdivide the masks into patches of size 64×64 and compute the features on these patches. This is in a sense data augmentation, since the same chocolate yields multiple feature vectors, which increases the number of training samples.
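One way to picture the subdivision is the sketch below. It assumes a non-overlapping 64×64 grid and keeps a tile only when most of it lies inside the mask; both the tiling scheme and the `min_coverage` threshold are assumptions made for illustration, and `masked_patches` is a hypothetical helper.

```python
import numpy as np


def masked_patches(image, mask, size=64, min_coverage=0.9):
    """Collect grid tiles of `image` that lie (almost) entirely inside `mask`."""
    patches = []
    for y in range(0, image.shape[0] - size + 1, size):
        for x in range(0, image.shape[1] - size + 1, size):
            # Fraction of the tile covered by the chocolate mask.
            if mask[y:y + size, x:x + size].mean() >= min_coverage:
                patches.append(image[y:y + size, x:x + size])
    return patches


# Toy reference: the "chocolate" covers a 128x192 rectangle of a 256x256 image.
image = np.zeros((256, 256, 3), dtype=np.uint8)
mask = np.zeros((256, 256), dtype=bool)
mask[64:192, 64:256] = True
patches = masked_patches(image, mask)
```

With this scheme a larger chocolate naturally yields more patches, so the number of patches depends on the size of the reference chocolate.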

Considering all reference patches and training patches, this gives us a feature matrix of size 821 × 818 (samples × features).

Each feature vector therefore has length 818. Not all of these features are necessarily useful, so we use PCA to reduce the dimensionality. The number of components is chosen so that 95% of the variance is explained, which is common practice. Because the features are on different scales, we first standardize them with the `StandardScaler` from sklearn; the PCA itself uses the `PCA` class from sklearn. Both the scaler and the PCA are fitted on the training data (which includes the reference data) and then applied unchanged to the test data, so that the test data is transformed in exactly the same way as the training data.
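A minimal sketch of this fit-on-train, transform-on-test pattern is given below. The data is random placeholder data, not our feature matrix, and wrapping the two steps in a scikit-learn pipeline is a choice made for the example rather than a description of the project code.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Placeholder data: 5 latent factors observed through 50 correlated features.
mixing = rng.normal(size=(5, 50))
X_train = rng.normal(size=(200, 5)) @ mixing + 0.05 * rng.normal(size=(200, 50))
X_test = rng.normal(size=(20, 5)) @ mixing + 0.05 * rng.normal(size=(20, 50))

# A float n_components keeps the smallest number of components whose
# cumulative explained variance reaches that fraction.
reducer = make_pipeline(StandardScaler(),
                        PCA(n_components=0.95, svd_solver="full"))
Z_train = reducer.fit_transform(X_train)  # scaler and PCA fitted on training data only
Z_test = reducer.transform(X_test)        # same fitted transform reused on test data

pca = reducer.named_steps["pca"]
print(Z_train.shape[1], "components kept")
```

Keeping the scaler and the PCA in one fitted object also makes it harder to accidentally refit either of them on the test data.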

### Classification

In [None]:
# how labeling was done
# SVM
# optimization

## Quantitative and Qualitative Analysis

### Quantitative Results

In [None]:
# > For the quantitative analysis, your Kaggle results, along with some intermediate results obtained throughout the project, should be sufficient. 

### Qualitative Results

In [None]:
# > For qualitative analysis, we are looking for an interpretation of how your model works. Your model is not counting chocolates "magically"—it likely segments them internally and uses that information to compute useful descriptors.
# > We expect you to show some examples of this internal segmentation (e.g., binary masks), and to demonstrate that the model can extract meaningful features.
# > A helpful suggestion: you can extract the features and visualize them using a 2D PCA or t-SNE plot to assess whether the model learns discriminative representations.

## TA comments from the forum

In [None]:
# So can we upload images of the intermediate results?
# For example the visualization of the classification, of the segmentation, ...
# Or do we have to upload the code that generates these results?

# Answer on the forum:
# > Hello, Yes you can include the code that generates the figures in your notebook

# Saving the classifier, scaler, etc.:
# > As we will have to run your code in main.py, you can't use joblib. If you have a sklearn model, you can use pickle to save it. 

# > It is better to have a fully narrative description.
# So basically, do not put code in the report

# > Label for L1000780 is wrong: there are two jelly milk items instead of one jelly milk and one jelly black.