
## ResNet-18 Classifier 

Before any modeling, we segment the uploaded clothing images to remove background distractions and isolate the clothing item. This gives the classifier clean inputs.

Each segmented image is then passed through a ResNet-18 classifier trained to categorize items into one of four clothing types: shirts, t-shirts, pants, and shorts.


### Why ResNet-18?

We chose ResNet-18 because it is widely used for image classification: its residual connections make it an effective visual feature extractor, and at 18 layers it is relatively lightweight.  
It suited our task because it let us both:
- Classify clothing items accurately
- Extract visual embeddings that could be used for downstream similarity and compatibility tasks

The classifier reached 99.3% classification accuracy, which indicates that it learned to separate the four categories reliably.


### Inputs:
- Segmented clothing images
- Labels for 4 clothing categories: `pants`, `shorts`, `shirts`, `t-shirts`

### Feature Extraction & Similarity Indexing:

After training:
- We repurposed the ResNet model as a feature extractor, using the 512-dimensional vector from its penultimate layer.
- These embeddings capture details like shape, fit, collar type, and texture.
- All embeddings were saved in `clothing_embedding.json`.

To support fast similarity search, we built a FAISS index over the embeddings and saved it as `clothing_faiss.index`. This lets us retrieve visually similar items efficiently later, a crucial step for building outfit recommendations.

### Output:
- Trained model weights: `feature_model.pth`
- Embedding storage: `clothing_embedding.json`
- FAISS index for similarity: `clothing_faiss.index`

## Evaluation

We evaluated the effectiveness of the embeddings in representing clothing similarity through three strategies:

### 1. Random and Top-K Visual Matches
We ran visual similarity queries using random clothing items and validated whether the top retrieved matches were perceptually similar. Matches were manually verified to have similar structure, texture, and category.

### 2. Intra- and Inter-Class Distance Analysis
We computed the average distances within each class (e.g., t-shirts with t-shirts) and across different classes (e.g., t-shirts vs pants). The significant gap between intra- and inter-class distances confirmed that the embeddings were discriminative and structured.

<img src="images/model1_distance.png" alt="Distance" width="200">
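
A comparison like this can be computed along the following lines. The sketch assumes Euclidean distance and pools all same-class pairs into one intra-class average and all different-class pairs into one inter-class average. The project's actual metric and per-class breakdown may differ, and the toy data is synthetic.

```python
import numpy as np


def class_distance_summary(embeddings: np.ndarray, labels: np.ndarray):
    """Mean Euclidean distance over same-class pairs and over different-class pairs."""
    # Pairwise distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = (embeddings ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * embeddings @ embeddings.T
    dists = np.sqrt(np.clip(sq_dists, 0.0, None))  # clip tiny negatives from rounding

    same_class = labels[:, None] == labels[None, :]
    # Upper triangle (k=1) counts each unordered pair once and excludes self-pairs.
    pair_mask = np.triu(np.ones(same_class.shape, dtype=bool), k=1)

    intra = dists[same_class & pair_mask].mean()
    inter = dists[~same_class & pair_mask].mean()
    return float(intra), float(inter)


# Synthetic stand-in: two tight clusters far apart.
rng = np.random.default_rng(0)
cluster_a = rng.normal(0.0, 0.1, size=(20, 8))
cluster_b = rng.normal(3.0, 0.1, size=(20, 8))
embeddings = np.vstack([cluster_a, cluster_b])
labels = np.array(["t-shirts"] * 20 + ["pants"] * 20)

intra, inter = class_distance_summary(embeddings, labels)
```

A discriminative embedding space shows `intra` well below `inter`, as in the synthetic clusters here.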

### 3. t-SNE Visualization
To visually inspect the structure of the learned embedding space, we projected the 512-D vectors to 2D using t-SNE. This confirmed strong clustering by category, indicating that similar items were embedded close together.
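
A projection of this kind can be produced with scikit-learn's `TSNE`. The sketch below is only an illustration: the perplexity, seed, and random stand-in vectors are assumptions, and the settings used for the project's plot are not stated in this document.

```python
import numpy as np
from sklearn.manifold import TSNE


def project_2d(embeddings: np.ndarray, perplexity: float = 30.0, seed: int = 0) -> np.ndarray:
    """Project (N, D) embeddings to (N, 2) with t-SNE; perplexity must be below N."""
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(embeddings)


rng = np.random.default_rng(0)
embeddings = rng.standard_normal((60, 512)).astype(np.float32)  # stand-in for real embeddings
points = project_2d(embeddings, perplexity=10.0)
```

The resulting `points` can then be drawn as a scatter plot with one color per category label.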


## Visualization

### Random Visual Similarity Search

We validated the embedding quality using similarity queries:

<img src="images/model1_search.png" alt="Visual Similarity Search Samples" width="650">


### t-SNE Visualization of Clothing Embeddings

We reduced the high-dimensional embedding space to 2D using t-SNE and color-coded each category:

<img src="images/model1_tsne.png" alt="t-SNE Clothing Embedding Visualization" width="600">

Clear visual clustering of pants, shirts, t-shirts, and shorts confirmed the model's ability to learn semantically meaningful embeddings.


### Transition to Model 2

These high-quality embeddings formed the foundation for our second model, the Siamese Network, which uses these vectors to learn pairwise compatibility and outfit matching.

Let’s move on to how we modeled style and compatibility using those embeddings in Model 2.

