## AI-Powered Multi-Attribute Fashion Search and Prediction System
### A two-phase approach to fashion image understanding using attribute-based filtering and deep learning.

### 1. Environment Setup and Configuration
This cell sets up the core environment and dependencies required for the project. It imports essential Python libraries for data handling (pandas, numpy), image manipulation (PIL), visualization (matplotlib), and interactive UI components (ipywidgets). It also includes PyTorch and HuggingFace Transformers for deep learning operations, specifically the CLIP model used for text-image embeddings.

We then define the project base paths, including the root folder (BASE), the images directory (IM_DIR), and the optional captions file (CAP_FILE). This ensures that all files are consistently referenced throughout the project. Finally, the code detects whether a GPU (CUDA) is available for acceleration; if not, it falls back to CPU, and prints the device being used.

In [3]:
import os, json, math, shutil, time
from pathlib import Path
import numpy as np
import pandas as pd
from PIL import Image
import ipywidgets as W
from IPython.display import display, clear_output
import matplotlib.pyplot as plt
import torch
from transformers import CLIPModel, CLIPProcessor

BASE   = "/Users/krishna/Desktop/FinalProject"   
IM_DIR = os.path.join(BASE, "images")           
CAP_FILE = os.path.join(BASE, "captions.json")   

device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)

Using device: cpu
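
The base path in the cell above is tied to one machine. As a minimal sketch of one way to make it overridable, the snippet below reads a hypothetical `FASHION_BASE` environment variable and falls back to the hard-coded default. Both the variable name and the `resolve_paths` helper are assumptions introduced here for illustration, not part of the original notebook.

```python
import os
from pathlib import Path

def resolve_paths(default_base: str) -> dict:
    """Return project paths; a hypothetical FASHION_BASE env var overrides the default."""
    base = Path(os.environ.get("FASHION_BASE", default_base)).expanduser()
    return {
        "BASE": base,
        "IM_DIR": base / "images",
        "CAP_FILE": base / "captions.json",
    }

paths = resolve_paths("/Users/krishna/Desktop/FinalProject")
for name, p in paths.items():
    # Report missing locations early instead of failing later in the pipeline.
    print(f"{name}: {p} (exists: {p.exists()})")
```

Printing the existence flag up front makes a wrong base path visible before any annotation file is read.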


### 2. Loading Attribute Annotations and Building Metadata Table
In this step, we load the attribute annotations for each fashion image. The dataset provides three main types of labels:

Shape Attributes – loaded from shape_anno_all.txt, containing 12 shape-related features (e.g., sleeves, pant length, neckline).

Fabric Attributes – loaded from fabric_ann.txt, which records the fabric type for upper, lower, and outer garments.

Color/Pattern Attributes – loaded from pattern_ann.txt, including upper, lower, and outer color or pattern information.

Additionally, optional captions are loaded from captions.json (if available), giving natural language descriptions of the images. Each data source is converted into a Pandas DataFrame with meaningful column names, then merged into a single unified table (labels_df).

To ensure data integrity, a helper function (find_path) verifies whether each image actually exists in the dataset folder. Only valid images with confirmed file paths are retained, and their absolute paths are stored in a new column img_path. This filtered dataset forms the master metadata table, linking every image to its shape, fabric, color, and caption attributes.

In [5]:
shape_df = pd.read_csv(os.path.join(BASE, 'labels/shape/shape_anno_all.txt'),
                       sep=r'\s+', header=None)
shape_df.columns = ['image'] + [f'shape_{i}' for i in range(12)]

fabric_df = pd.read_csv(os.path.join(BASE, 'labels/texture/fabric_ann.txt'),
                        sep=r'\s+', header=None, names=['image','upper_fabric','lower_fabric','outer_fabric'])

pattern_df = pd.read_csv(os.path.join(BASE, 'labels/texture/pattern_ann.txt'),
                         sep=r'\s+', header=None, names=['image','upper_color','lower_color','outer_color'])

# captions
if os.path.exists(CAP_FILE):
    with open(CAP_FILE) as f:
        cap_json = json.load(f)
    cap_df = pd.DataFrame(list(cap_json.items()), columns=['image','caption'])
else:
    cap_df = pd.DataFrame(columns=['image','caption'])

labels_df = (shape_df.merge(fabric_df, on='image', how='outer')
                      .merge(pattern_df, on='image', how='outer')
                      .merge(cap_df, on='image', how='left'))

# Build absolute image paths; keep only those that actually exist
def find_path(img_name):
    p = os.path.join(IM_DIR, img_name)
    return p if os.path.exists(p) else None

labels_df["img_path"] = labels_df["image"].map(find_path)
labels_df = labels_df[labels_df["img_path"].notna()].reset_index(drop=True)
print("Images found:", len(labels_df))

Images found: 44096
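
To make the merge behaviour concrete, here is a small sketch on toy frames (the file names and the `on_disk` set are made up for illustration). It shows that the outer merges keep an image listed in only one annotation file, that the left merge attaches captions without adding rows, and that the existence filter then drops anything not found on disk.

```python
import pandas as pd

shape = pd.DataFrame({"image": ["a.jpg", "b.jpg"], "shape_0": [3, 0]})
fabric = pd.DataFrame({"image": ["b.jpg", "c.jpg"], "upper_fabric": [1, 0]})
caps = pd.DataFrame({"image": ["a.jpg"], "caption": ["a long-sleeve shirt"]})

# Outer merge keeps images that appear in either annotation file;
# the left merge then attaches captions without adding caption-only rows.
merged = (shape.merge(fabric, on="image", how="outer")
               .merge(caps, on="image", how="left"))

# Stand-in for the on-disk existence check (hypothetical set of files present).
on_disk = {"a.jpg", "b.jpg"}
merged["img_path"] = merged["image"].map(
    lambda name: f"/images/{name}" if name in on_disk else None
)
kept = merged[merged["img_path"].notna()].reset_index(drop=True)
print(kept)
```

Rows that lack a value in one source (here, the fabric code for a.jpg) carry NaN after the merge, which is why the later decoding step has to handle missing values.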


### 3. Mapping Numerical Annotations to Human-Readable Labels
The raw dataset encodes all attributes (shapes, fabrics, and colors) as integer codes. To make these annotations interpretable, we define dictionaries that map numeric IDs to descriptive labels:

shape_defs & shape_enums: Provide human-readable descriptions for shape-related attributes, such as sleeve length, hat presence, or neckline type.

fabric_map: Translates fabric codes into materials like cotton, denim, or knitted.

color_map: Maps pattern and color codes into categories like floral, striped, or pure color.

Helper functions (safe_map and shape_text) then convert these numeric columns into string labels. Missing or unparseable values become "NA", and a code that is absent from the mapping falls back to a prefixed placeholder such as fabric_9. New string columns (e.g., upper_fabric_str, upper_color_str, sleeves_str, hat_str) are added to labels_df, so the attributes are readable both for analysis and in the interactive UI's filters.

In [7]:
# --- Maps
shape_defs = {
    0:'sleeve length',1:'lower clothing length',2:'socks',3:'hat',4:'glasses',5:'neckwear',
    6:'wrist wearing',7:'ring',8:'waist accessories',9:'neckline',10:'outer clothing a cardigan?',11:'upper clothing covering navel'
}
shape_enums = {
    0: {0:'sleeveless',1:'short',2:'medium',3:'long',4:'not long-sleeve',5:'NA'},
    1: {0:'three-point',1:'medium short',2:'three-quarter',3:'long',4:'NA'},
    2: {0:'no',1:'socks',2:'leggings',3:'NA'},
    3: {0:'no',1:'yes',2:'NA'},
    4: {0:'no',1:'eyeglasses',2:'sunglasses',3:'have a glasses in hand or clothes',4:'NA'},
    5: {0:'no',1:'yes',2:'NA'},
    6: {0:'no',1:'yes',2:'NA'},
    7: {0:'no',1:'yes',2:'NA'},
    8: {0:'no',1:'belt',2:'have a clothing',3:'hidden',4:'NA'},
    9: {0:'V-shape',1:'square',2:'round',3:'standing',4:'lapel',5:'suspenders',6:'NA'},
    10:{0:'yes',1:'no',2:'NA'},
    11:{0:'no',1:'yes',2:'NA'}
}
fabric_map = {0:'denim',1:'cotton',2:'leather',3:'furry',4:'knitted',5:'chiffon',6:'other',7:'NA'}
color_map  = {0:'floral',1:'graphic',2:'striped',3:'pure color',4:'lattice',5:'other',6:'color block',7:'NA'}

def safe_map(series, dct, prefix):
    """Map integer codes to text; fallback to prefix_idx if map unknown."""
    def _one(v):
        if pd.isna(v): return "NA"
        try:
            vi = int(v)
            return dct.get(vi, f"{prefix}_{vi}") if dct else f"{prefix}_{vi}"
        except Exception:
            return "NA"
    return series.map(_one)

# Fabric/color string columns
try:
    labels_df["upper_fabric_str"] = safe_map(labels_df["upper_fabric"], globals().get("fabric_map"), "fabric")
    labels_df["lower_fabric_str"] = safe_map(labels_df["lower_fabric"], globals().get("fabric_map"), "fabric")
    labels_df["outer_fabric_str"] = safe_map(labels_df["outer_fabric"], globals().get("fabric_map"), "fabric")
    labels_df["upper_color_str"]  = safe_map(labels_df["upper_color"],  globals().get("color_map"),  "color")
    labels_df["lower_color_str"]  = safe_map(labels_df["lower_color"],  globals().get("color_map"),  "color")
    labels_df["outer_color_str"]  = safe_map(labels_df["outer_color"],  globals().get("color_map"),  "color")
except Exception as e:
    print("Mapping error:", e)

# Shape strings (sleeves, lower length, hat, glasses)
def shape_text(col_idx, series, prefix):
    enum = None if "shape_enums" not in globals() else shape_enums.get(col_idx)
    def _one(v):
        if pd.isna(v): return "NA"
        try:
            vi = int(v)
            if enum: return enum.get(vi, f"{prefix}_{vi}")
            return f"{prefix}_{vi}"
        except Exception:
            return "NA"
    return series.map(_one)

labels_df["sleeves_str"]   = shape_text(0, labels_df["shape_0"], "sleeves")
labels_df["lowerlen_str"]  = shape_text(1, labels_df["shape_1"], "lowerlen")
labels_df["hat_str"]       = shape_text(3, labels_df["shape_3"], "hat")
labels_df["glasses_str"]   = shape_text(4, labels_df["shape_4"], "glasses")

labels_df.head(2)

|  | image | shape_0 | shape_1 | shape_2 | shape_3 | shape_4 | shape_5 | shape_6 | shape_7 | shape_8 | ... | upper_fabric_str | lower_fabric_str | outer_fabric_str | upper_color_str | lower_color_str | outer_color_str | sleeves_str | lowerlen_str | hat_str | glasses_str |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MEN-Denim-id_00000080-01_7_additional.jpg | 5.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | ... | cotton | cotton |  | pure color | lattice |  |  | long | no | no |
| 1 | MEN-Denim-id_00000089-01_7_additional.jpg | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | ... | cotton | cotton |  | pure color | pure color |  | sleeveless | long | no | no |
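
The decoding rules used above can be checked on a handful of values. The sketch below restates safe_map in compact form under a hypothetical name, `decode_codes`, and runs it on a known code, the dataset's own NA code, a missing value, and a code that is not in the mapping.

```python
import numpy as np
import pandas as pd

fabric_map = {0: "denim", 1: "cotton", 2: "leather", 3: "furry",
              4: "knitted", 5: "chiffon", 6: "other", 7: "NA"}

def decode_codes(series: pd.Series, mapping: dict, prefix: str) -> pd.Series:
    """Compact restatement of safe_map: NaN -> 'NA', unknown code -> '<prefix>_<code>'."""
    def _one(v):
        if pd.isna(v):
            return "NA"
        try:
            code = int(v)
        except (TypeError, ValueError):
            return "NA"
        return mapping.get(code, f"{prefix}_{code}")
    return series.map(_one)

codes = pd.Series([1.0, 7.0, np.nan, 42.0])
decoded = decode_codes(codes, fabric_map, "fabric")
print(decoded.tolist())  # ['cotton', 'NA', 'NA', 'fabric_42']
```

The codes arrive as floats because the outer merges introduce NaN, which forces pandas to store the integer columns as float64; casting through `int` recovers the dictionary key.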


### 4. Loading Precomputed CLIP Embeddings from Cache
This step handles the loading of CLIP image embeddings, which were precomputed earlier to save time. Instead of recalculating embeddings for every run, the system reads them from a cached file stored in .cache/clip_vitb32_img_embeds.npz. This file contains:

paths – the list of image file paths corresponding to each embedding.

embeddings – the actual CLIP vector representations of the images.

A sanity check ensures that the cached image paths exactly match the current labels_df["img_path"] order. If there’s any mismatch (e.g., if dataset files were moved or renamed), the code raises a clear error and instructs the user to rebuild the cache.

Finally, the embeddings are L2-normalized, so cosine similarity can be computed directly as a dot product. The output prints the total shape of the embedding matrix, confirming how many images are represented and the dimensionality of each vector. This caching mechanism significantly improves performance, as embeddings only need to be computed once and can then be reused for fast similarity search and filtering.

In [9]:
import numpy as np, os

CACHE_DIR = os.path.join(BASE, ".cache")
EMB_NPZ   = os.path.join(CACHE_DIR, "clip_vitb32_img_embeds.npz")

if not os.path.exists(EMB_NPZ):
    raise FileNotFoundError(
        f"Missing cache: {EMB_NPZ}\n"
        "Run the original Step 4 once to build the embeddings cache."
    )

cache = np.load(EMB_NPZ, allow_pickle=True)
cached_paths = cache["paths"].tolist()
img_embs     = cache["embeddings"]

# sanity check: ensure cached paths match current labels_df order
if cached_paths != labels_df["img_path"].tolist():
    raise RuntimeError(
        "Cached image list differs from current labels_df. "
        "Since you don't add images, this likely means paths changed; "
        "restore the old paths or rebuild the cache once."
    )

# normalize (just to be safe)
img_embs = img_embs / (np.linalg.norm(img_embs, axis=1, keepdims=True) + 1e-9)
print("Loaded embeddings:", img_embs.shape)

Loaded embeddings: (44096, 512)
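
The cell that originally built the cache is not shown here, so the following is only a sketch of how a compatible file could be written and read back, using random vectors in place of real CLIP embeddings. The key names `paths` and `embeddings` come from the loading code above; the toy paths and the choice to store them as a plain string array are assumptions.

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)
paths = ["/images/a.jpg", "/images/b.jpg", "/images/c.jpg"]   # toy stand-ins
embeddings = rng.normal(size=(3, 512)).astype("float32")     # stand-in for CLIP vectors

with tempfile.TemporaryDirectory() as tmp:
    npz_path = os.path.join(tmp, "clip_vitb32_img_embeds.npz")
    # Storing paths as a fixed-width string array (not dtype=object)
    # lets the file be read back without allow_pickle=True.
    np.savez(npz_path, paths=np.array(paths), embeddings=embeddings)

    with np.load(npz_path) as cache:
        cached_paths = cache["paths"].tolist()
        img_embs = cache["embeddings"]

# Same order check as the notebook, then L2-normalize each row.
assert cached_paths == paths
img_embs = img_embs / (np.linalg.norm(img_embs, axis=1, keepdims=True) + 1e-9)
print(img_embs.shape, np.linalg.norm(img_embs, axis=1))
```

After normalization every row has unit length, which is what allows a plain dot product with a unit-length query to be read as cosine similarity.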


In [10]:
#pip install faiss-cpu

### 5. Initializing FAISS Index and CLIP Text Encoder
This section sets up the search engine for content-based retrieval:

FAISS Integration: The code first attempts to import FAISS, a high-performance similarity search library. If it is available, the code builds an IndexFlatIP (an exact inner-product index) over the preloaded CLIP embeddings; because the embeddings are L2-normalized, inner product equals cosine similarity. If FAISS is not available, the system falls back to a slower NumPy dot product over all embeddings, which keeps the notebook portable across environments.

CLIP Text Encoder Setup: Next, the CLIP model and processor (CLIPModel and CLIPProcessor) are created if they do not already exist in the session. The model is loaded onto the GPU if one is available (cuda), otherwise onto the CPU. The text encoder transforms free-text queries into vector embeddings aligned with the image embeddings.

##### Utility Functions:
text_to_vec(query) → Converts a query string into a normalized CLIP text embedding.

cosine_search_text(query, topn) → Retrieves the top-n most similar images for a given query. If FAISS is enabled, it uses the index; otherwise, it computes cosine similarity directly with NumPy.

This dual setup provides the core retrieval mechanism: CLIP provides semantic alignment between text and images, while FAISS accelerates the similarity search, keeping the interactive filtering system responsive.

In [12]:
try:
    import faiss
    use_faiss = True
except Exception:
    use_faiss = False
    print("FAISS not available → falling back to NumPy search.")

if use_faiss:
    index = faiss.IndexFlatIP(img_embs.shape[1])
    index.add(img_embs.astype("float32"))
# ---- CLIP init (text encoder) — tiny patch ----
from transformers import CLIPModel, CLIPProcessor
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# (Re)create CLIP objects if missing
if 'clip_model' not in globals() or clip_model is None:
    clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()

if 'clip_proc' not in globals() or clip_proc is None:
    clip_proc  = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def text_to_vec(query: str) -> np.ndarray:
    with torch.no_grad():
        inputs = clip_proc(text=[query], return_tensors="pt", padding=True).to(device)
        t = clip_model.get_text_features(**inputs)            # (1, 512)
        t = torch.nn.functional.normalize(t, dim=-1)
    return t.cpu().numpy()[0]

def cosine_search_text(query: str, topn=400):
    q = text_to_vec(query)
    if use_faiss:
        sims, idx = index.search(q.astype("float32")[None, :], topn)
        return sims[0], idx[0]
    else:
        sims = img_embs @ q
        idx = np.argsort(-sims)[:topn]
        return sims[idx], idx

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
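
The NumPy fallback above sorts every similarity before keeping the first topn. As a sketch of an alternative that avoids the full sort, the snippet below compares it with a partial selection via `np.argpartition` on random unit vectors. The data and the two function names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.normal(size=(1000, 512))
img /= np.linalg.norm(img, axis=1, keepdims=True)   # unit-length rows
q = rng.normal(size=512)
q /= np.linalg.norm(q)                               # unit-length query

def topn_full_sort(embs, query, topn):
    """The notebook's fallback: sort every similarity, keep the first topn."""
    sims = embs @ query
    idx = np.argsort(-sims)[:topn]
    return sims[idx], idx

def topn_partition(embs, query, topn):
    """Alternative sketch: select the topn candidates first, then sort only those."""
    sims = embs @ query
    topn = min(topn, len(sims))
    part = np.argpartition(-sims, topn - 1)[:topn]
    idx = part[np.argsort(-sims[part])]
    return sims[idx], idx

s1, i1 = topn_full_sort(img, q, 10)
s2, i2 = topn_partition(img, q, 10)
print(np.array_equal(i1, i2))
```

On a collection of about 44,000 images either variant is fast enough for interactive use; the partial selection mainly matters if the collection grows and FAISS is unavailable.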


### 6. Attribute Filtering and Scoring Utilities
This block defines the tools for filtering search results by user-selected attributes and ranking them based on combined similarity and attribute match scores:

Dropdown Options (opts): For each attribute column (e.g., sleeves, hat, fabrics, colors), this function extracts the unique values from the dataset, cleans them up, and provides the list of valid choices for the Gradio dropdown menus. If no values exist, it defaults to a placeholder.

Attribute Columns (ATTR_COLS): A structured list of all attributes that the user can filter on, mapped to human-readable names for UI display.

#### Filtering Functions:
apply_attr_mask(df, selections) applies Boolean masks to the candidate dataset, keeping only rows that match the selected attributes.

attr_match_score(row, selections) calculates how well a given image matches the chosen attributes, producing a fractional score (0–1). For example, if 4 attributes are chosen and 3 match, the score is 0.75.

Result Formatting (format_caption): Creates a structured, human-readable caption for each image. It displays the decoded attribute labels (sleeves, fabrics, colors, etc.), along with the cosine similarity score, the attribute match score, and the final blended score.

Together, these utilities enable the system to not only find visually similar images but also respect user-defined attribute filters, ensuring precise and interpretable results.

In [14]:
def opts(col):
    vals = sorted([str(v) for v in labels_df[col].dropna().unique().tolist() if str(v) != "NA"])
    return vals if vals else ["(none)"]

ATTR_COLS = [
    ("sleeves_str",      "Sleeves"),
    ("lowerlen_str",     "Lower Length"),
    ("hat_str",          "Hat"),
    ("glasses_str",      "Glasses"),
    ("upper_fabric_str", "Upper Fabric"),
    ("lower_fabric_str", "Lower Fabric"),
    ("outer_fabric_str", "Outer Fabric"),
    ("upper_color_str",  "Upper Color"),
    ("lower_color_str",  "Lower Color"),
    ("outer_color_str",  "Outer Color"),
]

def apply_attr_mask(df, selections: dict):
    mask = pd.Series(True, index=df.index)
    for col, _label in ATTR_COLS:
        chosen = selections.get(col, [])
        if chosen:  # apply only if user selected values
            mask &= df[col].isin(chosen)
    return df[mask]

def attr_match_score(row, selections: dict):
    wanted = [(c, set(v)) for c,v in selections.items() if v]
    if not wanted: return 1.0
    hits = 0
    for c, choices in wanted:
        val = row.get(c, None)
        if pd.isna(val): continue
        if val in choices: hits += 1
    return hits / len(wanted)

def format_caption(row, cos=0.0, attr=0.0, final=0.0):
    return (
        f"{row['image']}\n"
        f"Sleeves:{row.get('sleeves_str','NA')}  Lower:{row.get('lowerlen_str','NA')}\n"
        f"UpFab:{row.get('upper_fabric_str','NA')}  LoFab:{row.get('lower_fabric_str','NA')}  OutFab:{row.get('outer_fabric_str','NA')}\n"
        f"UpCol:{row.get('upper_color_str','NA')}  LoCol:{row.get('lower_color_str','NA')}  OutCol:{row.get('outer_color_str','NA')}\n"
        f"Hat:{row.get('hat_str','NA')}  Glasses:{row.get('glasses_str','NA')}\n"
        f"cos={cos:.2f}  attr={attr:.2f}  score={final:.2f}"
    )
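
The 3-out-of-4 example from the description can be reproduced directly. The sketch below restates attr_match_score under a hypothetical name, `match_fraction`, and applies it to one made-up row; it also shows that the hard filter would reject the same row.

```python
import pandas as pd

def match_fraction(row, selections: dict) -> float:
    """Restates attr_match_score: share of selected attributes the row satisfies."""
    wanted = [(col, set(vals)) for col, vals in selections.items() if vals]
    if not wanted:
        return 1.0
    hits = sum(1 for col, vals in wanted if row.get(col) in vals)
    return hits / len(wanted)

row = pd.Series({
    "sleeves_str": "long", "hat_str": "no",
    "upper_fabric_str": "cotton", "upper_color_str": "striped",
})
selections = {
    "sleeves_str": ["long"],
    "hat_str": ["no"],
    "upper_fabric_str": ["cotton", "denim"],
    "upper_color_str": ["floral"],      # the one miss
    "glasses_str": [],                  # empty selection: ignored
}
score = match_fraction(row, selections)
print(score)  # 0.75

# apply_attr_mask requires every selected column to match,
# so this row would not survive the hard filter.
passes_mask = all(row.get(c) in set(v) for c, v in selections.items() if v)
print(passes_mask)  # False
```

Within a single attribute the selected values act as alternatives (cotton or denim), while different attributes must all be satisfied for the mask to keep the row.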

### 7. End-to-End Search Pipeline Implementation
This block ties together content-based similarity and multi-attribute filtering into one unified search function for the interactive tool:

Safe Wrapper (search_pipeline_safe): Runs the main pipeline inside a try/except block. If an error occurs (e.g., missing data, unexpected input), it gracefully returns an empty gallery and shows a readable error message in the Gradio UI instead of crashing the app.

##### Main Search Pipeline (search_pipeline):

Query Validation – Ensures that the user has provided a non-empty text query.

Content Search – Uses CLIP embeddings to retrieve candidate images most similar to the text query. The top candidates are returned with cosine similarity scores.

Attribute Selections – Collects the user’s filter choices (sleeves, fabric, colors, etc.) into a dictionary.

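Filtering + Scoring – Applies attribute masks to eliminate non-matching items. Then, for each remaining image, calculates an attribute match score and blends it with the cosine similarity using a tunable weight parameter α:

final_score = α · cosine_sim + (1 − α) · attr_score

The blend is meant to balance visual similarity against attribute strictness. Because the mask is a hard filter, however, every image that survives it already matches all selected attributes, so attr_score is 1.0 for each survivor and the ranking follows cosine similarity alone; α would only affect the order if the mask were relaxed to allow partial matches.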

Ranking – Sorts results by final score, selects the top k, and prepares them for display.

Output – Returns each result as an (image path, caption) pair, where captions include attributes and scores.

The output also includes a status message (e.g., “Showing 24 results”), giving the user clear feedback about the search results. This function is the heart of the Phase-1 system, combining content embeddings, FAISS search, and rule-based attribute filtering into a single pipeline.

In [16]:
import traceback

def search_pipeline_safe(*args, **kwargs):
    try:
        items, status = search_pipeline(*args, **kwargs)
        # Gradio sometimes prefers tuples; convert to tuples to be safe
        items = [(p, c) for (p, c) in items]
        return items, status
    except Exception as e:
        tb = traceback.format_exc()
        # Return empty gallery and show the error inside the app
        return [], f"**Error in search:** {e}\n\n```\n{tb}\n```"

def search_pipeline(query, alpha, top_k,
                    sleeves, lowerlen, hat, glasses,
                    upfab, lofab, oufab,
                    upcol, locol, oucol):
    if not query or not query.strip():
        return [], "Please enter a text query."

    # 1) content candidates
    top_k = int(top_k)  # slider values may arrive as floats; the search needs an integer count
    sims, idxs = cosine_search_text(query.strip(), topn=max(200, top_k*6))
    cand = labels_df.iloc[idxs].copy()
    cand["cosine_sim"] = sims

    # 2) attribute selections
    selections = {
        "sleeves_str": sleeves or [],
        "lowerlen_str": lowerlen or [],
        "hat_str": hat or [],
        "glasses_str": glasses or [],
        "upper_fabric_str": upfab or [],
        "lower_fabric_str": lofab or [],
        "outer_fabric_str": oufab or [],
        "upper_color_str": upcol or [],
        "lower_color_str": locol or [],
        "outer_color_str": oucol or [],
    }

    # 3) filter + score
    cand = apply_attr_mask(cand, selections).copy()
    if len(cand) == 0:
        return [], "No matches for this combination. Try relaxing filters."

    cand["attr_score"] = cand.apply(lambda r: attr_match_score(r, selections), axis=1)
    cand["final_score"] = float(alpha)*cand["cosine_sim"] + (1-float(alpha))*cand["attr_score"]
    cand = cand.sort_values("final_score", ascending=False).head(int(top_k)).reset_index(drop=True)

    # 4) return gallery items: (image, caption)
    items = []
    for _, r in cand.iterrows():
        cap = format_caption(r, r["cosine_sim"], r["attr_score"], r["final_score"])
        items.append([r["img_path"], cap])
    status = f"Showing {len(items)} result(s)."
    return items, status
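
To see what the weight α does to the ordering, the sketch below applies the blend to three made-up candidates. The partial attribute scores are hypothetical: they represent a soft filter, whereas the hard mask in the pipeline above leaves every survivor with an attribute score of 1.0, which is the second case shown.

```python
import pandas as pd

cand = pd.DataFrame({
    "image": ["a.jpg", "b.jpg", "c.jpg"],
    "cosine_sim": [0.31, 0.27, 0.22],   # illustrative values
    "attr_score": [0.50, 1.00, 0.75],   # hypothetical partial matches (soft filter)
})

def rank(df: pd.DataFrame, alpha: float) -> list:
    """Blend the two scores and return image names, best first."""
    final = alpha * df["cosine_sim"] + (1 - alpha) * df["attr_score"]
    ordered = df.assign(final_score=final).sort_values("final_score", ascending=False)
    return ordered["image"].tolist()

print(rank(cand, alpha=1.0))   # ['a.jpg', 'b.jpg', 'c.jpg']  cosine only
print(rank(cand, alpha=0.8))   # ['b.jpg', 'a.jpg', 'c.jpg']  attributes lift b.jpg

# With a hard filter every survivor has attr_score 1.0, so the blend
# adds the same constant to each row and the cosine order is unchanged.
hard = cand.assign(attr_score=1.0)
print(rank(hard, alpha=0.8))   # ['a.jpg', 'b.jpg', 'c.jpg']
```

For α = 0.8 the blended scores are 0.348, 0.416, and 0.326, so the full attribute match on b.jpg outweighs its lower cosine similarity.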

### 8. Building the Interactive Gradio Interface
This cell creates the front-end user interface for Phase-1, enabling interactive content-based fashion search with multi-attribute filters:

Query Controls:

Text Query Box – Users can enter a natural language description like “red floral dress with long sleeves” to trigger CLIP-based similarity search.

Content Weight Slider (α) – Lets users control the balance between visual similarity (cosine score) and attribute matching.

Top K Slider – Adjusts how many top-ranked results are displayed in the gallery.

Attribute Filters: Multi-select dropdown menus are generated dynamically from the dataset using opts_map. Users can refine results by choosing specific shape attributes (sleeves, hat, glasses), fabrics (upper, lower, outer), and colors (upper, lower, outer). These filters directly connect to the backend pipeline.

Results Display:

Search Button executes the search_pipeline_safe function, passing the query, weight, and filter selections.

Gallery Component shows the retrieved images in a clean 4-column layout, along with captions containing attribute values and scores.

Status Widget displays feedback, such as the number of results found or any caught errors.

Execution: Finally, demo.launch() starts a lightweight local web app, allowing the user to run real-time searches outside the notebook.

This interface integrates the backend pipeline with a user-friendly front end, enabling intuitive exploration of the dataset while combining textual search and structured attribute filtering.



In [18]:
import gradio as gr

# Precompute options for dropdowns
opts_map = {col: opts(col) for col, _ in ATTR_COLS}

with gr.Blocks() as demo:  # no title arg for compatibility
    gr.Markdown("## Content-Based Search with Multi-Attribute Filters")

    with gr.Row():
        query = gr.Textbox(label="Text Query", placeholder="e.g., red floral dress with long sleeves")
        alpha = gr.Slider(0.0, 1.0, value=0.8, step=0.05, label="Content weight (α)")
        topk  = gr.Slider(6, 60, value=24, step=6, label="Top K")

    with gr.Row():
        sleeves   = gr.Dropdown(opts_map["sleeves_str"],  multiselect=True, label="Sleeves")
        lowerlen  = gr.Dropdown(opts_map["lowerlen_str"], multiselect=True, label="Lower Length")
        hat       = gr.Dropdown(opts_map["hat_str"],      multiselect=True, label="Hat")
        glasses   = gr.Dropdown(opts_map["glasses_str"],  multiselect=True, label="Glasses")

    with gr.Row():
        upfab = gr.Dropdown(opts_map["upper_fabric_str"], multiselect=True, label="Upper Fabric")
        lofab = gr.Dropdown(opts_map["lower_fabric_str"], multiselect=True, label="Lower Fabric")
        oufab = gr.Dropdown(opts_map["outer_fabric_str"], multiselect=True, label="Outer Fabric")

    with gr.Row():
        upcol = gr.Dropdown(opts_map["upper_color_str"], multiselect=True, label="Upper Color")
        locol = gr.Dropdown(opts_map["lower_color_str"], multiselect=True, label="Lower Color")
        oucol = gr.Dropdown(opts_map["outer_color_str"], multiselect=True, label="Outer Color")

    run_btn = gr.Button("Search")

    # Gallery: v3/v4 friendly settings
    gallery = gr.Gallery(
        label="Results",
        show_label=True,
        columns=4,
        height=800,
        allow_preview=True,     # ok on v3/4
        object_fit="contain"    # better rendering; safe on v3/4
    )
    status  = gr.Markdown()

    run_btn.click(
        fn=search_pipeline_safe,
        inputs=[query, alpha, topk,
                sleeves, lowerlen, hat, glasses,
                upfab, lofab, oufab,
                upcol, locol, oucol],
        outputs=[gallery, status]
    )

demo.launch()

* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.
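
The error path that feeds the status widget can be exercised without launching the app. The sketch below applies the same wrapping idea as search_pipeline_safe to a toy function; `make_safe` and `toy_search` are hypothetical stand-ins, not part of the notebook.

```python
import traceback

def make_safe(fn):
    """Wrap an (items, status)-returning function so exceptions become a status message."""
    def wrapper(*args, **kwargs):
        try:
            items, status = fn(*args, **kwargs)
            return [tuple(item) for item in items], status
        except Exception as exc:
            tb = traceback.format_exc()
            return [], f"**Error in search:** {exc}\n\n{tb}"
    return wrapper

def toy_search(query, top_k):
    # Hypothetical stand-in for search_pipeline.
    if not query.strip():
        return [], "Please enter a text query."
    k = int(top_k)   # a slider value such as 2.0 becomes 2
    items = [[f"/images/{i}.jpg", f"result {i}"] for i in range(k)]
    return items, f"Showing {k} result(s)."

safe = make_safe(toy_search)
print(safe("red dress", 2.0)[1])          # Showing 2 result(s).
print(safe(None, 2)[1].splitlines()[0])   # first line of the error status
```

Calling the wrapped function directly like this is a quick way to check both the normal and the failing path before wiring it to the Search button.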


