In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('DataSets/ItemDataWithAiDescription.csv', index_col=0)

In [None]:
df["AIDescription"]

0    The item is a hoodie from the category of Hood...
0    The item is categorized as trousers and is spe...
0    The item is a denim jacket categorized under C...
0    The item is an **Oversized Camo Nylon Bomber J...
0    The item is a coat categorized as a varsity ja...
                           ...                        
0    The YMC Embroidered Cap falls under the catego...
0    The YMC Ibiza '89 Pyramid T-Shirt falls under ...
0    The YMC Ibiza '89 Dancers T-Shirt is categoriz...
0    The YMC Ibiza '89 Sunset T-Shirt is categorize...
0    The Yogi Lennon Suede is a pair of footwear ca...
Name: AIDescription, Length: 14912, dtype: object

In [5]:
product_descriptors = {
    "product_name": "The name of the item",
    "category": "The category, like t-shirt, trousers, jeans, skirt, long sleeve t-shirt, jacket, boots, dress shoes etc.",
    "brand": "Brand Name",
    "fashion_season": "The season the item/collection came out. Mostly applicable to high end fashion. E.g., Spring/Summer 2024 is SS24 or Autumn/Winter 2024 is AW24",
    "primary_colour": "The primary colour",
    "secondary_colour": "The secondary colour",
    "pattern": "The pattern on the clothing like floral, striped, plain etc.",
    "fabric": "The material of the fabric; e.g., cotton, polyester, wool",
    "price": "This is the price of the item in dollars",
    "price_category": "budget, mid-range, luxury, or haute couture",
    "size": "sizing of the product",
    "fit": "the fit of the item, e.g., tight, slim fit, regular, oversized, boxy",
    "gender_audience": "Is this aimed at men, women, or unisex?",
    "age_audience": "Is this aimed at toddlers, children, teens, adults, or seniors?",
    "season": "What season is this aimed to be worn in: i.e., summer, winter, autumn, spring",
    "occasion": "Occasions or settings (e.g., casual, formal, athletic).",
    "sleeve_length": "For tops/jackets (e.g., short, long, sleeveless).",
    "neckline": "The style of the neckline (e.g., V-neck, crew neck).",
    "country_of_manufacture": "Where the item was made.",
    "closure_type": "Details such as buttons, zippers, hooks, or laces.",
    "pocket_details": "Specifics on the number, placement, or style of pockets",
    "embellishments": "Information on additional decorative elements like embroidery, sequins, or patches.",
    "second_hand": "True if the item is pre-owned",
    "description": "A textual description that highlights special features or details.",
    "image_urls": "Image urls",
    "sustainability_info": "Information on eco-friendliness, ethical production, or sustainability certifications.",
    "additional_features": "Any other unique details, like waterproofing, stretchability, or tech integration, that haven't been described by the attributes above."
}

Some ideas for building the most accurate retriever

- Use the feature dictionary above as a reference. An LLM can analyse each item's description and images to populate a per-item dictionary with the same keys, giving a word or short phrase as each value.

- I could force the LLM to choose from a fixed set of allowed values for each key and then use a simple exact-match search on those values when finding a similar item.

- Or I could use short embeddings for the descriptor words and a similarity check, so that descriptors that are close to each other come up in the search. E.g., "very expensive", "expensive", and "cheap" could be three of five price categories, but for a query matching "expensive" I would prefer that an item in the "very expensive" category counts as more similar than one in the "cheap" category. This is especially applicable to descriptors of style, where the lines are more blurred and natural language would work better than a fixed set of options.

- I may want to use an AI agent to look at the customer's description and assign weights (or even hard stops) to features. For example, if a user asks specifically for shorts in the query, the agent sets a hard filter so that the category must match shorts. Likewise, if a person asks for something under £20, we only want to show items under £20.

- Once I have retrieved some items, I might want to use an AI agent to look at the recommendations and rank them by usefulness relative to the user query, as an additional check at the end.

- I could include an 'additional comments' part of the description, where I ask the AI to give a brief description of the item's style and any other information that might be useful, and then use a vector store on that specific column in conjunction with the other search functions.

- An important aspect will be how to handle features of the item dictionary that are null. I won't be able to fill every feature from the user query, so one option is to compare only on the subset of features that can actually be determined from the query.
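
Three of the ideas in the list above (hard filters, graded similarity between price categories, and skipping null features) could be combined roughly as follows. This is a minimal sketch under stated assumptions, not the method used in this notebook: `score_item`, `ordinal_similarity`, `PRICE_LEVELS`, the `max_price` argument, the weights, and the sample items are all hypothetical names and data made up for illustration. Plain exact matching stands in here for the embedding-based similarity, and the price cap is assumed to be in the same currency as the item prices.

```python
from typing import Any, Dict, Optional, Sequence

# Hypothetical ordinal scale, modelled on the "price_category" descriptor.
PRICE_LEVELS = ["budget", "mid-range", "luxury", "haute couture"]


def ordinal_similarity(a: str, b: str, levels: Sequence[str]) -> float:
    """1.0 for the same level, falling linearly to 0.0 for opposite ends."""
    if a not in levels or b not in levels:
        return 0.0  # unknown labels are treated as dissimilar
    distance = abs(levels.index(a) - levels.index(b))
    return 1.0 - distance / (len(levels) - 1)


def score_item(
    item: Dict[str, Any],
    query: Dict[str, Any],
    weights: Dict[str, float],
    hard_filters: Dict[str, Any],
    max_price: Optional[float] = None,
) -> Optional[float]:
    """Return a similarity in [0, 1], or None if a hard stop rejects the item."""
    # Hard stops: e.g. the query explicitly asked for shorts.
    for key, required in hard_filters.items():
        if item.get(key) != required:
            return None
    # Price cap: "under X" is read as strictly below X. Items with no
    # price are rejected (an assumption; they could also be kept).
    if max_price is not None:
        price = item.get("price")
        if price is None or price >= max_price:
            return None

    total, weight_sum = 0.0, 0.0
    for key, wanted in query.items():
        have = item.get(key)
        # Compare only on features that are known on both sides.
        if wanted is None or have is None:
            continue
        if key == "price_category":
            sim = ordinal_similarity(have, wanted, PRICE_LEVELS)
        else:
            sim = 1.0 if have == wanted else 0.0
        w = weights.get(key, 1.0)
        total += w * sim
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0


# Made-up items and query, purely for illustration.
items = [
    {"product_name": "Sample shorts A", "category": "shorts", "price": 15.0,
     "price_category": "budget", "fit": None},
    {"product_name": "Sample shorts B", "category": "shorts", "price": 45.0,
     "price_category": "mid-range", "fit": "regular"},
    {"product_name": "Sample jeans C", "category": "jeans", "price": 12.0,
     "price_category": "budget", "fit": "slim fit"},
]
query = {"price_category": "budget", "fit": "regular", "primary_colour": None}
weights = {"price_category": 2.0, "fit": 1.0}

ranked = []
for it in items:
    s = score_item(it, query, weights, {"category": "shorts"}, max_price=20.0)
    if s is not None:
        ranked.append((s, it["product_name"]))
ranked.sort(reverse=True)
print(ranked)
```

Dividing by the sum of the weights actually used keeps scores comparable between items with different numbers of populated features, so an item is not penalised just because a field such as `fit` is null.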