1 Slowfashion
1. Specific Category Recognition (deepest subcategories in our category tree)
* Below is our priority order of subcategories for the tool to identify
(I was not sure how many you estimate the students can cover, so I
listed our top 9):
1. Dresses  
2. High Heels  
3. Shoulder Bags  
4. Skirts  
5. Tote Bags  
6. Clutches  
7. Outerwear  
8. Boots  
9. Flats

These are all found at the deepest level of the group category
“Women” (for clothing it is Women>Clothing>Dresses, for shoes it is
Women>Shoes>High Heels, and for bags it is Women>Bags>Shoulder
Bags).
2. Grouping (Bags & Shoes):
• For products in categories that all “look characteristic of their
category” (in our case Bags and Shoes), the AI could identify the
group category, say “Bags”, when it is unsure of the specific
subcategory (e.g. “Shoulder Bag”).
• For example, if the tool cannot say with certainty that the image
resembles a Shoulder Bag, but can still make out that it is a Bag, it would
be very useful for the AI tool to identify that the image is most
likely some sort of Bag and nudge the user towards the category Bags
(where users can choose the correct subcategory themselves).
The same applies to Shoes.
Ideally, we as a company could later follow a process similar to the one the
students used and add further subcategories.
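
The grouping fallback described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's implementation: the `GROUP_OF` mapping, the function name `suggest_category`, and the 0.6 threshold are all hypothetical, and the subcategory probabilities are assumed to come from some image classifier. The brief only spells out the tree position of High Heels and Shoulder Bags; the group of the other subcategories is assumed here.

```python
# Hypothetical mapping from subcategory to group category.
# Only High Heels -> Shoes and Shoulder Bags -> Bags are stated in the brief;
# the remaining entries are assumptions.
GROUP_OF = {
    "Shoulder Bags": "Bags",
    "Tote Bags": "Bags",
    "Clutches": "Bags",
    "High Heels": "Shoes",
    "Boots": "Shoes",
    "Flats": "Shoes",
}


def suggest_category(probabilities, threshold=0.6):
    """Return a subcategory if one is confident enough, otherwise a group.

    `probabilities` maps subcategory name -> predicted probability.
    `threshold` is an assumed cut-off that would need tuning on validation data.
    Returns None when neither a subcategory nor a group reaches the threshold.
    """
    best_sub = max(probabilities, key=probabilities.get)
    if probabilities[best_sub] >= threshold:
        return best_sub

    # Add up the probabilities of subcategories that share a group.
    group_scores = {}
    for sub, p in probabilities.items():
        group = GROUP_OF.get(sub)
        if group is not None:
            group_scores[group] = group_scores.get(group, 0.0) + p

    if group_scores:
        best_group = max(group_scores, key=group_scores.get)
        if group_scores[best_group] >= threshold:
            return best_group
    return None


# Unsure between two bag types, but confident that it is some kind of bag
print(suggest_category({"Shoulder Bags": 0.40, "Tote Bags": 0.35, "Dresses": 0.25}))
```

Summing probabilities within a group is only one way to express "most likely some sort of Bag"; a separate group-level classifier would be another option.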


# We Have

| Category       | Source                  |
|----------------|-------------------------|
| Dresses        | DeepFashion             |
| High Heels     | Custom Dataset          |
| Shoulder Bags  | Handbag Custom Dataset  |
| Skirts         | DeepFashion             |
| Tote Bags      | Handbag Custom Dataset  |
| Clutches       | ❌ Not Available         |
| Outerwear      | DeepFashion             |
| Boots          | Custom Dataset          |
| Flats          | ❌ Not Available         |

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) 

# Global params

In [2]:
accepted_categories = ["dress", "high_heel", "handbag", "skirt", "outerwear", "boot"]
label2id = {
    "dress": 0,
    "high_heel": 1, 
    "handbag": 2,
    "skirt": 3, 
    "outerwear": 4, 
    "boot": 5
    
}
id2label = {
    0: "dress",
    1: "high_heel", 
    2: "handbag",
    3: "skirt", 
    4: "outerwear", 
    5: "boot"
}

RANDOM_STATE = 12345
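
Because `label2id` and `id2label` are written out by hand, they can drift apart when a category is added. As a small sketch of one alternative, both mappings can be derived from `accepted_categories`; the list below is copied from the cell above and produces the same ids.

```python
accepted_categories = ["dress", "high_heel", "handbag", "skirt", "outerwear", "boot"]

# Assign ids in list order, then invert the mapping
label2id = {label: idx for idx, label in enumerate(accepted_categories)}
id2label = {idx: label for label, idx in label2id.items()}

# Sanity check: the two mappings are inverses of each other
assert all(id2label[label2id[label]] == label for label in accepted_categories)
```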

# Loading custom dataset

In [3]:
import pandas as pd
import numpy as np

def get_dataframe_custom_dataset(cropped=False):
    """Load the custom dataset index and map it onto the global label set."""
    if cropped:
        path = "../custom_dataset/train/Cropped/information_dataframe.csv"
    else:
        path = "../custom_dataset/train/Labeled/information_dataframe.csv"

    dataframe = pd.read_csv(path)

    # Map the custom dataset's category names onto the global category names
    mapping_custom = {
        "Boot": "boot",
        "Handbag": "handbag",
        "High heels": "high_heel",
    }

    dataframe["global_category"] = dataframe["category"].apply(lambda item: mapping_custom.get(item, item))
    # Keep only rows whose category is one of the accepted categories
    dataframe["is_valid"] = np.where(dataframe["global_category"].isin(accepted_categories), 1, 0)
    dataframe = dataframe.query("is_valid == 1").reset_index()
    dataframe["label_id"] = dataframe["global_category"].apply(lambda item: label2id[item])
    # Prefix each image path with the folder that contains the CSV
    base_dir = path.replace("information_dataframe.csv", "")
    dataframe["path"] = dataframe["path"].apply(lambda item: base_dir + item)
    return dataframe

In [4]:
custom_dataframe = get_dataframe_custom_dataset(cropped=False)

In [5]:
custom_dataframe.groupby("global_category").size()

global_category
boot         500
handbag      500
high_heel    500
dtype: int64

In [6]:
from sklearn.model_selection import train_test_split

# Stratified split based on 'global_category'
t_custom_dataframe, v_custom_dataframe = train_test_split(
    custom_dataframe,
    test_size=0.2,
    stratify=custom_dataframe["global_category"],
    random_state=RANDOM_STATE  # for reproducibility
)

In [7]:

t_custom_dataframe.groupby("global_category").size()

global_category
boot         400
handbag      400
high_heel    400
dtype: int64

In [8]:
v_custom_dataframe.groupby("global_category").size()

global_category
boot         100
handbag      100
high_heel    100
dtype: int64

# Loading deepfashion data

In [9]:
import pandas as pd
import numpy as np

def remove_images_with_multiple_categories(dataframe):
    """Keep only rows whose image path appears exactly once in the dataframe."""
    path_counts = dataframe.groupby("path").size().reset_index()
    path_counts.columns = ["path", "count"]
    single_category_paths = path_counts[path_counts["count"] == 1]["path"].to_list()
    # .copy() avoids pandas' SettingWithCopyWarning when columns are added later
    return dataframe[dataframe["path"].isin(single_category_paths)].copy()

def get_dataframe_deepfashion(name="train"):
    dataframe = pd.read_csv(f"../archive/DeepFashion2/img_info_dataframes/{name}.csv")
    # Drop images listed under more than one category: this shrinks the dataset
    # and avoids ambiguous labels
    dataframe = remove_images_with_multiple_categories(dataframe)
    # Map DeepFashion2 category names onto the global category names
    mapping_deepfashion = {
        "short sleeve dress": "dress",
        "long sleeve dress": "dress",
        "vest dress": "dress",
        "sling dress": "dress",
        "skirt": "skirt",
        "long sleeve outwear": "outerwear",
        "short sleeve outwear": "outerwear",
    }

    dataframe["global_category"] = dataframe["category_name"].apply(lambda item: mapping_deepfashion.get(item, item))
    # Keep only rows whose category is one of the accepted categories
    dataframe["is_valid"] = np.where(dataframe["global_category"].isin(accepted_categories), 1, 0)
    dataframe = dataframe.query("is_valid == 1").reset_index()
    dataframe["label_id"] = dataframe["global_category"].apply(lambda item: label2id[item])
    # Rewrite the Kaggle paths stored in the CSV to the local archive folder
    dataframe["path"] = dataframe["path"].apply(lambda item: item.replace("/kaggle/input/deepfashion2-original-with-dataframes", "../archive/"))
    return dataframe
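
To make the effect of `remove_images_with_multiple_categories` concrete, here is a small sketch on made-up data (the file names and rows are invented for illustration). It also shows `duplicated(keep=False)` as a shorter alternative that should behave the same way for non-null paths.

```python
import pandas as pd


def remove_images_with_multiple_categories(dataframe):
    # Same logic as the cell above: keep paths that occur exactly once
    counts = dataframe.groupby("path").size().reset_index()
    counts.columns = ["path", "count"]
    single = counts[counts["count"] == 1]["path"].to_list()
    return dataframe[dataframe["path"].isin(single)]


# Toy data: a.jpg is listed twice, once per annotated item
toy = pd.DataFrame({
    "path": ["a.jpg", "a.jpg", "b.jpg", "c.jpg"],
    "category_name": ["skirt", "vest dress", "skirt", "sling dress"],
})

filtered = remove_images_with_multiple_categories(toy)

# Alternative: duplicated(keep=False) flags every row of a repeated path
alternative = toy[~toy["path"].duplicated(keep=False)]

print(filtered["path"].to_list())
print(alternative["path"].to_list())
```

Both rows for `a.jpg` are dropped, so an image showing a skirt and a dress together never reaches the classifier with a single, possibly misleading label.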

In [10]:
deepfashion_dataframe = get_dataframe_deepfashion("train")
deepfashion_validation_dataframe = get_dataframe_deepfashion("validation")

In [11]:
deepfashion_dataframe.groupby("global_category").size()

global_category
dress        39186
outerwear     3713
skirt         1921
dtype: int64

In [12]:
deepfashion_validation_dataframe.groupby("global_category").size()

global_category
dress        7172
outerwear     499
skirt         747
dtype: int64

In [13]:
full_training_dataframe = pd.concat([
    t_custom_dataframe[["path", "global_category", "label_id"]],
    deepfashion_dataframe[["path", "global_category", "label_id"]]])

full_validation_dataframe = pd.concat([
    v_custom_dataframe[["path", "global_category", "label_id"]],
    deepfashion_validation_dataframe[["path", "global_category", "label_id"]]])

In [14]:
full_training_dataframe.groupby("global_category").size()

global_category
boot           400
dress        39186
handbag        400
high_heel      400
outerwear     3713
skirt         1921
dtype: int64
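
The training counts above are heavily skewed towards `dress`. The notebook does not say whether or how this imbalance is handled during training. One possible option, which is an assumption here and not something the notebook does, is to compute inverse-frequency class weights with the formula `n_samples / (n_classes * count)`, the "balanced" heuristic also used by scikit-learn. The counts are copied from the output above.

```python
# Training counts copied from the groupby output above
train_counts = {
    "boot": 400,
    "dress": 39186,
    "handbag": 400,
    "high_heel": 400,
    "outerwear": 3713,
    "skirt": 1921,
}

n_samples = sum(train_counts.values())
n_classes = len(train_counts)

# Rare classes get weights above 1, frequent classes below 1
class_weights = {
    label: n_samples / (n_classes * count)
    for label, count in train_counts.items()
}

for label, weight in sorted(class_weights.items()):
    print(f"{label:<10} {weight:.3f}")
```

Such weights could be passed to a weighted loss; undersampling `dress` would be another way to reduce the skew.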

In [15]:
full_validation_dataframe.groupby("global_category").size()

global_category
boot          100
dress        7172
handbag       100
high_heel     100
outerwear     499
skirt         747
dtype: int64

In [16]:
full_training_dataframe.to_csv("conf/train.csv", index=False)
full_validation_dataframe.to_csv("conf/validation.csv", index=False)
