# Fictional Product Name Generation and Mapping

To produce meaningful insights from the analysis, we need to augment the dataset with product names. We'll extract the unique SKUs and categories from the amazon_products table, define a dictionary of fictional names for each category, and assign one name to every SKU. We'll then merge the names back into the products table and save the result as a separate CSV.

## 1. Load the Provided Datasets


In [1]:
import pandas as pd

# Load the provided datasets
amazon_order_items = pd.read_csv(r'C:\Users\matth\ecommerce_mba_project\data\cleaned\amazon_order_items_cleaned.csv')
amazon_orders = pd.read_csv(r'C:\Users\matth\ecommerce_mba_project\data\cleaned\amazon_orders_cleaned.csv')
amazon_products = pd.read_csv(r'C:\Users\matth\ecommerce_mba_project\data\cleaned\amazon_products_cleaned.csv')

# Display the first few rows of each dataset to understand the structure
print(amazon_order_items.head())
print(amazon_orders.head())
print(amazon_products.head())


              order_id              sku    style       category size  \
0  405-8078784-5731545   SET389-KR-NP-S   SET389            Set    S   
1  171-9198151-1101146  JNE3781-KR-XXXL  JNE3781          kurta  3XL   
2  404-0687676-7273146    JNE3371-KR-XL  JNE3371          kurta   XL   
3  403-9615377-8133951       J0341-DR-L    J0341  Western Dress    L   
4  407-1069790-7240320  JNE3671-TU-XXXL  JNE3671            Top  3XL   

         asin  qty  amount  
0  B09KXVBD7Z    0  647.62  
1  B09K3WFS32    1  406.00  
2  B07WV4JV4D    1  329.00  
3  B099NRCT7B    0  753.33  
4  B098714BZP    1  574.00  
              order_id        date                        status fulfillment  \
0  405-8078784-5731545  2022-04-30                     Cancelled    Merchant   
1  171-9198151-1101146  2022-04-30  Shipped - Delivered to Buyer    Merchant   
2  404-0687676-7273146  2022-04-30                       Shipped      Amazon   
3  403-9615377-8133951  2022-04-30                     Cancelled    Merch
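
The loading cell above hard-codes absolute Windows paths, so it only runs on one machine. Below is a minimal sketch of one alternative, assuming the notebook is run from the project root and the files keep the `<name>_cleaned.csv` pattern. `DATA_DIR`, `cleaned_path`, and `load_cleaned` are hypothetical names introduced for illustration, not part of the original notebook.

```python
from pathlib import Path

import pandas as pd

# Assumption: the cleaned CSVs live in data/cleaned relative to the working directory.
DATA_DIR = Path("data") / "cleaned"


def cleaned_path(name: str, data_dir: Path = DATA_DIR) -> Path:
    """Build the path to '<data_dir>/<name>_cleaned.csv'."""
    return data_dir / f"{name}_cleaned.csv"


def load_cleaned(name: str, data_dir: Path = DATA_DIR) -> pd.DataFrame:
    """Read one of the cleaned datasets into a DataFrame."""
    return pd.read_csv(cleaned_path(name, data_dir))
```

With this in place, a call such as `load_cleaned("amazon_products")` would replace each hard-coded `pd.read_csv(...)` line, and changing `DATA_DIR` in one place moves all three loads.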

## 2. Identify Unique Product Identifiers

In [2]:
# Extract unique SKUs and unique categories from the amazon_products dataset
unique_products = amazon_products['sku'].unique()
unique_categories = amazon_products['category'].unique()

print("Unique Categories:")
for category in unique_categories:
    print(category)


Unique Categories:
Set
kurta
Western Dress
Top
Ethnic Dress
Bottom
Saree
Blouse
Dupatta
Kurta
Kurta Set
Gown
Tops
Unknown
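
The list above contains near-duplicate labels: `kurta` and `Kurta`, and `Top` and `Tops`. This notebook keeps them as separate categories, but if they should be treated as one, a small alias table can collapse them. The sketch below uses a stand-in DataFrame rather than the real table, and `CATEGORY_ALIASES` is a hypothetical name. Which labels count as duplicates is a judgment call, not something the data confirms.

```python
import pandas as pd

# Stand-in for the 'category' column of amazon_products (illustrative rows only).
sample = pd.DataFrame({"category": ["Set", "kurta", "Kurta", "Top", "Tops", "Unknown"]})

# Assumption: these pairs refer to the same category and should share one label.
CATEGORY_ALIASES = {"kurta": "Kurta", "Tops": "Top"}

# Strip stray whitespace, then map each alias onto its canonical label.
sample["category_normalized"] = sample["category"].str.strip().replace(CATEGORY_ALIASES)

print(sorted(sample["category_normalized"].unique()))
# ['Kurta', 'Set', 'Top', 'Unknown']
```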


## 3. Generate Fictional Product Names for Each Category

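We'll create a dictionary of fictional product names for each unique category, then assign one name at random to each SKU. Dictionary keys are case-sensitive, so `kurta` and `Kurta` (and `Top` and `Tops`) each get their own entry.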

In [3]:
import random

# Define fictional product names based on identified categories
product_names = {
    'Set': ['Elegant Set', 'Designer Set', 'Casual Set', 'Formal Set'],
    'kurta': ['Stylish Kurta', 'Cotton Kurta', 'Designer Kurta', 'Casual Kurta'],
    'Western Dress': ['Evening Dress', 'Cocktail Dress', 'Summer Dress', 'Casual Dress'],
    'Top': ['Summer Top', 'Designer Top', 'Casual Top', 'Formal Top'],
    'Ethnic Dress': ['Traditional Dress', 'Festival Dress', 'Designer Ethnic Dress', 'Casual Ethnic Dress'],
    'Bottom': ['Jeans', 'Chinos', 'Shorts', 'Leggings'],
    'Saree': ['Silk Saree', 'Cotton Saree', 'Designer Saree', 'Casual Saree'],
    'Blouse': ['Designer Blouse', 'Cotton Blouse', 'Silk Blouse', 'Casual Blouse'],
    'Dupatta': ['Silk Dupatta', 'Cotton Dupatta', 'Designer Dupatta', 'Casual Dupatta'],
    'Kurta': ['Designer Kurta', 'Cotton Kurta', 'Silk Kurta', 'Casual Kurta'],
    'Kurta Set': ['Designer Kurta Set', 'Cotton Kurta Set', 'Silk Kurta Set', 'Casual Kurta Set'],
    'Gown': ['Evening Gown', 'Wedding Gown', 'Designer Gown', 'Casual Gown'],
    'Tops': ['Summer Top', 'Designer Top', 'Casual Top', 'Formal Top'],
    'Unknown': ['Miscellaneous Item', 'Unspecified Product', 'Unknown Item', 'Generic Product']
}

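# Build one mapping row per SKU: look up its category, then pick a random name
product_mapping = []

for product in unique_products:
    # Use the first category recorded for this SKU
    category = amazon_products.loc[amazon_products['sku'] == product, 'category'].values[0]
    # SKUs whose category has no entry in product_names are skipped,
    # so they end up with a missing product_name after the merge in step 4
    if category in product_names:
        # No seed is set, so the assigned names change on every run
        name = random.choice(product_names[category])
        product_mapping.append({'sku': product, 'category': category, 'product_name': name})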

product_mapping_df = pd.DataFrame(product_mapping)

print(product_mapping_df.head())


               sku       category    product_name
0   SET389-KR-NP-S            Set    Designer Set
1  JNE3781-KR-XXXL          kurta  Designer Kurta
2    JNE3371-KR-XL          kurta   Stylish Kurta
3       J0341-DR-L  Western Dress    Casual Dress
4  JNE3671-TU-XXXL            Top      Summer Top
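
Because the cell above calls `random.choice` without a seed, rerunning the notebook assigns different names to the same SKUs. The following is a minimal sketch of a reproducible variant, not the notebook's own method. `assign_names`, `FALLBACK`, and the seed value are hypothetical, the sample rows are illustrative (the `X-000` / `Scarf` row is made up to exercise the fallback), and falling back to a default name for unmapped categories is an assumption that differs from the original loop, which skips them.

```python
import random

import pandas as pd

# Illustrative stand-in for amazon_products; the last row is invented for the demo.
products = pd.DataFrame({
    "sku": ["SET389-KR-NP-S", "JNE3781-KR-XXXL", "JNE3371-KR-XL", "X-000"],
    "category": ["Set", "kurta", "kurta", "Scarf"],
})

names = {
    "Set": ["Elegant Set", "Designer Set"],
    "kurta": ["Stylish Kurta", "Cotton Kurta"],
}

# Assumption: unmapped categories get a default name instead of being skipped.
FALLBACK = "Unspecified Product"


def assign_names(df: pd.DataFrame, names: dict, seed: int = 42) -> pd.DataFrame:
    """Return one row per SKU with a reproducibly chosen product_name."""
    rng = random.Random(seed)  # local generator; the global random state is untouched
    # Keep the first category seen for each SKU.
    mapping = df.drop_duplicates(subset="sku")[["sku", "category"]].copy()
    mapping["product_name"] = [
        rng.choice(names[c]) if c in names else FALLBACK
        for c in mapping["category"]
    ]
    return mapping.reset_index(drop=True)


mapping_df = assign_names(products, names)
print(mapping_df)
```

Calling `assign_names` twice with the same seed yields identical mappings, and `drop_duplicates` avoids rescanning the whole table once per SKU.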


## 4. Map Fictional Product Names to Identifiers in amazon_products_cleaned and Save a New CSV

In [4]:
# Merge only the product_name column on 'sku', so the merge does not
# create duplicate category_x / category_y columns that need cleaning up
amazon_products_with_names = amazon_products.merge(
    product_mapping_df[['sku', 'product_name']], on='sku', how='left'
)

# Save the updated dataset to a new CSV file
amazon_products_with_names.to_csv(r'C:\Users\matth\ecommerce_mba_project\data\cleaned\amazon_products_with_names.csv', index=False)

# Display the first few rows of the updated dataset
print(amazon_products_with_names.head())


               sku    style       category size        asin    product_name
0   SET389-KR-NP-S   SET389            Set    S  B09KXVBD7Z    Designer Set
1  JNE3781-KR-XXXL  JNE3781          kurta  3XL  B09K3WFS32  Designer Kurta
2    JNE3371-KR-XL  JNE3371          kurta   XL  B07WV4JV4D   Stylish Kurta
3       J0341-DR-L    J0341  Western Dress    L  B099NRCT7B    Casual Dress
4  JNE3671-TU-XXXL  JNE3671            Top  3XL  B098714BZP      Summer Top
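
A left merge keeps every product row, but it silently leaves `product_name` empty for any SKU missing from the mapping, and it would duplicate rows if the mapping listed a SKU twice. The sketch below shows one way to check for both before saving. It runs on a small made-up sample rather than the real tables, and relies on the `validate` argument of `DataFrame.merge`.

```python
import pandas as pd

# Illustrative sample: the products table may repeat a SKU, the mapping must not.
products = pd.DataFrame({"sku": ["A1", "A1", "B2"], "category": ["Set", "Set", "Top"]})
mapping = pd.DataFrame({"sku": ["A1", "B2"], "product_name": ["Designer Set", "Summer Top"]})

# validate="many_to_one" raises MergeError if a SKU appears twice in the mapping.
merged = products.merge(mapping, on="sku", how="left", validate="many_to_one")

# The merge should neither add nor drop product rows.
assert len(merged) == len(products)

# Count rows that did not receive a name.
missing = merged["product_name"].isna().sum()
print(f"rows: {len(merged)}, missing names: {missing}")
# rows: 3, missing names: 0
```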
