# Generate `products.csv` from `online_retail.csv`

This notebook reads the original **Online Retail** transaction dataset, extracts the unique product descriptions (the `Description` column), and assigns each one a random `price` and `category` for demo purposes. It then saves the result to **products.csv**, where the description is stored as `item_id`.


In [8]:
import pandas as pd
import random

In [9]:
# Constants for file paths
ONLINE_RETAIL_PATH = "../data/online_retail.csv"
PRODUCTS_CSV_PATH = "../data/products.csv"

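# Load data. The file begins with a UTF-8 byte-order mark (BOM); decoded as
# ISO-8859-1 it shows up as "ï»¿" on the first column name, so strip it.
df = pd.read_csv(ONLINE_RETAIL_PATH, encoding='ISO-8859-1')
df.columns = df.columns.str.replace('ï»¿', '', regex=False)
df.head()


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country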
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,12/1/2010 8:26,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,12/1/2010 8:26,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,12/1/2010 8:26,2.75,17850.0,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,12/1/2010 8:26,3.39,17850.0,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,12/1/2010 8:26,3.39,17850.0,United Kingdom
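
A CSV that begins with a UTF-8 byte-order mark (BOM) gets a `ï»¿` prefix on its first column name when read as ISO-8859-1, because the three BOM bytes are decoded as ordinary characters. The sketch below reproduces this with a small in-memory file (made-up bytes, not the real dataset) and shows that `encoding="utf-8-sig"` discards the BOM. That encoding only works if the rest of the file is valid UTF-8, which is an assumption not checked here.

```python
import io
import pandas as pd

# Hypothetical two-line CSV that starts with a UTF-8 BOM (bytes EF BB BF).
raw = b"\xef\xbb\xbfInvoiceNo,Description\n536365,WHITE METAL LANTERN\n"

# Decoding as ISO-8859-1 keeps the BOM bytes as three visible characters.
latin1_df = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1")
print(list(latin1_df.columns))

# 'utf-8-sig' treats a leading BOM as a signature and drops it.
sig_df = pd.read_csv(io.BytesIO(raw), encoding="utf-8-sig")
print(list(sig_df.columns))
```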


In [10]:
# Collect the distinct item names from the 'Description' column, skipping missing values
unique_items = df['Description'].dropna().unique()
len(unique_items)


4223
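
`dropna().unique()` treats strings that differ only in surrounding whitespace as distinct, so a count like this can include near-duplicates. A small sketch with made-up values (not taken from the dataset) shows the effect of normalizing with `str.strip()` first; whether the real data needs this is an assumption to verify.

```python
import pandas as pd

# Made-up descriptions: one missing value and one trailing-space variant.
descriptions = pd.Series(
    ["WHITE METAL LANTERN", "WHITE METAL LANTERN ", None, "RED MUG"]
)

# Exact-match uniqueness counts the trailing-space variant separately.
raw_unique = descriptions.dropna().unique()

# Stripping whitespace before deduplicating merges the two variants.
clean_unique = descriptions.dropna().str.strip().unique()

print(len(raw_unique), len(clean_unique))
```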

In [11]:
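# Demo-only categories; they are assigned at random and do not reflect the actual products
categories = ["Home Decor", "Kitchen", "Gifts", "Stationery", "Toys", "Lighting"]
data = []
for item in unique_items:
    data.append({
        "item_id": item,  # the product description doubles as the identifier
        "price": round(random.uniform(2.0, 50.0), 2),  # random price between 2.00 and 50.00
        "category": random.choice(categories)
    })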

products_df = pd.DataFrame(data)
products_df.to_csv(PRODUCTS_CSV_PATH, index=False)
products_df.head()


Unnamed: 0,item_id,price,category
0,WHITE HANGING HEART T-LIGHT HOLDER,16.91,Lighting
1,WHITE METAL LANTERN,42.7,Toys
2,CREAM CUPID HEARTS COAT HANGER,17.43,Gifts
3,KNITTED UNION FLAG HOT WATER BOTTLE,28.88,Gifts
4,RED WOOLLY HOTTIE WHITE HEART.,48.97,Home Decor
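
Because `random.uniform` and `random.choice` draw from the unseeded global generator, each run of the notebook writes different prices and categories. If repeatable output matters, one option is a dedicated, seeded `random.Random` instance. This is a minimal sketch: the helper name `make_products` and the seed value are illustrative and not part of the notebook.

```python
import random

CATEGORIES = ["Home Decor", "Kitchen", "Gifts", "Stationery", "Toys", "Lighting"]


def make_products(items, seed=42):
    """Return one dict per item with a seeded random price and category."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    return [
        {
            "item_id": item,
            "price": round(rng.uniform(2.0, 50.0), 2),
            "category": rng.choice(CATEGORIES),
        }
        for item in items
    ]


items = ["WHITE METAL LANTERN", "CREAM CUPID HEARTS COAT HANGER"]
first = make_products(items)
second = make_products(items)
print(first == second)  # same seed and same input give identical rows
```

Passing the result to `pd.DataFrame(...)` would yield the same `item_id`, `price`, `category` columns as the loop above.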
