<a href="https://colab.research.google.com/github/lav162329/product-category-classifier/blob/main/notebooks/01_data_exploration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 🎯 Project Overview: E-Commerce Product Category Classifier

Author: Vojce Lazic

### Business Context

This project addresses a critical challenge in high-volume e-commerce operations: the slow, resource-intensive, and error-prone process of manually categorizing thousands of new products introduced daily. Accurate and swift classification is essential for efficient inventory management, product discoverability, and improving the overall customer experience on the platform.

---

### 📥 Initialization and Data Loading

This cell imports the required Python libraries, loads `products.csv` from the repository's `data/` folder, and cleans the column names by stripping leading and trailing whitespace.
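As a small aside, the column-name cleanup can be illustrated on a toy frame. This is only a sketch: the `toy` DataFrame and its untidy headers are hypothetical and not taken from `products.csv`. It shows that `str.strip()` with no arguments removes surrounding whitespace only, so other leading characters such as an underscore are left in place.

```python
import pandas as pd

# Hypothetical toy frame; these headers are invented to mimic untidy CSV column names.
toy = pd.DataFrame(columns=[" Product Title", "Category Label ", "_Product Code"])

# str.strip() with no arguments removes only leading/trailing whitespace.
toy.columns = toy.columns.str.strip()

cleaned = toy.columns.tolist()
print(cleaned)  # ['Product Title', 'Category Label', '_Product Code']
```

If a leading underscore should also be removed, something like `str.lstrip("_")` could be chained afterwards; whether that is desirable depends on how later cells refer to the column.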

```python
# Import Libraries and Load Data
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from scipy.sparse import hstack
import joblib
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset from GitHub
url = "https://raw.githubusercontent.com/lav162329/product-category-classifier/main/data/products.csv"
df = pd.read_csv(url)

print(f"DataFrame loaded. Shape: {df.shape}")

# Clean up column names by stripping leading/trailing whitespace
df.columns = df.columns.str.strip()
print("\nCleaned Column Names:\n", df.columns.tolist())

# Display initial data check
print("\n--- Initial Data Preview (First 5 Rows) ---")
print(df[['Product Title', 'Category Label', 'Number_of_Views', 'Merchant Rating']].head())
print("\nData Types:")
print(df.dtypes)
```

DataFrame loaded. Shape: (35311, 8)

Cleaned Column Names:
 ['product ID', 'Product Title', 'Merchant ID', 'Category Label', '_Product Code', 'Number_of_Views', 'Merchant Rating', 'Listing Date']

--- Initial Data Preview (First 5 Rows) ---
                                       Product Title Category Label  \
0                    apple iphone 8 plus 64gb silver  Mobile Phones   
1                apple iphone 8 plus 64 gb spacegrau  Mobile Phones   
2  apple mq8n2b/a iphone 8 plus 64gb 5.5 12mp sim...  Mobile Phones   
3                apple iphone 8 plus 64gb space grey  Mobile Phones   
4  apple iphone 8 plus gold 5.5 64gb 4g unlocked ...  Mobile Phones   

   Number_of_Views  Merchant Rating  
0            860.0              2.5  
1           3772.0              4.8  
2           3092.0              3.9  
3            466.0              3.4  
4           4426.0              1.6  

Data Types:
product ID           int64
Product Title       object
Merchant ID          int64
Category L