In [2]:
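# openpyxl is one of the engines pandas can use to write .xlsx files (needed by to_excel below)
%pip install pandas numpy openpyxl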



In [3]:
import pandas as pd
import numpy as np
import os

# Load merged data (base_dir is the author's local folder; adjust as needed)
base_dir = r"C:\Users\branc\Downloads\capstone project"
path = os.path.join(base_dir, "merged_data.csv")
try:
    df = pd.read_csv(path, encoding='utf-8')
except FileNotFoundError:
    print(f"Error: File not found at {path}")
    print("Check directory contents:", os.listdir(base_dir))
    raise

# Check for time_stamp column
if 'time_stamp' not in df.columns:
    print("Error: 'time_stamp' column not found in merged_data.csv")
    print("Columns available:", df.columns.tolist())
    raise KeyError("Please verify the time_stamp column in merged_data.csv")

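# Convert time_stamp to datetime; errors='coerce' turns any value that
# cannot be parsed into NaT (missing) instead of raising an error
df['Timestamp'] = pd.to_datetime(df['time_stamp'], errors='coerce')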
print("Converted 'time_stamp' to 'Timestamp' (datetime format)")

# Basic exploration
print("Shape (rows, columns):", df.shape)
print("Columns:", df.columns.tolist())
print("Data types:\n", df.dtypes)
print("Missing values (%):\n", df.isnull().mean() * 100)
print("Summary statistics:\n", df.describe())
print("Unique categories:", df['category_level1'].nunique(), df['category_level1'].unique())
print("First 5 rows:\n", df.head())

# Save for Excel/Tableau (writing .xlsx needs an Excel engine such as openpyxl or XlsxWriter)
output_excel = r"C:\Users\branc\Downloads\capstone project\merged_data.xlsx"
df.to_excel(output_excel, index=False)
print("Saved to:", output_excel)

Converted 'time_stamp' to 'Timestamp' (datetime format)
Shape (rows, columns): (2950, 40)
Columns: ['Customer ID', 'Product ID', 'interaction_type', 'time_stamp', 'product_name', 'selling_price', 'model_number', 'about_product', 'product_specification', 'technical_details', 'shipping_weight', 'image', 'variants', 'product_url', 'is_amazon_seller', 'length', 'width', 'height', 'category_level1', 'category_level2', 'category_level3', 'category_level4', 'age', 'gender', 'item_purchased', 'category', 'purchase_amount(usd)', 'location', 'size', 'color', 'season', 'review_rating', 'subscription_status', 'shipping type', 'discount_applied', 'promo_code_used', 'previous_purchases', 'payment_method', 'frequency_of_purchases', 'Timestamp']
Data types:
 Customer ID                        int64
Product ID                        object
interaction_type                  object
time_stamp                        object
product_name                      object
selling_price                    float64
m
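
The cell above prints the missing-value percentage for every one of the 40 columns, and that list is hard to scan. Below is a minimal sketch of a report that keeps only the columns with gaps and puts the worst first. The helper name `missing_report` and its `threshold` parameter are hypothetical and are not part of pandas. The sketch relies only on `DataFrame.isna`, `mean`, `sum` and `sort_values`.

```python
import pandas as pd


def missing_report(df: pd.DataFrame, threshold: float = 0.0) -> pd.DataFrame:
    """Count and percentage of missing values for columns above `threshold` %.

    `missing_report` and `threshold` are illustrative names, not part of pandas.
    """
    missing = df.isna()              # True where a cell is NaN/None/NaT
    pct = missing.mean() * 100       # share of missing cells per column, in %
    report = pd.DataFrame({
        "missing_count": missing.sum(),
        "missing_pct": pct.round(2),
    })
    # Filter on the unrounded percentage so tiny gaps are not hidden by rounding
    return report.loc[pct > threshold].sort_values("missing_pct", ascending=False)
```

You could call `missing_report(df)` to list every column that has any missing values. `missing_report(df, 50)` narrows the list to columns that are more than half empty.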

# E-Commerce Data Analysis - Google Data Analytics Capstone
## Step 1: Exploratory Data Analysis (EDA)
**Objective**: Understand the structure, quality, and key characteristics of the merged e-commerce dataset to prepare for further analysis.

**Dataset**: `merged_data.csv` (2,950 rows, 39 columns), combining the Sales, Customer, and Product Details data.

**Process**:
- Loaded `merged_data.csv` using pandas.
- Converted `time_stamp` to `Timestamp` (datetime format) for time-based analysis.
- Explored shape, columns, data types, missing values, summary statistics, and unique product categories.
- Saved `merged_data.xlsx` for use in Excel and Tableau.
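
The conversion step uses `errors='coerce'`, which silently replaces unparseable strings with `NaT`. A quick way to see which raw values were affected is to compare missingness before and after parsing. The helper `unparsed_timestamps` below is a hypothetical name, and the sample values are made up for illustration. They are not taken from `merged_data.csv`.

```python
import pandas as pd


def unparsed_timestamps(raw: pd.Series) -> pd.Series:
    """Return raw values that were present but became NaT under errors='coerce'.

    `unparsed_timestamps` is a hypothetical helper, not part of pandas.
    """
    parsed = pd.to_datetime(raw, errors="coerce")
    # Missing after parsing, yet present in the source: these failed to parse
    return raw[parsed.isna() & raw.notna()]


# Tiny made-up sample (not from merged_data.csv)
sample = pd.Series(["2023-01-05 10:00:00", "2023-02-11 08:30:00", "not a date", None])
print(unparsed_timestamps(sample))
```

On the real data, `unparsed_timestamps(df['time_stamp']).head(20)` would show examples of the failing strings. One possible cause to check is version-specific. From pandas 2.0 onward, `pd.to_datetime` infers a single format from the first non-null value, so values written in a second format are coerced to `NaT`. If that is what is happening, pandas 2.0+ also accepts `format="mixed"` to parse each element individually.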

**Key Findings**:
- Shape: 2,950 rows × 40 columns, including the derived `Timestamp` column.
- Columns: includes `Customer ID`, `Product ID`, `selling_price`, `purchase_amount(usd)`, `product_name`, `category_level1`, `age`, `gender`, and `Timestamp`.
- Missing values: ~0% in `Customer ID`, `Product ID`, `interaction_type`, `time_stamp`, `product_name`, `selling_price`, `model_number`, `about_product`, `product_specification`, `technical_details`, `shipping_weight`, `image`, `variants`, `product_url`, `is_amazon_seller`, `length`, `width`, `height`, `category_level1`, `category_level2`, `category_level3`, `category_level4`, `age`, `gender`, `item_purchased`, `category`, `purchase_amount(usd)`, `location`, `size`, `season`, `review_rating`, `subscription_status`, `shipping type`, `discount_applied`, `promo_code_used`, `previous_purchases`, `payment_method`, and `frequency_of_purchases`; >60% in `Timestamp`. The raw `time_stamp` column is ~0% missing, so the gap in `Timestamp` does not come from missing source data. It comes from `pd.to_datetime(..., errors='coerce')` converting values it could not parse into `NaT`.
- Categories: `category_level1` has 22 unique values: `Sports & Outdoors`, `Clothing, Shoes & Jewelry`, `Toys & Games`, `Unknown`, `Health & Household`, `Baby Products`, `Home & Kitchen`, `Arts, Crafts & Sewing`, `Pet Supplies`, `Office Products`, `Hobbies`, `Patio, Lawn & Garden`, `Grocery & Gourmet Food`, `Beauty & Personal Care`, `Industrial & Scientific`, `Tools & Home Improvement`, `Video Games`, `Remote & App Controlled Vehicle Parts`, `Automotive`, `Remote & App Controlled Vehicles & Parts`, `Electronics`, `Musical Instruments`. `Unknown` appears to be a placeholder rather than a real category. The two `Remote & App Controlled` labels look like near-duplicates.
- Summary statistics (means): `selling_price` ~$30, `purchase_amount(usd)` ~$59, `age` ~44 years.
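
Spotting near-duplicate category labels by eye does not scale well. Below is a minimal sketch that groups labels whose normalized spelling collides. The normalization rule is an ad hoc heuristic of my own, not a standard technique: lowercase the label, keep alphanumeric words only, and drop one trailing "s" per word. The helper name `near_duplicate_labels` is hypothetical.

```python
import re
from collections import defaultdict

import pandas as pd


def near_duplicate_labels(labels: pd.Series) -> list:
    """Group distinct labels that collapse to the same normalized key.

    Normalization is an ad hoc heuristic (an assumption, not a standard):
    lowercase, keep alphanumeric words only, drop one trailing 's' per word.
    """
    groups = defaultdict(set)
    for label in labels.dropna().unique():
        words = re.findall(r"[a-z0-9]+", str(label).lower())
        key = " ".join(w[:-1] if w.endswith("s") else w for w in words)
        groups[key].add(label)
    # Keep only keys shared by two or more original spellings
    return [sorted(group) for group in groups.values() if len(group) > 1]


sample = pd.Series([
    "Remote & App Controlled Vehicle Parts",
    "Remote & App Controlled Vehicles & Parts",
    "Toys & Games",
    "Video Games",
    "Unknown",
])
print(near_duplicate_labels(sample))
```

Applied to the 22 labels listed above, this heuristic should group only the two `Remote & App Controlled` labels. Pairing it with `df['category_level1'].value_counts(dropna=False)` shows how many rows each spelling covers. The sketch does not decide whether to merge the two labels or how to treat `Unknown`; those remain analysis decisions.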

**Tools Used**: Python (pandas, numpy), Jupyter Notebook