# 📊 Week 1: Explore E-Commerce Data

**Goal:** Understand the data structure and get basic insights

## What we'll do:
1. Load the data
2. Check data quality
3. Basic statistics
4. Simple visualizations

---

## Step 1: Import Libraries

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Better looking plots
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✅ Libraries imported successfully!")

## Step 2: Load the Data

In [None]:
# Load customers data
customers = pd.read_csv('../data/customers.csv')

# Load transactions data
transactions = pd.read_csv('../data/transactions.csv')

print(f"✅ Loaded {len(customers):,} customers")
print(f"✅ Loaded {len(transactions):,} transactions")

## Step 3: Explore Customer Data

In [None]:
# Look at first 5 customers
customers.head()

In [None]:
# Check data types and missing values
customers.info()

In [None]:
# Basic statistics
customers.describe()
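
`info()` and `describe()` give a first impression, but a per-column summary makes the "check data quality" step easier to read at a glance. The helper below is a minimal sketch; the name `quality_report` is hypothetical and not part of the original notebook.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize dtype, missing values, and distinct values for each column.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    report = pd.DataFrame({
        "dtype": df.dtypes.astype(str),                    # column type as text
        "missing": df.isna().sum(),                        # count of missing values
        "missing_pct": (df.isna().mean() * 100).round(1),  # share of missing values in %
        "unique": df.nunique(),                            # distinct non-missing values
    })
    # Columns with the most missing values come first
    return report.sort_values("missing", ascending=False)
```

Calling `quality_report(customers)` in a new cell should list the columns that need attention first, and `customers.duplicated().sum()` counts fully duplicated rows.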

## Step 4: Visualize Customer Demographics

In [None]:
# Age distribution
plt.figure(figsize=(10, 5))
# dropna() keeps missing ages from breaking the histogram bin calculation
plt.hist(customers['age'].dropna(), bins=30, edgecolor='black', alpha=0.7)
plt.title('Customer Age Distribution', fontsize=16, fontweight='bold')
plt.xlabel('Age')
plt.ylabel('Number of Customers')
plt.grid(axis='y', alpha=0.3)
plt.show()

In [None]:
# Customers by country
country_counts = customers['country'].value_counts()

plt.figure(figsize=(10, 5))
country_counts.plot(kind='bar', color='steelblue', edgecolor='black')
plt.title('Customers by Country', fontsize=16, fontweight='bold')
plt.xlabel('Country')
plt.ylabel('Number of Customers')
plt.xticks(rotation=45)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

print("\nPercentage by country:")
print((country_counts / len(customers) * 100).round(1))

In [None]:
# Marketing channels
channel_counts = customers['marketing_channel'].value_counts()

plt.figure(figsize=(10, 5))
plt.pie(channel_counts, labels=channel_counts.index, autopct='%1.1f%%', startangle=90)
plt.title('Customer Acquisition Channels', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

## Step 5: Explore Transaction Data

In [None]:
# Look at first transactions
transactions.head(10)

In [None]:
# Transaction statistics
print("💰 TRANSACTION SUMMARY")
print("=" * 50)
print(f"Total Transactions: {len(transactions):,}")
print(f"Total Revenue: €{transactions['total_amount'].sum():,.2f}")
print(f"Average Order Value: €{transactions['total_amount'].mean():.2f}")
print(f"Median Order Value: €{transactions['total_amount'].median():.2f}")
print(f"Min Order: €{transactions['total_amount'].min():.2f}")
print(f"Max Order: €{transactions['total_amount'].max():.2f}")
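
Before combining the two tables in later weeks, it can help to confirm that every transaction points at a known customer. The sketch below assumes that `customers` has a `customer_id` column matching the one in `transactions`; the notebook does not confirm this, and the function name `orphan_transactions` is hypothetical.

```python
import pandas as pd


def orphan_transactions(
    transactions: pd.DataFrame,
    customers: pd.DataFrame,
    key: str = "customer_id",  # assumed shared key; adjust to the real column name
) -> pd.DataFrame:
    """Return the transactions whose key value has no match in the customers table."""
    known_ids = customers[key].dropna().unique()
    # Keep only rows whose key is NOT among the known customer ids
    return transactions[~transactions[key].isin(known_ids)]
```

An empty result from `orphan_transactions(transactions, customers)` suggests the two files are consistent; any rows returned are worth inspecting before joining the tables.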

In [None]:
# Most popular products
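# Note: counting transaction_id gives the number of transactions per product,
# not the number of units sold
product_sales = transactions.groupby('product_name').agg({
    'transaction_id': 'count',
    'total_amount': 'sum'
}).rename(columns={'transaction_id': 'num_transactions', 'total_amount': 'revenue'})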

product_sales = product_sales.sort_values('revenue', ascending=False)

plt.figure(figsize=(12, 6))
product_sales['revenue'].plot(kind='bar', color='coral', edgecolor='black')
plt.title('Revenue by Product', fontsize=16, fontweight='bold')
plt.xlabel('Product')
plt.ylabel('Total Revenue (€)')
plt.xticks(rotation=45)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

print("\n📊 Top Products by Revenue:")
print(product_sales)
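
Absolute revenue figures are easier to compare once they are expressed as shares of the total. The sketch below assumes only that the table has a `revenue` column, as `product_sales` does above; the function name `add_revenue_share` and the two new column names are hypothetical.

```python
import pandas as pd


def add_revenue_share(product_sales: pd.DataFrame) -> pd.DataFrame:
    """Add each product's share of total revenue and the running cumulative share."""
    out = product_sales.sort_values("revenue", ascending=False).copy()
    share = out["revenue"] / out["revenue"].sum() * 100
    out["revenue_share_pct"] = share.round(1)        # share of total revenue in %
    out["cumulative_pct"] = share.cumsum().round(1)  # running total, highest revenue first
    return out
```

`add_revenue_share(product_sales)` shows how concentrated revenue is, for example how many products it takes to reach a given cumulative share.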

## Step 6: Customer Purchase Behavior

In [None]:
# How many purchases per customer?
purchases_per_customer = transactions.groupby('customer_id').size()

plt.figure(figsize=(12, 5))
plt.hist(purchases_per_customer, bins=50, edgecolor='black', alpha=0.7, color='green')
plt.title('Distribution of Purchases per Customer', fontsize=16, fontweight='bold')
plt.xlabel('Number of Purchases')
plt.ylabel('Number of Customers')
plt.axvline(purchases_per_customer.mean(), color='red', linestyle='--', linewidth=2, label=f'Average: {purchases_per_customer.mean():.1f}')
plt.legend()
plt.grid(axis='y', alpha=0.3)
plt.show()

print(f"Average purchases per customer: {purchases_per_customer.mean():.1f}")
print(f"Median purchases per customer: {purchases_per_customer.median():.1f}")
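
The histogram shows the overall shape; grouping customers into a few purchase-count buckets gives a first, rough view of different customer types. This is a sketch only: the bucket boundaries (1, 2-3, 4+) and the function name `purchase_frequency_buckets` are illustrative assumptions, not values taken from the data.

```python
import pandas as pd


def purchase_frequency_buckets(purchases: pd.Series) -> pd.Series:
    """Return the share of customers (in %) in each purchase-count bucket."""
    # Assumed, illustrative boundaries: (0, 1], (1, 3], (3, inf)
    bins = [0, 1, 3, float("inf")]
    labels = ["1 purchase", "2-3 purchases", "4+ purchases"]
    buckets = pd.cut(purchases, bins=bins, labels=labels)
    shares = buckets.value_counts(normalize=True).reindex(labels) * 100
    return shares.round(1)
```

`purchase_frequency_buckets(purchases_per_customer)` returns one percentage per bucket. Customers with no transactions do not appear in `purchases_per_customer`, so they are not counted here.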

## 🎯 Key Insights

**Write your observations here after running the analysis:**

1. Customer demographics: 
   - Most customers from _______
   - Age range: _______
   
2. Revenue patterns:
   - Total revenue: €_______
   - Average order value: €_______
   
3. Customer behavior:
   - Average purchases per customer: _______
   - Different customer types: _______

---

## 📝 Next Steps (Week 2):
- Segment customers by behavior (RFM analysis)
- Identify churned customers
- Calculate customer lifetime value
- Statistical testing