# Data Exploration & Visualization (Customer-Orders Model)

This Jupyter Notebook connects to the MySQL `retail_demo` database, retrieves data from the **customers**, **products**, and **orders** tables, and performs exploratory data analysis with **9 visualizations**.

**Requirements:**  
- mysql-connector-python  
- pandas  
- matplotlib  

Install dependencies (if needed):  
```bash
pip install mysql-connector-python pandas matplotlib
```


In [None]:
import mysql.connector as mc
import pandas as pd
import matplotlib.pyplot as plt

# Connect to MySQL (replace YOUR_PASSWORD with your own; avoid committing real credentials)
conn = mc.connect(
    host="localhost",
    user="root",
    password="YOUR_PASSWORD",
    database="retail_demo"
)

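# Load tables (recent pandas versions may warn that only SQLAlchemy connectables
# are officially supported; the queries generally still run on this DBAPI connection)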
customers = pd.read_sql("SELECT * FROM customers", conn)
products  = pd.read_sql("SELECT * FROM products", conn)
orders    = pd.read_sql("SELECT * FROM orders", conn)

orders_full = pd.read_sql(
    """
    SELECT o.order_id, o.paid_price,
           c.customer_id, c.name, c.country,
           p.product_id, p.product_name, p.category, p.color
    FROM orders o
    JOIN customers c ON o.customer_id=c.customer_id
    JOIN products  p ON o.product_id=p.product_id
    """, conn
)

# Preview the first rows of each table (display() is built into Jupyter)
for df in (customers, products, orders):
    display(df.head())
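If the three tables are already loaded, the same joined frame can in principle be built client-side with `DataFrame.merge` instead of a second SQL round trip. The sketch below uses a few made-up rows in place of the real tables; only the column names are taken from the query above, and the `validate="many_to_one"` checks are an added safeguard, not something the original query performs.

```python
import pandas as pd

# Tiny made-up rows standing in for the real tables (illustrative only).
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "name": ["Ana", "Ben"],
    "country": ["PT", "US"],
})
products = pd.DataFrame({
    "product_id": [10, 11],
    "product_name": ["Mug", "Cap"],
    "category": ["Kitchen", "Apparel"],
    "color": ["Red", "Blue"],
})
orders = pd.DataFrame({
    "order_id": [100, 101, 102],
    "customer_id": [1, 1, 2],
    "product_id": [10, 11, 10],
    "paid_price": [9.5, 15.0, 8.0],
})

# Two inner merges mirror the two SQL JOINs.
# validate= raises MergeError if a key is duplicated on the lookup side.
orders_full = (
    orders
    .merge(customers, on="customer_id", how="inner", validate="many_to_one")
    .merge(products, on="product_id", how="inner", validate="many_to_one")
)
print(orders_full.shape)
```

Because the real tables are loaded with `SELECT *`, a merged frame built this way would carry every column from all three tables, not just the nine selected in the SQL version.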

## Visualizations
We generate **9 visuals** to explore orders, customers, and products.
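Most of the charts in this section follow the same pattern: aggregate, sort descending, optionally keep the top rows, then draw a bar chart with a title. As a minimal sketch, that pattern could be wrapped in a small helper. `plot_ranked_bar` and its parameters are hypothetical names introduced here, and the demo Series is made-up data.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_ranked_bar(series, title, top_n=None, figsize=(10, 4)):
    """Hypothetical helper: sort descending, optionally keep top_n, draw bars."""
    ranked = series.sort_values(ascending=False)
    if top_n is not None:
        ranked = ranked.head(top_n)
    fig, ax = plt.subplots(figsize=figsize)
    ranked.plot(kind="bar", ax=ax)
    ax.set_title(title)
    fig.tight_layout()
    return ax  # returned so the caller can adjust labels before showing

# Made-up totals standing in for a groupby result.
demo = pd.Series({"Kitchen": 17.5, "Apparel": 15.0, "Toys": 30.0}, name="paid_price")
ax = plot_ranked_bar(demo, "Revenue by Category", top_n=2)
```

In the notebook, a call such as `plot_ranked_bar(orders_full.groupby('category')['paid_price'].sum(), 'Revenue by Category')` followed by `plt.show()` would stand in for the corresponding one-line cell.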

### Orders per Customer

In [None]:
counts = orders.groupby('customer_id')['order_id'].count().sort_values(ascending=False)
counts.plot(kind='bar', figsize=(10, 4))
plt.title('Orders per Customer')
plt.show()

### Orders by Country

In [None]:
counts = orders_full.groupby('country')['order_id'].count().sort_values(ascending=False)
counts.plot(kind='bar', figsize=(8, 4))
plt.title('Orders by Country')
plt.show()

### Revenue by Category

In [None]:
revenue = orders_full.groupby('category')['paid_price'].sum().sort_values(ascending=False)
revenue.plot(kind='bar', figsize=(8, 4))
plt.title('Revenue by Category')
plt.show()

### Top Colors by Orders

In [None]:
counts = orders_full.groupby('color')['order_id'].count().sort_values(ascending=False)
counts.head(15).plot(kind='bar', figsize=(10, 4))
plt.title('Top Colors by Orders')
plt.show()

### Top 15 Products by Orders

In [None]:
counts = orders_full.groupby('product_name')['order_id'].count().sort_values(ascending=False)
counts.head(15).plot(kind='bar', figsize=(12, 4))
plt.title('Top 15 Products by Orders')
plt.show()

### Total Spend per Customer

In [None]:
spend = orders_full.groupby('name')['paid_price'].sum().sort_values(ascending=False)
spend.plot(kind='bar', figsize=(12, 4))
plt.title('Total Spend per Customer')
plt.show()
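One caveat with grouping on `name` alone: if two different customers happen to share a name, their spend is summed into a single bar. Whether that occurs in `retail_demo` is not known here, so the sketch below uses made-up rows to show the effect, and how grouping on `customer_id` together with `name` keeps the customers separate.

```python
import pandas as pd

# Made-up rows: two distinct customers share the name "Ana Silva".
orders_full = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ana Silva", "Ana Silva", "Ben Ortiz"],
    "paid_price": [10.0, 20.0, 5.0],
})

# Grouping by name alone merges both Ana Silvas into one total.
by_name = orders_full.groupby("name")["paid_price"].sum()

# Grouping by id and name keeps one row per actual customer.
by_customer = (
    orders_full.groupby(["customer_id", "name"])["paid_price"]
    .sum()
    .sort_values(ascending=False)
)
print(len(by_name), len(by_customer))
```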

### Average Order Value by Category

In [None]:
avg_value = orders_full.groupby('category')['paid_price'].mean().sort_values(ascending=False)
avg_value.plot(kind='bar', figsize=(8, 4))
plt.title('Average Order Value by Category')
plt.show()

### Revenue by Category-Color (Top 20)

In [None]:
revenue = orders_full.groupby(['category', 'color'])['paid_price'].sum().sort_values(ascending=False)
revenue.head(20).plot(kind='bar', figsize=(12, 4))
plt.title('Revenue by Category-Color (Top 20)')
plt.show()

### Paid Price Distribution

In [None]:
orders['paid_price'].plot(kind='hist', bins=30, figsize=(8, 4))
plt.title('Paid Price Distribution')
plt.xlabel('Paid Price')
plt.show()

## Close Connection
Always close your database connection when finished.

In [None]:
conn.close()
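A manual `conn.close()` is skipped if an earlier cell raises. One way to make the cleanup automatic is `contextlib.closing`, which calls `.close()` on exit even after an error. The sketch below is an assumption-laden illustration: `load_table` and `connect_demo` are hypothetical helpers, and an in-memory `sqlite3` database stands in for MySQL so the example runs without a server.

```python
import sqlite3
from contextlib import closing

import pandas as pd

def load_table(connect, query):
    """Hypothetical helper: open a connection, run one query, always close."""
    with closing(connect()) as conn:
        cur = conn.cursor()
        try:
            cur.execute(query)
            # DB-API cursors expose column names via .description
            columns = [d[0] for d in cur.description]
            return pd.DataFrame(cur.fetchall(), columns=columns)
        finally:
            cur.close()

def connect_demo():
    # Stand-in for the MySQL connection: an in-memory SQLite database
    # with two made-up order rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, paid_price REAL)")
    conn.execute("INSERT INTO orders VALUES (1, 9.5), (2, 15.0)")
    return conn

orders_demo = load_table(connect_demo, "SELECT * FROM orders")
print(orders_demo)
```

With the notebook's MySQL setup, the `connect` argument could be a small function or lambda that wraps the `mc.connect(...)` call from the first cell. The helper relies only on the standard DB-API cursor methods, which `mysql-connector-python` also provides.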