Project Title: Retail Data Analysis SQL
Level: Beginner
Database: retail_data_db
This project demonstrates SQL skills used by data analysts to explore and analyze a retail dataset. It covers data setup, cleaning, and analysis to answer practical business questions. The goal is to learn how to organize, query, and interpret data to uncover useful patterns and insights.
- Build and set up a database for retail transactions.
- Clean incomplete or incorrect data.
- Explore the dataset and extract useful metrics.
- Answer key business questions using SQL queries.
Database Creation
CREATE DATABASE retail_data_db;Table Creation
CREATE TABLE retail_records (
    transactions_id INT,
    record_date DATE,
    record_time TIME,
    customer_id INT,
    gender VARCHAR(15),
    age INT,
    category VARCHAR(15),
    quantity INT,
    price_per_unit FLOAT,
    cogs FLOAT,
    total_amount FLOAT
);Check Total Records
SELECT COUNT(*) FROM retail_records;Unique Customers
SELECT COUNT(DISTINCT customer_id) FROM retail_records;Unique Categories
SELECT DISTINCT category FROM retail_records;Check for Missing Data
SELECT *
FROM retail_records
WHERE
    transactions_id IS NULL
    OR record_date IS NULL
    OR record_time IS NULL
    OR gender IS NULL
    OR category IS NULL
    OR quantity IS NULL
    OR cogs IS NULL
    OR total_amount IS NULL;Delete Missing Records
DELETE FROM retail_records
WHERE
    transactions_id IS NULL
    OR record_date IS NULL
    OR record_time IS NULL
    OR gender IS NULL
    OR category IS NULL
    OR quantity IS NULL
    OR cogs IS NULL
    OR total_amount IS NULL;SELECT *
FROM retail_records
WHERE record_date = '2022-11-05';SELECT *
FROM retail_records
WHERE 
    category = 'Clothing'
    AND TO_CHAR(record_date, 'YYYY-MM') = '2022-11'
    AND quantity >= 4;SELECT 
    category,
    SUM(total_amount) AS net_amount,
    COUNT(*) AS total_orders
FROM retail_records
GROUP BY 1;SELECT
    ROUND(AVG(age), 2) AS avg_age
FROM retail_records
WHERE category = 'Beauty';SELECT *
FROM retail_records
WHERE total_amount > 1000;SELECT 
    category,
    gender,
    COUNT(*) AS total_count
FROM retail_records
GROUP BY category, gender
ORDER BY 1;SELECT 
       year,
       month,
       avg_amount
FROM 
(    
SELECT 
    EXTRACT(YEAR FROM record_date) AS year,
    EXTRACT(MONTH FROM record_date) AS month,
    AVG(total_amount) AS avg_amount,
    RANK() OVER(PARTITION BY EXTRACT(YEAR FROM record_date) ORDER BY AVG(total_amount) DESC) AS rank
FROM retail_records
GROUP BY 1, 2
) AS t1
WHERE rank = 1;SELECT 
    customer_id,
    SUM(total_amount) AS total_value
FROM retail_records
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;SELECT 
    category,    
    COUNT(DISTINCT customer_id) AS unique_customers
FROM retail_records
GROUP BY category;WITH hourly_records AS (
    SELECT *,
        CASE
            WHEN EXTRACT(HOUR FROM record_time) < 12 THEN 'Morning'
            WHEN EXTRACT(HOUR FROM record_time) BETWEEN 12 AND 17 THEN 'Afternoon'
            ELSE 'Evening'
        END AS shift
    FROM retail_records
)
SELECT 
    shift,
    COUNT(*) AS total_orders
FROM hourly_records
GROUP BY shift;- Customer Demographics: Includes different age groups and categories like Clothing, Beauty, etc.
- High-Value Transactions: Many customers made premium purchases (above 1000).
- Monthly Trends: Shows clear month-to-month variations in transaction amounts.
- Top Customers: Easy to identify top spenders and loyal buyers.
- Summary Report: Overall transactions, demographics, and category-wise performance.
- Trend Report: Monthly and shift-based analysis.
- Customer Insights: Identifies unique and high-value customers.
This project is a simple but powerful way to practice SQL for real-world analytics. It walks through the whole process — from setting up a database to cleaning and analyzing data — using practical queries that mimic real business use cases.