This project is part of a Data Analyst Internship Task (Task 4 – SQL for Data Analysis).
The objective is to use SQL queries to clean, explore, and analyze an E-commerce retail sales dataset.
Repository: harshharsha17/SQL-for-Data-Analysis
- `Retail_sales_analysis.csv` → Dataset used for analysis.
- `task_4_sql_data_analysis_retail_sales.sql` → SQL script containing all queries.
- `README.md` → Documentation explaining objectives, steps, and outcomes.
- Query screenshots (`Query1.png` … `Query10.png`) showing the output of the SQL queries.
- Database: PostgreSQL / MySQL / SQLite
- Queries: SQL (DML + DDL)
- Created database: `task_4`
- Created table: `retail_sales` with columns: `transactions_id`, `sale_date`, `sale_time`, `customer_id`, `gender`, `age`, `category`, `quantity`, `price_per_unit`, `cogs`, `total_sale`.
- Checked for NULL values in critical fields.
- Deleted rows with missing values to ensure accuracy in analysis.
- Counted total number of sales.
- Found unique customers.
- Extracted distinct product categories (Electronics, Clothing, Beauty).
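
The setup, cleaning, and exploration steps above can be sketched roughly as follows. The database, table, and column names come from the list above. The column types, the primary key, and the set of columns treated as "critical" are assumptions, so the actual script may differ.

```sql
-- PostgreSQL / MySQL only; SQLite creates the database file on connect instead.
-- Connect to the new database (\c task_4 in psql, USE task_4 in MySQL) before continuing.
CREATE DATABASE task_4;

-- Column types and the primary key are assumptions, not taken from the script.
CREATE TABLE retail_sales (
    transactions_id INT PRIMARY KEY,
    sale_date       DATE,
    sale_time       TIME,
    customer_id     INT,
    gender          VARCHAR(15),
    age             INT,
    category        VARCHAR(25),
    quantity        INT,
    price_per_unit  DECIMAL(10, 2),
    cogs            DECIMAL(10, 2),
    total_sale      DECIMAL(10, 2)
);

-- Cleaning: list rows with a missing value in any column assumed to be critical.
SELECT *
FROM retail_sales
WHERE sale_date IS NULL
   OR customer_id IS NULL
   OR category IS NULL
   OR quantity IS NULL
   OR price_per_unit IS NULL
   OR total_sale IS NULL;

-- Cleaning: delete those rows so incomplete records do not skew later aggregates.
DELETE FROM retail_sales
WHERE sale_date IS NULL
   OR customer_id IS NULL
   OR category IS NULL
   OR quantity IS NULL
   OR price_per_unit IS NULL
   OR total_sale IS NULL;

-- Exploration: total sales rows, unique customers, and distinct categories.
SELECT COUNT(*) AS total_sales FROM retail_sales;
SELECT COUNT(DISTINCT customer_id) AS unique_customers FROM retail_sales;
SELECT DISTINCT category FROM retail_sales;
```

Running the `SELECT` before the `DELETE` makes it possible to review what will be removed before the data is changed.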
✅ Key SQL Queries answered:
- Sales made on a specific date (`2022-11-05`).
- Clothing sales with quantity > 4 in November 2022.
- Total sales & order count per category.
- Average age of Beauty product customers.
- Transactions with sales > 1000.
- Transactions by gender per category.
- Best-selling month per year (using the `RANK` window function).
- Top 5 customers based on sales.
- Unique customers per category.
- Orders distribution by shift (Morning, Afternoon, Evening).
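
As an illustration of the window-function query in the list above, here is a minimal sketch of finding the best-selling month per year with `RANK`. It assumes "best selling" means the highest total of `total_sale`, and it uses PostgreSQL-style `EXTRACT` (MySQL 8+ should accept the same syntax; SQLite would typically use `strftime` instead). The aliases `sale_year`, `sale_month`, `monthly_sales`, and `sales_rank` are illustrative, not taken from the script.

```sql
-- Rank months within each year by total sales, then keep the top-ranked month.
SELECT sale_year, sale_month, monthly_sales
FROM (
    SELECT
        EXTRACT(YEAR FROM sale_date)  AS sale_year,
        EXTRACT(MONTH FROM sale_date) AS sale_month,
        SUM(total_sale)               AS monthly_sales,
        RANK() OVER (
            PARTITION BY EXTRACT(YEAR FROM sale_date)
            ORDER BY SUM(total_sale) DESC
        ) AS sales_rank
    FROM retail_sales
    GROUP BY EXTRACT(YEAR FROM sale_date), EXTRACT(MONTH FROM sale_date)
) AS monthly
WHERE sales_rank = 1;
```

Because `RANK` assigns the same rank to ties, a year with two equally strong months returns both rows.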
- SQL Basics: `SELECT`, `WHERE`, `ORDER BY`, `GROUP BY`
- Aggregate Functions: `SUM`, `AVG`, `COUNT` (with `ROUND` to round the results)
- `DISTINCT` keyword
- `CASE` statements for time-based analysis
- Window Functions (`RANK`)
- Subqueries
- Query Optimization (using indexes)
- Views for reusable queries
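
The last two items can be sketched as below. The index name `idx_retail_sales_sale_date`, the view name `category_sales_summary`, and the choice of `sale_date` as the indexed column are hypothetical and only illustrate the idea; the script may index or wrap different queries.

```sql
-- Index on the date column to speed up date-range filters (name is hypothetical).
CREATE INDEX idx_retail_sales_sale_date ON retail_sales (sale_date);

-- View that wraps the per-category totals so they can be reused (name is hypothetical).
CREATE VIEW category_sales_summary AS
SELECT
    category,
    SUM(total_sale) AS net_sales,
    COUNT(*)        AS total_orders
FROM retail_sales
GROUP BY category;

-- The view can then be queried like a table.
SELECT *
FROM category_sales_summary
ORDER BY net_sales DESC;
```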
Q1: Sales on 2022-11-05
Q2: Clothing category with quantity > 4 in Nov-2022
Q3: Total sales per category
Q4: Average age of Beauty category customers
Q5: Transactions with total_sale > 1000
Q6: Transactions by gender per category
Q7: Best selling month per year
Q8: Top 5 customers based on total sales
Q9: Unique customers per category
Q10: Orders distribution by shift (Morning, Afternoon, Evening)
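
For reference, a query like Q10 could look roughly like the sketch below. The shift boundaries (before 12:00 is Morning, 12:00 through 17:59 is Afternoon, later is Evening) are assumptions, as are the aliases `shift` and `total_orders`. The `EXTRACT(HOUR FROM ...)` form is PostgreSQL/MySQL syntax.

```sql
-- Label each transaction with a shift, then count orders per shift.
SELECT shift, COUNT(*) AS total_orders
FROM (
    SELECT
        CASE
            WHEN EXTRACT(HOUR FROM sale_time) < 12  THEN 'Morning'
            WHEN EXTRACT(HOUR FROM sale_time) <= 17 THEN 'Afternoon'
            ELSE 'Evening'
        END AS shift
    FROM retail_sales
) AS shifts
GROUP BY shift;
```

Computing the label in a subquery keeps the `CASE` expression in one place instead of repeating it in `GROUP BY`.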
✔ Learned to clean and manipulate datasets with SQL.
✔ Performed exploratory and analytical queries.
✔ Gained insights into customer behavior, sales trends, and category performance.
- Clone this repository:
git clone https://github.com/harshharsha17/SQL-for-Data-Analysis.git