[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/joycemsm/fashion-store-analysis/blob/main/sales_analysis_queries_SQL.ipynb)


```python
import pandas as pd
from pandasql import sqldf

# Run a SQL string against the DataFrames defined in the notebook's global namespace.
pysqldf = lambda q: sqldf(q, globals())
```
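By default, pandasql copies the DataFrames a query references (found by name in `globals()`) into an in-memory SQLite database and runs the query there. That is why the queries in this notebook use SQLite functions such as `substr()`, `date()` and `LAG()`. The sketch below mimics that flow with the standard-library `sqlite3` module; `run_sql` and the sample rows are hypothetical and this is not pandasql's actual implementation.

```python
import sqlite3


def run_sql(query, tables):
    """Hypothetical helper: load each table (a non-empty list of dicts) into an
    in-memory SQLite database, run `query`, and return the result rows."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, rows in tables.items():
            columns = list(rows[0].keys())
            # Table and column names come from trusted code, not user input.
            conn.execute(f"CREATE TABLE {name} ({', '.join(columns)})")
            placeholders = ", ".join("?" for _ in columns)
            conn.executemany(
                f"INSERT INTO {name} VALUES ({placeholders})",
                [tuple(row[c] for c in columns) for row in rows],
            )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Hypothetical rows, not taken from the fashion store CSV files.
sales_items = [
    {"sale_id": 1, "product_id": 10, "quantity": 2},
    {"sale_id": 2, "product_id": 10, "quantity": 1},
    {"sale_id": 2, "product_id": 11, "quantity": 5},
]

rows = run_sql(
    "SELECT product_id, SUM(quantity) AS total_sold "
    "FROM sales_items GROUP BY product_id ORDER BY product_id",
    {"sales_items": sales_items},
)
print(rows)  # [(10, 3), (11, 5)]
```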

```python
# Load the seven Fashion Store tables as DataFrames.
# The paths assume the CSV files were uploaded to Colab's /content folder.
products = pd.read_csv('/content/dataset_fashion_store_products.csv')
sales = pd.read_csv('/content/dataset_fashion_store_sales.csv')
sales_items = pd.read_csv('/content/dataset_fashion_store_salesitems.csv')
customers = pd.read_csv('/content/dataset_fashion_store_customers.csv')
stock = pd.read_csv('/content/dataset_fashion_store_stock.csv')
campaigns = pd.read_csv('/content/dataset_fashion_store_campaigns.csv')
channels = pd.read_csv('/content/dataset_fashion_store_channels.csv')
```

# **Fashion Store - Sales Analysis (SQL)**

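This notebook presents business-driven SQL queries for the sales analysis of the Fashion Store dataset. The CSV tables are loaded as pandas DataFrames and queried with pandasql, which executes the SQL in SQLite.
Each query answers a key performance question and comes with a plain-language explanation for both technical and business readers.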

### **Summary**

1. **Which products have low sales volume?**
2. **Which products are below average in sales?**
3. **Are there underperforming product categories?**
4. **Did any product show a sales drop recently?**
5. **Which products are in stock but haven’t sold recently?**


### ✅ **Which products have low sales volume?**

Goal:
Identify products that have sold the least in total units.  

Explanation:

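*   Uses SUM(quantity) to calculate total units sold per product.

*   Sorts by total units in ascending order and keeps the 10 lowest (LIMIT 10); products tied at the cutoff may be dropped arbitrarily.

*   Joins the products table to retrieve product names.

*   Because the join starts from sales_items, products that never sold do not appear in the list.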

```python
query = """
SELECT
    products.product_id,
    products.product_name,
    SUM(sales_items.quantity) AS total_sold
FROM
    sales_items
JOIN
    products ON sales_items.product_id = products.product_id
GROUP BY
    products.product_id, products.product_name
ORDER BY
    total_sold ASC
LIMIT 10;
"""
result = pysqldf(query)
result
```

|   | product_id | product_name | total_sold |
|---|---|---|---|
| 0 | 26 | Soft Crew Shoes | 1 |
| 1 | 64 | Soft Sleeveless Set | 1 |
| 2 | 67 | Modern Crew Tee | 1 |
| 3 | 87 | Classic Silk Dress | 1 |
| 4 | 377 | Tailored High-Waist Shoes | 1 |
| 5 | 379 | Essential Cotton Trousers | 1 |
| 6 | 424 | Elegant Sleeveless Dress | 1 |
| 7 | 58 | Modern Sleeveless Dress | 2 |
| 8 | 130 | Classic Ribbed Set | 2 |
| 9 | 218 | Polished Satin Tee | 2 |
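Because the query above starts from sales_items, a product that never sold has no rows to aggregate and cannot appear among the lowest sellers. A minimal sketch of a variant, assuming the same products and sales_items columns, starts from products with a LEFT JOIN and turns the missing sum into 0 with COALESCE. It runs on hypothetical rows with `sqlite3`; in the notebook, the same query string could be passed to `pysqldf`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows: product 3 is in the catalogue but never sold.
conn.executescript("""
CREATE TABLE products (product_id INTEGER, product_name TEXT);
CREATE TABLE sales_items (sale_id INTEGER, product_id INTEGER, quantity INTEGER);
INSERT INTO products VALUES (1, 'Tee A'), (2, 'Dress B'), (3, 'Shoes C');
INSERT INTO sales_items VALUES (100, 1, 4), (101, 1, 2), (101, 2, 1);
""")

query = """
SELECT
    p.product_id,
    p.product_name,
    COALESCE(SUM(si.quantity), 0) AS total_sold  -- 0 for products with no sales rows
FROM products AS p
LEFT JOIN sales_items AS si ON si.product_id = p.product_id
GROUP BY p.product_id, p.product_name
ORDER BY total_sold ASC, p.product_id  -- product_id makes tie order deterministic
LIMIT 10;
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)  # [(3, 'Shoes C', 0), (2, 'Dress B', 1), (1, 'Tee A', 6)]
```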


### ✅ **Which products are below average in sales?**

Goal:
Identify products whose total sales volume is lower than the global product average.

Explanation:

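*   Uses a CTE to compute total units sold per product.

*   Keeps products whose total is below the average of those per-product totals, computed with a scalar subquery over the same CTE.

*   Because the CTE is built from sales_items, the average covers only products with at least one sale.

*   Useful for spotting broadly underperforming products.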

```python
query = """
WITH sales_product AS (
    SELECT
        products.product_id,
        products.product_name,
        SUM(sales_items.quantity) AS total_quantity_sold
    FROM sales_items
    JOIN products ON sales_items.product_id = products.product_id
    GROUP BY products.product_id, products.product_name
)
SELECT *
FROM sales_product
WHERE total_quantity_sold < (
    SELECT AVG(total_quantity_sold)
    FROM sales_product
)
ORDER BY total_quantity_sold ASC;
"""
result = pysqldf(query)
result
```

|   | product_id | product_name | total_quantity_sold |
|---|---|---|---|
| 0 | 26 | Soft Crew Shoes | 1 |
| 1 | 64 | Soft Sleeveless Set | 1 |
| 2 | 67 | Modern Crew Tee | 1 |
| 3 | 87 | Classic Silk Dress | 1 |
| 4 | 377 | Tailored High-Waist Shoes | 1 |
| ... | ... | ... | ... |
| 263 | 430 | Bold Linen Set | 13 |
| 264 | 442 | Bold Ribbed Dress | 13 |
| 265 | 463 | Bold Boxy Dress | 13 |
| 266 | 472 | Tailored Satin Trousers | 13 |
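The query above filters against the average but never displays it, so the cutoff has to be inferred from the last rows. A sketch on hypothetical rows that moves the average into a one-row CTE and CROSS JOINs it, so each returned product carries the cutoff value; the `threshold` CTE and `avg_sold` column are illustrative names, not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: per-product totals come out as 2, 4 and 9 units.
conn.executescript("""
CREATE TABLE sales_items (sale_id INTEGER, product_id INTEGER, quantity INTEGER);
INSERT INTO sales_items VALUES (1, 1, 2), (2, 2, 3), (3, 2, 1), (4, 3, 9);
""")

query = """
WITH sales_product AS (
    SELECT product_id, SUM(quantity) AS total_quantity_sold
    FROM sales_items
    GROUP BY product_id
),
threshold AS (
    -- one row holding the average per-product total
    SELECT AVG(total_quantity_sold) AS avg_sold FROM sales_product
)
SELECT sp.product_id, sp.total_quantity_sold, ROUND(t.avg_sold, 2) AS avg_sold
FROM sales_product AS sp
CROSS JOIN threshold AS t  -- attaches the same average to every product row
WHERE sp.total_quantity_sold < t.avg_sold
ORDER BY sp.total_quantity_sold, sp.product_id;
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)  # [(1, 2, 5.0), (2, 4, 5.0)]
```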


### ✅ **Are there underperforming product categories?**

Goal:
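Highlight categories whose total units sold fall below the average across categories.

Explanation:

*   Aggregates units sold by category (total_sales is SUM(quantity), i.e. units, not revenue).

*   Keeps categories whose total is below the average of all category totals.

*   Helps marketing decide which categories need attention first.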

```python
query = """
WITH sales_category AS (
    SELECT
        products.category,
        SUM(sales_items.quantity) AS total_sales
    FROM
        sales_items
    JOIN
        products ON sales_items.product_id = products.product_id
    GROUP BY
        products.category
)
SELECT *
FROM sales_category
WHERE total_sales < (
    SELECT AVG(total_sales) FROM sales_category
)
ORDER BY total_sales ASC;
"""
result = pysqldf(query)
result
```

|   | category | total_sales |
|---|---|---|
| 0 | Pants | 1063 |
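The query above reports which categories fall below the mean but not by how much. One way to add that context, sketched here on hypothetical totals, uses window aggregates over the whole result (`SUM(...) OVER ()` and `AVG(...) OVER ()`) to show each category's share of units and its distance from the category average. The table stands in for the sales_category CTE, and `pct_of_units` and `diff_from_avg` are illustrative column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical category totals standing in for the sales_category CTE.
conn.executescript("""
CREATE TABLE sales_category (category TEXT, total_sales INTEGER);
INSERT INTO sales_category VALUES ('Dresses', 500), ('Shoes', 300), ('Pants', 200);
""")

query = """
SELECT
    category,
    total_sales,
    -- share of all units sold, in percent
    ROUND(100.0 * total_sales / SUM(total_sales) OVER (), 1) AS pct_of_units,
    -- distance from the mean category total (negative = below average)
    ROUND(total_sales - AVG(total_sales) OVER (), 1) AS diff_from_avg
FROM sales_category
ORDER BY total_sales ASC;
"""
rows = conn.execute(query).fetchall()
conn.close()
for row in rows:
    print(row)
```

Window aggregates require SQLite 3.25 or later, the same requirement as the `LAG()` query in the next section.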


### ✅ **Did any product show a sales drop recently?**

Goal:
Compare month-over-month sales to detect declines.

Explanation:

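*   Uses a CTE to calculate monthly units sold per product; substr(sale_date, 1, 7) extracts the YYYY-MM month, assuming ISO-formatted dates.

*   LAG() fetches the product's total from its previous month with sales, so a month with no sales is skipped rather than treated as zero.

*   Keeps rows where units sold fell versus that previous month. The filter covers every month in the data, not only the latest one.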

```python
query = """
WITH monthly_sales AS (
    SELECT
        p.product_id,
        p.product_name,
        substr(s.sale_date, 1, 7) AS month,
        SUM(si.quantity) AS total_sold
    FROM sales_items si
    JOIN sales s ON si.sale_id = s.sale_id
    JOIN products p ON si.product_id = p.product_id
    GROUP BY p.product_id, p.product_name, month
),
sales_with_lag AS (
    SELECT
        product_id,
        product_name,
        month,
        total_sold,
        LAG(total_sold) OVER (PARTITION BY product_id ORDER BY month) AS previous_month_sold
    FROM monthly_sales
)
SELECT *
FROM sales_with_lag
WHERE previous_month_sold IS NOT NULL
  AND total_sold < previous_month_sold
ORDER BY product_name, month;
"""
result = pysqldf(query)
result
```

|   | product_id | product_name | month | total_sold | previous_month_sold |
|---|---|---|---|---|---|
| 0 | 490 | Bold Boxy Shoes | 2025-05 | 12 | 14 |
| 1 | 445 | Bold Cotton Set | 2025-05 | 10 | 20 |
| 2 | 474 | Bold Cotton Shoes | 2025-05 | 7 | 9 |
| 3 | 452 | Bold Crew Dress | 2025-05 | 8 | 9 |
| 4 | 454 | Bold Crew Set | 2025-05 | 4 | 12 |
| ... | ... | ... | ... | ... | ... |
| 328 | 158 | Vintage Sleeveless Dress | 2025-05 | 2 | 3 |
| 329 | 201 | Vintage Sleeveless Tee | 2025-05 | 7 | 12 |
| 330 | 201 | Vintage Sleeveless Tee | 2025-06 | 3 | 7 |
| 331 | 215 | Vintage Sleeveless Trousers | 2025-06 | 1 | 7 |
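To answer the "recently" part of the question literally, the filter can be limited to the latest month in the data. A sketch on hypothetical rows that does this and also exposes which month LAG() actually compared against, so gaps in a product's sales history become visible. The table stands in for the monthly_sales CTE, and `previous_month` is an illustrative column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical monthly totals standing in for the monthly_sales CTE.
# Product 1 has no row for 2025-02, so LAG pairs its 2025-03 total with 2025-01.
conn.executescript("""
CREATE TABLE monthly_sales (product_id INTEGER, month TEXT, total_sold INTEGER);
INSERT INTO monthly_sales VALUES
    (1, '2025-01', 8), (1, '2025-03', 5),
    (2, '2025-02', 6), (2, '2025-03', 9);
""")

query = """
WITH sales_with_lag AS (
    SELECT
        product_id,
        month,
        total_sold,
        LAG(total_sold) OVER (PARTITION BY product_id ORDER BY month) AS previous_month_sold,
        LAG(month) OVER (PARTITION BY product_id ORDER BY month) AS previous_month
    FROM monthly_sales
)
SELECT product_id, month, previous_month, total_sold, previous_month_sold
FROM sales_with_lag
WHERE previous_month_sold IS NOT NULL
  AND total_sold < previous_month_sold
  AND month = (SELECT MAX(month) FROM monthly_sales)  -- latest month in the data only
ORDER BY product_id;
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)  # [(1, '2025-03', '2025-01', 5, 8)]
```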


### ✅ **Which products are in stock but haven’t sold recently?**

Goal:
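Identify products with stock available but no sales in the 90 days before the query is run.

Explanation:

*   Filters stock for rows with positive quantity; stock is recorded per country, so a product can appear once per country.

*   Excludes products sold since date('now', '-90 day') using a NOT IN subquery. Because the window is relative to the current date, rerunning the notebook later can change the result.

*   Useful for detecting potentially dead inventory.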

```python
query = """
SELECT
    p.product_id,
    p.product_name,
    s.stock_quantity,
    s.country
FROM
    stock AS s
JOIN
    products AS p ON s.product_id = p.product_id
WHERE
    s.stock_quantity > 0
    AND p.product_id NOT IN (
        SELECT DISTINCT si.product_id
        FROM sales_items AS si
        JOIN sales AS sa ON si.sale_id = sa.sale_id
        WHERE sa.sale_date >= date('now', '-90 day')
    );
"""
result = pysqldf(query)
result
```

|   | product_id | product_name | stock_quantity | country |
|---|---|---|---|---|
| 0 | 465 | Bold Boxy Set | 42 | France |
| 1 | 465 | Bold Boxy Set | 1 | Germany |
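Since the CSV files are static, anchoring the 90-day window to today's date means the answer drifts as time passes. A sketch on hypothetical rows that instead anchors the window to the latest sale_date in sales, and uses NOT EXISTS, which, unlike NOT IN, still returns rows if the subquery ever yields a NULL product_id. Whether "recent" should be measured from today or from the end of the data is an analysis choice; this shows one option.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; dates are ISO 'YYYY-MM-DD' strings, as SQLite's date() expects.
conn.executescript("""
CREATE TABLE products (product_id INTEGER, product_name TEXT);
CREATE TABLE stock (product_id INTEGER, country TEXT, stock_quantity INTEGER);
CREATE TABLE sales (sale_id INTEGER, sale_date TEXT);
CREATE TABLE sales_items (sale_id INTEGER, product_id INTEGER, quantity INTEGER);
INSERT INTO products VALUES (1, 'Tee A'), (2, 'Dress B');
INSERT INTO stock VALUES (1, 'France', 5), (2, 'France', 3), (2, 'Germany', 0);
INSERT INTO sales VALUES (10, '2025-01-15'), (11, '2025-06-20');
INSERT INTO sales_items VALUES (10, 2, 1), (11, 1, 2);
""")

query = """
SELECT p.product_id, p.product_name, s.stock_quantity, s.country
FROM stock AS s
JOIN products AS p ON s.product_id = p.product_id
WHERE s.stock_quantity > 0
  AND NOT EXISTS (
      SELECT 1
      FROM sales_items AS si
      JOIN sales AS sa ON si.sale_id = sa.sale_id
      WHERE si.product_id = p.product_id
        -- 90 days before the latest sale in the data, not before today's date
        AND sa.sale_date >= (SELECT date(MAX(sale_date), '-90 day') FROM sales)
  )
ORDER BY p.product_id, s.country;
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)  # [(2, 'Dress B', 3, 'France')]
```

Here the cutoff is 2025-03-22, so product 1 (sold on 2025-06-20) is excluded, while product 2 (last sold on 2025-01-15) is kept for its positive French stock.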


### **Conclusion**

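This sales analysis answered five business questions about product performance with SQL:

*   Seven products sold just one unit in total, the lowest volume among products with sales.

*   267 products sold fewer units than the average product.

*   Pants (1,063 units) is the only category below the category average.

*   The month-over-month check flagged 332 product-month declines, including declines in 2025-05 and 2025-06.

*   Bold Boxy Set (product 465) still has stock in France and Germany but had no sales in the 90 days before the run date.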