Skip to content

This project is a comprehensive SQL-based analysis of Amazon marketplace data. Using PostgreSQL and structured SQL queries, I explored various business metrics such as sales performance, customer behavior, inventory management, shipping delays, and more.

Notifications You must be signed in to change notification settings

farah-massuh/amazon_sql

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

43 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

๐Ÿ“Š Data-Driven Decisions: Amazon Marketplace Analysis with SQL

Amazon

๐Ÿง  Overview

This project is a comprehensive SQL-based analysis of Amazon marketplace data. Using PostgreSQL and structured SQL queries, I explored various business metrics such as sales performance, customer behavior, inventory management, shipping delays, and more.

๐Ÿ‘ฉโ€๐Ÿ’ป Author

Farah Massuh

๐ŸŽฏ Project Objectives

  • Identify high-performing products, sellers, and regions
  • Analyze customer lifetime value and buying trends
  • Track shipping and delivery performance
  • Generate profit margin and return rate insights
  • Automate inventory updates using a stored procedure

๐Ÿ“‚ Datasets Used

All data is derived from simulated Amazon marketplace records:

  • orders.csv
  • order_items.csv
  • products.csv
  • inventory.csv
  • customers.csv
  • sellers.csv
  • payments.csv
  • shipping.csv
  • category.csv

EDR for amazon_sql

๐Ÿ“ฅ Data Source

The dataset used in this project was sourced from Zero Analyst on Topmate.

Note: This dataset simulates real-world Amazon marketplace activity and is intended for learning and project-building purposes.

๐Ÿ› ๏ธ Tools & Technologies

  • SQL (PostgreSQL) โ€“ Core querying and analysis
  • VS Code โ€“ Development environment
  • ChatGPT โ€“ For query optimization and explanation

๐ŸŽฏ Project Objectives

  • Identify high-performing products, sellers, and regions
  • Analyze customer lifetime value and buying trends
  • Track shipping and delivery performance
  • Generate profit margin and return rate insights
  • Automate inventory updates using a stored procedure

๐Ÿ’ป How to Use

  1. Set up PostgreSQL on your machine.
  2. Import the CSV datasets into respective tables.
  3. Run the business_insights.sql file:
\i business_insights.sql

This file runs all queries and creates the stored procedure for inventory automation.

๐Ÿ“Š Business Questions, Queries & Insights

1. Top Selling Products

SELECT
    oi.product_id,
    p.product_name,
    ROUND(SUM(total_sale)) total_sale,
    COUNT(o.order_id) total_orders
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
JOIN products p ON p.product_id = oi.product_id
GROUP BY 1, 2
ORDER BY 3 DESC
LIMIT 10;

Top Selling Products

Insight: A small number of products contribute significantly to total revenue and order volume. These are key revenue drivers.

2. Revenue by Category

SELECT
    p.category_id,
    c.category_name,
    ROUND(SUM(oi.total_sale)) total_sale,
    ROUND((SUM(oi.total_sale::NUMERIC) * 100 / (SELECT SUM(total_sale::NUMERIC) FROM order_items)), 2) percentage_contribution
FROM order_items oi
JOIN products p ON p.product_id = oi.product_id
LEFT JOIN category c ON c.category_id = p.category_id
GROUP BY 1, 2
ORDER BY 3 DESC;

Revenue by Category

Insight: Electronics and related categories account for the highest revenue contribution across the platform.

3. Average Order Value (AOV)

SELECT
    co.customer_id,
    CONCAT(co.first_name, ' ', co.last_name)  full_name,
    COUNT(o.order_id) total_orders,
    ROUND(SUM(oi.total_sale::NUMERIC) / COUNT(o.order_id),2) average_order_value
FROM customers co
JOIN orders o ON o.customer_id = co.customer_id
JOIN order_items oi ON oi.order_id= o.order_id
GROUP BY 1
HAVING COUNT(o.order_id) > 5
ORDER BY 4 DESC;

Average Order Value (AOV)

Insight: Returning customers have significantly higher AOV, suggesting a loyal and valuable customer segment.

4. Monthly Sales Trend

WITH month_total_sales_past_year AS (
    SELECT
        EXTRACT(MONTH FROM o.order_date) month,
        EXTRACT(YEAR From o.order_date) year,
        ROUND(SUM(oi.total_sale::NUMERIC), 2) total_sale
    FROM orders o
    JOIN order_items oi ON oi.order_id = o.order_id
    WHERE o.order_date >= CURRENT_DATE - INTERVAL '1 year'
    GROUP BY 1, 2
)
SELECT
    year,
    month,
    total_sale current_month_sale,
    LAG(total_sale, 1) OVER (ORDER BY year, month) last_month_sale
FROM month_total_sales_past_year;

Monthly Sales Trend

Insight: Sales trends fluctuate monthly with noticeable seasonal patterns. Comparing current and previous months helps detect growth or decline.

5. Customers with No Purchases

SELECT * 
FROM customers
WHERE customer_id NOT IN (
    SELECT DISTINCT customer_id
    FROM orders
);

Customers with No Purchases

Insight: There is a segment of registered users who have not placed any orders โ€” potential for re-engagement campaigns.

6. Best-Selling Categories by State

WITH ranked_sales AS (
    SELECT
        c.state,
        cat.category_name,
        SUM(oi.total_sale) total_sale,
        RANK() OVER(PARTITION BY c.state ORDER BY SUM(oi.total_sale) DESC) rank
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    JOIN order_items oi ON o.order_id = oi.order_id
    JOIN products p ON oi.product_id = p.product_id
    JOIN category cat ON cat.category_id = p.category_id
    GROUP BY 1, 2
)
SELECT 
    state, 
    category_name, 
    ROUND(total_sale::NUMERIC, 2) total_sale
FROM ranked_sales
WHERE rank = 1
ORDER BY 1, 3 DESC;

Best-Selling Categories by State

Insight: Different states show varied preferences, enabling region-specific marketing strategies.

7. Customer Lifetime Value (CLTV)

SELECT
    c.customer_id,
    CONCAT(c.first_name, ' ', c.last_name) full_name,
    ROUND(SUM(oi.total_sale::NUMERIC),2) CLTV,
    DENSE_RANK() OVER (ORDER BY SUM(oi.total_sale) DESC) customer_ranking
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN order_items oi ON oi.order_id = o.order_id
GROUP BY 1, 2;

Customer Lifetime Value (CLTV)

Insight: Top customers contribute significantly to revenue. Retaining them is crucial.

(Continued in next cell...)

8. Inventory Stock Alerts

SELECT
    i.inventory_id,
    p.product_name,
    i.stock current_stock_left,
    i.last_stock_date,
    i.warehouse_id
FROM inventory i
JOIN products p ON p.product_id = i.product_id
WHERE i.stock < 10;

Inventory Stock Alerts

Insight: Several high-demand products are at risk of going out of stock, requiring urgent restocking.

9. Shipping Delays

SELECT
    CONCAT(c.first_name, ' ', c.last_name) customer,
    o.*,
    s.shipping_providers,
    (s.shipping_date - o.order_date) days_to_deliver
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN shippings s ON o.order_id = s.order_id
WHERE s.shipping_date - o.order_date > 2;

Shipping Delays

Insight: Many orders experience shipping delays beyond 2 days, indicating potential issues with fulfillment partners.

10. Payment Success Rate

SELECT
    p.payment_status,
    COUNT(*) AS total_payments,
    ROUND(COUNT(*) * 100 / SUM(COUNT(*)) OVER(), 2) percentage
FROM orders o
JOIN payments p ON o.order_id = p.order_id
GROUP BY 1
ORDER BY 3 DESC;

Payment Success Rate

Insight: While most payments succeed, a notable percentage are either pending or failed, suggesting a need for payment system improvements.

11. Top Performing Sellers

-- Summary from multiple steps
WITH top_sellers AS (
    SELECT
        s.seller_id,
        s.seller_name,
        SUM(oi.total_sale) total_sale
    FROM orders o
    JOIN sellers s ON o.seller_id = s.seller_id
    JOIN order_items oi ON oi.order_id = o.order_id
    GROUP BY 1, 2
    ORDER BY 3 DESC
    LIMIT 5
),
sellers_order_status AS (
    SELECT
        seller_id,
        order_status,
        COUNT(*) total_orders
    FROM orders
    GROUP BY 1, 2
),
sellers_report AS (
    SELECT
        ts.seller_id seller_id,
        ts.seller_name seller_name,
        sos.order_status order_status,
        sos.total_orders total_orders
    FROM sellers_order_status sos
    JOIN top_sellers ts ON ts.seller_id = sos.seller_id
    WHERE sos.order_status NOT IN ('Inprogress', 'Returned')
)
SELECT
    seller_id,
    seller_name,
    SUM(CASE WHEN order_status = 'Completed' THEN total_orders ELSE 0 END) completed_orders,
    SUM(CASE WHEN order_status = 'Cancelled' THEN total_orders ELSE 0 END) failed_orders,
    SUM(total_orders) total_orders,
    ROUND(SUM(CASE WHEN order_status = 'Completed' THEN total_orders ELSE 0 END)::NUMERIC / SUM(total_orders)::NUMERIC * 100, 2) successful_orders_percentage,
    ROUND(100 - (SUM(CASE WHEN order_status = 'Completed' THEN total_orders ELSE 0 END)::NUMERIC / SUM(total_orders)::NUMERIC * 100), 2) failed_orders_percentage
FROM sellers_report
GROUP BY 1, 2;

Top Performing Sellers

Insight: Top sellers show high order completion rates, but a few face elevated cancellation ratesโ€”warranting performance reviews.

12. Product Profit Margin

SELECT
    product_id,
    product_name,
    profit_margin,
    DENSE_RANK () OVER(ORDER BY profit_margin DESC) product_ranking
FROM (
    SELECT
        p.product_id,
        p.product_name,
        SUM(total_sale - (cogs * quantity)) / SUM(total_sale) * 100 profit_margin
    FROM order_items oi
    JOIN products p ON oi.product_id = p.product_id
    GROUP BY 1, 2
);

Product Profit Margin

Insight: Certain products yield significantly higher margins, guiding profitability-focused inventory and pricing strategies.

13. Most Returned Products

SELECT
    *,
    ROUND(total_returned::NUMERIC / total_units_sold::NUMERIC * 100, 2) return_percentage
FROM (
    SELECT
        p.product_id,
        p.product_name,
        COUNT(*) total_units_sold,
        SUM(CASE WHEN o.order_status = 'Returned' THEN 1 ELSE 0 END) total_returned
    FROM order_items oi
    JOIN products p ON oi.product_id = p.product_id
    JOIN orders o ON o.order_id = oi.order_id
    GROUP BY 1, 2
)
ORDER BY return_percentage DESC
LIMIT 10;

Most Returned Products

Insight: Return rates vary by product. High-return items may need review for quality, expectations, or listing accuracy.

14. Identify Returning vs. New Customers

SELECT
    *,
    CASE WHEN total_returns > 5 THEN 'Returning' ELSE 'New' END customer_type
FROM (
    SELECT
        CONCAT(c.first_name, ' ', c.last_name) full_name,
        COUNT(o.order_id) total_orders,
        SUM(CASE WHEN o.order_status = 'Returned' THEN 1 ELSE 0 END) total_returns
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    JOIN order_items oi ON oi.order_id = o.order_id
    GROUP BY 1
);

Identify Returning vs. New Customers

Insight: Customer behavior varies based on return activityโ€”this can be useful for segmentation and tailored engagement.

15. Top 5 Customers by Orders per State

SELECT *
FROM (
    SELECT
        c.state,
        CONCAT(c.first_name, ' ', c.last_name) customers,
        COUNT(o.order_id) total_orders,
        SUM(oi.total_sale) total_sale,
        DENSE_RANK() OVER(PARTITION BY c.state ORDER BY COUNT(o.order_id) DESC) rank
    FROM orders o
    JOIN order_items oi ON oi.order_id = o.order_id
    JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY 1, 2
)
WHERE rank <= 5;

Top 5 Customers by Orders per State

Insight: Top customers in each state reveal regional power users that drive local revenue.

16. Revenue by Shipping Provider

SELECT
    s.shipping_providers,
    COUNT(o.order_id) orders_handled,
    ROUND(SUM(oi.total_sale)) total_sale,
    ROUND(COALESCE(AVG(s.return_date - s.shipping_date), 0)) average_days
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
JOIN shippings s ON s.order_id = o.order_id
GROUP BY 1;

Revenue by Shipping Provider

Insight: Shipping providers vary in speed and volume. Some are slower despite handling fewer orders.

17. Products with Revenue Decline (2023 vs 2024)

WITH last_year_sale AS (
    SELECT p.product_id, p.product_name, SUM(oi.total_sale) revenue
    FROM orders o
    JOIN order_items oi ON oi.order_id = o.order_id
    JOIN products p ON p.product_id = oi.product_id
    WHERE EXTRACT(YEAR FROM o.order_date) = 2023
    GROUP BY 1, 2
),
current_year_sale AS (
    SELECT p.product_id, p.product_name, SUM(oi.total_sale) revenue
    FROM orders o
    JOIN order_items oi ON oi.order_id = o.order_id
    JOIN products p ON p.product_id = oi.product_id
    WHERE EXTRACT(YEAR FROM o.order_date) = 2024
    GROUP BY 1, 2
)
SELECT
    cs.product_id,
    cs.product_name,
    ls.revenue last_year_revenue,
    cs.revenue current_year_revenue,
    ls.revenue - cs.revenue rev_difference,
    ROUND((cs.revenue - ls.revenue)::NUMERIC / ls.revenue::NUMERIC * 100, 2) revenue_decrease_ratio
FROM last_year_sale ls
JOIN current_year_sale cs ON ls.product_id = cs.product_id
WHERE ls.revenue > cs.revenue
ORDER BY 6 DESC
LIMIT 10;

Products with Revenue Decline (2023 vs 2024)

Insight: Revenue decline detection helps flag underperforming products for reevaluation in 2024.


๐Ÿ“˜ What I Learned

This project was a comprehensive learning experience in data analysis, SQL, and business intelligence. Here's what I learned:

  • How to write complex SQL queries using advanced features like CTEs, window functions, conditional logic, and subqueries.
  • How to structure a real-world data analysis project with clear objectives, questions, and measurable outcomes.
  • How to extract actionable business insights from raw datasets using SQL alone.
  • The importance of validating data relationships and using joins correctly to link multiple tables.
  • How to implement automation in SQL with stored procedures using PL/pgSQL.
  • Practical debugging and optimization of queries using PostgreSQL.
  • How to build stored procedures like add_sales that automate inventory and order management in a real-world scenario.
  • How to present technical findings clearly and professionally through documentation and structure.

๐Ÿ›’ Stored Procedure: add_sales

As part of this project, I created a PostgreSQL stored procedure named add_sales to simulate real-time product transactions and automate inventory updates.

๐Ÿ”„ How It Works

This procedure replicates the logic of placing an order on an e-commerce platform:

  1. It first checks the inventory to confirm that enough stock is available for the requested product.
  2. If the product is available:
    • A new entry is added to the orders table.
    • A related entry is created in the order_items table including the calculated total sale.
    • The inventory table is updated to reflect the reduced stock.
    • A confirmation notice is displayed with the product name.
  3. If the product is not in stock, a notice informs the user that the product is unavailable.

๐Ÿ“œ Procedure Code

CREATE OR REPLACE PROCEDURE add_sales (
    p_order_id INT,
    p_customer_id INT,
    p_seller_id INT,
    p_order_item_id INT,
    p_product_id INT,
    p_quantity INT
)
LANGUAGE plpgsql
AS $$
DECLARE
    v_count INT;
    v_price FLOAT;
    v_product_name VARCHAR(50);
BEGIN
    SELECT price, product_name
    INTO v_price, v_product_name
    FROM products
    WHERE product_id = p_product_id;

    SELECT COUNT(*)
    INTO v_count
    FROM inventory
    WHERE product_id = p_product_id AND stock >= p_quantity;

    IF v_count > 0 THEN
        INSERT INTO orders (
            order_id, order_date, customer_id, seller_id
        )
        VALUES (
            p_order_id, CURRENT_DATE, p_customer_id, p_seller_id
        );

        INSERT INTO order_items (
            order_item_id, order_id, product_id, quantity, price_per_unit, total_sale
        )
        VALUES (
            p_order_item_id, p_order_id, p_product_id, p_quantity, v_price, v_price * p_quantity
        );

        UPDATE inventory
        SET stock = stock - p_quantity
        WHERE product_id = p_product_id;

        RAISE NOTICE 'Thank you, product "%" has been added and inventory updated.', v_product_name;
    ELSE
        RAISE NOTICE 'Product "%" is not available at the moment.', v_product_name;
    END IF;
END;
$$;

๐Ÿงช Example Usage

CALL add_sales(25001, 2, 5, 26001, 1, 10);
SELECT product_id, stock FROM inventory WHERE product_id = 1;

This procedure simulates a core part of e-commerce operations and demonstrates how SQL can be used not only for analysis but also for backend logic automation.

๐Ÿง  What I Learned

  • How to declare and use variables inside a stored procedure.
  • How to perform conditional logic and error handling in PL/pgSQL.
  • How to use transactions to automate business processes in a database environment.
  • How stored procedures can encapsulate complex logic and ensure consistency across related tables.

โœ… Conclusion

This SQL project demonstrates how data can inform strategic decisions in an e-commerce environment. By analyzing Amazon-style marketplace data, I was able to:

  • Uncover best-selling products and top-performing categories.
  • Analyze revenue trends, profit margins, and customer value.
  • Identify operational inefficiencies such as shipping delays and inventory shortages.
  • Use SQL not just for querying, but also for creating database functionality through a stored procedure.

From querying to automation with the add_sales procedure, this project showcases how data analysis and backend logic can bring clarity to complex business questions and optimize decision-making.

๐Ÿ“œ License

This project is licensed under the MIT License.

About

This project is a comprehensive SQL-based analysis of Amazon marketplace data. Using PostgreSQL and structured SQL queries, I explored various business metrics such as sales performance, customer behavior, inventory management, shipping delays, and more.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •