# DSCI 504

## Week 7

### Assignment: Using PostgreSQL to Support Other Data Science Tasks

### Student Name: SEIF KUNGULIO

<u>How to complete this assignment:</u> This Notebook has two types of cells: text and code. Cells marked with **_Do not mark in this cell_** are not to be edited. Other text cells that require answers may be edited by double-clicking the cell; clicking outside the cell saves the text. Code cells are where you enter your SQL code. The results of your query are displayed immediately beneath the code cell and are interactive. For full credit, you must write, execute, and save your code in your Notebook. The last cell of the Notebook has instructions for saving the Notebook as an HTML file for submission.

### 7.1 Question: Find customers whose total spending exceeds the average spending (via subquery)

In [1]:
SELECT 
    c.cus_id,
    c.cus_first_name,
    c.cus_last_name,
    SUM(o.order_tot) AS total_spending
FROM dsci_504.customers c
JOIN dsci_504.orders o ON c.cus_id = o.cus_id
GROUP BY c.cus_id, c.cus_first_name, c.cus_last_name
HAVING SUM(o.order_tot) > (
    SELECT AVG(total_spending)
    FROM (
        SELECT 
            cus_id,
            SUM(order_tot) AS total_spending
        FROM dsci_504.orders
        GROUP BY cus_id
    ) AS customer_totals
)
ORDER BY total_spending DESC;

cus_id,cus_first_name,cus_last_name,total_spending
461,Moritz,Thomason,385962.07
250,Zachary,Williamson,371796.56
1504,Nadine,Slain,367295.38
2336,Samuel,Jones,366405.95
270,Jack,Tompsen,357923.24
1318,Robert,Banks,355514.41
1762,Airn,Smith,354030.49
131,Summer,Rice,346373.03
1414,Cooper,Presten,342214.43
1906,Jonathon,Dewey,341024.12
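
The subquery averages the per-customer totals, not the individual order amounts, and the `HAVING` clause then keeps only customers above that average. A minimal Python sketch of the same logic, using invented `(cus_id, order_tot)` rows rather than data from `dsci_504.orders`:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (cus_id, order_tot) rows; not taken from dsci_504.orders.
orders = [(1, 100.0), (1, 50.0), (2, 30.0), (3, 400.0), (3, 20.0)]

# Inner subquery: total spending per customer.
totals = defaultdict(float)
for cus_id, order_tot in orders:
    totals[cus_id] += order_tot

# Outer subquery: average of the per-customer totals.
avg_total = mean(totals.values())

# HAVING clause: keep customers above that average, highest spender first.
above_avg = sorted(
    ((cus_id, total) for cus_id, total in totals.items() if total > avg_total),
    key=lambda row: row[1],
    reverse=True,
)
print(avg_total, above_avg)
```

Averaging the raw order amounts instead would answer a different question (spending per order rather than per customer).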


### 7.2 Question: Rank builds by their number of components. This is a spin on an earlier week's query using an advanced query.

In [2]:
SELECT 
    b.build_id,
    b.build_name,
    COUNT(bc.comp_id) AS component_count,
    DENSE_RANK() OVER (ORDER BY COUNT(bc.comp_id) DESC) AS build_rank
FROM dsci_504.builds b
JOIN dsci_504.build_components bc ON b.build_id = bc.build_id
GROUP BY b.build_id, b.build_name
ORDER BY build_rank;

build_id,build_name,component_count,build_rank
10,wild ride,11,1
73,rare,10,2
53,sojourn,10,2
37,pinnacle,10,2
17,trail two,10,2
15,max traction,10,2
49,bull,10,2
12,loam assault,10,2
61,apocalypse,9,3
77,north shore,9,3
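
`DENSE_RANK` gives tied rows the same rank and leaves no gaps afterwards, which is why the builds with 9 components get rank 3 rather than 9. A small Python sketch of that rule, applied to the ten `component_count` values shown above; the plain `RANK` column is included only for contrast and is computed from these ten rows alone:

```python
# component_count values from the first ten output rows above.
counts = [11, 10, 10, 10, 10, 10, 10, 10, 9, 9]

# DENSE_RANK: position of each value among the distinct values, descending.
distinct_desc = sorted(set(counts), reverse=True)
rank_of = {value: i + 1 for i, value in enumerate(distinct_desc)}
dense_ranks = [rank_of[c] for c in counts]

# RANK, for contrast: 1 + number of rows with a strictly larger value.
plain_ranks = [1 + sum(other > c for other in counts) for c in counts]

print(dense_ranks)
print(plain_ranks)
```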


### 7.3 Question: Compute a running total of monthly sales

In [3]:
SELECT 
    DATE_TRUNC('month', ord_date)::DATE AS month,
    SUM(order_tot) AS monthly_sales,
    SUM(SUM(order_tot)) OVER (ORDER BY DATE_TRUNC('month', ord_date)) AS running_total
FROM dsci_504.orders
GROUP BY DATE_TRUNC('month', ord_date)
ORDER BY month;

month,monthly_sales,running_total
2000-01-01,45124.56,45124.56
2000-02-01,49589.76,94714.32
2000-03-01,73544.39,168258.71
2000-04-01,45284.01,213542.72
2000-05-01,68553.65,282096.37
2000-06-01,83630.97,365727.34
2000-07-01,91006.28,456733.62
2000-08-01,74073.48,530807.1
2000-09-01,69246.97,600054.07
2000-10-01,56867.85,656921.92
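
A running total needs one row per month: each month's total is added to the sum of all earlier months. A minimal Python sketch of that accumulation, using the first three monthly totals that appear in the 7.4 output of this notebook:

```python
from decimal import Decimal
from itertools import accumulate

# First three monthly totals, copied from the 7.4 output in this notebook.
monthly_sales = [Decimal("45124.56"), Decimal("49589.76"), Decimal("73544.39")]

# Equivalent of SUM(...) OVER (ORDER BY month): a cumulative sum in month order.
running_total = list(accumulate(monthly_sales))
print(running_total)
```

`Decimal` is used here only to keep the cents exact in the sketch.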


### 7.4 Question: Compare each month's sales to the previous month's sales (LAG)

In [4]:
WITH monthly_sales AS (
    SELECT 
        DATE_TRUNC('month', ord_date)::DATE AS month,
        SUM(order_tot) AS total_sales
    FROM dsci_504.orders
    GROUP BY month
)
SELECT 
    month,
    total_sales,
    LAG(total_sales) OVER (ORDER BY month) AS prev_month_sales,
    total_sales - LAG(total_sales) OVER (ORDER BY month) AS change_in_sales
FROM monthly_sales
ORDER BY month;

month,total_sales,prev_month_sales,change_in_sales
2000-01-01,45124.56,,
2000-02-01,49589.76,45124.56,4465.2
2000-03-01,73544.39,49589.76,23954.63
2000-04-01,45284.01,73544.39,-28260.38
2000-05-01,68553.65,45284.01,23269.64
2000-06-01,83630.97,68553.65,15077.32
2000-07-01,91006.28,83630.97,7375.31
2000-08-01,74073.48,91006.28,-16932.8
2000-09-01,69246.97,74073.48,-4826.51
2000-10-01,56867.85,69246.97,-12379.12
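
`LAG` pairs each row with the row before it in the window order, and the first row has no predecessor, so it gets NULL. A short Python sketch that reproduces the first four rows of the output above from the `total_sales` column:

```python
from decimal import Decimal

# total_sales for the first four months, copied from the output above.
totals = [Decimal(s) for s in ("45124.56", "49589.76", "73544.39", "45284.01")]

# LAG(total_sales): shift the column down by one row; the first row has no predecessor.
prev_month = [None] + totals[:-1]

# change_in_sales: current minus previous, left empty where there is no previous month.
change = [None if p is None else t - p for t, p in zip(totals, prev_month)]
print(change)
```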


### 7.5 Question: Find the next month's return count (LEAD)

In [5]:
WITH monthly_returns AS (
    SELECT 
        DATE_TRUNC('month', return_date)::DATE AS month,
        COUNT(*) AS return_count
    FROM dsci_504.returns
    GROUP BY month
)
SELECT 
    month,
    return_count,
    LEAD(return_count) OVER (ORDER BY month) AS next_month_returns
FROM monthly_returns
ORDER BY month;

month,return_count,next_month_returns
2000-01-01,2,4.0
2000-02-01,4,7.0
2000-03-01,7,21.0
2000-04-01,21,19.0
2000-05-01,19,31.0
2000-06-01,31,32.0
2000-07-01,32,29.0
2000-08-01,29,34.0
2000-09-01,34,27.0
2000-10-01,27,26.0
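
`LEAD` is the mirror image of `LAG`: each row looks ahead to a later row, and the last row gets NULL. The helper below is a hypothetical illustration (the name `lead` and its parameters are not part of any library), applied to the return counts shown above plus the November count of 26 that appears in the last `next_month_returns` value:

```python
def lead(values, offset=1, default=None):
    """Return a list where item i is values[i + offset], or default past the end."""
    return [
        values[i + offset] if i + offset < len(values) else default
        for i in range(len(values))
    ]

# return_count for 2000-01 through 2000-10 from the output above, then 26 for 2000-11.
return_counts = [2, 4, 7, 21, 19, 31, 32, 29, 34, 27, 26]

next_month_returns = lead(return_counts)
print(next_month_returns)
```

In this sketch the final `None` appears only because the list stops at November.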


### 7.6 Question: Pivot return\_reason counts by month using a combination of a CTE and filtered aggregation

In [6]:
WITH return_data AS (
    SELECT 
        DATE_TRUNC('month', r.return_date)::DATE AS month,
        ri.return_reason
    FROM dsci_504.returns r
    JOIN dsci_504.return_items ri ON r.rac_id = ri.rac_id
)
SELECT 
    month,
    COUNT(*) FILTER (WHERE return_reason LIKE '%Faulty%') AS faulty,
    COUNT(*) FILTER (WHERE return_reason LIKE '%Broken%') AS broken,
    COUNT(*) FILTER (WHERE return_reason LIKE '%Defective%') AS defective,
    COUNT(*) FILTER (WHERE return_reason LIKE '%Wrong%') AS wrong_item,
    COUNT(*) FILTER (WHERE return_reason LIKE '%Mind%') AS changed_mind
FROM return_data
GROUP BY month
ORDER BY month;

month,faulty,broken,defective,wrong_item,changed_mind
2000-01-01,0,0,0,0,0
2000-02-01,0,0,1,0,1
2000-03-01,0,1,1,0,2
2000-04-01,4,2,2,2,2
2000-05-01,1,1,4,2,4
2000-06-01,3,5,2,8,2
2000-07-01,2,2,2,3,5
2000-08-01,0,5,1,5,3
2000-09-01,6,1,7,4,3
2000-10-01,3,4,4,3,3
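
The pivot works by counting, per month, only the rows whose reason matches each column's pattern. A rough Python sketch of that idea; the reason strings below are invented for illustration and are not values from `dsci_504.return_items`:

```python
from collections import defaultdict

# Hypothetical (month, return_reason) rows; the reason strings are invented.
rows = [
    ("2000-04-01", "Faulty part"),
    ("2000-04-01", "Changed Mind"),
    ("2000-04-01", "Faulty weld"),
    ("2000-05-01", "Wrong item shipped"),
    ("2000-05-01", "Other"),
]

# Output column -> substring, mirroring the patterns in the query above.
patterns = {
    "faulty": "Faulty",
    "broken": "Broken",
    "defective": "Defective",
    "wrong_item": "Wrong",
    "changed_mind": "Mind",
}

# One dict of zeroed counters per month, filled by case-sensitive substring matches.
pivot = defaultdict(lambda: dict.fromkeys(patterns, 0))
for month, reason in rows:
    for column, needle in patterns.items():
        if needle in reason:
            pivot[month][column] += 1

for month in sorted(pivot):
    print(month, pivot[month])
```

A row whose reason matches none of the patterns (like "Other" here) is counted in no column, which is one way a month can appear with all zeros.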


### 7.7 Question: Compute a 3-month running average of order totals

In [7]:
WITH monthly_orders AS (
    SELECT 
        DATE_TRUNC('month', ord_date)::DATE AS month,
        SUM(order_tot) AS total_order
    FROM dsci_504.orders
    GROUP BY month
)
SELECT 
    month,
    total_order,
    ROUND(AVG(total_order) OVER (
        ORDER BY month
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ), 2) AS three_month_avg
FROM monthly_orders
ORDER BY month;

month,total_order,three_month_avg
2000-01-01,45124.56,45124.56
2000-02-01,49589.76,47357.16
2000-03-01,73544.39,56086.24
2000-04-01,45284.01,56139.39
2000-05-01,68553.65,62460.68
2000-06-01,83630.97,65822.88
2000-07-01,91006.28,81063.63
2000-08-01,74073.48,82903.58
2000-09-01,69246.97,78108.91
2000-10-01,56867.85,66729.43
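
The frame `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` averages the current month with up to two earlier months, so the first two rows average over fewer than three values. A minimal Python sketch (the function name `trailing_avg` is made up for this example) that reproduces the first four `three_month_avg` values from the output above:

```python
from decimal import Decimal, ROUND_HALF_UP

# total_order for the first four months, copied from the output above.
totals = [Decimal(s) for s in ("45124.56", "49589.76", "73544.39", "45284.01")]

def trailing_avg(values, window=3):
    """Average each value with up to window - 1 preceding values, rounded to cents."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - window + 1): i + 1]  # shorter at the start
        avg = sum(frame) / len(frame)
        out.append(avg.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
    return out

three_month_avg = trailing_avg(totals)
print(three_month_avg)
```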


### 7.8 Question: Identify returns occurring outside of a 90-day window (Subquery)

In [8]:
SELECT 
    r.rac_id,
    r.ord_id,
    r.return_date,
    o.ord_date,
    (r.return_date - o.ord_date) AS days_between
FROM dsci_504.returns r
JOIN dsci_504.orders o ON r.ord_id = o.ord_id
WHERE r.return_date > o.ord_date + INTERVAL '90 days';

rac_id,ord_id,return_date,ord_date,days_between
1025,5978290,2006-11-20,2006-08-20,92
1028,5785967,2010-04-20,2010-01-03,107
1029,6316949,2006-07-19,2006-04-10,100
1032,4532837,2009-06-27,2009-03-16,103
1034,2685159,2019-07-21,2019-04-03,109
1035,4770294,2006-06-15,2005-12-25,172
1038,3224260,2017-10-27,2017-07-25,94
1040,2253672,2000-06-19,2000-02-22,118
1041,1564158,2011-02-16,2010-10-30,109
1045,1570448,2011-08-26,2011-04-23,125
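
Because the filter uses a strict `>`, a return made exactly 90 days after the order is still inside the window. A short Python sketch of the same date arithmetic, using two rows from the output above and one invented boundary row:

```python
from datetime import date, timedelta

# (rac_id, return_date, ord_date); the first two rows come from the output above.
rows = [
    (1025, date(2006, 11, 20), date(2006, 8, 20)),
    (1040, date(2000, 6, 19), date(2000, 2, 22)),
    (9999, date(2000, 5, 1), date(2000, 2, 1)),  # hypothetical row: exactly 90 days
]

window = timedelta(days=90)

# Keep returns strictly later than ord_date + 90 days, with days_between.
late_returns = [
    (rac_id, (return_date - ord_date).days)
    for rac_id, return_date, ord_date in rows
    if return_date > ord_date + window
]
print(late_returns)
```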


### 7.9 Question: List the Top 3 Products per class by quantity sold (ROW\_NUMBER)

In [9]:
WITH product_sales AS (
    SELECT 
        p.prod_id,
        p.prod_class,
        p.prod_name,
        SUM(oi.quantity) AS total_quantity,
        ROW_NUMBER() OVER (
            PARTITION BY p.prod_class
            ORDER BY SUM(oi.quantity) DESC
        ) AS rank_within_class
    FROM dsci_504.products p
    JOIN dsci_504.order_items oi ON p.prod_id = oi.prod_id
    GROUP BY p.prod_id, p.prod_class, p.prod_name
)
SELECT *
FROM product_sales
WHERE rank_within_class <= 3
ORDER BY prod_class, rank_within_class;

prod_id,prod_class,prod_name,total_quantity,rank_within_class
53,Expert,E-Series,1019,1
16,Expert,Enduro,971,2
8,Expert,Megatower,970,3
52,Mid,Snabb,1007,1
64,Mid,RKT9,1000,2
49,Mid,AMD,975,3
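
`ROW_NUMBER` restarts at 1 inside each `PARTITION BY` group, so filtering on `rank_within_class <= 3` keeps at most three rows per class. A rough Python sketch of the same pattern, using the six rows shown above plus one invented fourth product to show the cutoff:

```python
from itertools import groupby

# (prod_class, prod_name, total_quantity); six rows from the output above,
# plus one invented row that should be cut off by the top-3 filter.
sales = [
    ("Expert", "Enduro", 971),
    ("Mid", "AMD", 975),
    ("Expert", "E-Series", 1019),
    ("Mid", "Snabb", 1007),
    ("Expert", "Megatower", 970),
    ("Mid", "RKT9", 1000),
    ("Mid", "Hypothetical-4th", 900),
]

# PARTITION BY prod_class ORDER BY total_quantity DESC.
ordered = sorted(sales, key=lambda row: (row[0], -row[2]))

# Number rows within each class and keep the first three.
top3 = [
    (prod_class, name, qty, row_number)
    for prod_class, group in groupby(ordered, key=lambda row: row[0])
    for row_number, (_, name, qty) in enumerate(group, start=1)
    if row_number <= 3
]
for row in top3:
    print(row)
```

Unlike `DENSE_RANK`, `ROW_NUMBER` breaks ties arbitrarily, so two products with equal quantities could swap places between runs.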


### Run the cell below using the Python kernel to save this Notebook as an HTML file for submission. As in other weeks, append your name to the filename and submit in Canvas.

In [1]:
!jupyter nbconvert --to html DSCI504_Wk7_Assignment.ipynb

[NbConvertApp] Converting notebook DSCI504_Wk7_Assignment.ipynb to html


[NbConvertApp] Writing 1043343 bytes to DSCI504_Wk7_Assignment.html
