
# Common analytical questions and SQL templates for answering them

## Finding the n-th event in a series of events with window functions

* Many user interactions are stored as events (e.g., impressions, clicks, checkouts, cab called, cab boarded, cab dismounted).

* Analytical questions often involve identifying one or more such events and associating them with a past event.

* For example, when a customer purchases a product, how did they land on the product page (Google, ads, Bing, etc.)? This is known as attribution.

[ref: utm](https://blog.hubspot.com/customers/understanding-basics-utm-parameters)
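As a rough sketch of attribution, the query below ties each purchase to the same user's earliest visit before it (a first-touch rule, which is only one of several possible attribution rules). It runs through Python's built-in `sqlite3` module so that it is self-contained; the `events` table, its columns, and the sample rows are hypothetical, and window functions assume SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample data: (user_id, event_time, event_type, source).
events = [
    (1, "2024-07-01 09:00:00", "visit", "google"),
    (1, "2024-07-01 09:30:00", "visit", "ads"),
    (1, "2024-07-01 10:00:00", "purchase", None),
    (2, "2024-07-01 11:00:00", "visit", "bing"),
    (2, "2024-07-01 11:20:00", "purchase", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events "
    "(user_id INTEGER, event_time TEXT, event_type TEXT, source TEXT)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", events)

query = """
WITH ranked_visits AS (
    SELECT
        p.user_id,
        p.event_time AS purchase_time,
        v.source,
        -- Rank the visits that precede each purchase, earliest first
        ROW_NUMBER() OVER (
            PARTITION BY p.user_id, p.event_time
            ORDER BY v.event_time
        ) AS visit_rank
    FROM events p
    JOIN events v
        ON v.user_id = p.user_id
        AND v.event_type = 'visit'
        AND v.event_time < p.event_time
    WHERE p.event_type = 'purchase'
)
SELECT user_id, purchase_time, source
FROM ranked_visits
WHERE visit_rank = 1  -- keep only the first touch
ORDER BY user_id;
"""

first_touch = conn.execute(query).fetchall()
for row in first_touch:
    print(row)
```

Switching the window's `ORDER BY v.event_time` to `ORDER BY v.event_time DESC` would give last-touch attribution instead.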




## Find the n-th click in a series of user clicks

* Assume we have a `clickstream` table with a `user_id` and the time the user clicked on our web page (`click_time`). We can use ranking functions to pick the user's 3rd (or any n-th) click.

* Finding the n-th event in a series of events is useful for:
	* Marketing attribution
	* Debugging issues with late-arriving data



In [2]:
%%sql
WITH clickstream AS (
    SELECT
        1 AS user_id, '2024-07-01 10:00:00' AS click_time UNION ALL
    SELECT
        1 AS user_id, '2024-07-01 10:05:00' AS click_time UNION ALL
    SELECT
        1 AS user_id, '2024-07-01 10:10:00' AS click_time UNION ALL
    SELECT
        2 AS user_id, '2024-07-01 10:15:00' AS click_time UNION ALL
    SELECT
        2 AS user_id, '2024-07-01 10:20:00' AS click_time UNION ALL
    SELECT
        2 AS user_id, '2024-07-01 10:25:00' AS click_time
),
-- Number each user's clicks in chronological order (1 = earliest click)
ranked_clicks AS (
    SELECT
        user_id,
        click_time,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY click_time) AS click_rank
    FROM
        clickstream
)
SELECT
    user_id,
    click_time,
    click_rank
FROM
    ranked_clicks
WHERE
    click_rank = 3;





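* This pattern (ROW_NUMBER + PARTITION BY the key that should be unique) can also remove duplicate rows: keep only the rows where the row number is 1.

* Note: some databases offer built-in syntax for dropping duplicates (e.g., `DISTINCT ON` in PostgreSQL).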
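
The deduplication use of this pattern can be sketched as follows. This is a minimal example under assumptions: the `orders_raw` table, its columns, and the rows are hypothetical, the "keep the most recently updated row" rule is one possible choice, and the SQL runs through Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer).

```python
import sqlite3

# Hypothetical sample data: order 100 was loaded twice.
rows = [
    (100, "2024-07-01 10:00:00", "pending"),
    (100, "2024-07-01 10:30:00", "shipped"),
    (101, "2024-07-01 11:00:00", "pending"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders_raw (order_id INTEGER, updated_at TEXT, status TEXT)"
)
conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", rows)

query = """
WITH ranked AS (
    SELECT
        order_id,
        updated_at,
        status,
        -- Within each order_id, the newest row gets row_num = 1
        ROW_NUMBER() OVER (
            PARTITION BY order_id
            ORDER BY updated_at DESC
        ) AS row_num
    FROM orders_raw
)
SELECT order_id, updated_at, status
FROM ranked
WHERE row_num = 1  -- drop every duplicate except the newest
ORDER BY order_id;
"""

deduped = conn.execute(query).fetchall()
for row in deduped:
    print(row)
```

The `ORDER BY` inside the window decides which duplicate survives, so it should reflect whatever "the right row" means for the data at hand.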

## Converting row values into individual columns (aka PIVOT)

* Commonly used for easy visual summarization

* Used extensively by business folks to inspect value distributions

![](./pivot.png)

## Use GROUP BY + CASE WHEN to replicate PIVOT in SQL

* Pivots take values in rows and convert them into columns.

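* We can recreate this logic in SQL by placing a CASE WHEN inside an aggregate function and grouping with GROUP BY.

* Pivot only columns with a low number of unique values (aka low cardinality), since every distinct value becomes its own column.

* Example: convert the `o_orderpriority` column values into individual columns and calculate the monthly average order price for each priority.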


In [3]:
%%sql
SELECT strftime(o_orderdate, '%Y-%m') AS ordermonth,
       ROUND(AVG(CASE
                     WHEN o_orderpriority = '1-URGENT' THEN o_totalprice
                     ELSE NULL
                 END), 2) AS urgent_order_avg_price,
       ROUND(AVG(CASE
                     WHEN o_orderpriority = '2-HIGH' THEN o_totalprice
                     ELSE NULL
                 END), 2) AS high_order_avg_price,
       ROUND(AVG(CASE
                     WHEN o_orderpriority = '3-MEDIUM' THEN o_totalprice
                     ELSE NULL
                 END), 2) AS medium_order_avg_price,
       ROUND(AVG(CASE
                     WHEN o_orderpriority = '4-NOT SPECIFIED' THEN o_totalprice
                     ELSE NULL
                 END), 2) AS not_specified_order_avg_price,
       ROUND(AVG(CASE
                     WHEN o_orderpriority = '5-LOW' THEN o_totalprice
                     ELSE NULL
                 END), 2) AS low_order_avg_price
FROM orders
GROUP BY strftime(o_orderdate, '%Y-%m');
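
To see the shape of the result without the full `orders` table, here is a small sketch of the same GROUP BY + CASE WHEN technique on hypothetical rows. It runs through Python's built-in `sqlite3` module, stores the month as a precomputed `ordermonth` text column to avoid engine-specific date formatting, and pivots only two of the five priorities to stay short; all of these are assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample rows: (order month, priority, total price).
orders = [
    ("1995-01", "1-URGENT", 100.0),
    ("1995-01", "1-URGENT", 300.0),
    ("1995-01", "5-LOW", 50.0),
    ("1995-02", "5-LOW", 80.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(ordermonth TEXT, o_orderpriority TEXT, o_totalprice REAL)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

query = """
SELECT
    ordermonth,
    -- A CASE without ELSE yields NULL, and AVG ignores NULLs,
    -- so each column averages only the rows of its own priority
    AVG(CASE WHEN o_orderpriority = '1-URGENT' THEN o_totalprice END)
        AS urgent_order_avg_price,
    AVG(CASE WHEN o_orderpriority = '5-LOW' THEN o_totalprice END)
        AS low_order_avg_price
FROM orders
GROUP BY ordermonth
ORDER BY ordermonth;
"""

pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

A month with no orders of a given priority shows up as `None` (SQL `NULL`) in that column rather than as 0.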





Some databases support `PIVOT` natively, although the exact syntax varies by engine.


In [4]:
%%sql
PIVOT
  (SELECT *,
          strftime(o_orderdate, '%Y-%m') AS order_month
   FROM orders) ON o_orderpriority USING AVG(o_totalprice)
GROUP BY order_month
LIMIT 10;

