<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/3_Windows_Functions/3_Ranking.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Ranking Functions

## Overview

### 🥅 Analysis Goals

Explore user-level metrics to understand customer value, order quantity, and cohort rankings.

- **Average LTV by Customer:** Aggregate total revenue per customer and calculate the average lifetime value for each customer. Provides insights into the long-term value of users and their contributions to the business.
- **Rank Customers by Order Quantity:** Calculate and rank customers based on their total order quantity. Helps identify the most engaged customers and their purchasing behavior.
- **Monthly Ranking of Orders by Cohort:** Rank monthly orders within each cohort to evaluate trends and performance over time. Offers insights into order distribution and growth patterns across cohorts. 

### 📘 Concepts Covered

- `ORDER BY`
- Ranking
    - `ROW_NUMBER`
    - `RANK`
    - `DENSE_RANK`

In [2]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the sql extension for SQL magic
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

# Display pandas number to two decimal places
pd.options.display.float_format = '{:.2f}'.format

---
## ORDER BY

### 📝 Notes

`ORDER BY`

- **ORDER BY**: Sorts the rows within each partition before the window function is applied.
- The sort direction can be `ASC` (the default) or `DESC`.
- Syntax
    ```sql
    SELECT
        window_function() OVER (
            PARTITION BY partition_expression
            ORDER BY column_name --DESC or ASC
        ) AS window_column_alias
    FROM table_name;
    ```

#### Importance of `ORDER BY` in a Window Function  

1. Controls Row Processing Order 🏛️
    - Defines how rows are sequentially evaluated within their partition.
    - Essential for functions like cumulative sums, moving averages, and rankings.  
2. Required for Certain Window Functions 🪟
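    - Functions like `ROW_NUMBER()`, `RANK()`, `DENSE_RANK()`, and `LAG()/LEAD()` need an `ORDER BY` inside `OVER()` to produce a meaningful sequence. PostgreSQL accepts them without one, but the row order is then unspecified and `RANK()`/`DENSE_RANK()` return 1 for every row.  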
3. Affects Aggregation Window 📊
    - For cumulative functions (`SUM()`, `AVG()`, etc.), it determines how values are accumulated row by row.

#### Running Total of Orders for Customers

In [None]:
%%sql

SELECT 
    customerkey,
    orderdate,
    (quantity * netprice * exchangerate) AS net_revenue,
    COUNT(*) OVER (
        PARTITION BY customerkey 
        ORDER BY orderdate
    ) AS running_order_count
FROM sales
LIMIT 10

#### Running Average Net Revenue for Customer

In [19]:
%%sql

SELECT 
    customerkey,
    orderdate,
    (quantity * netprice * exchangerate) AS net_revenue,
    AVG(quantity * netprice * exchangerate) OVER (
        PARTITION BY customerkey 
        ORDER BY orderdate
    ) AS running_avg_value
FROM sales
LIMIT 10

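|   | customerkey | orderdate | net_revenue | running_avg_value |
|---|-------------|-----------|-------------|-------------------|
| 0 | 15 | 2021-03-08 | 2217.41 | 2217.41 |
| 1 | 180 | 2018-07-28 | 525.31 | 525.31 |
| 2 | 180 | 2023-08-28 | 71.36 | 836.74 |
| 3 | 180 | 2023-08-28 | 1913.55 | 836.74 |
| 4 | 185 | 2019-06-01 | 1395.52 | 1395.52 |
| 5 | 243 | 2016-05-19 | 287.67 | 287.67 |
| 6 | 387 | 2018-12-21 | 45.62 | 592.64 |
| 7 | 387 | 2018-12-21 | 619.77 | 592.64 |
| 8 | 387 | 2018-12-21 | 1608.10 | 592.64 |
| 9 | 387 | 2018-12-21 | 97.05 | 592.64 |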
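
In the output above, rows that share an `orderdate` (for example the two 2023-08-28 rows for customer 180) show the same running average. That is because, when a window has an `ORDER BY` but no explicit frame, the default frame runs from the start of the partition through the current row *and all of its peers* (rows with the same sort value). The sketch below illustrates this with Python's built-in `sqlite3` module and an in-memory SQLite database, not the course's PostgreSQL database; the `id` column and the revenue values are hypothetical.

```python
import sqlite3

# Hypothetical toy data: one customer with two orders on the same date.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER, customerkey INTEGER, orderdate TEXT, net_revenue REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, 180, "2018-07-28", 100.0),
        (2, 180, "2023-08-28", 200.0),
        (3, 180, "2023-08-28", 600.0),
    ],
)

running = conn.execute(
    """
    SELECT
        id,
        -- Default frame: rows with the same orderdate are peers and share a value
        AVG(net_revenue) OVER (
            PARTITION BY customerkey
            ORDER BY orderdate
        ) AS avg_default_frame,
        -- Explicit ROWS frame plus a tie-breaker: accumulates strictly row by row
        AVG(net_revenue) OVER (
            PARTITION BY customerkey
            ORDER BY orderdate, id
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS avg_rows_frame
    FROM sales
    ORDER BY id
    """
).fetchall()

print(running)  # [(1, 100.0, 100.0), (2, 300.0, 150.0), (3, 300.0, 300.0)]
```

If a strictly row-by-row running value is wanted, an explicit `ROWS` frame with a deterministic tie-breaker in the `ORDER BY` is one way to get it.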


#### Average LTV by Customer

**`ORDER BY`**

1. Get the average revenue by each customer (similar to the last query from our previous example, except the CTE `cohort_summary` becomes our main query).  
   - Define a CTE `yearly_cohort` to calculate the cohort year and total revenue for each customer.  
        - Extract the cohort year using `EXTRACT(YEAR FROM MIN(orderdate))`.  
        - Calculate the total revenue for each customer with `SUM(quantity * netprice * exchangerate)` and alias it `customer_ltv`.  
        - Group by `customerkey` to ensure total revenue and cohort year are assigned to each customer.  
   - In the main query: 
        - 🔔 Use `AVG(customer_ltv) OVER (PARTITION BY cohort_year ORDER BY customer_ltv)` to average LTV within each cohort. Adding `ORDER BY` to the window turns this into a running average: each row averages the customers in its cohort whose LTV is less than or equal to its own.  
        - Select `cohort_year`, `customerkey`, `customer_ltv`, and the windowed average (`avg_cohort_ltv`) for output.  

In [8]:
%%sql

WITH yearly_cohort AS (
    SELECT 
        customerkey,
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        SUM(quantity * netprice * exchangerate) AS customer_ltv 
    FROM sales
    GROUP BY 
        customerkey
)

SELECT 
    cohort_year,
    customerkey,
    customer_ltv,
    AVG(customer_ltv) OVER (PARTITION BY cohort_year ORDER BY customer_ltv) AS avg_cohort_ltv --Updated
FROM yearly_cohort
-- ORDER BY
--     cohort_year,
--     customerkey
-- ORDER BY commented out

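|   | cohort_year | customerkey | customer_ltv | avg_cohort_ltv |
|---|-------------|-------------|--------------|----------------|
| 0 | 2015 | 4376 | 182.00 | 5271.59 |
| 1 | 2015 | 4403 | 9530.35 | 5271.59 |
| 2 | 2015 | 4925 | 6078.08 | 5271.59 |
| 3 | 2015 | 5729 | 192.16 | 5271.59 |
| 4 | 2015 | 6048 | 1903.89 | 5271.59 |
| ... | ... | ... | ... | ... |
| 49482 | 2024 | 2093965 | 475.22 | 2037.55 |
| 49483 | 2024 | 2095129 | 156.00 | 2037.55 |
| 49484 | 2024 | 2095691 | 326.00 | 2037.55 |
| 49485 | 2024 | 2096470 | 535.78 | 2037.55 |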
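
The output above shows a single constant `avg_cohort_ltv` per cohort, which is what the window returns *without* an `ORDER BY`, so it appears to have been captured before `ORDER BY customer_ltv` was added. With the `ORDER BY`, the value would be expected to change row by row. The sketch below contrasts the two forms using Python's built-in `sqlite3` module and an in-memory SQLite database (an assumption made for a self-contained example, not the course's PostgreSQL setup); the three customers and their LTV values are hypothetical.

```python
import sqlite3

# Hypothetical toy cohort: three customers who first purchased in 2015.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yearly_cohort (cohort_year INTEGER, customerkey INTEGER, customer_ltv REAL)"
)
conn.executemany(
    "INSERT INTO yearly_cohort VALUES (?, ?, ?)",
    [(2015, 1, 100.0), (2015, 2, 200.0), (2015, 3, 600.0)],
)

ltv_rows = conn.execute(
    """
    SELECT
        customerkey,
        -- No ORDER BY: one average for the whole cohort
        AVG(customer_ltv) OVER (PARTITION BY cohort_year) AS avg_cohort_ltv,
        -- With ORDER BY: running average over customers sorted by LTV
        AVG(customer_ltv) OVER (
            PARTITION BY cohort_year
            ORDER BY customer_ltv
        ) AS running_avg_ltv
    FROM yearly_cohort
    ORDER BY customer_ltv
    """
).fetchall()

print(ltv_rows)  # [(1, 300.0, 100.0), (2, 300.0, 150.0), (3, 300.0, 300.0)]
```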


<img src="../Resources/images/3.3_customer_avg_ltv.png" alt="Customer average LTV" width="50%">

> ⚠️ **Chart Note**: This plots only 15 of our customers for better visualization.

---
## ROW_NUMBER

### 📝 Notes

`ROW_NUMBER`

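- **ROW_NUMBER**: Assigns a unique, consecutive number to each row within a partition. Rows that tie on the `ORDER BY` value still get different numbers, in an unspecified order.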
- Syntax:
    ```sql
    ROW_NUMBER() OVER(
         PARTITION BY partition_expression
         ORDER BY column_name
    ) AS window_column_alias
    ```

### 🔑 Key Concepts
- **📊 Business Terms**: 
  - Customer Order Rank: Unique position assigned to each customer based on their order metrics
  - Sequential Numbering: Process of assigning unique, consecutive numbers to orders or customers
  - Order Frequency: How often a customer makes purchases over a given time period
- **💡 Why It Matters**: Assigns unique identifiers to orders, enabling customer behavior analysis
- **🎯 Common Use Cases**: Customer order tracking, monthly cohort rankings
- **📈 Related KPIs**: Order frequency, customer engagement metrics


### 📈 Analysis

- Rank customers by their total number of orders.
- Track customer ordering behavior over time, grouping by the year of first purchase (cohort year) and aggregating orders and unique users by month. 

#### Rank Customers Order Quantity

**`ROW_NUMBER`**

1. By customer, assign a rank to the total orders each customer made.  
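   - Use `COUNT(orderkey)` to calculate the total for each customer. Note that this counts every `sales` row, so an order with several line items is counted once per line; use `COUNT(DISTINCT orderkey)` if you want unique orders.  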
   - Group by `customerkey` to ensure the order count is calculated for each individual customer.  
   - Use `ROW_NUMBER() OVER (ORDER BY COUNT(orderkey) DESC)` to assign a unique rank to each customer based on their total orders, in descending order.  
   - Select `customerkey`, `total_orders`, and the rank (`row_number_rank`) in the output.  


In [80]:
%%sql
SELECT 
    customerkey,
    COUNT(orderkey) AS total_orders,
    ROW_NUMBER() OVER (ORDER BY COUNT(orderkey) DESC) AS row_number_rank
FROM sales
GROUP BY customerkey;


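|   | customerkey | total_orders | row_number_rank |
|---|-------------|--------------|-----------------|
| 0 | 1834524 | 31 | 1 |
| 1 | 1375597 | 30 | 2 |
| 2 | 249557 | 27 | 3 |
| 3 | 1495941 | 26 | 4 |
| 4 | 459519 | 26 | 5 |
| ... | ... | ... | ... |
| 49482 | 1603362 | 1 | 49483 |
| 49483 | 618460 | 1 | 49484 |
| 49484 | 1313599 | 1 | 49485 |
| 49485 | 1842437 | 1 | 49486 |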
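
Customers 1495941 and 459519 both have 26 in the output above, yet they receive ranks 4 and 5. Which of the two comes first is not guaranteed, because `ROW_NUMBER()` breaks ties in an unspecified order. One common way to make the numbering repeatable is to add a tie-breaker column to the window's `ORDER BY`. The sketch below shows this with Python's built-in `sqlite3` module and an in-memory SQLite database (not the course's PostgreSQL database); the rows are hypothetical, and the aggregation is placed in a CTE named `order_counts` purely for readability.

```python
import sqlite3

# Hypothetical toy data: customers 10 and 20 tie with two rows each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customerkey INTEGER, orderkey INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(20, 2001), (20, 2002), (10, 1001), (10, 1002), (30, 3001)],
)

numbered = conn.execute(
    """
    WITH order_counts AS (
        SELECT customerkey, COUNT(orderkey) AS total_orders
        FROM sales
        GROUP BY customerkey
    )
    SELECT
        customerkey,
        total_orders,
        -- customerkey acts as a tie-breaker, so the result is deterministic
        ROW_NUMBER() OVER (
            ORDER BY total_orders DESC, customerkey
        ) AS row_number_rank
    FROM order_counts
    ORDER BY row_number_rank
    """
).fetchall()

print(numbered)  # [(10, 2, 1), (20, 2, 2), (30, 1, 3)]
```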


#### Monthly Ranking of Orders by Cohort

**`ROW_NUMBER`**

1. Get the raw data to calculate metrics by `cohort_year`.  
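   - Use `EXTRACT(YEAR FROM MIN(orderdate))` to derive `cohort_year`. Because this query also groups by `orderdate`, `MIN(orderdate)` is evaluated per order rather than per customer, so `cohort_year` here equals the year of each order. That is consistent with the results further down, where each month appears under its own year. A true first-purchase cohort would need `MIN(orderdate) OVER (PARTITION BY customerkey)` or a separate per-customer aggregation.  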
   - Include `customerkey`, `orderdate`, and `orderkey` in the SELECT statement to retain raw details for each order.  
   - Group by `customerkey`, `orderdate`, and `orderkey` to ensure all unique combinations of orders are represented in the output.  

In [None]:
%%sql 

SELECT
    EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
    customerkey,
    orderdate,
    orderkey
FROM sales
GROUP BY
    customerkey,
    orderdate,
    orderkey
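
As a side note on how `cohort_year` behaves in the query above, the sketch below compares `MIN(orderdate)` under `GROUP BY customerkey, orderdate, orderkey` with a windowed `MIN(orderdate) OVER (PARTITION BY customerkey)`. It is only an illustration: it runs on an in-memory SQLite database through Python's built-in `sqlite3` module, uses SQLite's `strftime('%Y', ...)` in place of PostgreSQL's `EXTRACT(YEAR FROM ...)`, and the two orders are hypothetical.

```python
import sqlite3

# Hypothetical toy data: one customer who ordered in 2015 and again in 2017.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customerkey INTEGER, orderdate TEXT, orderkey INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2015-03-10", 101), (1, "2017-06-02", 102)],
)

# Grouping by orderdate: MIN(orderdate) is just each order's own date.
grouped = conn.execute(
    """
    SELECT
        CAST(strftime('%Y', MIN(orderdate)) AS INTEGER) AS cohort_year,
        orderkey
    FROM sales
    GROUP BY customerkey, orderdate, orderkey
    ORDER BY orderkey
    """
).fetchall()

# Window over the customer: MIN(orderdate) is the customer's first order date.
windowed = conn.execute(
    """
    SELECT DISTINCT
        CAST(
            strftime('%Y', MIN(orderdate) OVER (PARTITION BY customerkey)) AS INTEGER
        ) AS cohort_year,
        orderkey
    FROM sales
    ORDER BY orderkey
    """
).fetchall()

print(grouped)   # [(2015, 101), (2017, 102)]
print(windowed)  # [(2015, 101), (2015, 102)]
```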

2. Get the `cohort_year`, `order_month`, total orders, and the users in the cohort.  
   - 🔔 Use a subquery to calculate the `cohort_year` for each customer using `EXTRACT(YEAR FROM MIN(orderdate))`.  
        - Retain `customerkey`, `orderdate`, and `orderkey` in the subquery to provide the necessary raw data for further analysis.  
   - 🔔 In the main query, use `DATE_TRUNC('month', orderdate)` to group orders by month.  
        - Calculate the total number of unique orders using `COUNT(DISTINCT orderkey)`.  
        - Calculate the total number of unique customers in the cohort using `COUNT(DISTINCT customerkey)`.  
        - Group the results by `cohort_year` and `order_month` to summarize the data at the cohort and monthly level.  

In [115]:
%%sql 

SELECT
    cohort_year,
    DATE_TRUNC('month',orderdate) AS order_month,
    COUNT(DISTINCT orderkey) AS total_orders,
    COUNT(DISTINCT customerkey) AS user_count
FROM (
    -- Put query into subquery
    SELECT
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        customerkey,
        orderdate,
        orderkey
    FROM sales
    GROUP BY
        customerkey,
        orderdate,
        orderkey
) cohort_analysis
GROUP BY 
    cohort_year, 
    order_month

|   | cohort_year | order_month | total_orders | user_count |
|---|-------------|-------------|--------------|------------|
| 0 | 2015 | 2015-01-01 00:00:00-08:00 | 200 | 200 |
| 1 | 2015 | 2015-02-01 00:00:00-08:00 | 292 | 291 |
| 2 | 2015 | 2015-03-01 00:00:00-08:00 | 139 | 139 |
| 3 | 2015 | 2015-04-01 00:00:00-07:00 | 78 | 78 |
| 4 | 2015 | 2015-05-01 00:00:00-07:00 | 236 | 236 |
| ... | ... | ... | ... | ... |
| 107 | 2023 | 2023-12-01 00:00:00-08:00 | 1500 | 1484 |
| 108 | 2024 | 2024-01-01 00:00:00-08:00 | 1353 | 1340 |
| 109 | 2024 | 2024-02-01 00:00:00-08:00 | 1749 | 1718 |
| 110 | 2024 | 2024-03-01 00:00:00-08:00 | 884 | 877 |


3. Put the query into a CTE and, in the main query, get the `cohort_year`, `order_month`, total orders, and user count.  
   - 🔔 Define a CTE `cohort_totals` to calculate the total orders and user count by cohort year and order month.  
        - In the CTE, use `DATE_TRUNC('month', orderdate)` to group orders by month.  
        - Calculate total orders with `COUNT(orderkey)` and the total number of unique users with `COUNT(DISTINCT customerkey)`.  
        - Group the CTE by `cohort_year` and `order_month` to summarize the data at the cohort and monthly level.  
   - 🔔 In the main query, select `cohort_year`, `order_month`, `total_orders`, and `user_count` from the CTE `cohort_totals`.  

In [116]:
%%sql

-- Put previous query into a CTE
WITH cohort_totals AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month',orderdate) AS order_month,
        COUNT(orderkey) AS total_orders,
        COUNT(DISTINCT customerkey) AS user_count
    FROM (
        SELECT
            EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
            customerkey,
            orderdate,
            orderkey
        FROM sales
        GROUP BY
            customerkey,
            orderdate,
            orderkey
    ) cohort_analysis
    GROUP BY cohort_year, order_month
)

-- Main query
SELECT
    cohort_year,
    order_month,
    total_orders,
    user_count
FROM cohort_totals;

|   | cohort_year | order_month | total_orders | user_count |
|---|-------------|-------------|--------------|------------|
| 0 | 2015 | 2015-01-01 00:00:00-08:00 | 200 | 200 |
| 1 | 2015 | 2015-02-01 00:00:00-08:00 | 292 | 291 |
| 2 | 2015 | 2015-03-01 00:00:00-08:00 | 139 | 139 |
| 3 | 2015 | 2015-04-01 00:00:00-07:00 | 78 | 78 |
| 4 | 2015 | 2015-05-01 00:00:00-07:00 | 236 | 236 |
| ... | ... | ... | ... | ... |
| 107 | 2023 | 2023-12-01 00:00:00-08:00 | 1500 | 1484 |
| 108 | 2024 | 2024-01-01 00:00:00-08:00 | 1353 | 1340 |
| 109 | 2024 | 2024-02-01 00:00:00-08:00 | 1749 | 1718 |
| 110 | 2024 | 2024-03-01 00:00:00-08:00 | 884 | 877 |


4. Assign a row number ordered by total orders to find the cohort-months with the most orders.  
   - Define a CTE `cohort_totals` to calculate total orders and user count by cohort year and order month.  
        - Use `DATE_TRUNC('month', orderdate)` to group data by month in the CTE.  
        - Calculate `total_orders` using `COUNT(orderkey)` and `user_count` using `COUNT(DISTINCT customerkey)` in the CTE.  
        - Group the CTE results by `cohort_year` and `order_month` to summarize the data at the cohort and monthly level.  
   - In the main query:
        - 🔔 Assign a rank to each cohort-month combination using `ROW_NUMBER() OVER (ORDER BY total_orders DESC)`.  
        - Select `cohort_year`, `order_month`, `total_orders`, `user_count`, and `row_number_rank` to output ranked results based on total orders.  

In [119]:
%%sql

WITH cohort_totals AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month',orderdate) AS order_month,
        COUNT(orderkey) AS total_orders,
        COUNT(DISTINCT customerkey) AS user_count
    FROM (
        SELECT
            EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
            customerkey,
            orderdate,
            orderkey
        FROM sales
        GROUP BY
            customerkey,
            orderdate,
            orderkey
    ) cohort_analysis
    GROUP BY 
        cohort_year, 
        order_month
)
SELECT
    cohort_year,
    order_month,
    total_orders,
    user_count,
    ROW_NUMBER() OVER (ORDER BY total_orders DESC) AS row_number_rank -- Added
FROM cohort_totals;

|   | cohort_year | order_month | total_orders | user_count | row_number_rank |
|---|-------------|-------------|--------------|------------|-----------------|
| 0 | 2022 | 2022-12-01 00:00:00-08:00 | 1987 | 1960 | 1 |
| 1 | 2023 | 2023-02-01 00:00:00-08:00 | 1979 | 1946 | 2 |
| 2 | 2022 | 2022-02-01 00:00:00-08:00 | 1887 | 1871 | 3 |
| 3 | 2022 | 2022-06-01 00:00:00-07:00 | 1773 | 1741 | 4 |
| 4 | 2022 | 2022-09-01 00:00:00-07:00 | 1755 | 1731 | 5 |
| ... | ... | ... | ... | ... | ... |
| 107 | 2020 | 2020-11-01 00:00:00-07:00 | 156 | 156 | 108 |
| 108 | 2015 | 2015-03-01 00:00:00-08:00 | 139 | 139 | 109 |
| 109 | 2016 | 2016-04-01 00:00:00-07:00 | 123 | 123 | 110 |
| 110 | 2017 | 2017-04-01 00:00:00-07:00 | 123 | 123 | 111 |
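
The query above ranks all cohort-months against each other. If the aim is instead to rank months *within* each cohort, as the analysis goals describe, adding `PARTITION BY cohort_year` restarts the numbering for every cohort. The sketch below is one possible way to do that; it uses Python's built-in `sqlite3` module with an in-memory SQLite database (not the course's PostgreSQL database), a hypothetical pre-aggregated `cohort_totals` table, and four rows copied from the outputs above with `order_month` simplified to a text date.

```python
import sqlite3

# Four cohort-month rows taken from the outputs above (order_month as text).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cohort_totals (cohort_year INTEGER, order_month TEXT, total_orders INTEGER)"
)
conn.executemany(
    "INSERT INTO cohort_totals VALUES (?, ?, ?)",
    [
        (2022, "2022-02-01", 1887),
        (2022, "2022-12-01", 1987),
        (2023, "2023-02-01", 1979),
        (2023, "2023-12-01", 1500),
    ],
)

cohort_ranks = conn.execute(
    """
    SELECT
        cohort_year,
        order_month,
        total_orders,
        -- One numbering across all cohort-months
        ROW_NUMBER() OVER (ORDER BY total_orders DESC) AS overall_rank,
        -- Numbering restarts at 1 inside each cohort
        ROW_NUMBER() OVER (
            PARTITION BY cohort_year
            ORDER BY total_orders DESC
        ) AS rank_in_cohort
    FROM cohort_totals
    ORDER BY overall_rank
    """
).fetchall()

for row in cohort_ranks:
    print(row)
# (2022, '2022-12-01', 1987, 1, 1)
# (2023, '2023-02-01', 1979, 2, 1)
# (2022, '2022-02-01', 1887, 3, 2)
# (2023, '2023-12-01', 1500, 4, 2)
```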


<img src="../Resources/images/3.3_monthly_orders_cohort.png" alt="Monthly orders by cohort" width="50%">

---
## RANK

### 📝 Notes

`RANK`

- **RANK**: Assigns the same rank to rows with identical values but skips ranks after ties (e.g., 1, 2, 2, 4).
- Syntax:
    ```sql
    RANK() OVER(
         PARTITION BY partition_expression
         ORDER BY column_name
    ) AS window_column_alias
    ```

### 🔑 Key Concepts
- **📊 Business Terms**: 
  - Customer Order Ranking: Position of a customer based on their order volume or value
  - Tied Rankings: When multiple customers share the same rank due to identical metrics
  - Order Volume: Total number of orders placed by a customer
- **💡 Why It Matters**: Identifies high-volume customers while preserving tied rankings
- **🎯 Common Use Cases**: Customer segmentation, identifying top customers
- **📈 Related KPIs**: Customer order volume, tier distribution

### 📈 Analysis

- Rank customers by their total number of orders.
- Track customer ordering behavior over time, grouping by the year of first purchase (cohort year) and aggregating orders and unique users by month.


#### Rank Customers Order Quantity

**`RANK`**

1. By customer, assign a rank to the total orders each customer made (same as the previous example, but with `RANK`).  
   - Use `COUNT(orderkey)` to calculate the total number of orders for each customer.  
   - Group by `customerkey` to ensure the order count is calculated for each individual customer.  
   - 🔔 Use `RANK() OVER (ORDER BY COUNT(orderkey) DESC)` to rank customers by their total orders in descending order. Customers with the same count share a rank, and the ranks that follow a tie are skipped.  
   - Select `customerkey`, `total_orders`, and the rank (`rank_rank`) in the output.  

In [82]:
%%sql
SELECT 
    customerkey,
    COUNT(orderkey) AS total_orders,
    RANK() OVER (ORDER BY COUNT(orderkey) DESC) AS rank_rank
FROM sales
GROUP BY customerkey;


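|   | customerkey | total_orders | rank_rank |
|---|-------------|--------------|-----------|
| 0 | 1834524 | 31 | 1 |
| 1 | 1375597 | 30 | 2 |
| 2 | 249557 | 27 | 3 |
| 3 | 1495941 | 26 | 4 |
| 4 | 459519 | 26 | 4 |
| ... | ... | ... | ... |
| 49482 | 1603362 | 1 | 39985 |
| 49483 | 618460 | 1 | 39985 |
| 49484 | 1313599 | 1 | 39985 |
| 49485 | 1842437 | 1 | 39985 |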
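
The gaps that `RANK()` leaves after ties are easier to see on a handful of rows. The sketch below reproduces the `1, 2, 2, 4` pattern from the notes using Python's built-in `sqlite3` module and an in-memory SQLite database (an assumption for a self-contained example, not the course's PostgreSQL database); the pre-aggregated `order_counts` table and its values are hypothetical.

```python
import sqlite3

# Hypothetical toy table: customers 2 and 3 tie on total_orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_counts (customerkey INTEGER, total_orders INTEGER)")
conn.executemany(
    "INSERT INTO order_counts VALUES (?, ?)",
    [(1, 3), (2, 2), (3, 2), (4, 1)],
)

ranked = conn.execute(
    """
    SELECT
        customerkey,
        total_orders,
        RANK() OVER (ORDER BY total_orders DESC) AS rank_rank
    FROM order_counts
    ORDER BY rank_rank, customerkey
    """
).fetchall()

print(ranked)  # [(1, 3, 1), (2, 2, 2), (3, 2, 2), (4, 1, 4)]
```

Rank 3 is never assigned: the customer after the tie gets rank 4 because three customers are ahead of it.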


#### Monthly Ranking of Orders by Cohort

**`RANK`**

1. Use the same query as before but use `RANK` to assign a rank to the cohort's total orders.
   - Define a CTE `cohort_totals` to calculate total orders and user count by cohort year and order month.  
        - Use `DATE_TRUNC('month', orderdate)` to group data by month in the CTE.  
        - Calculate `total_orders` using `COUNT(orderkey)` and `user_count` using `COUNT(DISTINCT customerkey)` in the CTE.  
        - Group the CTE results by `cohort_year` and `order_month` to summarize the data at the cohort and monthly level.  
   - In the main query: 
        - 🔔 Assign a rank to each cohort-month combination using `RANK() OVER (ORDER BY total_orders DESC)`.  
        - Select `cohort_year`, `order_month`, `total_orders`, `user_count`, and `rank_rank` to output ranked results based on total orders.  

In [121]:
%%sql

WITH cohort_totals AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month',orderdate) AS order_month,
        COUNT(orderkey) AS total_orders,
        COUNT(DISTINCT customerkey) AS user_count
    FROM (
        SELECT
            EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
            customerkey,
            orderdate,
            orderkey
        FROM sales
        GROUP BY
            customerkey,
            orderdate,
            orderkey
    ) cohort_analysis
    GROUP BY cohort_year, order_month
)

SELECT
    cohort_year,
    order_month,
    total_orders,
    user_count,
    RANK() OVER (ORDER BY total_orders DESC) AS rank_rank -- Updated
FROM cohort_totals;

|   | cohort_year | order_month | total_orders | user_count | rank_rank |
|---|-------------|-------------|--------------|------------|-----------|
| 0 | 2022 | 2022-12-01 00:00:00-08:00 | 1987 | 1960 | 1 |
| 1 | 2023 | 2023-02-01 00:00:00-08:00 | 1979 | 1946 | 2 |
| 2 | 2022 | 2022-02-01 00:00:00-08:00 | 1887 | 1871 | 3 |
| 3 | 2022 | 2022-06-01 00:00:00-07:00 | 1773 | 1741 | 4 |
| 4 | 2022 | 2022-09-01 00:00:00-07:00 | 1755 | 1731 | 5 |
| ... | ... | ... | ... | ... | ... |
| 107 | 2020 | 2020-11-01 00:00:00-07:00 | 156 | 156 | 108 |
| 108 | 2015 | 2015-03-01 00:00:00-08:00 | 139 | 139 | 109 |
| 109 | 2016 | 2016-04-01 00:00:00-07:00 | 123 | 123 | 110 |
| 110 | 2017 | 2017-04-01 00:00:00-07:00 | 123 | 123 | 110 |


---
## DENSE_RANK

### 📝 Notes

`DENSE_RANK`

- **DENSE_RANK**: Similar to RANK(), it assigns the same rank to rows with identical values but does not skip ranks after ties (e.g., 1, 2, 2, 3).
- Syntax:
    ```sql
    DENSE_RANK() OVER(
         PARTITION BY partition_expression
         ORDER BY column_name
    ) AS window_column_alias
    ```

### 🔑 Key Concepts
- **📊 Business Terms**: 
  - Continuous Customer Ranking: Ranking system without gaps, even when ties exist
  - Order Volume Tiers: Groupings of customers based on their order quantities
  - Customer Segmentation: Process of dividing customers into groups based on similar characteristics
- **💡 Why It Matters**: Creates consecutive rankings for customer segmentation without gaps
- **🎯 Common Use Cases**: Customer tiering, continuous rank analysis
- **📈 Related KPIs**: Customer tier metrics, order volume distribution

### 📈 Analysis

- Rank customers by their total number of orders.
- Track customer ordering behavior over time, grouping by the year of first purchase (cohort year) and aggregating orders and unique users by month.



#### Rank Customers Order Quantity

**`DENSE_RANK`**

1. By customer, assign a rank to the total orders each customer made (same as the previous example, but with `DENSE_RANK`).  
   - Use `COUNT(orderkey)` to calculate the total number of orders for each customer.  
   - Group by `customerkey` to ensure the order count is calculated for each individual customer.  
   - 🔔 Use `DENSE_RANK() OVER (ORDER BY COUNT(orderkey) DESC)` to rank customers by their total orders in descending order. Customers with the same count share a rank, and no ranks are skipped after a tie.  
   - Select `customerkey`, `total_orders`, and the rank (`dense_rank`) in the output.  

In [120]:
%%sql
SELECT 
    customerkey,
    COUNT(orderkey) AS total_orders,
    DENSE_RANK() OVER (ORDER BY COUNT(orderkey) DESC) AS dense_rank
FROM sales
GROUP BY customerkey;


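|   | customerkey | total_orders | dense_rank |
|---|-------------|--------------|------------|
| 0 | 1834524 | 31 | 1 |
| 1 | 1375597 | 30 | 2 |
| 2 | 249557 | 27 | 3 |
| 3 | 1495941 | 26 | 4 |
| 4 | 459519 | 26 | 4 |
| ... | ... | ... | ... |
| 49482 | 1603362 | 1 | 28 |
| 49483 | 618460 | 1 | 28 |
| 49484 | 1313599 | 1 | 28 |
| 49485 | 1842437 | 1 | 28 |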
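
Because `DENSE_RANK()` leaves no gaps, its value works as a tier number: the output above ends at 28, meaning there are 28 distinct order counts. A typical use is to keep every customer in the top N tiers. The sketch below shows one possible way to do this, with Python's built-in `sqlite3` module and an in-memory SQLite database instead of the course's PostgreSQL database; the `order_counts` table, its values, and the cut-off of two tiers are hypothetical.

```python
import sqlite3

# Hypothetical toy table: customers 2 and 3 tie on total_orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_counts (customerkey INTEGER, total_orders INTEGER)")
conn.executemany(
    "INSERT INTO order_counts VALUES (?, ?)",
    [(1, 3), (2, 2), (3, 2), (4, 1)],
)

top_tiers = conn.execute(
    """
    WITH ranked AS (
        SELECT
            customerkey,
            total_orders,
            DENSE_RANK() OVER (ORDER BY total_orders DESC) AS dense_rank_rank
        FROM order_counts
    )
    -- Window functions cannot be filtered in WHERE directly, hence the CTE
    SELECT customerkey, total_orders, dense_rank_rank
    FROM ranked
    WHERE dense_rank_rank <= 2
    ORDER BY dense_rank_rank, customerkey
    """
).fetchall()

print(top_tiers)  # [(1, 3, 1), (2, 2, 2), (3, 2, 2)]
```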


#### Monthly Ranking of Orders by Cohort

**`DENSE_RANK`**

1. Use the same query as before but use `DENSE_RANK` to assign a rank to the cohort's total orders.
   - Define a CTE `cohort_totals` to calculate total orders and user count by cohort year and order month.  
        - Use `DATE_TRUNC('month', orderdate)` to group data by month in the CTE.  
        - Calculate `total_orders` using `COUNT(orderkey)` and `user_count` using `COUNT(DISTINCT customerkey)` in the CTE.  
        - Group the CTE results by `cohort_year` and `order_month` to summarize the data at the cohort and monthly level.  
   - In the main query:
        - 🔔 Assign a rank to each cohort-month combination using `DENSE_RANK() OVER (ORDER BY total_orders DESC)`.  
        - Select `cohort_year`, `order_month`, `total_orders`, `user_count`, and `dense_rank` to output ranked results based on total orders.  

In [118]:
%%sql

WITH cohort_totals AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month',orderdate) AS order_month,
        COUNT(orderkey) AS total_orders,
        COUNT(DISTINCT customerkey) AS user_count
    FROM (
        SELECT
            EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
            customerkey,
            orderdate,
            orderkey
        FROM sales
        GROUP BY
            customerkey,
            orderdate,
            orderkey
    ) cohort_analysis
    GROUP BY cohort_year, order_month
)

SELECT
    cohort_year,
    order_month,
    total_orders,
    user_count,
    DENSE_RANK() OVER (ORDER BY total_orders DESC) AS dense_rank -- Updated 
FROM cohort_totals;

|   | cohort_year | order_month | total_orders | user_count | dense_rank |
|---|-------------|-------------|--------------|------------|------------|
| 0 | 2022 | 2022-12-01 00:00:00-08:00 | 1987 | 1960 | 1 |
| 1 | 2023 | 2023-02-01 00:00:00-08:00 | 1979 | 1946 | 2 |
| 2 | 2022 | 2022-02-01 00:00:00-08:00 | 1887 | 1871 | 3 |
| 3 | 2022 | 2022-06-01 00:00:00-07:00 | 1773 | 1741 | 4 |
| 4 | 2022 | 2022-09-01 00:00:00-07:00 | 1755 | 1731 | 5 |
| ... | ... | ... | ... | ... | ... |
| 107 | 2020 | 2020-11-01 00:00:00-07:00 | 156 | 156 | 102 |
| 108 | 2015 | 2015-03-01 00:00:00-08:00 | 139 | 139 | 103 |
| 109 | 2016 | 2016-04-01 00:00:00-07:00 | 123 | 123 | 104 |
| 110 | 2017 | 2017-04-01 00:00:00-07:00 | 123 | 123 | 104 |


### 💡 What's the difference between `ROW_NUMBER()`, `RANK()`, and `DENSE_RANK()`?

1. `ROW_NUMBER()` 
    - Even if two rows have the same value, they will get different, consecutive ranks.
    - Example: If three products have the same sales amount, they’ll be ranked 1, 2, and 3 in sequence.

    | Sales | ROW_NUMBER() |
    |-------|--------------|
    | 500   | 1            |
    | 500   | 2            |
    | 400   | 3            |
    | 300   | 4            |  
  

2. `RANK()`
    - Rows with identical values receive the same rank, and the next rank skips ahead by the number of tied rows, leaving a gap.
    - Example: If three products have the same highest sales amount, they all get rank 1, and the next product will get rank 4.

    | Sales | RANK()       |
    |-------|--------------|
    | 500   | 1            |
    | 500   | 1            |
    | 400   | 3            |
    | 300   | 4            |


3. `DENSE_RANK()`
    - Rows with identical values receive the same rank, and the next rank continues sequentially without gaps.
    - Example: If three products have the same highest sales amount, they all get rank 1, and the next product will get rank 2.

    | Sales | DENSE_RANK() |
    |-------|--------------|
    | 500   | 1            |
    | 500   | 1            |
    | 400   | 2            |
    | 300   | 3            |

**Alternative note format**

- Same info as above but in a different format. 

| Function     | Description                                                                                | Tie Handling                         | Ranks for Sales Values (500, 500, 400, 300) |
|--------------|--------------------------------------------------------------------------------------------|--------------------------------------|---------------------------------------------|
| ROW_NUMBER() | Assigns a unique, sequential rank to each row without regard for ties.                     | No ties; each row gets a unique rank | 1, 2, 3, 4                                  |
| RANK()       | Assigns the same rank to identical values but skips ranks after ties.                      | Same rank for ties; skips next ranks | 1, 1, 3, 4                                  |
| DENSE_RANK() | Assigns the same rank to identical values and continues sequentially without skipping ranks. | Same rank for ties; no skipped ranks | 1, 1, 2, 3                                  |
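
The sales values from the tables above (500, 500, 400, 300) can be run through all three functions at once. The sketch below does this with Python's built-in `sqlite3` module and an in-memory SQLite database, which is an assumption made to keep the example self-contained rather than the course's PostgreSQL setup; the `product_sales` table and its `id` column are hypothetical, with `id` used only as a tie-breaker so that `ROW_NUMBER()` is deterministic.

```python
import sqlite3

# Hypothetical table holding the example sales values 500, 500, 400, 300.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_sales (id INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO product_sales VALUES (?, ?)",
    [(1, 500), (2, 500), (3, 400), (4, 300)],
)

comparison = conn.execute(
    """
    SELECT
        sales,
        ROW_NUMBER() OVER (ORDER BY sales DESC, id) AS row_number_rank,
        RANK()       OVER (ORDER BY sales DESC)     AS rank_rank,
        DENSE_RANK() OVER (ORDER BY sales DESC)     AS dense_rank_rank
    FROM product_sales
    ORDER BY row_number_rank
    """
).fetchall()

for row in comparison:
    print(row)
# (500, 1, 1, 1)
# (500, 2, 1, 1)
# (400, 3, 3, 2)
# (300, 4, 4, 3)
```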

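Apply all three functions to our earlier example of total orders per customer. Note that `total_orders` below uses `COUNT(DISTINCT orderkey)` (unique orders), while the three ranking columns sort by `COUNT(orderkey)` (every `sales` row), so `total_orders` does not strictly decrease as the rank increases.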

In [111]:
%%sql
SELECT 
    customerkey,
    COUNT(DISTINCT orderkey) AS total_orders,
    ROW_NUMBER() OVER (ORDER BY COUNT(orderkey) DESC) AS row_number_rank,
    RANK() OVER (ORDER BY COUNT(orderkey) DESC) AS rank_rank,
    DENSE_RANK() OVER (ORDER BY COUNT(orderkey) DESC) AS dense_rank_rank
FROM sales
GROUP BY customerkey;


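|   | customerkey | total_orders | row_number_rank | rank_rank | dense_rank_rank |
|---|-------------|--------------|-----------------|-----------|-----------------|
| 0 | 1834524 | 8 | 1 | 1 | 1 |
| 1 | 1375597 | 7 | 2 | 2 | 2 |
| 2 | 249557 | 9 | 3 | 3 | 3 |
| 3 | 1495941 | 7 | 4 | 4 | 4 |
| 4 | 459519 | 6 | 5 | 4 | 4 |
| ... | ... | ... | ... | ... | ... |
| 49482 | 1407634 | 1 | 49483 | 39985 | 28 |
| 49483 | 1381 | 1 | 49484 | 39985 | 28 |
| 49484 | 1407353 | 1 | 49485 | 39985 | 28 |
| 49485 | 1747703 | 1 | 49486 | 39985 | 28 |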
