# Rank Based Window Functions

### Introduction

So far, we have seen how to create running totals with window functions.  We did so by adding an `ORDER BY` clause inside the window definition.

```sql
SELECT date, store_nbr, transactions,
SUM(transactions) OVER (PARTITION BY date ORDER BY transactions DESC) as running_total
FROM store_transactions
```

In this lesson, we'll see that SQL has various rank-based functions that calculate an item's rank within a window.

### Loading our data

Let's again use the data from the [Favorita Kaggle competition](https://www.kaggle.com/c/favorita-grocery-sales-forecasting/data).

We begin by reading this data from a CSV file.

In [1]:
import pandas as pd
url = "https://raw.githubusercontent.com/data-eng-10-21/window-functions/main/favorita_transactions.csv"
df = pd.read_csv(url)
df[:2]

Unnamed: 0,id,date,store_nbr,transactions
0,0,2013-01-01,25,770
1,1,2013-01-02,1,2111


Then we can load this data into a SQLite database.

In [2]:
import sqlite3
conn = sqlite3.connect('grocery.db')

In [3]:
df.to_sql('store_transactions', conn, index=False, if_exists='replace')

### Rank based functions

We previously saw how to calculate a running total.  Here it is again, this time partitioned by store.

In [7]:
query = """SELECT date, store_nbr, transactions,
SUM(transactions) OVER (PARTITION BY store_nbr ORDER BY transactions DESC) as running_total
FROM store_transactions
LIMIT 4"""

pd.read_sql(query, conn)

Unnamed: 0,date,store_nbr,transactions,running_total
0,2016-12-23,1,3023,3023
1,2014-12-23,1,2861,5884
2,2013-12-23,1,2848,8732
3,2013-12-24,1,2844,11576


Sometimes, we may also want to see how well a given day performed for a particular store.  We can do so with the `RANK` function.

In [12]:
query = """SELECT date, store_nbr, transactions, 
RANK() OVER (PARTITION BY store_nbr ORDER BY transactions DESC) as rank
FROM store_transactions
LIMIT 4"""

pd.read_sql(query, conn)

Unnamed: 0,date,store_nbr,transactions,rank
0,2016-12-23,1,3023,1
1,2014-12-23,1,2861,2
2,2013-12-23,1,2848,3
3,2013-12-24,1,2844,4


So by placing `RANK()` before the `OVER` clause, SQL returns each row's rank within the specified partition, in the specified order.  The rank restarts at one with each partition.
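
To see the restart at each partition, here is a minimal sketch on a tiny in-memory table. The table name `toy_transactions` and its five rows are made up for illustration (they are not the Favorita data), and the sketch assumes the SQLite bundled with Python is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# In-memory database with a small, made-up table (not the Favorita data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_transactions (store_nbr INTEGER, date TEXT, transactions INTEGER)"
)
conn.executemany(
    "INSERT INTO toy_transactions VALUES (?, ?, ?)",
    [
        (1, "2013-01-02", 2111),
        (1, "2013-01-03", 1833),
        (2, "2013-01-02", 2358),
        (2, "2013-01-03", 2033),
        (2, "2013-01-04", 2066),
    ],
)

# Rank each store's days from busiest to quietest.
query = """SELECT store_nbr, date, transactions,
RANK() OVER (PARTITION BY store_nbr ORDER BY transactions DESC) AS store_rank
FROM toy_transactions
ORDER BY store_nbr, store_rank"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The last column counts 1, 2 for store 1 and then starts over at 1 for store 2.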

Similar to `RANK`, we can use the `NTILE` function to divide the rows of each window into a given number of ranked buckets.  With 100 buckets, the bucket number approximates a row's percentile.  Let's see how this works.

In [31]:
query = """SELECT date, store_nbr, transactions, 
NTILE(100) OVER (PARTITION BY date ORDER BY transactions DESC) as percentile
FROM store_transactions
LIMIT 8"""

pd.read_sql(query, conn)

Unnamed: 0,date,store_nbr,transactions,percentile
0,2013-01-01,25,770,1
1,2013-01-02,46,4886,1
2,2013-01-02,44,4821,2
3,2013-01-02,45,4208,3
4,2013-01-02,47,4161,4
5,2013-01-02,11,3547,5
6,2013-01-02,3,3487,6
7,2013-01-02,48,3397,7


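The first row is the only store recorded for 2013-01-01, so it lands in bucket 1 of its own partition.  For 2013-01-02, store 46 is in bucket 1, store 44 is in bucket 2, and so on.  Note that the bucket number only approximates a percentile when a partition has at least 100 rows.  The output suggests each date has fewer than 100 stores, so here every row gets its own bucket and the value is effectively the row's position for that date.

Notice that when using the `NTILE` function we need to specify how many buckets to divide each partition into.  Above, we pass 100, so each partition is divided into up to 100 buckets.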

```sql
NTILE(100) OVER (PARTITION BY date ORDER BY transactions DESC) as percentile
```
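
As a rough check of how `NTILE(100)` behaves on a small partition, the sketch below loads five rows copied from the 2013-01-02 output above into a made-up table named `toy_day` and compares the bucket number with `ROW_NUMBER()`. It assumes SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# toy_day is a hypothetical table holding five rows copied from the output above.
conn.execute("CREATE TABLE toy_day (store_nbr INTEGER, transactions INTEGER)")
conn.executemany(
    "INSERT INTO toy_day VALUES (?, ?)",
    [(46, 4886), (44, 4821), (45, 4208), (47, 4161), (11, 3547)],
)

# Ask for 100 buckets even though the window only has five rows.
query = """SELECT store_nbr, transactions,
NTILE(100) OVER (ORDER BY transactions DESC) AS bucket,
ROW_NUMBER() OVER (ORDER BY transactions DESC) AS row_num
FROM toy_day
ORDER BY transactions DESC"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With fewer rows than buckets, each row sits alone in its bucket, so `bucket` and `row_num` match on every row.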

But we can also divide each partition into 20 buckets.

In [37]:
query = """SELECT date, store_nbr, transactions, 
NTILE(20) OVER (PARTITION BY date ORDER BY transactions DESC) as by_twenty
FROM store_transactions
LIMIT 8"""

pd.read_sql(query, conn)

Unnamed: 0,date,store_nbr,transactions,by_twenty
0,2013-01-01,25,770,1
1,2013-01-02,46,4886,1
2,2013-01-02,44,4821,1
3,2013-01-02,45,4208,1
4,2013-01-02,47,4161,2
5,2013-01-02,11,3547,2
6,2013-01-02,3,3487,2
7,2013-01-02,48,3397,3


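So this time, the rows numbered one are roughly the top 5 percent of stores for that date, and the rows numbered two fall roughly between the top 5 and 10 percent.  The split is only approximate, because the number of stores on a date is generally not a multiple of 20.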
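
In the output above, buckets 1 and 2 each hold three rows.  That fits how `NTILE` handles a row count that does not divide evenly: the larger buckets come first.  The sketch below assumes a hypothetical date with 46 stores and made-up transaction counts in a table named `toy_day`, and counts how many rows land in each of the 20 buckets.  It assumes SQLite 3.25 or newer.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_day (store_nbr INTEGER, transactions INTEGER)")
# 46 hypothetical stores with distinct, made-up transaction counts.
conn.executemany(
    "INSERT INTO toy_day VALUES (?, ?)",
    [(n, 1000 + 10 * n) for n in range(1, 47)],
)

query = """SELECT store_nbr,
NTILE(20) OVER (ORDER BY transactions DESC) AS by_twenty
FROM toy_day"""

# Count how many rows were assigned to each bucket.
bucket_sizes = Counter(bucket for _, bucket in conn.execute(query))
for bucket in sorted(bucket_sizes):
    print(bucket, bucket_sizes[bucket])
```

Since 46 = 20 × 2 + 6, the first six buckets receive three rows and the remaining fourteen receive two, so bucket 1 covers a little more than 5 percent of the rows.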

### Row Number vs Rank vs Dense Rank

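Finally, there are some slight differences between `ROW_NUMBER`, `RANK`, and `DENSE_RANK`, which show up when rows tie.  `ROW_NUMBER` gives every row a distinct sequential number, `RANK` gives tied rows the same rank and then skips ahead, and `DENSE_RANK` gives tied rows the same rank without leaving gaps.  This [Stack Overflow post](https://stackoverflow.com/questions/11183572/whats-the-difference-between-rank-and-dense-rank-functions-in-oracle) explains the differences.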

<img src="./stack-overflow-rank.png" width="80%">
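
We can also compare the three functions side by side.  The following is a small sketch on a made-up table named `toy_scores`, where two stores tie at 400 transactions; it assumes SQLite 3.25 or newer.  The extra `store_nbr` in the `ROW_NUMBER` ordering is only there to make the tie-break deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# toy_scores is a hypothetical table; stores 2 and 3 tie on purpose.
conn.execute("CREATE TABLE toy_scores (store_nbr INTEGER, transactions INTEGER)")
conn.executemany(
    "INSERT INTO toy_scores VALUES (?, ?)",
    [(1, 500), (2, 400), (3, 400), (4, 300)],
)

query = """SELECT store_nbr, transactions,
ROW_NUMBER() OVER (ORDER BY transactions DESC, store_nbr) AS row_num,
RANK() OVER (ORDER BY transactions DESC) AS rnk,
DENSE_RANK() OVER (ORDER BY transactions DESC) AS dense_rnk
FROM toy_scores
ORDER BY row_num"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For the last store, `row_num` is 4, `rnk` is 4 (rank 3 is skipped after the tie), and `dense_rnk` is 3.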

### Summary

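In this lesson, we saw how `RANK` orders the rows within a partition and how `NTILE` divides a partition into ranked buckets.  We'll practice working with some other rank-based functions in the following lab.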

### Resources

[Mode Window Functions - Row Number](https://mode.com/sql-tutorial/sql-window-functions/#row_number)

[SQL rank, denserank, rownumber](https://blog.jooq.org/sql-trick-row_number-is-to-select-what-dense_rank-is-to-select-distinct/)