# Window Functions

This is an in-depth guide on window functions' syntax, rules, and usage. It is mostly based on the great talks of Bruce Momjian, and is intended to be used as a reference when window functions are required.

## ToC
+ [Window Functions Definition](#Definition)
+ [Usage](#Usage)
+ [Window Syntax](#Window-Syntax)
+ [Window Specific Functions](#Window-Specific-Functions)
+ [Tutorial](#Tutorial)
+ [Examples](#Examples)



## Definition

> In SQL a window function or analytic function is a function which uses values from one or multiple rows to return a value for each row.

- **Each row retains its separate identity**. While aggregate functions collapse each group of rows into a single output row, window functions do not.
- `GROUP BY` is used with aggregate functions. A window (`OVER`) clause can be applied to aggregate, ranking, and analytic functions.

## Usage

Window functions can be used to answer a wide range of analytical questions:

1. Ranking and Ordering:
    - What is the rank of each item within its category?
    - How do the sales of each product compare to others in the same period?
1. Running Totals and Aggregations:
    - What is the cumulative total of sales for each day?
    - How does the running average of a variable change over time?
1. Comparisons and Percentiles:
    - What percentage of customers contribute to the top 80% of total revenue?
    - How does each employee's sales performance compare to their team average?
1. Lead and Lag Analysis:
    - What is the difference in revenue between consecutive months?
    - Which products are gaining or losing market share over time?
1. Time-based Analysis:
    - What is the rolling average of stock prices for the last 7 days?
    - How does the current value compare to the historical average at the same point in time?
1. Cumulative Sums with Conditions:
    - What is the cumulative sales for each product, with the total reset for each new month?
    - How many consecutive days has a user been active, considering gaps of up to 2 days?
1. Top-N Analysis:
    - What are the top 5 products with the highest sales in each category?
    - Who are the top-performing employees in each department?
1. Window Frame Analysis:
    - What is the average temperature for each day, considering the previous 3 days?
    - Which customers had a consecutive streak of purchases lasting at least 5 days?
1. Percent Change and Growth Rates:
    - What is the month-over-month growth rate of revenue?
    - How does the sales performance of each region change compared to the previous year?
1. Distribution Analysis:
    - What is the percentile rank of each employee's sales performance within the company?
    - How does the distribution of product prices vary within each category?
1. First and Last Values:
    - What is the first purchase date for each customer?
    - What is the latest status of each project?
1. Handling Ties:
    - How should tied ranks be handled when calculating percentiles or assigning ranks?
    - What is the strategy for breaking ties in ordering?
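
As a sketch of one of the questions above (the average temperature for each day, considering the previous 3 days), the following uses Python's `sqlite3` module with an in-memory database. The `temps` table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temps (day INTEGER, temp REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO temps VALUES (?, ?)",
    [(1, 10.0), (2, 12.0), (3, 14.0), (4, 16.0), (5, 18.0)],
)

# The frame is the current row plus up to 3 rows before it
# (fewer near the start, where earlier rows do not exist).
rolling = conn.execute("""
    SELECT day, temp,
           AVG(temp) OVER (ORDER BY day
                           ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS rolling_avg
    FROM temps
    ORDER BY day
""").fetchall()
conn.close()

for day, temp, rolling_avg in rolling:
    print(day, temp, rolling_avg)
```

Day 5 averages days 2 to 5 only, because day 1 has dropped out of the frame.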

## Window Syntax


### Window Definition
+ A window definition consists of three parts, all optional:
  - partition part: dividing the result set into smaller windows (partitions)
  - order by part: ordering the rows within each partition
  - frame part: defining a set of rows within a partition

``` postgresql
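func(expression) OVER { alias | ( window_definition ) }

-- a named window is declared in the query's WINDOW clause:
WINDOW alias AS ( window_definition )

-- window_definition:
[PARTITION BY ...]
[ORDER BY ...]
[
    { RANGE | ROWS | GROUPS }
    { frame_start | BETWEEN frame_start AND frame_end } [ frame_exclusion ]
]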
```
`func` can be one of the aggregate functions or one of the window-specific functions.

`frame_start` and `frame_end` can be:
- `UNBOUNDED PRECEDING`
- `offset PRECEDING`
- `CURRENT ROW`
- `offset FOLLOWING`
- `UNBOUNDED FOLLOWING`

`frame_exclusion` can be:
- `EXCLUDE CURRENT ROW`
- `EXCLUDE GROUP`
- `EXCLUDE TIES`
- `EXCLUDE NO OTHERS`
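
The effect of each `frame_exclusion` option is easiest to see side by side. The sketch below assumes a SQLite build that supports `EXCLUDE` (3.28 or later) and a made-up table `t`; every column sums `x` over the whole partition and differs only in the exclusion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (2,), (3,)])

# Same full-partition frame for every column; only the EXCLUDE option changes.
frame = "ORDER BY x ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"
options = ["NO OTHERS", "CURRENT ROW", "GROUP", "TIES"]
columns = ", ".join(f"SUM(x) OVER ({frame} EXCLUDE {opt})" for opt in options)

exclusion_rows = conn.execute(f"SELECT x, {columns} FROM t ORDER BY x").fetchall()
conn.close()

# Columns: x, NO OTHERS, CURRENT ROW, GROUP, TIES
for row in exclusion_rows:
    print(row)
```

For the rows with `x = 2`, `EXCLUDE GROUP` drops both tied rows, while `EXCLUDE TIES` drops only the other tied row and keeps the current one.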

### The Default Window Clause
``` postgresql
OVER()
```
which is the same as:
``` postgresql
OVER(RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
```
+ The set is a single partition (No `PARTITION BY`)
+ The set is unordered (All rows are peers of `CURRENT ROW`)
+ All frame rows are processed (No `EXCLUDE` clause)
+ `CURRENT ROW` includes all peers (`RANGE` mode, not `ROWS`)

### Terms

+ **Window**: The set of rows within the result set that the window function operates on. It defines the scope of the calculation performed by the window function. The window is determined by the combination of the `PARTITION BY` and `ORDER BY` clauses in the window function.
+ **Partition**: A partition is a way to divide the result set into smaller groups, where the window function will be independently applied to each partition. It is determined by the `PARTITION BY` clause.
+ **Window Frame**: The subset of rows within a partition that the window function uses for the current row. It is set by the frame clause (`ROWS`, `RANGE`, or `GROUPS`), which specifies the rows to include relative to the current row in the order given by the `ORDER BY` clause.
+ **`RANGE`** versus **`ROWS`**: Both are used to define the rows to include in the window frame, but they measure offsets differently:
    - `ROWS`: Offsets count physical rows before or after the current row.
    - `RANGE`: Offsets are differences in the value of the `ORDER BY` column, so the frame covers a range of values rather than a fixed number of physical rows.
    - **Handling of ties and duplicate values**: `ROWS` is based on the physical position of rows, so it treats tied rows individually, irrespective of the values in the ordering column. `RANGE` is based on the values of the ordering column, so it includes all rows with the same value.
+ **Peer Rows**: Rows that share the same values in the columns specified in the `ORDER BY` clause. Without `ORDER BY`, all rows of the partition are peers.
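
To make the three frame modes concrete, the sketch below counts the rows in the frame `BETWEEN 1 PRECEDING AND CURRENT ROW` under `ROWS`, `RANGE`, and `GROUPS`. It assumes SQLite 3.28 or later and a made-up table `t` containing both ties (1, 1 and 4, 4) and gaps (2 to 4, 4 to 7).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (1,), (2,), (4,), (4,), (7,)])

# One COUNT(*) column per frame mode, same frame bounds in each.
modes = ["ROWS", "RANGE", "GROUPS"]
columns = ", ".join(
    f"COUNT(*) OVER (ORDER BY x {mode} BETWEEN 1 PRECEDING AND CURRENT ROW)"
    for mode in modes
)

# Sorted in Python so that tied rows appear in a stable order.
frame_counts = sorted(conn.execute(f"SELECT x, {columns} FROM t").fetchall())
conn.close()

# Columns: x, ROWS count, RANGE count, GROUPS count
for row in frame_counts:
    print(row)
```

For `x = 7`, `ROWS` sees two rows (itself and the physical row before it), `RANGE` sees only itself (no value lies in 6 to 7 besides 7), and `GROUPS` sees three rows (its own peer group plus the preceding peer group 4, 4).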

## Window-Specific Functions

+ [`ROW_NUMBER`](#ROW_NUMBER)
+ [`LAG`](#LAG)
+ [`LEAD`](#LEAD)
+ [`FIRST_VALUE`](#FIRST_VALUE)
+ [`LAST_VALUE`](#LAST_VALUE)
+ [`NTH_VALUE`](#NTH_VALUE)
+ [`RANK`](#RANK)
+ [`DENSE_RANK`](#DENSE_RANK)
+ [`PERCENT_RANK`](#PERCENT_RANK)
+ [`CUME_DIST`](#CUME_DIST)
+ [`NTILE`](#NTILE)

### `ROW_NUMBER`

+ Assigns a sequential integer number to each row in the query's result set.
+ Takes no arguments and operates on partitions, not window frames.


#### Useful for
+ Assigning sequential numbers to a result set. Sometimes it is as simple as that.
+ Pagination: By assigning each row a sequential number, rows can be filtered by that value.
+ Finding the n-th highest value per group.
+ Finding the top-n values per group.
+ Finding duplicate rows: By partitioning over the attribute and then querying for rows with a row number > 1
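
The duplicate-finding use case above could look like the following sketch. The `users` table and its contents are hypothetical. The window function is computed in a subquery because window functions cannot be referenced in the `WHERE` clause of the same query level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")  # hypothetical
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [
        (1, "a@example.com"),
        (2, "b@example.com"),
        (3, "a@example.com"),
        (4, "a@example.com"),
        (5, "c@example.com"),
    ],
)

# Number the rows within each email; anything after the first is a duplicate.
duplicates = conn.execute("""
    SELECT id, email, rn
    FROM (
        SELECT id, email,
               ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
        FROM users
    )
    WHERE rn > 1
    ORDER BY id
""").fetchall()
conn.close()

print(duplicates)
```

Changing the filter to `rn <= n` turns the same pattern into a top-n query per group.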

#### Example: `ROW_NUMBER` over the entire set
``` sql
SELECT x, ROW_NUMBER() OVER w
FROM generate_1_to_5_x2
WINDOW w AS ();
```

```
 x | row_number
---+------------
 1 |          1
 1 |          2
 2 |          3
 2 |          4
 3 |          5
 3 |          6
 4 |          7
 4 |          8
 5 |          9
 5 |         10
```

#### Example: `ROW_NUMBER` over partitioned set
``` sql
SELECT x, ROW_NUMBER() OVER w
FROM generate_1_to_5_x2
WINDOW w AS (PARTITION BY x);
```

```
 x | row_number
---+------------
 1 |          1
 1 |          2
 2 |          1
 2 |          2
 3 |          1
 3 |          2
 4 |          1
 4 |          2
 5 |          1
 5 |          2
```

### `LAG`

+ Provides access to a row at a specified physical offset which comes before the current row.

#### Useful for
+ Calculating the difference between the current row and a previous one.

#### Example: `LAG`
``` sql
SELECT x, LAG(x, 1) OVER w
FROM generate_1_to_5_x2
WINDOW w AS (ORDER BY x);
```

```
 x |  lag
---+--------
 1 | (null)
 1 |      1
 2 |      1
 2 |      2
 3 |      2
 3 |      3
 4 |      3
 4 |      4
 5 |      4
 5 |      5


## Tutorial

This section mostly follows Bruce Momjian's [great presentation](https://momjian.us/main/writings/pgsql/window.pdf) explaining window functions.

### Tutorial Table #1

Generate a set containing the numbers from 1 to 10:

``` postgresql
SELECT * FROM generate_series(1, 10) AS f(x)
```

```
 x 
---
 1 
 2 
 3 
 4 
 5 
 6 
 7 
 8 
 9 
10 
```

### The Simplest Window Function

``` postgresql
SELECT x, SUM(x) OVER ()
FROM generate_series(1, 10) AS f(x);
```

```
 x | sum
---+-----
 1 |  55
 2 |  55
 3 |  55
 4 |  55
 5 |  55
 6 |  55
 7 |  55
 8 |  55
 9 |  55
10 |  55
```

### Two `OVER` Clauses

``` postgresql
SELECT x, COUNT(x) OVER (), SUM(x) OVER ()
FROM generate_series(1, 10) AS f(x);
```

```
 x | count | sum
---+-------+-----
 1 |  10   |  55
 2 |  10   |  55
 3 |  10   |  55
 4 |  10   |  55
 5 |  10   |  55
 6 |  10   |  55
 7 |  10   |  55
 8 |  10   |  55
 9 |  10   |  55
10 |  10   |  55
```

### Presented as a `WINDOW` Clause

``` postgresql
SELECT x, COUNT(x) OVER w, SUM(x) OVER w
FROM generate_series(1, 10) AS f(x)
WINDOW w AS ();
```

```
 x | count | sum
---+-------+-----
 1 |  10   |  55
 2 |  10   |  55
 3 |  10   |  55
 4 |  10   |  55
 5 |  10   |  55
 6 |  10   |  55
 7 |  10   |  55
 8 |  10   |  55
 9 |  10   |  55
10 |  10   |  55
```

This is the same as the following:
``` postgresql
SELECT x, COUNT(x) OVER w, SUM(x) OVER w
FROM generate_series(1, 10) AS f(x)
WINDOW w AS (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW);
```

If `ORDER BY` is not provided, the row order is undefined and every row in the partition is a peer of the current row. Because the default frame ends at `CURRENT ROW` in `RANGE` mode, which includes all peers, the window frame is the entire partition defined by the `PARTITION BY` clause (or the entire result set if there's no `PARTITION BY` clause). The window function therefore operates on all rows within the partition without considering any specific order.

### `ROWS` Instead of `RANGE`

``` postgresql
SELECT x, COUNT(x) OVER w, SUM(x) OVER w
FROM generate_series(1, 10) AS f(x)
WINDOW w AS (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW);
```

```
 x | count | sum
---+-------+-----
 1 |   1   |   1
 2 |   2   |   3
 3 |   3   |   6
 4 |   4   |  10
 5 |   5   |  15
 6 |   6   |  21
 7 |   7   |  28
 8 |   8   |  36
 9 |   9   |  45
10 |  10   |  55
```

Omitting the frame end produces the same result, because the frame end defaults to `CURRENT ROW`:
``` postgresql
SELECT x, COUNT(x) OVER w, SUM(x) OVER w
FROM generate_series(1, 10) AS f(x)
WINDOW w AS (ROWS UNBOUNDED PRECEDING);
```

### `RANGE` with `ORDER BY`

``` postgresql
SELECT x, COUNT(x) OVER w, SUM(x) OVER w
FROM generate_series(1, 10) AS f(x)
WINDOW w AS (ORDER BY x);
```

```
 x | count | sum
---+-------+-----
 1 |   1   |   1
 2 |   2   |   3
 3 |   3   |   6
 4 |   4   |  10
 5 |   5   |  15
 6 |   6   |  21
 7 |   7   |  28
 8 |   8   |  36
 9 |   9   |  45
10 |  10   |  55
```

`CURRENT ROW` peers are rows with equal values for `ORDER BY` columns, or all partition rows if `ORDER BY` is not specified.

The above will produce the same result if the default frame is explicitly specified:
```postgresql
(ORDER BY x RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
```

### `UNBOUNDED FOLLOWING`

``` postgresql
SELECT x, COUNT(x) OVER w, SUM(x) OVER w
FROM generate_series(1, 10) AS f(x)
WINDOW w AS (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING);
```

```
 x | count | sum
---+-------+-----
 1 |  10   |  55
 2 |   9   |  54
 3 |   8   |  52
 4 |   7   |  49
 5 |   6   |  45
 6 |   5   |  40
 7 |   4   |  34
 8 |   3   |  27
 9 |   2   |  19
10 |   1   |  10
```


### `PRECEDING`

``` postgresql
SELECT x, COUNT(x) OVER w, SUM(x) OVER w
FROM generate_series(1, 10) AS f(x)
WINDOW w AS (ROWS BETWEEN 1 PRECEDING AND CURRENT ROW);
```

```
 x | count | sum
---+-------+-----
 1 |   1   |   1
 2 |   2   |   3
 3 |   2   |   5
 4 |   2   |   7
 5 |   2   |   9
 6 |   2   |  11
 7 |   2   |  13
 8 |   2   |  15
 9 |   2   |  17
10 |   2   |  19
```

`PRECEDING` ignores nonexistent rows; they are not `NULL`s. In `RANGE` mode, offset `PRECEDING`/`FOLLOWING` includes the peer groups whose values lie within plus/minus the offset of the current row's value.


### `FOLLOWING`

``` postgresql
SELECT x, COUNT(x) OVER w, SUM(x) OVER w
FROM generate_series(1, 10) AS f(x)
WINDOW w AS (ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING);
```

```
 x | count | sum
---+-------+-----
 1 |   2   |   3
 2 |   2   |   5
 3 |   2   |   7
 4 |   2   |   9
 5 |   2   |  11
 6 |   2   |  13
 7 |   2   |  15
 8 |   2   |  17
 9 |   2   |  19
10 |   1   |  10
```

`FOLLOWING` likewise ignores nonexistent rows: the last row has no following row, so its frame contains only itself.

### `CURRENT ROW` Only

``` postgresql
SELECT x, COUNT(x) OVER w, SUM(x) OVER w
FROM generate_series(1, 10) AS f(x)
WINDOW w AS (ORDER BY x RANGE CURRENT ROW);
```

```
 x | count | sum
---+-------+-----
 1 |   1   |   1
 2 |   1   |   2
 3 |   1   |   3
 4 |   1   |   4
 5 |   1   |   5
 6 |   1   |   6
 7 |   1   |   7
 8 |   1   |   8
 9 |   1   |   9
10 |   1   |  10
```

Because the values are unique here, this is the same as `(ROWS CURRENT ROW)`.

### Tutorial Table #2

The next table contains duplicates.

``` postgresql
CREATE TABLE generate_1_to_5_x2 AS
    SELECT ceil(x/2.0) AS x
    FROM generate_series(1, 10) AS f(x);

SELECT * FROM generate_1_to_5_x2;
```

```
 x 
---
 1 
 1 
 2 
 2 
 3 
 3 
 4 
 4 
 5 
 5 
```

### Empty `WINDOW` is the Same

``` postgresql
SELECT x, COUNT(x) OVER w, SUM(x) OVER w
FROM generate_1_to_5_x2
WINDOW w AS ();
```

```
 x | count | sum
---+-------+-----
 1 |    10 |  30
 1 |    10 |  30
 2 |    10 |  30
 2 |    10 |  30
 3 |    10 |  30
 3 |    10 |  30
 4 |    10 |  30
 4 |    10 |  30
 5 |    10 |  30
 5 |    10 |  30
```

### `RANGE` With Duplicates

``` postgresql
SELECT x, COUNT(x) OVER w, SUM(x) OVER w
FROM generate_1_to_5_x2
WINDOW w AS (ORDER BY x);
```

```
 x | count | sum
---+-------+-----
 1 |     2 |   2
 1 |     2 |   2
 2 |     4 |   6
 2 |     4 |   6
 3 |     6 |  12
 3 |     6 |  12
 4 |     8 |  20
 4 |     8 |  20
 5 |    10 |  30
 5 |    10 |  30
```

This is the same as
``` postgresql
(ORDER BY x RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
```

Since the `CURRENT ROW` in `RANGE` mode means all peers having the same value, we get the result above.

### `RANGE` On `CURRENT ROW`

``` postgresql
SELECT x, COUNT(x) OVER w, SUM(x) OVER w
FROM generate_1_to_5_x2
WINDOW w AS (ORDER BY x RANGE CURRENT ROW);
```

```
 x | count | sum
---+-------+-----
 1 |     2 |   2
 1 |     2 |   2
 2 |     2 |   4
 2 |     2 |   4
 3 |     2 |   6
 3 |     2 |   6
 4 |     2 |   8
 4 |     2 |   8
 5 |     2 |  10
 5 |     2 |  10
```

`RANGE CURRENT ROW`, short for `RANGE BETWEEN CURRENT ROW AND CURRENT ROW`, here means the range from the first row that has the same value to the last row that has the same value.

### `PARTITION BY`

``` postgresql
SELECT x, COUNT(x) OVER w, SUM(x) OVER w
FROM generate_1_to_5_x2
WINDOW w AS (PARTITION BY x);
```

```
 x | count | sum
---+-------+-----
 1 |     2 |   2
 1 |     2 |   2
 2 |     2 |   4
 2 |     2 |   4
 3 |     2 |   6
 3 |     2 |   6
 4 |     2 |   8
 4 |     2 |   8
 5 |     2 |  10
 5 |     2 |  10
```

The results are the same as `RANGE CURRENT ROW` because the partition matches the window frame.


### `PARTITION BY` A Better Example

``` postgresql
SELECT x, COUNT(x) OVER w, SUM(x) OVER w
FROM generate_1_to_5_x2
WINDOW w AS (PARTITION BY x >= 3);
```

```
 x | count | sum
---+-------+-----
 1 |     4 |   6
 1 |     4 |   6
 2 |     4 |   6
 2 |     4 |   6
 3 |     6 |  24
 3 |     6 |  24
 4 |     6 |  24
 4 |     6 |  24
 5 |     6 |  24
 5 |     6 |  24
```



### `PARTITION BY` Plus `ORDER BY`

``` postgresql
SELECT x, COUNT(x) OVER w, SUM(x) OVER w
FROM generate_1_to_5_x2
WINDOW w AS (PARTITION BY x >= 3 ORDER BY x);
```

```
 x | count | sum
---+-------+-----
 1 |     2 |   2
 1 |     2 |   2
 2 |     4 |   6
 2 |     4 |   6
 3 |     2 |   6
 3 |     2 |   6
 4 |     4 |  14
 4 |     4 |  14
 5 |     6 |  24
 5 |     6 |  24
```

Again, with `ORDER BY` the default `RANGE` frame treats rows with the same value as peer rows, because the window clause is the same as the following:
``` postgresql
(PARTITION BY x >= 3 ORDER BY x RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
```

### `ROWS`

``` postgresql
SELECT x, COUNT(x) OVER w, SUM(x) OVER w
FROM generate_1_to_5_x2
WINDOW w AS (PARTITION BY x >= 3 ORDER BY x
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW);
```

```
 x | count | sum
---+-------+-----
 1 |     1 |   1
 1 |     2 |   2
 2 |     3 |   4
 2 |     4 |   6
 3 |     1 |   3
 3 |     2 |   6
 4 |     3 |  10
 4 |     4 |  14
 5 |     5 |  19
 5 |     6 |  24
```

## Examples

```python
import sqlite3
from tabulate import tabulate

conn = sqlite3.connect('window.db')
cursor = conn.cursor()
```

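### Example table overview

```python
result = cursor.execute("select * from emp")
tabulate(result.fetchall(), tablefmt="html")
```

Only the first 10 rows are shown here and in the longer outputs below; the table holds more rows, as the totals in later examples indicate.

```
    | name     | department | salary
----+----------+------------+--------
  1 | Anthony  | Marketing  |   2300
  2 | Detelina | Marketing  |   2200
  3 | Samuel   | Marketing  |   1500
  4 | Toby     | Operations |   1750
  5 | Kris     | Operations |   1800
  6 | Daniel   | Operations |   1900
  7 | Maria    | Operations |   2000
  8 | Pratel   | Operations |   1850
  9 | Katya    | Operations |   1500
 10 | Simeon   | Operations |   1800
```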


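#### Each row remains a separate entity when using window functions

With `GROUP BY`, the HR rows collapse into a single output row. SQLite accepts the bare `name` and `salary` columns here and takes their values from one row of the group; PostgreSQL would reject this query.

```python
sql_query = """
SELECT name, salary, SUM(salary)
FROM emp
WHERE department = 'HR'
GROUP BY department
ORDER BY salary DESC;
"""
result = cursor.execute(sql_query)
tabulate(result.fetchall(), tablefmt="html")
```

```
 name   | salary | sum
--------+--------+------
 Anelia |   2650 | 7800
```

With a window function, every row is kept and each one carries the same total:

```python
sql_query = """
SELECT name, salary, SUM(salary) OVER ()
FROM emp
WHERE department = 'HR'
ORDER BY salary DESC;
"""
result = cursor.execute(sql_query)
tabulate(result.fetchall(), tablefmt="html")
```

```
 name   | salary | sum
--------+--------+------
 Anelia |   2650 | 7800
 Anelia |   2650 | 7800
 Mario  |   2500 | 7800
```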


### Over

```python
sql_query = """
SELECT 
    name, 
    salary, 
    SUM(salary) OVER ()
FROM emp
ORDER BY salary DESC;
"""
result = cursor.execute(sql_query)
tabulate(result.fetchall(), tablefmt="html")
```

```
 name    | salary | sum
---------+--------+-------
 Natalia |   3500 | 53100
 Boris   |   3200 | 53100
 Tony    |   3100 | 53100
 Tobias  |   3000 | 53100
 Karen   |   2800 | 53100
 Petko   |   2700 | 53100
 Anelia  |   2650 | 53100
 Anelia  |   2650 | 53100
 Mario   |   2500 | 53100
 John    |   2400 | 53100
```


#### As Percentage

```python
sql_query = """
SELECT 
    name, 
    salary, 
    round(salary * 1.0 / SUM(salary) OVER () * 100, 2) AS pct
FROM emp
ORDER BY salary DESC;
"""
result = cursor.execute(sql_query)
tabulate(result.fetchall(), tablefmt="html")
```

```
 name    | salary | pct
---------+--------+------
 Natalia |   3500 | 6.59
 Boris   |   3200 | 6.03
 Tony    |   3100 | 5.84
 Tobias  |   3000 | 5.65
 Karen   |   2800 | 5.27
 Petko   |   2700 | 5.08
 Anelia  |   2650 | 4.99
 Anelia  |   2650 | 4.99
 Mario   |   2500 | 4.71
 John    |   2400 | 4.52
```

This is the same as:

``` sql
SELECT name, salary, 
       round(salary * 1.0 / (SELECT SUM(salary) FROM emp)  * 100, 2) AS pct
FROM emp
ORDER BY salary DESC;
```

### Cumulative Totals Using `ORDER BY`

```python
sql_query = """
SELECT 
    name, 
    salary,
    SUM(salary) OVER (ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM emp
ORDER BY salary DESC;
"""
result = cursor.execute(sql_query)
tabulate(result.fetchall(), tablefmt="html")
```

```
 name    | salary | sum
---------+--------+-------
 Natalia |   3500 |  3500
 Boris   |   3200 |  6700
 Tony    |   3100 |  9800
 Tobias  |   3000 | 12800
 Karen   |   2800 | 15600
 Petko   |   2700 | 18300
 Anelia  |   2650 | 20950
 Anelia  |   2650 | 23600
 Mario   |   2500 | 26100
 John    |   2400 | 28500
```

### Window `AVG`

```python
sql_query = """
SELECT 
    name, 
    salary,
    round(AVG(salary) OVER (), 2) AS avg
FROM emp
ORDER BY salary DESC;
"""
result = cursor.execute(sql_query)
tabulate(result.fetchall(), tablefmt="html")
```

```
 name    | salary | avg
---------+--------+--------
 Natalia |   3500 | 2308.7
 Boris   |   3200 | 2308.7
 Tony    |   3100 | 2308.7
 Tobias  |   3000 | 2308.7
 Karen   |   2800 | 2308.7
 Petko   |   2700 | 2308.7
 Anelia  |   2650 | 2308.7
 Anelia  |   2650 | 2308.7
 Mario   |   2500 | 2308.7
 John    |   2400 | 2308.7
```


### Difference Compared to Average

```python
sql_query = """
SELECT 
    name, 
    salary,
    round(AVG(salary) OVER (), 2) AS avg, 
    round(salary - AVG(salary) OVER (), 2) AS diff_avg
FROM emp
ORDER BY salary DESC;
"""
result = cursor.execute(sql_query)
tabulate(result.fetchall(), tablefmt="html")
```

```
 name    | salary | avg    | diff_avg
---------+--------+--------+----------
 Natalia |   3500 | 2308.7 |   1191.3
 Boris   |   3200 | 2308.7 |    891.3
 Tony    |   3100 | 2308.7 |    791.3
 Tobias  |   3000 | 2308.7 |    691.3
 Karen   |   2800 | 2308.7 |    491.3
 Petko   |   2700 | 2308.7 |    391.3
 Anelia  |   2650 | 2308.7 |    341.3
 Anelia  |   2650 | 2308.7 |    341.3
 Mario   |   2500 | 2308.7 |    191.3
 John    |   2400 | 2308.7 |     91.3
```


### Difference Compared to the Next Value

```python
sql_query = """
SELECT 
    name, 
    salary,
    salary - LEAD(salary, 1) OVER (ORDER BY salary DESC) AS diff_next
FROM emp
ORDER BY salary DESC;
"""
result = cursor.execute(sql_query)
tabulate(result.fetchall(), tablefmt="html")
```

```
 name    | salary | diff_next
---------+--------+-----------
 Natalia |   3500 |     300.0
 Boris   |   3200 |     100.0
 Tony    |   3100 |     100.0
 Tobias  |   3000 |     200.0
 Karen   |   2800 |     100.0
 Petko   |   2700 |      50.0
 Anelia  |   2650 |       0.0
 Anelia  |   2650 |     150.0
 Mario   |   2500 |     100.0
 John    |   2400 |     100.0
```


### Percentage Difference Compared to the Lowest-Paid Employee

```python
sql_query = """
SELECT 
    name, 
    salary,
    salary - LAST_VALUE(salary) OVER w AS more,
    round((salary - LAST_VALUE(salary) OVER w) * 1.0 / LAST_VALUE(salary) OVER w * 100) as pct_more
FROM emp
WINDOW w AS (ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY salary DESC;
"""
result = cursor.execute(sql_query)
tabulate(result.fetchall(), tablefmt="html")
```

```
 name    | salary | more | pct_more
---------+--------+------+----------
 Natalia |   3500 | 2000 |      133
 Boris   |   3200 | 1700 |      113
 Tony    |   3100 | 1600 |      107
 Tobias  |   3000 | 1500 |      100
 Karen   |   2800 | 1300 |       87
 Petko   |   2700 | 1200 |       80
 Anelia  |   2650 | 1150 |       77
 Anelia  |   2650 | 1150 |       77
 Mario   |   2500 | 1000 |       67
 John    |   2400 |  900 |       60
```
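
The explicit `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING` frame in the query above matters. With the default frame, which ends at `CURRENT ROW`, `LAST_VALUE` returns the current row's own value (or that of its last peer) rather than the last value of the partition. The sketch below contrasts the two on a made-up `pay` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pay (salary INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO pay VALUES (?)", [(3000,), (2000,), (1000,)])

last_value_rows = conn.execute("""
    SELECT salary,
           -- default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           LAST_VALUE(salary) OVER (ORDER BY salary DESC) AS default_frame,
           -- explicit frame covering the whole partition
           LAST_VALUE(salary) OVER (
               ORDER BY salary DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS full_frame
    FROM pay
    ORDER BY salary DESC
""").fetchall()
conn.close()

for row in last_value_rows:
    print(row)
```

Only the `full_frame` column yields the lowest salary on every row.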


### `RANK` and `DENSE_RANK`

```python
sql_query = """
SELECT 
    name, 
    salary,
    RANK() OVER s,
    DENSE_RANK() OVER s
FROM emp
WINDOW s AS (ORDER BY salary DESC)
ORDER BY salary DESC;
"""
result = cursor.execute(sql_query)
tabulate(result.fetchall(), tablefmt="html")
```

```
 name    | salary | rank | dense_rank
---------+--------+------+------------
 Natalia |   3500 |    1 |          1
 Boris   |   3200 |    2 |          2
 Tony    |   3100 |    3 |          3
 Tobias  |   3000 |    4 |          4
 Karen   |   2800 |    5 |          5
 Petko   |   2700 |    6 |          6
 Anelia  |   2650 |    7 |          7
 Anelia  |   2650 |    7 |          7
 Mario   |   2500 |    9 |          8
 John    |   2400 |   10 |          9
```
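
The other ranking-style functions treat ties in their own ways. The sketch below puts `ROW_NUMBER`, `RANK`, `DENSE_RANK`, `NTILE`, `CUME_DIST`, and `PERCENT_RANK` side by side over one window, using a made-up `scores` table with a single tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (score INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO scores VALUES (?)", [(50,), (40,), (40,), (30,)])

ranking = conn.execute("""
    SELECT score,
           ROW_NUMBER() OVER w,               -- always unique
           RANK()       OVER w,               -- ties share a rank, then a gap
           DENSE_RANK() OVER w,               -- ties share a rank, no gap
           NTILE(2)     OVER w,               -- two buckets of equal size
           CUME_DIST()  OVER w,               -- share of rows up to and including peers
           ROUND(PERCENT_RANK() OVER w, 2)    -- (rank - 1) / (rows - 1)
    FROM scores
    WINDOW w AS (ORDER BY score DESC)
""").fetchall()
conn.close()

# Sort by row number so the two tied rows print in a stable order.
ranking.sort(key=lambda row: row[1])
for row in ranking:
    print(row)
```

Note that `NTILE` splits by position, so the two tied rows can land in different buckets.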


### Average by Department

```python
sql_query = """
SELECT 
    name, 
    salary,
    department,
    round(AVG(salary) OVER (PARTITION BY department), 2) AS avg,
    round(salary - AVG(salary) OVER (PARTITION BY department), 2) AS diff_avg
FROM emp
ORDER BY department, salary DESC;
"""
result = cursor.execute(sql_query)
tabulate(result.fetchall(), tablefmt="html")
```

```
 name    | salary | department | avg  | diff_avg
---------+--------+------------+------+----------
 Tobias  |   3000 | Finance    | 2900 |      100
 Karen   |   2800 | Finance    | 2900 |     -100
 Anelia  |   2650 | HR         | 2600 |       50
 Anelia  |   2650 | HR         | 2600 |       50
 Mario   |   2500 | HR         | 2600 |     -100
 Petko   |   2700 | IT         | 2275 |      425
 John    |   2400 | IT         | 2275 |      125
 Dilyana |   2200 | IT         | 2275 |      -75
 Ivan    |   1800 | IT         | 2275 |     -475
 Tony    |   3100 | Legal      | 3100 |        0
```


### Compared to Next Salary in Department

```python
sql_query = """
SELECT 
    name, 
    department,
    salary,
    salary - LEAD(salary, 1) OVER (PARTITION BY department ORDER BY salary DESC) AS diff_next
FROM emp
ORDER BY department, salary DESC;
"""
result = cursor.execute(sql_query)
tabulate(result.fetchall(), tablefmt="html")
```

```
 name    | department | salary | diff_next
---------+------------+--------+-----------
 Tobias  | Finance    |   3000 |     200.0
 Karen   | Finance    |   2800 |
 Anelia  | HR         |   2650 |       0.0
 Anelia  | HR         |   2650 |     150.0
 Mario   | HR         |   2500 |
 Petko   | IT         |   2700 |     300.0
 John    | IT         |   2400 |     200.0
 Dilyana | IT         |   2200 |     400.0
 Ivan    | IT         |   1800 |
 Tony    | Legal      |   3100 |
```


### Departmental and Global Ranks

```python
sql_query = """
SELECT 
    name, 
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
    RANK() OVER (ORDER BY salary DESC) AS global_rank
FROM emp
ORDER BY department, salary DESC;
"""
result = cursor.execute(sql_query)
tabulate(result.fetchall(), tablefmt="html")
```

```
 name    | department | salary | dept_rank | global_rank
---------+------------+--------+-----------+-------------
 Tobias  | Finance    |   3000 |         1 |           4
 Karen   | Finance    |   2800 |         2 |           5
 Anelia  | HR         |   2650 |         1 |           7
 Anelia  | HR         |   2650 |         1 |           7
 Mario   | HR         |   2500 |         3 |           9
 Petko   | IT         |   2700 |         1 |           6
 John    | IT         |   2400 |         2 |          10
 Dilyana | IT         |   2200 |         3 |          12
 Ivan    | IT         |   1800 |         4 |          18
 Tony    | Legal      |   3100 |         1 |           3
```
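
A departmental rank like the one above is the usual building block for top-N queries. The sketch below keeps the two best-paid ranks per department. It uses a made-up `staff` table rather than `emp`, and filters in an outer query because a window function cannot appear in `WHERE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, department TEXT, salary INTEGER)")  # hypothetical
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [
        ("Ann", "IT", 2700), ("Bob", "IT", 2400), ("Cy", "IT", 2200),
        ("Dee", "HR", 2650), ("Eve", "HR", 2650), ("Fay", "HR", 2500),
    ],
)

top_two = conn.execute("""
    SELECT name, department, salary, dept_rank
    FROM (
        SELECT name, department, salary,
               RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
        FROM staff
    )
    WHERE dept_rank <= 2
    ORDER BY department, dept_rank, name
""").fetchall()
conn.close()

for row in top_two:
    print(row)
```

With `RANK`, tied rows all qualify, so a department can return more than two rows when ties straddle the cutoff; `ROW_NUMBER` would cap the result at exactly two rows per department, at the cost of an arbitrary choice among ties.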


```python
sql_query = """
SELECT SUM(salary) FROM emp
"""
result = cursor.execute(sql_query)
result.fetchall()
```

```
[(53100,)]
```

```python
conn.close()
```

## Resources
+ https://momjian.us/main/writings/pgsql/window.pdf
+ https://www.youtube.com/watch?v=D8Q4n6YXdpk
+ https://www.red-gate.com/simple-talk/databases/sql-server/learn/window-functions-in-sql-server/
+ https://www.red-gate.com/simple-talk/databases/sql-server/learn/window-functions-in-sql-server-part-2-the-frame/
+ https://www.postgresql.org/docs/current/tutorial-window.html