# Window Functions, Part 4: Partition By Order By

As you may remember, we covered `PARTITION BY` a few weeks ago in our course, but we never used it together with another feature: `ORDER BY`.

Today, we'll use them together to create powerful statistics and show how you can use various functions independently for each group of rows.

Access tables via the fiddle here: https://www.db-fiddle.com/f/45nXwRXBKAwt9MsbCHiuxT/1

## Exercise
Select all the information from the store table.

Each store has its own id, country and city. 

There is only one store per city in our table to make things a bit easier. Apart from that, you can see when the store was opened and what rating it has (1-5), based on customers' opinions.

# Exercise
Select all the information from the sales table.

In this table, the sales results are gathered for each store for the period between August 1 and August 14, 2016.

You can find the id of the store, the date, and three important values: the total revenue on that day, the number of transactions and the number of customers who entered the store (but not necessarily bought anything).


# Scope of this section

Before we get down to real work, let's explain what we'll actually teach you today.

Earlier you learned what PARTITION BY is. 

It allows you to compute certain functions independently for groups of rows and still maintain their individual character.

Previously, we only used PARTITION BY with the aggregate functions you already knew: AVG(), COUNT(), MAX(), MIN(), SUM().

None of these functions required the use of ORDER BY: the order of rows simply doesn't matter in this case.

However, we got to know new elements where the order does matter: ranking functions, window frames and analytical functions.

In this part, I'll teach you how to use PARTITION BY with these new elements. 

Each time, you will also need an ORDER BY clause – hence the name of the part: PARTITION BY ORDER BY. 

Remember to keep the order: inside OVER(), PARTITION BY must come before ORDER BY. Writing them the other way round is a syntax error.
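As a rough sketch of the clause order, the query below numbers stores by opening date within each country. The `demo_store` rows and the `opening_no` alias are made-up for illustration (an assumption), not the contents of the course's store table.

```sql
-- Hypothetical sample rows, not the real store table.
WITH demo_store (country, city, opening_day) AS (
  VALUES
    ('France',  'Paris',     DATE '2014-03-15'),
    ('France',  'Lyon',      DATE '2015-06-01'),
    ('Germany', 'Berlin',    DATE '2013-11-20'),
    ('Germany', 'Frankfurt', DATE '2016-01-10')
)
SELECT
  country,
  city,
  opening_day,
  -- PARTITION BY is written first, ORDER BY second.
  ROW_NUMBER() OVER (PARTITION BY country ORDER BY opening_day) AS opening_no
FROM demo_store;
-- Writing OVER (ORDER BY opening_day PARTITION BY country) instead
-- is rejected with a syntax error.
```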

# PARTITION BY – refresher 1

Before we start writing queries with PARTITION BY ORDER BY, let's quickly revise queries with PARTITION BY alone. Take a look:

```
SELECT
  country,
  city,
  rating,
  AVG(rating) OVER(PARTITION BY country)
FROM store;
```

In the above query, we show the rating of each store plus the average rating calculated for the respective country. If we hadn't used PARTITION BY country, we would have ended up with an average across all stores. 

This way, we get separate average values for each country.
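To make the difference concrete, here is a minimal sketch that puts both averages side by side. The `demo_store` rows are hypothetical sample data, chosen only so the numbers are easy to check by hand.

```sql
-- Hypothetical sample rows, not the real store table.
WITH demo_store (country, city, rating) AS (
  VALUES
    ('France',  'Paris',     5),
    ('France',  'Lyon',      3),
    ('Germany', 'Berlin',    4),
    ('Germany', 'Frankfurt', 2)
)
SELECT
  country,
  city,
  rating,
  -- No PARTITION BY: one average over all four stores (3.5 on every row).
  AVG(rating) OVER ()                     AS avg_all_stores,
  -- With PARTITION BY: 4 for the French rows, 3 for the German rows.
  AVG(rating) OVER (PARTITION BY country) AS avg_in_country
FROM demo_store;
```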


# Exercise

For each sales row, show the store_id, day, revenue on that day and the average revenue in that store.



```
SELECT
  store_id,
  day,
  revenue,
  AVG(revenue) OVER(PARTITION BY store_id)
FROM sales;
```

One more exercise and we move on to PARTITION BY ORDER BY.

# Exercise

For each sales row between August 1 and August 7, 2016, show the store_id, day, number of transactions, the total number of transactions on that day in any store and the ratio of the two last columns shown as percentage rounded to integer values.



```
SELECT
  store_id,
  day,
  transactions,
  SUM(transactions) OVER(PARTITION BY day),
  ROUND(transactions::numeric / SUM(transactions) OVER(PARTITION BY day)*100)
FROM sales
WHERE day BETWEEN '2016-08-01' AND '2016-08-07';
```

# RANK() with PARTITION BY ORDER BY

Ok! We'll introduce the features chronologically. 

Previously, you learned about ranking functions.

They are one of the places where you can apply PARTITION BY and ORDER BY together.

So far, all the rankings we calculated were performed for all the rows from the query result. With that knowledge, we could have calculated the position of each store in the global network based on their ratings:

```
SELECT
  id,
  country,
  city,
  rating,
  RANK() OVER(ORDER BY rating DESC)
FROM store;
```

Now, we can add PARTITION BY to calculate the positions independently for each country:

```
SELECT
  id,
  country,
  city,
  rating,
  RANK() OVER(PARTITION BY country ORDER BY rating DESC)
FROM store;
```


In this way, we create a separate ranking for each country, so Paris and Frankfurt can both get rank = 1 for the separate rankings in France and Germany.
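The sketch below shows how the numbering restarts in each partition and how ties behave inside one. The `demo_store` rows are hypothetical, and `DENSE_RANK()` is added only for comparison.

```sql
-- Hypothetical sample rows, not the real store table.
WITH demo_store (country, city, rating) AS (
  VALUES
    ('France',  'Paris',     5),
    ('France',  'Lyon',      5),
    ('France',  'Nice',      3),
    ('Germany', 'Frankfurt', 4),
    ('Germany', 'Berlin',    2)
)
SELECT
  country,
  city,
  rating,
  RANK()       OVER (PARTITION BY country ORDER BY rating DESC) AS rank_no,
  DENSE_RANK() OVER (PARTITION BY country ORDER BY rating DESC) AS dense_rank_no
FROM demo_store
ORDER BY country, rating DESC, city;
-- France: Lyon and Paris both get 1; Nice gets rank_no 3 and dense_rank_no 2.
-- Germany: the numbering restarts, so Frankfurt gets 1 and Berlin gets 2.
```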

# Exercise

Take into account the period between August 10 and August 14, 2016. 

For each row of sales, show the following information: store_id, day, number of customers and the rank based on the number of customers in the particular store (in descending order).



```
SELECT
  store_id,
  day,
  customers,
  RANK() OVER (PARTITION BY store_id ORDER BY customers DESC)
FROM sales
WHERE day BETWEEN '2016-08-10' AND '2016-08-14';
```

# NTILE(x) with PARTITION BY ORDER BY

Of course, you can use any other ranking function in the same way:

```
SELECT
  id,
  country,
  city,
  rating,
  NTILE(2) OVER(PARTITION BY country ORDER BY opening_day)
FROM store;
```

In the above query, the stores are divided into two groups: older and more recent stores. 

These groups are created separately for each country.
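When a partition cannot be split evenly, NTILE() puts the extra rows into the earlier groups. A small sketch with hypothetical `demo_store` rows (three French stores, two German ones) illustrates this:

```sql
-- Hypothetical sample rows, not the real store table.
WITH demo_store (country, city, opening_day) AS (
  VALUES
    ('France',  'Paris',     DATE '2013-01-10'),
    ('France',  'Lyon',      DATE '2014-02-20'),
    ('France',  'Nice',      DATE '2015-03-30'),
    ('Germany', 'Berlin',    DATE '2012-05-05'),
    ('Germany', 'Frankfurt', DATE '2016-06-06')
)
SELECT
  country,
  city,
  opening_day,
  NTILE(2) OVER (PARTITION BY country ORDER BY opening_day) AS age_group
FROM demo_store
ORDER BY country, opening_day;
-- France (3 rows): Paris and Lyon get 1, Nice gets 2.
-- Germany (2 rows): Berlin gets 1, Frankfurt gets 2.
```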


# Exercise

Take the sales between August 1 and August 10, 2016. For each row, show the store_id, the day, the revenue on that day and quartile number (quartile means we divide the rows into four groups) based on the revenue of the given store in the descending order.



```
SELECT
  store_id,
  day,
  revenue,
  NTILE(4) OVER (PARTITION BY store_id ORDER BY revenue DESC)
FROM sales
WHERE day BETWEEN '2016-08-01' AND '2016-08-10';
```

# PARTITION BY ORDER BY in CTE

We very briefly looked at queries that start with WITH; these are called common table expressions (CTEs). We haven't covered them in depth yet, so this is a little preview.

We can use them to find the row with a certain rank. With PARTITION BY, we can now find the rows with that rank in every group at once.

Take a look:

```
WITH ranking AS (
  SELECT
    country,
    city,
    RANK() OVER(PARTITION BY country ORDER BY rating DESC) AS rank
  FROM store
)

SELECT
  country,
  city
FROM ranking
WHERE rank = 1;
```


The CTE in the parentheses creates a separate ranking of stores in each country based on their rating. 

In the outer query, we simply return the rows with the right rank. 

As a result, we'll see the best store in each country.
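One detail worth keeping in mind: with RANK(), tied stores share rank 1, so a country can return more than one row. The sketch below uses hypothetical `demo_store` rows and adds a `ROW_NUMBER()` column (an illustrative alternative, not part of the lesson's query) to show how to force exactly one row per country.

```sql
-- Hypothetical sample rows, not the real store table.
WITH demo_store (country, city, rating) AS (
  VALUES
    ('France',  'Paris',  5),
    ('France',  'Lyon',   5),
    ('France',  'Nice',   3),
    ('Germany', 'Berlin', 4)
),
ranking AS (
  SELECT
    country,
    city,
    RANK()       OVER (PARTITION BY country ORDER BY rating DESC)       AS rank_no,
    ROW_NUMBER() OVER (PARTITION BY country ORDER BY rating DESC, city) AS row_no
  FROM demo_store
)
SELECT
  country,
  city
FROM ranking
WHERE rank_no = 1
ORDER BY country, city;
-- Returns Lyon and Paris for France, plus Berlin for Germany.
-- Filtering on row_no = 1 instead would keep one row per country
-- (Lyon for France, because city breaks the tie alphabetically).
```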

# Exercise
For each store, show a row with three columns: store_id, the revenue on the best day in that store in terms of the revenue and the day when that best revenue was achieved.



```
WITH ranking AS (
  SELECT
    store_id,
    revenue,
    day,
    RANK() OVER(PARTITION BY store_id ORDER BY revenue DESC) AS rank
  FROM sales
)

SELECT
  store_id,
  revenue,
  day
FROM ranking
WHERE rank = 1;
```

# Window frames with PARTITION BY ORDER BY

We also got to know window frames.

Can we use them together with PARTITION BY to create even more sophisticated windows? 

Of course we can. Take a look:

```
SELECT
  id,
  country,
  city,
  opening_day,
  rating,
  MAX(rating) OVER(
    PARTITION BY country
    ORDER BY opening_day
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM store;
```

In the above example, we show some information about each store and the maximal rating of any store opened up to that date (that's where we need a window frame) in the respective country (that's where we need PARTITION BY).
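Here is the same pattern on a handful of hypothetical `demo_store` rows, small enough to trace the frame by hand. The `best_so_far` alias is an illustrative name.

```sql
-- Hypothetical sample rows, not the real store table.
WITH demo_store (country, city, opening_day, rating) AS (
  VALUES
    ('France',  'Paris',  DATE '2013-01-10', 3),
    ('France',  'Lyon',   DATE '2014-02-20', 5),
    ('France',  'Nice',   DATE '2015-03-30', 4),
    ('Germany', 'Berlin', DATE '2012-05-05', 2)
)
SELECT
  country,
  city,
  opening_day,
  rating,
  MAX(rating) OVER (
    PARTITION BY country
    ORDER BY opening_day
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS best_so_far
FROM demo_store
ORDER BY country, opening_day;
-- France: best_so_far is 3, 5, 5 (Nice's 4 does not beat Lyon's earlier 5).
-- Germany: the frame starts over, so Berlin shows 2.
```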


# Exercise
Show sales statistics between August 1 and August 7, 2016. For each row, show store_id, day, revenue and the best revenue in the respective store up to that date.

```
SELECT
  store_id,
  day,
  revenue,
  MAX(revenue) OVER(
    PARTITION BY store_id
    ORDER BY day
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM sales
WHERE day BETWEEN '2016-08-01' AND '2016-08-07';
```

# Exercise

Take sales from the period between August 1 and August 10, 2016. For each row, show the following information: store_id, day, number of transactions and the average number of transactions in the respective store in the window frame starting 2 days before and ending 2 days later with respect to the current row.



```
SELECT
  store_id,
  day,
  transactions,
  AVG(transactions) OVER(
    PARTITION BY store_id
    ORDER BY day
    ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)
FROM sales
WHERE day BETWEEN '2016-08-01' AND '2016-08-10';
```

# LEAD() with PARTITION BY ORDER BY

Now, let's talk about the use of analytical functions with PARTITION BY ORDER BY. 

Take a look at the following example:

```
SELECT
  country,
  city,
  opening_day,
  LEAD(city, 1, 'NaN') OVER(PARTITION BY country ORDER BY opening_day)
FROM store;
```
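In the above example, we show the country, city and opening_day of each store, but we also show the city where the next store was opened – in the same country, of course. For the most recently opened store in each country there is no next row, so LEAD() returns the default value given as its third argument, 'NaN'.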
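A short sketch with hypothetical `demo_store` rows shows what the last store in each partition gets, with and without a default value:

```sql
-- Hypothetical sample rows, not the real store table.
WITH demo_store (country, city, opening_day) AS (
  VALUES
    ('France',  'Paris',  DATE '2013-01-10'),
    ('France',  'Lyon',   DATE '2014-02-20'),
    ('Germany', 'Berlin', DATE '2012-05-05')
)
SELECT
  country,
  city,
  opening_day,
  LEAD(city)           OVER (PARTITION BY country ORDER BY opening_day) AS next_city,
  LEAD(city, 1, 'NaN') OVER (PARTITION BY country ORDER BY opening_day) AS next_city_or_default
FROM demo_store
ORDER BY country, opening_day;
-- Paris:  next_city = Lyon, next_city_or_default = Lyon
-- Lyon:   next_city = NULL, next_city_or_default = NaN
-- Berlin: next_city = NULL, next_city_or_default = NaN
```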

# Exercise

For each store, show the sales in the period between August 5, 2016 and August 10, 2016: store_id, day, number of transactions, number of transactions on the previous day and the difference between these two values.


```
SELECT
  store_id,
  day,
  transactions,
  LAG(transactions) OVER(PARTITION BY store_id ORDER BY day),
  transactions - LAG(transactions) OVER(PARTITION BY store_id ORDER BY day)
FROM sales
WHERE day BETWEEN '2016-08-05' AND '2016-08-10';
```

# FIRST_VALUE() with PARTITION BY ORDER BY

Of course, other analytical functions are possible as well. Let's analyze another example:

```
SELECT
  country,
  city,
  rating,
  FIRST_VALUE(city) OVER(PARTITION BY country ORDER BY rating DESC)
FROM store;
```

In the above query, we're showing each store individually, but we also show the name of the city with the highest rating in that particular country. 

Note that this would be impossible without PARTITION BY – we couldn't get individual city names for each country separately.
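To see what PARTITION BY contributes here, the sketch below computes the same FIRST_VALUE() once over all rows and once per country. The `demo_store` rows are hypothetical sample data.

```sql
-- Hypothetical sample rows, not the real store table.
WITH demo_store (country, city, rating) AS (
  VALUES
    ('France',  'Paris',     4),
    ('France',  'Lyon',      3),
    ('Germany', 'Berlin',    5),
    ('Germany', 'Frankfurt', 2)
)
SELECT
  country,
  city,
  rating,
  FIRST_VALUE(city) OVER (ORDER BY rating DESC)                      AS best_overall,
  FIRST_VALUE(city) OVER (PARTITION BY country ORDER BY rating DESC) AS best_in_country
FROM demo_store
ORDER BY country, rating DESC;
-- best_overall is Berlin on every row.
-- best_in_country is Paris for the French rows and Berlin for the German rows.
```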

# Exercise

Show sales figures in the period between August 1 and August 3, 2016: for each store, show the store_id, the day, the revenue and the date with the best revenue in that period as best_revenue_day.

```
SELECT
  store_id,
  day,
  revenue,
  FIRST_VALUE(day) OVER(PARTITION BY store_id ORDER BY revenue DESC) AS best_revenue_day
FROM sales
WHERE day BETWEEN '2016-08-01' AND '2016-08-03';
```

It's time to review what we've learned in this part:

You can use PARTITION BY ORDER BY to create rankings and do row-level analytics independently for each partition in a single SQL query.

# Window Functions - Evaluation Order

We'll compare window functions with other elements of SELECT queries. You'll find out how to use window functions in various parts of the query and where you're not allowed to do so. 

Are you ready?


# Query evaluation order – problems with WHERE

Great, now we can get down to work.

If you recall, we said that you can't use window functions in the WHERE clause.

Why is that so? 

Because all query elements are processed in a very strict order:

- FROM – the database gets the data from tables in FROM clause and if necessary performs the JOINs,

- WHERE – the data are filtered with conditions specified in WHERE clause,

- GROUP BY – the data are grouped by the columns or expressions specified in the GROUP BY clause,

- aggregate functions – the aggregate functions are applied to the groups created in the GROUP BY phase,

- HAVING – the groups are filtered with the given condition,

- window functions – the window functions are calculated on the rows that remain after grouping and filtering,

- SELECT – the database selects the given columns,

- DISTINCT – repeated values are removed,

- UNION/INTERSECT/EXCEPT – the database applies set operations,

- ORDER BY – the results are sorted,

- OFFSET – the first rows are skipped,

- LIMIT/FETCH/TOP – only the first rows are selected

Practically, this order means that you can't put window functions anywhere in the FROM, WHERE, GROUP BY or HAVING clauses. 

This is because, at the time these clauses are evaluated, window functions have not been calculated yet – and it's impossible to use something which is not yet available.


- Window functions can only appear in the SELECT and ORDER BY clauses.

- If you need window functions in other parts of the query, use a subquery.

- If the query uses aggregates or GROUP BY, remember that the window function can only see the grouped rows instead of the original table rows.
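As a small illustration of the first rule, the sketch below uses a window function directly in ORDER BY, which works because sorting happens after window functions are calculated. The `demo_auction` rows are hypothetical sample data, not the course's auction table.

```sql
-- Hypothetical sample rows, not the real auction table.
WITH demo_auction (id, country, final_price) AS (
  VALUES
    (1, 'Spain', 100),
    (2, 'Spain', 300),
    (3, 'Italy',  50),
    (4, 'Italy', 250)
)
SELECT
  id,
  country,
  final_price
FROM demo_auction
-- Allowed: ORDER BY is evaluated after window functions.
ORDER BY AVG(final_price) OVER (PARTITION BY country) DESC, id;
-- Spain's average (200) is higher than Italy's (150),
-- so the ids come out as 1, 2, 3, 4.
```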

# Subqueries for problems with WHERE


Find out for yourself that window functions don't work in the WHERE clause. Look at the template: we would like to show some information for those auctions which have their final_price higher than the average final_price.

```
SELECT 
  id, 
  final_price 
FROM auction 
WHERE final_price > AVG(final_price) OVER()
```

Okay. As you can see, the query did not succeed.

So, how can we select some information for those auctions which had their final_price higher than the average final_price? We have to use a subquery. 

Take a look:

```
SELECT
  id,
  final_price 
FROM (
  SELECT
    id,
    final_price,
    AVG(final_price) OVER() AS avg_final_price
  FROM auction) c
WHERE final_price > avg_final_price;
```
In the FROM clause, we introduced a subquery where we selected both the final_price for each auction and the average final_price. Because the whole subquery is calculated before the external query, we can use avg_final_price in the external query.
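The same idea can also be written with a CTE instead of a subquery in FROM. This is only a sketch on hypothetical `demo_auction` rows; `with_avg` is an illustrative name.

```sql
-- Hypothetical sample rows, not the real auction table.
WITH demo_auction (id, final_price) AS (
  VALUES
    (1, 100),
    (2, 200),
    (3, 600)
),
with_avg AS (
  -- The window function is calculated here, in the inner query...
  SELECT
    id,
    final_price,
    AVG(final_price) OVER () AS avg_final_price
  FROM demo_auction
)
SELECT
  id,
  final_price
FROM with_avg
-- ...so the outer WHERE can treat its result as an ordinary column.
WHERE final_price > avg_final_price;
-- The average is 300, so only the auction with id 3 is returned.
```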


# Query evaluation order – problems with HAVING

The same problem occurs when we try to use a window function in the HAVING clause.

Look at the template: we would like to show those countries that have the average final price higher than the average final price from all over the world.

Try to run this query:


```
SELECT 
  country, 
  AVG(final_price) 
FROM auction 
GROUP BY country 
HAVING AVG(final_price) > AVG(final_price) OVER ();
```


Just as we expected, no window functions are allowed in HAVING either. Okay, you know that the remedy is to use a subquery. 

Try to correct the query on your own. 

Don't worry if you can't, the hint will be waiting for you in case you need it.

# Exercise

Again, we would like to show those countries (country name and average final price) that have the average final price higher than the average price from all over the world. Correct the query by using a subquery.

```
SELECT
  country,
  AVG(final_price) 
FROM auction 
GROUP BY country 
HAVING AVG(final_price) > (SELECT AVG(final_price) FROM auction);
```


# Problems with GROUP BY


Great. 

The GROUP BY clause is also calculated before window functions. This means that we can't group by the values obtained with window functions. 

Let's check that.

# Exercise

Try to run the template query:

```
SELECT 
  NTILE(4) OVER(ORDER BY views DESC) AS quartile, 
  MIN(views), 
  MAX(views) 
FROM auction 
GROUP BY NTILE(4) OVER(ORDER BY views DESC);
```

The idea is as follows: we want to divide auctions into four equal groups (quartiles) based on the number of views and show the minimal and maximal value for each group. 

Will this query work?


# Subqueries for problems with GROUP BY

As anticipated, the query failed. So, what can we do to make the query work? 

Again, we'll use a subquery:

```
SELECT
  quartile,
  MIN(views),
  MAX(views)
FROM
  (SELECT
    views,
    NTILE(4) OVER(ORDER BY views DESC) AS quartile
  FROM auction) c
GROUP BY quartile;
```

We used the window function in the inner query, which is why we could use it for grouping in the external query.

So, to sum up this section, remember the following rule: the only places where we can use window functions without having to write subqueries are the SELECT and ORDER BY clauses. 

In all other places you have to use subqueries.

# What window functions see

Before, we said that window functions were calculated after the GROUP BY clause. 

This has a very important implication for our queries: if the query uses any aggregates, GROUP BY or HAVING, the window function sees the group rows instead of the original table rows.

To get a better understanding of this phenomenon, take a look at the following example:

```
SELECT 
  category_id,
  final_price, 
  AVG(final_price) OVER() 
FROM auction;
```

This simple query will show the category_id and final_price of each auction alongside the average final_price from all the auctions.

Now, take a look at the modified example with grouping:

```
SELECT 
  category_id,
  MAX(final_price), 
  AVG(final_price) OVER() 
FROM auction 
GROUP BY category_id;
```

As you can see, the query doesn't work. Window functions run after GROUP BY, so AVG(final_price) OVER() can no longer see the column final_price: it is neither listed in GROUP BY nor wrapped in an aggregate function. Once the rows have been grouped, there is no single final_price value that makes sense for a whole group.

However, let's take a look at another modification of this example:
```
SELECT
  category_id,
  MAX(final_price) AS max_final, 
  AVG(MAX(final_price)) OVER()
FROM auction
GROUP BY category_id;
```

As you can see, the query now succeeds because the window function is applied to an aggregate, MAX(final_price), which is available after grouping the rows. By the way, this is the only place where you can nest one aggregate function directly inside another: the outer AVG() is evaluated as a window function over the grouped rows.


The best way to correctly create queries with window functions and GROUP BY is as follows: first, create the query with GROUP BY, but without window functions. 

Run the query (in the database or in your head). 

Now, the columns you see in the result are the only columns you can use in your window functions.
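The two-step recipe can be sketched literally by putting the grouped query in a CTE first. The `demo_auction` rows and the `grouped` name are hypothetical; the single-query form `AVG(MAX(final_price)) OVER()` produces the same numbers.

```sql
-- Hypothetical sample rows, not the real auction table.
WITH demo_auction (category_id, final_price) AS (
  VALUES
    (1, 100),
    (1, 300),
    (2,  50),
    (2, 150)
),
-- Step 1: the grouped query on its own. Its output columns are
-- category_id and max_final; final_price no longer exists here.
grouped AS (
  SELECT
    category_id,
    MAX(final_price) AS max_final
  FROM demo_auction
  GROUP BY category_id
)
-- Step 2: the window function may only use what step 1 produced.
SELECT
  category_id,
  max_final,
  AVG(max_final) OVER () AS avg_of_max
FROM grouped
ORDER BY category_id;
-- max_final is 300 and 150, so avg_of_max is 225 on both rows.
```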

# Exercise

Group the auctions by the country. Show the country, the minimal number of participants in an auction and the average minimal number of participants across all countries.

```
SELECT
  country,
  MIN(participants),
  AVG(MIN(participants)) OVER()
FROM auction
GROUP BY country;
```

# Ranking by an aggregate

Great. As you can see, the way window functions behave with GROUP BY makes it fairly simple to create quite advanced statistics. Let's take a look at other use cases.

For instance, we may make a ranking based on an aggregate function. 

Take a look:

```
SELECT
  country,
  COUNT(id),
  RANK() OVER(ORDER BY COUNT(id) DESC)
FROM auction
GROUP BY country;
```

We grouped auctions with respect to the country, counted the number of auctions from each country... and then we created a ranking based on that count of auctions.

Now, group the auctions based on the category. Show category_id, the sum of final prices for auctions from this category and a ranking based on that sum, with the highest sum coming first.

```
SELECT
  category_id,
  SUM(final_price),
  RANK() OVER(ORDER BY SUM(final_price) DESC)
FROM auction
GROUP BY category_id;
```


# Day-to-day deltas with GROUP BY

Another thing we can do with window functions when rows are grouped is calculate leads, lags and day-to-day deltas.

Take a look:

```
SELECT
  ended,
  SUM(final_price) AS sum_price,
  LAG(SUM(final_price)) OVER(ORDER BY ended)
FROM auction
GROUP BY ended
ORDER BY ended;
```

The above query shows each end date with the total price of all items sold on that day and the total from the previous end date in the result (that is, the previous day, provided auctions ended on every day).
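Note that LAG() looks at the previous grouped row, not at the previous calendar day. The sketch below uses hypothetical `demo_auction` rows with a deliberate gap in the dates to show the effect.

```sql
-- Hypothetical sample rows, not the real auction table.
WITH demo_auction (ended, final_price) AS (
  VALUES
    (DATE '2017-01-01', 100),
    (DATE '2017-01-01',  50),
    (DATE '2017-01-02', 200),
    (DATE '2017-01-04',  80)   -- no auction ended on 2017-01-03
)
SELECT
  ended,
  SUM(final_price) AS sum_price,
  LAG(SUM(final_price)) OVER (ORDER BY ended) AS previous_sum
FROM demo_auction
GROUP BY ended
ORDER BY ended;
-- 2017-01-01: sum_price 150, previous_sum NULL
-- 2017-01-02: sum_price 200, previous_sum 150
-- 2017-01-04: sum_price 80,  previous_sum 200 (from 2017-01-02, not 2017-01-03)
```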


# Exercise

For each end day, show the following columns:

- ended,
- the sum of views from auctions that ended on that day,
- the sum of views from the previous day (name the column previous_day),
- delta – the difference between the sum of views on that day and on the previous day (name the column delta).

```
SELECT
  ended,
  SUM(views),
  LAG(SUM(views)) OVER(ORDER BY ended) AS previous_day,
  SUM(views) - LAG(SUM(views)) OVER(ORDER BY ended) AS delta 
FROM auction
GROUP BY ended
ORDER BY ended;
```

# Grouped rows, window functions and PARTITION BY


Finally, you can use window functions with PARTITION BY on grouped rows. One thing you need to remember is that the window function will only see grouped rows, not the original rows. Take a look:

```
SELECT
  country,
  ended,
  SUM(views) AS views_on_day,
  SUM(SUM(views)) OVER(PARTITION BY country)
    AS views_country
FROM auction
GROUP BY country, ended
ORDER BY country, ended;
```

The query might require a bit of explanation. 

First of all, we grouped all rows by country and ended.

Then, we showed the country name and date when the auctions ended.

Look what happens in the next two columns. 

First, we simply sum the views in accordance with our GROUP BY clause, i.e. we get the sum of views in all auctions from the particular country on the particular day. 

But look what happens next. We use a window function to sum all daily sums for a particular country. 

As a result, we get the sum of views for a particular country on all days.
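Running the same query shape on a few hypothetical `demo_auction` rows makes the two levels of summing easier to follow:

```sql
-- Hypothetical sample rows, not the real auction table.
WITH demo_auction (country, ended, views) AS (
  VALUES
    ('Spain', DATE '2017-01-01', 10),
    ('Spain', DATE '2017-01-01', 20),
    ('Spain', DATE '2017-01-02',  5),
    ('Italy', DATE '2017-01-01',  7)
)
SELECT
  country,
  ended,
  -- Inner level: ordinary aggregate, one value per (country, ended) group.
  SUM(views) AS views_on_day,
  -- Outer level: window function that adds up the daily sums per country.
  SUM(SUM(views)) OVER (PARTITION BY country) AS views_country
FROM demo_auction
GROUP BY country, ended
ORDER BY country, ended;
-- Italy, 2017-01-01: views_on_day 7,  views_country 7
-- Spain, 2017-01-01: views_on_day 30, views_country 35
-- Spain, 2017-01-02: views_on_day 5,  views_country 35
```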

# Exercise

Group all auctions by the category and end date and show the following columns:

- category_id,
- ended,
- the average daily final price as daily_avg_final_price in that category on that day,
- the maximal daily average in that category from any day as daily_max_avg.


```
SELECT  
  category_id,  
  ended,  
  AVG(final_price) AS daily_avg_final_price,
  MAX(AVG(final_price)) OVER(PARTITION BY category_id) AS daily_max_avg 
FROM auction
GROUP BY category_id, ended
ORDER BY category_id, ended;
```

# Summary

Excellent. It's time to wrap things up.

- Window functions can only appear in the SELECT and ORDER BY clauses.

- If you need window functions in other parts of the query, use a subquery.

- If the query uses aggregates or GROUP BY, remember that the window function can only see the grouped rows instead of the original table rows.

# Homework

Use the same fiddle from class.

## Exercise

Let's analyze sales data between August 1 and August 3, 2016. For each row, show store_id, day, transactions and the ranking of the store on that day in terms of the number of transactions as compared to other stores. 

The store with the greatest number should get 1 from a window function. 

Use individual row ranks even when two rows share the same value. Name the column place_no.



```
SELECT
  store_id,
  day,
  transactions,
  ROW_NUMBER() OVER (PARTITION BY day ORDER BY transactions DESC) AS place_no
FROM sales
WHERE day BETWEEN '2016-08-01' AND '2016-08-03';
```

# Exercise

For each day of the sales statistics, show the day, the store_id of the best store in terms of the revenue on that day, and that revenue.



```
WITH ranking AS (
  SELECT
    store_id,
    day,
    revenue,
    RANK() OVER(PARTITION BY day ORDER BY revenue DESC) AS rank
  FROM sales
)

SELECT
  store_id,
  day,
  revenue
FROM ranking
WHERE rank = 1;
```

# Exercise

Divide the sales results for each store into four groups based on the number of transactions and for each store, show the rows in the group with the lowest numbers of transactions: store_id, day, transactions.

```
WITH ranking AS (
  SELECT
    store_id,
    day,
    transactions,
    NTILE(4) OVER(PARTITION BY store_id ORDER BY transactions) AS quartile
  FROM sales
)

SELECT
  store_id,
  day,
  transactions
FROM ranking
WHERE quartile = 1;
```

# Exercise

For each sales row, show the following information: store_id, day, revenue and the future cash flow receivable by the headquarters (i.e. the total revenue in that store, counted from the current day until the last day in our table).



```
SELECT
  store_id,
  day,
  revenue,
  SUM(revenue) OVER(
    PARTITION BY store_id
    ORDER BY day
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM sales;
```

# Exercise

For each row of the sales figures, show the following information: store_id, day, revenue, revenue a week before and the ratio of revenue today to the revenue a week before expressed in percentage with 2 decimal places.



```
SELECT
  store_id,
  day,
  revenue,
  LAG(revenue,7) OVER(PARTITION BY store_id ORDER BY day),
  ROUND(revenue / LAG(revenue, 7) OVER(PARTITION BY store_id ORDER BY day) * 100, 2)
FROM sales;
```

# Exercise

For each row, show the following columns: store_id, day, customers and the number of customers in the store that ranked 5th by number of customers on that day.

```
SELECT
  store_id,
  day,
  customers,
  NTH_VALUE(customers, 5) OVER(
    PARTITION BY day
    ORDER BY customers DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM sales;
```

# Exercise

Find the id, country and views for those auctions where the number of views was below the average.

```
SELECT
  id,
  country,
  views
FROM (
  SELECT
    id,
    country,
    views,
    AVG(views) OVER() AS avg_views
  FROM auction) c
WHERE views < avg_views;
```

# Exercise

Now, divide all auctions into 6 equal groups based on the asking_price in ascending order. 

Show columns group_no, minimal, average and maximal value for that group. Sort by the group in ascending order.



```
SELECT
  group_no,
  MIN(asking_price),
  AVG(asking_price),
  MAX(asking_price)
FROM (
  SELECT
    asking_price,
    NTILE(6) OVER(ORDER BY asking_price) AS group_no
  FROM auction) c
GROUP BY group_no
ORDER BY group_no;
```

# Exercise

Group the auctions by category_id and show the category_id and maximal asking price in that category alongside the average maximal price across all categories.



```
SELECT
  category_id,
  MAX(asking_price),
  AVG(MAX(asking_price)) OVER() 
FROM auction
GROUP BY category_id;
```