## **COOP SQL 102.5: Window Functions - Advanced Data Manipulation**

Author: Martin Arroyo

You're making great strides in your SQL journey! Now that you've mastered basics like `SELECT`, `WHERE`, and `GROUP BY`, and even delved into subqueries, it's time to explore window functions. These powerful tools let you perform calculations such as running totals and rankings across related rows, without losing the detail of the individual rows.

By the end of this section, you'll understand:

- What window functions are
- Why you'd use them
- How to incorporate them into your SQL queries

### **About this notebook**

All of your queries will be written against preloaded databases that are available only in this notebook. Our relational database (RDBMS) and SQL dialect come from DuckDB, a popular open-source database that we use through its `duckdb` Python library. You can find [the documentation for `duckdb` here](https://duckdb.org/docs/sql/introduction); keep it handy as you work.

`teachdb`, which provides the data that you will be working with, is a Python library written by The Freestack Initiative, a group of COOP alumni who want to empower the community to learn and improve their technical skills by providing materials and resources at low (or no) cost.

## **How to use this notebook**

First, we'll do a quick tutorial on how to use the notebook with these tools, then we'll dive into more SQL!

### **Step 1: Press the play button below to set up the database and notebook**

You will see a checkmark appear when the database is finished setting up.

%%capture --no-stderr
# @title Press Play { display-mode: "form" }

# This code is used to set up the notebook by installing the libraries we need, configuring extensions to
# make displays for our queries look nice, and connecting to our relational database so that you can write
# queries in code cells using the %%sql magic tag.

# Install (or upgrade to the latest version of) `teachdb` from GitHub
%pip install --quiet --upgrade git+https://github.com/freestackinitiative/teachingdb.git
from teachdb import TeachDB
# Set configurations for notebook & load data
db = TeachDB(database=["sales_cogs_opex", "restaurant"])
db.setup_notebook()
con = db.connection
%sql con

# Check out the Freestack Initiative @https://github.com/freestackinitiative

### **What is a Window Function?**
Imagine you're a chef in a busy restaurant. You have a list of all orders for the night, and you want to calculate the total bill for each table while still seeing the breakdown of each individual dish. Window functions let you do this by performing calculations across sets of table rows that are related to the current row.

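In SQL, window functions perform calculations similar to aggregate functions, but without collapsing the result set. `GROUP BY` returns one row per group, while a window function returns a value for every input row. This lets you calculate running totals, moving averages, and rankings alongside the original data.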
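Here is a minimal sketch of the chef scenario that shows the difference from `GROUP BY` in action. It runs the SQL through Python's built-in `sqlite3` module (SQLite 3.25 or later supports window functions) as a stand-in for the notebook's duckdb connection. The `orders` table and its `table_no`, `dish`, and `price` columns are hypothetical.

```python
import sqlite3

# Hypothetical orders table, for illustration only; the notebook's
# `restaurant` database may use different table and column names.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (table_no INTEGER, dish TEXT, price REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Soup", 5.0), (1, "Steak", 20.0), (2, "Salad", 7.0)],
)

# GROUP BY collapses each table's orders into a single row.
grouped = con.execute(
    "SELECT table_no, SUM(price) FROM orders GROUP BY table_no ORDER BY table_no"
).fetchall()

# The window function keeps every dish and attaches its table's total.
windowed = con.execute(
    """
    SELECT table_no,
           dish,
           price,
           SUM(price) OVER (PARTITION BY table_no) AS table_total
    FROM orders
    ORDER BY table_no, dish
    """
).fetchall()

print(grouped)   # [(1, 25.0), (2, 7.0)]
print(windowed)  # [(1, 'Soup', 5.0, 25.0), (1, 'Steak', 20.0, 25.0), (2, 'Salad', 7.0, 7.0)]
```

`grouped` has one row per table. `windowed` keeps all three dishes and shows each table's total next to them.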

### **Why Use Window Functions?**
Window functions are incredibly useful when:

- You need to perform calculations across a set of rows that are related to the current row.
- You want to calculate cumulative statistics like running totals, moving averages, or rankings.
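A moving average is easiest to understand with a small example. This sketch also uses Python's `sqlite3` module in place of duckdb. It averages each day's revenue with up to two previous days. The `daily_sales` table and its values are made up for illustration.

```python
import sqlite3

# Hypothetical daily revenue figures, for illustration only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE daily_sales (day INTEGER, revenue REAL)")
con.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)],
)

rows = con.execute(
    """
    SELECT day,
           revenue,
           -- average of the current day and up to two previous days
           AVG(revenue) OVER (
               ORDER BY day
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS moving_avg_3
    FROM daily_sales
    ORDER BY day
    """
).fetchall()

print(rows)  # [(1, 10.0, 10.0), (2, 20.0, 15.0), (3, 30.0, 20.0), (4, 40.0, 30.0)]
```

The `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` frame makes the average "move." The first two days have fewer rows available, so their averages cover fewer values.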

### **Example - Using ROW_NUMBER()**

The first window function we'll use is `ROW_NUMBER()`. It assigns sequential integers (1, 2, 3, and so on) to rows in an order that we specify. This is useful when you need a unique ID for each row in your query's results, or when you want to select only the top *N* results.
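To select the top *N* results, you usually number the rows in a subquery and then filter in the outer query. This is necessary because `WHERE` is evaluated before window functions run. Here is a minimal sketch that uses Python's `sqlite3` module instead of the notebook's duckdb connection. The lowercase `dishes` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical menu, for illustration only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dishes (name TEXT, price REAL)")
con.executemany(
    "INSERT INTO dishes VALUES (?, ?)",
    [("Fries", 3.5), ("Lobster", 30.0), ("Pasta", 12.0), ("Steak", 25.0)],
)

# Number the dishes by price (highest first) in a subquery,
# then keep only rows 1 and 2 in the outer query.
top_two = con.execute(
    """
    SELECT name, price
    FROM (
        SELECT name,
               price,
               ROW_NUMBER() OVER (ORDER BY price DESC) AS row_num
        FROM dishes
    ) AS ranked
    WHERE row_num <= 2
    ORDER BY row_num
    """
).fetchall()

print(top_two)  # [('Lobster', 30.0), ('Steak', 25.0)]
```

DuckDB also supports a `QUALIFY` clause that filters on window function results directly, which can replace the subquery. See the DuckDB documentation for details.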

**Scenario:**

The restaurant has asked us to assign a unique number to each dish in the `Dishes` table based on its price, ordered from most expensive to least expensive. Dishes with the same price must be sorted alphabetically by name.

Here's how we can do that:
```sql
SELECT
    Name,
    Price,
    ROW_NUMBER() OVER (ORDER BY Price DESC, Name ASC) AS RowNum
FROM
    Dishes
-- Sort the output by the new row number so the displayed order is guaranteed
ORDER BY
    RowNum;
```

And here is what the results look like (showing only the first five rows):

| Name                     | Price | RowNum |
|--------------------------|-------|--------|
| Barbecued Tofu Skewers   | 9.99  | 1      |
| Classic Burger           | 9.99  | 2      |
| Fiesta Family Platter    | 9.99  | 3      |
| Garden Buffet            | 9.99  | 4      |
| Handcrafted Pizza        | 9.99  | 5      |


### **Breakdown**

This is a straightforward `SELECT` query except for the third column, where we call `ROW_NUMBER()`. The `ROW_NUMBER()` function assigns a unique number to each row within the result set. The `OVER` clause, which is **always needed in a window function**, defines the window of rows the function works with. In this query, it specifies how the rows are ordered. We order the rows by `Price` in descending order and then by `Name` in ascending order. As a result, the most expensive dishes get the lowest numbers, and dishes with the same price are numbered alphabetically. Every number is unique, running from 1 to *N*, where *N* is the number of rows.
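To see the tie-breaking in action, this sketch applies the same `ROW_NUMBER()` ordering to three hypothetical dishes, two of which share a price. It runs in Python's `sqlite3` module, not against the notebook's duckdb database.

```python
import sqlite3

# Hypothetical rows, for illustration only; two dishes share a price.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dishes (name TEXT, price REAL)")
con.executemany(
    "INSERT INTO dishes VALUES (?, ?)",
    [("Garden Buffet", 9.99), ("Side Salad", 4.5), ("Classic Burger", 9.99)],
)

# Price DESC puts expensive dishes first; Name ASC breaks ties alphabetically.
rows = con.execute(
    """
    SELECT name,
           price,
           ROW_NUMBER() OVER (ORDER BY price DESC, name ASC) AS row_num
    FROM dishes
    ORDER BY row_num
    """
).fetchall()

print(rows)  # [('Classic Burger', 9.99, 1), ('Garden Buffet', 9.99, 2), ('Side Salad', 4.5, 3)]
```

Because `Name ASC` breaks the tie, "Classic Burger" is always numbered before "Garden Buffet." Without a tie-breaker, the order of equally priced dishes, and therefore their row numbers, would not be guaranteed.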

### **Example - Calculating Running Total with SUM()**

**Scenario:**

Now the restaurant wants to know a running total of the prices of dishes in the `Dishes` table to understand the cumulative revenue from the dishes ordered. They want the running totals by each 

### **Breakdown**

We can use the `SUM()` function in conjunction with the `OVER` clause to achieve this.
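Here is a minimal sketch of the running-total pattern, using Python's `sqlite3` module as a stand-in for duckdb. The `dishes` rows are hypothetical. Ordering by `name` is an assumption made only to give the window a deterministic order, since the scenario's exact grouping is not specified here.

```python
import sqlite3

# Hypothetical dishes, for illustration only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dishes (name TEXT, price REAL)")
con.executemany(
    "INSERT INTO dishes VALUES (?, ?)",
    [("Fries", 3.0), ("Pasta", 12.0), ("Soup", 5.0)],
)

rows = con.execute(
    """
    SELECT name,
           price,
           -- each row's total covers itself and every row before it
           SUM(price) OVER (
               ORDER BY name
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM dishes
    ORDER BY name
    """
).fetchall()

print(rows)  # [('Fries', 3.0, 3.0), ('Pasta', 12.0, 15.0), ('Soup', 5.0, 20.0)]
```

Spelling out `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` keeps the total strictly row by row. With only `ORDER BY`, the default frame also includes any rows tied with the current one.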

sql
