# Window Functions Syntax

## Definition

Window functions let you perform calculations across a set of table rows that are related to the current row. Unlike aggregate functions used with `GROUP BY`, they do not collapse those rows into a single output row per group: every input row is kept.

## Syntax

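- `OVER()`: Defines the window (the set of rows) the function operates on. An empty `OVER()` uses the entire result set; it can also include `PARTITION BY` and `ORDER BY`.
- `PARTITION BY`: Divides the result set into partitions (groups of rows). The function is applied to each partition separately.
- `ORDER BY`: Orders rows within each partition, which matters for order-dependent calculations such as running totals and rankings.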
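To see how the three clauses change the window, here is a minimal sketch. It assumes Python's built-in `sqlite3` module (SQLite 3.25 or later supports window functions) and a small hypothetical `orders` table instead of the course's Contoso data; PostgreSQL accepts the same `OVER` syntax for these expressions.

```python
import sqlite3

# In-memory database with a small, hypothetical table (not the Contoso data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (state TEXT, order_day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("AB", 1, 10), ("AB", 2, 20), ("AB", 3, 30), ("ON", 1, 5), ("ON", 2, 15)],
)

rows = conn.execute(
    """
    SELECT
        state,
        order_day,
        amount,
        SUM(amount) OVER ()                   AS grand_total,   -- empty OVER(): the whole result set
        SUM(amount) OVER (PARTITION BY state) AS state_total,   -- one window per state
        SUM(amount) OVER (
            PARTITION BY state ORDER BY order_day
        )                                     AS running_total  -- ordered window: cumulative sum per state
    FROM orders
    ORDER BY state, order_day
    """
).fetchall()

for row in rows:
    print(row)
```

Every input row survives. Only the window changes: the whole table, each state, or each state up to the current `order_day`.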

## Example

### Load in the database

In [1]:
import sys
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Replace ipython-sql with jupysql, which provides the %sql / %%sql magics used below
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the SQL magic extension (%sql / %%sql)
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

### Query

Calculate the average age for customers in each state.

In [2]:
%%sql 

SELECT 
	customerkey, 
	continent,
	state,
	age,
	AVG(age) OVER(PARTITION BY state) AS avg_age_state
FROM customer

| customerkey | continent | state | age | avg_age_state |
|---|---|---|---|---|
| 376855 | North America | AB | 79 | 51.588552188552185 |
| 376749 | North America | AB | 40 | 51.588552188552185 |
| 376663 | North America | AB | 45 | 51.588552188552185 |
| 278274 | North America | AB | 73 | 51.588552188552185 |
| 349590 | North America | AB | 47 | 51.588552188552185 |
| 243074 | North America | AB | 61 | 51.588552188552185 |
| 243119 | North America | AB | 27 | 51.588552188552185 |
| 243214 | North America | AB | 85 | 51.588552188552185 |
| 243257 | North America | AB | 73 | 51.588552188552185 |
| 243275 | North America | AB | 54 | 51.588552188552185 |


Why do this instead of using a `GROUP BY` and `AVG` to aggregate this data?

In [3]:
%%sql 

SELECT 
	state,
	AVG(age) AS avg_age_state
FROM customer
GROUP BY state
ORDER BY state

| state | avg_age_state |
|---|---|
| AB | 51.588552188552185 |
| Aberdeen | 51.354166666666664 |
| Aberdeenshire | 52.76203966005666 |
| ACT | 53.41 |
| AG | 50.214285714285715 |
| AK | 52.54491017964072 |
| AL | 51.79736211031175 |
| Allerdale | 51.08571428571429 |
| Amber Valley | 52.52631578947368 |
| AN | 48.68292682926829 |


Window functions:
1. Keep row-level detail, whereas `GROUP BY` collapses rows into one row per group. You can compute an aggregate and still see every individual row.
2. Make it easy to partition and order data within the query, which is useful for calculating running totals, rankings, or averages within partitions (more on this later).
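The first point can be checked directly by running both forms against the same rows. This is a minimal sketch assuming Python's built-in `sqlite3` module and a hypothetical five-row `customer` table (not the Contoso data): `GROUP BY` returns one row per state, while the window version returns one row per customer with the state average attached.

```python
import sqlite3

# Hypothetical customer rows (not the Contoso data): (customerkey, state, age)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customerkey INTEGER, state TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "AB", 30), (2, "AB", 50), (3, "AB", 70), (4, "ON", 20), (5, "ON", 40)],
)

# GROUP BY collapses each state into a single row
grouped = conn.execute(
    "SELECT state, AVG(age) FROM customer GROUP BY state ORDER BY state"
).fetchall()

# The window function keeps every customer row and attaches the state average
windowed = conn.execute(
    """
    SELECT customerkey, state, age,
           AVG(age) OVER (PARTITION BY state) AS avg_age_state
    FROM customer
    ORDER BY customerkey
    """
).fetchall()

print(len(grouped), grouped)
print(len(windowed), windowed)
```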

### Real Example

Targeted Marketing or Segmentation Analysis

- Goal: Segment customers based on characteristics relative to regional peers.
- Scenario: Running an ad campaign targeting younger customers in each region.
- Approach:
    - Use window functions to calculate the difference between each customer’s age and the average age for their state.
    - Retain both individual data and state averages, allowing nuanced segmentation.
- Outcome: Focus on customers younger than their state’s average for more relevant, trend-aligned ads.

**Note: This query uses concepts we haven't covered yet, such as the subquery in the `FROM` clause. It is included only as an example of applying intermediate SQL to real-life analysis.**

In [4]:
%%sql

SELECT 
    customerkey, 
    state, 
    age, 
    ROUND(avg_age_state,1) AS avg_age_state,
    ROUND(age_diff,0) AS age_diff
FROM (
    -- Calculate each state's average age and each customer's difference from it
    SELECT 
        customerkey, 
        state, 
        age, 
        AVG(age) OVER(PARTITION BY state) AS avg_age_state,
        age - AVG(age) OVER(PARTITION BY state) AS age_diff
    FROM customer
) AS subquery
WHERE age_diff < -5; -- More than 5 years younger than the state average


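| customerkey | state | age | avg_age_state | age_diff |
|---|---|---|---|---|
| 303299 | AB | 43 | 51.6 | -9 |
| 256292 | AB | 43 | 51.6 | -9 |
| 331269 | AB | 29 | 51.6 | -23 |
| 330941 | AB | 41 | 51.6 | -11 |
| 256608 | AB | 36 | 51.6 | -16 |
| 256753 | AB | 34 | 51.6 | -18 |
| 330353 | AB | 45 | 51.6 | -7 |
| 330244 | AB | 35 | 51.6 | -17 |
| 329787 | AB | 37 | 51.6 | -15 |
| 329607 | AB | 39 | 51.6 | -13 |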
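The window calculation is wrapped in a subquery because SQL evaluates `WHERE` before window functions, so a window result cannot be filtered at the same query level. A minimal sketch, assuming Python's built-in `sqlite3` module and a hypothetical five-row `customer` table (PostgreSQL rejects the direct form too, with a different error message):

```python
import sqlite3

# Hypothetical customer rows (not the Contoso data): (customerkey, state, age)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customerkey INTEGER, state TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "AB", 30), (2, "AB", 50), (3, "AB", 70), (4, "ON", 20), (5, "ON", 40)],
)

# Filtering on the window function directly is rejected, because WHERE
# is evaluated before window functions are computed
try:
    conn.execute(
        "SELECT customerkey FROM customer "
        "WHERE age - AVG(age) OVER (PARTITION BY state) < -5"
    ).fetchall()
    direct_filter_failed = False
except sqlite3.Error:
    direct_filter_failed = True

# Computing the window in a subquery first turns age_diff into an ordinary column
younger = conn.execute(
    """
    SELECT customerkey, state, age, age_diff
    FROM (
        SELECT customerkey, state, age,
               age - AVG(age) OVER (PARTITION BY state) AS age_diff
        FROM customer
    ) AS subquery
    WHERE age_diff < -5   -- more than 5 years younger than the state average
    ORDER BY customerkey
    """
).fetchall()

print(direct_filter_failed, younger)
```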


What can we do with this data?

1. **Identify Younger Segments**: Filter customers who are more than 5 years younger than their state's average, focusing on younger audiences within each region.
2. **Design and Launch Targeted Campaigns**: Use these insights to craft ad campaigns with messages tailored to younger demographics’ preferences in each state, like trendy products or lifestyle themes.
3. **Measure and Adjust**: Track engagement and conversion rates by state, and refine the targeting based on how the younger segments respond to the campaign.