# Window Functions Syntax

## Overview

### 🥅 Analysis Goals

- Identify high-revenue states and segment younger customers for targeted campaigns.
    - Calculate the average age by state to compare individual ages to their regional peers.
    - Calculate the total revenue by state to identify high-revenue regions.
- Identify segments of younger customers in high-revenue states for more impactful and relevant ad campaigns, focusing on demographics likely to respond to age-appropriate marketing themes.

### 📘 Concepts Covered

- Window functions basic syntax
- `PARTITION BY`
- Aggregate functions with window functions

## Syntax

### 📝 Notes

- Window functions let you perform calculations across a set of table rows related to the current row.
- Unlike aggregate functions with `GROUP BY`, they do not collapse the results into a single output row per group; every input row is kept.
- They make it easy to partition and order data within the query, which is great for calculating things like running totals, ranks, or averages within partitions (more on this later).

Syntax
- `OVER()`: Defines the window for the function. It can include `PARTITION BY` and other clauses, such as `ORDER BY`.
- `PARTITION BY`: Divides the result set into partitions. The function is then applied to each partition.
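
To see what `PARTITION BY` changes, it can help to compare an empty `OVER()` with a partitioned one on a tiny table. The sketch below is only an illustration: it assumes an in-memory SQLite database (through Python's built-in `sqlite3`, which needs SQLite 3.25 or newer for window functions) as a stand-in for the course's PostgreSQL database, and the five rows are made-up data, not values from `contoso_100k`.

```python
import sqlite3

# Hypothetical toy rows standing in for the course's customer table
toy_customers = [
    (1, "CA", 30),
    (2, "CA", 50),
    (3, "NY", 40),
    (4, "NY", 60),
    (5, "NY", 80),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customerkey INTEGER, state TEXT, age INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", toy_customers)

query = """
SELECT
    customerkey,
    state,
    age,
    AVG(age) OVER ()                   AS avg_age_all,   -- one window: all rows
    AVG(age) OVER (PARTITION BY state) AS avg_age_state  -- one window per state
FROM customer
ORDER BY customerkey
"""
result = conn.execute(query).fetchall()

# Every input row is still present; only the extra columns differ
for row in result:
    print(row)
```

With this toy data, `avg_age_all` is the same on every row, while `avg_age_state` repeats one value per state.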

### 💻 Final Result

#### Average Age by State

**`AVG`, `OVER`, `PARTITION BY`**

1. Return the following columns:
    1. `customerkey`
    2. `continent`
    3. `state`
    4. `age`
2. Using a window function, return the average age by state.

In [1]:
import sys
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the ipython-sql extension for SQL magic
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

In [2]:
%%sql 

SELECT 
	customerkey, 
	continent,
	state,
	age,
	AVG(age) OVER(PARTITION BY state) AS avg_age_state
FROM customer

customerkey,continent,state,age,avg_age_state
376855,North America,AB,79,51.588552188552185
376749,North America,AB,40,51.588552188552185
376663,North America,AB,45,51.588552188552185
278274,North America,AB,73,51.588552188552185
349590,North America,AB,47,51.588552188552185
243074,North America,AB,61,51.588552188552185
243119,North America,AB,27,51.588552188552185
243214,North America,AB,85,51.588552188552185
243257,North America,AB,73,51.588552188552185
243275,North America,AB,54,51.588552188552185


In [3]:
%%sql 

SELECT 
	state,
	AVG(age) AS avg_age_state
FROM customer
GROUP BY state
ORDER BY state

state,avg_age_state
AB,51.588552188552185
Aberdeen,51.354166666666664
Aberdeenshire,52.76203966005666
ACT,53.41
AG,50.214285714285715
AK,52.54491017964072
AL,51.79736211031175
Allerdale,51.08571428571429
Amber Valley,52.52631578947368
AN,48.68292682926829


#### Total Revenue by State

**`SUM`, `OVER`, `PARTITION BY`**

1. Add in another column with a window function that calculates the total revenue by state.
    1. Using `SUM()` for the `revenue`.
    2. `PARTITION BY` the state.
    3. Name this column as `total_revenue_state`.

In [None]:
%%sql

SELECT 
    customerkey, 
    state, 
    age, 
    revenue,
    AVG(age) OVER(PARTITION BY state) AS avg_age_state,
    SUM(revenue) OVER(PARTITION BY state) AS total_revenue_state
FROM customer


2. Add a filter to keep only high-revenue states and customers who are under the average age for their state.
    1. Window functions are not allowed in the `WHERE` clause, and `WHERE` cannot reference a column alias defined in the same `SELECT`, so compute the window columns in a subquery first.
    2. In the outer query's `WHERE` clause include `age < avg_age_state`.
    3. Add another condition with `AND` to make sure the `total_revenue_state` is over 1000000.

In [None]:
%%sql

SELECT *
FROM (
    -- Compute the window columns first so the outer WHERE can filter on them
    SELECT 
        customerkey, 
        state, 
        age, 
        revenue,
        AVG(age) OVER(PARTITION BY state) AS avg_age_state,
        SUM(revenue) OVER(PARTITION BY state) AS total_revenue_state
    FROM customer
) AS subquery
WHERE age < avg_age_state -- Younger than the state average
AND total_revenue_state > 1000000; -- Example threshold for high-revenue states
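
Window functions are evaluated after `WHERE`, so a filter on a window result has to sit in an outer query. The following is a minimal sketch of that behaviour, assuming an in-memory SQLite database (through Python's `sqlite3`, SQLite 3.25 or newer) as a stand-in for the course's PostgreSQL setup. The five-row `customer` table and the alias `with_avg` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customerkey INTEGER, state TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "CA", 30), (2, "CA", 50), (3, "NY", 40), (4, "NY", 60), (5, "NY", 80)],
)

# Attempt 1: a window function directly in WHERE is rejected by the engine
try:
    conn.execute(
        "SELECT customerkey FROM customer "
        "WHERE age < AVG(age) OVER (PARTITION BY state)"
    ).fetchall()
    where_error = None
except sqlite3.Error as exc:
    where_error = str(exc)
print("Window function in WHERE:", where_error)

# Attempt 2: compute the window column in a subquery, then filter outside it
below_avg = conn.execute(
    """
    SELECT customerkey, state, age, avg_age_state
    FROM (
        SELECT
            customerkey,
            state,
            age,
            AVG(age) OVER (PARTITION BY state) AS avg_age_state
        FROM customer
    ) AS with_avg
    WHERE age < avg_age_state
    ORDER BY customerkey
    """
).fetchall()
print(below_avg)
```

The error text differs between engines, but the subquery pattern in the second attempt carries over to PostgreSQL unchanged.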


### 💡 Why not use GROUP BY instead? 

A `GROUP BY` version of the same aggregates looks like this:

In [None]:
%%sql 

SELECT 
	state,
	AVG(age) AS avg_age_state,
	SUM(revenue) AS total_revenue_state
FROM customer
GROUP BY 
	state
ORDER BY 
	state

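`GROUP BY` returns one row per state, so the individual customer rows are lost. Window functions are great for cases when you need both row-level information and aggregated values.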
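
The difference in output shape can be sketched on a toy table. This is an illustration under assumptions: an in-memory SQLite database (Python's `sqlite3`, SQLite 3.25 or newer) replaces the course's PostgreSQL database, and the three rows, including the `revenue` values, are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customerkey INTEGER, state TEXT, age INTEGER, revenue REAL)"
)
# Hypothetical rows: two customers in CA, one in NY
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(1, "CA", 30, 100.0), (2, "CA", 50, 300.0), (3, "NY", 40, 50.0)],
)

# GROUP BY: one output row per state, customer-level columns are gone
grouped = conn.execute(
    """
    SELECT state, AVG(age) AS avg_age_state, SUM(revenue) AS total_revenue_state
    FROM customer
    GROUP BY state
    ORDER BY state
    """
).fetchall()

# Window functions: one output row per customer, with state totals alongside
windowed = conn.execute(
    """
    SELECT
        customerkey,
        state,
        age,
        AVG(age) OVER (PARTITION BY state) AS avg_age_state,
        SUM(revenue) OVER (PARTITION BY state) AS total_revenue_state
    FROM customer
    ORDER BY customerkey
    """
).fetchall()

print(len(grouped), "grouped rows:", grouped)
print(len(windowed), "windowed rows:", windowed)
```

The grouped result has two rows (one per state), while the windowed result keeps all three customers, each carrying its state's average age and total revenue.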

An example is running a targeted marketing campaign:

- Segment customers based on characteristics relative to regional peers.
- For an ad campaign targeting younger customers in each high-revenue region:
    - Use window functions to calculate each customer’s age difference from the state average and the total revenue by state.
    - This provides individual ages, average age, and total revenue by state for refined segmentation.
- Focus on customers younger than their state’s average in high-revenue states for more targeted and impactful ads.

**Note: This query uses intermediate SQL features (a subquery), showing a real-life application of window functions.**

In [None]:
%%sql

SELECT 
    customerkey, 
    state, 
    age, 
    ROUND(avg_age_state, 1) AS avg_age_state,
    ROUND(age_diff, 0) AS age_diff,
    total_revenue_state
FROM 
	-- Calculate avg_age by state, age difference, and total revenue by state
	(
    SELECT 
        customerkey, 
        state, 
        age, 
        revenue,
        AVG(age) OVER(PARTITION BY state) AS avg_age_state,
        age - AVG(age) OVER(PARTITION BY state) AS age_diff,
        SUM(revenue) OVER(PARTITION BY state) AS total_revenue_state
    FROM customer
) AS subquery
WHERE age_diff < -5 -- Younger than the state average by more than 5 years
AND total_revenue_state > 1000000; -- Example threshold for high-revenue states
