<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/3_Windows_Functions/1_Syntax.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Window Functions Syntax

## Overview

**Marketing Analysis Focused**

### 📘 Concepts Covered

- Basic syntax: `OVER()`, `PARTITION BY`

In [None]:
import sys
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the SQL magic extension (provided by jupysql in Colab, ipython-sql otherwise)
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

---
## Syntax

### 📝 Notes

- Window functions let you perform calculations across a set of table rows related to the current row.
- Unlike aggregate functions used with `GROUP BY`, they do not collapse the rows into a single output row per group; every input row is kept.
- They let you partition and order data within the query, which is useful for calculating running totals, ranks, or averages within partitions (more on this later).
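
To make the difference between grouping and windowing concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` module with an in-memory database instead of the course's PostgreSQL database, and the three customer rows are hypothetical toy data, not taken from `contoso_100k`. It assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Hypothetical toy data, not taken from the course database.
rows = [
    (1, "CA", 30),
    (2, "CA", 50),
    (3, "NY", 40),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customerkey INTEGER, state TEXT, age INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)

# Aggregate with GROUP BY: one output row per state.
grouped = conn.execute(
    "SELECT state, AVG(age) FROM customer GROUP BY state ORDER BY state"
).fetchall()

# Same aggregate as a window function: one output row per customer.
windowed = conn.execute(
    """
    SELECT customerkey, state, age,
           AVG(age) OVER (PARTITION BY state) AS avg_age_state
    FROM customer
    ORDER BY customerkey
    """
).fetchall()

print(grouped)   # [('CA', 40.0), ('NY', 40.0)]
print(windowed)  # [(1, 'CA', 30, 40.0), (2, 'CA', 50, 40.0), (3, 'NY', 40, 40.0)]
```

The grouped query returns two rows (one per state), while the windowed query returns all three customers with the state average attached to each.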

#### Syntax
- `OVER()`: Defines the window for the function. It can include `PARTITION BY` and other clauses, such as `ORDER BY`.
- `PARTITION BY`: Divides the result set into partitions. The function is then applied to each partition separately.

```sql
SELECT
    window_function() OVER(
        PARTITION BY partition_expression
    ) AS window_column_alias
FROM table_name
```
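
As a rough illustration of the template above, the sketch below compares an empty `OVER()` (the whole result set is one window) with `OVER(PARTITION BY state)` (one window per state). It runs against an in-memory SQLite database through Python's `sqlite3` module rather than the course's PostgreSQL setup, and the rows are hypothetical. SQLite 3.25 or newer is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customerkey INTEGER, state TEXT, age INTEGER)")
# Hypothetical toy rows: two customers in CA, one in NY.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "CA", 20), (2, "CA", 40), (3, "NY", 60)],
)

result = conn.execute(
    """
    SELECT
        customerkey,
        COUNT(*) OVER ()                   AS total_customers,       -- window = all rows
        COUNT(*) OVER (PARTITION BY state) AS total_customers_state  -- window = rows in same state
    FROM customer
    ORDER BY customerkey
    """
).fetchall()

print(result)  # [(1, 3, 2), (2, 3, 2), (3, 3, 1)]
```

Each row keeps its own `customerkey` while carrying both the overall count and the count for its own state.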

### 💻 Final Result

In [None]:
%%sql

SELECT 
	customerkey, 
	state,
	age,
	AVG(age) OVER(PARTITION BY state) AS avg_age_state
FROM customer

### 📊 Example for targeted marketing:

- Segment customers based on characteristics relative to their regional peers.
- For an ad campaign targeting younger customers in states with a high number of customers:
    - Use window functions to calculate each customer’s age difference from the state average and the total number of customers in each state.
    - This allows you to keep individual ages while also accessing the state’s average age and total customer count for refined segmentation.
- Focus on customers who are younger than their state’s average in high-customer-count areas, creating more targeted and impactful ads.

**Note:** This query shows real-life applications of window functions for practical marketing analysis.

In [7]:
%%sql

SELECT 
    customerkey, 
    state, 
    age, 
    ROUND(avg_age_state, 1) AS avg_age_state,
    ROUND(age_diff, 0) AS age_diff,
    total_customers_state
FROM 
    -- Calculate avg_age by state, age difference, and total customers by state
    (
        SELECT 
            customerkey, 
            state, 
            age, 
            AVG(age) OVER(PARTITION BY state) AS avg_age_state,
            age - AVG(age) OVER(PARTITION BY state) AS age_diff,
            COUNT(customerkey) OVER(PARTITION BY state) AS total_customers_state
        FROM customer
    ) AS subquery
WHERE age_diff < -5 -- More than 5 years younger than the state average
AND total_customers_state > 1000; -- Example threshold for high-customer-count states


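| customerkey | state | age | avg_age_state | age_diff | total_customers_state |
|---|---|---|---|---|---|
| 362016 | AB | 25 | 51.6 | -27 | 1485 |
| 362036 | AB | 20 | 51.6 | -32 | 1485 |
| 362081 | AB | 30 | 51.6 | -22 | 1485 |
| 362630 | AB | 40 | 51.6 | -12 | 1485 |
| 362641 | AB | 20 | 51.6 | -32 | 1485 |
| 363113 | AB | 19 | 51.6 | -33 | 1485 |
| 363240 | AB | 33 | 51.6 | -19 | 1485 |
| 363556 | AB | 41 | 51.6 | -11 | 1485 |
| 363697 | AB | 23 | 51.6 | -29 | 1485 |
| 363753 | AB | 23 | 51.6 | -29 | 1485 |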
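
The query above wraps the window calculations in a subquery before filtering. A likely reason is that window functions are evaluated after `WHERE`, so they cannot be used directly in a `WHERE` clause (PostgreSQL rejects this, and so does SQLite). The sketch below demonstrates the pattern with Python's `sqlite3` module on hypothetical toy rows; it is a stand-in for the PostgreSQL database used in this notebook and assumes SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customerkey INTEGER, state TEXT, age INTEGER)")
# Hypothetical toy rows: CA has an average age of 40, NY has a single customer.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "CA", 20), (2, "CA", 40), (3, "CA", 60), (4, "NY", 50)],
)

# Filtering directly on a window function is rejected by the database.
try:
    conn.execute(
        "SELECT customerkey FROM customer "
        "WHERE age - AVG(age) OVER (PARTITION BY state) < -5"
    ).fetchall()
    direct_filter_failed = False
except sqlite3.Error:
    direct_filter_failed = True

# Computing the window column in a subquery first makes it filterable.
younger = conn.execute(
    """
    SELECT customerkey, state, age, age_diff
    FROM (
        SELECT customerkey, state, age,
               age - AVG(age) OVER (PARTITION BY state) AS age_diff
        FROM customer
    ) AS subquery
    WHERE age_diff < -5
    ORDER BY customerkey
    """
).fetchall()

print(direct_filter_failed)  # True
print(younger)               # [(1, 'CA', 20, -20.0)]
```

Only customer 1 is more than 5 years younger than the average for their state, so only that row survives the outer filter.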
