**Setup**

In [1]:
# Library
import pandas as pd
import sqlite3

# Load CSV data into a DataFrame (raw string so the backslashes in the Windows path are not read as escape sequences)
data = pd.read_csv(r'D:\Code\DE\PostgreSQL Summary Stats and Window Functions\summer.csv')

# Create an in-memory SQLite database
conn = sqlite3.connect(':memory:')

# Store the DataFrame as a table in the database
data.to_sql('Summer_Medals', conn, index=False)

31165
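
The queries in this notebook rely on the `OVER` clause, which SQLite supports from version 3.25.0 onward. The SQLite library bundled with a given Python installation may be older, so a quick check before running the cells can save some confusion. The following is a minimal sketch; `supports_window_functions` is a hypothetical helper name, not part of the `sqlite3` module.

```python
import sqlite3

def supports_window_functions(version_info=sqlite3.sqlite_version_info):
    """Return True if the given SQLite version is at least 3.25.0.

    Hypothetical helper: it only compares version numbers and does not
    run any query.
    """
    return tuple(version_info[:3]) >= (3, 25, 0)

# Report the SQLite library version that this Python build links against
print(sqlite3.sqlite_version, supports_window_functions())
```

If the check returns `False`, the window-function queries below would be expected to fail with a syntax error near `OVER`.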

**Aggregate window functions**

**Running totals of athlete medals**

A running total (or cumulative sum) of a column shows, row by row, how much of the overall sum has accumulated so far, which makes each row's contribution to the total easy to see.

**Instructions**

- Return the athletes, the number of medals they earned, and the medals running total, ordered by the athletes' names in alphabetical order.

In [2]:
query = """
WITH Athlete_Medals AS (
  SELECT
    Athlete, COUNT(*) AS Medals
  FROM Summer_Medals
  WHERE
    Country = 'USA' AND Medal = 'Gold'
    AND Year >= 2000
  GROUP BY Athlete)

SELECT
  -- Calculate the running total of athlete medals
  Athlete,
  Medals,
  SUM(Medals) OVER (ORDER BY Athlete ASC) AS Running_Total
FROM Athlete_Medals
ORDER BY Athlete ASC;
"""
result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Athlete,Medals,Running_Total
0,ABDUR-RAHIM Shareef,1,1
1,ABERNATHY Brent,1,2
2,ADRIAN Nathan,3,5
3,AHRENS Chris,1,6
4,AINSWORTH Kurt,1,7
...,...,...,...
362,WOLTERS Kara,1,513
363,WYLDE Peter,1,514
364,YOUNG Ernie,1,515
365,YOUNG Tim,1,516
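
One detail worth knowing about running totals: when a window has an `ORDER BY` but no explicit frame, the default frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, so rows that tie on the ordering column share the same running total. Athlete names are unique after the `GROUP BY`, so the query above is unaffected. The sketch below uses a made-up `scores` table (not the Olympic data) to show how the default frame differs from an explicit `ROWS` frame when ties exist.

```python
import sqlite3

# Toy data, invented for illustration: the name "B" appears twice
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE scores (name TEXT, medals INTEGER)")
demo.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("A", 1), ("B", 2), ("B", 3), ("C", 4)],
)

rows = demo.execute("""
SELECT
  name,
  medals,
  -- Default frame (RANGE): tied rows are peers and get the same total
  SUM(medals) OVER (ORDER BY name ASC) AS range_total,
  -- Explicit ROWS frame with a tie-breaker: the total grows row by row
  SUM(medals) OVER (ORDER BY name ASC, medals ASC
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total
FROM scores
ORDER BY name ASC, medals ASC
""").fetchall()

for row in rows:
    print(row)
```

Both "B" rows report a `range_total` of 6, while `rows_total` steps through 3 and then 6.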


**Maximum country medals by year**

Tracking the maximum number of medals a country has earned so far lets you tell whether the country has broken its own record: compare the current year's medals with the running maximum.

**Instructions**

- Return the year, country, medals, and the maximum medals earned so far for each country, ordered by country and then by year in ascending order.

In [3]:
query = """
WITH Country_Medals AS (
  SELECT
    Year, Country, COUNT(*) AS Medals
  FROM Summer_Medals
  WHERE
    Country IN ('CHN', 'KOR', 'JPN')
    AND Medal = 'Gold' AND Year >= 2000
  GROUP BY Year, Country)

SELECT
  -- Return the max medals earned so far per country
  Country,
  Year,
  Medals,
  MAX(Medals) OVER (PARTITION BY Country
                        ORDER BY Year ASC) AS Max_Medals
FROM Country_Medals
ORDER BY Country ASC, Year ASC;
"""
result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Country,Year,Medals,Max_Medals
0,CHN,2000,39,39
1,CHN,2004,52,52
2,CHN,2008,74,74
3,CHN,2012,56,74
4,JPN,2000,5,5
5,JPN,2004,21,21
6,JPN,2008,23,23
7,JPN,2012,7,23
8,KOR,2000,12,12
9,KOR,2004,14,14
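
The record comparison described above can be sketched directly in SQL by taking the maximum over the *previous* rows only and comparing it with the current row. The example below is a sketch under a few assumptions: it rebuilds a small `Country_Medals` table from the CHN and JPN rows of the output above, the column names `Prev_Max` and `New_Record` are made up for illustration, and a country's first appearance is treated as not being a record.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Country_Medals (Country TEXT, Year INTEGER, Medals INTEGER)")
# Values copied from the CHN and JPN rows of the output above
demo.executemany("INSERT INTO Country_Medals VALUES (?, ?, ?)", [
    ("CHN", 2000, 39), ("CHN", 2004, 52), ("CHN", 2008, 74), ("CHN", 2012, 56),
    ("JPN", 2000, 5), ("JPN", 2004, 21), ("JPN", 2008, 23), ("JPN", 2012, 7),
])

records = demo.execute("""
WITH Previous AS (
  SELECT
    Country, Year, Medals,
    -- Max over earlier rows only; NULL for each country's first row
    MAX(Medals) OVER (PARTITION BY Country
                      ORDER BY Year ASC
                      ROWS BETWEEN UNBOUNDED PRECEDING
                      AND 1 PRECEDING) AS Prev_Max
  FROM Country_Medals)

SELECT
  Country, Year, Medals, Prev_Max,
  -- 1 when the current year beats every earlier year, else 0
  (Prev_Max IS NOT NULL) AND (Medals > Prev_Max) AS New_Record
FROM Previous
ORDER BY Country ASC, Year ASC
""").fetchall()

for row in records:
    print(row)
```

With these values, 2004 and 2008 are flagged as record years for both countries, and 2012 is not.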


**Minimum country medals by year**

So far, you've seen `MAX` and `SUM`, aggregate functions normally used with `GROUP BY`, being used as window functions. You can also use the other aggregate functions, like `MIN`, as window functions.

**Instructions**

- Return the year, medals earned, and minimum medals earned so far.

In [4]:
query = """
WITH France_Medals AS (
  SELECT
    Year, COUNT(*) AS Medals
  FROM Summer_Medals
  WHERE
    Country = 'FRA'
    AND Medal = 'Gold' AND Year >= 2000
  GROUP BY Year)

SELECT
  Year,
  Medals,
  MIN(Medals) OVER (ORDER BY Year ASC) AS Min_Medals
FROM France_Medals
ORDER BY Year ASC;
"""
result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Year,Medals,Min_Medals
0,2000,22,22
1,2004,21,21
2,2008,25,21
3,2012,30,21
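
To see what the running `MIN` computes, the same result can be reproduced outside SQL. This is a small sketch using `itertools.accumulate` from the Python standard library, with the four medal counts copied from the output above.

```python
from itertools import accumulate

# France's gold medal counts for 2000, 2004, 2008, 2012 (from the output above)
medals = [22, 21, 25, 30]

# accumulate applies min (or max) to the running result and the next value
running_min = list(accumulate(medals, min))
running_max = list(accumulate(medals, max))

print(running_min)  # [22, 21, 21, 21]
print(running_max)  # [22, 22, 25, 30]
```

The running minimum drops to 21 in 2004 and stays there, matching the `Min_Medals` column.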


**Frames**

**Moving maximum of Scandinavian athletes' medals**

Frames let you restrict the rows passed to a window function to a sliding window whose start and end you define relative to the current row.

Adding a frame to your window function lets you calculate "moving" metrics, whose inputs slide from row to row.

**Instructions**

- Return the year, medals earned, and the maximum medals earned, comparing only the current year and the next year.

In [5]:
query = """
WITH Scandinavian_Medals AS (
  SELECT
    Year, COUNT(*) AS Medals
  FROM Summer_Medals
  WHERE
    Country IN ('DEN', 'NOR', 'FIN', 'SWE', 'ISL')
    AND Medal = 'Gold'
  GROUP BY Year)

SELECT
  -- Select each year's medals
  Year,
  Medals,
  -- Get the max of the current and next years' medals
  MAX(Medals) OVER (ORDER BY Year ASC
                    ROWS BETWEEN CURRENT ROW
                    AND 1 FOLLOWING) AS Max_Medals
FROM Scandinavian_Medals
ORDER BY Year ASC;
"""
result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Year,Medals,Max_Medals
0,1896,1,1
1,1900,1,77
2,1908,77,141
3,1912,141,159
4,1920,159,159
5,1924,48,48
6,1928,24,24
7,1932,17,17
8,1936,15,54
9,1948,54,54
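
The way a `ROWS` frame slides can be emulated in a few lines of plain Python. The sketch below is only an illustration of the idea, not how the database implements it; `rows_frame` is a hypothetical helper, and the input list holds the first six medal counts from the output above.

```python
def rows_frame(values, preceding, following, func):
    """Emulate ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING.

    For each position, func is applied to the slice of values covered by
    the frame; the frame is clipped at the start and end of the list.
    """
    out = []
    for i in range(len(values)):
        start = max(0, i - preceding)
        end = min(len(values), i + following + 1)
        out.append(func(values[start:end]))
    return out

# Medal counts for 1896 through 1924, copied from the output above
medals = [1, 1, 77, 141, 159, 48]

# CURRENT ROW AND 1 FOLLOWING corresponds to preceding=0, following=1
print(rows_frame(medals, 0, 1, max))  # [1, 77, 141, 159, 159, 48]
```

The last element's frame holds only itself because the list stops at 1924. In the full query the 1924 row also sees the 1928 row, which happens to give the same maximum.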


**Moving maximum of Chinese athletes' medals**

Frames allow you to "peek" forward or backward without first using the relative fetching functions, `LAG` and `LEAD`, to pull previous or following rows' values into the current row.

**Instructions**

- Return the athletes, medals earned, and the maximum medals earned, comparing only the current athlete and the two preceding athletes, ordered by athletes' names in alphabetical order.

In [6]:
query = """
WITH Chinese_Medals AS (
  SELECT
    Athlete, COUNT(*) AS Medals
  FROM Summer_Medals
  WHERE
    Country = 'CHN' AND Medal = 'Gold'
    AND Year >= 2000
  GROUP BY Athlete)

SELECT
  -- Select the athletes and the medals they've earned
  Athlete,
  Medals,
  -- Get the max of the last two and current rows' medals 
  MAX(Medals) OVER (ORDER BY Athlete ASC
                    ROWS BETWEEN 2 PRECEDING
                    AND CURRENT ROW) AS Max_Medals
FROM Chinese_Medals
ORDER BY Athlete ASC;
"""
result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Athlete,Medals,Max_Medals
0,CAI Yalin,1,1
1,CAI Yun,1,1
2,CAO Lei,1,1
3,CAO Yuan,1,1
4,CHEN Ding,1,1
...,...,...,...
155,ZHOU Lulu,1,1
156,ZHOU Suhong,1,1
157,ZHU Qinan,1,1
158,ZOU Kai,5,5
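
To make the comparison with `LAG` concrete, the sketch below computes the same moving maximum twice: once with a frame, and once by fetching the two previous values with `LAG` and combining them with SQLite's multi-argument scalar `MAX` (PostgreSQL would use `GREATEST` for that step). The `Medal_Counts` table and its values are made up for illustration, and the third argument to `LAG` is a default that falls back to the current row's value where no earlier row exists.

```python
import sqlite3

# Toy data, invented for illustration
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Medal_Counts (Athlete TEXT, Medals INTEGER)")
demo.executemany(
    "INSERT INTO Medal_Counts VALUES (?, ?)",
    [("A", 1), ("B", 3), ("C", 2), ("D", 1), ("E", 1)],
)

rows = demo.execute("""
SELECT
  Athlete,
  Medals,
  -- Frame version: max of the two preceding rows and the current row
  MAX(Medals) OVER (ORDER BY Athlete ASC
                    ROWS BETWEEN 2 PRECEDING
                    AND CURRENT ROW) AS Frame_Max,
  -- LAG version: fetch each previous value explicitly, then compare
  MAX(Medals,
      LAG(Medals, 1, Medals) OVER (ORDER BY Athlete ASC),
      LAG(Medals, 2, Medals) OVER (ORDER BY Athlete ASC)) AS Lag_Max
FROM Medal_Counts
ORDER BY Athlete ASC
""").fetchall()

for row in rows:
    print(row)
```

Both columns agree on every row, but the `LAG` version needs one call per offset, so the frame version scales better as the window grows.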


**Moving averages and totals**

**Moving average of Russian medals**

Using frames with aggregate window functions allows you to calculate many common metrics, including moving averages and totals. These metrics track the change in performance over time.

**Instructions**

- Calculate the moving average of medals earned over three consecutive Games (the current row and the two preceding rows).

In [7]:
query = """
WITH Russian_Medals AS (
  SELECT
    Year, COUNT(*) AS Medals
  FROM Summer_Medals
  WHERE
    Country = 'RUS'
    AND Medal = 'Gold'
    AND Year >= 1980
  GROUP BY Year)

SELECT
  Year, Medals,
  AVG(Medals) OVER
    (ORDER BY Year ASC
     ROWS BETWEEN
     2 PRECEDING AND CURRENT ROW) AS Medals_MA
FROM Russian_Medals
ORDER BY Year ASC;
"""
result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Year,Medals,Medals_MA
0,1996,36,36.0
1,2000,66,51.0
2,2004,47,49.666667
3,2008,43,52.0
4,2012,47,45.666667
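
Note that the first two rows of a moving average are computed from fewer than three inputs: 36.0 is the average of one row and 51.0 the average of two. One way to make this visible is to count the rows in the same frame, as in the sketch below. It rebuilds a small `Russian_Medals` table from the output above; the `Frame_Rows` column name is made up for illustration.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Russian_Medals (Year INTEGER, Medals INTEGER)")
# Values copied from the output above
demo.executemany(
    "INSERT INTO Russian_Medals VALUES (?, ?)",
    [(1996, 36), (2000, 66), (2004, 47), (2008, 43), (2012, 47)],
)

rows = demo.execute("""
SELECT
  Year,
  -- Number of rows that actually fall inside the frame
  COUNT(*) OVER (ORDER BY Year ASC
                 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS Frame_Rows,
  AVG(Medals) OVER (ORDER BY Year ASC
                    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS Medals_MA
FROM Russian_Medals
ORDER BY Year ASC
""").fetchall()

# Keep only averages computed from a full three-row frame
full_windows = [(year, avg) for year, n, avg in rows if n == 3]

for year, n, avg in rows:
    print(year, n, f"{avg:.2f}")
```

Whether to report or discard the partial windows depends on the analysis; the filter above simply shows one option.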


**Moving total of countries' medals**

What if your data is split into multiple groups spread over one or more columns in the table? Even with a defined frame, if you don't separate the groups' data, one group's values will leak into another group's moving average or total. `PARTITION BY` keeps each group's frame separate.

**Instructions**

- Calculate the moving sum of medals earned per country over three consecutive Games.

In [8]:
query = """
WITH Country_Medals AS (
  SELECT
    Year, Country, COUNT(*) AS Medals
  FROM Summer_Medals
  GROUP BY Year, Country)

SELECT
  Year, Country, Medals,
  -- Calculate each country's 3-game moving total
  SUM(Medals) OVER
    (PARTITION BY Country
     ORDER BY Year ASC
     ROWS BETWEEN
     2 PRECEDING AND CURRENT ROW) AS Medals_MT
FROM Country_Medals
ORDER BY Country ASC, Year ASC;
"""
result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Year,Country,Medals,Medals_MT
0,2012,,4,4
1,2008,AFG,1,1
2,2012,AFG,1,2
3,1988,AHO,1,1
4,1984,ALG,2,2
...,...,...,...,...
1153,2004,ZIM,3,19
1154,2008,ZIM,4,23
1155,1896,ZZX,6,6
1156,1900,ZZX,34,40
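
The effect of leaving out `PARTITION BY` is easiest to see side by side. The sketch below runs the same three-row moving total with and without partitioning on a small table. The values are the CHN and JPN gold medal counts from an earlier output, used here only as sample numbers, and the column names `Leaky_Total` and `Partitioned_Total` are made up for illustration.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Country_Medals (Year INTEGER, Country TEXT, Medals INTEGER)")
demo.executemany("INSERT INTO Country_Medals VALUES (?, ?, ?)", [
    (2000, "CHN", 39), (2004, "CHN", 52), (2008, "CHN", 74),
    (2000, "JPN", 5), (2004, "JPN", 21), (2008, "JPN", 23),
])

rows = demo.execute("""
SELECT
  Country, Year, Medals,
  -- No partition: the frame slides across the CHN/JPN boundary
  SUM(Medals) OVER (ORDER BY Country ASC, Year ASC
                    ROWS BETWEEN 2 PRECEDING
                    AND CURRENT ROW) AS Leaky_Total,
  -- Partitioned: the frame restarts for each country
  SUM(Medals) OVER (PARTITION BY Country
                    ORDER BY Year ASC
                    ROWS BETWEEN 2 PRECEDING
                    AND CURRENT ROW) AS Partitioned_Total
FROM Country_Medals
ORDER BY Country ASC, Year ASC
""").fetchall()

for row in rows:
    print(row)
```

Without the partition, JPN's 2000 row reports 131 because its frame still contains CHN's 2004 and 2008 counts; with the partition it reports 5.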
