**Setup**

In [2]:
# Library
import pandas as pd
import sqlite3

# Load CSV data into a DataFrame
# (raw string, so the backslashes in the Windows path are not treated as escape sequences)
data = pd.read_csv(r'D:\Code\DE\PostgreSQL Summary Stats and Window Functions\summer.csv')

# Create an in-memory SQLite database
conn = sqlite3.connect(':memory:')

# Store the DataFrame as a table in the database
data.to_sql('Summer_Medals', conn, index=False)

31165

**Numbering rows**

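The simplest application of window functions is numbering rows. Numbering rows lets you fetch the `n`-th row easily: getting the 35th row of a table, for example, is awkward without a column that holds each row's number. Note that `ROW_NUMBER() OVER()` with an empty `OVER` clause numbers rows in whatever order the database returns them, so the numbering is only meaningful if that order is.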

In [13]:
query = """
SELECT
  *,
  -- Assign numbers to each row
  ROW_NUMBER() OVER() AS Row_N
FROM Summer_Medals
ORDER BY Row_N ASC;
"""
result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal,Row_N
0,1896,Athens,Aquatics,Swimming,HAJOS Alfred,HUN,Men,100M Freestyle,Gold,1
1,1896,Athens,Aquatics,Swimming,HERSCHMANN Otto,AUT,Men,100M Freestyle,Silver,2
2,1896,Athens,Aquatics,Swimming,DRIVAS Dimitrios,GRE,Men,100M Freestyle For Sailors,Bronze,3
3,1896,Athens,Aquatics,Swimming,MALOKINIS Ioannis,GRE,Men,100M Freestyle For Sailors,Gold,4
4,1896,Athens,Aquatics,Swimming,CHASAPIS Spiridon,GRE,Men,100M Freestyle For Sailors,Silver,5
...,...,...,...,...,...,...,...,...,...,...
31160,2012,London,Wrestling,Wrestling Freestyle,JANIKOWSKI Damian,POL,Men,Wg 84 KG,Bronze,31161
31161,2012,London,Wrestling,Wrestling Freestyle,REZAEI Ghasem Gholamreza,IRI,Men,Wg 96 KG,Gold,31162
31162,2012,London,Wrestling,Wrestling Freestyle,TOTROV Rustam,RUS,Men,Wg 96 KG,Silver,31163
31163,2012,London,Wrestling,Wrestling Freestyle,ALEKSANYAN Artur,ARM,Men,Wg 96 KG,Bronze,31164
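
Fetching the `n`-th row once rows are numbered can be sketched as follows. This is a minimal illustration, not part of the original exercise: the toy table holds only the first five rows shown in the output above, `fetch_nth_row` is a hypothetical helper, and ordering by SQLite's implicit `rowid` is an assumption made here to pin the numbering to insertion order. It also assumes a SQLite build with window-function support (3.25 or later). Because window functions are evaluated after `WHERE`, the numbering has to happen in a subquery before it can be filtered.

```python
import sqlite3

# Toy stand-in for Summer_Medals: the first five rows from the output above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Summer_Medals (Year INTEGER, Athlete TEXT, Medal TEXT)")
conn.executemany(
    "INSERT INTO Summer_Medals VALUES (?, ?, ?)",
    [
        (1896, "HAJOS Alfred", "Gold"),
        (1896, "HERSCHMANN Otto", "Silver"),
        (1896, "DRIVAS Dimitrios", "Bronze"),
        (1896, "MALOKINIS Ioannis", "Gold"),
        (1896, "CHASAPIS Spiridon", "Silver"),
    ],
)


def fetch_nth_row(conn, n):
    """Hypothetical helper: return the n-th row (1-based) in insertion order, or None."""
    query = """
    SELECT Year, Athlete, Medal, Row_N
    FROM (
      -- Number the rows first; the outer query can then filter on Row_N
      SELECT *, ROW_NUMBER() OVER (ORDER BY rowid) AS Row_N
      FROM Summer_Medals
    ) AS Numbered
    WHERE Row_N = ?;
    """
    return conn.execute(query, (n,)).fetchone()


third_row = fetch_nth_row(conn, 3)
print(third_row)  # (1896, 'DRIVAS Dimitrios', 'Bronze', 3)
```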


**Numbering Olympic games in ascending order**

The Summer Olympics dataset contains the results of the games between 1896 and 2012. The first Summer Olympics were held in 1896, the second in 1900, and so on. What if you want to easily query the table to see in which year the 13th Summer Olympics were held? You'd need to number the rows for that.

**Instructions**

- Assign a number to each year in which Summer Olympic games were held.

In [14]:
query = """
SELECT
  Year,

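  -- Assign numbers to each year, ordering explicitly inside the window
  -- rather than relying on the subquery's row order
  ROW_NUMBER() OVER (ORDER BY Year ASC) AS Row_N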
FROM (
  SELECT DISTINCT Year
  FROM Summer_Medals
  ORDER BY Year ASC
) AS Years
ORDER BY Year ASC;
"""
result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Year,Row_N
0,1896,1
1,1900,2
2,1904,3
3,1908,4
4,1912,5
5,1920,6
6,1924,7
7,1928,8
8,1932,9
9,1936,10
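
To answer a question like "in which year were the `n`-th games held?", the numbered years can be filtered in an outer query. The sketch below is an illustration under stated assumptions: the toy table contains only the ten years visible in the output above, each inserted twice so that `DISTINCT` has duplicates to collapse, and the window is ordered explicitly with `ORDER BY Year ASC`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Summer_Medals (Year INTEGER, Medal TEXT)")

# Only the ten years visible in the output above; two medal rows per year
years = [1896, 1900, 1904, 1908, 1912, 1920, 1924, 1928, 1932, 1936]
conn.executemany(
    "INSERT INTO Summer_Medals VALUES (?, ?)",
    [(year, medal) for year in years for medal in ("Gold", "Silver")],
)

query = """
SELECT Year
FROM (
  SELECT
    Year,
    ROW_NUMBER() OVER (ORDER BY Year ASC) AS Row_N
  FROM (SELECT DISTINCT Year FROM Summer_Medals) AS Years
) AS Numbered
WHERE Row_N = ?;
"""

sixth_games_year = conn.execute(query, (6,)).fetchone()[0]
print(sixth_games_year)  # 1920, matching Row_N = 6 in the output above
```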


**Numbering Olympic games in descending order**

You've already numbered the rows in the Summer Medals dataset. What if you need to reverse the row numbers so that the most recent Olympic games' rows have a lower number?

**Instructions**

- Assign a number to each year in which Summer Olympic games were held so that rows with the most recent years have lower row numbers.

In [2]:
query = """
SELECT
  Year,
  -- Assign the lowest numbers to the most recent years
  ROW_NUMBER() OVER (ORDER BY Year DESC) AS Row_N
FROM (
  SELECT DISTINCT Year
  FROM Summer_Medals
) AS Years
ORDER BY Year;
"""
result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Year,Row_N
0,1896,27
1,1900,26
2,1904,25
3,1908,24
4,1912,23
5,1920,22
6,1924,21
7,1928,20
8,1932,19
9,1936,18
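
The `ORDER BY` inside `OVER` decides how rows are numbered, while the query's final `ORDER BY` only decides how they are displayed; the two are independent. A small sketch, using a toy table with three of the years above, puts ascending and descending numbering side by side:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Summer_Medals (Year INTEGER)")
# Toy data: three of the years above, inserted out of order and with a duplicate
conn.executemany(
    "INSERT INTO Summer_Medals VALUES (?)",
    [(1904,), (1896,), (1900,), (1896,)],
)

query = """
SELECT
  Year,
  ROW_NUMBER() OVER (ORDER BY Year ASC)  AS Row_Asc,
  ROW_NUMBER() OVER (ORDER BY Year DESC) AS Row_Desc
FROM (SELECT DISTINCT Year FROM Summer_Medals) AS Years
ORDER BY Year ASC;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1896, 1, 3)
# (1900, 2, 2)
# (1904, 3, 1)
```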


**Numbering Olympic athletes by medals earned**

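Row numbering can also be used for ranking. For example, numbering rows with `ORDER BY` on each athlete's medal count, in descending order, inside the `OVER` clause assigns 1 to the athlete with the most medals, 2 to the athlete with the second most, and so on. Because `ROW_NUMBER()` always produces distinct numbers, athletes tied on medal count still receive different numbers, in an order the database is free to choose.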

**Instructions**

- For each athlete, count the number of medals they have earned.

In [3]:
query = """
SELECT
  -- Count the number of medals each athlete has earned
  Athlete,
  COUNT(*) AS Medals
FROM Summer_Medals
GROUP BY Athlete
ORDER BY Medals DESC;
"""
result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Athlete,Medals
0,PHELPS Michael,22
1,LATYNINA Larisa,18
2,ANDRIANOV Nikolay,15
3,SHAKHLIN Boris,13
4,ONO Takashi,13
...,...,...
22757,AARDEWIJN Pepijn,1
22758,AARDENBURG Willemien,1
22759,AANING Alf Lied,1
22760,AAMODT Ragnhild,1


- Having wrapped the previous query in the `Athlete_Medals` CTE, rank each athlete by the number of medals they've earned.

In [4]:
query = """
WITH Athlete_Medals AS (
  SELECT
    -- Count the number of medals each athlete has earned
    Athlete,
    COUNT(*) AS Medals
  FROM Summer_Medals
  GROUP BY Athlete)

SELECT
  -- Number each athlete by how many medals they've earned
  Athlete,
  ROW_NUMBER() OVER (ORDER BY Medals DESC) AS Row_N
FROM Athlete_Medals
ORDER BY Medals DESC;
"""
result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Athlete,Row_N
0,PHELPS Michael,1
1,LATYNINA Larisa,2
2,ANDRIANOV Nikolay,3
3,MANGIAROTTI Edoardo,4
4,ONO Takashi,5
...,...,...
22757,ÖSTERVOLD Henrik,22758
22758,ÖSTERVOLD Jan Olsen,22759
22759,ÖSTERVOLD Kristian Olsen,22760
22760,ÖSTERVOLD Ole Olsen,22761
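
Several athletes share the same medal count, and `ROW_NUMBER()` numbers tied rows in an unspecified order. One way to make the numbering reproducible is to add a tie-breaker to the window's `ORDER BY`. The sketch below is an assumption-laden illustration: the athlete names are placeholders, and breaking ties alphabetically is an arbitrary choice made here, not something the exercise prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Summer_Medals (Athlete TEXT, Medal TEXT)")
# Placeholder athletes: A and B are tied on two medals, C has three
conn.executemany(
    "INSERT INTO Summer_Medals VALUES (?, ?)",
    [
        ("ATHLETE B", "Gold"), ("ATHLETE B", "Silver"),
        ("ATHLETE A", "Gold"), ("ATHLETE A", "Bronze"),
        ("ATHLETE C", "Gold"), ("ATHLETE C", "Gold"), ("ATHLETE C", "Silver"),
    ],
)

query = """
WITH Athlete_Medals AS (
  SELECT
    Athlete,
    COUNT(*) AS Medals
  FROM Summer_Medals
  GROUP BY Athlete)

SELECT
  Athlete,
  Medals,
  -- Athlete ASC breaks ties, so the numbering is deterministic
  ROW_NUMBER() OVER (ORDER BY Medals DESC, Athlete ASC) AS Row_N
FROM Athlete_Medals
ORDER BY Row_N ASC;
"""

ranking = conn.execute(query).fetchall()
for row in ranking:
    print(row)
# ('ATHLETE C', 3, 1)
# ('ATHLETE A', 2, 2)
# ('ATHLETE B', 2, 3)
```

Sorting the final result by `Row_N` rather than by `Medals` also guarantees that the displayed order matches the assigned numbers.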


**Reigning weightlifting champions**

A reigning champion is one who has won both the previous and the current competition (here, two consecutive Olympic Games). To determine whether a champion is reigning, the previous and current results need to be in the same row, in two different columns.

**Instructions**

- Return each year's gold medalists in the Men's 69KG weightlifting competition.

In [5]:
query = """
SELECT
  -- Return each year's champions' countries
  Year,
  Country AS champion
FROM Summer_Medals
WHERE
  Discipline = 'Weightlifting' AND
  Event = '69KG' AND
  Gender = 'Men' AND
  Medal = 'Gold';
"""
result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Year,champion
0,2000,BUL
1,2004,CHN
2,2008,CHN
3,2012,CHN


- Having wrapped the previous query in the `Weightlifting_Gold` CTE, get the previous year's champion for each year.

In [6]:
query = """
WITH Weightlifting_Gold AS (
  SELECT
    -- Return each year's champions' countries
    Year,
    Country AS champion
  FROM Summer_Medals
  WHERE
    Discipline = 'Weightlifting' AND
    Event = '69KG' AND
    Gender = 'Men' AND
    Medal = 'Gold')

SELECT
  Year, Champion,
  -- Fetch the previous year's champion
  LAG(Champion) OVER
    (ORDER BY Year ASC) AS Last_Champion
FROM Weightlifting_Gold
ORDER BY Year ASC;
"""
result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Year,champion,Last_Champion
0,2000,BUL,
1,2004,CHN,BUL
2,2008,CHN,CHN
3,2012,CHN,CHN
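
With both champions in the same row, the reigning check becomes a plain column comparison. The sketch below loads the four rows from the output above into a toy `Weightlifting_Gold` table; the `Is_Reigning` column is an addition made here for illustration. In SQLite the comparison yields 1, 0, or `NULL` (when there is no previous champion to compare against).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Weightlifting_Gold (Year INTEGER, Champion TEXT)")
# Rows copied from the output above
conn.executemany(
    "INSERT INTO Weightlifting_Gold VALUES (?, ?)",
    [(2000, "BUL"), (2004, "CHN"), (2008, "CHN"), (2012, "CHN")],
)

query = """
SELECT
  Year,
  Champion,
  Last_Champion,
  -- 1 if the title was retained, 0 if it changed hands, NULL for the first year
  Champion = Last_Champion AS Is_Reigning
FROM (
  SELECT
    Year,
    Champion,
    LAG(Champion) OVER (ORDER BY Year ASC) AS Last_Champion
  FROM Weightlifting_Gold
) AS With_Last
ORDER BY Year ASC;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (2000, 'BUL', None, None)
# (2004, 'CHN', 'BUL', 0)
# (2008, 'CHN', 'CHN', 1)
# (2012, 'CHN', 'CHN', 1)
```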


**Reigning champions by gender**

You've already fetched the previous year's champion for one event. However, if you have multiple events, genders, or other metrics as columns, you'll need to split your table into partitions to avoid having a champion from one event or gender appear as the previous champion of another event or gender.

**Instructions**

- Return the previous champions of each year's event by gender.

In [7]:
query = """
WITH Javelin_Gold AS (
  SELECT DISTINCT
    Gender, Year, Country
  FROM Summer_Medals
  WHERE
    Year >= 2000 AND
    Event = 'Javelin Throw' AND
    Medal = 'Gold')

SELECT
  Gender, Year,
  Country AS Champion,
  -- Fetch the previous year's champion by gender
  LAG(Country) OVER (PARTITION BY Gender
                         ORDER BY Year ASC) AS Last_Champion
FROM Javelin_Gold
ORDER BY Gender ASC, Year ASC;
"""
result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Gender,Year,Champion,Last_Champion
0,Men,2000,CZE,
1,Men,2004,NOR,CZE
2,Men,2008,NOR,NOR
3,Men,2012,TTO,NOR
4,Women,2000,NOR,
5,Women,2004,CUB,NOR
6,Women,2008,CZE,CUB
7,Women,2012,CZE,CZE
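
To see what `PARTITION BY` prevents, the sketch below computes `LAG` twice over the eight rows from the output above: once without partitioning and once partitioned by gender. The toy table name `Javelin_Champions` and the column names `Unpartitioned` and `Partitioned` are chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Javelin_Champions (Gender TEXT, Year INTEGER, Country TEXT)")
# Rows copied from the output above
conn.executemany(
    "INSERT INTO Javelin_Champions VALUES (?, ?, ?)",
    [
        ("Men", 2000, "CZE"), ("Men", 2004, "NOR"),
        ("Men", 2008, "NOR"), ("Men", 2012, "TTO"),
        ("Women", 2000, "NOR"), ("Women", 2004, "CUB"),
        ("Women", 2008, "CZE"), ("Women", 2012, "CZE"),
    ],
)

query = """
SELECT
  Gender, Year,
  Country AS Champion,
  -- One window over all rows: the men's last champion leaks into the women's first row
  LAG(Country) OVER (ORDER BY Gender ASC, Year ASC) AS Unpartitioned,
  -- One window per gender: LAG restarts at each partition
  LAG(Country) OVER (PARTITION BY Gender ORDER BY Year ASC) AS Partitioned
FROM Javelin_Champions
ORDER BY Gender ASC, Year ASC;
"""

rows = conn.execute(query).fetchall()
women_2000 = next(r for r in rows if r[0] == "Women" and r[1] == 2000)
print(women_2000)  # ('Women', 2000, 'NOR', 'TTO', None)
```

Without the partition, the 2012 men's champion (TTO) shows up as the "previous champion" of the 2000 women's event; with it, that row correctly has no previous champion.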


**Reigning champions by gender and event**

In the previous exercise, you partitioned by gender to ensure that data about one gender doesn't get mixed into data about the other. If more than one column distinguishes the groups (for example, gender and event), partitioning by only one of them still mixes results across the others.

**Instructions**

- Return the previous champions of each year's events by gender and event.

In [3]:
query = """
WITH Athletics_Gold AS (
  SELECT DISTINCT
    Gender, Year, Event, Country
  FROM Summer_Medals
  WHERE
    Year >= 2000 AND
    Discipline = 'Athletics' AND
    Event IN ('100M', '10000M') AND
    Medal = 'Gold')

SELECT
  Gender, Year, Event,
  Country AS Champion,
  -- Fetch the previous year's champion by gender and event
  LAG(Country) OVER (PARTITION BY Gender, Event
                         ORDER BY Year ASC) AS Last_Champion
FROM Athletics_Gold
ORDER BY Event ASC, Gender ASC, Year ASC;
"""
result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Gender,Year,Event,Champion,Last_Champion
0,Men,2000,10000M,ETH,
1,Men,2004,10000M,ETH,ETH
2,Men,2008,10000M,ETH,ETH
3,Men,2012,10000M,GBR,ETH
4,Women,2000,10000M,ETH,
5,Women,2004,10000M,CHN,ETH
6,Women,2008,10000M,ETH,CHN
7,Women,2012,10000M,ETH,ETH
8,Men,2000,100M,USA,
9,Men,2004,100M,USA,USA
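
The effect of partitioning by too few columns can be sketched with a handful of rows from the output above. The toy table below holds only the 2000 and 2004 champions, and the column names `By_Gender` and `By_Gender_Event` are chosen here for illustration. Adding `Event` to the window's `ORDER BY` in the gender-only case is an assumption made to keep the mixed-up result deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Athletics_Gold (Gender TEXT, Year INTEGER, Event TEXT, Country TEXT)"
)
# A subset of the rows in the output above
conn.executemany(
    "INSERT INTO Athletics_Gold VALUES (?, ?, ?, ?)",
    [
        ("Men", 2000, "10000M", "ETH"), ("Men", 2004, "10000M", "ETH"),
        ("Women", 2000, "10000M", "ETH"), ("Women", 2004, "10000M", "CHN"),
        ("Men", 2000, "100M", "USA"), ("Men", 2004, "100M", "USA"),
    ],
)

query = """
SELECT
  Gender, Year, Event,
  Country AS Champion,
  -- Partitioned by gender only: the two men's events share one window
  LAG(Country) OVER (PARTITION BY Gender
                         ORDER BY Year ASC, Event ASC) AS By_Gender,
  -- Partitioned by gender and event: each event has its own window
  LAG(Country) OVER (PARTITION BY Gender, Event
                         ORDER BY Year ASC) AS By_Gender_Event
FROM Athletics_Gold
ORDER BY Event ASC, Gender ASC, Year ASC;
"""

rows = conn.execute(query).fetchall()
men_100m_2004 = next(r for r in rows if r[:3] == ("Men", 2004, "100M"))
print(men_100m_2004)  # ('Men', 2004, '100M', 'USA', 'ETH', 'USA')
```

With gender as the only partition, the men's 100M row for 2004 picks up ETH, the 10000M champion, as its previous champion; partitioning by both columns returns USA, the actual previous 100M champion.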
