In [2]:
import pandas as pd
import sqlalchemy as sa
import psycopg2 as ps
from sqlalchemy import create_engine

In [3]:
%load_ext sql
%sql postgresql://postgres:lingga28@localhost:2828/datacamp
conn = create_engine('postgresql://postgres:lingga28@localhost:2828/datacamp')

# 1. Which of the following is FALSE?

### Answer the question
### Possible Answers

- A. Unlike GROUP BY results, window functions don't reduce the number of rows in the output.
- B. Window functions can fetch values from other rows into the table, whereas GROUP BY functions cannot.
- C. Window functions can open a "window" to another table, whereas GROUP BY functions cannot.
- D. Window functions can calculate running totals and moving averages, whereas GROUP BY functions cannot.

Answer: C
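
To make option A concrete, here is a minimal sketch that contrasts the two approaches. It is an illustration only: it uses Python's built-in `sqlite3` with an in-memory database instead of the notebook's PostgreSQL connection, assumes SQLite 3.25 or newer (needed for window functions), and the `medals` table and its three rows are invented.

```python
import sqlite3

# Hypothetical three-row table; the rows are invented for illustration only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE medals (country TEXT, medal TEXT)")
db.executemany(
    "INSERT INTO medals VALUES (?, ?)",
    [("USA", "Gold"), ("USA", "Silver"), ("CHN", "Gold")],
)

# GROUP BY collapses the three input rows into one row per country.
grouped = db.execute(
    "SELECT country, COUNT(*) FROM medals GROUP BY country ORDER BY country"
).fetchall()

# The window function keeps all three rows and attaches the count to each one.
windowed = db.execute(
    """
    SELECT country, medal, COUNT(*) OVER (PARTITION BY country) AS n
    FROM medals
    ORDER BY country, medal
    """
).fetchall()

print(len(grouped), len(windowed))  # 2 rows versus 3 rows
```

Both queries read from the same single table; the window only ever spans rows of the query's own result set.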

# 2. Numbering rows
The simplest application for window functions is numbering rows. Numbering rows allows you to easily fetch the nth row. For example, it would be very difficult to get the 35th row in any given table if you didn't have a column with each row's number.

### Instruction
Number each row in the dataset.

In [4]:
%%sql

SELECT
  *,
  -- Assign numbers to each row
  ROW_NUMBER() OVER() AS Row_N
FROM Summer_Medals
ORDER BY Row_N ASC
LIMIT 3; -- added only to keep the displayed output short

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


year,city,sport,discipline,athlete,country,gender,event,medal,row_n
1896,Athens,Aquatics,Swimming,HAJOS Alfred,HUN,Men,100M Freestyle,Gold,1
1896,Athens,Aquatics,Swimming,HERSCHMANN Otto,AUT,Men,100M Freestyle,Silver,2
1896,Athens,Aquatics,Swimming,DRIVAS Dimitrios,GRE,Men,100M Freestyle For Sailors,Bronze,3
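
Once rows are numbered, fetching the nth row is a matter of filtering on that number. Because a window function cannot appear directly in a `WHERE` clause, the numbering has to happen in a subquery. The sketch below is an assumption-laden illustration: it runs on an in-memory SQLite database (3.25 or newer) rather than PostgreSQL, keeps only two columns of the three rows shown above, orders by athlete name so the numbering is deterministic, and the helper `nth_row` is a hypothetical name.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE summer_medals (athlete TEXT, medal TEXT)")
db.executemany(
    "INSERT INTO summer_medals VALUES (?, ?)",
    [
        ("HAJOS Alfred", "Gold"),
        ("HERSCHMANN Otto", "Silver"),
        ("DRIVAS Dimitrios", "Bronze"),
    ],
)


def nth_row(n):
    """Return the n-th row when rows are ordered by athlete name (hypothetical helper)."""
    return db.execute(
        """
        SELECT athlete, medal
        FROM (
          SELECT *, ROW_NUMBER() OVER (ORDER BY athlete) AS row_n
          FROM summer_medals
        ) AS numbered
        WHERE row_n = ?
        """,
        (n,),
    ).fetchone()


second = nth_row(2)
print(second)
```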


# 3. Numbering Olympic games in ascending order
### Exercises
The Summer Olympics dataset contains the results of the games between 1896 and 2012. The first Summer Olympics were held in 1896, the second in 1900, and so on. What if you want to easily query the table to see in which year the 13th Summer Olympics were held? You'd need to number the rows for that.

### Instruction
Assign a number to each year in which Summer Olympic games were held.

In [5]:
%%sql

SELECT
  Year,

  -- Assign numbers to each year
  ROW_NUMBER() OVER() AS Row_N
FROM (
  SELECT DISTINCT year
  FROM Summer_Medals
  ORDER BY Year ASC
) AS Years
ORDER BY Year ASC;

 * postgresql://postgres:***@localhost:2828/datacamp
27 rows affected.


year,row_n
1896,1
1900,2
1904,3
1908,4
1912,5
1920,6
1924,7
1928,8
1932,9
1936,10
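
The question "in which year were the Nth games held?" can then be answered by filtering on the row number. The following is a minimal sketch, not the exercise's solution: it uses an in-memory SQLite database (3.25 or newer) with a handful of invented medal rows, and it puts an explicit `ORDER BY` inside `OVER` so the numbering does not depend on the order in which the subquery happens to return its rows.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE summer_medals (year INTEGER, medal TEXT)")
# Invented rows: several medals per games, so years repeat.
db.executemany(
    "INSERT INTO summer_medals VALUES (?, ?)",
    [(1896, "Gold"), (1896, "Silver"), (1900, "Gold"),
     (1904, "Gold"), (1904, "Bronze")],
)

# Deduplicate the years first, then number them.
numbered_years = db.execute(
    """
    SELECT year, ROW_NUMBER() OVER (ORDER BY year ASC) AS row_n
    FROM (SELECT DISTINCT year FROM summer_medals) AS years
    ORDER BY year ASC
    """
).fetchall()

# Year of the 3rd games: filter on the row number in an outer query.
third_games = db.execute(
    """
    SELECT year
    FROM (
      SELECT year, ROW_NUMBER() OVER (ORDER BY year ASC) AS row_n
      FROM (SELECT DISTINCT year FROM summer_medals) AS years
    ) AS numbered
    WHERE row_n = 3
    """
).fetchone()[0]

print(numbered_years, third_games)
```

Without the `DISTINCT` subquery, the numbers would count medal rows rather than games.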


# 4. Numbering Olympic games in descending order
### Exercises
You've already numbered the rows in the Summer Medals dataset. What if you need to reverse the row numbers so that the most recent Olympic games' rows have a lower number?

### Instruction
Assign a number to each year in which Summer Olympic games were held so that rows with the most recent years have lower row numbers.

In [6]:
%%sql

SELECT
  Year,
  -- Assign the lowest numbers to the most recent years
  ROW_NUMBER() OVER (ORDER BY year DESC) AS Row_N
FROM (
  SELECT DISTINCT Year
  FROM Summer_Medals
) AS Years
ORDER BY Year;

 * postgresql://postgres:***@localhost:2828/datacamp
27 rows affected.


year,row_n
1896,27
1900,26
1904,25
1908,24
1912,23
1920,22
1924,21
1928,20
1932,19
1936,18
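
Note that the `ORDER BY` inside `OVER` decides how rows are numbered, while the query's final `ORDER BY` only decides how they are displayed. The sketch below shows both numberings side by side; it assumes an in-memory SQLite database (3.25 or newer), and the `games` table is a hypothetical stand-in holding the first four years from the output above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE games (year INTEGER)")
db.executemany("INSERT INTO games VALUES (?)", [(1896,), (1900,), (1904,), (1908,)])

numbering = db.execute(
    """
    SELECT
      year,
      ROW_NUMBER() OVER (ORDER BY year ASC)  AS oldest_first,
      ROW_NUMBER() OVER (ORDER BY year DESC) AS newest_first
    FROM games
    ORDER BY year ASC  -- display order only; does not affect the numbering
    """
).fetchall()

for row in numbering:
    print(row)
```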


# 5. Numbering Olympic athletes by medals earned
### Exercises
Row numbering can also be used for ranking. For example, numbering rows and ordering by the count of medals each athlete earned in the OVER clause will assign 1 to the highest-earning medalist, 2 to the second highest-earning medalist, and so on.

### task 1
### Instruction
For each athlete, count the number of medals he or she has earned.

In [8]:
%%sql

SELECT
  -- Count the number of medals each athlete has earned
  athlete,
  COUNT(*) AS Medals
FROM Summer_Medals
GROUP BY Athlete
ORDER BY Medals DESC
LIMIT 3; -- added only to keep the displayed output short

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


athlete,medals
PHELPS Michael,22
LATYNINA Larisa,18
ANDRIANOV Nikolay,15


### task 2
### Instruction
Having wrapped the previous query in the Athlete_Medals CTE, rank each athlete by the number of medals they've earned.

In [9]:
%%sql

WITH Athlete_Medals AS (
  SELECT
    -- Count the number of medals each athlete has earned
    Athlete,
    COUNT(*) AS Medals
  FROM Summer_Medals
  GROUP BY Athlete)

SELECT
  -- Number each athlete by how many medals they've earned
  athlete,
  ROW_NUMBER() OVER (ORDER BY medals DESC) AS Row_N
FROM Athlete_Medals
ORDER BY Medals DESC
LIMIT 3; -- added only to keep the displayed output short

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


athlete,row_n
PHELPS Michael,1
LATYNINA Larisa,2
ANDRIANOV Nikolay,3
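
One caveat when using `ROW_NUMBER()` for ranking: it always hands out distinct numbers, so athletes with equal medal counts still receive different positions, and which one comes first is arbitrary unless the `OVER` clause includes a tie-breaker. A minimal sketch of that behaviour, assuming an in-memory SQLite database (3.25 or newer) and three invented athletes:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE athlete_medals (athlete TEXT, medals INTEGER)")
# Invented data: two athletes are tied on 2 medals.
db.executemany(
    "INSERT INTO athlete_medals VALUES (?, ?)",
    [("ATHLETE A", 2), ("ATHLETE B", 3), ("ATHLETE C", 2)],
)

ranked = db.execute(
    """
    SELECT
      athlete,
      medals,
      -- The athlete name breaks ties, making the numbering deterministic.
      ROW_NUMBER() OVER (ORDER BY medals DESC, athlete ASC) AS row_n
    FROM athlete_medals
    ORDER BY row_n
    """
).fetchall()

for row in ranked:
    print(row)
```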


# 6. Reigning weightlifting champions
### Exercises
A reigning champion is a champion who's won both the previous and current years' competitions. To determine if a champion is reigning, the previous and current years' results need to be in the same row, in two different columns.

### task 1
### Instruction
Return each year's gold medalists in the Men's 69KG weightlifting competition.

In [10]:
%%sql

SELECT
  -- Return each year's champions' countries
  year,
  country AS champion
FROM Summer_Medals
WHERE
  Discipline = 'Weightlifting' AND
  Event = '69KG' AND
  Gender = 'Men' AND
  Medal = 'Gold';

 * postgresql://postgres:***@localhost:2828/datacamp
4 rows affected.


year,champion
2000,BUL
2004,CHN
2008,CHN
2012,CHN


### task 2
### Instruction
Having wrapped the previous query in the Weightlifting_Gold CTE, get the previous year's champion for each year.

In [11]:
%%sql

WITH Weightlifting_Gold AS (
  SELECT
    -- Return each year's champions' countries
    Year,
    Country AS champion
  FROM Summer_Medals
  WHERE
    Discipline = 'Weightlifting' AND
    Event = '69KG' AND
    Gender = 'Men' AND
    Medal = 'Gold')

SELECT
  Year, Champion,
  -- Fetch the previous year's champion
  LAG(Champion) OVER
    (ORDER BY year ASC) AS Last_Champion
FROM Weightlifting_Gold
ORDER BY Year ASC;

 * postgresql://postgres:***@localhost:2828/datacamp
4 rows affected.


year,champion,last_champion
2000,BUL,
2004,CHN,BUL
2008,CHN,CHN
2012,CHN,CHN
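
With both values in the same row, identifying reigning champions reduces to comparing two columns. The sketch below replays the four result rows above in an in-memory SQLite database (3.25 or newer, an assumption; the notebook itself uses PostgreSQL) and does the comparison in Python; the `weightlifting_gold` table stands in for the CTE.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE weightlifting_gold (year INTEGER, champion TEXT)")
db.executemany(
    "INSERT INTO weightlifting_gold VALUES (?, ?)",
    [(2000, "BUL"), (2004, "CHN"), (2008, "CHN"), (2012, "CHN")],
)

rows = db.execute(
    """
    SELECT year, champion,
           LAG(champion) OVER (ORDER BY year ASC) AS last_champion
    FROM weightlifting_gold
    ORDER BY year ASC
    """
).fetchall()

# A champion is reigning when it also won the previous games.
# The first row has no predecessor, so its last_champion is None (SQL NULL).
reigning_years = [year for year, champ, last in rows if champ == last]
print(reigning_years)
```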


# 7. Reigning champions by gender
### Exercises
You've already fetched the previous year's champion for one event. However, if you have multiple events, genders, or other metrics as columns, you'll need to split your table into partitions to avoid having a champion from one event or gender appear as the previous champion of another event or gender.

### task 1
### Instruction
Return the previous champions of each year's event by gender.

In [12]:
%%sql

WITH Javelin_Gold AS (
  SELECT DISTINCT
    Gender, Year, Country
  FROM Summer_Medals
  WHERE
    Year >= 2000 AND
    Event = 'Javelin Throw' AND
    Medal = 'Gold')

SELECT
  Gender, Year,
  Country AS Champion,
  -- Fetch the previous year's champion by gender
  LAG(Country) OVER (PARTITION BY Gender
                     ORDER BY Year ASC) AS Last_Champion
FROM Javelin_Gold
ORDER BY Gender ASC, Year ASC;

 * postgresql://postgres:***@localhost:2828/datacamp
8 rows affected.


gender,year,champion,last_champion
Men,2000,CZE,
Men,2004,NOR,CZE
Men,2008,NOR,NOR
Men,2012,TTO,NOR
Women,2000,NOR,
Women,2004,CUB,NOR
Women,2008,CZE,CUB
Women,2012,CZE,CZE
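
To see what `PARTITION BY` prevents, the sketch below computes `LAG` twice over the eight result rows above: once over a single window covering both genders, and once partitioned by gender. It assumes an in-memory SQLite database (3.25 or newer), and the `javelin_gold` table is a stand-in for the CTE.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE javelin_gold (gender TEXT, year INTEGER, country TEXT)")
db.executemany(
    "INSERT INTO javelin_gold VALUES (?, ?, ?)",
    [
        ("Men", 2000, "CZE"), ("Men", 2004, "NOR"),
        ("Men", 2008, "NOR"), ("Men", 2012, "TTO"),
        ("Women", 2000, "NOR"), ("Women", 2004, "CUB"),
        ("Women", 2008, "CZE"), ("Women", 2012, "CZE"),
    ],
)

rows = db.execute(
    """
    SELECT
      gender, year, country,
      -- One window over all rows: the last men's row leaks into the women's.
      LAG(country) OVER (ORDER BY gender, year) AS unpartitioned,
      -- One window per gender: each gender's first row has no predecessor.
      LAG(country) OVER (PARTITION BY gender ORDER BY year) AS partitioned
    FROM javelin_gold
    ORDER BY gender, year
    """
).fetchall()

first_women_row = rows[4]
print(first_women_row)
```

In the unpartitioned column, the Women's 2000 row picks up TTO, the Men's 2012 champion; in the partitioned column it is NULL, as it should be.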


# 8. Reigning champions by gender and event
### Exercises
In the previous exercise, you partitioned by gender to ensure that data about one gender doesn't get mixed into data about the other gender. If you have multiple columns, however, partitioning by only one of them will still mix the results of the other columns.

### Instruction
Return the previous champions of each year's events by gender and event.

In [13]:
%%sql

WITH Athletics_Gold AS (
  SELECT DISTINCT
    Gender, Year, Event, Country
  FROM Summer_Medals
  WHERE
    Year >= 2000 AND
    Discipline = 'Athletics' AND
    Event IN ('100M', '10000M') AND
    Medal = 'Gold')

SELECT
  Gender, Year, Event,
  Country AS Champion,
  -- Fetch the previous year's champion by gender and event
  LAG(Country) OVER (PARTITION BY Gender, Event
                     ORDER BY Year ASC) AS Last_Champion
FROM Athletics_Gold
ORDER BY Event ASC, Gender ASC, Year ASC;

 * postgresql://postgres:***@localhost:2828/datacamp
15 rows affected.


gender,year,event,champion,last_champion
Men,2000,10000M,ETH,
Men,2004,10000M,ETH,ETH
Men,2008,10000M,ETH,ETH
Men,2012,10000M,GBR,ETH
Women,2000,10000M,ETH,
Women,2004,10000M,CHN,ETH
Women,2008,10000M,ETH,CHN
Women,2012,10000M,ETH,ETH
Men,2000,100M,USA,
Men,2004,100M,USA,USA
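
The same leak appears when the partition is too coarse. The sketch below takes the four Men's rows for 2000 and 2004 from the output above and compares partitioning by gender alone with partitioning by gender and event. It assumes an in-memory SQLite database (3.25 or newer); `athletics_gold` stands in for the CTE, and `event` is added to the first window's `ORDER BY` only to make the tie between same-year rows deterministic.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE athletics_gold (gender TEXT, year INTEGER, event TEXT, country TEXT)"
)
db.executemany(
    "INSERT INTO athletics_gold VALUES (?, ?, ?, ?)",
    [
        ("Men", 2000, "10000M", "ETH"), ("Men", 2004, "10000M", "ETH"),
        ("Men", 2000, "100M", "USA"), ("Men", 2004, "100M", "USA"),
    ],
)

rows = db.execute(
    """
    SELECT
      event, year, country,
      -- Too coarse: both events share one window per gender.
      LAG(country) OVER (PARTITION BY gender ORDER BY year, event) AS by_gender,
      -- One window per (gender, event) pair.
      LAG(country) OVER (PARTITION BY gender, event ORDER BY year) AS by_gender_event
    FROM athletics_gold
    ORDER BY event, year
    """
).fetchall()

for row in rows:
    print(row)
```

Partitioned by gender only, the 100M rows report ETH, the 10000M winner, as the previous champion; adding `event` to the partition restores the expected NULL and USA.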


# 9. Row numbers with partitioning
If you run ROW_NUMBER() OVER (PARTITION BY Year ORDER BY Medals DESC) on the following table, what row number would the 2008 Iranian record have?

| Year | Country | Medals |
|------|---------|--------|
| 2004 | IRN     | 32     |
| 2004 | LBN     | 17     |
| 2004 | KSA     | 4      |
| 2008 | IRQ     | 29     |
| 2008 | IRN     | 27     |
| 2008 | UAE     | 12     |

### Possible Answers
- A. 5
- B. 1
- C. 2

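Answer: C

The numbering restarts within each Year partition. Within 2008, IRQ (29 medals) gets row number 1 and IRN (27 medals) gets row number 2.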
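
The query from the question can be checked directly against the six rows of the table. This is a minimal sketch assuming an in-memory SQLite database (3.25 or newer) instead of PostgreSQL; the table name `country_medals` is hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE country_medals (year INTEGER, country TEXT, medals INTEGER)")
db.executemany(
    "INSERT INTO country_medals VALUES (?, ?, ?)",
    [
        (2004, "IRN", 32), (2004, "LBN", 17), (2004, "KSA", 4),
        (2008, "IRQ", 29), (2008, "IRN", 27), (2008, "UAE", 12),
    ],
)

rows = db.execute(
    """
    SELECT
      year, country, medals,
      ROW_NUMBER() OVER (PARTITION BY year ORDER BY medals DESC) AS row_n
    FROM country_medals
    ORDER BY year, row_n
    """
).fetchall()

# Pick out the row number of the 2008 Iranian record.
iran_2008 = next(n for year, country, _, n in rows if (year, country) == (2008, "IRN"))
print(iran_2008)
```

The numbering restarts at 1 for each year, so the 2008 IRN record receives row number 2.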