In [1]:
import pandas as pd
import sqlalchemy as sa
import psycopg2 as ps
from sqlalchemy import create_engine

In [2]:
%load_ext sql
%sql postgresql://postgres:lingga28@localhost:2828/datacamp
conn = create_engine('postgresql://postgres:lingga28@localhost:2828/datacamp')

# 1. A basic pivot
You have the following table of Pole Vault gold medalist countries by gender in 2008 and 2012.

| Gender | Year | Country |
|--------|------|---------|
| Men    | 2008 | AUS     |
| Men    | 2012 | FRA     |
| Women  | 2008 | RUS     |
| Women  | 2012 | USA     |

Pivot it by Year to get the following reshaped, cleaner table.

| Gender | 2008 | 2012 |
|--------|------|------|
| Men    | AUS  | FRA  |
| Women  | RUS  | USA  |

### Instructions
- Create the correct extension.
- Fill in the column names of the pivoted table.

In [3]:
%%sql

-- Create the correct extension to enable CROSSTAB
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT * FROM CROSSTAB($$
  SELECT
    Gender, Year, Country
  FROM Summer_Medals
  WHERE
    Year IN (2008, 2012)
    AND Medal = 'Gold'
    AND Event = 'Pole Vault'
  ORDER BY Gender ASC, Year ASC
$$)
-- Fill in the correct column names for the pivoted table
AS ct (Gender VARCHAR,
       "2008" VARCHAR,
       "2012" VARCHAR)

ORDER BY Gender ASC;

 * postgresql://postgres:***@localhost:2828/datacamp
Done.
2 rows affected.


gender,2008,2012
Men,AUS,FRA
Women,RUS,USA
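The one-argument form of CROSSTAB never reads the values in the category column (Year here). It produces one output row per run of consecutive input rows that share a row name, and it fills the value columns left to right in input order. That is why the source query must be sorted by Gender and then Year. The plain-Python sketch below imitates that rule. The function name `crosstab` and the list-of-tuples input are hypothetical, and type casting is ignored.

```python
from __future__ import annotations

from typing import Any, Hashable, Iterable


def crosstab(rows: Iterable[tuple[Hashable, Any, Any]], n_value_cols: int) -> list[list[Any]]:
    """Imitate tablefunc's one-argument crosstab(text) (hypothetical helper).

    Each input row is (row_name, category, value). The category is ignored:
    values fill the value columns purely in input order. Short groups are padded
    with None (SQL NULL); surplus rows in a group are skipped.
    """
    output: list[list[Any]] = []
    for row_name, _category, value in rows:
        # Start a new output row whenever row_name differs from the previous row
        if not output or output[-1][0] != row_name:
            output.append([row_name])
        # Keep at most n_value_cols values after the row_name
        if len(output[-1]) <= n_value_cols:
            output[-1].append(value)
    for out_row in output:
        out_row.extend([None] * (n_value_cols + 1 - len(out_row)))
    return output


# Rows from the table above, sorted by Gender then Year as in the source query
source = [("Men", 2008, "AUS"), ("Men", 2012, "FRA"),
          ("Women", 2008, "RUS"), ("Women", 2012, "USA")]
pivoted = crosstab(source, n_value_cols=2)
print(pivoted)  # [['Men', 'AUS', 'FRA'], ['Women', 'RUS', 'USA']]

# Without the ORDER BY, groups are no longer consecutive and the pivot falls apart
unsorted = [("Men", 2008, "AUS"), ("Women", 2008, "RUS"),
            ("Men", 2012, "FRA"), ("Women", 2012, "USA")]
print(crosstab(unsorted, n_value_cols=2))
# [['Men', 'AUS', None], ['Women', 'RUS', None], ['Men', 'FRA', None], ['Women', 'USA', None]]
```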


# 2. Pivoting with ranking
You want to produce an easily scannable table of the rankings of the three most populous EU countries by how many gold medals they've earned in the 2004 through 2012 Olympic games. The table needs to be in this format:

| Country | 2004 | 2008 | 2012 |
|---------|------|------|------|
| FRA     | ...  | ...  | ...  |
| GBR     | ...  | ...  | ...  |
| GER     | ...  | ...  | ...  |

You'll need to count the gold medals each country has earned, produce the ranks of each country by medals earned, then pivot the table to this shape.

### Task 1
### Instruction
Count the gold medals that France (FRA), the UK (GBR), and Germany (GER) have earned per country and year.
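The three steps described above (count, rank, pivot) can be sketched end to end as follows. This is a minimal sketch under several assumptions:

- It uses an in-memory SQLite table named `summer_medals` holding hypothetical rows, not the real DataCamp data.
- It assumes one row per medal.
- SQLite has no CROSSTAB, so the pivot step uses conditional aggregation (`MAX(CASE ...)`) instead. This is a portable alternative; in PostgreSQL you would pivot with CROSSTAB as in exercise 1.

```python
import sqlite3

# Hypothetical gold-medal counts per (country, year); NOT the real Summer_Medals data.
GOLD_COUNTS = {
    ("FRA", 2004): 2, ("FRA", 2008): 1, ("FRA", 2012): 3,
    ("GBR", 2004): 1, ("GBR", 2008): 3, ("GBR", 2012): 2,
    ("GER", 2004): 3, ("GER", 2008): 2, ("GER", 2012): 1,
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE summer_medals (year INTEGER, country TEXT, medal TEXT)")
for (country, year), n in GOLD_COUNTS.items():
    # One row per medal (assumed layout)
    conn.executemany("INSERT INTO summer_medals VALUES (?, ?, 'Gold')", [(year, country)] * n)
# A silver medal that the medal = 'Gold' filter must exclude
conn.execute("INSERT INTO summer_medals VALUES (2004, 'GBR', 'Silver')")

QUERY = """
WITH country_awards AS (
  -- Step 1: count gold medals per country and year
  SELECT country, year, COUNT(*) AS awards
  FROM summer_medals
  WHERE country IN ('FRA', 'GBR', 'GER')
    AND year IN (2004, 2008, 2012)
    AND medal = 'Gold'
  GROUP BY country, year
),
country_ranks AS (
  -- Step 2: rank the countries within each year
  SELECT country, year,
         RANK() OVER (PARTITION BY year ORDER BY awards DESC) AS medal_rank
  FROM country_awards
)
-- Step 3: pivot the years into columns with conditional aggregation
SELECT country,
       MAX(CASE WHEN year = 2004 THEN medal_rank END) AS "2004",
       MAX(CASE WHEN year = 2008 THEN medal_rank END) AS "2008",
       MAX(CASE WHEN year = 2012 THEN medal_rank END) AS "2012"
FROM country_ranks
GROUP BY country
ORDER BY country
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
# ('FRA', 2, 3, 1)
# ('GBR', 3, 1, 2)
# ('GER', 1, 2, 3)
```

`PARTITION BY year` restarts the ranking for every Olympics, so each column ranks the three countries against each other within that year.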

# 3. Country-level subtotals
### Exercises
You want to look at the gold medals earned by three Scandinavian countries (Denmark, Norway, and Sweden) per country and gender in 2004. You also want Country-level subtotals giving each country's total gold medals, but Gender-level subtotals don't make much sense in this case, so leave them out.

### Instructions
- Count the gold medals awarded per country and gender.
- Generate Country-level gold award counts.

In [4]:
%%sql

-- Count the gold medals per country and gender
SELECT
  country,
  gender,
  COUNT(*) AS Gold_Awards
FROM Summer_Medals
WHERE
  Year = 2004
  AND Medal = 'Gold'
  AND Country IN ('DEN', 'NOR', 'SWE')
-- Generate Country-level subtotals
GROUP BY Country, ROLLUP(gender)
ORDER BY Country ASC, Gender ASC;

 * postgresql://postgres:***@localhost:2828/datacamp
9 rows affected.


country,gender,gold_awards
DEN,Men,4
DEN,Women,15
DEN,,19
NOR,Men,3
NOR,Women,2
NOR,,5
SWE,Men,4
SWE,Women,1
SWE,,5
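Placing a plain `Country` column next to `ROLLUP(gender)` yields two grouping sets: (Country, Gender) and (Country). This is why the output has one subtotal per country but no grand total. The sketch below checks that equivalence with a `UNION ALL` of two ordinary `GROUP BY`s. It runs in an in-memory SQLite database, which is only a stand-in because SQLite has no ROLLUP. The table layout is an assumption, and the rows are synthetic, built so their counts match the output above.

```python
import sqlite3

# Synthetic rows whose per-group counts match the 2004 output above;
# the real Summer_Medals table has many more columns and rows.
COUNTS = {
    ("DEN", "Men"): 4, ("DEN", "Women"): 15,
    ("NOR", "Men"): 3, ("NOR", "Women"): 2,
    ("SWE", "Men"): 4, ("SWE", "Women"): 1,
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE summer_medals (country TEXT, gender TEXT)")
for (country, gender), n in COUNTS.items():
    conn.executemany("INSERT INTO summer_medals VALUES (?, ?)", [(country, gender)] * n)

# GROUP BY Country, ROLLUP(Gender) == grouping sets (Country, Gender) and (Country)
QUERY = """
SELECT * FROM (
  SELECT country, gender, COUNT(*) AS gold_awards
  FROM summer_medals
  GROUP BY country, gender
  UNION ALL
  -- Country-level subtotal: gender is NULL, just as ROLLUP reports it
  SELECT country, NULL, COUNT(*)
  FROM summer_medals
  GROUP BY country
)
-- PostgreSQL sorts NULLs last in ascending order; emulate that here
ORDER BY country, gender IS NULL, gender
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```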


# 4. All group-level subtotals
### Exercises
You want to break down all medals awarded to Russia in the 2012 Olympic games per gender and medal type. Since the medals all belong to one country, Russia, it makes sense to generate all possible subtotals (Gender- and Medal-level subtotals), as well as a grand total.

Generate a breakdown of the medals awarded to Russia per gender and medal type, including all group-level subtotals and a grand total.

### Instructions
- Count the medals awarded per gender and medal type.
- Generate all possible group-level counts (per gender and medal type subtotals and the grand total).

In [5]:
%%sql

-- Count the medals per gender and medal type
SELECT
  gender,
  medal,
  COUNT(*) AS Awards
FROM Summer_Medals
WHERE
  Year = 2012
  AND Country = 'RUS'
-- Get all possible group-level subtotals
GROUP BY CUBE(gender, medal)
ORDER BY Gender ASC, Medal ASC;

 * postgresql://postgres:***@localhost:2828/datacamp
12 rows affected.


gender,medal,awards
Men,Bronze,34
Men,Gold,23
Men,Silver,7
Men,,64
Women,Bronze,17
Women,Gold,24
Women,Silver,25
Women,,66
,Bronze,51
,Gold,47
,Silver,32
,,130

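`CUBE(gender, medal)` evaluates every subset of its columns as a grouping set: (gender, medal), (gender), (medal), and the empty set for the grand total. That accounts for the 12 rows (6 + 2 + 3 + 1). A plain-Python sketch of that rule follows. The function name `cube_count` is hypothetical, and the input rows are synthetic, built so their counts match the per-gender, per-medal rows above.

```python
from collections import Counter
from itertools import combinations

# Synthetic (gender, medal) rows whose counts match the non-subtotal rows above
COUNTS = {
    ("Men", "Bronze"): 34, ("Men", "Gold"): 23, ("Men", "Silver"): 7,
    ("Women", "Bronze"): 17, ("Women", "Gold"): 24, ("Women", "Silver"): 25,
}
rows = [key for key, n in COUNTS.items() for _ in range(n)]


def cube_count(rows, n_cols):
    """Emulate GROUP BY CUBE(col_0, ..., col_{n-1}) with COUNT(*) (hypothetical helper).

    One grouping set per subset of the columns (2 ** n_cols sets); columns left
    out of a set are reported as None, like SQL NULL.
    """
    result = []
    for size in range(n_cols, -1, -1):
        for kept in combinations(range(n_cols), size):
            counts = Counter(
                tuple(row[i] if i in kept else None for i in range(n_cols)) for row in rows
            )
            result.extend(key + (n,) for key, n in counts.items())
    # Mimic ORDER BY ... ASC with PostgreSQL's default NULLS LAST
    result.sort(key=lambda r: [(v is None, v or "") for v in r[:-1]])
    return result


cube = cube_count(rows, n_cols=2)
for row in cube:
    print(row)
```

Swapping `CUBE` for `ROLLUP(gender, medal)` would keep only the hierarchical sets (gender, medal), (gender), and (). The three medal-only subtotals would disappear.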

# 5. Cleaning up results
### Exercises
Returning to the breakdown of Scandinavian awards you previously made, you want to clean up the results by replacing the nulls with meaningful text.

### Instruction
Replace the nulls in the Country column with 'All countries', and the nulls in the Gender column with 'All genders'.

In [6]:
%%sql

SELECT
  -- Replace the nulls in the columns with meaningful text
  COALESCE(Country, 'All countries') AS Country,
  COALESCE(Gender, 'All genders') AS Gender,
  COUNT(*) AS Awards
FROM Summer_Medals
WHERE
  Year = 2004
  AND Medal = 'Gold'
  AND Country IN ('DEN', 'NOR', 'SWE')
GROUP BY ROLLUP(Country, Gender)
ORDER BY Country ASC, Gender ASC;

 * postgresql://postgres:***@localhost:2828/datacamp
10 rows affected.


country,gender,awards
All countries,All genders,29
DEN,All genders,19
DEN,Men,4
DEN,Women,15
NOR,All genders,5
NOR,Men,3
NOR,Women,2
SWE,All genders,5
SWE,Men,4
SWE,Women,1
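`ROLLUP(Country, Gender)` is hierarchical. It produces the grouping sets (Country, Gender), (Country), and the empty set, so the result has a grand total but no per-gender totals across countries. The sketch below rebuilds that result in an in-memory SQLite database, with `UNION ALL` standing in for ROLLUP, and applies the same `COALESCE` labels. The rows are synthetic, built to match the counts above, and the table layout is an assumption.

```python
import sqlite3

# Synthetic rows whose per-group counts match the 2004 output above
COUNTS = {
    ("DEN", "Men"): 4, ("DEN", "Women"): 15,
    ("NOR", "Men"): 3, ("NOR", "Women"): 2,
    ("SWE", "Men"): 4, ("SWE", "Women"): 1,
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE summer_medals (country TEXT, gender TEXT)")
for (country, gender), n in COUNTS.items():
    conn.executemany("INSERT INTO summer_medals VALUES (?, ?)", [(country, gender)] * n)

QUERY = """
SELECT
  COALESCE(country, 'All countries') AS country,
  COALESCE(gender, 'All genders') AS gender,
  awards
FROM (
  -- ROLLUP(Country, Gender) == grouping sets (Country, Gender), (Country), ()
  SELECT country, gender, COUNT(*) AS awards FROM summer_medals GROUP BY country, gender
  UNION ALL
  SELECT country, NULL, COUNT(*) FROM summer_medals GROUP BY country
  UNION ALL
  SELECT NULL, NULL, COUNT(*) FROM summer_medals
)
-- Bare names resolve to the output aliases, so the replaced labels are sorted
ORDER BY country, gender
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Sorting on the output aliases is why 'All countries' comes first, as in the output above. PostgreSQL also resolves a bare `ORDER BY` name to the output column. `COALESCE` cannot tell a subtotal NULL from a NULL already present in the data. PostgreSQL's `GROUPING()` function can, if that distinction matters.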


# 6. Summarizing results
### Exercises
After ranking each country in the 2000 Olympics by gold medals awarded, you want to return the top 3 countries in one row, as a comma-separated string. In other words, turn this:

| Country | Rank |
|---------|------|
| USA     | 1    |
| RUS     | 2    |
| AUS     | 3    |
| ...     | ...  |

into this:

USA, RUS, AUS
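Turning ranked rows into a single string is a filter followed by an ordered concatenation. In PostgreSQL, `STRING_AGG` does not promise any particular order unless an `ORDER BY` is written inside the aggregate call, so the sketch below makes the order explicit. It is plain Python with hypothetical data and a hypothetical helper `top_n_string`.

```python
# Hypothetical (country, rank) pairs, deliberately listed out of rank order
ranked = [("AUS", 3), ("USA", 1), ("CHN", 4), ("RUS", 2)]


def top_n_string(ranked, n=3, sep=", "):
    """Mimic SELECT STRING_AGG(Country, sep ORDER BY Rank, Country) ... WHERE Rank <= n."""
    top = [pair for pair in ranked if pair[1] <= n]      # WHERE Rank <= n
    top.sort(key=lambda pair: (pair[1], pair[0]))         # ORDER BY Rank, Country
    return sep.join(country for country, _ in top)        # STRING_AGG(Country, sep)


print(top_n_string(ranked))  # USA, RUS, AUS
```

Filtering on `Rank <= 3` can return more than three countries when several tie for third place. For example, `top_n_string([("A", 1), ("B", 2), ("D", 3), ("C", 3)])` returns `"A, B, C, D"`.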

### Task 1
### Instruction
Rank countries by the medals they've been awarded.

In [8]:
%%sql

WITH Country_Medals AS (
  SELECT
    Country,
    COUNT(*) AS Medals
  FROM Summer_Medals
  WHERE Year = 2000
    AND Medal = 'Gold'
  GROUP BY Country)

  SELECT
    Country,
    -- Rank countries by the medals awarded
    RANK() OVER (ORDER BY Medals DESC) AS Rank
  FROM Country_Medals
  ORDER BY Rank ASC
  -- Not part of the exercise: LIMIT only keeps the displayed output short
  LIMIT 10;

 * postgresql://postgres:***@localhost:2828/datacamp
10 rows affected.


country,rank
USA,1
RUS,2
AUS,3
CHN,4
GER,5
NED,6
ROU,6
HUN,8
CUB,9
ITA,9
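`RANK()` gives tied rows the same rank and then skips ahead. That is why NED and ROU share 6th place above and HUN is 8th; `DENSE_RANK()` would have made HUN 7th. The plain-Python sketch below spells out both rules for a descending `ORDER BY`. It uses hypothetical medal counts because the actual counts are not shown in the output.

```python
def rank(values):
    """SQL RANK() for ORDER BY value DESC: 1 + count of strictly larger values."""
    return [1 + sum(other > v for other in values) for v in values]


def dense_rank(values):
    """SQL DENSE_RANK() for ORDER BY value DESC: position among the distinct values."""
    distinct_desc = sorted(set(values), reverse=True)
    return [distinct_desc.index(v) + 1 for v in values]


medals = [10, 8, 8, 5]  # hypothetical gold counts with a tie for 2nd place
print(rank(medals))        # [1, 2, 2, 4]
print(dense_rank(medals))  # [1, 2, 2, 3]
```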


### Task 2
### Instruction
Return the top 3 countries by medals awarded as one comma-separated string.

In [9]:
%%sql

WITH Country_Medals AS (
  SELECT
    Country,
    COUNT(*) AS Medals
  FROM Summer_Medals
  WHERE Year = 2000
    AND Medal = 'Gold'
  GROUP BY Country),

  Country_Ranks AS (
  SELECT
    Country,
    RANK() OVER (ORDER BY Medals DESC) AS Rank
  FROM Country_Medals)

-- Compress the countries column; ORDER BY inside STRING_AGG fixes the order
SELECT STRING_AGG(Country, ', ' ORDER BY Rank ASC, Country ASC)
FROM Country_Ranks
-- Select only the top three ranks
WHERE Rank <= 3;