##  Top-N Analysis Using Ranking Window Functions

###  Learning Objectives

In this training, we will learn:

-  How to use the `ROW_NUMBER()` and `RANK()` functions to perform **partition-wise ranking operations**.  
-  How the ranking results of these two functions **differ from each other**.  



###  Overview

Suppose we want to rank countries from **worst to best** in terms of their **levels of access to managed drinking water services** each year.

To do this, we’ll apply **Top-N Analysis**, which focuses on identifying and analyzing the **highest- or lowest-ranked** elements in a dataset based on a given criterion.

In our case, the criterion will be the **percentage of people with access to managed drinking water services per year**.

We’ll use **SQL ranking functions**, particularly `ROW_NUMBER()` and `RANK()`, to order and compare countries within each year.
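Top-N analysis typically ranks the rows in each group and then keeps only the first N of them. The following is a minimal sketch of that pattern, not the notebook's own query. It assumes an in-memory SQLite database (version 3.25 or later, which supports window functions) in place of the MySQL server, a hypothetical table `access_sample`, and made-up values for countries `A` to `D`.

```python
import sqlite3

# Hypothetical sample rows (country, year, percentage); not real UN data.
sample_rows = [
    ("A", 2015, 40.0), ("B", 2015, 55.0), ("C", 2015, 70.0), ("D", 2015, 90.0),
    ("A", 2016, 42.0), ("B", 2016, 54.0), ("C", 2016, 75.0), ("D", 2016, 91.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE access_sample (
        Country_name TEXT,
        Time_period INTEGER,
        Pct_managed_drinking_water_services REAL
    )
    """
)
conn.executemany("INSERT INTO access_sample VALUES (?, ?, ?)", sample_rows)

# Rank within each year (lowest access first), then keep the first N = 2 rows per year.
bottom_two_sql = """
SELECT Country_name, Time_period, Water_Access_Rank
FROM (
    SELECT
        Country_name,
        Time_period,
        ROW_NUMBER() OVER (
            PARTITION BY Time_period
            ORDER BY Pct_managed_drinking_water_services ASC
        ) AS Water_Access_Rank
    FROM access_sample
) AS ranked
WHERE Water_Access_Rank <= 2
ORDER BY Time_period, Water_Access_Rank
"""
bottom_two = conn.execute(bottom_two_sql).fetchall()
for row in bottom_two:
    print(row)
```

The subquery is needed because the result of a window function cannot be referenced in the `WHERE` clause of the same `SELECT`; the rank has to be computed first and filtered in an outer query.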



###  Database Table
We’ll be using the table **`united_nations.Access_to_Basic_Services`**, which contains data about:
- Regions and subregions  
- Countries  
- Levels of access to basic services (drinking water, sanitation, etc.)  
- GDP and population estimates  


In [1]:
%load_ext sql

##  Exercise 1: Ranking Countries by Access to Managed Drinking Water Services

In this task, we will order countries based on their **levels of access to managed drinking water services per year**.

We’ll use the **`ROW_NUMBER()`** window function to assign a unique rank to each country for every year, where:
- **1** represents the country with the **lowest access** level, and
- the highest number represents the country with the **highest access** level that year.

###  Key Concept
The `ROW_NUMBER()` function assigns a sequential number to rows within each partition (in this case, `Time_period`) ordered by the chosen column (`Pct_managed_drinking_water_services`).
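To see the numbering restart in each partition, here is a small sketch under stated assumptions: an in-memory SQLite database (3.25 or later) stands in for MySQL, the table `access_sample` is hypothetical, and the values for countries `A`, `B`, and `C` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE access_sample (
        Country_name TEXT,
        Time_period INTEGER,
        Pct_managed_drinking_water_services REAL
    )
    """
)
# Hypothetical values: three countries in 2015, two in 2016.
conn.executemany(
    "INSERT INTO access_sample VALUES (?, ?, ?)",
    [
        ("A", 2015, 44.0), ("B", 2015, 50.0), ("C", 2015, 61.0),
        ("A", 2016, 47.0), ("B", 2016, 46.0),
    ],
)

numbered = conn.execute(
    """
    SELECT
        Time_period,
        Country_name,
        ROW_NUMBER() OVER (
            PARTITION BY Time_period
            ORDER BY Pct_managed_drinking_water_services ASC
        ) AS Water_Access_Rank
    FROM access_sample
    ORDER BY Time_period, Water_Access_Rank
    """
).fetchall()

for row in numbered:
    print(row)
```

Each year gets its own sequence starting at 1, and a country's number can change from year to year: in this toy data `B` is second in 2015 but first in 2016.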


In [3]:
%%sql

SELECT
    Country_name,
    Time_period,
    Pct_managed_drinking_water_services,
    ROW_NUMBER() OVER (
        PARTITION BY Time_period
        ORDER BY Pct_managed_drinking_water_services ASC
    ) AS Water_Access_Rank
FROM
    united_nations.Access_to_Basic_Services;


 * mysql+pymysql://root:***@localhost:3306/united_nations
1048 rows affected.


Country_name,Time_period,Pct_managed_drinking_water_services,Water_Access_Rank
Central African Republic,2015,44.0,1
Democratic Republic of the Congo,2015,45.33,2
South Sudan,2015,46.33,3
Angola,2015,50.33,4
Somalia,2015,50.67,5
Chad,2015,51.67,6
Ethiopia,2015,52.0,7
Madagascar,2015,53.33,8
Papua New Guinea,2015,53.67,9
Uganda,2015,55.0,10


##  Exercise 2: Assess Rankings for Countries with the Same Water Access Levels

In this task, we’ll examine how countries with **identical levels of access to managed drinking water services** are ranked in a given year.

We’ll use the **`RANK()`** window function. Unlike `ROW_NUMBER()`, it assigns the **same rank** to rows that have the **same value** in the ordering column.

###  Objective
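Check whether countries with **100% managed drinking water services** in the same year receive the **same rank**. Because the `WHERE` clause filters rows before the window function is evaluated, only the 100% rows are ranked, so every remaining row in a given year should receive rank 1.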

###  Key Concept
- `RANK()` assigns the same rank to identical values.
- If two or more countries share the same rank, the ranks that follow are **skipped**: one rank is skipped for each additional tied row.
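The size of the gap after a tie depends on how many rows are tied. The sketch below illustrates this with invented data; it assumes an in-memory SQLite database (3.25 or later) rather than MySQL, and both the table `access_sample` and the countries `A` to `E` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE access_sample (
        Country_name TEXT,
        Time_period INTEGER,
        Pct_managed_drinking_water_services REAL
    )
    """
)
# Hypothetical values: three countries tied at 100, followed by two lower values.
conn.executemany(
    "INSERT INTO access_sample VALUES (?, ?, ?)",
    [
        ("A", 2015, 100.0), ("B", 2015, 100.0), ("C", 2015, 100.0),
        ("D", 2015, 98.0), ("E", 2015, 95.0),
    ],
)

ranked = conn.execute(
    """
    SELECT
        Country_name,
        Pct_managed_drinking_water_services,
        RANK() OVER (
            PARTITION BY Time_period
            ORDER BY Pct_managed_drinking_water_services DESC
        ) AS Water_Access_Rank
    FROM access_sample
    ORDER BY Water_Access_Rank, Country_name
    """
).fetchall()

ranks = [row[2] for row in ranked]
print(ranks)
```

With three rows tied at rank 1, ranks 2 and 3 are not used and the next distinct value receives rank 4.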


In [4]:
%%sql

SELECT
    Country_name,
    Time_period,
    Pct_managed_drinking_water_services,
    RANK() OVER (
        PARTITION BY Time_period
        ORDER BY Pct_managed_drinking_water_services ASC
    ) AS Water_Access_Rank
FROM
    united_nations.Access_to_Basic_Services
WHERE
    Pct_managed_drinking_water_services = 100;


 * mysql+pymysql://root:***@localhost:3306/united_nations
208 rows affected.


Country_name,Time_period,Pct_managed_drinking_water_services,Water_Access_Rank
Singapore,2015,100.0,1
Bermuda,2015,100.0,1
Greenland,2015,100.0,1
British Virgin Islands,2015,100.0,1
Guadeloupe,2015,100.0,1
Martinique,2015,100.0,1
Saint Barthélemy,2015,100.0,1
Saint Martin (French Part),2015,100.0,1
Bahrain,2015,100.0,1
Cyprus,2015,100.0,1


##  Exercise 3: Apply the RANK() Function Instead

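In this task, we will **replace the `ROW_NUMBER()` function** used in Exercise 1 with the **`RANK()`** function. Note that the query below also sorts in **descending** order, so rank 1 now goes to the countries with the highest access level rather than the lowest.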

###  Objective
To observe how the ranking results differ between `ROW_NUMBER()` and `RANK()` when countries have identical access levels to managed drinking water services.

###  Key Concept
- `ROW_NUMBER()` assigns a **unique sequential number** to each row, even if two or more rows have the same value.
- `RANK()` assigns the **same rank** to rows that share the same value in the ordering column.
- When ranks are tied, the next rank is **skipped** (e.g., 1, 2, 2, 4).
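A side-by-side comparison makes the difference easier to see. The following sketch computes both functions over the same rows; it assumes an in-memory SQLite database (3.25 or later) instead of MySQL, and the table `access_sample` and countries `P` to `S` are hypothetical, with values chosen to reproduce the 1, 2, 2, 4 pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE access_sample (
        Country_name TEXT,
        Time_period INTEGER,
        Pct_managed_drinking_water_services REAL
    )
    """
)
# Hypothetical values: Q and R are tied at 75.
conn.executemany(
    "INSERT INTO access_sample VALUES (?, ?, ?)",
    [("P", 2015, 60.0), ("Q", 2015, 75.0), ("R", 2015, 75.0), ("S", 2015, 90.0)],
)

comparison = conn.execute(
    """
    SELECT
        Country_name,
        Pct_managed_drinking_water_services,
        ROW_NUMBER() OVER (
            PARTITION BY Time_period
            ORDER BY Pct_managed_drinking_water_services ASC
        ) AS Row_Num,
        RANK() OVER (
            PARTITION BY Time_period
            ORDER BY Pct_managed_drinking_water_services ASC
        ) AS Rank_Value
    FROM access_sample
    ORDER BY Row_Num
    """
).fetchall()

row_numbers = [row[2] for row in comparison]
rank_values = [row[3] for row in comparison]
print(row_numbers)
print(rank_values)
```

`ROW_NUMBER()` still hands out 1 through 4, but which of the tied countries receives 2 and which receives 3 is not guaranteed, because the `ORDER BY` column alone does not distinguish them. `RANK()` gives both tied rows the value 2 and then jumps to 4.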


In [5]:
%%sql

SELECT
    Country_name,
    Time_period,
    Pct_managed_drinking_water_services,
    RANK() OVER (
        PARTITION BY Time_period
        ORDER BY Pct_managed_drinking_water_services DESC
    ) AS Water_Access_Rank
FROM
    united_nations.Access_to_Basic_Services;


 * mysql+pymysql://root:***@localhost:3306/united_nations
1048 rows affected.


Country_name,Time_period,Pct_managed_drinking_water_services,Water_Access_Rank
Singapore,2015,100.0,1
Bermuda,2015,100.0,1
Greenland,2015,100.0,1
British Virgin Islands,2015,100.0,1
Guadeloupe,2015,100.0,1
Martinique,2015,100.0,1
Saint Barthélemy,2015,100.0,1
Saint Martin (French Part),2015,100.0,1
Bahrain,2015,100.0,1
Cyprus,2015,100.0,1
