In [1]:
# Load the SQL extension so that cells can use the %sql and %%sql magics
%load_ext sql

In [2]:
%sql mysql+pymysql://root:0479%40Kanya@127.0.0.1:3306/united_nations

**Task 1: Calculate the average GDP for each region**

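Start by calculating the average GDP of each region for 2020 using the AVG(Est_gdp_in_billions) OVER(PARTITION BY Region) window function. Unlike GROUP BY, the window function keeps one row per country and repeats the regional average on each row. Countries with a missing GDP estimate are skipped when the average is computed.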

In [4]:
%%sql
SELECT
    Region,
    Country_name,
    Pct_managed_drinking_water_services,
    Pct_managed_sanitation_services,
    Est_gdp_in_billions,
    AVG(Est_gdp_in_billions) OVER(PARTITION BY Region) AS Avg_gdp_for_region
FROM united_nations.Access_to_Basic_Services
WHERE Time_period = 2020;

 * mysql+pymysql://root:***@127.0.0.1:3306/united_nations
165 rows affected.


Region,Country_name,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Est_gdp_in_billions,Avg_gdp_for_region
Central and Southern Asia,Kazakhstan,95.0,98.0,171.08,338.7381818181818
Central and Southern Asia,Kyrgyzstan,92.67,98.0,,338.7381818181818
Central and Southern Asia,Tajikistan,85.0,96.0,8.13,338.7381818181818
Central and Southern Asia,Turkmenistan,100.0,99.0,,338.7381818181818
Central and Southern Asia,Uzbekistan,98.0,100.0,59.89,338.7381818181818
Central and Southern Asia,Afghanistan,80.33,54.0,20.14,338.7381818181818
Central and Southern Asia,Bangladesh,97.67,54.0,373.9,338.7381818181818
Central and Southern Asia,Bhutan,97.33,77.0,2.33,338.7381818181818
Central and Southern Asia,India,91.0,72.0,2667.69,338.7381818181818
Central and Southern Asia,Iran (Islamic Republic of),96.67,88.0,,338.7381818181818
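
As a small, self-contained illustration of the window function above, the sketch below runs the same AVG(...) OVER(PARTITION BY Region) expression against an in-memory SQLite database from Python. Everything in it is an assumption made for illustration: SQLite (3.25 or newer, for window functions) stands in for the MySQL connection, the `services` table is hypothetical, and the rows are toy values rather than the UN data. It shows that every row is kept and that NULL GDP values do not count towards the regional average.

```python
import sqlite3

# Hypothetical toy rows (not the UN data); None is stored as SQL NULL.
rows = [
    ("Region A", "Country 1", 10.0),
    ("Region A", "Country 2", None),
    ("Region A", "Country 3", 30.0),
    ("Region B", "Country 4", 5.0),
]

# In-memory SQLite database used as a stand-in for the MySQL connection.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE services (Region TEXT, Country_name TEXT, Est_gdp_in_billions REAL)"
)
conn.executemany("INSERT INTO services VALUES (?, ?, ?)", rows)

# Same window expression as in the notebook query: one output row per
# input row, each carrying the average of its own region.
result = conn.execute(
    """
    SELECT
        Country_name,
        Est_gdp_in_billions,
        AVG(Est_gdp_in_billions) OVER (PARTITION BY Region) AS Avg_gdp_for_region
    FROM services
    ORDER BY Country_name
    """
).fetchall()

for row in result:
    print(row)
```

In this toy data, Country 2 has no GDP estimate, so the Region A average is taken over the two remaining values only.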


**Task 2: Filter the data**

Next, let’s filter the data to focus only on the Sub-Saharan African countries that, in 2020, had a GDP below the regional average and managed drinking water coverage below 60%.

In [5]:
%%sql
SELECT
    Region,
    Country_name,
    Pct_managed_drinking_water_services,
    Pct_managed_sanitation_services,
    Est_gdp_in_billions,
    AVG(Est_gdp_in_billions) OVER(PARTITION BY Region) AS Avg_gdp_for_region
FROM
    united_nations.Access_to_Basic_Services
WHERE
    Region = 'Sub-Saharan Africa'
    AND Time_period = 2020
    AND Pct_managed_drinking_water_services < 60
    AND Est_gdp_in_billions < Avg_gdp_for_region;

 * mysql+pymysql://root:***@127.0.0.1:3306/united_nations
(pymysql.err.OperationalError) (1054, "Unknown column 'Avg_gdp_for_region' in 'where clause'")
[SQL: SELECT
    Region,
    Country_name,
    Pct_managed_drinking_water_services,
    Pct_managed_sanitation_services,
    Est_gdp_in_billions,
    AVG(Est_gdp_in_billions) OVER(PARTITION BY Region) AS Avg_gdp_for_region
FROM
    united_nations.Access_to_Basic_Services
WHERE
    Region = 'Sub-Saharan Africa'
    AND Time_period = 2020
    AND Pct_managed_drinking_water_services < 60
    AND Est_gdp_in_billions < Avg_gdp_for_region;]
(Background on this error at: https://sqlalche.me/e/20/e3q8)


**Task 3: Implement the solution using subqueries**

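The query in Task 2 fails because the WHERE clause is evaluated before the SELECT list, so the alias Avg_gdp_for_region does not exist yet when the filter runs. Window functions are also not allowed directly in a WHERE clause. We can fix the error by calculating the average regional GDP in a subquery and then filtering on that column in the main query.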

In [6]:
%%sql
SELECT
    Country_name,
    Region,
    Pct_managed_drinking_water_services,
    Pct_managed_sanitation_services,
    Est_gdp_in_billions,
    Avg_gdp_for_region
FROM (
    SELECT
        Region,
        Country_name,
        Pct_managed_drinking_water_services,
        Pct_managed_sanitation_services,
        Est_gdp_in_billions,
        AVG(Est_gdp_in_billions) OVER(PARTITION BY Region) AS Avg_gdp_for_region
    FROM
        united_nations.Access_to_Basic_Services
    WHERE
        Time_period = 2020
    ) AS Avg_world_GDP_2020
WHERE
    Region = 'Sub-Saharan Africa'
    AND Pct_managed_drinking_water_services < 60
    AND Est_gdp_in_billions < Avg_gdp_for_region;

 * mysql+pymysql://root:***@127.0.0.1:3306/united_nations
6 rows affected.


Country_name,Region,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Est_gdp_in_billions,Avg_gdp_for_region
Madagascar,Sub-Saharan Africa,56.33,13,13.05,39.04131578947368
Somalia,Sub-Saharan Africa,57.33,40,6.88,39.04131578947368
Central African Republic,Sub-Saharan Africa,38.33,15,2.33,39.04131578947368
Chad,Sub-Saharan Africa,52.67,19,10.72,39.04131578947368
Burkina Faso,Sub-Saharan Africa,53.33,25,17.93,39.04131578947368
Niger,Sub-Saharan Africa,57.33,25,13.74,39.04131578947368
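
The subquery pattern above can be tried on a small scale with the sketch below. It is an illustration under stated assumptions, not the notebook's setup: an in-memory SQLite database (3.25 or newer) stands in for MySQL, the `services` table and the `regional_avg` alias are hypothetical names, and the rows are toy values. The inner query computes the window average, so the outer WHERE clause can treat Avg_gdp_for_region as an ordinary column.

```python
import sqlite3

# Hypothetical toy rows (not the UN data):
# (Region, Country_name, water coverage %, GDP in billions)
rows = [
    ("Region A", "Country 1", 50.0, 10.0),
    ("Region A", "Country 2", 90.0, 20.0),
    ("Region A", "Country 3", 55.0, 60.0),
    ("Region B", "Country 4", 40.0, 5.0),
]

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the MySQL connection
conn.execute(
    """
    CREATE TABLE services (
        Region TEXT,
        Country_name TEXT,
        Pct_managed_drinking_water_services REAL,
        Est_gdp_in_billions REAL
    )
    """
)
conn.executemany("INSERT INTO services VALUES (?, ?, ?, ?)", rows)

# Inner query: attach the regional average to every row.
# Outer query: filter on that average as if it were a stored column.
filtered = conn.execute(
    """
    SELECT Country_name, Est_gdp_in_billions, Avg_gdp_for_region
    FROM (
        SELECT
            Region,
            Country_name,
            Pct_managed_drinking_water_services,
            Est_gdp_in_billions,
            AVG(Est_gdp_in_billions) OVER (PARTITION BY Region) AS Avg_gdp_for_region
        FROM services
    ) AS regional_avg
    WHERE
        Region = 'Region A'
        AND Pct_managed_drinking_water_services < 60
        AND Est_gdp_in_billions < Avg_gdp_for_region
    """
).fetchall()

print(filtered)
```

With these toy values the Region A average is 30.0, so only Country 1 passes both the water coverage filter and the below-average GDP filter.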


**Task 4: Implement the solution using Common Table Expressions (CTEs)**

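Now, let's solve the same problem using a Common Table Expression (CTE). The CTE holds the same logic as the subquery in Task 3, but it is defined by name before the main query, which this time returns only the country names.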

In [7]:
%%sql
-- This CTE calculates the average regional GDP for each country, for the year 2020.
WITH Avg_world_GDP_2020 AS (
    SELECT
        Region,
        Country_name,
        Pct_managed_drinking_water_services,
        Pct_managed_sanitation_services,
        Est_gdp_in_billions,
        AVG(Est_gdp_in_billions) OVER(PARTITION BY Region) AS Avg_gdp_for_region
    FROM
        united_nations.Access_to_Basic_Services
    WHERE
        Time_period = 2020
)

/*
This query filters the Avg_world_GDP_2020 CTE for countries
in the Sub-Saharan Africa region that have below-average GDP
and are struggling with water access.
*/

SELECT
    Country_name
FROM
    Avg_world_GDP_2020
WHERE
    Region = 'Sub-Saharan Africa'
    AND Pct_managed_drinking_water_services < 60
    AND Est_gdp_in_billions < Avg_gdp_for_region;

 * mysql+pymysql://root:***@127.0.0.1:3306/united_nations
6 rows affected.


Country_name
Madagascar
Somalia
Central African Republic
Chad
Burkina Faso
Niger
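
To check that the CTE form and the subquery form are interchangeable, the sketch below runs both against the same toy table and compares the results. As before, this is only an illustration under assumptions: in-memory SQLite (3.25 or newer) instead of MySQL, a hypothetical `services` table and `regional_avg` name, and made-up rows rather than the UN data.

```python
import sqlite3

# Hypothetical toy rows (not the UN data):
# (Region, Country_name, water coverage %, GDP in billions)
rows = [
    ("Region A", "Country 1", 50.0, 10.0),
    ("Region A", "Country 2", 90.0, 20.0),
    ("Region A", "Country 3", 55.0, 60.0),
    ("Region B", "Country 4", 40.0, 5.0),
]

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the MySQL connection
conn.execute(
    """
    CREATE TABLE services (
        Region TEXT,
        Country_name TEXT,
        Pct_managed_drinking_water_services REAL,
        Est_gdp_in_billions REAL
    )
    """
)
conn.executemany("INSERT INTO services VALUES (?, ?, ?, ?)", rows)

# Shared body: every row plus the average GDP of its region.
inner = """
    SELECT
        Region,
        Country_name,
        Pct_managed_drinking_water_services,
        Est_gdp_in_billions,
        AVG(Est_gdp_in_billions) OVER (PARTITION BY Region) AS Avg_gdp_for_region
    FROM services
"""

# Shared filter applied to the result of the inner query.
outer_filter = """
    WHERE
        Region = 'Region A'
        AND Pct_managed_drinking_water_services < 60
        AND Est_gdp_in_billions < Avg_gdp_for_region
"""

# CTE form: name the intermediate result first, then query it.
cte_rows = conn.execute(
    f"WITH regional_avg AS ({inner}) SELECT Country_name FROM regional_avg {outer_filter}"
).fetchall()

# Subquery form: the same intermediate result, nested in FROM.
subquery_rows = conn.execute(
    f"SELECT Country_name FROM ({inner}) AS regional_avg {outer_filter}"
).fetchall()

print(cte_rows)
print(cte_rows == subquery_rows)
```

Both forms return the same rows here; the difference is in readability, since the CTE names the intermediate result up front instead of nesting it inside FROM.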
