# Recreating the `Access_to_Basic_Services` Dataset

### Learning Objectives
By the end of this exercise, you will be able to:
- Explain how Entity-Relationship Diagrams (ERDs) help us design joins.
- Use **LEFT JOIN** to combine tables.
- Recognize how the choice of join strategy affects the accuracy of query results.

### Overview
In this task, we’ll simulate a small relational schema using the existing `access_to_basic_services` table.  
We will split its columns across three separate tables:
1. **Geographic_Location**
2. **Economic_Indicators**
3. **Basic_Services**

These tables represent different entities in our `united_nations` database, and we’ll later join them to demonstrate how LEFT JOINs work in SQL.


## Step 1: Connect to the MySQL Database
We start by loading the SQL extension and connecting to our local MySQL database named `united_nations`.


In [11]:
%reload_ext sql

In [14]:
%sql mysql+pymysql://root:password@localhost:3306/united_nations

## Step 2: Drop Any Existing Tables (Clean Environment)
Before creating new tables, we ensure a clean workspace by dropping any older versions of the tables.


In [20]:
%%sql
DROP TABLE IF EXISTS Geographic_Location;
DROP TABLE IF EXISTS Economic_Indicators;
DROP TABLE IF EXISTS Basic_Services;

 * mysql+pymysql://root:***@localhost:3306/united_nations
0 rows affected.
0 rows affected.
0 rows affected.


[]
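
The `IF EXISTS` clause is what makes this cell safe to rerun: dropping a table that is not there becomes a no-op instead of an error. The sketch below illustrates that behaviour. It assumes an in-memory SQLite database (Python's built-in `sqlite3` module) as a stand-in for the MySQL server used in this notebook, so it is an illustration rather than the notebook's own setup.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL `united_nations` database.
conn = sqlite3.connect(":memory:")

# The table does not exist yet, but IF EXISTS turns the drop into a no-op.
conn.execute("DROP TABLE IF EXISTS Geographic_Location")

# Create the table, then drop it again: the cell can be rerun any number of times.
conn.execute("CREATE TABLE Geographic_Location (Country_name TEXT)")
conn.execute("DROP TABLE IF EXISTS Geographic_Location")

remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# Without IF EXISTS, dropping a missing table raises an error.
error = None
try:
    conn.execute("DROP TABLE Geographic_Location")
except sqlite3.OperationalError as exc:
    error = str(exc)

print(remaining)
print(error)
```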

## Step 3: Create the Three Core Tables

We’ll now extract relevant information from the `access_to_basic_services` table to create our relational structure.


In [21]:
%%sql

-- Create the Geographic_Location table
CREATE TABLE Geographic_Location AS
SELECT
    DISTINCT
    Country_name,
    Region,
    Sub_region,
    'Capital City' AS Town_name  -- constant literal: every row gets the same value
FROM
    access_to_basic_services;


 * mysql+pymysql://root:***@localhost:3306/united_nations
182 rows affected.


[]

In [22]:
%%sql

-- Create the Economic_Indicators table
CREATE TABLE Economic_Indicators AS
SELECT
    Country_name,
    Time_period,
    Est_gdp_in_billions,
    Est_population_in_millions,
    Pct_unemployment,
    Land_area
FROM
    access_to_basic_services;


 * mysql+pymysql://root:***@localhost:3306/united_nations
1048 rows affected.


[]

In [23]:
%%sql

-- Create the Basic_Services table
CREATE TABLE Basic_Services AS
SELECT
    Country_name,
    Time_period,
    Pct_managed_drinking_water_services AS Access_to_clean_water,
    Pct_managed_sanitation_services AS Access_to_sanitation
FROM
    access_to_basic_services;


 * mysql+pymysql://root:***@localhost:3306/united_nations
1048 rows affected.


[]
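
`CREATE TABLE ... AS SELECT` copies columns and rows but does not define any primary keys, so it is worth checking which columns actually identify a row before joining on them. The sketch below is one way to run that check. It makes several assumptions: an in-memory SQLite database stands in for MySQL, the column set is cut down, the four rows are modelled on this notebook's outputs, and `duplicate_keys` is a hypothetical helper written only for this illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL `united_nations` database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down version of the source table (illustrative rows only).
    CREATE TABLE access_to_basic_services (
        Country_name TEXT, Region TEXT, Time_period INTEGER, Est_gdp_in_billions REAL
    );
    INSERT INTO access_to_basic_services VALUES
        ('Kazakhstan', 'Central and Southern Asia', 2019, 181.67),
        ('Kazakhstan', 'Central and Southern Asia', 2020, 171.08),
        ('Kyrgyzstan', 'Central and Southern Asia', 2019, NULL),
        ('Kyrgyzstan', 'Central and Southern Asia', 2020, NULL);

    CREATE TABLE Geographic_Location AS
    SELECT DISTINCT Country_name, Region FROM access_to_basic_services;

    CREATE TABLE Economic_Indicators AS
    SELECT Country_name, Time_period, Est_gdp_in_billions FROM access_to_basic_services;
""")


def duplicate_keys(table, key_columns):
    """Return (key..., count) rows for key values that occur more than once."""
    cols = ", ".join(key_columns)
    query = f"SELECT {cols}, COUNT(*) FROM {table} GROUP BY {cols} HAVING COUNT(*) > 1"
    return conn.execute(query).fetchall()


# Country_name alone identifies a row in Geographic_Location ...
geo_dupes = duplicate_keys("Geographic_Location", ["Country_name"])
# ... but Economic_Indicators needs Country_name and Time_period together.
eco_dupes = duplicate_keys("Economic_Indicators", ["Country_name", "Time_period"])
eco_country_dupes = duplicate_keys("Economic_Indicators", ["Country_name"])

print(geo_dupes, eco_dupes, eco_country_dupes)
```

An empty result means the chosen columns are unique in that table. Here `Country_name` repeats in `Economic_Indicators`, which is a hint that any join between the yearly tables should also match on `Time_period`.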

## Step 4: Verify the Created Tables
Let's confirm that our new tables exist and preview their contents.


In [24]:
%%sql
SHOW TABLES;


 * mysql+pymysql://root:***@localhost:3306/united_nations
4 rows affected.


Tables_in_united_nations
access_to_basic_services
basic_services
economic_indicators
geographic_location


In [25]:
%%sql
SELECT * FROM Geographic_Location LIMIT 5;


 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


Country_name,Region,Sub_region,Town_name
Kazakhstan,Central and Southern Asia,Central Asia,Capital City
Kyrgyzstan,Central and Southern Asia,Central Asia,Capital City
Tajikistan,Central and Southern Asia,Central Asia,Capital City
Turkmenistan,Central and Southern Asia,Central Asia,Capital City
Uzbekistan,Central and Southern Asia,Central Asia,Capital City


In [26]:
%%sql
SELECT * FROM Economic_Indicators LIMIT 5;


 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


Country_name,Time_period,Est_gdp_in_billions,Est_population_in_millions,Pct_unemployment,Land_area
Kazakhstan,2015,184.39,17.542806,4.93,2699700.0
Kazakhstan,2016,137.28,17.794055,4.96,2699700.0
Kazakhstan,2017,166.81,18.037776,4.9,2699700.0
Kazakhstan,2018,179.34,18.276452,4.85,2699700.0
Kazakhstan,2019,181.67,18.513673,4.8,2699700.0


In [27]:
%%sql
SELECT * FROM Basic_Services LIMIT 5;


 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


Country_name,Time_period,Access_to_clean_water,Access_to_sanitation
Kazakhstan,2015,94.67,98.0
Kazakhstan,2016,94.67,98.0
Kazakhstan,2017,95.0,98.0
Kazakhstan,2018,95.0,98.0
Kazakhstan,2019,95.0,98.0


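### Exercise 1: First LEFT JOIN
Combine the `Geographic_Location` table with the `Economic_Indicators` table based on the `Country_name` column. `Geographic_Location` is the left table, so every country appears in the result even if it has no matching economic rows.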


In [28]:
%%sql

SELECT
    geo.Country_name,
    geo.Region,
    geo.Sub_region,
    eco.Time_period,
    eco.Est_gdp_in_billions,
    eco.Est_population_in_millions
FROM
    Geographic_Location AS geo
LEFT JOIN
    Economic_Indicators AS eco
ON
    geo.Country_name = eco.Country_name
LIMIT 10;


 * mysql+pymysql://root:***@localhost:3306/united_nations
10 rows affected.


Country_name,Region,Sub_region,Time_period,Est_gdp_in_billions,Est_population_in_millions
Kazakhstan,Central and Southern Asia,Central Asia,2020,171.08,18.755666
Kazakhstan,Central and Southern Asia,Central Asia,2019,181.67,18.513673
Kazakhstan,Central and Southern Asia,Central Asia,2018,179.34,18.276452
Kazakhstan,Central and Southern Asia,Central Asia,2017,166.81,18.037776
Kazakhstan,Central and Southern Asia,Central Asia,2016,137.28,17.794055
Kazakhstan,Central and Southern Asia,Central Asia,2015,184.39,17.542806
Kyrgyzstan,Central and Southern Asia,Central Asia,2020,,
Kyrgyzstan,Central and Southern Asia,Central Asia,2019,,
Kyrgyzstan,Central and Southern Asia,Central Asia,2018,,
Kyrgyzstan,Central and Southern Asia,Central Asia,2017,,
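
To see what `LEFT JOIN` adds over `INNER JOIN`, it helps to include a left-table row that has no match on the right. The sketch below is a toy illustration under stated assumptions: an in-memory SQLite database stands in for MySQL, the tables carry only a few columns, and `Exampleland` is a hypothetical country invented so that one row has no economic data.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL `united_nations` database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Geographic_Location (Country_name TEXT, Region TEXT);
    CREATE TABLE Economic_Indicators (
        Country_name TEXT, Time_period INTEGER, Est_gdp_in_billions REAL
    );

    INSERT INTO Geographic_Location VALUES
        ('Kazakhstan', 'Central and Southern Asia'),
        ('Exampleland', 'Example Region');  -- hypothetical, has no economic rows

    INSERT INTO Economic_Indicators VALUES
        ('Kazakhstan', 2019, 181.67),
        ('Kazakhstan', 2020, 171.08);
""")

# Same query text for both join types; only the join keyword changes.
join_sql = """
    SELECT geo.Country_name, eco.Time_period, eco.Est_gdp_in_billions
    FROM Geographic_Location AS geo
    {kind} JOIN Economic_Indicators AS eco
        ON geo.Country_name = eco.Country_name
    ORDER BY geo.Country_name, eco.Time_period
"""

left_rows = conn.execute(join_sql.format(kind="LEFT")).fetchall()
inner_rows = conn.execute(join_sql.format(kind="INNER")).fetchall()

print(len(left_rows), len(inner_rows))  # 3 2
for row in left_rows:
    print(row)
```

The `LEFT JOIN` result keeps `Exampleland` with `None` (SQL `NULL`) in the economic columns, while the `INNER JOIN` drops it. Note that this differs from the Kyrgyzstan rows in the output above: those rows did match, as their `Time_period` values show, and the missing figures come from the stored data itself.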


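### Exercise 2: Second LEFT JOIN
Extend the previous query by adding the `Basic_Services` table, joining again on `Country_name` only. Both `Economic_Indicators` and `Basic_Services` hold several yearly rows per country, so this condition pairs every economic row with every service row for the same country. Watch how `economic_time` and `service_time` fail to line up in the output.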


In [30]:
%%sql
SELECT
    geo.Country_name,
    geo.Region,
    eco.Time_period AS economic_time,
    eco.Est_gdp_in_billions,
    eco.Est_population_in_millions,
    bs.Time_period AS service_time,
    bs.Access_to_clean_water,
    bs.Access_to_sanitation
FROM
    Geographic_Location AS geo
LEFT JOIN
    Economic_Indicators AS eco
ON
    geo.Country_name = eco.Country_name
LEFT JOIN
    Basic_Services AS bs
ON
    geo.Country_name = bs.Country_name
LIMIT 10;


 * mysql+pymysql://root:***@localhost:3306/united_nations
10 rows affected.


Country_name,Region,economic_time,Est_gdp_in_billions,Est_population_in_millions,service_time,Access_to_clean_water,Access_to_sanitation
Kazakhstan,Central and Southern Asia,2020,171.08,18.755666,2020,95.0,98.0
Kazakhstan,Central and Southern Asia,2020,171.08,18.755666,2019,95.0,98.0
Kazakhstan,Central and Southern Asia,2020,171.08,18.755666,2018,95.0,98.0
Kazakhstan,Central and Southern Asia,2020,171.08,18.755666,2017,95.0,98.0
Kazakhstan,Central and Southern Asia,2020,171.08,18.755666,2016,94.67,98.0
Kazakhstan,Central and Southern Asia,2020,171.08,18.755666,2015,94.67,98.0
Kazakhstan,Central and Southern Asia,2019,181.67,18.513673,2020,95.0,98.0
Kazakhstan,Central and Southern Asia,2019,181.67,18.513673,2019,95.0,98.0
Kazakhstan,Central and Southern Asia,2019,181.67,18.513673,2018,95.0,98.0
Kazakhstan,Central and Southern Asia,2019,181.67,18.513673,2017,95.0,98.0


### Exercise 3: Refined LEFT JOIN
To align each economic row with the service row for the same year, we add `eco.Time_period = bs.Time_period` to the second join condition, so `Basic_Services` is matched on both `Country_name` **and** `Time_period`.


In [31]:
%%sql

SELECT
    geo.Country_name,
    geo.Region,
    eco.Time_period,
    eco.Est_gdp_in_billions,
    eco.Est_population_in_millions,
    bs.Access_to_clean_water,
    bs.Access_to_sanitation
FROM
    Geographic_Location AS geo
LEFT JOIN
    Economic_Indicators AS eco
ON
    geo.Country_name = eco.Country_name
LEFT JOIN
    Basic_Services AS bs
ON
    geo.Country_name = bs.Country_name
    AND eco.Time_period = bs.Time_period
LIMIT 10;


 * mysql+pymysql://root:***@localhost:3306/united_nations
10 rows affected.


Country_name,Region,Time_period,Est_gdp_in_billions,Est_population_in_millions,Access_to_clean_water,Access_to_sanitation
Kazakhstan,Central and Southern Asia,2020,171.08,18.755666,95.0,98.0
Kazakhstan,Central and Southern Asia,2019,181.67,18.513673,95.0,98.0
Kazakhstan,Central and Southern Asia,2018,179.34,18.276452,95.0,98.0
Kazakhstan,Central and Southern Asia,2017,166.81,18.037776,95.0,98.0
Kazakhstan,Central and Southern Asia,2016,137.28,17.794055,94.67,98.0
Kazakhstan,Central and Southern Asia,2015,184.39,17.542806,94.67,98.0
Kyrgyzstan,Central and Southern Asia,2020,,,92.67,97.67
Kyrgyzstan,Central and Southern Asia,2019,,,91.67,97.33
Kyrgyzstan,Central and Southern Asia,2018,,,91.33,97.33
Kyrgyzstan,Central and Southern Asia,2017,,,91.0,97.33
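
The effect of the extra join condition is easiest to see in the row counts. The sketch below compares the two conditions on a toy dataset. It assumes an in-memory SQLite database as a stand-in for MySQL, uses a reduced column set, and loads three Kazakhstan rows per table taken from the outputs in this notebook.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL `united_nations` database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Economic_Indicators (
        Country_name TEXT, Time_period INTEGER, Est_gdp_in_billions REAL
    );
    CREATE TABLE Basic_Services (
        Country_name TEXT, Time_period INTEGER, Access_to_clean_water REAL
    );

    INSERT INTO Economic_Indicators VALUES
        ('Kazakhstan', 2018, 179.34),
        ('Kazakhstan', 2019, 181.67),
        ('Kazakhstan', 2020, 171.08);

    INSERT INTO Basic_Services VALUES
        ('Kazakhstan', 2018, 95.0),
        ('Kazakhstan', 2019, 95.0),
        ('Kazakhstan', 2020, 95.0);
""")

# Join on country only: each of the 3 economic rows meets all 3 service rows.
country_only = conn.execute("""
    SELECT COUNT(*)
    FROM Economic_Indicators AS eco
    LEFT JOIN Basic_Services AS bs
        ON eco.Country_name = bs.Country_name
""").fetchone()[0]

# Join on country and year: each economic row meets exactly one service row.
country_and_year = conn.execute("""
    SELECT COUNT(*)
    FROM Economic_Indicators AS eco
    LEFT JOIN Basic_Services AS bs
        ON eco.Country_name = bs.Country_name
        AND eco.Time_period = bs.Time_period
""").fetchone()[0]

print(country_only, country_and_year)  # 9 3
```

With three years per table, the country-only join returns 3 × 3 = 9 rows, while the refined join returns 3. Comparing `COUNT(*)` before and after a join is a quick, general way to catch this kind of row multiplication.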


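## Summary

| Step | Description | Tables Involved |
|------|-------------|-----------------|
| 1 | Created three new tables from `access_to_basic_services` | `Geographic_Location`, `Economic_Indicators`, `Basic_Services` |
| 2 | Used `LEFT JOIN` on `Country_name` to combine data | `Geographic_Location`, `Economic_Indicators` |
| 3 | Extended the join to include `Basic_Services`, still on `Country_name` only | All three |
| 4 | Refined the join by also matching `Time_period`, which improved accuracy | All three |

Through this exercise, we learned how **Entity-Relationship structures** and **join logic** influence the quality and integrity of analytical results: joining on `Country_name` alone pairs rows from different years, while matching on both `Country_name` and `Time_period` keeps each year's figures aligned.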
