#  UNION — Combining Results Across Regions



##  Learning Objectives
By the end of this section, you will be able to:

- Explain how the **UNION** operator combines the results of multiple queries into a single result set.
- Use **UNION** to merge datasets with the same structure from multiple regions.
- Use **CASE** and **COALESCE** to fill missing values with regional averages.



##  Overview

In this exercise, we build a summary of **estimated unemployment rates** for each country and time period.
Some countries, however, have no unemployment data for some time periods.

To handle this, we’ll use **regional averages** as fallback values: wherever a country’s rate is missing, we substitute the average for its region, so every row in the summary has an estimate.

### Regional Unemployment Averages

| Region | Pct_regional_unemployment |
|--------|---------------------------|
| Central and Southern Asia | 19.59 |
| Eastern and South-Eastern Asia | 22.64 |
| Europe and Northern America | 24.43 |
| Latin America and the Caribbean | 24.23 |
| Northern Africa and Western Asia | 17.84 |
| Oceania | 4.98 |
| Sub-Saharan Africa | 33.65 |
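
The fallback rule can be sketched in a few lines of Python before we express it in SQL. This is only an illustration of the logic: the names `REGIONAL_UNEMPLOYMENT` and `impute_unemployment` are hypothetical and are not part of the notebook or the database.

```python
# Regional averages copied from the table above (percent).
REGIONAL_UNEMPLOYMENT = {
    "Central and Southern Asia": 19.59,
    "Eastern and South-Eastern Asia": 22.64,
    "Europe and Northern America": 24.43,
    "Latin America and the Caribbean": 24.23,
    "Northern Africa and Western Asia": 17.84,
    "Oceania": 4.98,
    "Sub-Saharan Africa": 33.65,
}


def impute_unemployment(region, pct_unemployment):
    """Return the reported rate, or the region's average when it is missing (None)."""
    if pct_unemployment is None:
        return REGIONAL_UNEMPLOYMENT[region]
    return pct_unemployment


# A reported value is kept as-is.
print(impute_unemployment("Central and Southern Asia", 4.93))
# A missing value falls back to the regional average.
print(impute_unemployment("Central and Southern Asia", None))
```

The SQL in the following steps applies the same rule row by row, with `NULL` playing the role of `None`.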



##  Connecting to the Database

We’ll connect to our **united_nations** database using the `%sql` magic command.



In [1]:
%load_ext sql

In [2]:
%sql mysql+pymysql://root:password@localhost:3306/united_nations


##  1. Fetch Countries in Central and Southern Asia

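We start by selecting the countries in the **Central and Southern Asia** region. The table holds one row per country and time period, so each country name appears several times in the output.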



In [3]:
%%sql

SELECT
Country_name
FROM
united_nations.access_to_basic_services
WHERE
Region LIKE '%Central and Southern Asia%'
LIMIT 10;

 * mysql+pymysql://root:***@localhost:3306/united_nations
10 rows affected.


Country_name
Kazakhstan
Kazakhstan
Kazakhstan
Kazakhstan
Kazakhstan
Kazakhstan
Kyrgyzstan
Kyrgyzstan
Kyrgyzstan
Kyrgyzstan
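
If only the list of unique country names is needed, `SELECT DISTINCT` collapses the repeats. The sketch below shows this using Python's built-in `sqlite3` module and an in-memory table as a stand-in for the MySQL table. The column types and the four sample rows are assumptions made for illustration, with values taken from the output shown in this notebook.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access_to_basic_services ("
    "Country_name TEXT, Region TEXT, Time_period INTEGER, Pct_unemployment REAL)"
)
sample_rows = [
    ("Kazakhstan", "Central and Southern Asia", 2015, 4.93),
    ("Kazakhstan", "Central and Southern Asia", 2016, 4.96),
    ("Kyrgyzstan", "Central and Southern Asia", 2015, None),
    ("Kyrgyzstan", "Central and Southern Asia", 2016, None),
]
conn.executemany(
    "INSERT INTO access_to_basic_services VALUES (?, ?, ?, ?)", sample_rows
)

# DISTINCT returns each country name once, however many time periods it has.
distinct_countries = [
    name
    for (name,) in conn.execute(
        "SELECT DISTINCT Country_name "
        "FROM access_to_basic_services "
        "WHERE Region LIKE '%Central and Southern Asia%' "
        "ORDER BY Country_name"
    )
]
print(distinct_countries)  # ['Kazakhstan', 'Kyrgyzstan']
```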



##  2. Obtain Time Period and Unemployment Data

Next, we’ll retrieve each country’s **region**, **time period**, and **unemployment rate**.



In [4]:
%%sql

SELECT
Country_name,
Region,
Time_period,
Pct_unemployment
FROM
united_nations.access_to_basic_services
WHERE
Region LIKE '%Central and Southern Asia%'
LIMIT 10;

 * mysql+pymysql://root:***@localhost:3306/united_nations
10 rows affected.


Country_name,Region,Time_period,Pct_unemployment
Kazakhstan,Central and Southern Asia,2015,4.93
Kazakhstan,Central and Southern Asia,2016,4.96
Kazakhstan,Central and Southern Asia,2017,4.9
Kazakhstan,Central and Southern Asia,2018,4.85
Kazakhstan,Central and Southern Asia,2019,4.8
Kazakhstan,Central and Southern Asia,2020,4.89
Kyrgyzstan,Central and Southern Asia,2015,
Kyrgyzstan,Central and Southern Asia,2016,
Kyrgyzstan,Central and Southern Asia,2017,
Kyrgyzstan,Central and Southern Asia,2018,
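
Before imputing, it can help to see how many values are actually missing per country. `COUNT(*)` counts every row, while `COUNT(column)` skips `NULL`s, so their difference is the number of missing values. The sketch below runs this against an in-memory SQLite stand-in loaded with the ten rows shown above. The column types are assumptions, and the real table contains more rows than these.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access_to_basic_services ("
    "Country_name TEXT, Region TEXT, Time_period INTEGER, Pct_unemployment REAL)"
)
region = "Central and Southern Asia"
# The ten rows from the output above; None represents a missing (NULL) rate.
sample_rows = [
    ("Kazakhstan", region, 2015, 4.93),
    ("Kazakhstan", region, 2016, 4.96),
    ("Kazakhstan", region, 2017, 4.9),
    ("Kazakhstan", region, 2018, 4.85),
    ("Kazakhstan", region, 2019, 4.8),
    ("Kazakhstan", region, 2020, 4.89),
    ("Kyrgyzstan", region, 2015, None),
    ("Kyrgyzstan", region, 2016, None),
    ("Kyrgyzstan", region, 2017, None),
    ("Kyrgyzstan", region, 2018, None),
]
conn.executemany(
    "INSERT INTO access_to_basic_services VALUES (?, ?, ?, ?)", sample_rows
)

# COUNT(*) counts all rows; COUNT(Pct_unemployment) ignores NULLs.
missing_per_country = conn.execute(
    """
    SELECT Country_name,
           COUNT(*) AS total_rows,
           COUNT(*) - COUNT(Pct_unemployment) AS missing_rows
    FROM access_to_basic_services
    WHERE Region LIKE '%Central and Southern Asia%'
    GROUP BY Country_name
    ORDER BY Country_name
    """
).fetchall()
print(missing_per_country)  # [('Kazakhstan', 6, 0), ('Kyrgyzstan', 4, 4)]
```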



##  3. Impute Missing Values Using Regional Averages

Some records have missing unemployment data, such as the Kyrgyzstan rows in the previous output.
We’ll use a **CASE** expression to replace `NULL` values with the Central and Southern Asia regional average of **19.59**.



In [5]:
%%sql

SELECT
Country_name,
Region,
Time_period,
CASE
WHEN Pct_unemployment IS NULL THEN 19.59
ELSE Pct_unemployment
END AS Pct_unemployment_imputed
FROM
united_nations.access_to_basic_services
WHERE
Region LIKE '%Central and Southern Asia%'
LIMIT 10;

 * mysql+pymysql://root:***@localhost:3306/united_nations
10 rows affected.


Country_name,Region,Time_period,Pct_unemployment_imputed
Kazakhstan,Central and Southern Asia,2015,4.93
Kazakhstan,Central and Southern Asia,2016,4.96
Kazakhstan,Central and Southern Asia,2017,4.9
Kazakhstan,Central and Southern Asia,2018,4.85
Kazakhstan,Central and Southern Asia,2019,4.8
Kazakhstan,Central and Southern Asia,2020,4.89
Kyrgyzstan,Central and Southern Asia,2015,19.59
Kyrgyzstan,Central and Southern Asia,2016,19.59
Kyrgyzstan,Central and Southern Asia,2017,19.59
Kyrgyzstan,Central and Southern Asia,2018,19.59
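
The same replacement can be written more compactly with `COALESCE`, which returns its first non-`NULL` argument. The sketch below puts both forms side by side to show that they agree. It uses an in-memory SQLite table as a stand-in for the MySQL table. The column types and the aliases `imputed_with_case` and `imputed_with_coalesce` are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access_to_basic_services ("
    "Country_name TEXT, Region TEXT, Time_period INTEGER, Pct_unemployment REAL)"
)
conn.executemany(
    "INSERT INTO access_to_basic_services VALUES (?, ?, ?, ?)",
    [
        ("Kazakhstan", "Central and Southern Asia", 2015, 4.93),
        ("Kyrgyzstan", "Central and Southern Asia", 2015, None),
    ],
)

# Both expressions keep a reported rate and substitute 19.59 for NULL.
results = conn.execute(
    """
    SELECT
        Country_name,
        Time_period,
        CASE
            WHEN Pct_unemployment IS NULL THEN 19.59
            ELSE Pct_unemployment
        END AS imputed_with_case,
        COALESCE(Pct_unemployment, 19.59) AS imputed_with_coalesce
    FROM access_to_basic_services
    ORDER BY Country_name, Time_period
    """
).fetchall()

for row in results:
    print(row)
```

`CASE` remains the better fit when the condition is something other than a `NULL` check; for a plain fallback value, `COALESCE` is shorter.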



##  4. Combine All Regions Using UNION

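Now that we know how to impute missing data for one region, we’ll repeat this for every other region, each with its own regional average, and merge the results using the **UNION** operator.

In the query below, `COALESCE(Pct_unemployment, 19.59)` is a shorter equivalent of the **CASE** expression from step 3: it returns the first argument that is not `NULL`. **UNION** requires every branch to return the same number of columns with compatible types, and it removes duplicate rows from the combined result.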



In [6]:
%%sql

(
SELECT Country_name, Region, Time_period,
COALESCE(Pct_unemployment, 19.59) AS Pct_unemployment_imputed
FROM united_nations.access_to_basic_services
WHERE Region LIKE '%Central and Southern Asia%'
)
UNION
(
SELECT Country_name, Region, Time_period,
COALESCE(Pct_unemployment, 22.64) AS Pct_unemployment_imputed
FROM united_nations.access_to_basic_services
WHERE Region LIKE '%Eastern and South-Eastern Asia%'
)
UNION
(
SELECT Country_name, Region, Time_period,
COALESCE(Pct_unemployment, 24.43) AS Pct_unemployment_imputed
FROM united_nations.access_to_basic_services
WHERE Region LIKE '%Europe and Northern America%'
)
UNION
(
SELECT Country_name, Region, Time_period,
COALESCE(Pct_unemployment, 24.23) AS Pct_unemployment_imputed
FROM united_nations.access_to_basic_services
WHERE Region LIKE '%Latin America and the Caribbean%'
)
UNION
(
SELECT Country_name, Region, Time_period,
COALESCE(Pct_unemployment, 17.84) AS Pct_unemployment_imputed
FROM united_nations.access_to_basic_services
WHERE Region LIKE '%Northern Africa and Western Asia%'
)
UNION
(
SELECT Country_name, Region, Time_period,
COALESCE(Pct_unemployment, 4.98) AS Pct_unemployment_imputed
FROM united_nations.access_to_basic_services
WHERE Region LIKE '%Oceania%'
)
UNION
(
SELECT Country_name, Region, Time_period,
COALESCE(Pct_unemployment, 33.65) AS Pct_unemployment_imputed
FROM united_nations.access_to_basic_services
WHERE Region LIKE '%Sub-Saharan Africa%'
)
LIMIT 20;

 * mysql+pymysql://root:***@localhost:3306/united_nations
20 rows affected.


Country_name,Region,Time_period,Pct_unemployment_imputed
Kazakhstan,Central and Southern Asia,2015,4.93
Kazakhstan,Central and Southern Asia,2016,4.96
Kazakhstan,Central and Southern Asia,2017,4.9
Kazakhstan,Central and Southern Asia,2018,4.85
Kazakhstan,Central and Southern Asia,2019,4.8
Kazakhstan,Central and Southern Asia,2020,4.89
Kyrgyzstan,Central and Southern Asia,2015,19.59
Kyrgyzstan,Central and Southern Asia,2016,19.59
Kyrgyzstan,Central and Southern Asia,2017,19.59
Kyrgyzstan,Central and Southern Asia,2018,19.59
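
Two details of this query are worth checking on a small example. First, `UNION` removes duplicate rows, whereas `UNION ALL` keeps them. Second, `LIMIT` without `ORDER BY` does not guarantee which rows come back, so an explicit `ORDER BY` makes the preview reproducible. The sketch below uses an in-memory SQLite stand-in with two regions only. The duplicated row and the country `Exampleland` are hypothetical test data, not values from the database. SQLite does not accept parentheses around each `SELECT` branch, so they are omitted here; MySQL accepts either form.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access_to_basic_services ("
    "Country_name TEXT, Region TEXT, Time_period INTEGER, Pct_unemployment REAL)"
)
conn.executemany(
    "INSERT INTO access_to_basic_services VALUES (?, ?, ?, ?)",
    [
        ("Kyrgyzstan", "Central and Southern Asia", 2015, None),
        # Hypothetical exact duplicate, added only to show deduplication.
        ("Kyrgyzstan", "Central and Southern Asia", 2015, None),
        # Hypothetical country, added only to give the second branch a row.
        ("Exampleland", "Oceania", 2015, None),
    ],
)

# Two of the seven branches, joined by a configurable set operator.
query_template = """
SELECT Country_name, Region, Time_period,
       COALESCE(Pct_unemployment, 19.59) AS Pct_unemployment_imputed
FROM access_to_basic_services
WHERE Region LIKE '%Central and Southern Asia%'
{operator}
SELECT Country_name, Region, Time_period,
       COALESCE(Pct_unemployment, 4.98) AS Pct_unemployment_imputed
FROM access_to_basic_services
WHERE Region LIKE '%Oceania%'
ORDER BY Region, Country_name, Time_period
"""

union_rows = conn.execute(query_template.format(operator="UNION")).fetchall()
union_all_rows = conn.execute(query_template.format(operator="UNION ALL")).fetchall()

print(len(union_rows))      # 2: the duplicate Kyrgyzstan row is removed
print(len(union_all_rows))  # 3: UNION ALL keeps every row
```

Because each branch filters on a different region, the branches cannot produce the same row as one another, so `UNION` only removes rows that are already duplicated within the table itself.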



##  5. (Optional) Save as a Database View

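To reuse this combined query later, we can save it as a **view** called `unemployment_summary`. A view stores the query definition rather than a copy of the data, so it always reflects the current contents of the underlying table.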



In [7]:
%%sql

CREATE OR REPLACE VIEW unemployment_summary AS
(
SELECT Country_name, Region, Time_period,
COALESCE(Pct_unemployment, 19.59) AS Pct_unemployment_imputed
FROM united_nations.access_to_basic_services
WHERE Region LIKE '%Central and Southern Asia%'
)
UNION
(
SELECT Country_name, Region, Time_period,
COALESCE(Pct_unemployment, 22.64) AS Pct_unemployment_imputed
FROM united_nations.access_to_basic_services
WHERE Region LIKE '%Eastern and South-Eastern Asia%'
)
UNION
(
SELECT Country_name, Region, Time_period,
COALESCE(Pct_unemployment, 24.43) AS Pct_unemployment_imputed
FROM united_nations.access_to_basic_services
WHERE Region LIKE '%Europe and Northern America%'
)
UNION
(
SELECT Country_name, Region, Time_period,
COALESCE(Pct_unemployment, 24.23) AS Pct_unemployment_imputed
FROM united_nations.access_to_basic_services
WHERE Region LIKE '%Latin America and the Caribbean%'
)
UNION
(
SELECT Country_name, Region, Time_period,
COALESCE(Pct_unemployment, 17.84) AS Pct_unemployment_imputed
FROM united_nations.access_to_basic_services
WHERE Region LIKE '%Northern Africa and Western Asia%'
)
UNION
(
SELECT Country_name, Region, Time_period,
COALESCE(Pct_unemployment, 4.98) AS Pct_unemployment_imputed
FROM united_nations.access_to_basic_services
WHERE Region LIKE '%Oceania%'
)
UNION
(
SELECT Country_name, Region, Time_period,
COALESCE(Pct_unemployment, 33.65) AS Pct_unemployment_imputed
FROM united_nations.access_to_basic_services
WHERE Region LIKE '%Sub-Saharan Africa%'
);

 * mysql+pymysql://root:***@localhost:3306/united_nations
0 rows affected.


[]

In [10]:
%%sql
SHOW FULL TABLES IN united_nations;


 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


Tables_in_united_nations,Table_type
access_to_basic_services,BASE TABLE
basic_services,BASE TABLE
economic_indicators,BASE TABLE
geographic_location,BASE TABLE
unemployment_summary,VIEW
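
Once the view exists, it can be queried like a table, and its results follow the underlying data. The sketch below illustrates this with an in-memory SQLite stand-in and a single-region version of the view. SQLite has no `CREATE OR REPLACE VIEW`, so plain `CREATE VIEW` is used. The updated rate of 6.5 is a made-up value for demonstration, not a real statistic.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access_to_basic_services ("
    "Country_name TEXT, Region TEXT, Time_period INTEGER, Pct_unemployment REAL)"
)
conn.execute(
    "INSERT INTO access_to_basic_services VALUES (?, ?, ?, ?)",
    ("Kyrgyzstan", "Central and Southern Asia", 2015, None),
)

# Single-region version of the view, kept short for illustration.
conn.execute(
    """
    CREATE VIEW unemployment_summary AS
    SELECT Country_name, Region, Time_period,
           COALESCE(Pct_unemployment, 19.59) AS Pct_unemployment_imputed
    FROM access_to_basic_services
    WHERE Region LIKE '%Central and Southern Asia%'
    """
)

view_query = "SELECT Pct_unemployment_imputed FROM unemployment_summary"
before = conn.execute(view_query).fetchall()

# Hypothetical update: pretend a reported rate becomes available later.
conn.execute(
    "UPDATE access_to_basic_services SET Pct_unemployment = 6.5 "
    "WHERE Country_name = 'Kyrgyzstan'"
)
after = conn.execute(view_query).fetchall()

print(before)  # [(19.59,)] while the rate is missing
print(after)   # [(6.5,)] once the base table has a value
```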



##  Summary

In this notebook, we:
- Used **UNION** to merge unemployment data across multiple regions.  
- Filled missing unemployment rates with **regional averages** using **CASE** and **COALESCE**.
- Created a **view** (`unemployment_summary`) that saves the combined query for easy reuse in future queries.



