# Subquery in the JOIN Clause

In this notebook, we explore **how to use a subquery in the JOIN clause** so that an aggregate is computed once and then reused for every row of the main query.

### Learning Objectives
By the end of this exercise, you will be able to:
- Use the result set of a **subquery** in the main query by joining it to another table.
- Perform **aggregate calculations** once and use them in another query via a **JOIN**.
- Understand how a subquery in a JOIN can improve performance compared to a correlated subquery, because the aggregate is computed once per group rather than once per row.



### Overview
Previously, we used a **correlated subquery** to calculate each country’s land area percentage relative to its sub-region.  
However, that approach logically recalculates the sub-region total for every row of the outer query, which is wasteful when many rows share the same sub-region.
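
For reference, a correlated subquery of this kind might look like the sketch below. It is only an illustration: it runs against an in-memory SQLite database rather than the MySQL `united_nations` database, and the four countries (`A` to `D`), sub-regions (`North`, `South`), and land areas are made-up toy values.

```python
import sqlite3

# Toy stand-in for Access_to_Basic_Services (all values are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Access_to_Basic_Services ("
    "Country_name TEXT, Sub_region TEXT, Land_area REAL)"
)
conn.executemany(
    "INSERT INTO Access_to_Basic_Services VALUES (?, ?, ?)",
    [
        ("A", "North", 300.0),
        ("B", "North", 100.0),
        ("C", "South", 50.0),
        ("D", "South", 150.0),
    ],
)

# Correlated form: the inner SELECT refers to the outer row (a.Sub_region),
# so it is logically evaluated once for every row of the outer query.
correlated_sql = """
SELECT
    a.Country_name,
    a.Land_area * 100.0 / (
        SELECT SUM(b.Land_area)
        FROM Access_to_Basic_Services AS b
        WHERE b.Sub_region = a.Sub_region
    ) AS Pct_of_region_land
FROM Access_to_Basic_Services AS a
ORDER BY a.Country_name;
"""
correlated_rows = conn.execute(correlated_sql).fetchall()
for row in correlated_rows:
    print(row)
```

The inner query depends on `a.Sub_region`, which is what makes it correlated; the rest of this notebook removes that dependency by computing the totals up front.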

In this exercise, we’ll:
1. First calculate the **total land area per sub-region** (using a subquery).  
2. Then join that result with the **Access_to_Basic_Services** table to calculate each country’s **percentage of regional land** more efficiently.

We’ll be using the `Access_to_Basic_Services` table in the `united_nations` database, which contains columns such as:
- `Region`
- `Sub_region`
- `Country_name`
- `Land_area`



In [1]:
%load_ext sql

In [4]:
%%sql

-- Step 1: Calculate the total land area for each sub-region
-- We aggregate by Sub_region to get the total land area across all countries in that sub-region.
SELECT 
    Sub_region,
    SUM(Land_area) AS TotalLandArea
FROM 
    Access_to_Basic_Services
GROUP BY 
    Sub_region;


 * mysql+pymysql://root:***@localhost:3306/united_nations
18 rows affected.


| Sub_region | TotalLandArea |
|---|---|
| Central Asia | 22494091.0 |
| Southern Asia | 28620812.47 |
| Eastern Asia | 68078997.4 |
| South-Eastern Asia | 23191692.0 |
| Northern America | 56256564.0 |
| Caribbean | 1240352.0 |
| Central America | 14712480.0 |
| South America | 92408352.0 |
| Northern Africa | 39665646.0 |
| Western Asia | 20931430.0 |


## 2. Calculate Country Land Area Percentages Using a Subquery in the JOIN Clause

Now that we have the total land area per sub-region,  
we can **join** that summary back to the main table (`Access_to_Basic_Services`) to compute  
each country's percentage share of its sub-region's total land area.

The **JOIN** will match records on the `Sub_region` column.
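
To see what the join contributes before any percentage is computed, the sketch below selects only the joined columns. It is a minimal illustration, assuming an in-memory SQLite database and made-up toy rows (countries `A` to `D` in sub-regions `North` and `South`) rather than the real `united_nations` data.

```python
import sqlite3

# Toy stand-in for Access_to_Basic_Services (all values are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Access_to_Basic_Services ("
    "Country_name TEXT, Sub_region TEXT, Land_area REAL)"
)
conn.executemany(
    "INSERT INTO Access_to_Basic_Services VALUES (?, ?, ?)",
    [
        ("A", "North", 300.0),
        ("B", "North", 100.0),
        ("C", "South", 50.0),
        ("D", "South", 150.0),
    ],
)

# The subquery in the JOIN produces one row per sub-region.
# Joining on Sub_region attaches that total to every country row.
join_sql = """
SELECT
    a.Country_name,
    a.Land_area,
    b.TotalLandArea
FROM Access_to_Basic_Services AS a
JOIN (
    SELECT Sub_region, SUM(Land_area) AS TotalLandArea
    FROM Access_to_Basic_Services
    GROUP BY Sub_region
) AS b
    ON a.Sub_region = b.Sub_region
ORDER BY a.Country_name;
"""
joined_rows = conn.execute(join_sql).fetchall()
for row in joined_rows:
    print(row)
```

Each country row now carries its sub-region total alongside its own land area, so the percentage is a plain division of two columns on the same row.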


In [5]:
%%sql

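-- Join each country row to its sub-region total and compute the percentage.
SELECT
    a.Country_name,
    a.Sub_region,
    a.Land_area,
    (a.Land_area / b.TotalLandArea) * 100 AS Pct_of_region_land
FROM
    Access_to_Basic_Services AS a
JOIN
    (
        -- Subquery in the JOIN: one row per sub-region with its total land area
        -- (the same aggregate query as in the previous cell).
        SELECT
            Sub_region,
            SUM(Land_area) AS TotalLandArea
        FROM
            Access_to_Basic_Services
        GROUP BY
            Sub_region
    ) AS b
ON
    a.Sub_region = b.Sub_region
-- No aggregate is selected in the outer query, so this GROUP BY only
-- collapses identical rows (it behaves like SELECT DISTINCT).
GROUP BY
    a.Country_name, a.Sub_region, a.Land_area, b.TotalLandArea
ORDER BY
    a.Sub_region, Pct_of_region_land DESC;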


 * mysql+pymysql://root:***@localhost:3306/united_nations
227 rows affected.


| Country_name | Sub_region | Land_area | Pct_of_region_land |
|---|---|---|---|
| Australia | Australia and New Zealand | 7692020.0 | 16.118306 |
| Australia | Australia and New Zealand | 7682300.0 | 16.097938 |
| New Zealand | Australia and New Zealand | 263310.0 | 0.551755 |
| Cuba | Caribbean | 104100.0 | 8.392779 |
| Cuba | Caribbean | 104040.0 | 8.387941 |
| Cuba | Caribbean | 103800.0 | 8.368592 |
| Dominican Republic | Caribbean | 48310.0 | 3.894862 |
| Haiti | Caribbean | 27560.0 | 2.22195 |
| Jamaica | Caribbean | 10830.0 | 0.873139 |
| Puerto Rico | Caribbean | 8870.0 | 0.71512 |
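
Australia and Cuba appear more than once in this output with slightly different `Land_area` values, which suggests the table holds several rows per country. If that is the case, `SUM(Land_area)` counts each country several times, which would explain why Australia shows only about 16% of a sub-region whose output rows are just Australia and New Zealand. The sketch below reproduces the effect on toy data and shows one possible remedy. It assumes an in-memory SQLite database, made-up values, and a hypothetical `Time_period` column that distinguishes the repeated rows; the real table may use a different column for this.

```python
import sqlite3

# Toy table with two rows per country. Time_period is a hypothetical column name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Access_to_Basic_Services ("
    "Country_name TEXT, Sub_region TEXT, Time_period INTEGER, Land_area REAL)"
)
conn.executemany(
    "INSERT INTO Access_to_Basic_Services VALUES (?, ?, ?, ?)",
    [
        ("A", "North", 2019, 300.0),
        ("A", "North", 2020, 300.0),
        ("B", "North", 2019, 100.0),
        ("B", "North", 2020, 100.0),
    ],
)

# Summing over every row counts each country twice, so the shares are halved.
all_periods_sql = """
SELECT DISTINCT
    a.Country_name,
    a.Land_area * 100.0 / b.TotalLandArea AS Pct_of_region_land
FROM Access_to_Basic_Services AS a
JOIN (
    SELECT Sub_region, SUM(Land_area) AS TotalLandArea
    FROM Access_to_Basic_Services
    GROUP BY Sub_region
) AS b ON a.Sub_region = b.Sub_region
ORDER BY a.Country_name;
"""

# Restricting both the subquery and the outer query to one period
# gives shares that add up to 100 within the sub-region.
single_period_sql = """
SELECT
    a.Country_name,
    a.Land_area * 100.0 / b.TotalLandArea AS Pct_of_region_land
FROM Access_to_Basic_Services AS a
JOIN (
    SELECT Sub_region, SUM(Land_area) AS TotalLandArea
    FROM Access_to_Basic_Services
    WHERE Time_period = 2020
    GROUP BY Sub_region
) AS b ON a.Sub_region = b.Sub_region
WHERE a.Time_period = 2020
ORDER BY a.Country_name;
"""

all_periods = conn.execute(all_periods_sql).fetchall()
single_period = conn.execute(single_period_sql).fetchall()
print("all rows:     ", all_periods)
print("single period:", single_period)
```

Whether filtering to one period is the right fix depends on how the real table is organised, so treat this as a check to run against the data rather than a prescribed correction.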


## Summary

In this notebook, we learned how to:
- Use a **subquery inside a JOIN** (a derived table) to compute an aggregate once per group.
- Avoid recalculating aggregates for every row (as with correlated subqueries).
- Attach the aggregated totals to each detail row by joining on the shared `Sub_region` column.

The result shows each country’s **land area** and its **percentage contribution** to its sub-region’s total, with each sub-region total computed once rather than once per row.
