# Joins Using the CIA Factbook

This notebook continues working with the CIA Factbook database to build SQL skills, focusing on joins.

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

## Inner Joins

In [2]:
%%sql
SELECT *
FROM facts
INNER JOIN cities ON cities.facts_id = facts.id
LIMIT 10;

* sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate,id_1,name_1,population_1,capital,facts_id
216,aa,Aruba,180,180,0,112162,1.33,12.56,8.18,8.92,1,Oranjestad,37000,1,216
6,ac,Antigua and Barbuda,442,442,0,92436,1.24,15.85,5.69,2.21,2,Saint John'S,27000,1,6
184,ae,United Arab Emirates,83600,83600,0,5779760,2.58,15.43,1.97,12.36,3,Abu Dhabi,942000,1,184
184,ae,United Arab Emirates,83600,83600,0,5779760,2.58,15.43,1.97,12.36,4,Dubai,1978000,0,184
184,ae,United Arab Emirates,83600,83600,0,5779760,2.58,15.43,1.97,12.36,5,Sharjah,983000,0,184
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51,6,Kabul,3097000,1,1
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92,7,Algiers,2916000,1,3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92,8,Oran,783000,0,3
11,aj,Azerbaijan,86600,82629,3971,9780780,0.96,16.64,7.07,0.0,9,Baku,2123000,1,11
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3,10,Tirana,419000,1,2


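The query above returns the first 10 rows of the joined result. Each row carries every column from both tables, and the column names that appear in both (id, name, population) are shown with a _1 suffix for the copy that comes from cities.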
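To make the shape of a `SELECT *` join easier to see, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The table layouts mirror the key columns of facts and cities, but the rows ('Aland', 'Borduria', 'Alpha', 'Beta') are made up for illustration and are not taken from factbook.db.

```python
import sqlite3

# Toy stand-ins for the notebook's tables (hypothetical rows, same key layout).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT, population INTEGER);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, population INTEGER,
                         capital INTEGER, facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Aland', 1000), (2, 'Borduria', 2000);
    INSERT INTO cities VALUES (1, 'Alpha', 300, 1, 1), (2, 'Beta', 200, 0, 1);
""")

cur = conn.execute("""
    SELECT *
    FROM facts
    INNER JOIN cities ON cities.facts_id = facts.id
    ORDER BY cities.id
""")
rows = cur.fetchall()
columns = [d[0] for d in cur.description]

print(len(columns))  # 8: three columns from facts plus five from cities
for row in rows:
    print(row)
# (1, 'Aland', 1000, 1, 'Alpha', 300, 1, 1)
# (1, 'Aland', 1000, 2, 'Beta', 200, 0, 1)
```

'Borduria' has no matching city, so the inner join drops it, while 'Aland' appears once per matching city.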

In [3]:
%%sql
SELECT c.*, f.name AS country_name 
FROM facts f
INNER JOIN cities c ON c.facts_id = f.id
LIMIT 5;

* sqlite:///factbook.db
Done.


id,name,population,capital,facts_id,country_name
1,Oranjestad,37000,1,216,Aruba
2,Saint John'S,27000,1,6,Antigua and Barbuda
3,Abu Dhabi,942000,1,184,United Arab Emirates
4,Dubai,1978000,0,184,United Arab Emirates
5,Sharjah,983000,0,184,United Arab Emirates


For the next query, I will combine the two tables and return capital cities for each country.

In [4]:
%%sql
SELECT f.name AS country, c.name AS capital_city
FROM facts f
INNER JOIN cities c ON c.facts_id = f.id
WHERE c.capital = 1;

* sqlite:///factbook.db
Done.


country,capital_city
Aruba,Oranjestad
Antigua and Barbuda,Saint John'S
United Arab Emirates,Abu Dhabi
Afghanistan,Kabul
Algeria,Algiers
Azerbaijan,Baku
Albania,Tirana
Armenia,Yerevan
Andorra,Andorra La Vella
Angola,Luanda


## Left Joins

The first query using left joins will select countries that do not have any city information.

In [5]:
%%sql
SELECT f.name AS country, f.population
FROM facts f
LEFT JOIN cities c ON c.facts_id = f.id
WHERE c.facts_id IS NULL;

* sqlite:///factbook.db
Done.


country,population
Kosovo,1870981.0
Monaco,30535.0
Nauru,9540.0
San Marino,33020.0
Singapore,5674472.0
Holy See (Vatican City),842.0
Taiwan,23415126.0
European Union,513949445.0
Ashmore and Cartier Islands,
Christmas Island,1530.0


An interesting thing to note is that the oceans are included in this data. Their population values are None, which is why they didn't come up earlier when I was looking at min and max population values.
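Both points above can be reproduced on a small scale: a LEFT JOIN filtered with IS NULL keeps only the rows that have no match, and MIN and MAX skip NULL values. The sketch below uses an in-memory SQLite database with made-up rows ('Some Ocean' is a hypothetical entry with a NULL population), not the real factbook.db.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT, population INTEGER);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Aland', 1000), (2, 'Borduria', 2000),
                             (3, 'Some Ocean', NULL);
    INSERT INTO cities VALUES (1, 'Alpha', 1);
""")

# Rows of facts with no matching city: the joined city columns are all NULL.
no_cities = conn.execute("""
    SELECT f.name, f.population
    FROM facts f
    LEFT JOIN cities c ON c.facts_id = f.id
    WHERE c.facts_id IS NULL
    ORDER BY f.id
""").fetchall()
print(no_cities)  # [('Borduria', 2000), ('Some Ocean', None)]

# Aggregates such as MIN and MAX ignore NULL, so the NULL row never surfaces.
min_pop, max_pop = conn.execute(
    "SELECT MIN(population), MAX(population) FROM facts"
).fetchone()
print(min_pop, max_pop)  # 1000 2000
```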

## Other Joins

Types of Joins:
1. INNER JOIN
2. LEFT JOIN
3. RIGHT JOIN
4. FULL OUTER JOIN
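The four join types differ only in which unmatched rows they keep. As far as I know, SQLite added native RIGHT JOIN and FULL OUTER JOIN support only in version 3.39.0, so the sketch below emulates both with LEFT JOIN so that it also runs on older builds. The tables and rows are hypothetical; the city 'Gamma' deliberately points at a facts_id that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Aland'), (2, 'Borduria');
    INSERT INTO cities VALUES (1, 'Alpha', 1), (2, 'Gamma', 99);
""")

def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# INNER JOIN: only rows with a match on both sides.
inner = run("""SELECT f.name, c.name FROM facts f
               INNER JOIN cities c ON c.facts_id = f.id""")
print(inner)  # [('Aland', 'Alpha')]

# LEFT JOIN: every row of facts, with NULLs where no city matches.
left = run("""SELECT f.name, c.name FROM facts f
              LEFT JOIN cities c ON c.facts_id = f.id
              ORDER BY f.id""")
print(left)  # [('Aland', 'Alpha'), ('Borduria', None)]

# RIGHT JOIN equivalent: swap the table order and use LEFT JOIN.
right_equiv = run("""SELECT f.name, c.name FROM cities c
                     LEFT JOIN facts f ON c.facts_id = f.id
                     ORDER BY c.id""")
print(right_equiv)  # [('Aland', 'Alpha'), (None, 'Gamma')]

# FULL OUTER JOIN equivalent: the left join plus the unmatched cities.
full_equiv = run("""
    SELECT f.name, c.name FROM facts f
    LEFT JOIN cities c ON c.facts_id = f.id
    UNION ALL
    SELECT f.name, c.name FROM cities c
    LEFT JOIN facts f ON c.facts_id = f.id
    WHERE f.id IS NULL
""")
print(len(full_equiv))  # 3
```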

In [6]:
%%sql
SELECT c.name capital_city, f.name country, c.population population
FROM facts f
INNER JOIN cities c ON c.facts_id = f.id
WHERE c.capital = 1
ORDER BY population DESC
LIMIT 10;

* sqlite:///factbook.db
Done.


capital_city,country,population
Tokyo,Japan,37217000
New Delhi,India,22654000
Mexico City,Mexico,20446000
Beijing,China,15594000
Dhaka,Bangladesh,15391000
Buenos Aires,Argentina,13528000
Manila,Philippines,11862000
Moscow,Russia,11621000
Cairo,Egypt,11169000
Jakarta,Indonesia,9769000


Now for a join using a subquery, which may be a little difficult.

Goal: return capital_city, country, and population for capital cities with a population over 10 million, ordered from largest to smallest.

In [7]:
%%sql
SELECT c.name capital_city, f.name country, c.population population
FROM facts f
INNER JOIN (SELECT * FROM cities
           WHERE capital = 1 AND population > 10000000) c
ON c.facts_id = f.id
ORDER BY population DESC;

* sqlite:///factbook.db
Done.


capital_city,country,population
Tokyo,Japan,37217000
New Delhi,India,22654000
Mexico City,Mexico,20446000
Beijing,China,15594000
Dhaka,Bangladesh,15391000
Buenos Aires,Argentina,13528000
Manila,Philippines,11862000
Moscow,Russia,11621000
Cairo,Egypt,11169000
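One way to sanity-check a subquery join like the one above is to compare it with the same filter written as a plain WHERE clause; for an inner join the two forms should return the same rows. The sketch below does this on an in-memory database with made-up cities, so the names and populations are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, population INTEGER,
                         capital INTEGER, facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Aland'), (2, 'Borduria'), (3, 'Carpania');
    INSERT INTO cities VALUES
        (1, 'Alpha', 12000000, 1, 1),
        (2, 'Beta',  15000000, 0, 1),  -- large, but not a capital
        (3, 'Gamma',   500000, 1, 2),  -- a capital, but too small
        (4, 'Delta', 20000000, 1, 3);
""")

# Filter inside a subquery, then join the already-filtered rows.
with_subquery = conn.execute("""
    SELECT c.name AS capital_city, f.name AS country, c.population
    FROM facts f
    INNER JOIN (SELECT * FROM cities
                WHERE capital = 1 AND population > 10000000) c
    ON c.facts_id = f.id
    ORDER BY c.population DESC
""").fetchall()

# Join first, then filter the joined rows with WHERE.
with_where = conn.execute("""
    SELECT c.name AS capital_city, f.name AS country, c.population
    FROM facts f
    INNER JOIN cities c ON c.facts_id = f.id
    WHERE c.capital = 1 AND c.population > 10000000
    ORDER BY c.population DESC
""").fetchall()

print(with_subquery)
# [('Delta', 'Carpania', 20000000), ('Alpha', 'Aland', 12000000)]
print(with_subquery == with_where)  # True
```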


The next query is more complex. It should return the following columns, in order:

1. country, the name of the country.
2. urban_pop, the sum of the population in major urban areas belonging to that country.
3. total_pop, the total population of the country.
4. urban_pct, the percentage of the population within urban areas, expressed as a fraction and calculated by dividing urban_pop by total_pop.

Only countries that have an urban_pct greater than 0.5 should be included, and rows should be sorted by urban_pct in ascending order.


<img src="schema.svg" width = 1000px>

In [21]:
%%sql
SELECT f.name AS country,
       c.population AS urban_pop,
       f.population AS total_pop,
       CAST(c.population AS FLOAT) / f.population AS urban_pct
FROM facts f
INNER JOIN (SELECT facts_id, SUM(population) AS population
            FROM cities
            GROUP BY facts_id) c
ON c.facts_id = f.id
WHERE urban_pct > 0.5
ORDER BY urban_pct;

* sqlite:///factbook.db
Done.


country,urban_pop,total_pop,urban_pct
Uruguay,1672000,3341893,0.5003152404939356
"Congo, Republic of the",2445000,4755097,0.5141850944365594
Brunei,241000,429646,0.5609269026128487
New Caledonia,157000,271615,0.5780240413821034
Virgin Islands,60000,103574,0.5792959623071428
Falkland Islands (Islas Malvinas),2000,3361,0.5950609937518596
Djibouti,496000,828324,0.5987995035758954
Australia,13789000,22751014,0.6060828761302683
Iceland,206000,331918,0.6206352171319423
Israel,5226000,8049314,0.6492478737939655
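The CAST in the query above matters because SQLite performs integer division when both operands are integers, which would truncate every ratio below 1 to 0. The following sketch shows the difference on an in-memory database; the table contents and the column name int_ratio are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT, population INTEGER);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, population INTEGER,
                         facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Aland', 1000), (2, 'Borduria', 2000);
    INSERT INTO cities VALUES (1, 'Alpha', 400, 1), (2, 'Beta', 200, 1),
                              (3, 'Gamma', 500, 2);
""")

rows = conn.execute("""
    SELECT f.name,
           c.urban_pop / f.population AS int_ratio,                 -- integer division
           CAST(c.urban_pop AS FLOAT) / f.population AS urban_pct   -- float division
    FROM facts f
    INNER JOIN (SELECT facts_id, SUM(population) AS urban_pop
                FROM cities
                GROUP BY facts_id) c
    ON c.facts_id = f.id
    ORDER BY urban_pct
""").fetchall()
print(rows)  # [('Borduria', 0, 0.25), ('Aland', 0, 0.6)]

# Only the float version can be compared meaningfully against 0.5.
mostly_urban = [name for name, _, pct in rows if pct > 0.5]
print(mostly_urban)  # ['Aland']
```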


The subquery can also be run on its own to inspect the per-country city population totals:

In [11]:
%%sql 
SELECT facts_id, SUM(population) AS population
FROM cities
GROUP BY facts_id;

* sqlite:///factbook.db
Done.


facts_id,population
1,3097000
10,172000
100,1127000
101,5000
102,546000
103,94000
104,499000
105,1987000
106,772000
107,2720000
