# Guided Project: Analyzing CIA Factbook Data Using SQL

## Introduction

In this project, we'll work with data from the CIA World Factbook, a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like the following:

* `area` — the total land and sea area of the country.
* `population` — the country's population.
* `population_growth` — the country's population growth as a percentage.

We'll use the following code to connect our Jupyter Notebook to our database file:

```python
%reload_ext sql
%sql sqlite:///factbook.db
```

```
'Connected: @factbook.db'
```

```python
# List the tables in the database
%sql SELECT * FROM sqlite_master WHERE type='table';
```

| type | name | tbl_name | rootpage | sql |
|---|---|---|---|---|
| table | sqlite_sequence | sqlite_sequence | 3 | `CREATE TABLE sqlite_sequence(name,seq)` |
| table | facts | facts | 47 | `CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float)` |
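The `%sql` magic comes from the `sql` extension loaded above (typically ipython-sql). Outside a notebook, the same inspection can be sketched with Python's built-in `sqlite3` module. This is an illustration rather than the project's own code: it assumes an in-memory database with the `facts` schema shown above instead of opening `factbook.db`, then lists the tables and the column names via `PRAGMA table_info`.

```python
import sqlite3

# Assumption: an in-memory database stands in for factbook.db so the sketch
# runs anywhere; with the real file, use sqlite3.connect("factbook.db").
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, '
    '"code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, '
    '"area" integer, "area_land" integer, "area_water" integer, '
    '"population" integer, "population_growth" float, "birth_rate" float, '
    '"death_rate" float, "migration_rate" float)'
)

# Same idea as the notebook cell: list every table in the database.
# (AUTOINCREMENT makes SQLite create sqlite_sequence automatically.)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per column; field 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(facts)")]

print(tables)
print(columns)
```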


```python
# Select the first five rows of the facts table
%sql SELECT * FROM facts LIMIT 5;
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
| 2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
| 3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
| 4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
| 5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |


## Summary Statistics

Here are the descriptions for some of the columns:

* `name` — the name of the country.
* `area` — the country's total area (both land and water).
* `population` — the country's population.
* `population_growth` — the country's population growth as a percentage.
* `birth_rate` — the country's birth rate, or the number of births per year per 1,000 people.
* `death_rate` — the country's death rate, or the number of deaths per year per 1,000 people.
* `area_land` — the country's land area in [square kilometers](https://www.cia.gov/library/publications/the-world-factbook/rankorder/2147rank.html).
* `area_water` — the country's water area in square kilometers.
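Because `birth_rate` and `death_rate` are expressed per 1,000 people, turning a rate into an approximate yearly count means scaling it by the population. As a worked example, using Afghanistan's row from the sample above:

$$
\text{births per year} \approx \frac{\texttt{birth\_rate} \times \texttt{population}}{1000} = \frac{38.57 \times 32{,}564{,}342}{1000} \approx 1{,}256{,}007
$$

The same scaling applies to `death_rate`.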

Let's write a single query that returns the following:

* Minimum population
* Maximum population
* Minimum population growth
* Maximum population growth

```python
# Return the minimum and maximum population and population growth in a single query
%sql SELECT MIN(population) AS min_pop, MAX(population) AS max_pop, MIN(population_growth) AS min_pop_growth, MAX(population_growth) AS max_pop_growth FROM facts;
```

| min_pop | max_pop | min_pop_growth | max_pop_growth |
|---|---|---|---|
| 0 | 7256490011 | 0.0 | 4.02 |


## Exploring Outliers

The summary statistics above reveal two surprising values:

* There's a country with a population of `0`
* There's a country with a population of `7256490011` (or more than 7.2 billion people)

Let's use subqueries to zoom in on just these rows without hard-coding the specific values.

```python
# Select the row(s) with the minimum population
%sql SELECT * FROM facts WHERE population = (SELECT MIN(population) FROM facts);
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 250 | ay | Antarctica |  | 280000 |  | 0 |  |  |  |  |


```python
# Select the row(s) with the maximum population
%sql SELECT * FROM facts WHERE population = (SELECT MAX(population) FROM facts);
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 261 | xx | World |  |  |  | 7256490011 | 1.08 | 18.6 | 7.8 |  |
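The pattern above computes the extreme value in an inner query and then returns every row that matches it, so ties would all be listed. A minimal sketch of the same pattern with Python's `sqlite3`, assuming a tiny in-memory table seeded with four `(name, population)` pairs copied from the outputs in this notebook; the last query shows how filtering out the aggregate `World` row inside the subquery changes the answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
# Illustrative rows copied from earlier outputs (not the full table).
conn.executemany("INSERT INTO facts VALUES (?, ?)", [
    ("Afghanistan", 32564342), ("Albania", 3029278),
    ("Antarctica", 0), ("World", 7256490011),
])

def names(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]

smallest = names("SELECT name FROM facts "
                 "WHERE population = (SELECT MIN(population) FROM facts)")
largest = names("SELECT name FROM facts "
                "WHERE population = (SELECT MAX(population) FROM facts)")
# Excluding the World row inside the subquery finds the largest real entry.
largest_country = names("SELECT name FROM facts WHERE population = "
                        "(SELECT MAX(population) FROM facts WHERE name <> 'World')")
print(smallest, largest, largest_country)
```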


## Exploring Average Population and Area

Since the maximum population belongs to the `World` row, which is an aggregate rather than a country, let's recalculate the summary statistics without it.

```python
# Recalculate the statistics excluding the row for the whole world
%sql SELECT MIN(population) AS min_pop, MAX(population) AS max_pop, MIN(population_growth) AS min_pop_growth, MAX(population_growth) AS max_pop_growth FROM facts WHERE name <> 'World';
```

| min_pop | max_pop | min_pop_growth | max_pop_growth |
|---|---|---|---|
| 0 | 1367485388 | 0.0 | 4.02 |


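```python
# Calculate the average population and area
%sql SELECT AVG(population) AS avg_pop, AVG(area) AS avg_area FROM facts;
```

| avg_pop | avg_area |
|---|---|
| 62094928.32231405 | 555093.546184739 |

Note that this query does not exclude the `World` row. Its population inflates `avg_pop`, but `avg_area` is unaffected because `World` has no `area` value and `AVG` ignores NULLs. The query below excludes `World` from both averages.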
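SQL's `AVG` skips NULL values, so whether a row like `World` changes an average depends on which columns it fills in. A small sketch with Python's `sqlite3`, assuming a three-row in-memory table copied from earlier outputs (with `World` given a NULL `area`, as in `factbook.db`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
# Illustrative rows from earlier outputs; World's area is NULL.
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("Afghanistan", 652230, 32564342),
    ("Albania", 28748, 3029278),
    ("World", None, 7256490011),
])

query = "SELECT AVG(population), AVG(area) FROM facts"
with_world = conn.execute(query).fetchone()
without_world = conn.execute(query + " WHERE name <> 'World'").fetchone()

# AVG ignores NULLs: World shifts the population average,
# but the area average is computed over the same two rows either way.
print(with_world)
print(without_world)
```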


## Finding Densely Populated Countries

To finish, we'll build on the averages calculated above to find countries that are densely populated. We'll identify countries that have the following:

* Above-average values for population
* Below-average values for area

```python
# Find all countries with above-average population and below-average area
%sql SELECT * FROM facts WHERE population > (SELECT AVG(population) FROM facts WHERE name <> 'World') AND area < (SELECT AVG(area) FROM facts WHERE name <> 'World');
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 14 | bg | Bangladesh | 148460 | 130170 | 18290 | 168957745 | 1.6 | 21.14 | 5.61 | 0.46 |
| 65 | gm | Germany | 357022 | 348672 | 8350 | 80854408 | 0.17 | 8.47 | 11.42 | 1.24 |
| 80 | iz | Iraq | 438317 | 437367 | 950 | 37056169 | 2.93 | 31.45 | 3.77 | 1.62 |
| 83 | it | Italy | 301340 | 294140 | 7200 | 61855120 | 0.27 | 8.74 | 10.19 | 4.1 |
| 85 | ja | Japan | 377915 | 364485 | 13430 | 126919659 | 0.16 | 7.93 | 9.51 | 0.0 |
| 91 | ks | Korea, South | 99720 | 96920 | 2800 | 49115196 | 0.14 | 8.19 | 6.75 | 0.0 |
| 120 | mo | Morocco | 446550 | 446300 | 250 | 33322699 | 1.0 | 18.2 | 4.81 | 3.36 |
| 138 | rp | Philippines | 300000 | 298170 | 1830 | 100998376 | 1.61 | 24.27 | 6.11 | 2.09 |
| 139 | pl | Poland | 312685 | 304255 | 8430 | 38562189 | 0.09 | 9.74 | 10.19 | 0.46 |
| 163 | sp | Spain | 505370 | 498980 | 6390 | 48146134 | 0.89 | 9.64 | 9.04 | 8.31 |
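The two scalar subqueries can also be written once as a common table expression (CTE), which keeps the `World` filter in a single place. A sketch of that alternative with Python's `sqlite3`, assuming a six-row in-memory table copied from earlier outputs (this is a restructuring for illustration, not the project's query):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
# Illustrative rows from earlier outputs; World has a NULL area, as in factbook.db.
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("Albania", 28748, 3029278), ("Algeria", 2381741, 39542166),
    ("Andorra", 468, 85580), ("Angola", 1246700, 19625353),
    ("Japan", 377915, 126919659), ("World", None, 7256490011),
])

# The CTE computes both averages once (excluding World); the main query keeps
# rows above the population average and below the area average.
query = """
WITH averages AS (
    SELECT AVG(population) AS avg_pop, AVG(area) AS avg_area
    FROM facts
    WHERE name <> 'World'
)
SELECT name
FROM facts, averages
WHERE population > avg_pop AND area < avg_area
"""
dense = [row[0] for row in conn.execute(query)]
print(dense)
```

On these six rows only Japan qualifies; the `World` row drops out on its own because comparing its NULL `area` yields NULL, not true.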


## Next Steps

There are several more questions to explore:

* Which country has the most people? Which country has the highest growth rate?
* Which countries have the highest ratios of water to land? Which countries have more water than land?
* Which countries will add the most people to their populations next year?
* Which countries have a higher death rate than birth rate?
* Which countries have the highest `population/area` ratio, and how does the result compare to the list of densely populated countries found above?

```python
# Find the most populous country, excluding the World row
%sql SELECT * FROM facts WHERE population = (SELECT MAX(population) FROM facts WHERE name <> 'World');
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 37 | ch | China | 9596960 | 9326410 | 270550 | 1367485388 | 0.45 | 12.49 | 7.53 | 0.44 |


```python
# Find the country with the highest population growth
%sql SELECT * FROM facts WHERE population_growth = (SELECT MAX(population_growth) FROM facts WHERE name <> 'World');
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 162 | od | South Sudan | 644329 |  |  | 12042910 | 4.02 | 36.91 | 8.18 | 11.47 |


```python
# Find the country with the highest ratio of water to land
# (CAST to REAL avoids SQLite's integer division on the integer area columns)
%sql SELECT * FROM facts WHERE CAST(area_water AS REAL) / area_land = (SELECT MAX(CAST(area_water AS REAL) / area_land) FROM facts WHERE name <> 'World');
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 228 | io | British Indian Ocean Territory | 54400 | 60 | 54340 |  |  |  |  |  |
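Because `area_water` and `area_land` are both declared `integer`, dividing one by the other directly makes SQLite perform integer division and truncate the result; casting one operand to `REAL` keeps the fraction. A minimal check with Python's `sqlite3`, using the Virgin Islands figures from the next query's output as literal values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Virgin Islands: area_water = 1564, area_land = 346.
int_ratio, real_ratio = conn.execute(
    "SELECT 1564 / 346, CAST(1564 AS REAL) / 346"
).fetchone()
print(int_ratio)   # 4 (integer division truncates)
print(real_ratio)  # the full fractional ratio
```

Truncation matters most when comparing ratios: distinct fractional values can collapse to the same integer and produce ties.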


```python
# Find countries with more water than land
%sql SELECT * FROM facts WHERE area_water > area_land;
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 228 | io | British Indian Ocean Territory | 54400 | 60 | 54340 |  |  |  |  |  |
| 247 | vq | Virgin Islands | 1910 | 346 | 1564 | 103574.0 | 0.59 | 10.31 | 8.54 | 7.67 |


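```python
# Find the country with the highest natural increase rate
# (births minus deaths per 1,000 people)
%sql SELECT * FROM facts WHERE (birth_rate - death_rate) = (SELECT MAX(birth_rate - death_rate) FROM facts WHERE name <> 'World');
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 106 | mi | Malawi | 118484 | 94080 | 24404 | 17964697 | 3.32 | 41.56 | 8.41 | 0.0 |

Note that `birth_rate - death_rate` is a rate per 1,000 people: it ignores both migration and population size, so it identifies the fastest natural growth rather than the largest number of people added.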
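Since `population_growth` is a percentage, one way to estimate how many people a country adds in a year is `population * population_growth / 100`. A sketch with Python's `sqlite3`, assuming a four-row in-memory table copied from the outputs in this notebook (so it ranks only these rows, not the whole table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, "
             "population_growth REAL, birth_rate REAL, death_rate REAL)")
# Illustrative rows from earlier outputs (not the full table).
conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?, ?)", [
    ("Malawi", 17964697, 3.32, 41.56, 8.41),
    ("South Sudan", 12042910, 4.02, 36.91, 8.18),
    ("China", 1367485388, 0.45, 12.49, 7.53),
    ("Bangladesh", 168957745, 1.6, 21.14, 5.61),
])

# population_growth is a percentage, so this approximates people added per year.
rows = conn.execute(
    "SELECT name, ROUND(population * population_growth / 100) AS people_added "
    "FROM facts ORDER BY people_added DESC"
).fetchall()
for name, added in rows:
    print(name, added)
```

On these rows China comes first even though its growth rate is the lowest of the four, which is why a rate-based ranking and a head-count ranking can disagree.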


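```python
# Find countries with a higher death rate than birth rate
%sql SELECT * FROM facts WHERE death_rate > birth_rate AND name <> 'World';
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | au | Austria | 83871 | 82445 | 1426 | 8665550 | 0.55 | 9.41 | 9.42 | 5.56 |
| 16 | bo | Belarus | 207600 | 202900 | 4700 | 9589689 | 0.2 | 10.7 | 13.36 | 0.7 |
| 22 | bk | Bosnia and Herzegovina | 51197 | 51187 | 10 | 3867055 | 0.13 | 8.87 | 9.75 | 0.38 |
| 26 | bu | Bulgaria | 110879 | 108489 | 2390 | 7186893 | 0.58 | 8.92 | 14.44 | 0.29 |
| 44 | hr | Croatia | 56594 | 55974 | 620 | 4464844 | 0.13 | 9.45 | 12.18 | 1.39 |
| 47 | ez | Czech Republic | 78867 | 77247 | 1620 | 10644842 | 0.16 | 9.63 | 10.34 | 2.33 |
| 57 | en | Estonia | 45228 | 42388 | 2840 | 1265420 | 0.55 | 10.51 | 12.4 | 3.6 |
| 65 | gm | Germany | 357022 | 348672 | 8350 | 80854408 | 0.17 | 8.47 | 11.42 | 1.24 |
| 67 | gr | Greece | 131957 | 130647 | 1310 | 10775643 | 0.01 | 8.66 | 11.09 | 2.32 |
| 75 | hu | Hungary | 93028 | 89608 | 3420 | 9897541 | 0.22 | 9.16 | 12.73 | 1.33 |

Only the first 10 rows of the result are shown here. Italy and Japan, for example, also have a higher death rate than birth rate (see their rows in the densely populated list above) but come after Hungary in the table.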
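Both rates are per 1,000 people, so dividing `birth_rate - death_rate` by 10 expresses the natural increase as a percentage, which is directly comparable with `population_growth`. A sketch with Python's `sqlite3`, assuming a three-row in-memory table copied from the output above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL, death_rate REAL)")
# Illustrative rows from the output above.
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("Austria", 9.41, 9.42), ("Belarus", 10.7, 13.36), ("Germany", 8.47, 11.42),
])

# Per-1,000 rates divided by 10 give a percentage of the population.
rows = conn.execute(
    "SELECT name, ROUND((birth_rate - death_rate) / 10, 3) AS natural_increase_pct "
    "FROM facts WHERE death_rate > birth_rate "
    "ORDER BY natural_increase_pct"
).fetchall()
print(rows)
```

Comparing this with the table is instructive: Belarus shows a `population_growth` of 0.2 even though births minus deaths is -2.66 per 1,000 and its listed migration rate is only 0.7 per 1,000, which together suggest a decline of roughly 0.2%. The matching magnitude hints that the growth column may have lost its minus signs, so it may be worth checking before relying on its sign.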


```python
# Find the country with the highest population/area ratio
# (CAST to REAL avoids SQLite's integer division on the integer columns)
%sql SELECT * FROM facts WHERE CAST(population AS REAL) / area = (SELECT MAX(CAST(population AS REAL) / area) FROM facts WHERE name <> 'World');
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 205 | mc | Macau | 28 | 28 | 0 | 592731 | 0.8 | 8.88 | 4.22 | 3.37 |
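To compare this with the densely populated list, it can help to rank several rows by density instead of returning only the maximum. A sketch of that alternative using `ORDER BY ... LIMIT` with Python's `sqlite3`, assuming a four-row in-memory table copied from earlier outputs; the cast to `REAL` keeps fractional densities, and rows with a NULL `area` (such as Antarctica) are filtered out:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
# Illustrative rows from earlier outputs; Antarctica's area is NULL.
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("Macau", 28, 592731), ("Bangladesh", 148460, 168957745),
    ("Korea, South", 99720, 49115196), ("Antarctica", None, 0),
])

# Floating-point density, highest first; NULL areas are excluded up front.
rows = conn.execute(
    "SELECT name, CAST(population AS REAL) / area AS density "
    "FROM facts WHERE area IS NOT NULL "
    "ORDER BY density DESC LIMIT 3"
).fetchall()
for name, density in rows:
    print(f"{name}: {density:.1f} people per square kilometer")
```

On these rows Macau leads by a wide margin, while Bangladesh and South Korea, both on the densely populated list, follow.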
