# Analyzing CIA Factbook Data Using SQL

In this project, we'll work with data from the [CIA World Factbook](https://www.cia.gov/the-world-factbook/), a compendium of statistics about all of the countries on Earth. The Factbook contains demographic and geographic information such as each country's population, annual population growth rate, and total land and water area.

In [3]:
# connect jupyter to database file

In [4]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

'Connected: None@factbook.db'

## Data overview

We'll start by getting some information on the tables in the database.

In [5]:
%%sql
SELECT *
    FROM sqlite_master
    WHERE type='table';

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
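
The same inspection can be done outside the `%%sql` magic. The block below is a minimal sketch using Python's built-in `sqlite3` module; it assumes an in-memory database that only reproduces the `facts` schema shown above, standing in for `factbook.db`.

```python
import sqlite3

# In-memory stand-in for factbook.db (assumption); the schema is copied from the output above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE facts (
        id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        code varchar(255) NOT NULL,
        name varchar(255) NOT NULL,
        area integer, area_land integer, area_water integer,
        population integer, population_growth float,
        birth_rate float, death_rate float, migration_rate float
    )
""")

# List user tables, skipping SQLite's internal tables such as sqlite_sequence.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
)]

# PRAGMA table_info yields one row per column:
# (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(facts)")]

print(tables)
for col_name, col_type in columns:
    print(col_name, col_type)
```

`PRAGMA table_info` is often easier to read than the raw `CREATE TABLE` string stored in `sqlite_master`.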


In [6]:
%%sql
SELECT *
    FROM facts
    LIMIT 5;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


## Data columns

The different columns are as follows:

- *id*: unique integer identifier of each row.
- *code*: two-letter code for the country.
- *name*: name of the country.
- *area*: total area of the country, water area included.
- *area_land*: land area of the country.
- *area_water*: water area of the country.
- *population*: number of inhabitants.
- *population_growth*: annual rate of population growth, as a percentage.
- *birth_rate*: number of births over a year per 1,000 people.
- *death_rate*: number of deaths over a year per 1,000 people.
- *migration_rate*: migration rate over a year per 1,000 people (the Factbook defines it as net migration, the difference between people entering and people leaving the country).


## Factbook statistics

In the following cells we will explore some relevant statistics from the CIA Factbook.

We will now write a query that returns the minimum and maximum population and the minimum and maximum population growth.

In [27]:
%%sql
SELECT 
    MIN(population) AS min_population,
    MAX(population) AS max_population,
    MIN(population_growth) AS min_population_growth,
    MAX(population_growth) AS max_population_growth
FROM facts

Done.


min_population,max_population,min_population_growth,max_population_growth
0,7256490011,0.0,4.02


The results are somewhat surprising: there appears to be a country whose population is 0 and another whose population equals that of the whole world. Let's find out which rows these are and whether the data contains errors.

In [12]:
%%sql
SELECT name, population
FROM facts
WHERE population = (
        SELECT MIN(population)
        FROM facts)
OR population = (
        SELECT MAX(population)
        FROM facts);

Done.


name,population
Antarctica,0
World,7256490011


The two "countries" with the minimum and maximum populations are not really countries: Antarctica is a continent with no indigenous or permanent inhabitants, and World is an aggregate row holding the population of the whole planet.

We will now run the same query as before but we will exclude Antarctica and the World. 

In [14]:
%%sql
SELECT
    MIN(population),
    MAX(population),
    MAX(population_growth),
    MIN(population_growth)
FROM facts
WHERE name != 'Antarctica'
  AND name != 'World';

Done.


MIN(population),MAX(population),MAX(population_growth),MIN(population_growth)
48,1367485388,4.02,0.0
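
The exclusion can also be written with a single `NOT IN` list, which is easier to extend if other non-country rows turn up. This is a minimal sketch, not the notebook's own query: it assumes a simplified two-column `facts` table filled with a few rows taken from the outputs above, and the `excluded` tuple is a hypothetical name.

```python
import sqlite3

# Simplified stand-in for the facts table (assumption): only name and population.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Antarctica", 0),
        ("World", 7256490011),
        ("China", 1367485388),
        ("Pitcairn Islands", 48),
        ("Afghanistan", 32564342),
    ],
)

# Hypothetical exclusion list; extend it if other aggregate rows appear.
excluded = ("Antarctica", "World")
placeholders = ", ".join("?" for _ in excluded)

# One filter covers both aggregates, so MIN and MAX see the same set of rows.
min_pop, max_pop = conn.execute(
    "SELECT MIN(population), MAX(population) FROM facts "
    f"WHERE name NOT IN ({placeholders})",
    excluded,
).fetchone()

print(min_pop, max_pop)
```

Passing the names as bound parameters rather than pasting them into the SQL string keeps the list in one place.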


The results are more believable now: the least populated country has a population of 48 and the most populated one has around 1.37 billion people. These countries are:

In [19]:
%%sql
SELECT name, population
FROM facts
WHERE (population = (
        SELECT MIN(population)
        FROM facts
        WHERE
        name != 'Antarctica')
OR population = (
        SELECT MAX(population)
        FROM facts
        WHERE
        name != 'World'));

Done.


name,population
China,1367485388
Pitcairn Islands,48


The most inhabited country is China and the least is the Pitcairn Islands, a British Overseas Territory.

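We will now calculate the average value for the population and area columns. Note that the query does not filter out the World and Antarctica rows, so both are included in the averages.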

In [24]:
%%sql
SELECT 
    ROUND(AVG(population),2) AS Average_Population, 
    ROUND(AVG(area), 2) AS Average_Area
FROM facts;

Done.


Average_Population,Average_Area
62094928.32,555093.55
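
To see how much an aggregate row such as World can move an average, the sketch below compares `AVG(population)` with and without it. It assumes a simplified `facts` table containing only the five sample rows shown earlier plus the World row, so the numbers it prints are illustrative and are not the averages for the full database.

```python
import sqlite3

# Simplified stand-in for the facts table (assumption): five countries plus World.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Afghanistan", 32564342),
        ("Albania", 3029278),
        ("Algeria", 39542166),
        ("Andorra", 85580),
        ("Angola", 19625353),
        ("World", 7256490011),
    ],
)

# Average over every row, aggregate included.
avg_all = conn.execute("SELECT AVG(population) FROM facts").fetchone()[0]

# Average over country rows only.
avg_countries = conn.execute(
    "SELECT AVG(population) FROM facts "
    "WHERE name NOT IN ('Antarctica', 'World')"
).fetchone()[0]

print(round(avg_all, 2), round(avg_countries, 2))
```

On this small sample the single World row dominates the unfiltered average, which is why filtering it out may be worth considering before comparing countries against the mean.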


### Densely populated countries

We'll now identify countries that have both an above-average population and a below-average area, and compute the population density (inhabitants per unit of area) of each.

In [26]:
%%sql
SELECT
    name, area, population,
    ROUND(CAST(population AS FLOAT) / area, 2) AS population_density
FROM facts
WHERE population > (
        SELECT AVG(population)
        FROM facts)
  AND area < (
        SELECT AVG(area)
        FROM facts)
ORDER BY population_density DESC;

Done.


name,area,population,population_density
Bangladesh,148460,168957745,1138.07
Philippines,300000,100998376,336.66
Japan,377915,126919659,335.84
Vietnam,331210,94348835,284.86
United Kingdom,243610,64088222,263.08
Germany,357022,80854408,226.47
Thailand,513120,67976405,132.48


There are 7 such countries, the most densely populated being Bangladesh. 
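
The `CAST(population AS FLOAT)` in the density query matters because SQLite performs integer division when both operands are integers. The sketch below illustrates the difference on the Bangladesh row from the output above; the one-row `facts` table is a simplified stand-in.

```python
import sqlite3

# One-row stand-in for the facts table (assumption), using the Bangladesh figures above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.execute("INSERT INTO facts VALUES ('Bangladesh', 148460, 168957745)")

# INTEGER / INTEGER truncates the fractional part in SQLite.
int_density = conn.execute(
    "SELECT population / area FROM facts"
).fetchone()[0]

# Casting one operand to FLOAT switches to floating-point division.
float_density = conn.execute(
    "SELECT ROUND(CAST(population AS FLOAT) / area, 2) FROM facts"
).fetchone()[0]

print(int_density, float_density)
```

Without the cast, every density would be silently truncated to a whole number, and countries with fewer inhabitants than units of area would show a density of 0.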

### Water to land

Which countries have the highest ratios of water to land? Which countries have more water than land?

In [30]:
%%sql
SELECT name, area_land, area_water
FROM facts
WHERE area_land < area_water;

Done.


name,area_land,area_water
British Indian Ocean Territory,60,54340
Virgin Islands,346,1564
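
The first question, about the highest water-to-land ratios, could be answered by computing the ratio explicitly and sorting on it. This is a minimal sketch under stated assumptions: a simplified `facts` table holding four rows taken from the outputs above, and a hypothetical column alias `water_to_land`.

```python
import sqlite3

# Simplified stand-in for the facts table (assumption): four rows from the outputs above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 0),
        ("Albania", 27398, 1350),
        ("British Indian Ocean Territory", 60, 54340),
        ("Virgin Islands", 346, 1564),
    ],
)

# Cast to FLOAT to avoid integer division, and skip rows with no land area
# so the ratio is always defined.
rows = conn.execute(
    """
    SELECT name,
           ROUND(CAST(area_water AS FLOAT) / area_land, 2) AS water_to_land
    FROM facts
    WHERE area_land > 0
    ORDER BY water_to_land DESC
    """
).fetchall()

for name, ratio in rows:
    print(name, ratio)
```

A ratio above 1 corresponds to the `area_land < area_water` condition used in the query above.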


### Population growth

Which countries have the fastest-growing populations?

In [28]:
%%sql
SELECT name, population_growth
FROM facts
ORDER BY population_growth DESC
LIMIT 10;

Done.


name,population_growth
South Sudan,4.02
Malawi,3.32
Burundi,3.28
Niger,3.25
Uganda,3.24
Qatar,3.07
Burkina Faso,3.03
Mali,2.98
Cook Islands,2.95
Iraq,2.93


Most of the countries with the highest population growth are in sub-Saharan Africa, with South Sudan growing the fastest.

### Death and birth rates

Which countries have a higher death rate than birth rate? The query below lists ten of them, ordered by population growth.

In [29]:
%%sql
SELECT name, death_rate, birth_rate
FROM facts
WHERE death_rate > birth_rate
ORDER BY population_growth DESC
LIMIT 10;

Done.


name,death_rate,birth_rate
Saint Pierre and Miquelon,9.72,7.42
Latvia,14.31,10.0
Lithuania,14.27,10.1
Moldova,12.59,12.0
Ukraine,14.46,10.72
Bulgaria,14.44,8.92
Austria,9.42,9.41
Estonia,12.4,10.51
Serbia,13.66,9.08
Romania,11.9,9.14
