# Analyzing CIA Factbook Data Using SQL

## Introduction
In this project, we'll work with data from the [CIA World Factbook](https://www.cia.gov/library/publications/the-world-factbook/), a compendium of statistics about every country on Earth. It includes demographic information such as the population as of 2015 (__population__), the annual population growth rate (__population\_growth__), and the total land and water area (__area__).

Let's start by loading the SQL extension for Jupyter, connecting to the database, and viewing the first few rows.

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

'Connected: None@factbook.db'

In [2]:
%%sql
SELECT * FROM facts LIMIT 5;

Done.


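| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
| 2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
| 3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
| 4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
| 5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |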
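The same preview can be reproduced outside the `%%sql` magic with Python's built-in `sqlite3` module. The sketch below is only an illustration: it assumes an in-memory database instead of `factbook.db`, and it loads just the five rows shown above, reduced to a few columns.

```python
import sqlite3

# In-memory stand-in for factbook.db (an assumption, so the sketch runs anywhere).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (id INTEGER PRIMARY KEY, code TEXT, name TEXT, "
    "area INTEGER, population INTEGER)"
)

# The five rows shown in the preview above, reduced to a few columns.
sample_rows = [
    (1, "af", "Afghanistan", 652230, 32564342),
    (2, "al", "Albania", 28748, 3029278),
    (3, "ag", "Algeria", 2381741, 39542166),
    (4, "an", "Andorra", 468, 85580),
    (5, "ao", "Angola", 1246700, 19625353),
]
conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?, ?)", sample_rows)

# Same query as the %%sql cell, run through a plain cursor.
cursor = conn.execute("SELECT * FROM facts LIMIT 5;")
columns = [col[0] for col in cursor.description]  # column names from the cursor
preview = cursor.fetchall()

print(columns)
for row in preview:
    print(row)

conn.close()
```

Pointing `sqlite3.connect` at the real `factbook.db` file and dropping the `CREATE TABLE` and `INSERT` steps should give the full set of columns.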


## Columns
Here are the descriptions for some of the columns:

- __name__ - The name of the country.
- __area__ - The country's total area (both land and water) in square kilometers.
- __area_land__ - The country's land area in square kilometers.
- __area_water__ - The country's water area in square kilometers.
- __population__ - The country's population.
- __population_growth__ - The country's annual population growth as a percentage.
- __birth_rate__ - The country's birth rate, or the number of births a year per 1,000 people.
- __death_rate__ - The country's death rate, or the number of deaths a year per 1,000 people.


## Overview of the data
Let's calculate some summary statistics and look for any outlier countries.

In [3]:
%%sql
SELECT MIN(population) AS "Minimum population",
    MAX(population) AS "Maximum population",
    MIN(population_growth) AS "Minimum population growth",
    MAX(population_growth) AS "Maximum population growth"
FROM facts;

Done.


| Minimum population | Maximum population | Minimum population growth | Maximum population growth |
| --- | --- | --- | --- |
| 0 | 7256490011 | 0.0 | 4.02 |


In [4]:
%%sql
SELECT name, population FROM facts
WHERE population = (SELECT MIN(population) FROM facts);

Done.


| name | population |
| --- | --- |
| Antarctica | 0 |


In [5]:
%%sql
SELECT name, population FROM facts
WHERE population = (SELECT MAX(population) FROM facts);

Done.


| name | population |
| --- | --- |
| World | 7256490011 |
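Neither of these rows is an ordinary country: Antarctica is listed with a population of 0, and World appears to be an aggregate entry for the whole planet. One way to look for extremes among actual countries is to filter both rows out by name. The sketch below assumes a small in-memory table, `facts_sample` (a hypothetical name), filled only with rows that appear in this notebook's outputs, so its results are the extremes of that sample rather than of the full dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts_sample (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts_sample VALUES (?, ?)",
    [
        ("Antarctica", 0),
        ("World", 7256490011),
        ("Andorra", 85580),
        ("Albania", 3029278),
        ("Bangladesh", 168957745),
    ],
)

# Same subquery pattern as above, with the two special rows excluded
# both in the outer query and inside the subquery.
query = """
SELECT name, population
FROM facts_sample
WHERE name NOT IN ('World', 'Antarctica')
  AND population = (
      SELECT {agg}(population)
      FROM facts_sample
      WHERE name NOT IN ('World', 'Antarctica')
  )
"""
smallest = conn.execute(query.format(agg="MIN")).fetchall()
largest = conn.execute(query.format(agg="MAX")).fetchall()

print("Smallest in sample:", smallest)
print("Largest in sample:", largest)

conn.close()
```

The filter has to be repeated inside the subquery; otherwise the subquery would still return World's population and the outer query would match nothing.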


## Calculating average values
We can calculate average values for two relevant columns: __population__ and __area__.

In [6]:
%%sql
SELECT ROUND(AVG(population), 2) AS "Average population",
    ROUND(AVG(area), 2) AS "Average area"
FROM facts;

Done.


| Average population | Average area |
| --- | --- |
| 62094928.32 | 555093.55 |
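One caveat about these averages: the query runs over every row in `facts`, which includes the World row found earlier, so a single aggregate entry pulls the population average upward. The sketch below illustrates the size of that effect on a small in-memory sample. It assumes a hypothetical table `facts_sample` holding the five preview rows plus World; World's area is not shown anywhere in this notebook, so it is stored as `NULL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts_sample (name TEXT, population INTEGER, area INTEGER)"
)
conn.executemany(
    "INSERT INTO facts_sample VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 652230),
        ("Albania", 3029278, 28748),
        ("Algeria", 39542166, 2381741),
        ("Andorra", 85580, 468),
        ("Angola", 19625353, 1246700),
        # Area for World is not shown in this notebook, so it is left NULL.
        ("World", 7256490011, None),
    ],
)

# Average over every row, including the aggregate World row.
avg_all = conn.execute(
    "SELECT AVG(population) FROM facts_sample"
).fetchone()[0]

# Average over countries only.
avg_countries = conn.execute(
    "SELECT AVG(population) FROM facts_sample WHERE name <> 'World'"
).fetchone()[0]

# AVG skips NULLs, so World's missing area does not enter this average.
avg_area = conn.execute("SELECT AVG(area) FROM facts_sample").fetchone()[0]

print(f"Average population, all rows:       {avg_all:,.2f}")
print(f"Average population, countries only: {avg_countries:,.2f}")
print(f"Average area (NULLs ignored):       {avg_area:,.2f}")

conn.close()
```

Adding `WHERE name <> 'World'` to the query on the real table should give an average that better reflects a typical country.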


## Analyzing densely populated countries
Let's find _overcrowded_ countries, meaning those that meet __both__ of these conditions:
- Population above average
- Area below average

In [7]:
%%sql
SELECT * FROM facts
WHERE population > (SELECT AVG(population) FROM facts)
    AND area < (SELECT AVG(area) FROM facts)
ORDER BY area;

Done.


| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 14 | bg | Bangladesh | 148460 | 130170 | 18290 | 168957745 | 1.6 | 21.14 | 5.61 | 0.46 |
| 185 | uk | United Kingdom | 243610 | 241930 | 1680 | 64088222 | 0.54 | 12.17 | 9.35 | 2.54 |
| 138 | rp | Philippines | 300000 | 298170 | 1830 | 100998376 | 1.61 | 24.27 | 6.11 | 2.09 |
| 192 | vm | Vietnam | 331210 | 310070 | 21140 | 94348835 | 0.97 | 15.96 | 5.93 | 0.3 |
| 65 | gm | Germany | 357022 | 348672 | 8350 | 80854408 | 0.17 | 8.47 | 11.42 | 1.24 |
| 85 | ja | Japan | 377915 | 364485 | 13430 | 126919659 | 0.16 | 7.93 | 9.51 | 0.0 |
| 173 | th | Thailand | 513120 | 510890 | 2230 | 67976405 | 0.34 | 11.19 | 7.8 | 0.0 |

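The two thresholds above select countries but do not rank them by how crowded they are. A possible follow-up, not part of the original analysis, is to compute population density (people per square kilometer of total area) for the seven countries returned. The sketch below assumes an in-memory table with the hypothetical name `crowded`, loaded with the `name`, `area`, and `population` values from the result above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crowded (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO crowded VALUES (?, ?, ?)",
    [
        ("Bangladesh", 148460, 168957745),
        ("United Kingdom", 243610, 64088222),
        ("Philippines", 300000, 100998376),
        ("Vietnam", 331210, 94348835),
        ("Germany", 357022, 80854408),
        ("Japan", 377915, 126919659),
        ("Thailand", 513120, 67976405),
    ],
)

# CAST to REAL so the division is not truncated to an integer.
density_rows = conn.execute(
    """
    SELECT name,
           ROUND(CAST(population AS REAL) / area, 1) AS density
    FROM crowded
    ORDER BY density DESC
    """
).fetchall()

for name, density in density_rows:
    print(f"{name}: {density} people per sq. km")

conn.close()
```

On the full `facts` table the same expression would also need to guard against rows whose `area` is `NULL` or 0.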