# Analyzing CIA World Factbook data using SQL

## Introduction

In this project, we'll explore and analyze data from the [CIA World Factbook](https://www.cia.gov/library/publications/the-world-factbook/). This database provides information on the history, people and society, government, economy, energy, geography, communications, transportation, military, and transnational issues for 267 world entities.

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

In [2]:
%%sql
SELECT *
  FROM sqlite_master
 WHERE type='table';

 * sqlite:///factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"



The database has two tables: `sqlite_sequence` and `facts`. In this project, we will only use the latter.
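
The `sqlite_sequence` table is SQLite's internal bookkeeping for `AUTOINCREMENT` columns, which is why it shows up next to `facts`. As a minimal sketch, the same listing can be reproduced with Python's standard `sqlite3` module on a throwaway in-memory database. The two-column `facts` table below is a simplified stand-in, not the real schema.

```python
import sqlite3

# In-memory database with a simplified stand-in for the facts table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "
    "name TEXT NOT NULL)"
)

# Same query as in the cell above, restricted to the table names.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(tables)
conn.close()
```

Declaring the `AUTOINCREMENT` key is enough for SQLite to create `sqlite_sequence` on its own, so it can safely be ignored in the analysis.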

In [4]:
%%sql
SELECT *
  FROM facts
 LIMIT 5;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


Description of each column:

* `name` - The name of the country;
* `area` - The country's total area (both land and water) in square kilometers;
* `area_land` - The country's land area in square kilometers;
* `area_water` - The country's water area in square kilometers;
* `population` - The country's population;
* `population_growth` - The country's population growth as a percentage;
* `birth_rate` - The country's birth rate (number of births a year per 1,000 persons);
* `death_rate` - The country's death rate (number of deaths a year per 1,000 persons);
* `migration_rate` - The country's net migration rate (difference between the number of persons entering and leaving the country during the year, per 1,000 persons).
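
Going by these descriptions, `area` should equal `area_land + area_water`. The sketch below checks that on the five sample rows shown earlier, loaded into an in-memory table with Python's `sqlite3` module. It is only an illustration on those five rows; the full `facts` table may well contain missing or inconsistent values.

```python
import sqlite3

# The five sample rows shown above: (name, area, area_land, area_water).
rows = [
    ("Afghanistan", 652230, 652230, 0),
    ("Albania", 28748, 27398, 1350),
    ("Algeria", 2381741, 2381741, 0),
    ("Andorra", 468, 468, 0),
    ("Angola", 1246700, 1246700, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, area INTEGER, area_land INTEGER, area_water INTEGER)"
)
conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?)", rows)

# Rows where the total area is not the sum of its land and water parts.
mismatches = conn.execute(
    "SELECT name FROM facts WHERE area <> area_land + area_water"
).fetchall()
print(mismatches)
conn.close()
```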

## Data validation

Let’s calculate the following statistics:
* Minimum population;
* Maximum population;
* Minimum population growth;
* Maximum population growth.

In [5]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
  FROM facts;

 * sqlite:///factbook.db
Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,7256490011,0.0,4.02


The population extremes look suspicious: a minimum of zero and a maximum of over 7.2 billion. Let’s find out which rows they belong to.

In [7]:
%%sql
SELECT name, MIN(population)
  FROM facts;

 * sqlite:///factbook.db
Done.


name,MIN(population)
Antarctica,0


In [9]:
%%sql
SELECT name, MAX(population)
  FROM facts;

 * sqlite:///factbook.db
Done.


name,MAX(population)
World,7256490011


Both extremes now make sense: Antarctica has no inhabitants, and the 7.2 billion figure is the total world population. Below is an extract from the CIA Factbook [page for Antarctica](https://www.cia.gov/library/publications/the-world-factbook/geos/ay.html):

<img src = "https://s3.amazonaws.com/dq-content/257/fb_antarctica.png">
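
The two queries above rely on SQLite-specific behavior: when a query contains a single `MIN()` or `MAX()` aggregate, a bare column such as `name` is taken from the row that holds the extreme value. Other database engines typically reject this form or return an arbitrary `name`. A more portable sketch uses a subquery. It is shown here with Python's `sqlite3` module on a small in-memory table holding only rows quoted in this notebook; the helper name `rows_at_extreme` is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Afghanistan", 32564342),
        ("Andorra", 85580),
        ("Antarctica", 0),
        ("World", 7256490011),
    ],
)


def rows_at_extreme(agg):
    """Return every row whose population equals MIN or MAX of the column."""
    # Only these two fixed keywords are ever interpolated into the SQL text.
    if agg not in ("MIN", "MAX"):
        raise ValueError("agg must be 'MIN' or 'MAX'")
    query = f"""
        SELECT name, population
          FROM facts
         WHERE population = (SELECT {agg}(population) FROM facts)
    """
    return conn.execute(query).fetchall()


smallest = rows_at_extreme("MIN")
largest = rows_at_extreme("MAX")
print(smallest, largest)
```

Unlike the bare-column form, the subquery returns every matching row when several entities share the extreme value.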

In [10]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
  FROM facts
 WHERE name <> 'World';

 * sqlite:///factbook.db
Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,1367485388,0.0,4.02


If we exclude the row for the whole world, the maximum population is now around 1.4 billion, which is a plausible value.

## Average population and area

In [11]:
%%sql
SELECT AVG(population), AVG(area)
  FROM facts
 WHERE name <> 'World';

 * sqlite:///factbook.db
Done.


AVG(population),AVG(area)
32242666.56846473,555093.546184739
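
One caveat when reading these averages: `AVG()` skips `NULL` values, so the two results may be computed over different numbers of rows if some entities lack a population or an area. The sketch below illustrates this on a made-up three-row table; the names and numbers are hypothetical and not taken from `factbook.db`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("A", 100, 10),
        ("B", 300, None),  # area unknown
        ("C", None, 30),   # population unknown
    ],
)

# COUNT(column) counts only non-NULL values, COUNT(*) counts all rows.
avg_pop, avg_area, n_pop, n_area, n_rows = conn.execute(
    """
    SELECT AVG(population), AVG(area),
           COUNT(population), COUNT(area), COUNT(*)
      FROM facts
    """
).fetchone()
print(avg_pop, avg_area, n_pop, n_area, n_rows)
conn.close()
```

Comparing `COUNT(population)` and `COUNT(area)` with `COUNT(*)` on the real table would show how many rows each average actually covers.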


The average population is around 32 million, and the average area is around 555 thousand square kilometers. Let’s now rank the countries by population density (number of people per square kilometer) and look at the ten most densely populated ones.

In [26]:
%%sql
SELECT name, area, population, population / area AS density
  FROM facts
 WHERE name <> 'World'
 ORDER BY density DESC
 LIMIT 10;

 * sqlite:///factbook.db
Done.


name,area,population,density
Macau,28,592731,21168
Monaco,2,30535,15267
Singapore,697,5674472,8141
Hong Kong,1108,7141106,6445
Gaza Strip,360,1869055,5191
Gibraltar,6,29258,4876
Bahrain,760,1346613,1771
Maldives,298,393253,1319
Malta,316,413965,1310
Bermuda,54,70196,1299
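
Because `population` and `area` are both declared as integers, SQLite performs integer division and truncates each density: Macau's 592731 / 28 appears as 21168 rather than roughly 21168.96. One possible way to keep the fractional part is to cast to `REAL` before dividing and to skip rows with a missing or zero area. The sketch below does this with Python's `sqlite3` module on three rows copied from the output above; the column alias `density_int` is introduced only for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Macau", 28, 592731),
        ("Monaco", 2, 30535),
        ("Singapore", 697, 5674472),
    ],
)

# density_int reproduces the truncated value; density keeps one decimal place.
densities = conn.execute(
    """
    SELECT name,
           population / area AS density_int,
           ROUND(CAST(population AS REAL) / area, 1) AS density
      FROM facts
     WHERE area > 0
     ORDER BY density DESC
    """
).fetchall()
for row in densities:
    print(row)
conn.close()
```

The ranking stays the same for these rows; only the precision of the reported density changes.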
