# Analyzing CIA Factbook Data Using SQL

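Let's load the `sql` extension (from the ipython-sql package) and connect to the SQLite database `factbook.db`. The `%%capture` magic suppresses the cell's output.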

```python
%%capture
%load_ext sql
%sql sqlite:///factbook.db
```

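
Outside Jupyter, the same database can be opened with Python's built-in `sqlite3` module. This is a minimal sketch, assuming `factbook.db` sits in the working directory; `open_factbook` is a hypothetical helper name, not part of the notebook. Opening the file through a `mode=ro` URI makes a missing file raise an error, whereas a plain `sqlite3.connect("factbook.db")` would silently create a new, empty database.

```python
import sqlite3


def open_factbook(path="factbook.db"):
    """Open the database read-only (hypothetical helper).

    A plain sqlite3.connect(path) creates an empty file when `path` does not
    exist; the `mode=ro` URI raises sqlite3.OperationalError instead.
    """
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)
```

Usage would look like `conn = open_factbook()` followed by `conn.execute("SELECT * FROM facts LIMIT 5").fetchall()`.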

## Overview of the Data 

Let's look at the first five rows of the `facts` table to see what data we will be working with.

```sql
%%sql
SELECT *
  FROM facts
 LIMIT 5;
```

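| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
| 2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
| 3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
| 4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
| 5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |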
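
To experiment without the real file, the rows above can be loaded into an in-memory SQLite database with Python's `sqlite3` module. This sketch is a stand-in, not the notebook's setup: it keeps only five of the rows and a subset of the columns shown above, and `SAMPLE_ROWS` is a hypothetical name. It also adds `ORDER BY id`, because `LIMIT` without `ORDER BY` does not guarantee which rows come back.

```python
import sqlite3

# Hypothetical in-memory stand-in for factbook.db: the five rows shown above,
# restricted to a subset of the columns.
SAMPLE_ROWS = [
    (1, "af", "Afghanistan", 652230, 32564342, 2.32),
    (2, "al", "Albania", 28748, 3029278, 0.3),
    (3, "ag", "Algeria", 2381741, 39542166, 1.84),
    (4, "an", "Andorra", 468, 85580, 0.12),
    (5, "ao", "Angola", 1246700, 19625353, 2.78),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (id INTEGER PRIMARY KEY, code TEXT, name TEXT,"
    " area INTEGER, population INTEGER, population_growth REAL)"
)
conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)

# Same idea as the notebook cell, with an explicit row order.
first_rows = conn.execute("SELECT * FROM facts ORDER BY id LIMIT 5").fetchall()
for row in first_rows:
    print(row)
```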


Let's start by calculating some summary statistics and looking for outlier countries.

The following query returns the minimum population, maximum population, minimum population growth, and maximum population growth.

```sql
%%sql
SELECT MIN(population) AS "Minimum population",
       MAX(population) AS "Maximum population",
       MIN(population_growth) AS "Minimum population growth",
       MAX(population_growth) AS "Maximum population growth"
  FROM facts;
```

| Minimum population | Maximum population | Minimum population growth | Maximum population growth |
| --- | --- | --- | --- |
| 0 | 7256490011 | 0.0 | 4.02 |
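
The same aggregate query can be run from Python's `sqlite3` module, reading the column aliases back from `cursor.description`. This is a sketch over a hypothetical in-memory table holding only the five sample rows shown earlier, so its minimums and maximums differ from the full-table results above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("Afghanistan", 32564342, 2.32), ("Albania", 3029278, 0.3),
     ("Algeria", 39542166, 1.84), ("Andorra", 85580, 0.12),
     ("Angola", 19625353, 2.78)],
)

cur = conn.execute(
    """SELECT MIN(population)        AS "Minimum population",
              MAX(population)        AS "Maximum population",
              MIN(population_growth) AS "Minimum population growth",
              MAX(population_growth) AS "Maximum population growth"
         FROM facts"""
)
# cursor.description holds one 7-tuple per result column; item 0 is its name.
headers = [col[0] for col in cur.description]
stats = dict(zip(headers, cur.fetchone()))
print(stats)
```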


## Exploring Outliers

Let's find the countries that correspond to the minimum and maximum populations.

```sql
%%sql
-- SQLite-specific: with a single MIN() aggregate, the bare column `name`
-- is taken from the row that holds the minimum value.
SELECT MIN(population) AS "Minimum population", name AS "Country"
  FROM facts;
```

| Minimum population | Country |
| --- | --- |
| 0 | Antarctica |
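
The query above relies on SQLite's special handling of bare columns next to `MIN()`/`MAX()`; databases such as PostgreSQL reject a non-aggregated column that is not in `GROUP BY`. A more portable pattern compares each row against a scalar subquery, which also returns every row tied for the minimum. This sketch uses a hypothetical in-memory table with a few name/population pairs taken from the outputs in this notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("Afghanistan", 32564342), ("Albania", 3029278), ("Antarctica", 0),
     ("World", 7256490011)],
)

# Portable pattern: keep rows whose population equals the scalar subquery's
# result. Ties are all returned rather than one arbitrary row.
smallest = conn.execute(
    """SELECT name, population
         FROM facts
        WHERE population = (SELECT MIN(population) FROM facts)"""
).fetchall()
print(smallest)
```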


```sql
%%sql
-- SQLite-specific: with a single MAX() aggregate, the bare column `name`
-- is taken from the row that holds the maximum value.
SELECT MAX(population) AS "Maximum population", name AS "Country"
  FROM facts;
```

| Maximum population | Country |
| --- | --- |
| 7256490011 | World |
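
Neither extreme is a country: World is an aggregate row for the whole planet, and Antarctica has no permanent population. A sketch of finding the most populous country while skipping such rows, using a hypothetical in-memory table with name/population pairs from this notebook. The `EXCLUDED` tuple is an assumption; the full table may contain other non-country rows, so the list should be extended after inspecting the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("Afghanistan", 32564342), ("Algeria", 39542166), ("Antarctica", 0),
     ("World", 7256490011)],
)

# Hypothetical list of non-country rows to ignore.
EXCLUDED = ("World", "Antarctica")
placeholders = ", ".join("?" for _ in EXCLUDED)  # one "?" per excluded name
largest = conn.execute(
    f"""SELECT name, population
          FROM facts
         WHERE name NOT IN ({placeholders})
         ORDER BY population DESC
         LIMIT 1""",
    EXCLUDED,
).fetchall()
print(largest)
```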


## Exploring Average Population and Area

Let's calculate the average population and the average area across all rows in the table.

```sql
%%sql
SELECT AVG(population) AS "Population Average",
       AVG(area) AS "Area Average"
  FROM facts;
```

| Population Average | Area Average |
| --- | --- |
| 62094928.32231405 | 555093.546184739 |
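
Two details shape these averages: `AVG()` skips `NULL` values, so rows with a missing value (if any) drop out of the denominator, and aggregate rows such as World are averaged in alongside real countries. This sketch illustrates both effects on a hypothetical toy table (the names and numbers are invented for illustration, not taken from `factbook.db`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    # Hypothetical toy rows: one NULL population and one aggregate "World" row.
    [("A", 10), ("B", 20), ("C", None), ("D", 30), ("World", 60)],
)

avg_all, avg_over_all_rows, avg_countries = conn.execute(
    """SELECT AVG(population),                   -- NULL skipped: 120 / 4
              SUM(population) * 1.0 / COUNT(*),  -- NULL row counted: 120 / 5
              AVG(CASE WHEN name <> 'World' THEN population END)  -- 60 / 3
         FROM facts"""
).fetchone()
print(avg_all, avg_over_all_rows, avg_countries)
```

The `* 1.0` forces floating-point division; without it, SQLite divides two integers with integer division.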


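Next, let's find the countries whose population is above the average but whose area is below the average, i.e. relatively densely populated countries.

```sql
%%sql
SELECT name AS "Country", population, area
  FROM facts
 WHERE population > (SELECT AVG(population) FROM facts)
   AND area < (SELECT AVG(area) FROM facts);
```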

| Country | population | area |
| --- | --- | --- |
| Bangladesh | 168957745 | 148460 |
| Germany | 80854408 | 357022 |
| Japan | 126919659 | 377915 |
| Philippines | 100998376 | 300000 |
| Thailand | 67976405 | 513120 |
| United Kingdom | 64088222 | 243610 |
| Vietnam | 94348835 | 331210 |
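
The same filter can be reproduced with Python's `sqlite3` module. This sketch runs it on a hypothetical in-memory table containing only the five sample rows from the start of the notebook, so the averages here (about 18.97 million people and 861,977.4 area units) and the result differ from the full-table output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("Afghanistan", 32564342, 652230), ("Albania", 3029278, 28748),
     ("Algeria", 39542166, 2381741), ("Andorra", 85580, 468),
     ("Angola", 19625353, 1246700)],
)

# Neither subquery depends on the outer row, so every row is compared
# against the same two averages.
dense = conn.execute(
    """SELECT name, population, area
         FROM facts
        WHERE population > (SELECT AVG(population) FROM facts)
          AND area < (SELECT AVG(area) FROM facts)"""
).fetchall()
print(dense)
```

Angola passes the population test but fails the area test, which leaves Afghanistan as the only match in this sample.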
