# Analyzing CIA Factbook Data Using SQL

In this project, we'll work with data from the [CIA World Factbook](https://www.cia.gov/library/publications/the-world-factbook), a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like the following:

- `population`: the population of each country or territory.
- `population_growth`: the annual population growth rate, as a percentage.
- `area`: the total land and water area, in square kilometers.

## Connecting to the Database

In [2]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

'Connected: None@factbook.db'

## Exploring the Tables

In [4]:
%%sql
SELECT *
  FROM sqlite_master
 WHERE type='table';

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
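
As a cross-check outside the `%%sql` magic, the same catalog query can be run with Python's built-in `sqlite3` module. The sketch below is only a stand-in: it builds a small in-memory table (a cut-down, hypothetical version of `facts`) instead of opening `factbook.db`. It also illustrates why `sqlite_sequence` shows up next to `facts`: SQLite creates that table automatically when a table uses `AUTOINCREMENT`.

```python
import sqlite3

# In-memory stand-in for factbook.db (assumption: the real file is not needed here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, population INTEGER)"
)

# sqlite_master holds one row per schema object; keep only the tables.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [row[1] for row in conn.execute("PRAGMA table_info(facts)")]

print(tables)
print(columns)
```

`PRAGMA table_info` is often easier to read than the raw `CREATE TABLE` text when only the column names are needed.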


## Exploring the Data

In [6]:
%%sql
SELECT *
  FROM facts
 LIMIT 5;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


## Summary Statistics

First we'll calculate some summary statistics using the data, specifically the minimum and maximum `population` and `population_growth`.

In [11]:
%%sql
SELECT MIN(population) AS population_min,
       MAX(population) AS population_max,
       MIN(population_growth) AS population_growth_min,
       MAX(population_growth) AS population_growth_max
  FROM facts;


Done.


population_min,population_max,population_growth_min,population_growth_max
0,7256490011,0.0,4.02


In the next step, let's determine which countries correspond to the minimum and maximum population.

In [61]:
%%sql
SELECT name AS country_min_population
  FROM facts
 WHERE population = (SELECT MIN(population)
                       FROM facts);   

Done.


country_min_population
Antarctica


In [62]:
%%sql
SELECT name AS country_max_population
  FROM facts
 WHERE population = (SELECT MAX(population)
                       FROM facts);  

Done.


country_max_population
World


Since 'World' isn't a country, we'll recalculate our summary statistics excluding the row for 'World'. We'll also calculate the average `population` and `area`; that second query runs over every row, so its averages still include 'World'.

In [17]:
%%sql
SELECT MIN(population) AS population_min,
       MAX(population) AS population_max,
       MIN(population_growth) AS population_growth_min,
       MAX(population_growth) AS population_growth_max
  FROM facts
 WHERE name != 'World';

Done.


population_min,population_max,population_growth_min,population_growth_max
0,1367485388,0.0,4.02
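
The effect of dropping the aggregate row can be reproduced on a toy table. This is a minimal sketch using Python's `sqlite3` module; the table contents and the names `Tinyland`, `Midland`, and `Bigland` are made up for illustration, and only the presence of a `World` row mirrors the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    # Hypothetical rows; 'World' stands in for the aggregate row.
    [("Tinyland", 0), ("Midland", 500), ("Bigland", 9000), ("World", 9500)],
)

SUMMARY = "SELECT MIN(population), MAX(population) FROM facts"

# All rows: the aggregate 'World' row supplies the maximum.
with_world = conn.execute(SUMMARY).fetchone()

# Countries only: filter the aggregate row out by name.
without_world = conn.execute(SUMMARY + " WHERE name != 'World'").fetchone()

print(with_world)
print(without_world)
```

Filtering by `name` targets the aggregate row directly, whereas filtering on a population value would remove whichever rows happen to hold that value.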


In [18]:
%%sql
SELECT AVG(population) AS avg_population,
       AVG(area) AS avg_area
  FROM facts;

Done.


avg_population,avg_area
62094928.32231405,555093.546184739


Next, we'll build on our average value queries to find countries that are densely populated by identifying countries with above-average values for `population` and below-average values for `area`.

In [39]:
%%sql
SELECT name AS densely_populated_countries
  FROM facts
 WHERE population > (SELECT AVG(population)
                       FROM facts
                      WHERE name != 'World')
   AND area < (SELECT AVG(area)
                 FROM facts
                WHERE name != 'World');

Done.


densely_populated_countries
Bangladesh
Germany
Iraq
Italy
Japan
"Korea, South"
Morocco
Philippines
Poland
Spain


To conclude, we'll explore a series of questions:
- Which country has the most people? **China**
- Which country has the highest growth rate? **South Sudan**

In [59]:
%%sql
SELECT name AS country, MAX(population) AS population_max
  FROM facts
 WHERE name != 'World';

Done.


country,population_max
China,1367485388


In [58]:
%%sql 
SELECT name AS country, MAX(population_growth) AS population_growth_max
  FROM facts;

Done.


country,population_growth_max
South Sudan,4.02
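
Selecting `name` next to `MAX(...)` without a `GROUP BY` relies on SQLite-specific behavior: when a query contains a single `MIN()` or `MAX()` aggregate, SQLite takes the bare columns from the row that holds the extreme value. Many other database engines reject such a query. The sketch below, using Python's `sqlite3` module and a made-up table, compares that form with a more portable `ORDER BY ... LIMIT 1` alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    # Hypothetical rows for illustration only.
    [("Tinyland", 0), ("Midland", 500), ("Bigland", 9000), ("World", 9500)],
)

# SQLite-specific: the bare column `name` comes from the row holding MAX(population).
bare_column = conn.execute(
    "SELECT name, MAX(population) FROM facts WHERE name != 'World'"
).fetchone()

# Portable alternative: sort and keep the first row.
portable = conn.execute(
    "SELECT name, population FROM facts WHERE name != 'World' "
    "ORDER BY population DESC LIMIT 1"
).fetchone()

print(bare_column)
print(portable)
```

Both forms return the same row here; the second one behaves the same way outside SQLite.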


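- Which countries have the highest ratios of water to land? **British Indian Ocean Territory, Virgin Islands**. Because `area_water` and `area_land` are both integer columns, `area_water/area_land` is truncated to a whole number before the cast, so every ratio below 1 becomes 0 and is filtered out. That is why fewer than five rows come back.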
- Which countries have more water than land? **British Indian Ocean Territory, Virgin Islands**

In [57]:
%%sql
  SELECT name AS country, CAST(area_water/area_land AS FLOAT) AS ratio_water_land
    FROM facts
   WHERE ratio_water_land != 0
ORDER BY ratio_water_land DESC
   LIMIT 5;

Done.


country,ratio_water_land
British Indian Ocean Territory,905.0
Virgin Islands,4.0
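
The whole-number ratios above come from integer division: SQLite divides two integers before `CAST` is applied. The following sketch, using Python's `sqlite3` module with invented rows (`Atoll`, `Lakeland`, and `Dryland` are hypothetical), contrasts that form with casting one operand first so the division itself is done in floating point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("Atoll", 10, 25), ("Lakeland", 100, 40), ("Dryland", 100, 0)],
)

# Cast applied AFTER integer division: 25/10 -> 2, 40/100 -> 0.
truncated = conn.execute(
    """
    SELECT name, CAST(area_water / area_land AS FLOAT) AS ratio
      FROM facts
     WHERE area_water / area_land != 0
     ORDER BY ratio DESC
    """
).fetchall()

# Cast applied BEFORE division: the fractional part survives.
exact = conn.execute(
    """
    SELECT name, CAST(area_water AS FLOAT) / area_land AS ratio
      FROM facts
     WHERE area_water > 0
     ORDER BY ratio DESC
    """
).fetchall()

print(truncated)
print(exact)
```

With the cast moved inside, `Lakeland` is no longer dropped, and `Atoll` keeps its fractional ratio.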


In [63]:
%%sql
SELECT name AS country
  FROM facts
 WHERE area_water > area_land;

Done.


country
British Indian Ocean Territory
Virgin Islands


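- Which countries will add the most people to their populations next year? **Top 5 ordered by `birth_rate + migration_rate - death_rate` (a rate per 1,000 people, not a head count): South Sudan, American Samoa, Syria, Micronesia, Tonga.** A head count would also depend on `population`. The sample rows shown earlier also suggest that `migration_rate` is stored without its sign: for Afghanistan, 38.57 - 13.89 - 1.51 = 23.17 per 1,000 matches the listed `population_growth` of 2.32, while adding 1.51 does not. Countries with heavy emigration may therefore be overstated in this ranking.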

In [56]:
%%sql
  SELECT name AS country, ROUND(birth_rate + migration_rate - death_rate, 2) AS growth_rate
    FROM facts
ORDER BY growth_rate DESC
   LIMIT 5;

Done.


country,growth_rate
South Sudan,40.2
American Samoa,39.27
Syria,37.96
"Micronesia, Federated States of",37.24
Tonga,35.99
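
Ranking by a rate and ranking by the number of people added can give different orders. The sketch below is a minimal illustration with Python's `sqlite3` module, assuming `population_growth` is an annual percentage as described in the column list; the two rows and their names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth FLOAT)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    # Hypothetical rows: a small fast-growing country and a large slow-growing one.
    [("Smallfast", 1000, 4.0), ("Bigslow", 1000000, 0.5)],
)

# Ranked by growth rate alone.
by_rate = [
    row[0]
    for row in conn.execute("SELECT name FROM facts ORDER BY population_growth DESC")
]

# Ranked by estimated people added in a year: population * percentage / 100.
by_people_added = conn.execute(
    """
    SELECT name, population * population_growth / 100.0 AS people_added
      FROM facts
     ORDER BY people_added DESC
    """
).fetchall()

print(by_rate)
print(by_people_added)
```

The small country leads on rate, while the large one adds far more people in absolute terms.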


- Which countries have a higher death rate than birth rate? **Top 5 ordered by the difference between `death_rate` and `birth_rate`: Bulgaria, Serbia, Latvia, Lithuania, Ukraine**

In [55]:
%%sql
  SELECT name AS country, death_rate, birth_rate, ROUND(death_rate - birth_rate, 2) AS death_birth_variance
    FROM facts
   WHERE death_rate > birth_rate
ORDER BY death_birth_variance DESC
   LIMIT 5;

Done.


country,death_rate,birth_rate,death_birth_variance
Bulgaria,14.44,8.92,5.52
Serbia,13.66,9.08,4.58
Latvia,14.31,10.0,4.31
Lithuania,14.27,10.1,4.17
Ukraine,14.46,10.72,3.74


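- Which countries have the highest `population/area` ratio? **Macau, Monaco, Singapore, Hong Kong, Gaza Strip** (values are truncated to whole numbers because both columns are integers)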

In [54]:
%%sql
  SELECT name AS country, population/area AS population_area_ratio
    FROM facts
ORDER BY population_area_ratio DESC
   LIMIT 5;

Done.


country,population_area_ratio
Macau,21168
Monaco,15267
Singapore,8141
Hong Kong,6445
Gaza Strip,5191
