# Analyzing CIA Factbook Data Using SQL

In this project we will be exploring and analyzing data from the [CIA World Factbook](https://www.cia.gov/library/publications/the-world-factbook/) database using SQLite.

This database contains statistics on all the countries of Earth, including each country's population as of 2015, annual population growth, land and water area, and more.

In [1]:
%%capture

%load_ext sql
%sql sqlite:///database/factbook.db

We begin by loading in our ipython-sql extension and opening our database file with SQLite.

In [2]:
%%sql

SELECT *
    FROM sqlite_master
    WHERE type='table'; 

 * sqlite:///database/factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


We can see that our database contains two tables, `sqlite_sequence` and `facts`. The `sql` column shows the schema of the `facts` table, which holds the data relevant to our analysis.
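The same inspection can be sketched outside the notebook with Python's built-in `sqlite3` module. This is a minimal sketch, not the notebook's method: it assumes an in-memory database as a stand-in for `factbook.db`, with the `facts` schema copied from the `sql` column above. It also shows where `sqlite_sequence` comes from: SQLite creates that bookkeeping table automatically for any table that uses `AUTOINCREMENT`.

```python
import sqlite3

# Assumption: an in-memory stand-in for factbook.db, so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")

# Schema copied from the `sql` column of sqlite_master shown above.
conn.execute(
    """
    CREATE TABLE "facts" (
        "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        "code" varchar(255) NOT NULL,
        "name" varchar(255) NOT NULL,
        "area" integer,
        "area_land" integer,
        "area_water" integer,
        "population" integer,
        "population_growth" float,
        "birth_rate" float,
        "death_rate" float,
        "migration_rate" float
    )
    """
)

# List the tables the same way the %%sql cell does.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(facts)")]

print(tables)
print(columns)
```

To run this against the real data, replace `":memory:"` with the path to the database file and drop the `CREATE TABLE` statement.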

## Overview

We'll begin by displaying the `facts` table with the first few rows. This will give us an idea of what we're working with.

In [3]:
%%sql

SELECT *
    FROM facts
    LIMIT 10;

 * sqlite:///database/factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46
6,ac,Antigua and Barbuda,442,442,0,92436,1.24,15.85,5.69,2.21
7,ar,Argentina,2780400,2736690,43710,43431886,0.93,16.64,7.33,0.0
8,am,Armenia,29743,28203,1540,3056382,0.15,13.61,9.34,5.8
9,as,Australia,7741220,7682300,58920,22751014,1.07,12.15,7.14,5.65
10,au,Austria,83871,82445,1426,8665550,0.55,9.41,9.42,5.56


You can visit The World Factbook: [Notes and Defs](https://www.cia.gov/library/publications/the-world-factbook/docs/notesanddefs.html) for more information on each column.

## Query: Outlier Countries
Let's create a simple query to display the minimum and maximum values for population and population growth in our table.

In [4]:
%%sql

SELECT MIN(population) smallest_population, 
       MAX(population) largest_population, 
       MIN(population_growth) smallest_growth,
       MAX(population_growth) largest_growth
    FROM facts;

 * sqlite:///database/factbook.db
Done.


smallest_population,largest_population,smallest_growth,largest_growth
0,7256490011,0.0,4.02


Using this information, we can infer that:
* There is a country with 0 population
* There is a country with over 7.25 billion population
* There is a country with 0% growth
* There is a country with 4.02% growth

Let's find out which rows hold these values. We'll start by displaying the rows with the smallest and largest populations.

In [5]:
%%sql

SELECT *
    FROM facts
    WHERE population == (SELECT MIN(population)
                            FROM facts)
    OR population == (SELECT MAX(population)
                         FROM facts);

 * sqlite:///database/factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
250,ay,Antarctica,,280000.0,,0,,,,
261,xx,World,,,,7256490011,1.08,18.6,7.8,


[Antarctica](https://www.cia.gov/library/publications/the-world-factbook/geos/ay.html) is the entry with 0 population, but we want countries with a living population. The World is also a row in our table, and it is not a country. Because of this, we will re-run our last query that grabbed the min and max values, this time excluding the World and Antarctica.

In [6]:
%%sql

SELECT MIN(population) smallest_population, 
       MAX(population) largest_population, 
       MIN(population_growth) smallest_growth,
       MAX(population_growth) largest_growth
    FROM facts
    WHERE name != 'World'
    AND name != 'Antarctica';

 * sqlite:///database/factbook.db
Done.


smallest_population,largest_population,smallest_growth,largest_growth
48,1367485388,0.0,4.02


With the World and Antarctica rows excluded, our new query tells us that the country with the largest population has about 1.37 billion people, and the country with the smallest population has 48 people. Let's run a query to find these countries.

In [7]:
%%sql

SELECT *
    FROM facts
    WHERE population == (SELECT MAX(population)
                            FROM facts
                            WHERE name != 'World')
    OR population == (SELECT MIN(population)
                         FROM facts
                         WHERE name != 'Antarctica');

 * sqlite:///database/factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
37,ch,China,9596960,9326410,270550,1367485388,0.45,12.49,7.53,0.44
238,pc,Pitcairn Islands,47,47,0,48,0.0,,,


We can see that [China](https://www.cia.gov/library/publications/the-world-factbook/geos/ch.html) has the largest population, and the [Pitcairn Islands](https://www.cia.gov/library/publications/the-world-factbook/geos/pc.html) have the smallest.
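The two exclusions above can also be applied once and reused for both extremes. The following is a minimal sketch under stated assumptions: it uses Python's built-in `sqlite3` module, an in-memory table in place of `factbook.db`, and only the four rows that appear in the results above. The common table expression name `countries` is hypothetical.

```python
import sqlite3

# Assumption: a tiny in-memory table holding only the four rows shown above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Antarctica", 0),
        ("World", 7256490011),
        ("China", 1367485388),
        ("Pitcairn Islands", 48),
    ],
)

# Filter out the non-country rows once, then pick both extremes
# from the filtered set.
query = """
WITH countries AS (
    SELECT name, population
      FROM facts
     WHERE name NOT IN ('World', 'Antarctica')
)
SELECT name, population
  FROM countries
 WHERE population = (SELECT MIN(population) FROM countries)
    OR population = (SELECT MAX(population) FROM countries)
 ORDER BY population DESC
"""
extremes = conn.execute(query).fetchall()
print(extremes)
```

Keeping the exclusion list in one place means the minimum and maximum are always computed over the same set of rows.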

## Query: Dense Populations

Which countries have the densest populations? With the simple equation $ Population \ Density = \frac{Number \ of \ People}{Land \ Area} $, we can find the number of people per square kilometer.

In [8]:
%%sql

SELECT name,
       population / area_land AS "pop:sq_km"
    FROM facts
    WHERE name != 'World'
    ORDER BY "pop:sq_km" DESC
    LIMIT 10;

 * sqlite:///database/factbook.db
Done.


name,pop:sq_km
Macau,21168
Monaco,15267
Singapore,8259
Hong Kong,6655
Gaza Strip,5191
Gibraltar,4876
Bahrain,1771
Maldives,1319
Malta,1310
Bermuda,1299


These 10 countries are the most densely populated, with [Macau](https://www.cia.gov/library/publications/the-world-factbook/geos/mc.html) the densest at 21,168 people per square kilometer.
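The density query divides one integer column by another, so SQLite truncates each result to a whole number. A minimal sketch of the difference, assuming Python's built-in `sqlite3` module and an in-memory table in place of `factbook.db`, with three rows copied from the overview table; the column aliases `whole_density` and `exact_density` are hypothetical:

```python
import sqlite3

# Assumption: in-memory table with three rows copied from the overview output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Albania", 27398, 3029278),
        ("Andorra", 468, 85580),
        ("Austria", 82445, 8665550),
    ],
)

# INTEGER / INTEGER truncates in SQLite; casting one operand to REAL
# keeps the fractional part.
query = """
SELECT name,
       population / area_land AS whole_density,
       ROUND(CAST(population AS REAL) / area_land, 2) AS exact_density
  FROM facts
 ORDER BY exact_density DESC
"""
densities = conn.execute(query).fetchall()
for name, whole, exact in densities:
    print(f"{name}: {whole} vs {exact} people per sq km")
```

Truncation does not change the ranking in the top 10 shown above, but it can matter when two densities fall within one person per square kilometer of each other.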

## Query: Water and Land

Which countries have the greatest water-to-land ratios, and which countries have more water than land?

We'll begin by answering the first question with a query that finds the top 10 countries that contain the greatest water to land ratio.

In [9]:
%%sql

SELECT name,
       CAST(area_water AS Float) / area_land AS water_to_land
    FROM facts
    WHERE name != 'World'
    ORDER BY water_to_land DESC
    LIMIT 10;

 * sqlite:///database/factbook.db
Done.


name,water_to_land
British Indian Ocean Territory,905.6666666666666
Virgin Islands,4.520231213872832
Puerto Rico,0.5547914317925592
"Bahamas, The",0.3866133866133866
Guinea-Bissau,0.2846728307254623
Malawi,0.2593962585034013
Netherlands,0.2257103236656536
Uganda,0.2229223744292237
Eritrea,0.1643564356435643
Liberia,0.1562396179401993


Among the countries that still have more land than water, [Puerto Rico](https://www.cia.gov/library/publications/the-world-factbook/geos/rq.html) has the highest water-to-land ratio.

While we could answer our second question from the table above, we'll create a new query that displays only the countries with more water than land.

In [10]:
%%sql

SELECT *
    FROM facts
    WHERE area_water > area_land;

 * sqlite:///database/factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
228,io,British Indian Ocean Territory,54400,60,54340,,,,,
247,vq,Virgin Islands,1910,346,1564,103574.0,0.59,10.31,8.54,7.67


The [British Indian Ocean Territory](https://www.cia.gov/library/publications/the-world-factbook/geos/io.html) and [Virgin Islands](https://www.cia.gov/library/publications/the-world-factbook/geos/vq.html) are the only two countries with more water than land.
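Both water and land questions can be answered in a single query by computing the ratio and filtering on it together. This is a minimal sketch, assuming Python's built-in `sqlite3` module and an in-memory table in place of `factbook.db`, with area values copied from the results above:

```python
import sqlite3

# Assumption: in-memory table with area values copied from the outputs above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("British Indian Ocean Territory", 60, 54340),
        ("Virgin Islands", 346, 1564),
        ("Albania", 27398, 1350),
        ("Afghanistan", 652230, 0),
    ],
)

# Cast to REAL so the ratio is not truncated, and keep only rows
# where water area exceeds land area.
query = """
SELECT name,
       ROUND(CAST(area_water AS REAL) / area_land, 2) AS water_to_land
  FROM facts
 WHERE area_water > area_land
 ORDER BY water_to_land DESC
"""
more_water = conn.execute(query).fetchall()
for name, ratio in more_water:
    print(f"{name}: {ratio}")
```

A ratio above 1 and the condition `area_water > area_land` describe the same rows, so either form of the filter would work here.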

## Query: Greatest Population Growth

Which countries will gain the most population next year? Assuming each country's current growth rate holds for the year, this can be answered with the equation $ Growth = \frac{(Population \ \times \ Growth \ Rate)}{100} $

In [11]:
%%sql

SELECT name, 
       CAST((population * population_growth)/100 AS Int) AS pop_gain
    FROM facts
    WHERE name != 'World'
    ORDER BY pop_gain DESC
    LIMIT 10;

 * sqlite:///database/factbook.db
Done.


name,pop_gain
India,15270686
China,6153684
Nigeria,4448270
Pakistan,2906653
Ethiopia,2874562
Bangladesh,2703323
United States,2506677
Indonesia,2355141
"Congo, Democratic Republic of the",1944690
Philippines,1626073


[India](https://www.cia.gov/library/publications/the-world-factbook/geos/in.html) will gain the most population, with about 15.3 million people.
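As a worked check of the growth equation, the sketch below recomputes the gain for three countries whose population and growth rate appear in earlier outputs. It assumes Python's built-in `sqlite3` module and an in-memory table in place of `factbook.db`. For China, $ \frac{1367485388 \times 0.45}{100} \approx 6153684.2 $, which truncates to the 6,153,684 shown in the results above.

```python
import sqlite3

# Assumption: in-memory table with rows copied from earlier outputs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 2.32),
        ("Algeria", 39542166, 1.84),
        ("China", 1367485388, 0.45),
    ],
)

# population_growth is a percentage, so divide by 100;
# CAST(... AS INTEGER) truncates the fractional person.
query = """
SELECT name,
       CAST(population * population_growth / 100 AS INTEGER) AS pop_gain
  FROM facts
 ORDER BY pop_gain DESC
"""
gains = conn.execute(query).fetchall()
for name, gain in gains:
    print(f"{name}: {gain:,}")
```

A large population with a small growth rate can still outgain a small population with a large rate, which is why China ranks above Afghanistan here.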

## Query: More Deaths to Births

Which countries have more people dying than being born? This can be answered with a simple query.

In [12]:
%%sql

SELECT name, birth_rate, death_rate, 
       ROUND(death_rate - birth_rate, 2) AS more_deaths
    FROM facts
    WHERE death_rate > birth_rate
    ORDER BY more_deaths DESC;

 * sqlite:///database/factbook.db
Done.


name,birth_rate,death_rate,more_deaths
Bulgaria,8.92,14.44,5.52
Serbia,9.08,13.66,4.58
Latvia,10.0,14.31,4.31
Lithuania,10.1,14.27,4.17
Ukraine,10.72,14.46,3.74
Hungary,9.16,12.73,3.57
Germany,8.47,11.42,2.95
Slovenia,8.42,11.37,2.95
Romania,9.14,11.9,2.76
Croatia,9.45,12.18,2.73


There are 24 countries with a higher death rate than birth rate (the output above lists the 10 with the largest gaps). [Bulgaria](https://www.cia.gov/library/publications/the-world-factbook/geos/bu.html) has the largest gap, with a death rate 5.52 points above its birth rate.
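One detail worth noting about this filter is how it treats missing values: in SQL, a comparison involving `NULL` is never true, so rows without birth or death rates drop out without any explicit check. A minimal sketch, assuming Python's built-in `sqlite3` module and an in-memory table in place of `factbook.db`, with rates copied from the outputs above (the Pitcairn Islands row has no rates in the source data):

```python
import sqlite3

# Assumption: in-memory table with rates copied from earlier outputs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL, death_rate REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 38.57, 13.89),
        ("Austria", 9.41, 9.42),
        ("Bulgaria", 8.92, 14.44),
        ("Pitcairn Islands", None, None),  # rates are missing in the source
    ],
)

# NULL > NULL evaluates to NULL, which WHERE treats as not true,
# so the Pitcairn Islands row is filtered out automatically.
query = """
SELECT name,
       ROUND(death_rate - birth_rate, 2) AS more_deaths
  FROM facts
 WHERE death_rate > birth_rate
 ORDER BY more_deaths DESC
"""
shrinking = conn.execute(query).fetchall()
for name, gap in shrinking:
    print(f"{name}: {gap}")
```

Austria qualifies by the narrowest of margins, with a death rate only 0.01 above its birth rate.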

## Conclusion

We were able to answer several questions using queries during our analysis of the CIA Factbook.

In conclusion, we found:
* The Pitcairn Islands have the smallest population, with 48 people
* China has the largest population, with 1,367,485,388 people
* Macau has the greatest population density, with 21,168 people per square kilometer
* Puerto Rico has the greatest water-to-land ratio among countries that still have more land than water
* The British Indian Ocean Territory and the Virgin Islands have more water than land
* India will gain the most population, with 15,270,686 new people
* 24 countries have a death rate higher than their birth rate
* Bulgaria has the largest gap between its death rate and its birth rate