# CIA World Factbook

In this project, we'll work with data from the CIA World Factbook, a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like the following:

- population: the population of each country.
- population_growth: the annual population growth rate, as a percentage.
- area: the total land and water area.

## Summary and Statistics
- Entries such as World and the oceans are included in the dataset and may contain 0 or no value (None) in their columns.
- Anomalies in the dataset can be verified by visiting https://www.cia.gov/the-world-factbook/ (see Ethiopia).
- China with 1,367,485,388 and India with 1,251,695,584 are the countries with the most people.
- South Sudan has the highest growth rate at 4.02%.
- Only two entries have more water than land: the British Indian Ocean Territory and the Virgin Islands.
- There are also entries for the world's oceans.
- Ethiopia has no land area in its entry, while the website shows over 1 million sq km of land. This seems to be an error.
    - https://www.cia.gov/the-world-factbook/static/a4fed22ec4e788b2ddb29a1acfbdc1a0/ET-summary.pdf
- India will add the most people to its population, about 15 million in the next year.
- Bulgaria has the highest death-to-birth ratio.
- Macau and Monaco have the highest population-to-area ratios.
- Bangladesh, the densest of the countries with above-average population and below-average area, does not even appear in the overall top 10.

Let's start by loading in our SQL extension and reading in the dataset.

In [148]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

'Connected: None@factbook.db'

In [149]:
%%sql
SELECT *
FROM sqlite_master
WHERE type = 'table'

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


In [150]:
%%sql
/* Return the first five rows of the facts table */

SELECT *
FROM facts
LIMIT 5

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


#### Here are the descriptions for some of the columns:

- name: the name of the country.
- area: the country's total area (both land and water) in square kilometers.
- area_land: the country's land area in square kilometers.
- area_water: the country's water area in square kilometers.
- population: the country's population.
- population_growth: the country's population growth as a percentage, calculated from a surplus (or deficit) of births over deaths and the balance of migrants entering and leaving a country.
- birth_rate: the country's birth rate, or the number of births per year per 1,000 people.
- death_rate: the country's death rate, or the number of deaths per year per 1,000 people.
- migration_rate: the difference between the number of persons entering and leaving a country during the year per 1,000 persons.

## Checking the Data

In [151]:
%%sql
/* Return the minimum and maximum of population and population growth */

SELECT min(population), max(population), min(population_growth), max(population_growth)
FROM facts

Done.


min(population),max(population),min(population_growth),max(population_growth)
0,7256490011,0.0,4.02


In [152]:
%%sql
/* Returns the country names with the least and most population */

SELECT name, min(population) as population
FROM facts

UNION ALL

SELECT name, max(population) as population
FROM facts

Done.


name,population
Antarctica,0
World,7256490011


### Observations:
- There's a country with a population of 0
- There's a country with a population of 7256490011 (or more than 7.2 billion people)

It seems like the table contains a row for the whole world, which explains the population of over 7.2 billion. It also seems like the table contains a row for Antarctica, which explains the population of 0. This seems to match the CIA Factbook page for Antarctica at the time.
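
The query above selects `name` next to `min()` and `max()`. SQLite accepts this because a bare column in such a query takes its value from the row that holds the minimum or maximum; other databases may reject it. A more portable approach, sketched here with Python's standard-library `sqlite3` module, compares against subqueries instead. The in-memory table and the `SAMPLE_ROWS` and `population_extremes` names are illustrative stand-ins (a few rows copied from the outputs above), not the real `factbook.db`.

```python
import sqlite3

# Stand-in rows copied from the outputs above; this is NOT the real factbook.db.
SAMPLE_ROWS = [
    ("Antarctica", 0),
    ("Andorra", 85580),
    ("Albania", 3029278),
    ("World", 7256490011),
]


def population_extremes(rows):
    """Return (name, population) for the least and most populous rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
        conn.executemany("INSERT INTO facts VALUES (?, ?)", rows)
        # Compare each row against the aggregate values instead of
        # relying on SQLite's bare-column behaviour.
        query = """
            SELECT name, population
            FROM facts
            WHERE population = (SELECT MIN(population) FROM facts)
               OR population = (SELECT MAX(population) FROM facts)
            ORDER BY population
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(population_extremes(SAMPLE_ROWS))
# [('Antarctica', 0), ('World', 7256490011)]
```

Unlike the `UNION ALL` version, this form returns every row that ties for the minimum or maximum.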

Now that we know this, we should recalculate the summary statistics we calculated earlier — this time excluding the row for the whole world.

In [153]:
%%sql

/* Calculate Population Statistics excluding World*/
SELECT
       min(population) as min_pop, 
       max(population) as max_pop,
       min(population_growth) as min_pop_growth, 
       max(population_growth) as max_pop_growth
FROM facts
WHERE name <> 'World'

Done.


min_pop,max_pop,min_pop_growth,max_pop_growth
0,1367485388,0.0,4.02


In [154]:
%%sql

/* Calculate Average Population and Area*/
SELECT avg(population) as average_population,
       avg(area) as average_area
FROM facts

Done.


average_population,average_area
62094928.32231405,555093.546184739
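
The average query above has no `WHERE` clause, so the World row still counts toward the population average. The sketch below uses a tiny in-memory table (three rows copied from earlier outputs, not the real dataset) to show how much a single aggregate row can pull the mean upward. The `average_populations` helper is a hypothetical name introduced only for this illustration.

```python
import sqlite3

# Stand-in rows copied from earlier outputs; NOT the full facts table.
SAMPLE_ROWS = [
    ("Andorra", 85580),
    ("Albania", 3029278),
    ("World", 7256490011),
]


def average_populations(rows):
    """Return (average over all rows, average excluding the 'World' row)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
        conn.executemany("INSERT INTO facts VALUES (?, ?)", rows)
        avg_all = conn.execute(
            "SELECT AVG(population) FROM facts"
        ).fetchone()[0]
        avg_countries = conn.execute(
            "SELECT AVG(population) FROM facts WHERE name <> 'World'"
        ).fetchone()[0]
        return avg_all, avg_countries
    finally:
        conn.close()


avg_all, avg_countries = average_populations(SAMPLE_ROWS)
print(f"including World: {avg_all:,.0f}")
print(f"excluding World: {avg_countries:,.0f}")
```

On the real table the effect is smaller because there are many more rows, but the same `WHERE name <> 'World'` filter would apply.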


To finish, we'll build on the previous query to find countries that are densely populated. We'll identify countries that have the following:

- Above-average values for population.
- Below-average values for area.

In [155]:
%%sql
/* Show the most densely populated countries with above average values for population and below average values for area */

SELECT name,
       population as abv_avg_pop,
       area as blw_avg_area,
       population / area AS pop_area_ratio
FROM facts
WHERE (abv_avg_pop > (SELECT avg(population) 
                         FROM facts))
AND   (blw_avg_area < (SELECT avg(area) 
                         FROM facts))
ORDER BY 4 DESC

Done.


name,abv_avg_pop,blw_avg_area,pop_area_ratio
Bangladesh,168957745,148460,1138
Philippines,100998376,300000,336
Japan,126919659,377915,335
Vietnam,94348835,331210,284
United Kingdom,64088222,243610,263
Germany,80854408,357022,226
Thailand,67976405,513120,132


### Observations
- Among these countries, Bangladesh is by far the most densely populated.
- The Philippines and Japan have similar population densities.
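
The `pop_area_ratio` values are whole numbers because `population` and `area` are both integer columns, and SQLite truncates when dividing one integer by another. As a minimal sketch, assuming a small in-memory copy of three rows from the output above, the example below shows the truncated ratio next to one computed after casting to `REAL`. The `density_table` helper and the `int_ratio` and `real_ratio` aliases are made up for this illustration.

```python
import sqlite3

# Three rows copied from the density output above (name, population, area).
SAMPLE_ROWS = [
    ("Bangladesh", 168957745, 148460),
    ("Philippines", 100998376, 300000),
    ("Japan", 126919659, 377915),
]


def density_table(rows):
    """Return (name, integer ratio, fractional ratio) sorted by density."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)"
        )
        conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", rows)
        query = """
            SELECT name,
                   population / area AS int_ratio,  -- integer division
                   ROUND(CAST(population AS REAL) / area, 2) AS real_ratio
            FROM facts
            ORDER BY real_ratio DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


for name, int_ratio, real_ratio in density_table(SAMPLE_ROWS):
    print(name, int_ratio, real_ratio)
```

The truncated values match the notebook output (1138, 336, 335); the cast only matters when the fractional part is needed, for example to break near-ties.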


## More questions about the dataset:

Which country has the most people?

In [156]:
%%sql
/* Calculates the top 5 countries with most people, except for world */

SELECT f1.name as Country,
       f1.population as Population
FROM facts f1
WHERE name <> 'World'
ORDER BY 2 DESC
LIMIT 5

Done.


Country,Population
China,1367485388
India,1251695584
European Union,513949445
United States,321368864
Indonesia,255993674


Which country has the highest growth rate?

In [157]:
%%sql
/* Show top 5 countries with the highest population growth rate */

SELECT name as Country, population_growth
FROM facts
ORDER BY 2 DESC
LIMIT 5

Done.


Country,population_growth
South Sudan,4.02
Malawi,3.32
Burundi,3.28
Niger,3.25
Uganda,3.24


Which countries have the highest ratios of water to land? Which countries have more water than land?

In [158]:
%%sql
/* Show countries with the highest ratio of water to land */

SELECT name as Country,
       area_land,
       area_water,
       ROUND(area_water / CAST(area_land AS float), 3) AS water_land_ratio
FROM facts
ORDER BY 4 DESC

Done.


Country,area_land,area_water,water_land_ratio
British Indian Ocean Territory,60.0,54340.0,905.667
Virgin Islands,346.0,1564.0,4.52
Puerto Rico,8870.0,4921.0,0.555
"Bahamas, The",10010.0,3870.0,0.387
Guinea-Bissau,28120.0,8005.0,0.285
Malawi,94080.0,24404.0,0.259
Netherlands,33893.0,7650.0,0.226
Uganda,197100.0,43938.0,0.223
Eritrea,101000.0,16600.0,0.164
Liberia,96320.0,15049.0,0.156


### Observations:
- China with 1,367,485,388 and India with 1,251,695,584 are the countries with the most people.
- South Sudan has the highest growth rate at 4.02%.
- Only two entries have more water than land: the British Indian Ocean Territory and the Virgin Islands.
- There are also entries for the world's oceans.
- Ethiopia has no land area in its entry, while the website shows over 1 million sq km of land. This seems to be an error.
- See https://www.cia.gov/the-world-factbook/static/a4fed22ec4e788b2ddb29a1acfbdc1a0/ET-summary.pdf

In [159]:
%%sql
/* Show Ethiopia's incorrect data */

SELECT name,
       area_land,
       area_water
FROM facts
WHERE name = 'Ethiopia'

Done.


name,area_land,area_water
Ethiopia,,104300
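
A missing `area_land` value also explains why Ethiopia never appears near the top of the water-to-land ranking: any arithmetic on NULL yields NULL, and SQLite sorts NULLs last in descending order. The sketch below reproduces this on a three-row in-memory table built from values shown above. The `water_land_ratios` and `missing_land` helpers are hypothetical names used only here.

```python
import sqlite3

# Rows copied from the outputs above (name, area_land, area_water).
# None stands for the missing (NULL) land area in Ethiopia's entry.
SAMPLE_ROWS = [
    ("Ethiopia", None, 104300),
    ("Virgin Islands", 346, 1564),
    ("Puerto Rico", 8870, 4921),
]


def _load(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)"
    )
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", rows)
    return conn


def water_land_ratios(rows):
    """Return (name, ratio) ordered by ratio, highest first."""
    conn = _load(rows)
    try:
        query = """
            SELECT name,
                   ROUND(area_water / CAST(area_land AS REAL), 3) AS ratio
            FROM facts
            ORDER BY ratio DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def missing_land(rows):
    """Return the names of rows whose land area is NULL."""
    conn = _load(rows)
    try:
        query = "SELECT name FROM facts WHERE area_land IS NULL"
        return [name for (name,) in conn.execute(query)]
    finally:
        conn.close()


print(water_land_ratios(SAMPLE_ROWS))
print(missing_land(SAMPLE_ROWS))
```

Running a check like `missing_land` against the full table would be one way to list every entry with the same gap before trusting a ranking.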


## Continued: More questions about the dataset
Which countries will add the most people to their populations next year?

Since the population_growth column already takes into account the birth, death, and migration columns, we can just multiply the current population by the growth percentage.

In [160]:
%%sql
/* Top 5 countries with the most population growth by number */

SELECT name AS Country,
       population,
       population_growth AS "pop_gro_%",
       ROUND(population * (population_growth/100), 3) AS "pop_gro_#"
FROM facts
WHERE name <> 'World'
ORDER BY 4 DESC
LIMIT 5

Done.


Country,population,pop_gro_%,pop_gro_#
India,1251695584,1.22,15270686.125
China,1367485388,0.45,6153684.246
Nigeria,181562056,2.45,4448270.372
Pakistan,199085847,1.46,2906653.366
Ethiopia,99465819,2.89,2874562.169
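
The `pop_gro_#` figures are plain arithmetic: population times the growth percentage divided by 100. As a quick check, the sketch below recomputes the India and China rows from the output above. The `projected_increase` function is a hypothetical helper, and the result is only a one-year estimate that assumes the listed growth rate holds.

```python
def projected_increase(population, growth_pct):
    """Estimate people added in one year from a growth rate in percent."""
    return population * growth_pct / 100


# Values copied from the output above.
india = projected_increase(1251695584, 1.22)
china = projected_increase(1367485388, 0.45)

print(f"India: {india:,.0f}")
print(f"China: {china:,.0f}")
```

India's larger increase despite its smaller population comes from its growth rate being almost three times China's.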


Which countries have a higher death rate than birth rate?

In [161]:
%%sql
/* Top 5 Countries with highest death to birth ratio */

SELECT name Country,
       birth_rate,
       death_rate,
       ROUND(death_rate / birth_rate, 2) death_birth_ratio
FROM facts
WHERE death_rate > birth_rate
ORDER BY 4 DESC
LIMIT 5

Done.


Country,birth_rate,death_rate,death_birth_ratio
Bulgaria,8.92,14.44,1.62
Serbia,9.08,13.66,1.5
Latvia,10.0,14.31,1.43
Lithuania,10.1,14.27,1.41
Hungary,9.16,12.73,1.39


Which countries have the highest population/area ratio, and how does that compare to the list we found earlier?

In [162]:
%%sql
/* Top 10 most densely populated countries, as in the earlier query but without the above/below-average constraints */

SELECT name,
       population,
       area,
       population / area AS pop_area_ratio
FROM facts
ORDER BY 4 DESC
LIMIT 10

Done.


name,population,area,pop_area_ratio
Macau,592731,28,21168
Monaco,30535,2,15267
Singapore,5674472,697,8141
Hong Kong,7141106,1108,6445
Gaza Strip,1869055,360,5191
Gibraltar,29258,6,4876
Bahrain,1346613,760,1771
Maldives,393253,298,1319
Malta,413965,316,1310
Bermuda,70196,54,1299


### Observations
- India will add the most people to its population, about 15 million in the next year.
- Bulgaria has the highest death-to-birth ratio.
- Macau and Monaco have the highest population-to-area ratios.
- Bangladesh, which was the highest in the earlier constrained list, does not even show up in the top 10.
