# Analyzing CIA World Factbook with SQL

In this project, we'll use only SQL to work with data from the CIA World Factbook, a compendium of statistics about all of the countries on Earth. According to the [CIA website](https://www.cia.gov/library/publications/the-world-factbook/index.html), the World Factbook provides information on the history, people and society, government, economy, energy, geography, communications, transportation, military, and transnational issues for 267 world entities.

We'll work with a reduced version of this database that can be downloaded [here](https://dsserver-prod-resources-1.s3.amazonaws.com/257/factbook.db). The objective of the project is to write SQL queries that extract some interesting information from the CIA Factbook about the countries on Earth.


![](https://www.imagemhost.com.br/images/2020/04/06/world.png)

Image: <a href="https://www.freepik.com/free-photos-vectors/background">Background vector created by evening_tao - www.freepik.com</a>

First, let's set up the SQL environment and open the database.

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

Now let's take a look at the tables inside the database.

In [2]:
%%sql
SELECT * FROM sqlite_master WHERE type = 'table';

 * sqlite:///factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,facts,facts,2,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float, ""created_at"" datetime, ""updated_at"" datetime)"
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"


Apart from sqlite_sequence, which SQLite maintains internally for AUTOINCREMENT columns, this database has a single table called facts, and that's the one we'll use. Let's check its first 5 rows.

In [3]:
%%sql
SELECT * FROM facts LIMIT 5 

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate,created_at,updated_at
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51,2015-11-01 13:19:49.461734,2015-11-01 13:19:49.461734
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3,2015-11-01 13:19:54.431082,2015-11-01 13:19:54.431082
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92,2015-11-01 13:19:59.961286,2015-11-01 13:19:59.961286
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0,2015-11-01 13:20:03.659945,2015-11-01 13:20:03.659945
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46,2015-11-01 13:20:08.625072,2015-11-01 13:20:08.625072


Now, let's see the largest and smallest populations and the highest and lowest population growth rates.

In [4]:
%%sql
SELECT MAX(population) as max_pop, 
MIN(population) as min_pop, 
MAX(population_growth) as max_pop_growth, 
MIN(population_growth) as min_pop_growth
FROM facts;

 * sqlite:///factbook.db
Done.


max_pop,min_pop,max_pop_growth,min_pop_growth
1367485388,48,4.02,0.0


A maximum population of over seven billion people and a minimum of zero would not be right for a single country. In the original table, those values come from a 'World' row and an 'Antarctica' row. These rows would distort our results: if we calculated the average population, for example, the world's population of over 7 billion people and Antarctica's population of 0 would skew it. That is why we'll delete these rows. With them gone, China is the most populated country with over 1.3 billion people and the Pitcairn Islands is the least populated with only 48 people. The output above already shows these values, and the DELETE statements below report 0 rows affected, which suggests the rows had already been removed when this notebook was last run.

In [5]:
%%sql
DELETE FROM facts 
WHERE id = 261

 * sqlite:///factbook.db
0 rows affected.


[]

In [6]:
%%sql
DELETE FROM facts 
WHERE id = 250

 * sqlite:///factbook.db
0 rows affected.


[]
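Deleting by hard-coded id works only if we already know which ids those rows have. As a minimal sketch, the same cleanup can be done by name after first looking at what will be removed. The snippet below uses Python's built-in sqlite3 module on a small in-memory stand-in for factbook.db; the row names 'World' and 'Antarctica', their ids, and the World population value are assumptions made for illustration, not values read from the real file.

```python
import sqlite3

# In-memory stand-in for factbook.db with only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT, population INTEGER)"
)
conn.executemany(
    "INSERT INTO facts (id, name, population) VALUES (?, ?, ?)",
    [
        (37, "China", 1367485388),
        (238, "Pitcairn Islands", 48),
        (250, "Antarctica", 0),          # assumed id and name; population 0 as in the text
        (261, "World", 7_200_000_000),   # assumed id and name; placeholder for "over 7 billion"
    ],
)

aggregate_names = ("World", "Antarctica")

# Inspect the rows first, so nothing unexpected gets deleted.
to_delete = conn.execute(
    "SELECT id, name FROM facts WHERE name IN (?, ?) ORDER BY id",
    aggregate_names,
).fetchall()
print(to_delete)

# Delete by name and check how many rows were actually removed.
cursor = conn.execute("DELETE FROM facts WHERE name IN (?, ?)", aggregate_names)
deleted_count = cursor.rowcount
print(deleted_count, "rows deleted")

# The extremes now describe real countries and territories.
remaining = conn.execute(
    "SELECT MAX(population), MIN(population) FROM facts"
).fetchone()
print(remaining)
```

A non-destructive alternative would be to leave the table untouched and add a `WHERE name NOT IN ('World', 'Antarctica')` filter to each query instead.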

Now we're looking for the most populated country.


In [7]:
%%sql
SELECT *, MAX(population) FROM facts;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate,created_at,updated_at,MAX(population)
37,ch,China,9596960,9326410,270550,1367485388,0.45,12.49,7.53,0.44,2015-11-01 13:22:53.813142,2015-11-01 13:22:53.813142,1367485388


And now for the least populated one.



In [8]:
%%sql 
SELECT *, MIN(population) FROM facts;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate,created_at,updated_at,MIN(population)
238,pc,Pitcairn Islands,47,47,0,48,0.0,,,,2015-11-01 13:38:08.047849,2015-11-01 13:38:08.047849,48
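The two queries above rely on SQLite-specific behavior: when a query uses MAX() or MIN() as its only aggregate, the other selected columns are taken from the row that holds the maximum or minimum. Many other database engines reject this form. A sketch of a more portable version uses a subquery, shown here with Python's sqlite3 module on a small in-memory table built from rows that appear in this notebook.

```python
import sqlite3

# Small in-memory sample using rows shown elsewhere in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Albania", 3029278),
        ("Andorra", 85580),
        ("China", 1367485388),
        ("Pitcairn Islands", 48),
    ],
)

# Compare each row against the aggregate computed in a subquery.
most_populated = conn.execute(
    "SELECT name, population FROM facts "
    "WHERE population = (SELECT MAX(population) FROM facts)"
).fetchall()

least_populated = conn.execute(
    "SELECT name, population FROM facts "
    "WHERE population = (SELECT MIN(population) FROM facts)"
).fetchall()

print(most_populated)
print(least_populated)
```

Unlike the bare-column form, this version returns every row that ties for the extreme value.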


Now let's see the average population and average area.

In [9]:
%%sql
SELECT AVG(population) as avg_pop, AVG(area) as avg_area FROM facts;

 * sqlite:///factbook.db
Done.


avg_pop,avg_area
32377011.0125,555093.546184739


We'll now write a query that shows the countries whose population is above average and whose area is below average. Such countries tend to have a high population density.

In [10]:
%%sql
SELECT * FROM facts
WHERE population > (SELECT AVG(population) FROM facts)
and area < (SELECT AVG(area) FROM facts) ;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate,created_at,updated_at
14,bg,Bangladesh,148460,130170,18290,168957745,1.6,21.14,5.61,0.46,2015-11-01 13:20:52.753843,2015-11-01 13:20:52.753843
65,gm,Germany,357022,348672,8350,80854408,0.17,8.47,11.42,1.24,2015-11-01 13:25:21.942190,2015-11-01 13:25:21.942190
80,iz,Iraq,438317,437367,950,37056169,2.93,31.45,3.77,1.62,2015-11-01 13:26:41.627918,2015-11-01 13:26:41.627918
83,it,Italy,301340,294140,7200,61855120,0.27,8.74,10.19,4.1,2015-11-01 13:26:58.014646,2015-11-01 13:26:58.014646
85,ja,Japan,377915,364485,13430,126919659,0.16,7.93,9.51,0.0,2015-11-01 13:27:08.040081,2015-11-01 13:27:08.040081
91,ks,"Korea, South",99720,96920,2800,49115196,0.14,8.19,6.75,0.0,2015-11-01 13:27:39.881765,2015-11-01 13:27:39.881765
120,mo,Morocco,446550,446300,250,33322699,1.0,18.2,4.81,3.36,2015-11-01 13:29:56.754568,2015-11-01 13:29:56.754568
138,rp,Philippines,300000,298170,1830,100998376,1.61,24.27,6.11,2.09,2015-11-01 13:31:23.643550,2015-11-01 13:31:23.643550
139,pl,Poland,312685,304255,8430,38562189,0.09,9.74,10.19,0.46,2015-11-01 13:31:29.189900,2015-11-01 13:31:29.189900
163,sp,Spain,505370,498980,6390,48146134,0.89,9.64,9.04,8.31,2015-11-01 13:33:21.563195,2015-11-01 13:33:21.563195
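Comparing against the two averages is only a rough proxy for density. As a sketch, density can also be computed directly as population divided by area and used for sorting. The example below uses Python's sqlite3 module with a few rows copied from the outputs in this notebook; the column alias density is a name introduced here for illustration.

```python
import sqlite3

# In-memory sample with rows taken from the outputs shown in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 32564342),
        ("Andorra", 468, 85580),
        ("Bangladesh", 148460, 168957745),
        ("Japan", 377915, 126919659),
    ],
)

# CAST avoids integer division; the area filter avoids dividing by zero.
density_rows = conn.execute(
    """
    SELECT name, CAST(population AS REAL) / area AS density
    FROM facts
    WHERE area > 0
    ORDER BY density DESC
    """
).fetchall()

names = [name for name, _ in density_rows]
for name, density in density_rows:
    print(f"{name}: {density:.1f} people per unit of area")
```

Note that a small country such as Andorra ranks above Afghanistan here, even though its population is far below average and it would never pass the above-average population filter.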


Turning to the countries' areas, the next query counts how many countries have the majority of their area covered with water.

In [11]:
%%sql
SELECT count(*) FROM facts
WHERE area_water > area_land ; 

 * sqlite:///factbook.db
Done.


count(*)
2


And only two of them do.

Next, we'll query the top ten countries with the highest share of their total area covered by water.

In [12]:
%%sql
SELECT name, cast(area_water as float) / cast(area as float)
as ratio FROM facts
ORDER BY ratio  DESC
LIMIT 10;

 * sqlite:///factbook.db
Done.


name,ratio
British Indian Ocean Territory,0.9988970588235294
Virgin Islands,0.818848167539267
Puerto Rico,0.3568269161047059
"Bahamas, The",0.2788184438040346
Guinea-Bissau,0.2215916955017301
Malawi,0.2059687383950575
Netherlands,0.1841465469513516
Uganda,0.1822866104099768
Eritrea,0.141156462585034
Liberia,0.135127369375679


We can see that only the British Indian Ocean Territory and the Virgin Islands have more water than land. That makes two, matching the count from the previous query.

The next two queries will show us the top ten countries with the highest and lowest birth rates.

In [13]:
%%sql
SELECT name, birth_rate FROM facts
ORDER BY birth_rate DESC
LIMIT 10;

 * sqlite:///factbook.db
Done.


name,birth_rate
Niger,45.45
Mali,44.99
Uganda,43.79
Zambia,42.13
Burkina Faso,42.03
Burundi,42.01
Malawi,41.56
Somalia,40.45
Angola,38.78
Mozambique,38.58


It's interesting to notice that all ten countries with the highest birth rates are located in Africa.

Some countries have no birth rate information. Those missing values are stored as NULL, which SQLite sorts first in ascending order, so to find the ten lowest birth rates we'll keep only the countries with a birth rate greater than 0.

In [14]:
%%sql
SELECT name, birth_rate FROM facts
WHERE birth_rate > 0
ORDER BY birth_rate 
LIMIT 10;

 * sqlite:///factbook.db
Done.


name,birth_rate
Monaco,6.65
Saint Pierre and Miquelon,7.42
Japan,7.93
Andorra,8.13
"Korea, South",8.19
Singapore,8.27
Slovenia,8.42
Germany,8.47
Taiwan,8.47
San Marino,8.63
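The `birth_rate > 0` filter works because a comparison with NULL is never true, so rows without a birth rate drop out. A sketch of a more explicit version tests for NULL directly, which would also keep a genuine rate of 0 if one existed. The example uses Python's sqlite3 module and a few rows from this notebook, including the Pitcairn Islands row, whose birth rate is empty in the output above.

```python
import sqlite3

# In-memory sample; None is stored as NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Andorra", 8.13),
        ("Japan", 7.93),
        ("Monaco", 6.65),
        ("Pitcairn Islands", None),  # no birth rate recorded
    ],
)

# Without a filter, SQLite places NULL first in ascending order.
unfiltered = conn.execute(
    "SELECT name, birth_rate FROM facts ORDER BY birth_rate"
).fetchall()

# Excluding NULL explicitly states the intent of the filter.
filtered = conn.execute(
    "SELECT name, birth_rate FROM facts "
    "WHERE birth_rate IS NOT NULL "
    "ORDER BY birth_rate"
).fetchall()

print(unfiltered[0])
print(filtered)
```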


These countries are mostly located in Europe and Asia, with the exception of the North American Saint Pierre and Miquelon.

Now let's see the countries with the highest death rates.

In [15]:
%%sql
SELECT name, death_rate FROM facts
ORDER BY death_rate DESC
LIMIT 10;

 * sqlite:///factbook.db
Done.


name,death_rate
Lesotho,14.89
Ukraine,14.46
Bulgaria,14.44
Guinea-Bissau,14.33
Latvia,14.31
Chad,14.28
Lithuania,14.27
Namibia,13.91
Afghanistan,13.89
Central African Republic,13.8


This top ten contains only European, Asian, and African countries.

Now let's see the lowest death rates, again keeping only the values greater than 0.

In [16]:
%%sql
SELECT name, death_rate FROM facts
WHERE death_rate > 0
ORDER BY death_rate
LIMIT 10;

 * sqlite:///factbook.db
Done.


name,death_rate
Qatar,1.53
United Arab Emirates,1.97
Kuwait,2.18
Bahrain,2.69
Gaza Strip,3.04
Turks and Caicos Islands,3.1
Saudi Arabia,3.33
Oman,3.36
Singapore,3.43
West Bank,3.5


Notice that, with the exception of the North American Turks and Caicos Islands, all the countries with the lowest death rates are located in Asia.

Finally, we'll write a query that calculates the difference between the birth rate and the death rate of each country. We'll keep only the countries where this difference is positive and sort the result in descending order.

In [17]:
%%sql
SELECT name, cast(birth_rate as float) - cast(death_rate as float) as birth_death_dif FROM facts
WHERE birth_death_dif > 0
ORDER BY birth_death_dif DESC

 * sqlite:///factbook.db
Done.


name,birth_death_dif
Malawi,33.150000000000006
Uganda,33.1
Niger,33.03
Burundi,32.739999999999995
Mali,32.1
Burkina Faso,30.31
Zambia,29.46
Ethiopia,29.080000000000005
South Sudan,28.73
Tanzania,28.39
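The query above filters on the alias birth_death_dif inside WHERE, which SQLite accepts but standard SQL does not. As a sketch, the calculation can be moved into a common table expression so the filter applies to a real column, with ROUND() added to hide floating-point noise such as 33.150000000000006. The example uses Python's sqlite3 module with birth and death rates copied from rows shown earlier in this notebook; the CTE name rates is introduced here for illustration.

```python
import sqlite3

# In-memory sample with birth and death rates from rows shown in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL, death_rate REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 38.57, 13.89),
        ("Albania", 12.92, 6.58),
        ("Angola", 38.78, 11.49),
        ("Germany", 8.47, 11.42),
        ("Japan", 7.93, 9.51),
    ],
)

# Compute the difference once in a CTE, then filter and sort on it.
rows = conn.execute(
    """
    WITH rates AS (
        SELECT name, ROUND(birth_rate - death_rate, 2) AS birth_death_dif
        FROM facts
        WHERE birth_rate IS NOT NULL AND death_rate IS NOT NULL
    )
    SELECT name, birth_death_dif
    FROM rates
    WHERE birth_death_dif > 0
    ORDER BY birth_death_dif DESC
    """
).fetchall()

names = [name for name, _ in rows]
for name, diff in rows:
    print(name, diff)
```

Germany and Japan drop out of this sample because their death rates exceed their birth rates.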
