# Introduction

In this project, I'll use SQL to analyse data from the [CIA World Factbook](https://www.cia.gov/the-world-factbook/), a compendium of statistics about all of the countries on Earth. It contains demographic information such as:

* population - the population of the country

* population_growth - the annual population growth rate, as a percentage

* area - the total land and water area, in square kilometers

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

'Connected: None@factbook.db'

In [2]:
%%sql
SELECT *
FROM sqlite_master
WHERE type = 'table';

Done.


| type | name | tbl_name | rootpage | sql |
|---|---|---|---|---|
| table | sqlite_sequence | sqlite_sequence | 3 | `CREATE TABLE sqlite_sequence(name,seq)` |
| table | facts | facts | 47 | `CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float)` |


In [3]:
%%sql
SELECT *
FROM facts
LIMIT 5;

Done.


| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
| 2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
| 3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
| 4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
| 5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |


**Calculating Summary Statistics**

In [4]:
%%sql
SELECT
MIN(population),
MAX(population),
MIN(population_growth),
MAX(population_growth)
FROM facts;

Done.


| MIN(population) | MAX(population) | MIN(population_growth) | MAX(population_growth) |
|---|---|---|---|
| 0 | 7256490011 | 0.0 | 4.02 |


Interestingly, there seems to be a country with a population of 0 and another with more than 7.2 billion people. Let's zoom in on these two rows to find out what's going on.
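As an aside, SQLite can return the matching name alongside `MIN` or `MAX` without a subquery: when a query contains a single `min()` or `max()` aggregate, SQLite fills "bare" columns such as `name` from the row holding that extreme. The sketch below is a minimal illustration, assuming an in-memory database holding a few `(name, population)` pairs copied from the outputs in this notebook rather than the real `factbook.db`; `population_extremes` is a hypothetical helper. This behaviour is SQLite-specific (stricter databases reject bare columns), and unlike the subquery approach used in the next cells it returns only one row if several rows tie.

```python
import sqlite3

# A few (name, population) pairs copied from the outputs in this notebook.
ROWS = [
    ("Afghanistan", 32564342),
    ("Andorra", 85580),
    ("Antarctica", 0),
    ("China", 1367485388),
    ("World", 7256490011),
]


def population_extremes(conn):
    """Return ((name, min_pop), (name, max_pop)).

    Hypothetical helper. Relies on SQLite's bare-column rule: with a single
    min() or max() aggregate, `name` comes from the row holding that value.
    If several rows tie, only one of them is returned.
    """
    smallest = conn.execute("SELECT name, MIN(population) FROM facts").fetchone()
    largest = conn.execute("SELECT name, MAX(population) FROM facts").fetchone()
    return smallest, largest


# In-memory stand-in for factbook.db, with only two columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany("INSERT INTO facts VALUES (?, ?)", ROWS)

print(population_extremes(conn))  # (('Antarctica', 0), ('World', 7256490011))
```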

In [7]:
%%sql
SELECT
*
FROM facts
WHERE population = (SELECT
                   MIN(population)
                   FROM facts);

Done.


| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 250 | ay | Antarctica |  | 280000 |  | 0 |  |  |  |  |


In [8]:
%%sql
SELECT
*
FROM facts
WHERE population = (SELECT
                   MAX(population)
                   FROM facts);

Done.


| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 261 | xx | World |  |  |  | 7256490011 | 1.08 | 18.6 | 7.8 |  |


The table contains a row for Antarctica, which explains the population of 0: Antarctica has no indigenous population living there year-round. It also contains a row of statistics for the whole world, which is an aggregate rather than a country, so let's exclude it.
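To see how much the aggregate row distorts the summary, here is a minimal sketch that runs the same `MIN`/`MAX` query with and without the `name != 'World'` filter. It assumes an in-memory SQLite table with a handful of `(name, population)` pairs copied from the outputs above; `population_range` is a hypothetical helper, not part of the notebook.

```python
import sqlite3

# A few (name, population) pairs copied from the outputs in this notebook.
ROWS = [
    ("Afghanistan", 32564342),
    ("Andorra", 85580),
    ("Antarctica", 0),
    ("China", 1367485388),
    ("World", 7256490011),
]


def population_range(conn, exclude_world):
    """Return (MIN(population), MAX(population)), optionally skipping 'World'."""
    sql = "SELECT MIN(population), MAX(population) FROM facts"
    if exclude_world:
        # The 'World' row aggregates every country, so it dominates MAX().
        sql += " WHERE name != 'World'"
    return conn.execute(sql).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany("INSERT INTO facts VALUES (?, ?)", ROWS)

print(population_range(conn, exclude_world=False))  # (0, 7256490011)
print(population_range(conn, exclude_world=True))   # (0, 1367485388)
```

Note that the minimum stays at 0 either way, because the filter only removes the World row and leaves Antarctica in place.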

In [10]:
%%sql
SELECT
MIN(population),
MAX(population),
MIN(population_growth),
MAX(population_growth)
FROM facts
WHERE name != 'World';


Done.


| MIN(population) | MAX(population) | MIN(population_growth) | MAX(population_growth) |
|---|---|---|---|
| 0 | 1367485388 | 0.0 | 4.02 |


With the World row excluded, the most populous country has close to 1.37 billion people. The minimum is still 0 because the Antarctica row remains in the results.

**Calculating Average Population and Area**

In [12]:
%%sql
SELECT
AVG(population),
AVG(area)
FROM facts
WHERE name != 'World';

Done.


| AVG(population) | AVG(area) |
|---|---|
| 32242666.56846473 | 555093.546184739 |


The average population is around 32 million, and the average area is around 555,000 square kilometers.
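One subtlety behind these averages: `AVG(col)` (like `COUNT(col)`) skips NULL values, so the two averages can be computed over different sets of rows. The sketch below illustrates this on three rows copied from the outputs above, assuming an in-memory SQLite table rather than the real `factbook.db`: Antarctica's population of 0 counts towards the population average, while its NULL area is left out of the area average entirely.

```python
import sqlite3

# (name, population, area) copied from the outputs above; Antarctica has no
# recorded total area, so its area is NULL (None in Python).
ROWS = [
    ("Afghanistan", 32564342, 652230),
    ("Antarctica", 0, None),
    ("China", 1367485388, 9596960),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", ROWS)

# COUNT(col) and AVG(col) ignore NULLs; COUNT(*) counts every row.
n_rows, n_pop, n_area, avg_pop, avg_area = conn.execute(
    "SELECT COUNT(*), COUNT(population), COUNT(area), "
    "AVG(population), AVG(area) FROM facts"
).fetchone()

print(n_rows, n_pop, n_area)  # 3 3 2
print(avg_pop)   # averaged over 3 rows, Antarctica's 0 included
print(avg_area)  # averaged over 2 rows, Antarctica's NULL skipped
```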

**Identifying Countries with Above Average Population and Below Average Area - Higher Population Densities**

In [15]:
%%sql
SELECT *
FROM facts
WHERE population > (SELECT AVG(population)
                    FROM facts 
                    WHERE name != 'World')
AND area < (SELECT AVG(area) 
            FROM facts 
            WHERE name != 'World');

Done.


| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 14 | bg | Bangladesh | 148460 | 130170 | 18290 | 168957745 | 1.6 | 21.14 | 5.61 | 0.46 |
| 65 | gm | Germany | 357022 | 348672 | 8350 | 80854408 | 0.17 | 8.47 | 11.42 | 1.24 |
| 80 | iz | Iraq | 438317 | 437367 | 950 | 37056169 | 2.93 | 31.45 | 3.77 | 1.62 |
| 83 | it | Italy | 301340 | 294140 | 7200 | 61855120 | 0.27 | 8.74 | 10.19 | 4.1 |
| 85 | ja | Japan | 377915 | 364485 | 13430 | 126919659 | 0.16 | 7.93 | 9.51 | 0.0 |
| 91 | ks | Korea, South | 99720 | 96920 | 2800 | 49115196 | 0.14 | 8.19 | 6.75 | 0.0 |
| 120 | mo | Morocco | 446550 | 446300 | 250 | 33322699 | 1.0 | 18.2 | 4.81 | 3.36 |
| 138 | rp | Philippines | 300000 | 298170 | 1830 | 100998376 | 1.61 | 24.27 | 6.11 | 2.09 |
| 139 | pl | Poland | 312685 | 304255 | 8430 | 38562189 | 0.09 | 9.74 | 10.19 | 0.46 |
| 163 | sp | Spain | 505370 | 498980 | 6390 | 48146134 | 0.89 | 9.64 | 9.04 | 8.31 |


There are 10 countries that have both an above-average population and a below-average area, which suggests they are relatively densely populated. Note that this filter misses small but very densely populated countries whose populations fall below the average.
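The two thresholds only hint at density; ranking by population per square kilometer measures it directly. The sketch below is a minimal illustration, assuming an in-memory SQLite table with four of the countries returned above (figures copied from that output). It divides by total area, which includes water, and casts to `FLOAT` so that SQLite does not perform integer division.

```python
import sqlite3

# (name, area, population) for four of the countries returned above.
ROWS = [
    ("Bangladesh", 148460, 168957745),
    ("Japan", 377915, 126919659),
    ("Korea, South", 99720, 49115196),
    ("Spain", 505370, 48146134),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", ROWS)

# People per square kilometer of total (land + water) area.
# WHERE area > 0 skips rows with a missing or zero area.
density_rank = conn.execute(
    """
    SELECT name, ROUND(CAST(population AS FLOAT) / area, 1) AS density
    FROM facts
    WHERE area > 0
    ORDER BY density DESC
    """
).fetchall()

for name, density in density_rank:
    print(f"{name}: {density}")
```

On the full table, the same query (with `name != 'World'` added) would also surface small, dense countries that the average-based filter excludes.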

**Countries with the Most People and the Highest Growth Rate**

In [17]:
%%sql
SELECT
*
FROM facts
WHERE name != 'World'
ORDER BY population DESC
LIMIT 1;

Done.


| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 37 | ch | China | 9596960 | 9326410 | 270550 | 1367485388 | 0.45 | 12.49 | 7.53 | 0.44 |


In [18]:
%%sql
SELECT
*
FROM facts
WHERE name != 'World'
ORDER BY population_growth DESC
LIMIT 1;

Done.


| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 162 | od | South Sudan | 644329 |  |  | 12042910 | 4.02 | 36.91 | 8.18 | 11.47 |


China has the highest population of any country, and South Sudan has the highest population growth rate of any country.

**Water to Land Ratios**

In [23]:
%%sql
SELECT
*,
CAST(area_water AS FLOAT) / area_land AS water_land_ratio
FROM facts
WHERE name != 'World'
ORDER BY water_land_ratio DESC
LIMIT 10;

Done.


| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate | water_land_ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 228 | io | British Indian Ocean Territory | 54400 | 60 | 54340 |  |  |  |  |  | 905.6666666666666 |
| 247 | vq | Virgin Islands | 1910 | 346 | 1564 | 103574.0 | 0.59 | 10.31 | 8.54 | 7.67 | 4.520231213872832 |
| 246 | rq | Puerto Rico | 13791 | 8870 | 4921 | 3598357.0 | 0.6 | 10.86 | 8.67 | 8.15 | 0.5547914317925592 |
| 12 | bf | Bahamas, The | 13880 | 10010 | 3870 | 324597.0 | 0.85 | 15.5 | 7.05 | 0.0 | 0.3866133866133866 |
| 71 | pu | Guinea-Bissau | 36125 | 28120 | 8005 | 1726170.0 | 1.91 | 33.38 | 14.33 | 0.0 | 0.2846728307254623 |
| 106 | mi | Malawi | 118484 | 94080 | 24404 | 17964697.0 | 3.32 | 41.56 | 8.41 | 0.0 | 0.2593962585034013 |
| 125 | nl | Netherlands | 41543 | 33893 | 7650 | 16947904.0 | 0.41 | 10.83 | 8.66 | 1.95 | 0.2257103236656536 |
| 182 | ug | Uganda | 241038 | 197100 | 43938 | 37101745.0 | 3.24 | 43.79 | 10.69 | 0.74 | 0.2229223744292237 |
| 56 | er | Eritrea | 117600 | 101000 | 16600 | 6527689.0 | 2.25 | 30.0 | 7.52 | 0.0 | 0.1643564356435643 |
| 99 | li | Liberia | 111369 | 96320 | 15049 | 4195666.0 | 2.47 | 34.41 | 9.69 | 0.0 | 0.1562396179401993 |
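The `CAST` in the ratio query matters because SQLite divides two integers using integer division, truncating the result. The minimal sketch below demonstrates this with the Virgin Islands figures from the table above, using an in-memory SQLite connection; `ratio` is a hypothetical helper. It also shows that SQLite returns NULL, rather than raising an error, when `area_land` is 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def ratio(conn, water, land, cast):
    """Compute water / land inside SQLite, with or without a CAST to FLOAT."""
    sql = "SELECT CAST(? AS FLOAT) / ?" if cast else "SELECT ? / ?"
    return conn.execute(sql, (water, land)).fetchone()[0]


# Virgin Islands: 1564 sq km of water, 346 sq km of land.
print(ratio(conn, 1564, 346, cast=False))  # 4 (integer division truncates)
print(ratio(conn, 1564, 346, cast=True))   # real division, about 4.52
# Division by zero yields NULL, which Python receives as None.
print(ratio(conn, 1564, 0, cast=True))     # None
```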


In [26]:
%%sql
SELECT
name,
area_water,
area_land
FROM facts
WHERE name != 'World'
AND area_water > area_land
ORDER BY area_water DESC
LIMIT 10;

Done.


| name | area_water | area_land |
|---|---|---|
| British Indian Ocean Territory | 54340 | 60 |
| Virgin Islands | 1564 | 346 |


The British Indian Ocean Territory and the Virgin Islands have by far the highest water-to-land ratios, and they are the only two entries in the table whose water area exceeds their land area.
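The two queries above agree because, for a positive land area, `area_water > area_land` is the same condition as a water-to-land ratio greater than 1. The minimal sketch below checks this on four rows copied from the ratio table, assuming an in-memory SQLite table rather than the real `factbook.db`.

```python
import sqlite3

# (name, area_land, area_water) copied from the water-to-land table above.
ROWS = [
    ("British Indian Ocean Territory", 60, 54340),
    ("Virgin Islands", 346, 1564),
    ("Puerto Rico", 8870, 4921),
    ("Bahamas, The", 10010, 3870),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", ROWS)

# Filter 1: compare the two columns directly, as in the second query.
direct = [row[0] for row in conn.execute(
    "SELECT name FROM facts WHERE area_water > area_land ORDER BY name")]

# Filter 2: keep rows whose water-to-land ratio exceeds 1.
by_ratio = [row[0] for row in conn.execute(
    "SELECT name FROM facts "
    "WHERE CAST(area_water AS FLOAT) / area_land > 1 ORDER BY name")]

print(direct)    # ['British Indian Ocean Territory', 'Virgin Islands']
print(by_ratio)  # ['British Indian Ocean Territory', 'Virgin Islands']
```

The equivalence breaks down only when `area_land` is 0: the ratio becomes NULL and that row is dropped by the second filter, while the direct comparison would still keep it.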