# Using SQL to analyse country statistics

For this project, we will work with data from the [CIA World Factbook](https://www.cia.gov/library/publications/the-world-factbook/), which contains statistics about all the countries on Earth.

The dataset contains demographic information like:
- `population` : The population as of 2015.
- `population_growth` : The annual population growth rate, as a percentage.
- `area` : The total land and water area.

## 1. Loading the data
First, let's connect the notebook to the database file, `factbook.db`, which is a SQLite database.

In [5]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

'Connected: None@factbook.db'

In [10]:
%%sql
SELECT *
    FROM sqlite_master
    WHERE type = 'table';

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


The output above shows that the database contains a table named `facts`, with columns such as `area_land`, `area_water` and `population`.
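
As a complement to reading the `CREATE TABLE` string, the column names and declared types can also be listed directly. The sketch below is only an illustration in plain Python with the standard `sqlite3` module. It assumes an in-memory stand-in for `factbook.db`, recreated empty from the schema shown above, so that it runs on its own.

```python
import sqlite3

# In-memory stand-in for factbook.db (an assumption, so the sketch runs anywhere).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE facts (
        id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        code varchar(255) NOT NULL,
        name varchar(255) NOT NULL,
        area integer, area_land integer, area_water integer,
        population integer, population_growth float,
        birth_rate float, death_rate float, migration_rate float
    )
""")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(facts)")]
for col_name, col_type in columns:
    print(f"{col_name:20} {col_type}")
```

Against the real file, the same `PRAGMA table_info(facts)` statement could be run after connecting to `factbook.db` instead of `:memory:`.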

In [13]:
%%sql
SELECT *
    FROM facts
    LIMIT 5;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


## Querying the min and max of population and population growth

In [20]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
    FROM facts;

Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,7256490011,0.0,4.02


From this output we can infer the following:
- There is a country with a population of `0`.
- The maximum population is over `7.2 billion`.
- The minimum population growth is `0`.
- The maximum population growth is `4.02`.

The last two points are plausible, but the first two don't look right for a country. Let's explore these rows in the next cells.

## Exploring outliers in the population

In [21]:
%%sql
SELECT *
    FROM facts
    WHERE population = (SELECT MIN(population) FROM facts);

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
250,ay,Antarctica,,280000,,0,,,,


**The entry with a population of 0 is Antarctica, which looks correct.**

In [23]:
%%sql
SELECT *
    FROM facts
    WHERE population = (SELECT MAX(population) FROM facts);

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
261,xx,World,,,,7256490011,1.08,18.6,7.8,


**The row with a population of over `7.2 billion` is not a country; it is the aggregate entry for the whole `World`.**

So, excluding `World` and querying again gives:

In [26]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
    FROM facts
    WHERE name <> 'World';

Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,1367485388,0.0,4.02


The maximum population returned now is about 1.37 billion. This looks like China; we will confirm it in the following cell.

In [28]:
%%sql
SELECT *
    FROM facts
    WHERE population = (SELECT MAX(population) FROM facts WHERE name <> 'World');

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
37,ch,China,9596960,9326410,270550,1367485388,0.45,12.49,7.53,0.44
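
The effect of the aggregate row can be reproduced on a tiny table. This is a minimal sketch using Python's standard `sqlite3` module and an in-memory table (an assumption, not the real `factbook.db`) holding three rows copied from the outputs above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
# Three rows copied from the query outputs above.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("World", 7256490011), ("China", 1367485388), ("Antarctica", 0)],
)

# Without a filter, the World aggregate row wins.
max_all = conn.execute("SELECT MAX(population) FROM facts").fetchone()[0]

# Excluding the aggregate row gives the largest actual country.
max_countries = conn.execute(
    "SELECT MAX(population) FROM facts WHERE name <> 'World'"
).fetchone()[0]

print(max_all)        # 7256490011
print(max_countries)  # 1367485388
```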


## Exploring average value of population and area among the countries

In [29]:
%%sql
SELECT AVG(population), AVG(area)
    FROM facts
    WHERE name <> 'World';

Done.


AVG(population),AVG(area)
32242666.56846473,555093.546184739


The average population is about 32 million and the average area is about 555,000 sq km.
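
One detail worth keeping in mind when reading these averages: SQLite's `AVG` skips `NULL` values, so a row such as Antarctica, whose `area` is empty in the earlier output, does not count toward `AVG(area)`. The sketch below illustrates this with Python's standard `sqlite3` module on an assumed in-memory table holding three rows taken from the outputs above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER)")
# Antarctica's area is NULL, as in the earlier query output.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("Albania", 28748), ("Andorra", 468), ("Antarctica", None)],
)

# COUNT(*) counts every row; COUNT(area) and AVG(area) ignore NULLs.
total_rows, rows_with_area, avg_area = conn.execute(
    "SELECT COUNT(*), COUNT(area), AVG(area) FROM facts"
).fetchone()

print(total_rows, rows_with_area, avg_area)  # 3 2 14608.0
```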

## Finding Densely populated countries

To find densely populated countries, we look at the countries which have:
- a `population` above the average population, and
- an `area` below the average area.

Of course, this is not ideal, because:
- Large, populous countries like `India` may be left out, since they may not meet the second criterion.
- City-states like `Singapore` may also be left out, since a single city is unlikely to have a population above 32 million.

In [30]:
%%sql
SELECT *
    FROM facts
    WHERE (population > (SELECT AVG(population)
    FROM facts
    WHERE name <> 'World')) AND (area < (SELECT AVG(area)
    FROM facts
    WHERE name <> 'World'));

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
14,bg,Bangladesh,148460,130170,18290,168957745,1.6,21.14,5.61,0.46
65,gm,Germany,357022,348672,8350,80854408,0.17,8.47,11.42,1.24
80,iz,Iraq,438317,437367,950,37056169,2.93,31.45,3.77,1.62
83,it,Italy,301340,294140,7200,61855120,0.27,8.74,10.19,4.1
85,ja,Japan,377915,364485,13430,126919659,0.16,7.93,9.51,0.0
91,ks,"Korea, South",99720,96920,2800,49115196,0.14,8.19,6.75,0.0
120,mo,Morocco,446550,446300,250,33322699,1.0,18.2,4.81,3.36
138,rp,Philippines,300000,298170,1830,100998376,1.61,24.27,6.11,2.09
139,pl,Poland,312685,304255,8430,38562189,0.09,9.74,10.19,0.46
163,sp,Spain,505370,498980,6390,48146134,0.89,9.64,9.04,8.31


As expected, countries like `Singapore`, `Malta`, `India` and `China` don't feature in this list, so we will change the evaluation metric.

## Changing the density evaluation metric
Instead of requiring an above-average population and a below-average area, we will compute a new `density` column (population divided by area) and list the countries whose density is above the average. This should be more representative.

In [32]:
%%sql
SELECT *, population / CAST(area AS Float) AS density
    FROM facts
    WHERE (name <> 'World') AND (density > (SELECT AVG(population / CAST(area AS Float)) FROM facts WHERE name <> 'World') );

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate,density
13,ba,Bahrain,760,760,0.0,1346613,2.41,13.66,2.69,13.09,1771.8592105263158
14,bg,Bangladesh,148460,130170,18290.0,168957745,1.6,21.14,5.61,0.46,1138.0691432035565
15,bb,Barbados,430,430,0.0,290604,0.31,11.87,8.44,0.3,675.8232558139534
91,ks,"Korea, South",99720,96920,2800.0,49115196,0.14,8.19,6.75,0.0,492.531046931408
97,le,Lebanon,10400,10230,170.0,6184701,0.86,14.59,4.88,1.1,594.6827884615385
108,mv,Maldives,298,298,0.0,393253,0.08,15.75,3.89,12.68,1319.6409395973155
110,mt,Malta,316,316,0.0,413965,0.31,10.18,9.09,1.98,1310.01582278481
113,mp,Mauritius,2040,2030,10.0,1339827,0.64,13.29,6.91,0.0,656.7779411764706
117,mn,Monaco,2,2,0.0,30535,0.12,6.65,9.24,3.83,15267.5
123,nr,Nauru,21,21,0.0,9540,0.55,24.95,5.87,13.63,454.2857142857143
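
The `CAST(area AS Float)` in the query above matters because both `population` and `area` are declared as integers, and SQLite performs integer division when both operands are integers. A small sketch using Python's standard `sqlite3` module shows the difference for Bahrain's figures from the output above; the in-memory connection is an assumption, used only to evaluate the expressions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Bahrain: population 1346613, area 760 (from the output above).
int_div, float_div = conn.execute(
    "SELECT 1346613 / 760, 1346613 / CAST(760 AS FLOAT)"
).fetchone()

print(int_div)    # 1771 (fractional part dropped)
print(float_div)  # roughly 1771.86, matching the density column
```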


Interestingly, countries like `India` don't make this list. It is therefore worth checking manually what the average density is and how countries like `India` compare.

In [34]:
%%sql
SELECT AVG(population / CAST(area AS Float)) AS avg_density FROM facts WHERE name <> 'World' ;

Done.


avg_density
419.66252469247945


In [37]:
%%sql
SELECT name, (population / CAST(area AS Float)) AS density FROM facts WHERE name in ('India', 'China','Indonesia') ;

Done.


name,density
China,142.4915168970174
India,380.7713541630226
Indonesia,134.41029125224657


**The query output is correct: the density of each of these countries is below the average.**

## Which are the unsafe countries?
It would be interesting to see which countries have the highest death rates (`death_rate`).

In [59]:
%%sql
SELECT name, birth_rate, death_rate, migration_rate
    FROM facts
    WHERE name <> 'World'
    ORDER BY death_rate DESC;

Done.


name,birth_rate,death_rate,migration_rate
Lesotho,25.47,14.89,7.36
Ukraine,10.72,14.46,2.25
Bulgaria,8.92,14.44,0.29
Guinea-Bissau,33.38,14.33,0.0
Latvia,10.0,14.31,6.26
Chad,36.6,14.28,3.45
Lithuania,10.1,14.27,6.27
Namibia,19.8,13.91,0.0
Afghanistan,38.57,13.89,1.51
Central African Republic,35.08,13.8,0.0
