# Analysing World Population
### An SQL-Based Analysis Project

This simple project explores population and area statistics taken from the CIA World Factbook database, which is available [here](https://www.cia.gov/library/publications/the-world-factbook/).


## Preparing Environment

In [2]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

'Connected: None@factbook.db'

## Exploring

The code below lists the tables in the database.

In [3]:
%%sql
SELECT *
  FROM sqlite_master
 WHERE type='table';

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


From the above, the 'facts' table appears to hold the main statistics. We'll look at both tables anyway.

In [4]:
%%sql
SELECT *
  FROM sqlite_sequence
 LIMIT 5;

Done.


name,seq
facts,261


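This table doesn't tell us much. SQLite maintains 'sqlite_sequence' internally to track AUTOINCREMENT counters, and here it only records that the largest id issued for 'facts' so far is 261.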
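The same exploration can be reproduced outside the notebook with Python's standard `sqlite3` module. This is a minimal sketch, assuming an in-memory stand-in table with only a few of the real columns (it does not open factbook.db), and it illustrates why 'sqlite_sequence' shows up next to 'facts':

```python
import sqlite3

# In-memory stand-in for factbook.db (assumption: only three columns are recreated).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "
    "name varchar(255) NOT NULL, "
    "population integer)"
)
conn.execute("INSERT INTO facts (name, population) VALUES ('Afghanistan', 32564342)")
conn.execute("INSERT INTO facts (name, population) VALUES ('Albania', 3029278)")

# sqlite_master lists every table; sqlite_sequence exists because of AUTOINCREMENT.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# sqlite_sequence stores the largest id handed out so far for each table.
sequence = conn.execute("SELECT name, seq FROM sqlite_sequence").fetchall()

print(tables)
print(sequence)
```

With two inserted rows, the counter for 'facts' stands at 2, in the same way that the real database reports 261.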

In [5]:
%%sql
SELECT *
  FROM facts
 LIMIT 5;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


The columns shown above are described as follows:
1. name: the name of the country.
2. area: the country's total area (both land and water) in square kilometers.
3. area_land: the country's land area in square kilometers.
4. area_water: the country's water area in square kilometers.
5. population: the country's population.
6. population_growth: the country's population growth as a percentage.
7. birth_rate: the country's birth rate, or the number of births per year per 1,000 people.
8. death_rate: the country's death rate, or the number of deaths per year per 1,000 people.
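The description of area as the sum of land and water can be checked directly. The sketch below is illustrative only: it loads just the five rows shown above into an in-memory table (an assumption made to keep the example self-contained) and looks for rows where the three area columns disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, area INTEGER, area_land INTEGER, area_water INTEGER)"
)

# The five rows displayed in the preview above (name, area, area_land, area_water).
rows = [
    ("Afghanistan", 652230, 652230, 0),
    ("Albania", 28748, 27398, 1350),
    ("Algeria", 2381741, 2381741, 0),
    ("Andorra", 468, 468, 0),
    ("Angola", 1246700, 1246700, 0),
]
conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?)", rows)

# Any row returned here would contradict "area = area_land + area_water".
mismatches = conn.execute(
    "SELECT name FROM facts WHERE area != area_land + area_water"
).fetchall()

print(mismatches)
```

For these five rows the list is empty. Running the same WHERE clause against the full 'facts' table would show whether the relationship holds everywhere; rows with a NULL in any of the three columns are skipped by this comparison.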

## Statistics

In this section, we calculate the following population statistics:
1. Minimum population
2. Maximum population
3. Minimum population growth
4. Maximum population growth

### Minimum Population

In [6]:
%%sql
SELECT name, MIN(population)
  FROM facts;

Done.


name,MIN(population)
Antarctica,0


Antarctica has a population of 0. More information about this is available [here](https://www.cia.gov/library/publications/the-world-factbook/geos/ay.html). We can exclude this entry and check the next result.

In [7]:
%%sql
SELECT name, MIN(population)
  FROM facts
 WHERE name != 'Antarctica';

Done.


name,MIN(population)
Pitcairn Islands,48
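The queries above select `name` next to `MIN(population)` without a GROUP BY. SQLite fills that bare column from the row holding the minimum, but other database engines may reject the query. A more portable form, sketched here on a small in-memory sample (the table contents are a subset chosen for illustration, with values taken from this notebook), uses a subquery instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Antarctica", 0),
        ("Pitcairn Islands", 48),
        ("Andorra", 85580),
        ("Albania", 3029278),
    ],
)

# Find the minimum in a subquery, then fetch the row(s) that match it.
smallest = conn.execute(
    """
    SELECT name, population
      FROM facts
     WHERE name != 'Antarctica'
       AND population = (SELECT MIN(population)
                           FROM facts
                          WHERE name != 'Antarctica')
    """
).fetchall()

print(smallest)
```

Unlike the bare-column form, this version returns every row tied for the minimum rather than a single one.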


### Maximum Population

In [8]:
%%sql
SELECT name, MAX(population)
  FROM facts;

Done.


name,MAX(population)
World,7256490011


It seems that the whole world is there as an entry. We will exclude this entry as well and check the next result.

In [9]:
%%sql
SELECT name, MAX(population)
  FROM facts
 WHERE name != 'World';

Done.


name,MAX(population)
China,1367485388
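Since both 'Antarctica' and 'World' have to be filtered out repeatedly, one option is to define the filter once as a view. The following is a sketch only: the view name `populated` is hypothetical, and the table holds a handful of sample rows with values taken from this notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Antarctica", 0),
        ("Pitcairn Islands", 48),
        ("Afghanistan", 32564342),
        ("China", 1367485388),
        ("World", 7256490011),
    ],
)

# Hypothetical helper view: hides the aggregate 'World' row and empty Antarctica.
conn.execute(
    """
    CREATE TEMP VIEW populated AS
    SELECT name, population
      FROM facts
     WHERE name NOT IN ('World', 'Antarctica')
    """
)

# Both extremes can now be read from the view without repeating the filter.
least = conn.execute(
    "SELECT name, population FROM populated ORDER BY population ASC LIMIT 1"
).fetchone()
most = conn.execute(
    "SELECT name, population FROM populated ORDER BY population DESC LIMIT 1"
).fetchone()

print(least)
print(most)
```

A temporary view disappears when the connection closes, so it leaves the database file unchanged.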


### Minimum Population Growth

In [10]:
%%sql
SELECT name, MIN(population_growth)
  FROM facts;

Done.


name,MIN(population_growth)
Holy See (Vatican City),0.0


### Maximum Population Growth

In [11]:
%%sql
SELECT name, MAX(population_growth)
  FROM facts;

Done.


name,MAX(population_growth)
South Sudan,4.02


### Population and Area Calculations

#### Average Population

In [12]:
%%sql
SELECT AVG(population)
  FROM facts
 WHERE name != 'World';

Done.


AVG(population)
32242666.56846473


#### Average Area Size

In [13]:
%%sql
SELECT AVG(area)
  FROM facts
 WHERE name != 'World';

Done.


AVG(area)
555093.546184739


Here, we tag each country by comparing its population to the average population. This demonstrates nested SQL queries. Note that the subquery averages over every row in 'facts', including the 'World' entry.

In [14]:
%%sql
SELECT name, CASE
             WHEN population > (SELECT AVG(population) FROM facts) THEN 'Above Average'
             ELSE 'Below Average'
             END AS 'Population Ratio'
  FROM facts;

Done.


name,Population Ratio
Afghanistan,Below Average
Albania,Below Average
Algeria,Below Average
Andorra,Below Average
Angola,Below Average
Antigua and Barbuda,Below Average
Argentina,Below Average
Armenia,Below Average
Australia,Below Average
Austria,Below Average
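The subquery in the cell above has no WHERE clause, so the 'World' row contributes to the average it computes. A possible variation, sketched here on a six-row in-memory sample (an assumption, so the labels will not match the full-table output above), excludes 'World' from both the average and the listing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Afghanistan", 32564342),
        ("Albania", 3029278),
        ("Algeria", 39542166),
        ("Andorra", 85580),
        ("Angola", 19625353),
        ("World", 7256490011),
    ],
)

# The nested SELECT computes the average once, ignoring the 'World' row;
# the outer query compares every remaining country against it.
labels = dict(
    conn.execute(
        """
        SELECT name,
               CASE
               WHEN population > (SELECT AVG(population)
                                    FROM facts
                                   WHERE name != 'World') THEN 'Above Average'
               ELSE 'Below Average'
               END AS "Population Ratio"
          FROM facts
         WHERE name != 'World'
        """
    ).fetchall()
)

for name, label in sorted(labels.items()):
    print(name, label)
```

On this sample the average without 'World' is about 19 million, so Afghanistan, Algeria, and Angola are tagged 'Above Average'. If 'World' were left in the average, all five countries would fall below it.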


We can count the countries above and below average:

In [57]:
%%sql
SELECT (SELECT COUNT(name)
          FROM facts
         WHERE population < (SELECT AVG(population) FROM facts)
       ) AS 'Population Below Avg. Count',
       (SELECT COUNT(name)
          FROM facts
         WHERE population > (SELECT AVG(population) FROM facts)
       ) AS 'Population Above Avg. Count';


Done.


Population Below Avg. Count,Population Above Avg. Count
218,24
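The two counts can also be produced in a single pass by grouping on the CASE label. This is a sketch under stated assumptions: the in-memory sample is tiny, the row with an unknown population is a hypothetical placeholder, and the average is left unfiltered to mirror the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Afghanistan", 32564342),
        ("Albania", 3029278),
        ("Algeria", 39542166),
        ("Andorra", 85580),
        ("Angola", 19625353),
        ("World", 7256490011),
        ("Placeholder (hypothetical)", None),  # stands in for a missing population
    ],
)

# Group on the label itself. Rows with a NULL population are filtered out,
# because the CASE would otherwise send them to the ELSE branch.
counts = dict(
    conn.execute(
        """
        SELECT CASE
               WHEN population > (SELECT AVG(population) FROM facts) THEN 'Above Average'
               ELSE 'Below Average'
               END AS label,
               COUNT(*)
          FROM facts
         WHERE population IS NOT NULL
         GROUP BY label
        """
    ).fetchall()
)

print(counts)
```

One difference from the two-subquery version: a row whose population equals the average exactly is counted as 'Below Average' here, whereas the strict `<` and `>` comparisons above count it in neither column.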


# Thank you.