## Introduction
In this project, we'll work with data from the CIA World Factbook, a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like the following:

- `population`        - the country's population.
- `population_growth` - the annual population growth rate, as a percentage.
- `area`              - the total land and water area.

We'll use SQL to analyze the data in this database.

Let's start by connecting to our database file.

In [9]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

'Connected: None@factbook.db'

Next, we find the names of the tables in the database.

In [14]:
%%sql
SELECT *
  FROM sqlite_master
 WHERE type='table';

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
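
The same lookup can be run outside the notebook magic with Python's built-in `sqlite3` module. The following is a minimal sketch: it uses an in-memory database with a cut-down stand-in for the `facts` table (an assumption made so the example is self-contained) rather than the real `factbook.db`.

```python
import sqlite3

# In-memory stand-in for factbook.db (assumption: only a few columns are recreated).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "
    "code varchar(255) NOT NULL, "
    "name varchar(255) NOT NULL, "
    "population integer)"
)

# sqlite_master holds one row per schema object; keep only the tables.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(table_names)
```

The `sqlite_sequence` table in the output above is SQLite's own bookkeeping table for `AUTOINCREMENT` columns, so `facts` is the only table that holds Factbook data.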


## Overview of the Data
Here are the first five rows of the `facts` table:

In [8]:
%%sql
SELECT *
  FROM facts
 LIMIT 5

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


Here are the descriptions for some of the columns:

- `name`       — the name of the country.
- `area`       — the country's total area (both land and water) in square kilometers.
- `area_land`  — the country's land area in square kilometers.
- `area_water` — the country's water area in square kilometers.
- `population` — the country's population.
- `population_growth` — the country's population growth as a percentage.
- `birth_rate` — the country's birth rate, or the number of births per year per 1,000 people.
- `death_rate` — the country's death rate, or the number of deaths per year per 1,000 people.

## Summary Statistics

In [65]:
%%sql
SELECT MIN(population) AS 'Minimum Population', 
       MAX(population) AS 'Maximum Population', 
       MIN(population_growth) AS 'Minimum Population Growth', 
       MAX(population_growth) AS 'Maximum Population Growth'
  FROM facts

Done.


Minimum Population,Maximum Population,Minimum Population Growth,Maximum Population Growth
0,7256490011,0.0,4.02


A few things stick out from the summary statistics above:

- There's a country with a population of 0.
- There's a country with a population of 7,256,490,011 (more than 7.2 billion people).

Let's use subqueries to zoom in on just these countries without hard-coding the specific values.

## Exploring Outliers

In [63]:
%%sql
SELECT *
  FROM facts
  WHERE population = (SELECT MAX(population) FROM facts) 
     OR population = (SELECT MIN(population) FROM facts)

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
250,ay,Antarctica,,280000.0,,0,,,,
261,xx,World,,,,7256490011,1.08,18.6,7.8,


It seems the table contains a row for:
 - The whole world, which explains the population of over 7.2 billion.
 - Antarctica, which explains the population of 0. This seems to match the CIA Factbook page for Antarctica.
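
To make the subquery pattern concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes an in-memory table with only the `name` and `population` columns and four rows copied from the outputs in this notebook; it is not the full `facts` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Afghanistan", 32564342),
        ("Albania", 3029278),
        ("Antarctica", 0),
        ("World", 7256490011),
    ],
)

# Each scalar subquery is evaluated against the whole table, so the
# outer WHERE never needs the literal minimum or maximum value.
outliers = conn.execute(
    """
    SELECT name, population
      FROM facts
     WHERE population = (SELECT MAX(population) FROM facts)
        OR population = (SELECT MIN(population) FROM facts)
     ORDER BY population
    """
).fetchall()
print(outliers)  # [('Antarctica', 0), ('World', 7256490011)]
```

Because the extremes are computed at query time, the same query keeps working if the underlying data changes.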

Let's recompute the summary statistics, excluding the row for the whole world.

## Summary Statistics Revisited

In [64]:
%%sql
SELECT MIN(population) AS 'Minimum Population', 
       MAX(population) AS 'Maximum Population', 
       MIN(population_growth) AS 'Minimum Population Growth', 
       MAX(population_growth) AS 'Maximum Population Growth'
  FROM facts
  WHERE name != 'World'

Done.


Minimum Population,Maximum Population,Minimum Population Growth,Maximum Population Growth
0,1367485388,0.0,4.02


There's a country with a population of approximately 1.37 billion!

## Exploring Average Population and Area
Let's explore population density, which depends on a country's population and its area. We start by looking at the average values of these two columns.
Of course, we exclude the row for the whole world.

In [61]:
%%sql
SELECT ROUND(AVG(population)) AS 'Average Population', ROUND(AVG(area)) AS 'Average area'
    FROM facts
    WHERE name != 'World'

Done.


Average Population,Average area
32242667.0,555094.0


We see that the average population is around 32 million and the average area is 555 thousand square kilometers.
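
One detail worth keeping in mind is that `AVG` skips `NULL` values, so the two averages are not necessarily taken over the same number of rows (Antarctica, for example, has no `area` value in the table). The sketch below illustrates this with Python's built-in `sqlite3` module. It assumes an in-memory table holding four rows copied from this notebook, so its averages describe this small sample only, not the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 32564342),
        ("Albania", 28748, 3029278),
        ("Antarctica", None, 0),      # area is NULL, population is 0
        ("World", None, 7256490011),  # excluded by the WHERE clause
    ],
)

avg_population, avg_area = conn.execute(
    """
    SELECT ROUND(AVG(population)), ROUND(AVG(area))
      FROM facts
     WHERE name != 'World'
    """
).fetchone()

# Population is averaged over three rows, area over the two non-NULL rows.
print(avg_population, avg_area)  # 11864540.0 340489.0
```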

## Finding Countries with High Population Density
To identify densely populated countries, we look for countries that meet the following criteria:
- The `population` is above average.
- The `area` is below average.

In [67]:
%%sql
SELECT name, population, area
    FROM facts
    WHERE population > (SELECT AVG(population) FROM facts WHERE name != 'World') 
      AND area       < (SELECT AVG(area) FROM facts WHERE name != 'World')

Done.


name,population,area
Bangladesh,168957745,148460
Germany,80854408,357022
Iraq,37056169,438317
Italy,61855120,301340
Japan,126919659,377915
"Korea, South",49115196,99720
Morocco,33322699,446550
Philippines,100998376,300000
Poland,38562189,312685
Spain,48146134,505370


For a more precise result, we create a new column that represents population density. Population density is simply `population` / `area` for each country.
We then sort the result by this column to find the most densely populated countries.

In [70]:
%%sql
SELECT *, population/area AS 'Population_Density'
    FROM facts
    WHERE population > (SELECT AVG(population) FROM facts WHERE name != 'World') 
            AND area < (SELECT AVG(area) FROM facts WHERE name != 'World')
    ORDER BY Population_Density DESC

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate,Population_Density
14,bg,Bangladesh,148460,130170,18290,168957745,1.6,21.14,5.61,0.46,1138
91,ks,"Korea, South",99720,96920,2800,49115196,0.14,8.19,6.75,0.0,492
138,rp,Philippines,300000,298170,1830,100998376,1.61,24.27,6.11,2.09,336
85,ja,Japan,377915,364485,13430,126919659,0.16,7.93,9.51,0.0,335
192,vm,Vietnam,331210,310070,21140,94348835,0.97,15.96,5.93,0.3,284
185,uk,United Kingdom,243610,241930,1680,64088222,0.54,12.17,9.35,2.54,263
65,gm,Germany,357022,348672,8350,80854408,0.17,8.47,11.42,1.24,226
83,it,Italy,301340,294140,7200,61855120,0.27,8.74,10.19,4.1,205
182,ug,Uganda,241038,197100,43938,37101745,3.24,43.79,10.69,0.74,153
173,th,Thailand,513120,510890,2230,67976405,0.34,11.19,7.8,0.0,132


We note that Bangladesh, South Korea, and the Philippines are the most densely populated countries in this result.
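
The `Population_Density` values have no fractional part because `population` and `area` are both stored as integers, and SQLite's `/` operator performs integer division on integer operands. If fractional densities are wanted, one option is to cast one operand to `REAL`. The sketch below compares both forms using Python's built-in `sqlite3` module. It assumes an in-memory table with four rows copied from the output above; the `density_real` alias is a name chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Bangladesh", 148460, 168957745),
        ("Korea, South", 99720, 49115196),
        ("Philippines", 300000, 100998376),
        ("Japan", 377915, 126919659),
    ],
)

rows = conn.execute(
    """
    SELECT name,
           population / area                         AS density_int,
           ROUND(CAST(population AS REAL) / area, 2) AS density_real
      FROM facts
     ORDER BY density_real DESC
    """
).fetchall()

# density_int is truncated to a whole number; density_real keeps two decimals.
for name, density_int, density_real in rows:
    print(name, density_int, density_real)
```

Sorting on the real-valued column also separates countries whose truncated densities would otherwise be nearly tied, such as the Philippines (336) and Japan (335).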