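## Introduction

In this notebook, we use SQL to explore factbook.db, a SQLite database of country-level statistics. The cell below loads the sql IPython extension and connects to the database.

```python
%%capture
%load_ext sql
%sql sqlite:///factbook.db
```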

## Overview of Data

Let's get a sense of what the data looks like, starting with the tables in the database.

```sql
%%sql
SELECT *
FROM sqlite_master
WHERE type = 'table';
```

| type | name | tbl_name | rootpage | sql |
|---|---|---|---|---|
| table | sqlite_sequence | sqlite_sequence | 3 | `CREATE TABLE sqlite_sequence(name,seq)` |
| table | facts | facts | 47 | `CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float)` |

The database also contains sqlite_sequence. SQLite creates this internal table automatically to track AUTOINCREMENT values, so facts is the only table we need.
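You can run the same metadata query outside the notebook with Python's built-in `sqlite3` module. Below is a minimal sketch. The helper name `list_tables` is hypothetical. The example uses an in-memory database with a stand-in `facts` table so it runs anywhere; to inspect the real file, connect to `"factbook.db"` instead (this assumes the file sits in the working directory).

```python
import sqlite3

def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables recorded in sqlite_master, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

# Stand-in database (assumption: replace ":memory:" with "factbook.db" for real data).
# Declaring an AUTOINCREMENT column makes SQLite create sqlite_sequence automatically.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, name TEXT)"
)
print(list_tables(conn))  # ['facts', 'sqlite_sequence']
```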


```sql
%%sql
SELECT *
FROM facts
LIMIT 5;
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
| 2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
| 3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
| 4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
| 5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |


Here are the descriptions for some of the columns:

- name - The name of the country.
- area - The country's total area (both land and water) in square kilometers.
- population - The country's population.
- population_growth - The country's annual population growth, as a percentage.
- birth_rate - The country's birth rate: the number of births per year per 1,000 people.
- death_rate - The country's death rate: the number of deaths per year per 1,000 people.
- area_land - The country's land area in square kilometers.
- area_water - The country's water area in square kilometers.
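The total area is described as land plus water, so a quick sanity check is whether area equals area_land + area_water. Below is a minimal sketch in plain Python over the five rows returned by the LIMIT 5 query. The helper `area_is_consistent` is hypothetical and only illustrates the relationship.

```python
# (name, area, area_land, area_water) for the five rows shown by LIMIT 5
rows = [
    ("Afghanistan", 652230, 652230, 0),
    ("Albania", 28748, 27398, 1350),
    ("Algeria", 2381741, 2381741, 0),
    ("Andorra", 468, 468, 0),
    ("Angola", 1246700, 1246700, 0),
]

def area_is_consistent(area: int, area_land: int, area_water: int) -> bool:
    """True when the total area equals land area plus water area."""
    return area == area_land + area_water

for name, area, land, water in rows:
    print(name, area_is_consistent(area, land, water))  # True for all five rows
```

Against the full table, the equivalent SQL check would count the rows where area <> area_land + area_water. We haven't run that query here, so this sketch says nothing about the rest of the data.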

Let's start by calculating some summary statistics and seeing what they tell us.

```sql
%%sql
SELECT MIN(population) AS min_population,
       MAX(population) AS max_population,
       MIN(population_growth) AS min_population_growth,
       MAX(population_growth) AS max_population_growth
FROM facts;
```

| min_population | max_population | min_population_growth | max_population_growth |
|---|---|---|---|
| 0 | 7256490011 | 0.0 | 4.02 |



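A few things stand out from these summary statistics:

- There's a country with a population of 0.
- There's a country with a population of 7,256,490,011 (more than 7.2 billion people).

Let's zoom in on just these countries without hard-coding those specific values. The queries below rely on a SQLite feature: when a query contains a single MIN() or MAX() aggregate, a bare column such as name takes its value from the row that holds the minimum or maximum.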
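One portable way to find the extreme rows without hard-coding values is a subquery. The subquery computes the minimum or maximum first, and the outer query then keeps every row that matches it. Below is a minimal sketch using Python's `sqlite3` module on a four-row stand-in table; the names and populations come from outputs in this notebook. For comparison, it also shows SQLite's shortcut `SELECT name, MIN(population)`. That shortcut returns exactly one row, whereas the subquery form would return every row that ties for the extreme value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
# A few (name, population) pairs taken from outputs in this notebook
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("Antarctica", 0), ("Andorra", 85580), ("Albania", 3029278), ("World", 7256490011)],
)

# Portable SQL: compute the extreme value in a subquery, then keep every matching row
min_rows = conn.execute(
    "SELECT name, population FROM facts "
    "WHERE population = (SELECT MIN(population) FROM facts)"
).fetchall()
max_rows = conn.execute(
    "SELECT name, population FROM facts "
    "WHERE population = (SELECT MAX(population) FROM facts)"
).fetchall()

# SQLite-specific shortcut: a bare column next to a single MIN() aggregate
bare_min = conn.execute("SELECT name, MIN(population) FROM facts").fetchone()

print(min_rows)  # [('Antarctica', 0)]
print(max_rows)  # [('World', 7256490011)]
print(bare_min)  # ('Antarctica', 0)
```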

## Exploring Outliers

```sql
%%sql
SELECT name, MIN(population)
FROM facts;
```

| name | MIN(population) |
|---|---|
| Antarctica | 0 |


It seems like the table contains a row for Antarctica, which explains the population of 0.

```sql
%%sql
SELECT name, MAX(population)
FROM facts;
```

| name | MAX(population) |
|---|---|
| World | 7256490011 |


We also see that the table contains a row for the whole world, which explains the maximum population of over 7.2 billion we found earlier.

Now that we know this, we should recompute the earlier summary statistics, this time excluding the row for the whole world.

## Summary Statistics Revisited

```sql
%%sql
SELECT MIN(population) AS min_population,
       MAX(population) AS max_population,
       MIN(population_growth) AS min_population_growth,
       MAX(population_growth) AS max_population_growth
FROM facts
WHERE name <> 'World';
```

| min_population | max_population | min_population_growth | max_population_growth |
|---|---|---|---|
| 0 | 1367485388 | 0.0 | 4.02 |

The maximum population is now 1,367,485,388. The minimum is still 0 because the Antarctica row remains.
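In SQL, 'World' in single quotes is a string literal, while double quotes normally denote an identifier. SQLite treats "World" as a string only as a fallback when no column by that name exists, and some SQLite builds disable that fallback entirely. When running these queries from Python, a bound parameter avoids the quoting question altogether. Below is a minimal sketch with the `sqlite3` module. The three-row stand-in table holds populations from this notebook's outputs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("Antarctica", 0), ("China", 1367485388), ("World", 7256490011)],
)

# The ? placeholder is bound to the Python string "World"; no SQL quoting is involved
excluded = "World"
min_pop, max_pop = conn.execute(
    "SELECT MIN(population), MAX(population) FROM facts WHERE name <> ?",
    (excluded,),
).fetchone()
print(min_pop, max_pop)  # 0 1367485388
```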


## Exploring Average Population and Area
Let's explore population density, which depends on a country's population and its area. We'll start with the average value of each of these two columns.

As before, we exclude the row for the whole world.

```sql
%%sql
SELECT AVG(population) AS average_pop, AVG(area) AS avg_area
FROM facts
WHERE name <> 'World';
```

| average_pop | avg_area |
|---|---|
| 32242666.56846473 | 555093.546184739 |


We see that the average population is around 32 million and the average area is 555 thousand square kilometers.
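Keep one detail in mind when reading these averages: SQL's AVG() skips NULL values. It therefore averages only the rows where the column is filled in, not every row in the table. We haven't checked here how many rows of facts have a missing population or area. Below is a minimal sketch with the `sqlite3` module on a hypothetical three-row table; the names and values are made up. It shows how AVG() differs from dividing by the total row count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, population INTEGER)")
# Hypothetical rows: one population is missing (NULL)
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("A", 10), ("B", 20), ("C", None)],
)

avg_pop, sum_pop, n_rows, n_nonnull = conn.execute(
    "SELECT AVG(population), SUM(population), COUNT(*), COUNT(population) FROM t"
).fetchone()

print(avg_pop)            # 15.0: (10 + 20) / 2, the NULL row is ignored
print(sum_pop / n_rows)   # 10.0: dividing by every row gives a different answer
print(n_rows, n_nonnull)  # 3 2
```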

## Finding Densely Populated Countries

Next, we'll build on the query above to find countries that are densely populated. We'll identify countries that have:

- Above-average population.
- Below-average area.

```sql
%%sql
SELECT name, population
FROM facts
WHERE population > (SELECT AVG(population)
                    FROM facts
                    WHERE name <> 'World')
  AND area < (SELECT AVG(area)
              FROM facts
              WHERE name <> 'World');
```

| name | population |
|---|---|
| Bangladesh | 168957745 |
| Germany | 80854408 |
| Iraq | 37056169 |
| Italy | 61855120 |
| Japan | 126919659 |
| Korea, South | 49115196 |
| Morocco | 33322699 |
| Philippines | 100998376 |
| Poland | 38562189 |
| Spain | 48146134 |
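The query above repeats the `WHERE name <> 'World'` filter in each subquery. An alternative computes both averages once in a common table expression (a WITH clause) and joins that single row into the outer query. Below is a minimal sketch run through Python's `sqlite3` on a stand-in table. It holds the five rows from the LIMIT 5 output plus the World row. The names `averages`, `avg_pop` and `avg_area` are our own choices, and World's area is left NULL because this notebook doesn't show it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 652230),
        ("Albania", 3029278, 28748),
        ("Algeria", 39542166, 2381741),
        ("Andorra", 85580, 468),
        ("Angola", 19625353, 1246700),
        ("World", 7256490011, None),  # area not shown in this notebook, so NULL here
    ],
)

# Compute both averages once (excluding World), then compare every row against them
query = """
WITH averages AS (
    SELECT AVG(population) AS avg_pop, AVG(area) AS avg_area
    FROM facts
    WHERE name <> 'World'
)
SELECT f.name, f.population
FROM facts AS f, averages AS a
WHERE f.population > a.avg_pop
  AND f.area < a.avg_area
"""
print(conn.execute(query).fetchall())  # [('Afghanistan', 32564342)]
```

On this small sample, only Afghanistan clears both thresholds. Algeria and Angola have above-average populations but above-average areas too.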


## Most Populous Country and Highest Growth Rate

```sql
%%sql
SELECT name, MAX(population) AS population
FROM facts
WHERE name <> 'World';
```

| name | population |
|---|---|
| China | 1367485388 |


China is the most populous country in the world, with about 1.37 billion people.

```sql
%%sql
SELECT name, MAX(population_growth) AS highest_growth
FROM facts
WHERE name <> 'World';
```

| name | highest_growth |
|---|---|
| South Sudan | 4.02 |


South Sudan has the highest population growth rate, at 4.02% per year. It is also a young country, having become independent in 2011.

## Water to Land Ratio

```sql
%%sql
SELECT name, area_water, area_land,
       MAX(ROUND(CAST(area_water AS FLOAT) / CAST(area_land AS FLOAT), 4)) AS water_to_land
FROM facts
WHERE name <> 'World';
```

| name | area_water | area_land | water_to_land |
|---|---|---|---|
| British Indian Ocean Territory | 54340 | 60 | 905.6667 |

British Indian Ocean Territory has the highest water-to-land ratio, with roughly 906 square kilometers of water for every square kilometer of land.


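Next, let's list every country or territory whose water area is more than half of its land area:

```sql
%%sql
SELECT name, area_water, area_land,
       ROUND(CAST(area_water AS FLOAT) / CAST(area_land AS FLOAT), 4) AS water_to_land
FROM facts
WHERE name <> 'World' AND water_to_land > 0.5;
```

| name | area_water | area_land | water_to_land |
|---|---|---|---|
| British Indian Ocean Territory | 54340 | 60 | 905.6667 |
| Puerto Rico | 4921 | 8870 | 0.5548 |
| Virgin Islands | 1564 | 346 | 4.5202 |

SQLite accepts the water_to_land alias in the WHERE clause. Standard SQL, and many other databases, would require repeating the expression.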
Puerto Rico,4921,8870,0.5548
Virgin Islands,1564,346,4.5202


## Which Country Will Add the Most People Next Year?

```sql
%%sql
SELECT name, MAX(population * (population_growth * 0.01)) AS max_people_added
FROM facts
WHERE name <> 'World';
```

| name | max_people_added |
|---|---|
| India | 15270686.1248 |

India would add the most people, roughly 15.3 million, assuming its growth rate holds for another year.
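The query multiplies each population by its growth rate after converting the rate from a percentage to a fraction. A one-year projection of this kind assumes the rate stays constant. Below is a minimal sketch of the same arithmetic in Python, using Afghanistan's figures from the LIMIT 5 output. The function name `people_added` is hypothetical.

```python
def people_added(population: int, growth_pct: float) -> float:
    """People added in one year if the population grows by growth_pct percent."""
    return population * (growth_pct * 0.01)

# Afghanistan: population 32,564,342 and population_growth 2.32 (%)
added = people_added(32564342, 2.32)
print(round(added))  # 755493
```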


## Which Countries Have a Higher Death Rate than Birth Rate?

```sql
%%sql
SELECT name, birth_rate, death_rate
FROM facts
WHERE name <> 'World' AND birth_rate < death_rate;
```

| name | birth_rate | death_rate |
|---|---|---|
| Austria | 9.41 | 9.42 |
| Belarus | 10.7 | 13.36 |
| Bosnia and Herzegovina | 8.87 | 9.75 |
| Bulgaria | 8.92 | 14.44 |
| Croatia | 9.45 | 12.18 |
| Czech Republic | 9.63 | 10.34 |
| Estonia | 10.51 | 12.4 |
| Germany | 8.47 | 11.42 |
| Greece | 8.66 | 11.09 |
| Hungary | 9.16 | 12.73 |


Almost all of the countries with a higher death rate than birth rate are European. This could mean that younger generations in Europe are having fewer children than older generations did, leaving aging populations behind. It could also point to a public-health problem.
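Both rates are expressed per 1,000 people, so their difference divided by 10 gives the rate of natural increase as a percentage of the population. The result is negative when deaths outnumber births. This figure ignores migration, so it isn't the same as population_growth. Below is a minimal sketch using three rows from the output above. The helper `natural_increase_pct` is hypothetical.

```python
def natural_increase_pct(birth_rate: float, death_rate: float) -> float:
    """Rate of natural increase in percent, from birth and death rates per 1,000 people."""
    return (birth_rate - death_rate) / 10

# (name, birth_rate, death_rate) from the query output above
rows = [("Austria", 9.41, 9.42), ("Bulgaria", 8.92, 14.44), ("Germany", 8.47, 11.42)]
for name, births, deaths in rows:
    print(name, round(natural_increase_pct(births, deaths), 3))
# Austria -0.001, Bulgaria -0.552, Germany -0.295
```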

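## Population Density

Finally, let's rank countries directly by population density, measured in people per square kilometer:

```sql
%%sql
SELECT name, ROUND(CAST(population AS FLOAT) / CAST(area AS FLOAT), 4) AS population_by_area
FROM facts
ORDER BY population_by_area DESC
LIMIT 20;
```

Only the first 10 of the 20 rows are shown:

| name | population_by_area |
|---|---|
| Macau | 21168.9643 |
| Monaco | 15267.5 |
| Singapore | 8141.2798 |
| Hong Kong | 6445.0415 |
| Gaza Strip | 5191.8194 |
| Gibraltar | 4876.3333 |
| Bahrain | 1771.8592 |
| Maldives | 1319.6409 |
| Malta | 1310.0158 |
| Bermuda | 1299.9259 |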


This list differs significantly from the earlier density screen. That screen kept only countries with above-average population and below-average area, whereas this query ranks every row purely by the ratio of population to area. As a result, this list is dominated by small islands, city-states and territories. These places have small areas, but their populations are below average, so the earlier screen excluded them.
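A few numbers make the difference between the two approaches concrete. Below is a minimal sketch in Python. It uses the averages computed earlier (about 32.2 million people and 555,094 square kilometers) and two hypothetical places with made-up figures. The small one is extremely dense yet fails the above-average/below-average screen, while the large one passes the screen despite a far lower density.

```python
AVG_POPULATION = 32242666.56846473  # from the AVG(population) query above
AVG_AREA = 555093.546184739         # from the AVG(area) query above

def passes_screen(population: float, area: float) -> bool:
    """The earlier screen: above-average population AND below-average area."""
    return population > AVG_POPULATION and area < AVG_AREA

def density(population: float, area: float) -> float:
    """People per square kilometer."""
    return population / area

# Hypothetical small territory (made-up numbers): very dense, but population below average
print(density(30_000, 2), passes_screen(30_000, 2))  # 15000.0 False

# Hypothetical large country (made-up numbers): passes the screen, yet much less dense
print(density(100_000_000, 500_000), passes_screen(100_000_000, 500_000))  # 200.0 True
```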