# Analyzing CIA Factbook Data

In this project, we will explore data sourced from the CIA World Factbook, an extensive repository of statistics pertaining to countries worldwide. Specifically, we will delve into demographic details, such as:

- **Population:** The total population of each country.
- **Population Growth:** Annual population growth rates, represented as percentages.
- **Geographic Area:** The combined land and water areas of each country.

Our data analysis will be performed using SQL within the Jupyter Notebook platform.

### Connect the Jupyter Notebook to the database file

In [16]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

In [18]:
%%sql
SELECT * FROM sqlite_master
     WHERE type='table';

 * sqlite:///factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


In [24]:
%%sql 
SELECT * FROM facts
LIMIT 5

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


### Column Description

- **name** — The name of the country.
- **area** — The country's total area (both land and water).
- **area_land** — The country's land area in square kilometers.
- **area_water** — The country's water area in square kilometers.
- **population** — The country's population.
- **population_growth** — The country's population growth as a percentage.
- **birth_rate** — The country's birth rate, or the number of births per year per 1,000 people.
- **death_rate** — The country's death rate, or the number of deaths per year per 1,000 people.


## Exploring Statistics

In [39]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
FROM facts;

 * sqlite:///factbook.db
Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,7256490011,0.0,4.02


These statistics are interesting and should be investigated further:
- *There is a country with a population of 0*
- *There is a country with a population of 7.2 billion*

### Exploring Outliers

In [40]:
%%sql
SELECT *
FROM facts
WHERE population == (SELECT MIN(population) FROM facts);

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
250,ay,Antarctica,,280000,,0,,,,


The dataset contains a row for Antarctica. A population of 0 makes sense because Antarctica has no permanent residents, even though the dataset still lists it as a "country".

In [41]:
%%sql
SELECT *
FROM facts
WHERE population == (SELECT MAX(population) FROM facts);

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
261,xx,World,,,,7256490011,1.08,18.6,7.8,


The dataset also contains an entry for the whole world, which explains such a large population. With this in mind, we should recalculate our statistics without that row.

In [51]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
FROM facts
WHERE id IS NOT 261

 * sqlite:///factbook.db
Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,1367485388,0.0,4.02
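
The minimum is still 0 because the Antarctica row remains in the table. As a minimal sketch, the exclusion could also be written against `name` instead of a hard-coded `id`. The toy in-memory table below holds only four rows copied from the outputs above, and dropping Antarctica is an assumption about what the analysis wants, not something the notebook does.

```python
import sqlite3

# Toy in-memory table; the four rows are copied from the outputs above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT, population INTEGER)"
)
conn.executemany(
    "INSERT INTO facts (id, name, population) VALUES (?, ?, ?)",
    [
        (1, "Afghanistan", 32564342),
        (4, "Andorra", 85580),
        (250, "Antarctica", 0),
        (261, "World", 7256490011),
    ],
)

# Exclude the aggregate row and the uninhabited row by name.
# Assumption: the names are spelled exactly as in the outputs above.
filtered = conn.execute(
    "SELECT MIN(population), MAX(population) FROM facts "
    "WHERE name NOT IN ('World', 'Antarctica')"
).fetchone()
print(filtered)
```

Filtering on `name` keeps working if the row ids differ in another copy of the database, at the cost of depending on exact spelling.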


### Average Population & Area

In [56]:
%%sql
SELECT 
    ROUND(AVG(population),0) as 'Average Population', 
    ROUND(AVG(area),0) as 'Average Area'
FROM facts
    WHERE id is NOT 261

 * sqlite:///factbook.db
Done.


Average Population,Average Area
32242667.0,555094.0


### Densely Populated Areas

Densely populated areas are defined here as countries with an above-average population but a below-average area, a combination that can lead to overcrowding.

In [64]:
%%sql
SELECT
    name,
    population,
    area
FROM facts
    WHERE (
        (population > (SELECT AVG(population) FROM facts WHERE id IS NOT 261)) AND
        (area < (SELECT AVG(area) FROM facts WHERE id IS NOT 261)))

 * sqlite:///factbook.db
Done.


name,population,area
Bangladesh,168957745,148460
Germany,80854408,357022
Iraq,37056169,438317
Italy,61855120,301340
Japan,126919659,377915
"Korea, South",49115196,99720
Morocco,33322699,446550
Philippines,100998376,300000
Poland,38562189,312685
Spain,48146134,505370


#### Exploratory Analysis

1. What country has the largest population?

In [73]:
%%sql
SELECT 
    name
FROM facts
    WHERE id IS NOT 261
    ORDER BY population DESC
    LIMIT 1

 * sqlite:///factbook.db
Done.


name
China


2. Which countries have a death rate that surpasses the birth rate?

In [86]:
%%sql
SELECT 
    name,
    death_rate
FROM facts
    WHERE birth_rate < death_rate
    ORDER BY death_rate DESC

 * sqlite:///factbook.db
Done.


name,death_rate
Ukraine,14.46
Bulgaria,14.44
Latvia,14.31
Lithuania,14.27
Russia,13.69
Serbia,13.66
Belarus,13.36
Hungary,12.73
Moldova,12.59
Estonia,12.4


3. Which country has the highest/lowest birth rate & death rate?

In [80]:
%%sql
SELECT
    (SELECT name FROM facts ORDER BY birth_rate) AS 'lowest birth rate',
    (SELECT name FROM facts ORDER BY birth_rate DESC) AS 'highest birth rate',
    (SELECT name FROM facts ORDER BY death_rate) AS 'lowest death rate',
    (SELECT name FROM facts ORDER BY death_rate DESC) AS 'highest death rate'
FROM facts
LIMIT 1


 * sqlite:///factbook.db
Done.


lowest birth rate,highest birth rate,lowest death rate,highest death rate
Kosovo,Niger,Kosovo,Lesotho
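
SQLite treats `NULL` as smaller than any other value when sorting, so an ascending `ORDER BY` places rows with a missing rate first. Kosovo appearing under both "lowest" columns may therefore reflect missing data and not genuinely low rates; this has not been verified against the database here. The sketch below illustrates the effect on a toy in-memory table, where the `NoData` row is hypothetical and the other two rows are taken from the sample output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Andorra", 8.13),
        ("Angola", 38.78),
        ("NoData", None),  # hypothetical row with a missing birth rate
    ],
)

# Ascending sort without a NULL guard: the NULL row comes first.
naive = conn.execute(
    "SELECT name FROM facts ORDER BY birth_rate LIMIT 1"
).fetchone()[0]

# Same query, but rows with a missing rate are filtered out first.
guarded = conn.execute(
    "SELECT name FROM facts WHERE birth_rate IS NOT NULL "
    "ORDER BY birth_rate LIMIT 1"
).fetchone()[0]

print(naive, guarded)
```

Adding `WHERE birth_rate IS NOT NULL` (and the equivalent for `death_rate`) to the ascending subqueries would be one way to guard against this.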


4. How many countries have more water area than land area, and how many have more land than water?

In [82]:
%%sql
SELECT
    (SELECT COUNT(name) FROM facts WHERE area_land > area_water) AS 'Land Dominated',
    (SELECT COUNT(name) FROM facts WHERE area_land < area_water) AS 'Water Dominated'
FROM facts
LIMIT 1

 * sqlite:///factbook.db
Done.


Land Dominated,Water Dominated
239,2


In [85]:
%%sql
SELECT 
    name,
    area_land,
    area_water
FROM facts
WHERE area_water>area_land;

 * sqlite:///factbook.db
Done.


name,area_land,area_water
British Indian Ocean Territory,60,54340
Virgin Islands,346,1564


5. Accuracy of Densely Populated Areas

In [96]:
%%sql
SELECT
    name,
    population,
    area,
    ROUND(population/area,2) AS 'density'
FROM facts
ORDER BY population/area DESC
LIMIT 12


 * sqlite:///factbook.db
Done.


name,population,area,density
Macau,592731,28,21168.0
Monaco,30535,2,15267.0
Singapore,5674472,697,8141.0
Hong Kong,7141106,1108,6445.0
Gaza Strip,1869055,360,5191.0
Gibraltar,29258,6,4876.0
Bahrain,1346613,760,1771.0
Maldives,393253,298,1319.0
Malta,413965,316,1310.0
Bermuda,70196,54,1299.0


This list answers the density question more accurately than the earlier analysis. It divides each country's population by its area directly, as opposed to comparing each value with the average. The resulting ranking is comparable to figures from online sources, which supports its accuracy.
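
One caveat about the density query: `population` and `area` are both declared as integers, and SQLite performs integer division when both operands are integers. That is why every density value above ends in `.0` even though `ROUND(..., 2)` was requested. The sketch below uses Macau's figures from the output and shows one possible fix, casting to `REAL` before dividing; it is an illustration and was not run against `factbook.db`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Macau's population and area, copied from the output above.
integer_density, real_density = conn.execute(
    "SELECT "
    "ROUND(592731 / 28, 2), "                # integer division, then ROUND
    "ROUND(CAST(592731 AS REAL) / 28, 2)"    # floating-point division
).fetchone()

print(integer_density, real_density)
```

In the notebook query this would correspond to writing `CAST(population AS REAL) / area` in both the `SELECT` list and the `ORDER BY` clause.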