# Analyze CIA Factbook Data

This project analyzes data from the CIA World Factbook.

Dataset: https://www.cia.gov/library/publications/the-world-factbook/

For each country, the data include population, land and water area, and population growth rate.

## Index 

- SQL Setup & Exploring the *facts* dataset
- Summary Statistics for the *facts* table
- Finding Densely Populated Countries
- Population and Growth Rate
- Exploring Water-to-Land Ratio
- Comparing Death Rates and Birth Rates
- Migration Rate

## SQL Setup & Exploring the *facts* dataset

In [18]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

'Connected: None@factbook.db'

We need to see the tables in the database and the column names. 

In [19]:
%%sql
SELECT *
    FROM sqlite_master
    WHERE type = 'table';

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
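
As a complement to reading the raw `sql` column above, the column names and declared types can also be listed one per row. The following is a minimal sketch using Python's built-in `sqlite3` module; it assumes an in-memory stand-in database built from the `CREATE TABLE` statement shown above rather than the real `factbook.db`.

```python
import sqlite3

# Assumption: an in-memory stand-in for factbook.db, created from the
# CREATE TABLE statement reported by sqlite_master above.
conn = sqlite3.connect(":memory:")
conn.execute(
    '''CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
       "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL,
       "area" integer, "area_land" integer, "area_water" integer,
       "population" integer, "population_growth" float, "birth_rate" float,
       "death_rate" float, "migration_rate" float)'''
)

# PRAGMA table_info yields one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(facts)")]
for name, col_type in columns:
    print(f"{name}: {col_type}")
```

To run this against the real database, the `connect` call would point at the `factbook.db` file instead and the `CREATE TABLE` step would be dropped.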


The *facts* table is the table I will be analyzing. The following are the first 5 rows of the *facts* table.

In [20]:
%%sql
SELECT *
    FROM facts
    LIMIT 5;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


## Summary Statistics for the *facts* table

The summary statistics are the smallest population, largest population, smallest population growth rate, and the largest population growth rate.


In [21]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
    FROM facts;


Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,7256490011,0.0,4.02


The summary statistics for the population and population growth columns reveal outliers: the minimum population (0) and the maximum population (over 7 billion) are both implausible for a single country. The next step is to find which rows hold the minimum and maximum values.


In [22]:
%%sql
SELECT name
    FROM facts
    WHERE population = (SELECT MIN(population)
                           FROM facts);

Done.


name
Antarctica


In [23]:
%%sql
SELECT name
    FROM facts
    WHERE population = (SELECT MAX(population)
                           FROM facts);

Done.


name
World


The minimum value belongs to Antarctica, which is a continent rather than a country, and the maximum value is the population of the entire world. Neither is an accurate reflection of which countries have the lowest and highest populations. As a result, the next step is to omit *World* from the summary statistics. The query below filters out only the row with the maximum population, so *Antarctica* remains and the minimum stays at 0.

In [24]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
    FROM facts
    WHERE population != (SELECT MAX(population)
                            FROM facts);

Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,1367485388,0.0,4.02


After modifying the summary statistics to eliminate the 'World' population data point, the maximum population is around 1.37 billion.
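
Filtering by name rather than by the maximum value would remove both non-country rows at once. The sketch below is one way to do that, not the notebook's own method. It uses an in-memory table with a handful of rows copied from the outputs in this notebook; values that are not shown above are left as `NULL`, and the resulting minimum reflects only this sample, not the full table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Andorra", 85580, 0.12),
        ("China", 1367485388, 0.45),
        ("South Sudan", 12042910, 4.02),
        ("Antarctica", 0, None),       # growth rate not shown above, left NULL
        ("World", 7256490011, None),   # growth rate not shown above, left NULL
    ],
)

# Exclude both non-country rows explicitly by name.
result = conn.execute(
    """
    SELECT MIN(population), MAX(population)
      FROM facts
     WHERE name NOT IN ('World', 'Antarctica')
    """
).fetchone()
print(result)
```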

The next step is to compute the average population and the average total (land and water) area while omitting the 'World' row.

In [25]:
%%sql
SELECT AVG(population) avg_population, AVG(area) avg_land_area
    FROM facts
    WHERE population != (SELECT MAX(population)
                            FROM facts);

Done.


avg_population,avg_land_area
32242666.56846473,582949.8523206752


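The average population is around 32 million, and the average total (land and water) area is around 583,000 square kilometers. Despite its alias, the `avg_land_area` column averages the `area` column, which covers both land and water.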

Building on the average population and average area results, the next query finds the countries where the population is above average AND the area is below average.

## Finding Densely Populated Countries

In [26]:
%%sql
SELECT *
    FROM facts
    WHERE population > (SELECT AVG(population)
                           FROM facts
                           WHERE name <> 'World') 
        AND area < (SELECT AVG(area)
                        FROM facts
                        WHERE name <> 'World');

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
14,bg,Bangladesh,148460,130170,18290,168957745,1.6,21.14,5.61,0.46
65,gm,Germany,357022,348672,8350,80854408,0.17,8.47,11.42,1.24
80,iz,Iraq,438317,437367,950,37056169,2.93,31.45,3.77,1.62
83,it,Italy,301340,294140,7200,61855120,0.27,8.74,10.19,4.1
85,ja,Japan,377915,364485,13430,126919659,0.16,7.93,9.51,0.0
91,ks,"Korea, South",99720,96920,2800,49115196,0.14,8.19,6.75,0.0
120,mo,Morocco,446550,446300,250,33322699,1.0,18.2,4.81,3.36
138,rp,Philippines,300000,298170,1830,100998376,1.61,24.27,6.11,2.09
139,pl,Poland,312685,304255,8430,38562189,0.09,9.74,10.19,0.46
163,sp,Spain,505370,498980,6390,48146134,0.89,9.64,9.04,8.31


Some of the densely populated countries are Bangladesh, Germany, Iraq, the Philippines, Thailand, the United Kingdom, and Vietnam.
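
The filter above treats a country as dense when its population is above average and its area is below average. A more direct measure is people per unit of area. The following is a minimal sketch of that alternative, using a few rows copied from the output above; the `density` alias is an illustrative name, not a column in the dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Bangladesh", 148460, 168957745),
        ("Germany", 357022, 80854408),
        ("Japan", 377915, 126919659),
        ("Korea, South", 99720, 49115196),
        ("Spain", 505370, 48146134),
    ],
)

# CAST avoids integer division; rows with a NULL or zero area are skipped.
rows = conn.execute(
    """
    SELECT name, CAST(population AS REAL) / area AS density
      FROM facts
     WHERE area > 0
     ORDER BY density DESC
    """
).fetchall()
for name, density in rows:
    print(f"{name}: {density:.1f} people per unit of area")
```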

## Population and Growth Rate

This step will examine the most populous country and the country with the highest growth rate.

In [27]:
%%sql
SELECT * 
    FROM facts
    WHERE name <> 'World'
    ORDER BY population DESC
    LIMIT 1;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
37,ch,China,9596960,9326410,270550,1367485388,0.45,12.49,7.53,0.44


In [28]:
%%sql
SELECT * 
    FROM facts
    WHERE name <> 'World'
    ORDER BY population_growth DESC
    LIMIT 1;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
162,od,South Sudan,644329,,,12042910,4.02,36.91,8.18,11.47


The country with the highest population is China, with about 1.37 billion people; however, the country with the highest population growth rate is South Sudan, at 4.02%.

## Exploring Water-to-Land Ratio

This section pulls the countries with the highest ratios of water to land, and the countries that have more water than land. 

In [29]:
%%sql
SELECT name,
       area_land,
       area_water,
       CAST(area_water AS Float) / CAST(area_land AS Float) AS water_land_ratio
    FROM facts
    WHERE water_land_ratio > 0.0
    ORDER BY water_land_ratio DESC
    LIMIT 10;

Done.


name,area_land,area_water,water_land_ratio
British Indian Ocean Territory,60,54340,905.6666666666666
Virgin Islands,346,1564,4.520231213872832
Puerto Rico,8870,4921,0.5547914317925592
"Bahamas, The",10010,3870,0.3866133866133866
Guinea-Bissau,28120,8005,0.2846728307254623
Malawi,94080,24404,0.2593962585034013
Netherlands,33893,7650,0.2257103236656536
Uganda,197100,43938,0.2229223744292237
Eritrea,101000,16600,0.1643564356435643
Liberia,96320,15049,0.1562396179401993


The countries with the highest water-to-land ratios are the British Indian Ocean Territory, the Virgin Islands, Puerto Rico, The Bahamas, and Guinea-Bissau. Only the British Indian Ocean Territory and the Virgin Islands have a ratio above 1, meaning they have more water than land.
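
The "more water than land" question can also be answered with a direct comparison instead of reading it off the ratio. This is a minimal sketch on a few rows copied from the outputs above (South Sudan's land and water areas are missing in the dataset, so they are `NULL` here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("British Indian Ocean Territory", 60, 54340),
        ("Virgin Islands", 346, 1564),
        ("Puerto Rico", 8870, 4921),
        ("Afghanistan", 652230, 0),
        ("South Sudan", None, None),  # missing values, as in the output above
    ],
)

# A comparison involving NULL is never true, so rows with missing areas drop out.
more_water = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM facts WHERE area_water > area_land ORDER BY name"
    )
]
print(more_water)
```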

## Comparing Death Rates and Birth Rates

Which countries have a higher death rate than birth rate and vice versa?

In [30]:
%%sql
SELECT name,
       death_rate,
       birth_rate
FROM facts;

Done.


name,death_rate,birth_rate
Afghanistan,13.89,38.57
Albania,6.58,12.92
Algeria,4.31,23.67
Andorra,6.96,8.13
Angola,11.49,38.78
Antigua and Barbuda,5.69,15.85
Argentina,7.33,16.64
Armenia,9.34,13.61
Australia,7.14,12.15
Austria,9.42,9.41


The birth_rate and death_rate columns contain missing (NULL) values, which the notebook displays as 'None', so the queries comparing these columns have to take them into account.
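
Missing values are stored as SQL `NULL`, and Python shows them as `None`. The sketch below illustrates how `NULL` behaves in comparisons; it uses two rows from the output above plus one hypothetical row (`Hypothetical Territory`) that exists only to carry missing values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL, death_rate REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 38.57, 13.89),
        ("Austria", 9.41, 9.42),
        ("Hypothetical Territory", None, None),  # hypothetical row with NULL rates
    ],
)

# "= NULL" never matches anything, because any comparison with NULL yields NULL.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM facts WHERE birth_rate = NULL"
).fetchone()[0]

# IS NULL / IS NOT NULL are the standard tests for missing values.
is_null = conn.execute(
    "SELECT COUNT(*) FROM facts WHERE birth_rate IS NULL"
).fetchone()[0]
not_null = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM facts "
        "WHERE birth_rate IS NOT NULL AND death_rate IS NOT NULL "
        "ORDER BY name"
    )
]
print(equals_null, is_null, not_null)
```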

With this next query, I seek to determine which countries have a higher death rate than birth rate. 

In [31]:
%%sql
SELECT name,
       death_rate,
       birth_rate,
       CAST(death_rate AS Float) - CAST(birth_rate AS Float) death_birth_diff
FROM facts
WHERE birth_rate IS NOT NULL
    AND death_rate IS NOT NULL
    AND death_rate > birth_rate
ORDER BY death_birth_diff DESC;

Done.


name,death_rate,birth_rate,death_birth_diff
Bulgaria,14.44,8.92,5.52
Serbia,13.66,9.08,4.58
Latvia,14.31,10.0,4.3100000000000005
Lithuania,14.27,10.1,4.17
Ukraine,14.46,10.72,3.74
Hungary,12.73,9.16,3.5700000000000003
Germany,11.42,8.47,2.9499999999999997
Slovenia,11.37,8.42,2.9499999999999997
Romania,11.9,9.14,2.76
Croatia,12.18,9.45,2.7300000000000004


There are 24 countries in the factbook that have death rates higher than birth rates. The top 5 countries with the largest difference between their death rates and their birth rates are *Bulgaria, Serbia, Latvia, Lithuania, and Ukraine.* Furthermore, most of the countries with higher death rates than birth rates are located in Europe. 
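
A count like the one above can be obtained with `COUNT(*)` rather than by counting result rows by hand. This is a minimal sketch on five rows copied from the outputs in this notebook, so it returns the count for this sample only, not the figure for the full table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, death_rate REAL, birth_rate REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Bulgaria", 14.44, 8.92),
        ("Serbia", 13.66, 9.08),
        ("Austria", 9.42, 9.41),
        ("Afghanistan", 13.89, 38.57),
        ("Albania", 6.58, 12.92),
    ],
)

# Rows with a NULL rate would be excluded automatically by the comparison.
count = conn.execute(
    "SELECT COUNT(*) FROM facts WHERE death_rate > birth_rate"
).fetchone()[0]
print(count)
```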

The next query shows which countries have a birth rate higher than their death rate.

In [32]:
%%sql
SELECT name, 
       birth_rate,
       death_rate,
       CAST(birth_rate AS Float) - CAST(death_rate AS Float) birth_death_diff
FROM facts
WHERE birth_rate IS NOT NULL
    AND death_rate IS NOT NULL
    AND birth_rate > death_rate
ORDER BY birth_death_diff DESC
LIMIT 10;
       

Done.


name,birth_rate,death_rate,birth_death_diff
Malawi,41.56,8.41,33.150000000000006
Uganda,43.79,10.69,33.1
Niger,45.45,12.42,33.03
Burundi,42.01,9.27,32.739999999999995
Mali,44.99,12.89,32.1
Burkina Faso,42.03,11.72,30.31
Zambia,42.13,12.67,29.46
Ethiopia,37.27,8.19,29.080000000000005
South Sudan,36.91,8.18,28.73
Tanzania,36.39,8.0,28.39


Far more countries have a birth rate that is higher than their death rate. The top 5 countries with the largest differences between their birth rate and death rate are *Malawi, Uganda, Niger, Burundi, and Mali.* All five are located on the African continent.

## Migration Rate

In this final section, I want to explore countries and their migration rates. Which countries have the highest or lowest migration rates?

In [33]:
%%sql
SELECT name,
       migration_rate
FROM facts
WHERE migration_rate IS NOT NULL
ORDER BY migration_rate DESC
LIMIT 5;

Done.


name,migration_rate
Qatar,22.39
American Samoa,21.13
"Micronesia, Federated States of",20.93
Syria,19.79
Tonga,17.84


In [34]:
%%sql 
SELECT name,
       migration_rate
FROM facts
WHERE migration_rate = 0.0;

Done.


name,migration_rate
Andorra,0.0
Argentina,0.0
Azerbaijan,0.0
"Bahamas, The",0.0
Belize,0.0
Benin,0.0
Bhutan,0.0
Burkina Faso,0.0
Burundi,0.0
Central African Republic,0.0


The country with the highest migration rate is Qatar at 22.39 migrants per 1,000 people (the Factbook reports this rate per 1,000 population, not as a percentage), which means that far more people are entering the country than leaving it. Several countries are tied with a rate of 0.0.