# Analyzing CIA Factbook Data Using SQL

The World Factbook is maintained by the US Government and is one of its most accessed documents.

"The World Factbook provides basic intelligence on the history, people, government, economy, energy, geography, environment, communications, transportation, military, terrorism, and transnational issues for 266 world entities."

Visit the <a href="https://www.cia.gov/the-world-factbook/">World Factbook website</a> for more information.

#### Purpose
This project explores how people are distributed throughout the world, using Factbook data on each country's land area, population, and population growth, as well as its birth and death rates.

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db
    

'Connected: None@factbook.db'

In [3]:
%%sql
SELECT * FROM sqlite_master
WHERE type='table';

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
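
Outside the notebook, the same schema inspection can be sketched with Python's built-in `sqlite3` module. The example below is only an illustration: it builds an in-memory database with a cut-down, hypothetical version of the `facts` table instead of opening `factbook.db`.

```python
import sqlite3

# Assumption: an in-memory stand-in for factbook.db with a reduced schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "
    "code varchar(255) NOT NULL, "
    "name varchar(255) NOT NULL, "
    "population integer)"
)

# List the tables, as the sqlite_master query above does.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
)]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(facts)")]

print(tables)
print(columns)
```

`PRAGMA table_info` gives the column names and declared types as rows, which can be easier to read than the raw `CREATE TABLE` text.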


In [4]:
%%sql
SELECT * 
    FROM facts
    LIMIT 5;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


In [6]:
%%sql
SELECT MIN(population),Max(population),MIN(population_growth),MAX(population_growth)
 FROM facts

Done.


MIN(population),Max(population),MIN(population_growth),MAX(population_growth)
0,7256490011,0.0,4.02


In [7]:
%%sql
SELECT name,population 
 FROM facts
WHERE population=(SELECT MIN(population)
                 FROM facts)

Done.


name,population
Antarctica,0


In [8]:
%%sql
SELECT name,population 
 FROM facts
WHERE population=(SELECT MAX(population)
                 FROM facts)

Done.


name,population
World,7256490011


The results above show that the minimum and maximum populations belong to Antarctica and World, neither of which is a country. To find the countries with the smallest and largest populations, we need to exclude World, Antarctica, and European Union from our queries. We identified these three names by exploring the <a href="https://www.cia.gov/the-world-factbook/">World Factbook website</a>.
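
The filtering described above can be sketched with Python's `sqlite3` module. The rows below are a small sample: the World, Antarctica, Pitcairn Islands, and China figures come from outputs in this notebook, while the `Example Sea` row with a missing population is a hypothetical stand-in for entries that have no population value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("World", 7256490011),
        ("Antarctica", 0),
        ("Pitcairn Islands", 48),
        ("China", 1367485388),
        ("Example Sea", None),  # hypothetical row with no population value
    ],
)

# Keep only inhabited countries or territories: drop the aggregate rows,
# missing populations, and zero populations.
query = """
    SELECT name, population
    FROM facts
    WHERE name NOT IN ('World', 'European Union', 'Antarctica')
      AND population IS NOT NULL
      AND population > 0
    ORDER BY population {direction}
    LIMIT 1
"""
least = conn.execute(query.format(direction="ASC")).fetchone()
most = conn.execute(query.format(direction="DESC")).fetchone()
print(least, most)
```

`IS NOT NULL` filters missing values explicitly, whereas comparing a column with a placeholder string such as `'None'` only drops them as a side effect of SQL's NULL comparison rules.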

### Country Having the Lowest Population

In [19]:
%%sql
SELECT name,population,population_growth
 FROM facts
    WHERE (name!='Antarctica' AND population IS NOT NULL AND population!=0)
    ORDER BY population ASC
    LIMIT 1

Done.


name,population,population_growth
Pitcairn Islands,48,0.0


According to the <a href="https://www.cia.gov/the-world-factbook/countries/pitcairn-islands/">CIA Factbook</a>, the Pitcairn Islands are an overseas territory of the UK, <a href="https://www.cia.gov/the-world-factbook/countries/pitcairn-islands/locator-map">located</a> in the South Pacific Ocean about midway between Peru and New Zealand.

The islands are inhabited by descendants of the Bounty mutineers and their Tahitian wives. The territory has a GDP of 0, and the inhabitants rely on fishing, subsistence farming, handicrafts, and postage stamps as sources of food and income.

### Country Having the Highest Population

In [21]:
%%sql
SELECT name,population,population_growth
 FROM facts
    WHERE (name!='World' AND name!='European Union' AND population IS NOT NULL AND population!=0)
    ORDER BY population DESC
    LIMIT 1

Done.


name,population,population_growth
China,1367485388,0.45


China has the world's largest population, but its population density is lower than that of some countries in Asia and Europe. Most of the population is concentrated along the rivers and in the industrial cities of Beijing and Shenyang.

### Country Having the Highest Population Growth Rate

In [30]:
%%sql
SELECT name,population,population_growth
 FROM facts
    WHERE (name!='World' AND name!='European Union' AND population IS NOT NULL AND population!=0)
    ORDER BY population_growth DESC
    LIMIT 1

Done.


name,population,population_growth
South Sudan,12042910,4.02


South Sudan is located in East-Central Africa: south of Sudan, north of Uganda and Kenya, and west of Ethiopia. The population's age structure shows that 41% of the people are aged 0-14, which suggests that the country's already burdened education system will face a new challenge in the future.

### Finding Densely Populated Countries

These countries have the following characteristics:
<ul>
    <li>Above-average values for population</li>
    <li>Below-average values for area</li>
</ul>

In [31]:
%%sql
SELECT name,population,area
 FROM facts
    WHERE (population>(SELECT AVG(CAST(population as FLOAT))
                      FROM facts 
                      WHERE name!='World' AND name!='European Union' )) 
          AND area<(SELECT AVG(CAST(area as FLOAT))
                    FROM facts
                    WHERE name!='World' AND name!='European Union')
    ORDER BY population DESC
    
    

Done.


name,population,area
Bangladesh,168957745,148460
Japan,126919659,377915
Philippines,100998376,300000
Vietnam,94348835,331210
Germany,80854408,357022
Thailand,67976405,513120
United Kingdom,64088222,243610
Italy,61855120,301340
"Korea, South",49115196,99720
Spain,48146134,505370


The majority of the countries in this list are located in Asia.
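
The filter above is a proxy for density. As a rough cross-check, density can also be computed directly as population divided by area. The sketch below does this with Python's `sqlite3` module on three rows copied from the output above; it is an illustration on a sample, not a rerun on the full table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Bangladesh", 168957745, 148460),
        ("Japan", 126919659, 377915),
        ("Germany", 80854408, 357022),
    ],
)

# CAST to REAL so that SQLite does not truncate the integer division.
density = conn.execute(
    """
    SELECT name, ROUND(CAST(population AS REAL) / area, 1) AS people_per_sq_km
    FROM facts
    ORDER BY people_per_sq_km DESC
    """
).fetchall()
for name, value in density:
    print(name, value)
```

Ranking by this ratio compares countries of very different sizes on the same scale, which the above-average/below-average filter does not do.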

### Countries Having the Highest Ratio of Water to Land

In [27]:
%%sql
SELECT name,area_water,area_land,
       ROUND(CAST(area_water AS FLOAT)/area_land,2) AS ratio_water_to_land
FROM facts
WHERE area_water>area_land
ORDER BY ratio_water_to_land DESC

Done.


name,area_water,area_land,ratio_water_to_land
British Indian Ocean Territory,54340,60,905.67
Virgin Islands,1564,346,4.52


The values for area_water and area_land are in square kilometers (sq km).

<b>British Indian Ocean Territory</b>:

It is located south of India, about halfway between Africa and Indonesia. The islands are now inhabited by 3,000 UK and US soldiers.

<b>Virgin Islands</b>:

These islands are found between the Caribbean Sea and the North Atlantic Ocean, east of Puerto Rico. They are part of the United States but have their own local self-government. The population of these islands is 105,780.

### Top 5 Countries Having the Highest Population Increment Next Year

In [33]:
%%sql
SELECT name,
       population AS current_population,
       population_growth AS annual_growth_rate,
       (CAST(population AS FLOAT)*population_growth)/100 AS annual_population_increment,
       ((CAST(population AS FLOAT)*population_growth)/100)+population AS population_next_year
FROM facts
WHERE name!='World' AND name!='European Union'
ORDER BY annual_population_increment DESC
LIMIT 5


Done.


name,current_population,annual_growth_rate,annual_population_increment,population_next_year
India,1251695584,1.22,15270686.1248,1266966270.1248
China,1367485388,0.45,6153684.246,1373639072.246
Nigeria,181562056,2.45,4448270.372,186010326.372
Pakistan,199085847,1.46,2906653.3662,201992500.3662
Ethiopia,99465819,2.89,2874562.1691,102340381.1691
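
As a worked check of the arithmetic, the increment is `population * population_growth / 100`, because the growth rate is expressed as a percentage. The short Python sketch below reproduces India's row from the output above; the function name is only illustrative.

```python
def annual_increment(population: int, growth_rate_percent: float) -> float:
    """People added in one year at the given percentage growth rate."""
    return population * growth_rate_percent / 100


india_population = 1251695584
india_growth = 1.22  # percent per year, from the output above

increment = annual_increment(india_population, india_growth)
next_year = india_population + increment
print(round(increment, 4), round(next_year, 4))
```

This is a one-year projection that assumes the current growth rate holds unchanged.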


### Top 5 Countries Having a Death Rate Higher Than the Birth Rate

In [41]:
%%sql
SELECT name,death_rate,birth_rate
FROM facts
WHERE death_rate>birth_rate
ORDER BY death_rate DESC
LIMIT 5

Done.


name,death_rate,birth_rate
Ukraine,14.46,10.72
Bulgaria,14.44,8.92
Latvia,14.31,10.0
Lithuania,14.27,10.1
Russia,13.69,11.6


### Analyzing the Countries by Population Groups

We will be classifying the dataset into the following population groups:
<ul>
    <li>Below 1M</li>
    <li>Between 1M & 10M</li>
    <li>Between 10M & 100M</li>
    <li>Above 100M</li>
</ul>

In [85]:
%%sql
SELECT  CASE
        WHEN population<1000000 THEN 'Below 1M'
        WHEN (population>=1000000 AND population<10000000) THEN 'Between 1M & 10M'
        WHEN (population>=10000000 AND population<100000000) THEN 'Between 10M & 100M'
        WHEN (population>=100000000) THEN 'Above 100M'
        ELSE 'None'
        END AS population_group,
        COUNT(*) AS total_countries,
        SUM(population) AS total_group_population,
        ROUND(AVG(population),2) AS avg_group_population,
        MAX(population) AS max_group_population,
        ROUND(AVG(population/area),2) AS group_population_density,
        ROUND(AVG(area_water/area_land),2) AS group_water_to_land_ratio,
        ROUND(AVG(death_rate),2) AS group_avg_death_rate,
        ROUND(AVG(birth_rate),2) AS group_avg_birth_rate,
        name AS most_populated_country
        FROM facts
        WHERE name!='European Union' AND name!='World'
        GROUP BY population_group
        

        
                        


Done.


population_group,total_countries,total_group_population,avg_group_population,max_group_population,group_population_density,group_water_to_land_ratio,group_avg_death_rate,group_avg_birth_rate,most_populated_country
Above 100M,12,4442487587.0,370207298.92,1367485388.0,252.92,0.0,7.96,18.3,China
Below 1M,80,15335372.0,191692.15,909389.0,756.84,0.05,6.69,16.13,Fiji
Between 10M & 100M,76,2463170924.0,32410143.74,99465819.0,119.19,0.0,8.32,23.46,Ethiopia
Between 1M & 10M,72,335539315.0,4660268.26,9897541.0,398.18,0.0,8.29,18.24,Hungary
,19,,,,,75.42,,,Southern Ocean
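
One caveat about the density and water-to-land columns above: `population/area` and `area_water/area_land` divide integer columns, and SQLite truncates integer division before `AVG` is applied, which is consistent with the `0.0` ratios in the output. The sketch below, using Python's `sqlite3` module and two rows taken from the `LIMIT 5` preview earlier in this notebook, contrasts the truncated average with one computed after casting to `REAL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Albania", 27398, 1350),
        ("Afghanistan", 652230, 0),
    ],
)

# First column: integer division per row, then AVG.
# Second column: the same ratio computed in floating point.
truncated, exact = conn.execute(
    """
    SELECT AVG(area_water / area_land),
           AVG(CAST(area_water AS REAL) / area_land)
    FROM facts
    """
).fetchone()
print(truncated, exact)
```

Here the truncated average is zero because every per-row ratio is below 1, while the floating-point version keeps the fractional part.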


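The query below attempts to list the least populated country of each group. Note that it selects `name` without a `MIN(population)` aggregate, so SQLite takes the name from an arbitrary row of each group. The names in the output are therefore not guaranteed to be the least populated: the United States, for example, is not the least populous country above 100M.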

In [86]:
%%sql
SELECT CASE
                WHEN population<1000000 THEN 'Below 1M'
                WHEN (population>=1000000 AND population<10000000) THEN 'Between 1M & 10M'
                WHEN (population>=10000000 AND population<100000000) THEN 'Between 10M & 100M'
                WHEN (population>=100000000) THEN 'Above 100M'
                ELSE 'None'
                END AS population_group, name AS least_populated_country
                FROM facts
                WHERE name!='World' AND name!='European Union'
                GROUP BY population_group
                ORDER BY population ASC

Done.


population_group,least_populated_country
,Southern Ocean
Below 1M,Western Sahara
Between 1M & 10M,West Bank
Between 10M & 100M,Taiwan
Above 100M,United States
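
Because the query above has no aggregate tied to `name`, one possible refinement is to select `MIN(population)` alongside it. SQLite documents that when a query contains a single `MIN()` or `MAX()` aggregate, bare columns such as `name` are taken from the row holding that minimum or maximum. The sketch below uses Python's `sqlite3` module with a handful of rows copied from earlier outputs and filters out zero and missing populations; it is an illustration on a sample, not a rerun on the full table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("China", 1367485388),
        ("Philippines", 100998376),
        ("Japan", 126919659),
        ("Ethiopia", 99465819),
        ("Albania", 3029278),
        ("Andorra", 85580),
        ("Pitcairn Islands", 48),
        ("Antarctica", 0),
    ],
)

# MIN(population) makes SQLite take `name` from the minimum row of each group.
rows = conn.execute(
    """
    SELECT CASE
               WHEN population < 1000000 THEN 'Below 1M'
               WHEN population < 10000000 THEN 'Between 1M & 10M'
               WHEN population < 100000000 THEN 'Between 10M & 100M'
               ELSE 'Above 100M'
           END AS population_group,
           name AS least_populated_country,
           MIN(population) AS min_group_population
    FROM facts
    WHERE population IS NOT NULL AND population > 0
    GROUP BY population_group
    ORDER BY min_group_population
    """
).fetchall()
for row in rows:
    print(row)
```

Within this sample, the Philippines is returned for the group above 100M and the Pitcairn Islands for the group below 1M. The bare-column behavior is specific to SQLite, so other databases would need a join or a window function instead.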


## Conclusion

Summarising the analysis above:
<ul>
    <li>The most populous country is <b>China</b>, with a population of about 1.37 billion</li>
    <li>The least populous country is the <b>Pitcairn Islands</b>, with a population of 48 and a growth rate of 0</li>
    <li><b>South Sudan</b> has the highest population growth rate</li>
    <li><b>Asia</b> accounts for most of the countries with above-average population and below-average area, with Bangladesh, Japan, the Philippines, and Vietnam at the top of the list</li>
    <li>The five countries with the highest death rates among those where the death rate exceeds the birth rate (Ukraine, Bulgaria, Latvia, Lithuania, and Russia) are all in <b>Europe</b></li>
    <li>Group Analysis Summary:
        <ul>
            <li><b>12</b> countries have a population <b>Above 100M</b>, with <b>China</b> being the most populous</li>
            <li><b>76</b> countries have a population <b>Between 10M & 100M</b>, with <b>Ethiopia</b> being the most populous</li>
            <li><b>72</b> countries have a population <b>Between 1M & 10M</b>, with <b>Hungary</b> being the most populous</li>
            <li><b>80</b> countries have a population <b>Below 1M</b>, with <b>Fiji</b> being the most populous</li>
            <li>The least populous country of each group could not be determined reliably: the query used returns an arbitrary country per group (for example, the <b>United States</b> for the group above 100M, even though smaller countries such as the Philippines belong to the same group)</li>
        </ul>
    </li>
    <li> Population projections indicate that <b>India</b> will have the highest population increment next year followed by China, Nigeria, Pakistan, and Ethiopia</li>
</ul>