# Insights from the CIA World Factbook: Patterns and Relationships Among Country Statistics

This project uses SQL to analyze data from the CIA World Factbook, a compendium of statistics about the countries of the world. Our goal is to identify patterns and relationships among country statistics, including population, development, and natural population decline.

Our analysis produced several findings. Among countries with above-average population and below-average area, the four most populous are in Asia. Countries with above-average populations and high population growth rates tend to be less economically developed. Finally, the countries with the highest rates of natural population decline are mostly in Eastern Europe, led by countries in the Balkans and the Baltics. These findings show how a few SQL queries can surface global demographic patterns.

In [1]:
# Installing ipython-sql
!conda install -yc conda-forge ipython-sql


In [2]:
%%capture
# Load the SQL extension
%load_ext sql
# Connect to the factbook.db database
%sql sqlite:///factbook.db

We will examine the metadata of the database first.

In [3]:
%%sql
SELECT *
FROM sqlite_master
WHERE type='table';

 * sqlite:///factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
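
The `CREATE TABLE` statement above is hard to read as a single line. As a minimal sketch, the same schema can be listed column by column with SQLite's `PRAGMA table_info`. The example uses Python's built-in `sqlite3` module and an in-memory database that recreates the `facts` schema, so it does not depend on `factbook.db`; against the real file, you would connect to `factbook.db` instead.

```python
import sqlite3

# In-memory database with the facts schema copied from the sqlite_master output.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE facts (
        id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        code varchar(255) NOT NULL,
        name varchar(255) NOT NULL,
        area integer,
        area_land integer,
        area_water integer,
        population integer,
        population_growth float,
        birth_rate float,
        death_rate float,
        migration_rate float
    )
""")

# PRAGMA table_info yields one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(facts)")]

for col_name, col_type in columns:
    print(f"{col_name:20} {col_type}")
```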


Now we will take a look at the first 5 rows of the facts table.

In [6]:
%%sql
SELECT *
FROM facts
LIMIT 5;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


We will start by calculating some summary statistics and looking for outliers.

In [8]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
FROM facts;

 * sqlite:///factbook.db
Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,7256490011,0.0,4.02


One row has a population of 0, and another has a population of over 7.2 billion. We will look at both.

In [12]:
%%sql
SELECT name, population
FROM facts
WHERE population=0;

 * sqlite:///factbook.db
Done.


name,population
Antarctica,0


In [11]:
%%sql
SELECT name, population
FROM facts
ORDER BY population DESC
LIMIT 10;

 * sqlite:///factbook.db
Done.


name,population
World,7256490011
China,1367485388
India,1251695584
European Union,513949445
United States,321368864
Indonesia,255993674
Brazil,204259812
Pakistan,199085847
Nigeria,181562056
Bangladesh,168957745


The 7.2 billion figure belongs to the `World` row, which is an aggregate rather than a country. We will exclude it and recalculate the statistics.

In [13]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
FROM facts
WHERE name != 'World';

 * sqlite:///factbook.db
Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,1367485388,0.0,4.02
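
The top-10 list above also contains a `European Union` row, which is an aggregate like `World`. The notebook's queries keep it, but it could be filtered out in the same way. The sketch below is one possible approach, not the method used in this analysis. It loads five rows from the output above into an in-memory table and excludes both aggregates with `NOT IN`; the `aggregates` tuple is an assumption about which rows should count as non-countries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")

# Rows copied from the top-10 population output above.
conn.executemany("INSERT INTO facts VALUES (?, ?)", [
    ("World", 7256490011),
    ("China", 1367485388),
    ("India", 1251695584),
    ("European Union", 513949445),
    ("United States", 321368864),
])

# Assumption: these two rows are aggregates, not countries.
aggregates = ("World", "European Union")

top = conn.execute(
    """
    SELECT name, population
    FROM facts
    WHERE name NOT IN (?, ?)
    ORDER BY population DESC
    """,
    aggregates,
).fetchall()

for name, population in top:
    print(name, population)
```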


Next, we calculate the average population and area.

In [14]:
%%sql
SELECT AVG(population), AVG(area)
FROM facts
WHERE name != 'World';

 * sqlite:///factbook.db
Done.


AVG(population),AVG(area)
32242666.56846473,555093.546184739
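
One detail to keep in mind when reading these averages: SQLite's `AVG` skips `NULL` values, so if a column contains missing values, its average is taken only over the rows that have a value. The sketch below illustrates this on a hypothetical three-row `toy` table (the names and numbers are made up for illustration and are not Factbook data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy (name TEXT, population INTEGER, area INTEGER)")

# Hypothetical rows: B has no area, C has no population.
conn.executemany("INSERT INTO toy VALUES (?, ?, ?)", [
    ("A", 10, 100),
    ("B", 20, None),
    ("C", None, 300),
])

# COUNT(*) counts rows; COUNT(column) counts only non-NULL values,
# which is also the denominator AVG(column) uses.
avg_population, avg_area, n_rows, n_population, n_area = conn.execute(
    """
    SELECT AVG(population), AVG(area),
           COUNT(*), COUNT(population), COUNT(area)
    FROM toy
    """
).fetchone()

print(avg_population, avg_area, n_rows, n_population, n_area)
```

Running `COUNT(population)` and `COUNT(area)` against `facts` would show how many rows each of the averages above is based on.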


Now we will find countries that are densely populated. We'll identify countries that have the following:
- Above-average values for population.
- Below-average values for area.

In [16]:
%%sql
SELECT name, area, population
FROM facts
WHERE population > (SELECT AVG(population)
                    FROM facts
                    WHERE name != 'World') AND area < (SELECT AVG(area)
                                                       FROM facts
                                                       WHERE name != 'World')
ORDER BY population DESC;

 * sqlite:///factbook.db
Done.


name,area,population
Bangladesh,148460,168957745
Japan,377915,126919659
Philippines,300000,100998376
Vietnam,331210,94348835
Germany,357022,80854408
Thailand,513120,67976405
United Kingdom,243610,64088222
Italy,301340,61855120
"Korea, South",99720,49115196
Spain,505370,48146134


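The four most populous countries in this group (Bangladesh, Japan, the Philippines and Vietnam) are all in Asia. The filter selects populous countries with relatively small areas, so it is a rough proxy for density rather than a density ranking.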
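
Density can also be computed directly as population divided by area. The sketch below is a minimal illustration that reuses the ten rows from the output above in an in-memory table; the `density` alias is introduced here and is not part of the original notebook. The `CAST` to `REAL` matters because dividing two integers in SQLite performs integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")

# Rows copied from the query output above (area in sq km).
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("Bangladesh", 148460, 168957745),
    ("Japan", 377915, 126919659),
    ("Philippines", 300000, 100998376),
    ("Vietnam", 331210, 94348835),
    ("Germany", 357022, 80854408),
    ("Thailand", 513120, 67976405),
    ("United Kingdom", 243610, 64088222),
    ("Italy", 301340, 61855120),
    ("Korea, South", 99720, 49115196),
    ("Spain", 505370, 48146134),
])

# People per square kilometre, highest first.
density_rows = conn.execute(
    """
    SELECT name, ROUND(CAST(population AS REAL) / area, 1) AS density
    FROM facts
    WHERE area > 0
    ORDER BY density DESC
    """
).fetchall()

for name, density in density_rows:
    print(f"{name:15} {density}")
```

Applied to the full `facts` table, the same expression would rank every row by density, including small territories that the above-average population filter excludes.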

Next, we look at the countries with above-average population and population growth, and compare them with the world as a whole.

In [25]:
%%sql
SELECT name, population, population_growth
FROM facts
WHERE (population > (SELECT AVG(population)
                    FROM facts
                    WHERE name != 'World') AND population_growth > (SELECT AVG(population_growth)
                                                       FROM facts
                                                       WHERE name != 'World')) OR name='World'
ORDER BY population_growth DESC;

 * sqlite:///factbook.db
Done.


name,population,population_growth
Uganda,37101745,3.24
Iraq,37056169,2.93
Ethiopia,99465819,2.89
Tanzania,51045882,2.79
"Congo, Democratic Republic of the",79375136,2.45
Nigeria,181562056,2.45
Afghanistan,32564342,2.32
Kenya,45925301,1.93
Algeria,39542166,1.84
Egypt,88487396,1.79


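The countries with above-average populations and the highest population growth rates (Uganda, Iraq, Ethiopia, Tanzania) tend to be less economically developed. The `facts` table has no economic indicators, so this observation relies on outside knowledge rather than on the query itself.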
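
The query above repeats the same averaging subquery pattern twice. As a sketch of an alternative formulation, the averages can be computed once in a common table expression (CTE) and then joined to `facts`. The example runs on six rows taken from earlier outputs in this notebook, so the averages it computes differ from those of the full table; the `averages` CTE and its column names are introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)

# A small subset of rows copied from earlier outputs in this notebook.
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("Uganda", 37101745, 3.24),
    ("Ethiopia", 99465819, 2.89),
    ("Nigeria", 181562056, 2.45),
    ("Algeria", 39542166, 1.84),
    ("Albania", 3029278, 0.3),
    ("Andorra", 85580, 0.12),
])

# Compute both averages once, then compare every row against them.
query = """
WITH averages AS (
    SELECT AVG(population) AS avg_population,
           AVG(population_growth) AS avg_growth
    FROM facts
    WHERE name != 'World'
)
SELECT name, population, population_growth
FROM facts, averages
WHERE population > avg_population
  AND population_growth > avg_growth
ORDER BY population_growth DESC
"""

fast_growing = conn.execute(query).fetchall()
for row in fast_growing:
    print(row)
```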

We now look at the countries with the lowest rate of natural increase, or RNI (birth rate minus death rate).

In [26]:
%%sql
SELECT name, ROUND((birth_rate - death_rate), 2) AS RNI
FROM facts
WHERE death_rate > birth_rate
ORDER BY RNI
LIMIT 10;

 * sqlite:///factbook.db
Done.


name,RNI
Bulgaria,-5.52
Serbia,-4.58
Latvia,-4.31
Lithuania,-4.17
Ukraine,-3.74
Hungary,-3.57
Germany,-2.95
Slovenia,-2.95
Romania,-2.76
Croatia,-2.73


The countries with the highest rates of natural population decline are Bulgaria, Serbia, Latvia and Lithuania: two Balkan countries and two Baltic states. All four are former socialist states in Eastern Europe, but only Latvia and Lithuania were Soviet republics.
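
The RNI expression can be checked by hand on the five sample rows shown near the top of this notebook. The sketch below loads their birth and death rates into an in-memory table and computes RNI with the same `ROUND(birth_rate - death_rate, 2)` expression. The Factbook reports these rates per 1,000 population, so dividing by 10 gives a percentage; the `rni_percent` column is an addition for illustration. None of these five countries has a negative RNI, so the `death_rate > birth_rate` filter is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL, death_rate REAL)")

# Birth and death rates copied from the first five rows of facts.
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("Afghanistan", 38.57, 13.89),
    ("Albania", 12.92, 6.58),
    ("Algeria", 23.67, 4.31),
    ("Andorra", 8.13, 6.96),
    ("Angola", 38.78, 11.49),
])

# RNI per 1,000 population, plus the same value expressed as a percentage.
rni_rows = conn.execute(
    """
    SELECT name,
           ROUND(birth_rate - death_rate, 2) AS rni,
           ROUND((birth_rate - death_rate) / 10.0, 2) AS rni_percent
    FROM facts
    ORDER BY rni
    """
).fetchall()

for name, rni, rni_percent in rni_rows:
    print(f"{name:12} {rni:6.2f} per 1,000  ({rni_percent:.2f}%)")
```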

## Conclusion

This project used SQL to analyze data from the CIA World Factbook, with the goal of identifying patterns and relationships among various country statistics.

Our analysis produced several findings. Among countries with above-average population and below-average area, the four most populous are in Asia. Countries with above-average populations and high population growth rates tend to be less economically developed. Finally, the countries with the highest rates of natural population decline are mostly in Eastern Europe, led by countries in the Balkans and the Baltics.

Overall, this project demonstrates the power of data analysis to uncover important insights about global trends and patterns. By using SQL to explore country statistics in the CIA World Factbook, we were able to gain a deeper understanding of the relationships between population, development, and natural population decline. Moving forward, these insights could be used to inform policy decisions and guide future research in this area.