# Analyzing CIA Factbook Data Using SQL

The CIA factbook, also known as the [World Factbook](https://www.cia.gov/the-world-factbook/), is an annual publication of the US Central Intelligence Agency. It provides basic intelligence by summarizing information about countries and regions worldwide. The factbook covers demographic and geographic data, among many other topics.

![](https://nbviewer.org/github/Lippins/CIA_factbook_data_analysis/blob/main/images/cover_image.png)

In this project, we will use SQL in Jupyter Notebook to analyze data from this [SQLite factbook.db](https://dsserver-prod-resources-1.s3.amazonaws.com/257/factbook.db) database. The goal is to identify the leading and lagging territories of the world based on demographic indicators like population, mortality, and fertility.

## Table of Contents
---
- [Connecting to the Database](#Connecting-to-the-Database)
- [Summarizing the Data](#Summarizing-the-Data)
- [Exploring the Data](#Exploring-the-Data)
- [Exploring Population Density Statistics](#Exploring-Population-Density-Statistics)
- [Exploring Population Growth Statistics](#Exploring-Population-Growth-Statistics)
- [Exploring Birth and Death Rates Statistics](#Exploring-Birth-and-Death-Rates-Statistics)
- [Conclusion](#Conclusion)

## Connecting to the Database

In [31]:
#!pip install ipython-sql

In [2]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

## Summarizing the Data
We will first examine the data in the database.

In [3]:
%%sql
SELECT *
  FROM sqlite_master
 WHERE type='table';

 * sqlite:///factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


#### Notes
>- There are two tables in the database, **sqlite_sequence** and **facts**. 
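>- The sqlite_sequence table is an internal table (with the columns `name` and `seq`) that SQLite maintains for `AUTOINCREMENT` columns; it holds no factbook data.
>- The facts table contains information on _population, birth rate, migration rate_, and other indicators: this is what we need.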
  
We will work with the **facts** table henceforth.
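
To illustrate where `sqlite_sequence` comes from, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes an in-memory database and a hypothetical, cut-down version of the `facts` table, not the real `factbook.db`.

```python
import sqlite3

# In-memory database with a hypothetical two-column version of the facts table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "
    "name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO facts (name) VALUES (?)",
    [("Afghanistan",), ("Albania",)],
)

# SQLite created sqlite_sequence on its own; it stores, per table,
# the largest AUTOINCREMENT id handed out so far.
sequence_rows = conn.execute("SELECT name, seq FROM sqlite_sequence").fetchall()
print(sequence_rows)
conn.close()
```

The only row in `sqlite_sequence` describes the bookkeeping for `facts`, which is why the table can be ignored in the analysis.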

In [4]:
%%sql
-- Examine the first five rows in facts table
SELECT *
  FROM facts
 LIMIT 5;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


The facts table comprises **11 columns** with intuitive names:
>- **id** - Entry row number.
>- **code** — The country's [internet code](https://www.cia.gov/the-world-factbook/field/internet-country-code/).
>- **name** — Name of the country.
>- **area** — The country's total area (both land and water).
>- **area_land** — The country's land area in square kilometers.
>- **area_water** — The country's water area in square kilometers.
>- **population** —  Whole number of people or inhabitants in the country.
>- **population_growth** — The country's population growth as a percentage.
>- **birth_rate** — The number of births per year per 1,000 inhabitants.
>- **death_rate** — The number of deaths per year per 1,000 inhabitants.
>- **migration_rate** — The difference between the number of persons entering and leaving the country during the year per 1,000 persons. 
  
Finally, let's check the total number of records in the facts table:

In [5]:
%%sql
-- Calculate the total number of records
SELECT COUNT(*) AS num_records,
       COUNT(DISTINCT name) AS unique_countries
  FROM facts;

 * sqlite:///factbook.db
Done.


num_records,unique_countries
261,261


The `facts` table contains demographic and geographic data for 261 distinct names. In 2015, the year this factbook was released, there were [198 countries](https://en.wikipedia.org/wiki/List_of_countries_by_population_in_2015), so some of the rows must represent other territories or administrative units rather than countries.

## Exploring the Data

We will now calculate some summary statistics and look for any outlier countries.

In [6]:
%%sql
SELECT MIN(population) AS min_population,
       MAX(population) AS max_population,      
       AVG(population) AS average_population
    FROM facts;

 * sqlite:///factbook.db
Done.


min_population,max_population,average_population
0,7256490011,62094928.32231405


That is strange: how can a country have a population of zero, or a population of over seven billion (approximately the [world population in 2015](https://population.un.org/wpp/Publications/Files/WPP2015_DataBooklet.pdf))? Let's investigate further.

In [7]:
%%sql
SELECT name, population
  FROM facts
WHERE population = (
        SELECT MIN(population) 
          FROM facts)
    OR population = (
        SELECT MAX(population) 
          FROM facts);

 * sqlite:///factbook.db
Done.


name,population
Antarctica,0
World,7256490011


Therefore, we can confirm that the maximum value of `population` corresponds to the world population in 2015. The minimum value corresponds to Antarctica, which also looks reasonable, since, according to the [CIA Factbook page for Antarctica](https://www.cia.gov/library/publications/the-world-factbook/geos/ay.html), there are 
> no indigenous inhabitants, but there are both permanent and summer-only staffed research stations

Let's recalculate the summary statistics excluding these outliers.

In [8]:
%%sql
SELECT MIN(population) AS min_population,
       MAX(population) AS max_population,      
       ROUND(AVG(population), 0) AS average_population
    FROM facts
WHERE name != 'Antarctica' 
  AND name != 'World';

 * sqlite:///factbook.db
Done.


min_population,max_population,average_population
48,1367485388,32377011.0


Now both the minimum and maximum values look much more reasonable. Excluding the two outliers also cut the average population roughly in half, from about 62.1 million to about 32.4 million.
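
The effect of such aggregate rows can be reproduced on a toy table. The sketch below uses made-up populations (not factbook values) in an in-memory database, with a hypothetical `World` row equal to the sum of the regions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
# Made-up numbers for illustration; 'World' is the sum of the three regions.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Region A", 100),
        ("Region B", 300),
        ("Region C", 200),
        ("Antarctica", 0),
        ("World", 600),
    ],
)

query = "SELECT MIN(population), MAX(population), AVG(population) FROM facts"
with_outliers = conn.execute(query).fetchone()
without_outliers = conn.execute(
    query + " WHERE name NOT IN ('Antarctica', 'World')"
).fetchone()

print(with_outliers)     # summary including the two special rows
print(without_outliers)  # summary for the regions only
conn.close()
```

`NOT IN (...)` is used here as a compact alternative to chaining `name != ...` conditions; both forms select the same rows.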

Let's find out which countries have the largest and smallest populations. We will look for the five most populous and the five least populous countries.

In [9]:
%%sql
SELECT name, population AS top_5_countries
  FROM facts
WHERE name != 'World'
ORDER BY population DESC
LIMIT 5;

 * sqlite:///factbook.db
Done.


name,top_5_countries
China,1367485388
India,1251695584
European Union,513949445
United States,321368864
Indonesia,255993674


This doesn't look correct. The European Union isn't a country. Let's fix this.

In [10]:
%%sql
SELECT name, population AS top_5_countries
  FROM facts
WHERE name != 'World' 
  AND name != 'European Union'
ORDER BY population DESC
LIMIT 5;

 * sqlite:///factbook.db
Done.


name,top_5_countries
China,1367485388
India,1251695584
United States,321368864
Indonesia,255993674
Brazil,204259812


We see that the _most populated_ country is **China**, followed by **India**, then with a big gap by the **United States**, **Indonesia**, and **Brazil**. This looks perfectly reasonable.

In [11]:
%%sql
SELECT name, population AS bottom_5_countries
  FROM facts
WHERE name != 'Antarctica'
  AND population NOT NULL
ORDER BY population
LIMIT 5;

 * sqlite:///factbook.db
Done.


name,bottom_5_countries
Pitcairn Islands,48
Cocos (Keeling) Islands,596
Holy See (Vatican City),842
Niue,1190
Tokelau,1337


This also looks reasonable. The **[Pitcairn Islands](https://en.wikipedia.org/wiki/Pitcairn_Islands)**, or just Pitcairn, are located in the southern Pacific Ocean and form the _least populous_ national jurisdiction in the world. **Niue** and **Tokelau** are also very isolated islands in the southern Pacific Ocean, while the **Cocos Islands** lie in the Indian Ocean. The bottom five also include a microstate, **Vatican City**.

In our database, there are some territories with null values of the `population`.

In [12]:
%%sql
SELECT name, population
  FROM facts
WHERE population IS NULL;

 * sqlite:///factbook.db
Done.


name,population
Ashmore and Cartier Islands,
Coral Sea Islands,
Heard Island and McDonald Islands,
Clipperton Island,
French Southern and Antarctic Lands,
Bouvet Island,
Jan Mayen,
British Indian Ocean Territory,
South Georgia and South Sandwich Islands,
Navassa Island,


All these territories are either uninhabited islands or oceans.
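
These null rows did not distort the earlier summary statistics, because SQL aggregate functions skip `NULL` values. A minimal sketch with a hypothetical three-row table (made-up names and numbers) shows the behavior:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
# Hypothetical rows; None is stored as SQL NULL.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("Island A", None), ("Island B", 50), ("Island C", 150)],
)

# COUNT(*) counts rows, COUNT(column) and AVG(column) ignore NULLs.
total_rows, rows_with_population, average_population = conn.execute(
    "SELECT COUNT(*), COUNT(population), AVG(population) FROM facts"
).fetchone()

print(total_rows, rows_with_population, average_population)
conn.close()
```

The average is taken over the two non-null rows only, so a missing population is not treated as zero.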

## Exploring Population Density Statistics

In this section, we're going to find out which countries are the most densely populated and which ones are the least densely populated.

First, let's calculate the average population density in the world, i.e., `population` divided by area. Here and later, we have to:
- use `area_land` instead of `area`, to be more precise,
- exclude null values for both `population` and `area_land`,
- exclude `area_land`=0 to avoid having 0 in the denominator,
- exclude the outliers: World, European Union, and Antarctica.

Let's look once again at the outliers.

In [13]:
%%sql
SELECT name, population, area_land
  FROM facts
WHERE name = 'World' 
   OR name = 'European Union' 
    OR name = 'Antarctica';

 * sqlite:///factbook.db
Done.


name,population,area_land
European Union,513949445,
Antarctica,0,280000.0
World,7256490011,


Of the three, only `Antarctica` has to be excluded by name. `European Union` and `World` have `None` for land area, so the `area_land NOT NULL` filter already drops them.

Now we can calculate the average values for the population density, population, and area.

In [14]:
%%sql
SELECT ROUND(CAST(AVG(population) AS Float)/CAST(AVG(area_land) AS Float), 0) 
                                AS average_population_density,
        ROUND(AVG(population),0) AS average_population,
          ROUND(AVG(area_land),0) AS average_area
    FROM facts
WHERE population NOT NULL 
  AND area_land NOT NULL 
    AND name != 'Antarctica' 
      AND area_land != 0;

 * sqlite:///factbook.db
Done.


average_population_density,average_population,average_area
55.0,30641707.0,553017.0


Next, we're going to count the countries with an above-average population density. As a first guess, we might expect roughly half of the countries in our database to be above average and the other half below.
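
A half-and-half split is only guaranteed for a median, and the benchmark computed above is a ratio of two averages rather than an average of per-country densities. The sketch below, using four made-up territories in an in-memory database, suggests how much the count can depend on which benchmark is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area_land INTEGER)")
# Made-up territories with densities 600, 40, 1, and 0.5 people per km2.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("A", 600, 1), ("B", 400, 10), ("C", 1000, 1000), ("D", 2000, 4000)],
)

# Benchmark 1: ratio of averages (the approach used in the query above).
ratio_of_averages = conn.execute(
    "SELECT AVG(population) / AVG(area_land) FROM facts"
).fetchone()[0]

# Benchmark 2: average of the individual densities.
average_of_ratios = conn.execute(
    "SELECT AVG(population * 1.0 / area_land) FROM facts"
).fetchone()[0]

count_query = (
    "SELECT COUNT(*) FROM facts WHERE population * 1.0 / area_land > ({})"
)
above_ratio_of_averages = conn.execute(
    count_query.format("SELECT AVG(population) / AVG(area_land) FROM facts")
).fetchone()[0]
above_average_of_ratios = conn.execute(
    count_query.format("SELECT AVG(population * 1.0 / area_land) FROM facts")
).fetchone()[0]

print(ratio_of_averages, above_ratio_of_averages)
print(average_of_ratios, above_average_of_ratios)
conn.close()
```

In this toy data, three of the four territories exceed the ratio of averages, but only one exceeds the average of densities, so neither benchmark splits the table in half.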

In [15]:
%%sql
SELECT COUNT(name) AS above_average
  FROM facts
WHERE name != 'Antarctica'
  AND population NOT NULL 
    AND area_land NOT NULL
      AND area_land != 0
         AND CAST(population AS Float)/CAST(area_land AS Float) > (
            SELECT CAST(AVG(population) AS Float)/CAST(AVG(area_land) AS Float)                                          
              FROM facts
            WHERE name != 'Antarctica'
              AND population NOT NULL 
                AND area_land NOT NULL
                  AND area_land != 0);


 * sqlite:///factbook.db
Done.


above_average
149


That is 149 territories, too many to list in full. Instead, let's look only at the first 20.

In [16]:
%%sql
SELECT name, population, area_land, 
       ROUND(CAST(population AS Float)/CAST(area_land AS Float), 0) 
                                          AS population_density
    FROM facts
WHERE name != 'Antarctica'
  AND population NOT NULL 
    AND area_land NOT NULL
      AND area_land != 0
        AND population_density > (
            SELECT CAST(AVG(population) AS Float)/CAST(AVG(area_land) AS Float) 
              FROM facts
            WHERE name != 'Antarctica'
              AND population NOT NULL 
                AND area_land NOT NULL
                  AND area_land != 0)
ORDER BY population_density DESC
LIMIT 20;

 * sqlite:///factbook.db
Done.


name,population,area_land,population_density
Macau,592731,28,21169.0
Monaco,30535,2,15268.0
Singapore,5674472,687,8260.0
Hong Kong,7141106,1073,6655.0
Gaza Strip,1869055,360,5192.0
Gibraltar,29258,6,4876.0
Bahrain,1346613,760,1772.0
Maldives,393253,298,1320.0
Malta,413965,316,1310.0
Bermuda,70196,54,1300.0


According to this table, the _highest population density_ is observed in **Macau**, followed with a big gap by **Monaco**, which in turn is followed with another big gap by **Singapore** and **Hong Kong**. In general, these high values mostly belong to **small countries and islands**, whose area is far below average (553,000 km²) and whose population is also below average (30.6 million people). To narrow down the set of countries with an above-average population density and find the bigger territories among them, we'll select only those for which both of the following statements are true:
- the population is above average,
- the area is below average. 

In [17]:
%%sql
SELECT name, population, area_land, 
       ROUND(CAST(population AS Float)/CAST(area_land AS Float), 0) 
                                          AS population_density
    FROM facts
WHERE name != 'Antarctica'
  AND population NOT NULL 
    AND area_land NOT NULL
      AND area_land != 0
        AND population > (
            SELECT AVG(population)
              FROM facts
            WHERE name != 'Antarctica'
              AND population NOT NULL 
                AND area_land NOT NULL
                  AND area_land != 0)
        AND area_land < (
            SELECT AVG(area_land)
              FROM facts
            WHERE name != 'Antarctica'
              AND population NOT NULL 
                AND area_land NOT NULL
                  AND area_land != 0)
ORDER BY population_density DESC;

 * sqlite:///factbook.db
Done.


name,population,area_land,population_density
Bangladesh,168957745,130170,1298.0
"Korea, South",49115196,96920,507.0
Japan,126919659,364485,348.0
Philippines,100998376,298170,339.0
Vietnam,94348835,310070,304.0
United Kingdom,64088222,241930,265.0
Germany,80854408,348672,232.0
Nepal,31551305,143351,220.0
Italy,61855120,294140,210.0
Uganda,37101745,197100,188.0


Our initial subset is now significantly reduced. Among the countries with both an _above-average population_ and a _below-average area_, the _highest population density_ belongs to **Bangladesh**, followed with a big gap by **South Korea**.
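
The query above repeats the same filter conditions in every subquery. One possible alternative, sketched here on a made-up table rather than the real database, is to state the filter once in a common table expression (CTE). The CTE names `valid` and `averages` are illustrative choices, not part of the original notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area_land INTEGER)")
# Made-up rows; 'No area' has a NULL land area and is filtered out.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Dense", 900, 10),
        ("Sparse", 100, 900),
        ("Big", 800, 800),
        ("Tiny", 50, 5),
        ("Antarctica", 0, 280000),
        ("No area", 500, None),
    ],
)

query = """
WITH valid AS (              -- the shared filter, written once
    SELECT name, population, area_land
      FROM facts
     WHERE name != 'Antarctica'
       AND population IS NOT NULL
       AND area_land IS NOT NULL
       AND area_land != 0
),
averages AS (                -- a single row holding both averages
    SELECT AVG(population) AS avg_population,
           AVG(area_land)  AS avg_area
      FROM valid
)
SELECT name,
       ROUND(CAST(population AS REAL) / area_land, 0) AS population_density
  FROM valid, averages
 WHERE population > avg_population
   AND area_land < avg_area
 ORDER BY population_density DESC;
"""
dense_and_populous = conn.execute(query).fetchall()
print(dense_and_populous)
conn.close()
```

Flipping the two comparison operators in the final `WHERE` clause would give the opposite subset (below-average population, above-average area) without touching the filter.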

Next, we're going to do the same steps for the countries with the population density below average, starting from the countries in the bottom 20.

In [18]:
%%sql
SELECT name, population, area_land, 
       ROUND(CAST(population AS Float)/CAST(area_land AS Float), 2) 
                                          AS population_density
    FROM facts
WHERE name != 'Antarctica'
  AND population NOT NULL 
    AND area_land NOT NULL
      AND area_land != 0
        AND population_density < (
            SELECT CAST(AVG(population) AS Float)/CAST(AVG(area_land) AS Float) 
              FROM facts
            WHERE name != 'Antarctica'
              AND population NOT NULL 
                AND area_land NOT NULL
                  AND area_land != 0)
ORDER BY population_density
LIMIT 20;

 * sqlite:///factbook.db
Done.


name,population,area_land,population_density
Greenland,57733,2166086,0.03
Svalbard,1872,62045,0.03
Falkland Islands (Islas Malvinas),3361,12173,0.28
Pitcairn Islands,48,47,1.02
Mongolia,2992908,1553556,1.93
Western Sahara,570866,266000,2.15
Namibia,2212307,823290,2.69
Australia,22751014,7682300,2.96
Iceland,331918,100250,3.31
Mauritania,3596702,1030700,3.49


The territories with the _lowest population density_ are **Greenland** and **Svalbard**.

Let's restrict this subset as well and select only those countries for which both of the following statements are true:
- the population is below average,
- the area is above average.

In [19]:
%%sql
SELECT name, population, area_land, 
       ROUND(CAST(population AS Float)/CAST(area_land AS Float), 2) 
                                          AS population_density
    FROM facts
WHERE name != 'Antarctica'
  AND population NOT NULL 
    AND area_land NOT NULL
      AND area_land != 0
        AND population < (
            SELECT AVG(population)
              FROM facts
            WHERE name != 'Antarctica'
              AND population NOT NULL 
                AND area_land NOT NULL
                  AND area_land != 0)
        AND area_land > (
            SELECT AVG(area_land)
              FROM facts
            WHERE name != 'Antarctica'
              AND population NOT NULL 
                AND area_land NOT NULL
                  AND area_land != 0)
ORDER BY population_density;

 * sqlite:///factbook.db
Done.


name,population,area_land,population_density
Greenland,57733,2166086,0.03
Mongolia,2992908,1553556,1.93
Namibia,2212307,823290,2.69
Australia,22751014,7682300,2.96
Mauritania,3596702,1030700,3.49
Libya,6411776,1759540,3.64
Botswana,2182719,566730,3.85
Kazakhstan,18157122,2699700,6.73
Central African Republic,5391539,622984,8.65
Chad,11631456,1259200,9.24


Now, our second subset is also reduced: we see _big countries_ with relatively _low values of population_, with **Greenland** again opening the list and followed by **Mongolia**, **Namibia**, and **Australia**.

## Exploring Population Growth Statistics

Population growth is calculated by dividing the number of people added to a population in a year (natural increase + net in-migration) by the population size at the start of the year. 

Let's start analyzing the population growth with respect to the other rates by looking at the cases where it is null or 0.

In [20]:
%%sql
SELECT name, population_growth, birth_rate, death_rate, migration_rate
  FROM facts
WHERE population_growth IS NULL 
   OR population_growth = 0;

 * sqlite:///factbook.db
Done.


name,population_growth,birth_rate,death_rate,migration_rate
Kosovo,,,,
Holy See (Vatican City),0.0,,,
Ashmore and Cartier Islands,,,,
Cocos (Keeling) Islands,0.0,,,
Coral Sea Islands,,,,
Heard Island and McDonald Islands,,,,
Greenland,0.0,14.48,8.49,5.98
Clipperton Island,,,,
French Southern and Antarctic Lands,,,,
Saint Barthelemy,,,,


Here we again see the same uninhabited islands and oceans that appeared among the rows with null `population`, the microstate Vatican City, and Kosovo, for which none of the rates are available. An interesting case is Greenland, where the zero population growth arises because the difference between the birth and death rates is offset by the migration rate (in this case, emigration).
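
The Greenland figures can be checked with a few lines of arithmetic. This sketch assumes, as the text does, that the stored migration rate represents net emigration, and that a rate per 1,000 persons converts to a percentage by dividing by 10.

```python
# Greenland's rates from the table above, all per 1,000 inhabitants.
birth_rate = 14.48
death_rate = 8.49
migration_rate = 5.98  # assumption: stored without sign, treated as emigration

natural_increase = birth_rate - death_rate               # births minus deaths
net_change_per_1000 = natural_increase - migration_rate  # subtract emigration
growth_percent = net_change_per_1000 / 10                # per 1,000 -> percent

print(round(natural_increase, 2))
print(round(growth_percent, 2))
```

The natural increase of about 5.99 per 1,000 is almost exactly cancelled by emigration, so the growth rounds to zero at two decimal places.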

Let's see if there are any cases where `birth_rate` or `death_rate` is 0 or null, but `population_growth` has not-null values.

In [21]:
%%sql
SELECT COUNT(name) AS zero_birth_death_rates
  FROM facts
WHERE birth_rate = 0
   OR death_rate = 0;

 * sqlite:///factbook.db
Done.


zero_birth_death_rates
0


There are no `birth_rate` or `death_rate` with the value of 0 in the whole database.

In [22]:
%%sql
SELECT name, population, population_growth, birth_rate, death_rate, migration_rate
  FROM facts
WHERE population_growth != 0
  AND (birth_rate IS NULL OR death_rate IS NULL);

 * sqlite:///factbook.db
Done.


name,population,population_growth,birth_rate,death_rate,migration_rate
Christmas Island,1530,1.11,,,
Norfolk Island,2210,0.01,,,
Niue,1190,0.03,,,
Tokelau,1337,0.01,,,
Svalbard,1872,0.03,,,


For the rows above, we have no data for `birth_rate`, `death_rate`, and `migration_rate`; however, `population_growth` is not null. In any case, the growth values are mostly very low, as are the `population` values themselves, with a maximum of 2,210 persons on Norfolk Island. Indeed, we have already seen some of these territories among the bottom 5 countries by population.
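
The two queries above rely on how SQL compares values with `NULL`: a comparison such as `birth_rate = 0` or `population_growth != 0` is never true for a null value, so nulls must be tested with `IS NULL`. A minimal sketch on a hypothetical three-row table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL)")
# Hypothetical rows: one ordinary value, one true zero, one missing value.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("Has rate", 12.5), ("Zero rate", 0.0), ("No data", None)],
)

def names_where(condition):
    """Return the names of the rows matching a WHERE condition."""
    rows = conn.execute(f"SELECT name FROM facts WHERE {condition}").fetchall()
    return [name for (name,) in rows]

equals_zero = names_where("birth_rate = 0")
not_zero = names_where("birth_rate != 0")
is_null = names_where("birth_rate IS NULL")

print(equals_zero, not_zero, is_null)
conn.close()
```

The `No data` row appears in neither of the first two results, which is why the query for missing rates needs the explicit `IS NULL` conditions.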

Next, we'll find the top 20 and bottom 20 countries by population growth.

In [23]:
%%sql
SELECT name, population_growth AS top_20_population_growth
  FROM facts
ORDER BY population_growth DESC
LIMIT 20;

 * sqlite:///factbook.db
Done.


name,top_20_population_growth
South Sudan,4.02
Malawi,3.32
Burundi,3.28
Niger,3.25
Uganda,3.24
Qatar,3.07
Burkina Faso,3.03
Mali,2.98
Cook Islands,2.95
Iraq,2.93


The _highest population growth_, up to 4%, is predominantly observed in **African countries**, and also in **Qatar** and **Iraq**.

Now, we'll look at the bottom 20 countries.

In [24]:
%%sql
SELECT name, population_growth AS bottom_20_population_growth
  FROM facts
WHERE population_growth NOT NULL
ORDER BY population_growth
LIMIT 20;

 * sqlite:///factbook.db
Done.


name,bottom_20_population_growth
Holy See (Vatican City),0.0
Cocos (Keeling) Islands,0.0
Greenland,0.0
Pitcairn Islands,0.0
Greece,0.01
Norfolk Island,0.01
Tokelau,0.01
Falkland Islands (Islas Malvinas),0.01
Guyana,0.02
Slovakia,0.02


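The _lowest-to-zero population growth_ is observed in **small countries and islands** (some of which we have already seen earlier in the project), the above-discussed **Greenland**, a few countries of **Western Europe** (Greece, Portugal, Andorra, Monaco), **Eastern Europe** (Slovakia, Poland), **Russia**, and **Georgia**. Curiously, no row in this table shows negative population growth for 2015. Since Greenland's migration rate above is also stored as a positive number even though it reflects emigration, this may mean that the column records the magnitude of the change without its sign.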
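
One detail of the two ranking queries in this section: only the ascending one filters out null values. In SQLite, `NULL` sorts before every other value, so nulls end up last in a descending sort but first in an ascending one. A small sketch with made-up growth values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population_growth REAL)")
# Made-up values; 'Unknown' has no growth figure.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("Fast", 3.0), ("Slow", 0.1), ("Unknown", None)],
)

def ordered_names(sql):
    """Run a query and return just the name column as a list."""
    return [name for (name,) in conn.execute(sql).fetchall()]

descending = ordered_names(
    "SELECT name FROM facts ORDER BY population_growth DESC"
)
ascending = ordered_names(
    "SELECT name FROM facts ORDER BY population_growth"
)
ascending_filtered = ordered_names(
    "SELECT name FROM facts WHERE population_growth IS NOT NULL "
    "ORDER BY population_growth"
)

print(descending)
print(ascending)
print(ascending_filtered)
conn.close()
```

Without the filter, a bottom-N list would be filled with rows that have no data at all.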

## Exploring Birth and Death Rates Statistics

Now, we'll find the top 20 and bottom 20 countries by birth and death rates.

In [25]:
%%sql
SELECT name, birth_rate AS top20_birth_rate
  FROM facts
ORDER BY birth_rate DESC
LIMIT 20;

 * sqlite:///factbook.db
Done.


name,top20_birth_rate
Niger,45.45
Mali,44.99
Uganda,43.79
Zambia,42.13
Burkina Faso,42.03
Burundi,42.01
Malawi,41.56
Somalia,40.45
Angola,38.78
Mozambique,38.58


The _highest birth rate_, up to 46 births per 1,000 persons, is observed, again, mostly in **African countries** and also in **Afghanistan**. This is visually represented on the [Wikipedia birth rate map](https://en.wikipedia.org/wiki/Birth_rate#:~:text=The%20average%20global%20birth%20rate,rate%20was%207.8%20per%201%2C000.&text=The%202016%20average%20of%2018.6,per%20minute%20for%20the%20world.) of 2017, suggesting that the general pattern still held two years later:

<h1><center>Birth Rate World Map</center></h1>


![](https://upload.wikimedia.org/wikipedia/commons/thumb/b/b0/Countries_by_Birth_Rate_in_2017.svg/2560px-Countries_by_Birth_Rate_in_2017.svg.png)

In [26]:
%%sql
SELECT name, birth_rate AS bottom20_birth_rate
  FROM facts
WHERE birth_rate NOT NULL
ORDER BY birth_rate
LIMIT 20;

 * sqlite:///factbook.db
Done.


name,bottom20_birth_rate
Monaco,6.65
Saint Pierre and Miquelon,7.42
Japan,7.93
Andorra,8.13
"Korea, South",8.19
Singapore,8.27
Slovenia,8.42
Germany,8.47
Taiwan,8.47
San Marino,8.63


The _lowest birth rates_, starting from under 7 births per 1,000 persons, are observed in **Western Europe** (Monaco, Andorra, Germany, Greece, Italy, Portugal), **some small countries and islands** (Saint Pierre and Miquelon, Singapore, Taiwan, San Marino, Macau, Hong Kong), **Japan**, **South Korea**, and **Eastern Europe** (Slovenia, Bosnia and Herzegovina, Bulgaria, Serbia, Romania, Hungary).

This time the list of countries is less obvious, especially the top 5. A web search, in particular the [Knoema data source](https://knoema.com), turns up graphs showing that the birth rate in these countries has been declining for many years and was seemingly still declining at the end of 2020.

![monaco_birth_rate.jpg](attachment:monaco_birth_rate.jpg)

![japan_birth_rate.jpg](attachment:japan_birth_rate.jpg)

![andorra_birth_rate.jpg](attachment:andorra_birth_rate.jpg)

![south_korea_birth_rate.jpg](attachment:south_korea_birth_rate.jpg)

Additionally, this [Wikipedia article](https://en.wikipedia.org/wiki/Birth_rate) sheds light on the reasons for this situation in Japan:
>As of 2016, Japan has the third lowest crude birth rate (i.e. not allowing for the population's age distribution) in the world, with only Saint Pierre and Miquelon and Monaco having lower crude birth rates. Japan has an unbalanced population with many elderly but few young people, and this is projected to be more extreme in the future, unless there are major changes. An increasing number of Japanese people are staying unmarried: between 1980 and 2010, the percentage of the population who had never married increased from 22% to almost 30%, even as the population continued to age, and by 2035 one in four people will not marry during their childbearing years. The Japanese sociologist Masahiro Yamada coined the term "parasite singles" for unmarried adults in their late 20s and 30s who continue to live with their parents.

Now, let's have a look at the death rate.

In [27]:
%%sql
SELECT name, death_rate AS top20_death_rate
  FROM facts
ORDER BY death_rate DESC
LIMIT 20;

 * sqlite:///factbook.db
Done.


name,top20_death_rate
Lesotho,14.89
Ukraine,14.46
Bulgaria,14.44
Guinea-Bissau,14.33
Latvia,14.31
Chad,14.28
Lithuania,14.27
Namibia,13.91
Afghanistan,13.89
Central African Republic,13.8


The _highest death rates_, up to 15 deaths per 1,000 persons, are observed in **African countries**, **Eastern Europe** (Ukraine, Bulgaria, Serbia, Belarus, Hungary), **Baltic countries**, **Afghanistan**, and **Russia**. Although African countries strongly dominate this list as well, their birth rates are much higher, which results in the highest population growth values. For the countries of Eastern Europe, high death rates combined with low birth rates lead to low population growth.

While African countries are well known for their high death rates (as well as their high birth rates), the presence of Ukraine, Bulgaria, and Latvia among the top 5 countries in this list looks unexpected. In this [Wikipedia article](https://en.wikipedia.org/wiki/Mortality_rate) we find a map of the death rate worldwide that confirms our conclusions.

<h1><center>Death Rate World Map</center></h1>

![Death_rate_world_map.PNG](attachment:Death_rate_world_map.PNG)

One of the main factors of this phenomenon, according to the article, is national income, which is directly tied to standard of living within a country.
In addition, it is mentioned that mortality rates are affected by short term price increases. 

In particular, according to the Wikipedia page for [Ukraine](https://en.wikipedia.org/wiki/Ukraine), the reasons for such a high death rate in this country are major public health issues, alcoholism and smoking among working-age males, obesity, systemic high blood pressure, and the HIV epidemic. For [Bulgaria](https://en.wikipedia.org/wiki/Bulgaria), the demographic crisis and high death rate result from a combination of an ageing population, a high number of people at risk of poverty, a weak healthcare system, the emigration of doctors due to low wages, understaffed and under-equipped regional hospitals, and supply shortages. As for the Baltic countries, [The Baltic Times](https://www.baltictimes.com/baltic_countries__mortality_rates_considerably_higher_than_european_average/) mentions diseases of the circulatory system, cancer, and injuries as the main causes of high death rates.

The graphs from [Knoema](https://knoema.com) confirm that the tendency for these countries has been continuing for many years.

![ukraine_death_rate.jpg](attachment:ukraine_death_rate.jpg)

![bulgaria_death_rate.jpg](attachment:bulgaria_death_rate.jpg)

![latvia_death_rate.jpg](attachment:latvia_death_rate.jpg)

Next, we'll explore the bottom 20 countries by death rate.

In [28]:
%%sql
SELECT name, death_rate AS bottom20_death_rate
  FROM facts
WHERE death_rate NOT NULL
ORDER BY death_rate
LIMIT 20;

 * sqlite:///factbook.db
Done.


name,bottom20_death_rate
Qatar,1.53
United Arab Emirates,1.97
Kuwait,2.18
Bahrain,2.69
Gaza Strip,3.04
Turks and Caicos Islands,3.1
Saudi Arabia,3.33
Oman,3.36
Singapore,3.43
West Bank,3.5


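The _lowest death rates_, from about 1.5 deaths per 1,000 persons, are mostly found in **Middle Eastern countries**, several of which have a very high standard of living. Note that the crude death rate also depends on a population's age structure, so a young population tends to produce a low value.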

Now, let's investigate another demographic indicator: the natural increase, i.e. the difference between the birth and death rates.

In [29]:
%%sql
SELECT name, ROUND(birth_rate - death_rate, 1) AS top20_natural_increase
  FROM facts
WHERE death_rate NOT NULL AND birth_rate NOT NULL
ORDER BY top20_natural_increase DESC
LIMIT 20;

 * sqlite:///factbook.db
Done.


name,top20_natural_increase
Malawi,33.2
Uganda,33.1
Niger,33.0
Burundi,32.7
Mali,32.1
Burkina Faso,30.3
Zambia,29.5
Ethiopia,29.1
South Sudan,28.7
Tanzania,28.4


As we could expect from the population growth and birth rate trends, in **African countries** the _birth rate strongly exceeds the death rate_. In addition, a high natural increase is observed in **Iraq**.
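
As a small check of how the natural increase is computed, the sketch below applies the same expression to four rows whose rates are copied from earlier outputs in this notebook (Afghanistan, Angola, Greenland, and Kosovo, which has no data). It uses an in-memory table rather than the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL, death_rate REAL)")
# Rates per 1,000 inhabitants, taken from the outputs shown earlier.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 38.57, 13.89),
        ("Angola", 38.78, 11.49),
        ("Greenland", 14.48, 8.49),
        ("Kosovo", None, None),
    ],
)

# Any arithmetic involving NULL yields NULL, so Kosovo has no natural increase.
natural_increase_rows = conn.execute(
    "SELECT name, ROUND(birth_rate - death_rate, 1) AS natural_increase "
    "FROM facts ORDER BY natural_increase DESC"
).fetchall()

for row in natural_increase_rows:
    print(row)
conn.close()
```

Because the subtraction returns `NULL` when either rate is missing, filtering on both columns keeps such rows out of the ranking, which matters most for the ascending (bottom 20) version of the query.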

In [30]:
%%sql
SELECT name, ROUND(birth_rate - death_rate, 1) AS bottom20_natural_increase
  FROM facts
WHERE death_rate NOT NULL AND birth_rate NOT NULL
ORDER BY bottom20_natural_increase
LIMIT 20;

 * sqlite:///factbook.db
Done.


name,bottom20_natural_increase
Bulgaria,-5.5
Serbia,-4.6
Latvia,-4.3
Lithuania,-4.2
Ukraine,-3.7
Hungary,-3.6
Germany,-3.0
Slovenia,-3.0
Romania,-2.8
Belarus,-2.7


As for the _lowest natural increase_, which is negative and therefore effectively a decrease, we unsurprisingly see the same countries of **Eastern Europe** suffering from the demographic crisis (Bulgaria, Serbia, Ukraine, Hungary, Slovenia, Romania, Belarus, Croatia, Bosnia and Herzegovina), **Baltic countries**, **Western Europe** (Germany, Monaco, Greece, Portugal, Italy), **Saint Pierre and Miquelon**, which was in 2nd place among the bottom 20 countries by birth rate, **Russia**, and **Japan**.

Serbia in 2nd place looks curious, since we have not encountered it in any top 5 or bottom 5 list so far. Let's look at it in more detail.

![serbia_birth_rate.jpg](attachment:serbia_birth_rate.jpg)

![serbia_death_rate.jpg](attachment:serbia_death_rate.jpg)

Both [Knoema](https://knoema.com/atlas/Serbia) and [Wikipedia](https://en.wikipedia.org/wiki/Serbia) confirm that Serbia has been suffering a heavy demographic crisis since the beginning of the 1990s, with a death rate continuously exceeding the birth rate. This country has low life expectancy, one of the oldest populations in the world, and it is shrinking at one of the fastest rates in the world.

## Conclusion
In this project, we have analyzed various demographic and geographic statistics for all the countries in the world from the CIA World Factbook 2015. Below are our main findings.
- The countries with the biggest **population** are China, India, the United States, Indonesia, and Brazil. The least populated are some islands (Pitcairn, Niue, Tokelau, Cocos Islands) and Vatican City.
- The highest **population density** is observed in Macau, Monaco, Singapore, and Hong Kong; among bigger countries, in Bangladesh and South Korea. The lowest population density is found in Greenland and Svalbard; among bigger countries, in Mongolia, Namibia, and Australia.
- **Population dynamics**:
  - **African countries** show both the highest birth rates and the highest death rates. However, the birth rates are much higher, which results in the highest values of natural increase and population growth in the world.
  - In **Western Europe**, the following countries demonstrate low birth rates: Monaco, Andorra, Germany, Greece, Italy, and Portugal. In most of them this leads to a negative natural increase and, as a result, to low population growth. However, in Germany and Italy the negative natural increase seems to be offset by immigration, so their population growth is not among the lowest.
  - In many **Eastern European** and **Baltic countries**, a negative natural increase is observed. In some of them (Bosnia and Herzegovina, Slovenia, Romania) it is explained by low birth rates, in others (Ukraine, Belarus, Baltic countries) by high death rates, and in the rest (Bulgaria, Serbia, Hungary) by both low birth rates and high death rates.
  - Many countries of the **Middle East** show some of the lowest death rates in the world.
  - **Iraq** is among the countries with the highest birth rates, which leads to a high natural increase and population growth.
  - In **Afghanistan**, both the birth and death rates are among the highest in the world.
  - **Japan**, **Saint Pierre and Miquelon**, and **South Korea** have some of the lowest birth rates in the world; Japan and Saint Pierre and Miquelon also appear among the territories with the lowest natural increase.
  - **Russia** has one of the highest death rates, leading to a negative natural increase and low population growth.
- **Area**. The biggest countries are Russia, Canada, the United States, China, and Brazil. The smallest ones: Vatican and Monaco.