# Analyzing CIA Factbook Data Using SQL

In this project, we'll work with data from the [CIA World Factbook](https://www.cia.gov/library/publications/the-world-factbook/), a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like the following:

* population - the population of each country.
* population_growth - the annual population growth rate, as a percentage.
* area - the total land and water area.

### 1. Connect to database


In [3]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

'Connected: None@factbook.db'

### 2. Overview of the Data

In [4]:
%%sql
SELECT *
  FROM sqlite_master
 WHERE type='table';

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


In [5]:
%%sql
SELECT *
  FROM facts
  LIMIT 5;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


Here are the descriptions for some of the columns:

* name - The name of the country.
* area - The country's total land and water area in square kilometers.
* area_land - The country's land area in square kilometers.
* area_water - The country's water area in square kilometers.
* population - The country's population.
* population_growth - The country's annual population growth rate as a percentage.
* birth_rate - The country's birth rate, or the number of births a year per 1,000 people.
* death_rate - The country's death rate, or the number of deaths a year per 1,000 people.

Let's start by calculating some summary statistics and see what they tell us.

### 3. Summary Statistics

In [6]:
%%sql
SELECT MIN(population) AS min_pop,
    MAX(population) AS max_pop,
    MIN(population_growth) AS min_pop_growth,
    MAX(population_growth) AS max_pop_growth
  FROM facts;

Done.


min_pop,max_pop,min_pop_growth,max_pop_growth
0,7256490011,0.0,4.02



The summary statistics above reveal two anomalies:

- There's a country with a population of 0.
- There's a country with a population of 7,256,490,011 (more than 7.2 billion).

Let's use subqueries to zoom in on just these countries without using the specific values.

### 4. Exploring Outliers

In [7]:
%%sql
SELECT *
  FROM facts
  WHERE population = (
      SELECT MIN(population)
      FROM facts
      )  ;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
250,ay,Antarctica,,280000,,0,,,,


In [8]:
%%sql
SELECT *
  FROM facts
  WHERE population = (
      SELECT MAX(population)
      FROM facts
      )  ;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
261,xx,World,,,,7256490011,1.08,18.6,7.8,


It seems like the table contains a row for the whole world, which explains the population of over 7.2 billion. It also seems like the table contains a row for Antarctica, which explains the population of 0.
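
The two lookups can also be combined into a single query. The following is a minimal sketch, assuming Python's built-in `sqlite3` module and an in-memory table that stands in for `factbook.db`. The four rows are copied from the outputs above, and the reduced two-column schema is a simplification for illustration.

```python
import sqlite3

# In-memory stand-in for factbook.db, seeded with rows copied from the outputs above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Afghanistan", 32564342),
        ("Albania", 3029278),
        ("Antarctica", 0),
        ("World", 7256490011),
    ],
)

# Each scalar subquery returns one value, so both extremes fit in one WHERE clause.
query = """
SELECT name, population
  FROM facts
 WHERE population = (SELECT MIN(population) FROM facts)
    OR population = (SELECT MAX(population) FROM facts)
 ORDER BY population;
"""
extremes = conn.execute(query).fetchall()
print(extremes)
```

On this sample the query returns the Antarctica and World rows, the same two rows found by the separate queries above.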

We should exclude the rows for Antarctica and for the whole world, and recalculate the summary statistics.

### 5. Summary Statistics Revisited

In [9]:
%%sql
SELECT MIN(population) AS min_pop,
    MAX(population) AS max_pop,
    MIN(population_growth) AS min_pop_growth,
    MAX(population_growth) AS max_pop_growth
  FROM facts
   WHERE name NOT IN ('World','Antarctica');

Done.


min_pop,max_pop,min_pop_growth,max_pop_growth
48,1367485388,0.0,4.02


### 6. Exploring Average Population and Area

In [10]:
%%sql
SELECT AVG(population) AS avg_pop,
  AVG(area) AS avg_area
  FROM facts
  WHERE name NOT IN ('World','Antarctica');

Done.


avg_pop,avg_area
32377011.0125,555093.546184739


- The average population is about 32.4 million.
- The average area is about 555,000 square kilometers.
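
One detail worth keeping in mind when reading these averages: SQLite's `AVG` skips `NULL` values, so `avg_pop` and `avg_area` are each computed only over the rows where that column is filled in, and those need not be the same rows. A small sketch of this behavior, assuming Python's built-in `sqlite3` module and an in-memory table seeded with four rows from the outputs above (World and Antarctica are kept here only because their `area` is `NULL`):

```python
import sqlite3

# In-memory stand-in for factbook.db; None is stored as NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 32564342),
        ("Albania", 28748, 3029278),
        ("Antarctica", None, 0),
        ("World", None, 7256490011),
    ],
)

# COUNT(*) counts every row, COUNT(area) and AVG(area) only the non-NULL areas.
total_rows, rows_with_area, avg_area = conn.execute(
    "SELECT COUNT(*), COUNT(area), AVG(area) FROM facts"
).fetchone()
print(total_rows, rows_with_area, avg_area)  # 4 2 340489.0
```

The average is taken over the two rows that have an area, not over all four rows.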


### 7. Finding Densely Populated Countries

To finish, we'll build on the query above to find countries that are densely populated. We'll identify countries that have:

* Above average values for population.
* Below average values for area.

In [11]:
%%sql
SELECT * 
    FROM facts
    WHERE population > ( SELECT AVG(population)
                        FROM facts
                       WHERE name NOT IN ('World','Antarctica')
                       )
      AND area < (SELECT AVG(area) 
                  FROM facts
                  WHERE name NOT IN ('World','Antarctica')
                 );


Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
14,bg,Bangladesh,148460,130170,18290,168957745,1.6,21.14,5.61,0.46
65,gm,Germany,357022,348672,8350,80854408,0.17,8.47,11.42,1.24
80,iz,Iraq,438317,437367,950,37056169,2.93,31.45,3.77,1.62
83,it,Italy,301340,294140,7200,61855120,0.27,8.74,10.19,4.1
85,ja,Japan,377915,364485,13430,126919659,0.16,7.93,9.51,0.0
91,ks,"Korea, South",99720,96920,2800,49115196,0.14,8.19,6.75,0.0
120,mo,Morocco,446550,446300,250,33322699,1.0,18.2,4.81,3.36
138,rp,Philippines,300000,298170,1830,100998376,1.61,24.27,6.11,2.09
139,pl,Poland,312685,304255,8430,38562189,0.09,9.74,10.19,0.46
163,sp,Spain,505370,498980,6390,48146134,0.89,9.64,9.04,8.31
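
The query above repeats the same filter in two subqueries. An equivalent formulation computes both averages once in a common table expression. This is a sketch of an alternative rather than the notebook's method, assuming Python's built-in `sqlite3` module and an in-memory table with six rows copied from outputs in this notebook:

```python
import sqlite3

# In-memory stand-in for factbook.db with a reduced schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Albania", 28748, 3029278),
        ("Algeria", 2381741, 39542166),
        ("Andorra", 468, 85580),
        ("Bangladesh", 148460, 168957745),
        ("Antarctica", None, 0),
        ("World", None, 7256490011),
    ],
)

# The CTE yields a single row of averages, which is cross-joined to every country.
query = """
WITH averages AS (
    SELECT AVG(population) AS avg_pop,
           AVG(area) AS avg_area
      FROM facts
     WHERE name NOT IN ('World', 'Antarctica')
)
SELECT name
  FROM facts
 CROSS JOIN averages
 WHERE population > avg_pop
   AND area < avg_area;
"""
dense = conn.execute(query).fetchall()
print(dense)  # [('Bangladesh',)]
```

World has an above-average population but a `NULL` area, and a comparison against `NULL` is never true, so it drops out of the result without being excluded in the outer query.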


### 8. Which country has the most people?

In [12]:
%%sql
SELECT name AS Country, population
  FROM facts
  WHERE name <> 'World'
  ORDER BY population DESC
  LIMIT 1;

Done.


Country,population
China,1367485388


### 9. Which country has the highest growth rate?

In [13]:
%%sql
SELECT name AS Country, population_growth
  FROM facts
  ORDER BY population_growth DESC
  LIMIT 1;

Done.


Country,population_growth
South Sudan,4.02


### 10. Which countries have the highest ratio of water to land? 

In [14]:
%%sql
SELECT name AS Country,
       area_water/area_land AS Water_to_Land_Ratio
  FROM facts
  WHERE name <> 'World'
  ORDER BY Water_to_Land_Ratio DESC
  LIMIT 1;

Done.


Country,Water_to_Land_Ratio
British Indian Ocean Territory,905
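
Because `area_water` and `area_land` are both integer columns, SQLite performs integer division and truncates the result, which is why the ratio above is a whole number. Casting one operand to `REAL` keeps the fractional part. A minimal sketch, assuming Python's built-in `sqlite3` module and an in-memory table holding two rows from the outputs above (the `int_ratio` and `real_ratio` aliases are names introduced here):

```python
import sqlite3

# In-memory stand-in for factbook.db with only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Albania", 27398, 1350),
        ("Bangladesh", 130170, 18290),
    ],
)

# int_ratio truncates toward zero; real_ratio keeps the fraction.
query = """
SELECT name,
       area_water / area_land AS int_ratio,
       CAST(area_water AS REAL) / area_land AS real_ratio
  FROM facts
 ORDER BY real_ratio DESC;
"""
ratios = conn.execute(query).fetchall()
for name, int_ratio, real_ratio in ratios:
    print(name, int_ratio, round(real_ratio, 4))
```

Both sample countries get an integer ratio of 0, while the cast version separates them (roughly 0.14 for Bangladesh and 0.05 for Albania).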


### 11. Which countries have more water than land?

In [15]:
%%sql
SELECT name AS Country,
       area_water/area_land AS Water_to_Land_Ratio
  FROM facts
  WHERE name <> 'World'
    AND Water_to_Land_Ratio > 1
  ORDER BY Water_to_Land_Ratio DESC;


Done.


Country,Water_to_Land_Ratio
British Indian Ocean Territory,905
Virgin Islands,4
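
With integer division, a country whose water area is between one and two times its land area gets a ratio of exactly 1 and is missed by the `> 1` filter. Comparing the two columns directly avoids that. The sketch below assumes Python's built-in `sqlite3` module and an in-memory table. The Albania row comes from the outputs above, while `Hypothetical Atoll` is an invented row used only to show the edge case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Albania", 27398, 1350),           # row from the notebook's output
        ("Hypothetical Atoll", 100, 150),   # invented row: 1.5 times more water than land
    ],
)

# 150 / 100 truncates to 1, so the ratio filter finds nothing.
by_ratio = conn.execute(
    "SELECT name FROM facts WHERE area_water / area_land > 1"
).fetchall()

# A direct comparison needs no division at all.
by_comparison = conn.execute(
    "SELECT name FROM facts WHERE area_water > area_land"
).fetchall()

print(by_ratio)       # []
print(by_comparison)  # [('Hypothetical Atoll',)]
```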


### 12. Which countries will add the most people to their populations next year?

In [16]:
%%sql
SELECT name AS Country,
       CAST(ROUND(population * population_growth / 100) AS INTEGER) AS net_population_growth
  FROM facts
  WHERE name <> 'World'
  ORDER BY net_population_growth DESC
  LIMIT 1;

Done.


Country,net_population_growth
India,15270686
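
Since `population_growth` is a percentage, the expected number of people added in a year is `population * population_growth / 100`. A quick check of that arithmetic, assuming Python's built-in `sqlite3` module and an in-memory table with three rows from the preview in section 2 (the `people_added` alias is a name introduced here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, population_growth FLOAT)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 2.32),
        ("Algeria", 39542166, 1.84),
        ("Angola", 19625353, 2.78),
    ],
)

# Divide by 100 to turn the percentage into a fraction before multiplying.
query = """
SELECT name,
       ROUND(population * population_growth / 100) AS people_added
  FROM facts
 ORDER BY people_added DESC;
"""
growth = conn.execute(query).fetchall()
for name, people_added in growth:
    print(name, people_added)
```

On this sample Afghanistan comes first: about 32.6 million people growing at 2.32% adds roughly 755,000 people in a year.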


### 13. Which countries have a higher death rate than birth rate?

In [17]:
%%sql
SELECT name AS Country, birth_rate, death_rate, population/area AS population_area_ratio
  FROM facts
  WHERE name <> 'World' AND birth_rate < death_rate;

Done.


Country,birth_rate,death_rate,population_area_ratio
Austria,9.41,9.42,103
Belarus,10.7,13.36,46
Bosnia and Herzegovina,8.87,9.75,75
Bulgaria,8.92,14.44,64
Croatia,9.45,12.18,78
Czech Republic,9.63,10.34,134
Estonia,10.51,12.4,27
Germany,8.47,11.42,226
Greece,8.66,11.09,81
Hungary,9.16,12.73,106


### 14. Which countries have the highest population/area ratio?

In [18]:
%%sql
SELECT name AS Country,
       population/area AS population_area_ratio
  FROM facts
  WHERE name <> 'World'
  ORDER BY population_area_ratio DESC
  LIMIT 1;

Done.


Country,population_area_ratio
Macau,21168
