## Project 8 - Analyzing CIA Factbook Data Using SQL

In this project, we'll work with data from the CIA World Factbook (the SQLite `factbook.db` database), a compendium of statistics about all of the countries on Earth. The goal is to explore the database in Jupyter Notebook, filter the data according to the purpose of our analysis, and find densely populated countries.

![Image](https://images.unsplash.com/photo-1517732306149-e8f829eb588a?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2072&q=80)
_Photo by Jacek Dylag on Unsplash_

In [14]:
%%capture
%load_ext sql
%sql sqlite:///C:\Users\Denisa\Desktop\db\factbook.db

In [15]:
%%sql
SELECT *
  FROM sqlite_master
WHERE type='table';


 * sqlite:///C:\Users\Denisa\Desktop\db\factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


### Overview of the Data

In [16]:
%%sql
SELECT *
  FROM facts
 LIMIT 5;

 * sqlite:///C:\Users\Denisa\Desktop\db\factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


Here are the descriptions for some of the columns:

* `name`: the name of the country.
* `area`: the country's total area (both land and water) in square kilometers.
* `area_land`: the country's land area in square kilometers.
* `area_water`: the country's water area in square kilometers.
* `population`: the country's population.
* `population_growth`: the country's population growth as a percentage.
* `birth_rate`: the country's birth rate, or the number of births a year per 1,000 people.
* `death_rate`: the country's death rate, or the number of deaths a year per 1,000 people.

### Summary Statistics

In [17]:
%%sql
SELECT MIN(population), MAX(population),
       MIN(population_growth), MAX(population_growth)
  FROM facts;

 * sqlite:///C:\Users\Denisa\Desktop\db\factbook.db
Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,7256490011,0.0,4.02


Interesting observations based on the previous query:
* There's a country with a population of `0`.
* There's a country with a population of `7256490011` (or more than 7.2 billion people).

### Exploring Outliers

In [18]:
%%sql
SELECT name
  FROM facts
 WHERE population = (SELECT MIN(population) FROM facts);

 * sqlite:///C:\Users\Denisa\Desktop\db\factbook.db
Done.


name
Antarctica


In [19]:
%%sql
SELECT name
  FROM facts
 WHERE population = (SELECT MAX(population) FROM facts);

 * sqlite:///C:\Users\Denisa\Desktop\db\factbook.db
Done.


name
World


### Summary Statistics Revisited

We now recompute the summary statistics from earlier while excluding the row for the whole world.

In [7]:
%%sql
SELECT MIN(population), MAX(population),
       MIN(population_growth), MAX(population_growth)
  FROM facts
 WHERE population != (SELECT MAX(population) FROM facts);

Environment variable $DATABASE_URL not set, and no connect string given.
Connection info needed in SQLAlchemy format, example:
               postgresql://username:password@hostname/dbname
               or an existing connection: dict_keys([])
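
The cell above ran without an active database connection, so it shows no figures. The same exclusion can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, assuming a throwaway in-memory table that holds only the five rows from the overview plus the two outliers found earlier. Values that the notebook output does not show (the growth rates for `Antarctica` and `World`) are left as `NULL`, so the numbers it produces describe this toy subset and not the full `factbook.db`.

```python
import sqlite3

# Toy subset taken from the notebook output above; unknown values are None (NULL).
rows = [
    ("Afghanistan", 32564342, 2.32),
    ("Albania", 3029278, 0.3),
    ("Algeria", 39542166, 1.84),
    ("Andorra", 85580, 0.12),
    ("Angola", 19625353, 2.78),
    ("Antarctica", 0, None),
    ("World", 7256490011, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", rows)

# Same filter as the notebook cell: drop the row holding the maximum population.
query = """
SELECT MIN(population), MAX(population),
       MIN(population_growth), MAX(population_growth)
  FROM facts
 WHERE population != (SELECT MAX(population) FROM facts);
"""
summary = conn.execute(query).fetchone()
print(summary)
conn.close()
```

Filtering on the maximum population works here because `World` is the only row with that value; filtering on `name != 'World'` would be an equivalent, more explicit choice.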


### Exploring Average Population and Area

In [20]:
%%sql
SELECT AVG(population) FROM facts;

 * sqlite:///C:\Users\Denisa\Desktop\db\factbook.db
Done.


AVG(population)
62094928.32231405


In [21]:
%%sql
SELECT AVG(area) FROM facts;

 * sqlite:///C:\Users\Denisa\Desktop\db\factbook.db
Done.


AVG(area)
555093.546184739


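The analysis centers on population density, which depends on a country's population and its area. The average population is around 62 million and the average area is about 555 thousand square kilometers. Both averages are taken over the unfiltered `facts` table, so the `World` row is still included in the population figure.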
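
To make the relationship between population, area, and density concrete, here is a minimal sketch that computes people per square kilometer for the five rows shown in the overview. It assumes density is defined as `population / area` using the total area; dividing by `area_land` instead would be an equally reasonable choice. The in-memory table is a stand-in for illustration, not the real database.

```python
import sqlite3

# (name, population, area in sq km) copied from the overview output above.
rows = [
    ("Afghanistan", 32564342, 652230),
    ("Albania", 3029278, 28748),
    ("Algeria", 39542166, 2381741),
    ("Andorra", 85580, 468),
    ("Angola", 19625353, 1246700),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", rows)

# CAST avoids integer division; rows with no usable area are skipped.
query = """
SELECT name, CAST(population AS REAL) / area AS density
  FROM facts
 WHERE area > 0
 ORDER BY density DESC;
"""
densities = conn.execute(query).fetchall()
for name, density in densities:
    print(f"{name}: {density:.1f} people per sq km")
conn.close()
```

In this subset, tiny Andorra ranks as the densest country even though its population is by far the smallest, which is why the analysis looks at population and area together.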

### Finding Densely Populated Countries

We'll identify countries that have the following:

* Above-average values for population.
* Below-average values for area.
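
The two criteria above translate to a `>` comparison against the average population and a `<` comparison against the average area. Below is a minimal sketch, assuming a throwaway in-memory table that holds only the five rows shown in the overview, so its averages differ from those of the full database. Note that the notebook cell that follows compares `area` with `>`, so the names it lists are countries with above-average area, not below-average area.

```python
import sqlite3

# (name, population, area in sq km) copied from the overview output above.
rows = [
    ("Afghanistan", 32564342, 652230),
    ("Albania", 3029278, 28748),
    ("Algeria", 39542166, 2381741),
    ("Andorra", 85580, 468),
    ("Angola", 19625353, 1246700),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", rows)

# Above-average population AND below-average area.
query = """
SELECT name
  FROM facts
 WHERE population > (SELECT AVG(population) FROM facts)
   AND area < (SELECT AVG(area) FROM facts);
"""
dense = [row[0] for row in conn.execute(query)]
print(dense)
conn.close()
```

Within this toy subset only Afghanistan satisfies both conditions; on the full table the averages, and therefore the result, will be different.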


In [22]:
%%sql
SELECT name
  FROM facts
 WHERE population > (SELECT AVG(population) FROM facts)
   AND area > (SELECT AVG(area) FROM facts);

 * sqlite:///C:\Users\Denisa\Desktop\db\factbook.db
Done.


name
Brazil
China
"Congo, Democratic Republic of the"
Egypt
Ethiopia
France
India
Indonesia
Iran
Mexico
