# SQL Exploration of CIA Factbook

The purpose of this project is to practice **SQL** by querying a real-world database.

In [2]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

The previous cell loads the SQL extension for Jupyter and connects it to the SQLite file *factbook.db*, our copy of the CIA Factbook database.


The following cell queries *factbook.db* and returns information about the tables it contains.

In [5]:
%%sql
SELECT *
FROM sqlite_master
WHERE type='table';

 * sqlite:///factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
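The CREATE TABLE statement above is hard to read as a single line. As a rough sketch, the same column list can be pulled out one column per row with SQLite's `PRAGMA table_info`. The example below is an assumption-laden stand-in: it uses Python's built-in `sqlite3` module and an in-memory database created from the schema shown above, rather than the real *factbook.db*.

```python
import sqlite3

# In-memory stand-in for factbook.db, built from the CREATE TABLE text shown above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE "facts" (
        "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        "code" varchar(255) NOT NULL,
        "name" varchar(255) NOT NULL,
        "area" integer, "area_land" integer, "area_water" integer,
        "population" integer, "population_growth" float,
        "birth_rate" float, "death_rate" float, "migration_rate" float
    )
""")

# PRAGMA table_info yields one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(facts)")]
for column_name, declared_type in columns:
    print(column_name, declared_type)

conn.close()
```

Against the real file, the same PRAGMA could be run in a `%%sql` cell instead.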


The next query gathers some information about the facts table. Fetching just the first 5 rows shows which columns are available and gives us an idea of the data involved.

In [4]:
%%sql
SELECT * FROM facts
LIMIT 5;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


To find anything interesting, we start with the minimum and maximum values for population and population growth.

In [6]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
FROM facts;

 * sqlite:///factbook.db
Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,7256490011,0.0,4.02


I notice that the minimums for population and population_growth are both 0.

This suggests that some information is missing. No known country has a population of 0, so it is more likely that the figure was unavailable and recorded as 0. The same may be true for population_growth.
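Whether a missing figure is stored as 0 or as NULL matters, because SQLite's aggregate functions skip NULL but do count a stored 0. The sketch below illustrates this on a throwaway in-memory table. The table name `demo` and the two `hypothetical_*` rows are made up for illustration; the Albania and Andorra figures come from the sample rows above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [
        ("Albania", 3029278),            # from the sample rows above
        ("Andorra", 85580),              # from the sample rows above
        ("hypothetical_zero", 0),        # made-up row: value stored as 0
        ("hypothetical_missing", None),  # made-up row: value stored as NULL
    ],
)

# MIN() and COUNT(column) ignore NULL; COUNT(*) counts every row.
min_pop, total_rows, non_null_rows = conn.execute(
    "SELECT MIN(population), COUNT(*), COUNT(population) FROM demo"
).fetchone()

# NULLs have to be found explicitly with IS NULL.
null_rows = conn.execute(
    "SELECT COUNT(*) FROM demo WHERE population IS NULL"
).fetchone()[0]

print(min_pop, total_rows, non_null_rows, null_rows)  # prints: 0 4 3 1
conn.close()
```

So a minimum of 0 means that at least one row really stores 0. Rows with no value at all would not show up in MIN and would need a separate `IS NULL` check.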

The population max is 7,256,490,011, or about 7.26 billion, which is far larger than any individual country. I will look into which rows give us these odd values.

In [8]:
%%sql
SELECT name AS country
FROM facts
WHERE population = (SELECT MIN(population)
                    FROM facts)
OR population = (SELECT MAX(population)
                 FROM facts);

 * sqlite:///factbook.db
Done.


country
Antarctica
World


This makes more sense now that the table has been queried. Because the first 5 rows were countries, I assumed that *name* would always hold a country rather than a geographical feature or an aggregate.

Antarctica is not a country, though it is a land mass with a population of 0. The World as a whole had a population of roughly 7 billion when these data were gathered.

Excluding these two rows, I will recompute the minimum and maximum of population and population_growth.

In [9]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
FROM facts
WHERE name != 'World' AND name != 'Antarctica';

 * sqlite:///factbook.db
Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
48,1367485388,0.0,4.02


This makes more sense, though a population of 48 still seems odd. More research into that can be done later.
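One possible way to follow up on an odd minimum is to list the smallest rows by name instead of asking only for the value. The sketch below assumes Python's built-in `sqlite3` module and a cut-down in-memory `facts` table holding only the name and population figures that appear earlier in this notebook, so its output is not the answer for the full database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for the facts table: only two of the real columns.
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Afghanistan", 32564342),
        ("Albania", 3029278),
        ("Algeria", 39542166),
        ("Andorra", 85580),
        ("Angola", 19625353),
        ("Antarctica", 0),
        ("World", 7256490011),
    ],
)

# Sort ascending and keep the first few rows, skipping the two non-countries.
smallest = conn.execute("""
    SELECT name, population
    FROM facts
    WHERE name NOT IN ('World', 'Antarctica')
      AND population IS NOT NULL
    ORDER BY population ASC
    LIMIT 3
""").fetchall()

print(smallest)
# [('Andorra', 85580), ('Albania', 3029278), ('Angola', 19625353)]
conn.close()
```

Running the same SELECT in a `%%sql` cell against *factbook.db* would show which row holds the population of 48.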

In [11]:
%%sql
SELECT AVG(population), AVG(area)
FROM facts
WHERE name != 'World' AND name != 'Antarctica';

 * sqlite:///factbook.db
Done.


AVG(population),AVG(area)
32377011.0125,555093.546184739


Now we know that the average population of the named regions is around 32 million, with an average area of about 555,000. The table does not state the unit, but the CIA Factbook reports area in square kilometers.

In [12]:
%%sql
SELECT name AS country
FROM facts
WHERE population > (SELECT AVG(population)
                    FROM facts
                    WHERE name != 'World' AND name != 'Antarctica') 
AND area < (SELECT AVG(area)
            FROM facts
            WHERE name != 'World' AND name != 'Antarctica');

 * sqlite:///factbook.db
Done.


country
Bangladesh
Germany
Iraq
Italy
Japan
"Korea, South"
Morocco
Philippines
Poland
Spain
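The query above repeats the same filter in both subqueries. As an alternative sketch, a common table expression (CTE) can name the filtered rows once and compute both averages from it. The example assumes Python's built-in `sqlite3` module and an in-memory table containing only the five sample rows shown earlier plus the World and Antarctica populations (their areas are left as NULL because they are not shown in this notebook), so the result differs from the full list above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 32564342),
        ("Albania", 28748, 3029278),
        ("Algeria", 2381741, 39542166),
        ("Andorra", 468, 85580),
        ("Angola", 1246700, 19625353),
        ("Antarctica", None, 0),       # area not shown in the notebook
        ("World", None, 7256490011),   # area not shown in the notebook
    ],
)

dense = [row[0] for row in conn.execute("""
    WITH countries AS (
        SELECT * FROM facts
        WHERE name NOT IN ('World', 'Antarctica')
    ),
    averages AS (
        SELECT AVG(population) AS avg_pop, AVG(area) AS avg_area
        FROM countries
    )
    SELECT c.name
    FROM countries AS c
    CROSS JOIN averages AS a
    WHERE c.population > a.avg_pop
      AND c.area < a.avg_area
    ORDER BY c.name
""")]

print(dense)  # ['Afghanistan'] for this seven-row sample
conn.close()
```

The filter now lives in one place, so adding another excluded name later means changing a single line.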


Future Questions to Consider...

1. Which country has the most people?
2. Which country has the highest growth rate?
3. Which countries have the highest ratios of water to land? 
4. Which countries have more water than land?
5. Which countries will add the most people to their population next year?
6. Which countries have a higher death rate than birth rate?
7. Which countries have the highest population/area ratio, and how does that list compare to the one found by the previous query?
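As a starting point for questions 5 and 7, here is a rough sketch of the two calculations. It assumes that population_growth is an annual percentage, which the notebook does not confirm, and it runs on an in-memory table holding only the five sample rows shown earlier, so the rankings say nothing about the full database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, area INTEGER, "
    "population INTEGER, population_growth FLOAT)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?)",
    [
        ("Afghanistan", 652230, 32564342, 2.32),
        ("Albania", 28748, 3029278, 0.3),
        ("Algeria", 2381741, 39542166, 1.84),
        ("Andorra", 468, 85580, 0.12),
        ("Angola", 1246700, 19625353, 2.78),
    ],
)

# Question 7: population per unit of area. CAST avoids integer division,
# and area > 0 avoids dividing by zero.
densest = conn.execute("""
    SELECT name, CAST(population AS FLOAT) / area AS density
    FROM facts
    WHERE area > 0
    ORDER BY density DESC
""").fetchall()

# Question 5: estimated people added in a year,
# ASSUMING population_growth is a yearly percentage.
most_added = conn.execute("""
    SELECT name, population * population_growth / 100.0 AS added
    FROM facts
    ORDER BY added DESC
""").fetchall()

print(densest[0][0])     # Andorra (within this five-row sample)
print(most_added[0][0])  # Afghanistan (within this five-row sample)
conn.close()
```

On the real table, the WHERE clause would also need to exclude 'World' and 'Antarctica', as in the earlier queries.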
