## Guided Project: Analyzing CIA Factbook Data Using SQL
This is a guided project from Dataquest for practicing SQL query strategies.


#### Initialize the connection to the DB and run some queries to test the connection

In [2]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

In [3]:
%%sql
SELECT *
  FROM sqlite_master
 WHERE type='table';

 * sqlite:///factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
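
The same connection check can be run without the `%sql` magic. The following is a minimal sketch using Python's built-in `sqlite3` module; it builds a throwaway in-memory database with a trimmed-down `facts` schema (an assumption for illustration, not the real `factbook.db`) and lists its tables through `sqlite_master`.

```python
import sqlite3

# In-memory stand-in for factbook.db; this reduced schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "
    "name TEXT NOT NULL, "
    "population INTEGER)"
)

# sqlite_master holds one row per schema object; keep only the tables.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```

If the list comes back non-empty and includes `facts`, the connection and the schema are both in place.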


In [6]:
%%sql
SELECT * 
    FROM facts
    LIMIT 5;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


#### Begin exploring the data
Using some simple queries, let's learn about the countries in the DB.

Find: 
 - Minimum population
 - Max population
 - Min pop growth
 - Max pop growth

In [7]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
    FROM facts;

 * sqlite:///factbook.db
Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,7256490011,0.0,4.02


We see there is at least one row with a population of 0 and at least one with 0.0 growth. Let's find out which row has the zero population.

In [10]:
%%sql
SELECT name, population
    FROM facts
    WHERE population == (SELECT MIN(population)
                            FROM facts);

 * sqlite:///factbook.db
Done.


name,population
Antarctica,0


Ok, so Antarctica has the min population. That makes sense. What about the max population?

In [11]:
%%sql
SELECT name, population
    FROM facts
    WHERE population == (SELECT MAX(population)
                            FROM facts);

 * sqlite:///factbook.db
Done.


name,population
World,7256490011


Well, ok, the highest population belongs to the 'World' row, which is an aggregate rather than a country and isn't what I want. Let's run the min/max again but exclude the 'World' row.

In [14]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
    FROM facts
    WHERE name NOT IN ('World');

 * sqlite:///factbook.db
Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,1367485388,0.0,4.02


Great, which country is that then?

In [18]:
%%sql
SELECT name, population
    FROM facts
    WHERE population == (SELECT MAX(population) 
                         FROM facts
                         WHERE name NOT IN ('World'));

 * sqlite:///factbook.db
Done.


name,population
China,1367485388
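
The filter has to sit inside the subquery, otherwise `MAX(population)` still picks up the aggregate row. Here is a small sketch of that pattern with Python's `sqlite3` module, using an in-memory table that holds four rows copied from the outputs above (the two-column table layout is a simplification for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("World", 7256490011),
        ("China", 1367485388),
        ("India", 1251695584),
        ("Antarctica", 0),
    ],
)

# Without a filter, the subquery returns the World total.
unfiltered = conn.execute(
    "SELECT name FROM facts "
    "WHERE population = (SELECT MAX(population) FROM facts)"
).fetchall()

# With the filter inside the subquery, the maximum is taken over countries only.
filtered = conn.execute(
    "SELECT name FROM facts "
    "WHERE population = (SELECT MAX(population) FROM facts WHERE name <> 'World')"
).fetchall()

print(unfiltered, filtered)
```

For a single excluded value, `name <> 'World'` and `name NOT IN ('World')` behave the same; `NOT IN` is convenient if more aggregate rows need to be excluded later.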


Cool, so the country with the greatest population is China, which is generally known.

Next, let's look at population together with area so we can explore population density.


In [22]:
%%sql
SELECT name, population, area
    FROM facts
    WHERE name NOT IN ('World')
    ORDER BY population DESC
    LIMIT 5;

 * sqlite:///factbook.db
Done.


name,population,area
China,1367485388,9596960
India,1251695584,3287263
European Union,513949445,4324782
United States,321368864,9826675
Indonesia,255993674,1904569


From this, we can calculate population densities. Let's list countries with LOWER than average density first, then countries with HIGHER than average density.

In [30]:
%%sql
SELECT name, population, area, population/area AS density
    FROM facts
    WHERE name NOT IN ('World')
    AND density < (SELECT AVG(population/area) FROM facts)
    ORDER BY density DESC
    LIMIT 10;

 * sqlite:///factbook.db
Done.


name,population,area,density
Tuvalu,10869,26,418
Netherlands,16947904,41543,407
Marshall Islands,72191,181,398
Israel,8049314,20770,387
Burundi,10742276,27830,385
India,1251695584,3287263,380
Belgium,11323973,30528,370
Haiti,10110019,27750,364
Comoros,780971,2235,349
Philippines,100998376,300000,336
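
The `density` values in this output are whole numbers because `population` and `area` are both integer columns, and SQLite truncates integer division. A minimal sketch of the difference, assuming a reduced in-memory copy of two rows from the output above and using `CAST(... AS REAL)` to force floating-point division:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("Tuvalu", 10869, 26), ("Netherlands", 16947904, 41543)],
)

rows = conn.execute(
    """
    SELECT name,
           population / area                          AS int_density,
           ROUND(CAST(population AS REAL) / area, 2)  AS real_density
      FROM facts
     ORDER BY name
    """
).fetchall()

for name, int_density, real_density in rows:
    print(name, int_density, real_density)
```

The truncation does not change the ranking much here, but it does shift the average used in the comparison, so casting is worth considering when the exact threshold matters.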


The `density <` query returns the ten densest of the BELOW-average countries, since the results are sorted in descending order. What about the countries ABOVE average?

In [31]:
%%sql
SELECT name, population, area, population/area AS density
    FROM facts
    WHERE name NOT IN ('World')
    AND density > (SELECT AVG(population/area) FROM facts)
    ORDER BY density DESC
    LIMIT 10;

 * sqlite:///factbook.db
Done.


name,population,area,density
Macau,592731,28,21168
Monaco,30535,2,15267
Singapore,5674472,697,8141
Hong Kong,7141106,1108,6445
Gaza Strip,1869055,360,5191
Gibraltar,29258,6,4876
Bahrain,1346613,760,1771
Maldives,393253,298,1319
Malta,413965,316,1310
Bermuda,70196,54,1299


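Macau, Monaco, and Singapore all make sense as high-density places. Note that several rows in this list, such as Macau, Hong Kong, and Gibraltar, are territories rather than sovereign countries.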
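
One detail in the two density queries is that the outer query excludes 'World' while the `AVG` subquery runs over every row in `facts`. A possible way to keep both sides consistent is a common table expression, sketched below with Python's `sqlite3` module. The four country rows are copied from outputs in this notebook; the `NULL` area for 'World' is a placeholder assumption, not a value taken from the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("World", 7256490011, None),  # area is a placeholder assumption
        ("Macau", 592731, 28),
        ("Singapore", 5674472, 697),
        ("India", 1251695584, 3287263),
        ("Andorra", 85580, 468),
    ],
)

query = """
WITH countries AS (
    SELECT name,
           CAST(population AS REAL) / area AS density
      FROM facts
     WHERE name <> 'World'
       AND area > 0            -- also avoids division by zero
)
SELECT name,
       density > (SELECT AVG(density) FROM countries) AS above_average
  FROM countries
 ORDER BY density DESC
"""
rows = conn.execute(query).fetchall()

above = [name for name, flag in rows if flag]
below = [name for name, flag in rows if not flag]
print(above, below)
```

Because the average and the comparison both read from `countries`, any row filtered out once is filtered out everywhere.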