<h1>CIA_Factbook_SQL</h1>
<h3>Generate summary statistics using SQL</h3>
The data comes from the CIA World Factbook (factbook.db) and was provided by DataQuest.

<h4>SQL: install and connect</h4>

In [10]:
!conda install -yc conda-forge ipython-sql

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.



In [6]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

<h4>Quick look at data</h4>

In [9]:
%%sql
SELECT *
  FROM facts
 LIMIT 5;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


- [name] - The name of the country.
- [area] - The country's total area (both land and water).
- [area_land] - The country's land area in square kilometers.
- [area_water] - The country's water area in square kilometers.
- [population] - The country's population.
- [population_growth] - The country's population growth as a percentage.
- [birth_rate] - The country's birth rate, or the number of births a year per 1,000 people.
- [death_rate] - The country's death rate, or the number of deaths a year per 1,000 people.

<h4>Summary of data: min/max population, min/max population growth</h4>

In [12]:
%%sql
SELECT MIN(population) AS min_pop,
       MAX(population) AS max_pop,
       MIN(population_growth) AS min_pop_growth,
       MAX(population_growth) AS max_pop_growth
  FROM facts;

 * sqlite:///factbook.db
Done.


min_pop,max_pop,min_pop_growth,max_pop_growth
0,7256490011,0.0,4.02


<h2>Population</h2>
<h4>Outliers: select countries with min population</h4>
(according to the summary cell, one entry has a population of 0)

In [111]:
%%sql
SELECT name AS Lowest_pop_country,
       population
  FROM facts
 WHERE population == (SELECT MIN(population)
                      FROM   facts);

 * sqlite:///factbook.db
Done.


Lowest_pop_country,population
Antarctica,0


<h4>Outliers: select countries with max population</h4>
(according to the summary cell, one entry has a population of more than 7 billion)

In [110]:
%%sql
SELECT name AS Highest_pop_country,
       population
  FROM facts
 WHERE population == (SELECT MAX(population)
                      FROM   facts);

 * sqlite:///factbook.db
Done.


Highest_pop_country,population
World,7256490011

World is an aggregate row rather than a country, so the next query excludes it.


In [28]:
%%sql
SELECT name AS Highest_pop_country,
       population
  FROM facts
 WHERE population == (SELECT MAX(population)
                      FROM   facts
                      WHERE  name <> 'World');

 * sqlite:///factbook.db
Done.


Highest_pop_country,population
China,1367485388


<h2>Density</h2>
<h4>Get average population/country and average area/country</h4>
(excluding World)

In [108]:
%%sql
SELECT CAST(AVG(population) AS Integer) AS avg_pop,
       ROUND(AVG(area), 2) AS avg_area
  FROM facts
 WHERE name <> 'World';

 * sqlite:///factbook.db
Done.


avg_pop,avg_area
32242666,555093.55


<h4>Select high density countries</h4>
Countries with population above average, area below average

In [42]:
%%sql
  SELECT name,
         population,
         area
    FROM facts
   WHERE population > (SELECT AVG(population)
                       FROM   facts
                       WHERE  name <> 'World')
     AND area < (SELECT AVG(area)
                 FROM   facts
                 WHERE  name <> 'World')
ORDER BY population DESC
   LIMIT 10;

 * sqlite:///factbook.db
Done.


name,population,area
Bangladesh,168957745,148460
Japan,126919659,377915
Philippines,100998376,300000
Vietnam,94348835,331210
Germany,80854408,357022
Thailand,67976405,513120
United Kingdom,64088222,243610
Italy,61855120,301340
"Korea, South",49115196,99720
Spain,48146134,505370


Top 10 countries with highest population density<br/>
(population_density = population / area; both columns are integers, so SQLite truncates the result to a whole number)

In [45]:
%%sql
  SELECT name,
         population,
         area,
         population / area AS population_density
    FROM facts
ORDER BY population_density DESC
   LIMIT 10;

 * sqlite:///factbook.db
Done.


name,population,area,population_density
Macau,592731,28,21168
Monaco,30535,2,15267
Singapore,5674472,697,8141
Hong Kong,7141106,1108,6445
Gaza Strip,1869055,360,5191
Gibraltar,29258,6,4876
Bahrain,1346613,760,1771
Maldives,393253,298,1319
Malta,413965,316,1310
Bermuda,70196,54,1299
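
The densities above are whole numbers because SQLite performs integer division when both operands are integers. The sketch below reproduces that behaviour and shows one way to keep the fractional part. It uses the standard `sqlite3` module and an in-memory stand-in table (an assumption made for illustration, not factbook.db itself) that holds two rows copied from the output above.

```python
import sqlite3

# In-memory stand-in for the facts table (illustrative only, not factbook.db).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("Macau", 592731, 28), ("Monaco", 30535, 2)],
)

# INTEGER / INTEGER truncates; casting one operand to REAL keeps the fraction.
density_rows = conn.execute(
    """
    SELECT name,
           population / area AS density_int,
           ROUND(CAST(population AS REAL) / area, 2) AS density_real
      FROM facts
     ORDER BY density_real DESC
    """
).fetchall()

for row in density_rows:
    print(row)

conn.close()
```

For ranking purposes the truncation rarely matters, but it can hide differences between small territories whose densities are close together.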


<h2>Growth Rate</h2>
<h4>Select the country with the highest population growth rate</h4>

In [33]:
%%sql
SELECT name AS Highest_growth_country,
       population_growth
  FROM facts
 WHERE population_growth == (SELECT MAX(population_growth)
                             FROM   facts
                             WHERE  name <> 'World');

 * sqlite:///factbook.db
Done.


Highest_growth_country,population_growth
South Sudan,4.02


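<h4>Calculate population for next year</h4>
new_population = population * population_growth

Note: population_growth is a percentage, so this product is 100 times the expected annual increase rather than next year's population. The projected population itself would be population * (1 + population_growth / 100). The ranking below is therefore a ranking by absolute expected increase.

(show top 5 countries with the largest expected increase)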

In [46]:
%%sql
  SELECT name,
         population,
         population_growth,
         CAST(population * population_growth AS Integer) AS new_population
    FROM facts
   WHERE name <> 'World'
ORDER BY new_population DESC
   LIMIT 5;

 * sqlite:///factbook.db
Done.


name,population,population_growth,new_population
India,1251695584,1.22,1527068612
China,1367485388,0.45,615368424
Nigeria,181562056,2.45,444827037
Pakistan,199085847,1.46,290665336
Ethiopia,99465819,2.89,287456216
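
Because population_growth is stored as a percentage (for example 1.22 for India), a one-year projection would scale the population by (1 + population_growth / 100). The following is a minimal sketch of that calculation, assuming the growth rate stays constant for one year. It runs against an in-memory stand-in table (an assumption for illustration) filled with three rows copied from the output above; the column names `expected_increase` and `projected_population` are introduced here and do not appear in the original notebook.

```python
import sqlite3

# In-memory stand-in for the facts table (illustrative only, not factbook.db).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("India", 1251695584, 1.22),
        ("China", 1367485388, 0.45),
        ("Nigeria", 181562056, 2.45),
    ],
)

# Divide the percentage by 100 before applying it to the population.
projection_rows = conn.execute(
    """
    SELECT name,
           population,
           CAST(ROUND(population * population_growth / 100.0) AS INTEGER)
               AS expected_increase,
           CAST(ROUND(population * (1 + population_growth / 100.0)) AS INTEGER)
               AS projected_population
      FROM facts
     ORDER BY projected_population DESC
    """
).fetchall()

for row in projection_rows:
    print(row)

conn.close()
```

With the percentage applied this way, China remains ahead of India in projected population, even though India has the larger expected increase.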


<h2>Migration Rate</h2>
<h4>Outliers: highest migration rate</h4>

In [79]:
%%sql
SELECT name AS highest_migration_rate_country,
       migration_rate
  FROM facts
 WHERE migration_rate == (SELECT MAX(migration_rate)
                          FROM   facts
                          WHERE  name <> 'World');

 * sqlite:///factbook.db
Done.


highest_migration_rate_country,migration_rate
Qatar,22.39


<h4>Count countries whose migration_rate equals 0.0</h4>

In [100]:
%%sql
SELECT SUM(migration_rate=0.0) AS countries_zero_migration_rate
  FROM facts;

 * sqlite:///factbook.db
Done.


countries_zero_migration_rate
35
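
The count above relies on SQLite evaluating a comparison to 1 or 0, so SUM adds up the matching rows. A sketch of that idiom next to the more portable COUNT(*) with a WHERE clause follows. It assumes an in-memory stand-in table (not factbook.db); three rows are copied from the preview at the top of the notebook, and the row named `hypothetical_no_data` is invented here to show how a NULL rate is treated.

```python
import sqlite3

# In-memory stand-in for the facts table (illustrative only, not factbook.db).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, migration_rate REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Afghanistan", 1.51),
        ("Albania", 3.3),
        ("Andorra", 0.0),
        ("hypothetical_no_data", None),  # invented row with a missing rate
    ],
)

# Idiom from the notebook: the comparison yields 1, 0, or NULL, and SUM skips NULL.
(sum_count,) = conn.execute(
    "SELECT SUM(migration_rate = 0.0) FROM facts"
).fetchone()

# Equivalent filter-then-count form; rows with a NULL rate fail the WHERE test.
(where_count,) = conn.execute(
    "SELECT COUNT(*) FROM facts WHERE migration_rate = 0.0"
).fetchone()

print(sum_count, where_count)

conn.close()
```

One difference worth keeping in mind: if every rate in the table is NULL, SUM returns NULL, whereas COUNT(*) returns 0.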


<h4>Classify countries by migration_rate</h4>
(low = below the average migration rate, high = above it)

In [106]:
%%sql
  SELECT name AS country,
         migration_rate,
         CASE
         WHEN migration_rate < (SELECT AVG(migration_rate) FROM facts) THEN 'low'
         WHEN migration_rate > (SELECT AVG(migration_rate) FROM facts) THEN 'high'
         END AS classified_migration_rate
    FROM facts
   WHERE migration_rate IS NOT NULL
ORDER BY classified_migration_rate, country;

 * sqlite:///factbook.db
Done.


country,migration_rate,classified_migration_rate
American Samoa,21.13,high
Anguilla,12.18,high
Armenia,5.8,high
Aruba,8.92,high
Australia,5.65,high
Austria,5.56,high
Bahrain,13.09,high
Belgium,5.87,high
Botswana,4.56,high
British Virgin Islands,17.28,high
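
The classification can also be sketched with the average computed once in a common table expression, missing rates filtered with IS NOT NULL, and an ELSE branch for a rate exactly equal to the average (which the two WHEN branches alone would leave as NULL). This is one possible variant, not the notebook's own query. It assumes an in-memory stand-in table (not factbook.db) with three rows copied from the outputs above and one invented row, `hypothetical_no_data`; the label 'average' is likewise introduced here.

```python
import sqlite3

# In-memory stand-in for the facts table (illustrative only, not factbook.db).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, migration_rate REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Qatar", 22.39),
        ("Afghanistan", 1.51),
        ("Andorra", 0.0),
        ("hypothetical_no_data", None),  # invented row with a missing rate
    ],
)

# The CTE computes the average once; AVG ignores NULL values.
classified_rows = conn.execute(
    """
    WITH stats AS (SELECT AVG(migration_rate) AS avg_rate FROM facts)
    SELECT name AS country,
           migration_rate,
           CASE
           WHEN migration_rate < avg_rate THEN 'low'
           WHEN migration_rate > avg_rate THEN 'high'
           ELSE 'average'
           END AS classified_migration_rate
      FROM facts, stats
     WHERE migration_rate IS NOT NULL
     ORDER BY classified_migration_rate, country
    """
).fetchall()

for row in classified_rows:
    print(row)

conn.close()
```

Sorting by the label and then by country name keeps the output order deterministic without needing a GROUP BY.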


<h2>Area</h2>
<h4>Select Countries that have more water than land</h4>
(ratio_water_land = area_water/area_land)

In [50]:
%%sql
SELECT name,
       area_water,
       area_land,
       (area_water / area_land) AS ratio_water_land
  FROM facts
 WHERE area_water > area_land
   AND name <> 'World';

 * sqlite:///factbook.db
Done.


name,area_water,area_land,ratio_water_land
British Indian Ocean Territory,54340,60,905
Virgin Islands,1564,346,4


British Indian Ocean Territory has roughly 905 times as much water as land<br/>
Virgin Islands has more than 4 times as much water as land

<h4>Calculate percentage water</h4>
(perc_area_water = area_water / area * 100)

In [132]:
%%sql
  SELECT name,
         area,
         area_land,
         area_water,
         ROUND((CAST(area_water AS Float) / CAST(area AS Float) * 100),2) AS perc_area_water
    FROM facts
   WHERE name <> 'World'
     AND perc_area_water IS NOT NULL
ORDER BY perc_area_water DESC;

 * sqlite:///factbook.db
Done.


name,area,area_land,area_water,perc_area_water
British Indian Ocean Territory,54400,60.0,54340,99.89
Virgin Islands,1910,346.0,1564,81.88
Puerto Rico,13791,8870.0,4921,35.68
"Bahamas, The",13880,10010.0,3870,27.88
Guinea-Bissau,36125,28120.0,8005,22.16
Malawi,118484,94080.0,24404,20.6
Netherlands,41543,33893.0,7650,18.41
Uganda,241038,197100.0,43938,18.23
Eritrea,117600,101000.0,16600,14.12
Liberia,111369,96320.0,15049,13.51
