# SQLite Project Analysis

These are my notes for SQL Fundamentals. I will analyze CIA Factbook data,
combining SQL queries with pandas.

In [1]:
import pandas as pd
import sqlite3

In [2]:
## Connect to database
conn = sqlite3.connect('data/factbook.db')

In [3]:
## Helper that runs a SQL query and returns the result as a pandas DataFrame
def run(q):
    return pd.read_sql(q, conn)

In [4]:
## List the tables in the database
content='''
SELECT *
FROM sqlite_master
WHERE type='table'
'''

run(content)

Unnamed: 0,type,name,tbl_name,rootpage,sql
0,table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
1,table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY..."
2,table,cities,cities,2,CREATE TABLE cities (\n id integer prim...


## Schema diagram:

![](data/schema.svg)

In [5]:
facts_table='''
SELECT *
FROM facts
LIMIT 5
'''
run(facts_table)

Unnamed: 0,id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
0,1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
1,2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
2,3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
3,4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
4,5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


## 1) Query to find the minimum and maximum values of population and population growth:

In [6]:
q1='''
SELECT  
        MIN(population) min_population, 
        MAX(population) max_population, 
        MIN(population_growth) min_population_growth,
        MAX(population_growth) max_population_growth
FROM facts
'''
run(q1)

Unnamed: 0,min_population,max_population,min_population_growth,max_population_growth
0,0,7256490011,0.0,4.02


#### A population of over 7 billion looks odd. Let's investigate that in the next query.

## 2) Let's find the country that has the highest population

In [7]:
q2='''
SELECT MAX(population) AS max_population, name
FROM facts
'''
run(q2)

Unnamed: 0,max_population,name
0,7256490011,World


## 3) Let's find the country that has the lowest population

In [8]:
q3='''
SELECT MIN(population) AS min_population, name
FROM facts
'''
run(q3)

Unnamed: 0,min_population,name
0,0,Antarctica


#### As shown above, the table contains a 'country' called World, which explains the population of over 7 billion. Likewise, the 'country' with a population of 0 is Antarctica.
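Queries 2 and 3 select `name` next to `MAX(population)` or `MIN(population)` without a `GROUP BY`. SQLite documents that such a bare column is taken from the row holding the maximum or minimum, but other databases may reject the query or return an arbitrary name. A more portable form sorts the rows and keeps the first one. The sketch below is only an illustration: it uses a hypothetical in-memory table with made-up rows, not the real `facts` table.

```python
import sqlite3

# Hypothetical toy table standing in for `facts`; names and numbers are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("Aland", 500), ("Bland", 1200), ("World", 1700),
     ("Emptyland", 0), ("Unknownland", None)],
)

# Sort and keep one row instead of relying on SQLite's bare-column rule.
# NULL populations are filtered out because SQLite sorts NULLs first in ASC order.
largest = conn.execute("""
    SELECT name, population
    FROM facts
    WHERE population IS NOT NULL
    ORDER BY population DESC
    LIMIT 1
""").fetchone()

smallest = conn.execute("""
    SELECT name, population
    FROM facts
    WHERE population IS NOT NULL
    ORDER BY population ASC
    LIMIT 1
""").fetchone()

print(largest, smallest)
```

If several rows tie for the extreme value, `LIMIT 1` returns only one of them.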

## 4) Now let's find the country with the lowest population after Antarctica

In [9]:
q4='''
SELECT 
    MIN(population) AS min_population, 
    name
FROM facts
WHERE population > (SELECT MIN(population) FROM facts)
'''
run(q4)

Unnamed: 0,min_population,name
0,48,Pitcairn Islands


## 5) Minimum and maximum population and population growth, excluding World and Antarctica

In [10]:
q5='''
SELECT  
        MIN(population) min_population, 
        MAX(population) max_population, 
        MIN(population_growth) min_population_growth, 
        MAX(population_growth) max_population_growth
FROM facts
WHERE name <> 'World'
AND name <> 'Antarctica'
'''
run(q5)

Unnamed: 0,min_population,max_population,min_population_growth,max_population_growth
0,48,1367485388,0.0,4.02
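The two `<>` conditions can also be written as a single `NOT IN` list, which is easier to extend if more aggregate rows need to be excluded. The sketch below binds the excluded names as query parameters. The table contents and the `excluded` tuple are hypothetical and used only for illustration.

```python
import sqlite3

# Hypothetical toy table standing in for `facts`; names and numbers are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("Aland", 500), ("Bland", 1200), ("World", 1700), ("Antarctica", 0)],
)

# Rows to leave out of the aggregates (assumed list, extend as needed).
excluded = ("World", "Antarctica")

# Build one "?" placeholder per excluded name so the values are bound, not interpolated.
placeholders = ", ".join("?" for _ in excluded)
query = f"""
    SELECT MIN(population), MAX(population)
    FROM facts
    WHERE name NOT IN ({placeholders})
"""
lowest, highest = conn.execute(query, excluded).fetchone()
print(lowest, highest)
```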


## 6) Query to find countries that are densely populated. We'll identify countries that have the following:

* Above-average values for population.

* Below-average values for area.

In [11]:
q6='''
SELECT 
        name, 
        population, 
        area
FROM facts
WHERE population > (SELECT AVG(population) FROM facts WHERE name <> 'World')
AND area < (SELECT AVG(area) FROM facts)
ORDER BY population ASC
'''
run(q6)

Unnamed: 0,name,population,area
0,Morocco,33322699,446550
1,Iraq,37056169,438317
2,Uganda,37101745,241038
3,Poland,38562189,312685
4,Spain,48146134,505370
5,"Korea, South",49115196,99720
6,Italy,61855120,301340
7,United Kingdom,64088222,243610
8,Thailand,67976405,513120
9,Germany,80854408,357022
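Above-average population combined with below-average area is a proxy for density. A more direct measure divides population by area. The following is a minimal sketch of that idea, assuming the same `population` and `area` column names as `facts` but run against a hypothetical in-memory table with made-up rows.

```python
import sqlite3

# Hypothetical toy table standing in for `facts`; names and numbers are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("Smallcrowded", 1000, 10),
     ("Bigsparse", 1000, 1000),
     ("Noarea", 50, None)],
)

# CAST forces floating-point division; "area > 0" skips NULL and zero areas.
density_query = """
    SELECT name, CAST(population AS REAL) / area AS density
    FROM facts
    WHERE area > 0
    ORDER BY density DESC
"""
rows = conn.execute(density_query).fetchall()
for name, density in rows:
    print(name, density)
```

The same query string could be passed to the `run` helper to get a DataFrame instead of a list of tuples.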


## 7) Query to find top 5 countries with the highest ratio of water to land:

In [12]:
q7='''
SELECT  
        name, 
        area_water/CAST(area_land AS float) AS ratio
FROM facts
ORDER BY ratio DESC
LIMIT 5
'''
run(q7)

Unnamed: 0,name,ratio
0,British Indian Ocean Territory,905.666667
1,Virgin Islands,4.520231
2,Puerto Rico,0.554791
3,"Bahamas, The",0.386613
4,Guinea-Bissau,0.284673
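The `CAST` in query 7 matters because SQLite truncates the result when both operands of `/` are integers. The short sketch below shows the difference using Albania's `area_water` and `area_land` values from the table preview above, and also shows that dividing by zero yields `NULL` (Python `None`) rather than an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1350 and 27398 are Albania's area_water and area_land from the preview above.
int_ratio, real_ratio, zero_div = conn.execute("""
    SELECT
        1350 / 27398,                -- integer division, truncated
        1350 / CAST(27398 AS REAL),  -- floating-point division
        1 / CAST(0 AS REAL)          -- division by zero gives NULL in SQLite
""").fetchone()

print(int_ratio, real_ratio, zero_div)
```

Rows whose ratio is `NULL` sort last under `ORDER BY ratio DESC` in SQLite, so they do not displace real values in the top 5.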


## 8) Top 10 countries that have a higher death rate than birth rate:

In [13]:
q8='''
SELECT 
        name,
        birth_rate,
        death_rate,
        birth_rate/death_rate ratio
FROM facts
WHERE birth_rate < death_rate
ORDER BY ratio ASC
LIMIT 10
'''
run(q8)

Unnamed: 0,name,birth_rate,death_rate,ratio
0,Bulgaria,8.92,14.44,0.617729
1,Serbia,9.08,13.66,0.664714
2,Latvia,10.0,14.31,0.698812
3,Lithuania,10.1,14.27,0.707779
4,Hungary,9.16,12.73,0.71956
5,Monaco,6.65,9.24,0.719697
6,Slovenia,8.42,11.37,0.740545
7,Ukraine,10.72,14.46,0.741355
8,Germany,8.47,11.42,0.741681
9,Saint Pierre and Miquelon,7.42,9.72,0.763374


## 9) Which countries have more water than land?

In [14]:
q9='''
SELECT 
        name, 
        area_land, 
        area_water
FROM facts
WHERE area_land < area_water
'''
run(q9)

Unnamed: 0,name,area_land,area_water
0,British Indian Ocean Territory,60,54340
1,Virgin Islands,346,1564


## THE END