# SQL | Joining Data
---
- `INNER`, `LEFT`, `RIGHT` joins

```python
# Import libraries and connect to the database
import sqlite3
import pandas as pd

conn = sqlite3.connect("factbook.db")
```

```python
# List every table in the database
q1 = '''SELECT * FROM sqlite_master WHERE type='table' '''
pd.read_sql_query(q1, conn)
```

| | type | name | tbl_name | rootpage | sql |
|---|---|---|---|---|---|
| 0 | table | sqlite_sequence | sqlite_sequence | 3 | CREATE TABLE sqlite_sequence(name,seq) |
| 1 | table | facts | facts | 47 | CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY... |
| 2 | table | cities | cities | 2 | CREATE TABLE cities (\n id integer prim... |
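
The `sql` column in this listing is truncated. To see every column of a table, SQLite's `PRAGMA table_info` statement can be run through the same `pd.read_sql_query` call. The sketch below is a minimal, self-contained illustration: it assumes an in-memory stand-in for `factbook.db` whose hypothetical `cities` table uses the column names that appear in the query outputs in this notebook (the declared types are guesses).

```python
import sqlite3

import pandas as pd

# In-memory stand-in for factbook.db (hypothetical schema: column names
# follow the query outputs in this notebook, the types are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, "
    "population INTEGER, capital INTEGER, facts_id INTEGER)"
)

# PRAGMA table_info returns one row per column:
# cid, name, type, notnull, dflt_value, pk
columns = pd.read_sql_query("PRAGMA table_info(cities)", conn)
conn.close()

print(columns[["name", "type", "pk"]])
```

Against the real database, the same call with `facts` or `cities` would list that table's columns without the truncation seen above.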


### Instructions

Write a query that returns all columns from the facts and cities tables.
   - Use an __`INNER JOIN`__ to join the cities table to the facts table.
   - Join the tables on the values where __`facts.id`__ and __`cities.facts_id`__ are equal.
   - Limit the query to the first 10 rows.

```python
q1 = '''SELECT * FROM facts
    INNER JOIN cities
    ON facts.id = cities.facts_id
    LIMIT 10'''
pd.read_sql_query(q1, conn)
```

| | id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate | id.1 | name.1 | population.1 | capital | facts_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 216 | aa | Aruba | 180 | 180 | 0 | 112162 | 1.33 | 12.56 | 8.18 | 8.92 | 1 | Oranjestad | 37000 | 1 | 216 |
| 1 | 6 | ac | Antigua and Barbuda | 442 | 442 | 0 | 92436 | 1.24 | 15.85 | 5.69 | 2.21 | 2 | Saint John'S | 27000 | 1 | 6 |
| 2 | 184 | ae | United Arab Emirates | 83600 | 83600 | 0 | 5779760 | 2.58 | 15.43 | 1.97 | 12.36 | 3 | Abu Dhabi | 942000 | 1 | 184 |
| 3 | 184 | ae | United Arab Emirates | 83600 | 83600 | 0 | 5779760 | 2.58 | 15.43 | 1.97 | 12.36 | 4 | Dubai | 1978000 | 0 | 184 |
| 4 | 184 | ae | United Arab Emirates | 83600 | 83600 | 0 | 5779760 | 2.58 | 15.43 | 1.97 | 12.36 | 5 | Sharjah | 983000 | 0 | 184 |
| 5 | 1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 | 6 | Kabul | 3097000 | 1 | 1 |
| 6 | 3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 | 7 | Algiers | 2916000 | 1 | 3 |
| 7 | 3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 | 8 | Oran | 783000 | 0 | 3 |
| 8 | 11 | aj | Azerbaijan | 86600 | 82629 | 3971 | 9780780 | 0.96 | 16.64 | 7.07 | 0.0 | 9 | Baku | 2123000 | 1 | 11 |
| 9 | 2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 | 10 | Tirana | 419000 | 1 | 2 |
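
Two properties of `INNER JOIN` are visible in this output: a country with several cities (United Arab Emirates) is repeated once per matching city, and a country with no matching row in `cities` does not appear at all. A minimal sketch on hypothetical toy tables makes both effects easy to check; the rows reuse values from the outputs in this notebook, except Monaco's `id` of 999, which is made up.

```python
import sqlite3

import pandas as pd

# Hypothetical toy versions of the facts and cities tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT, population INTEGER);
CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, population INTEGER,
                     capital INTEGER, facts_id INTEGER);
INSERT INTO facts VALUES
    (216, 'Aruba', 112162),
    (184, 'United Arab Emirates', 5779760),
    (999, 'Monaco', 30535);          -- made-up id, no cities rows
INSERT INTO cities VALUES
    (1, 'Oranjestad', 37000, 1, 216),
    (3, 'Abu Dhabi', 942000, 1, 184),
    (4, 'Dubai', 1978000, 0, 184),
    (5, 'Sharjah', 983000, 0, 184);
""")

joined = pd.read_sql_query(
    """SELECT f.name AS country, c.name AS city
       FROM facts f
       INNER JOIN cities c ON f.id = c.facts_id
       ORDER BY c.id""",
    conn,
)
conn.close()

print(joined)
# United Arab Emirates appears three times (one row per city);
# Monaco is absent because no cities row has facts_id = 999.
```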


### Instructions

1. Write a query that:
    - Joins __`cities`__ to __`facts`__ using an __`INNER JOIN`__.
    - Uses aliases for table names.
2. Includes, in order:
    - All columns from cities.
    - The name column from facts aliased to __`country_name`__.
3. Includes only the first 5 rows.

```python
q1 = '''SELECT c.*, f.name country_name FROM facts f
    INNER JOIN cities c
    ON c.facts_id = f.id
    LIMIT 5'''
pd.read_sql_query(q1, conn)
```

| | id | name | population | capital | facts_id | country_name |
|---|---|---|---|---|---|---|
| 0 | 1 | Oranjestad | 37000 | 1 | 216 | Aruba |
| 1 | 2 | Saint John'S | 27000 | 1 | 6 | Antigua and Barbuda |
| 2 | 3 | Abu Dhabi | 942000 | 1 | 184 | United Arab Emirates |
| 3 | 4 | Dubai | 1978000 | 0 | 184 | United Arab Emirates |
| 4 | 5 | Sharjah | 983000 | 0 | 184 | United Arab Emirates |
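
Aliasing matters here because `facts` and `cities` both have `id`, `name`, and `population` columns. With `SELECT *`, the joined result carries each of those names twice, whereas `c.*` plus an explicit alias gives unique names. The sketch below inspects the result column names through the `description` attribute of a standard `sqlite3` cursor, using hypothetical one-row toy tables. It writes the alias with `AS`, which SQLite treats as optional: `f.name country_name` and `f.name AS country_name` are equivalent.

```python
import sqlite3

# Hypothetical one-row toy tables with the overlapping column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT, population INTEGER);
CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, population INTEGER,
                     capital INTEGER, facts_id INTEGER);
INSERT INTO facts VALUES (216, 'Aruba', 112162);
INSERT INTO cities VALUES (1, 'Oranjestad', 37000, 1, 216);
""")


def column_names(sql):
    """Return the result column names reported by the cursor."""
    cursor = conn.execute(sql)
    return [desc[0] for desc in cursor.description]


unaliased = column_names(
    "SELECT * FROM facts f INNER JOIN cities c ON c.facts_id = f.id"
)
aliased = column_names(
    "SELECT c.*, f.name AS country_name FROM facts f "
    "INNER JOIN cities c ON c.facts_id = f.id"
)
conn.close()

print(unaliased)  # 'id', 'name' and 'population' each appear twice
print(aliased)
```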


### Instructions

1. Write a query that uses an __`INNER JOIN`__ to join the two tables in your query and returns, in order:
    - A column of country names, called __`country`__.
    - A column of each country's capital city, called __`capital_city`__

```python
q1 = '''SELECT f.name country, c.name capital_city
    FROM cities c
    INNER JOIN facts f
    ON f.id = c.facts_id
    WHERE c.capital = 1'''
pd.read_sql_query(q1, conn)
```

| | country | capital_city |
|---|---|---|
| 0 | Aruba | Oranjestad |
| 1 | Antigua and Barbuda | Saint John'S |
| 2 | United Arab Emirates | Abu Dhabi |
| 3 | Afghanistan | Kabul |
| 4 | Algeria | Algiers |
| ... | ... | ... |
| 203 | Samoa | Apia |
| 204 | Swaziland | Mbabane |
| 205 | Yemen | Sanaa |
| 206 | Zambia | Lusaka |
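
The `c.capital = 1` filter in this query sits in the `WHERE` clause. For an `INNER JOIN` it could equally go in the `ON` clause, but for a `LEFT JOIN` the two placements differ: a condition in `ON` only restricts which rows match, so countries without a capital city still appear with `NULL`s, while the same condition in `WHERE` removes those rows after the join. A small sketch on hypothetical toy tables (Monaco's id is invented and it is given no cities) shows the difference:

```python
import sqlite3

import pandas as pd

# Hypothetical toy tables; Monaco (made-up id 999) has no cities rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, capital INTEGER,
                     facts_id INTEGER);
INSERT INTO facts VALUES (216, 'Aruba'), (184, 'United Arab Emirates'),
                         (999, 'Monaco');
INSERT INTO cities VALUES (1, 'Oranjestad', 1, 216), (3, 'Abu Dhabi', 1, 184),
                          (4, 'Dubai', 0, 184);
""")

base = "SELECT f.name AS country, c.name AS capital_city FROM facts f "
queries = {
    "inner_where": base + "INNER JOIN cities c ON f.id = c.facts_id WHERE c.capital = 1",
    "inner_on": base + "INNER JOIN cities c ON f.id = c.facts_id AND c.capital = 1",
    "left_where": base + "LEFT JOIN cities c ON f.id = c.facts_id WHERE c.capital = 1",
    "left_on": base + "LEFT JOIN cities c ON f.id = c.facts_id AND c.capital = 1",
}
# Sort each result by country so row order does not depend on the query plan.
results = {
    label: pd.read_sql_query(sql + " ORDER BY f.name", conn)
    for label, sql in queries.items()
}
conn.close()

for label, df in results.items():
    print(label, df.to_dict("records"))
```

Only `left_on` keeps Monaco, with a missing `capital_city`; the other three queries return the same two rows.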


### Instructions

1. Write a query that returns the __countries__ that don't exist in __`cities`__:
- Your query should return two columns:
    - The country names, with the alias __`country`__.
    - The country population.
- Use a __`LEFT JOIN`__ to join __`cities`__ to __`facts`__.
- Include only the countries from facts that don't have a corresponding value in __`cities`__.

```python
q1 = '''SELECT f.name country, f.population FROM facts f
    LEFT JOIN cities c ON c.facts_id = f.id
    WHERE c.facts_id IS NULL
    LIMIT 10'''  # without LIMIT, the query returns 50 rows
pd.read_sql_query(q1, conn)
```

| | country | population |
|---|---|---|
| 0 | Kosovo | 1870981.0 |
| 1 | Monaco | 30535.0 |
| 2 | Nauru | 9540.0 |
| 3 | San Marino | 33020.0 |
| 4 | Singapore | 5674472.0 |
| 5 | Holy See (Vatican City) | 842.0 |
| 6 | Taiwan | 23415126.0 |
| 7 | European Union | 513949445.0 |
| 8 | Ashmore and Cartier Islands | NaN |
| 9 | Christmas Island | 1530.0 |
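
The `LEFT JOIN ... WHERE c.facts_id IS NULL` pattern is an anti-join: it keeps the `facts` rows that have no partner in `cities`. The same set can be written with `NOT EXISTS`, a standard SQL alternative swapped in here for comparison. A sketch on hypothetical toy tables (the ids for Monaco and Nauru are invented; their populations come from the output above) checks that both forms agree:

```python
import sqlite3

import pandas as pd

# Hypothetical toy tables: Monaco and Nauru (made-up ids) have no cities rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT, population INTEGER);
CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, facts_id INTEGER);
INSERT INTO facts VALUES (216, 'Aruba', 112162), (998, 'Nauru', 9540),
                         (999, 'Monaco', 30535);
INSERT INTO cities VALUES (1, 'Oranjestad', 216);
""")

left_join_version = pd.read_sql_query(
    """SELECT f.name AS country, f.population FROM facts f
       LEFT JOIN cities c ON c.facts_id = f.id
       WHERE c.facts_id IS NULL
       ORDER BY f.name""",
    conn,
)
not_exists_version = pd.read_sql_query(
    """SELECT f.name AS country, f.population FROM facts f
       WHERE NOT EXISTS (SELECT 1 FROM cities c WHERE c.facts_id = f.id)
       ORDER BY f.name""",
    conn,
)
conn.close()

print(left_join_version)
```

`NOT IN (SELECT facts_id FROM cities)` looks similar but behaves differently if `cities.facts_id` contains any `NULL`: the comparison then evaluates to unknown for every row, and the query returns nothing.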


### Instructions

1. Write a query that returns the 10 capital cities with the highest population ranked from biggest to smallest population.
- You should include the following columns, in order:
    - __`capital_city`__, the name of the city.
    - __`country`__, the name of the country the city is from.
    - __`population`__, the population of the city.

```python
q1 = '''SELECT c.name capital_city, f.name country, c.population FROM cities c
    LEFT JOIN facts f ON c.facts_id = f.id
    WHERE c.capital = 1
    ORDER BY c.population DESC
    LIMIT 10'''
pd.read_sql_query(q1, conn)
```

| | capital_city | country | population |
|---|---|---|---|
| 0 | Tokyo | Japan | 37217000 |
| 1 | New Delhi | India | 22654000 |
| 2 | Mexico City | Mexico | 20446000 |
| 3 | Beijing | China | 15594000 |
| 4 | Dhaka | Bangladesh | 15391000 |
| 5 | Buenos Aires | Argentina | 13528000 |
| 6 | Manila | Philippines | 11862000 |
| 7 | Moscow | Russia | 11621000 |
| 8 | Cairo | Egypt | 11169000 |
| 9 | Jakarta | Indonesia | 9769000 |
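
The same top-N result can be cross-checked in pandas with `DataFrame.merge` (the pandas counterpart of a SQL join) followed by `DataFrame.nlargest`. This is a swapped-in pandas technique, not part of the SQL exercise; the toy DataFrames below are hypothetical, with made-up `id` values and city populations copied from the outputs in this notebook.

```python
import pandas as pd

# Hypothetical toy tables; all id values are invented.
facts = pd.DataFrame({"id": [1, 2, 3],
                      "name": ["Japan", "India", "United Arab Emirates"]})
cities = pd.DataFrame({
    "id": [10, 11, 12, 13],
    "name": ["Tokyo", "New Delhi", "Abu Dhabi", "Dubai"],
    "population": [37217000, 22654000, 942000, 1978000],
    "capital": [1, 1, 1, 0],
    "facts_id": [1, 2, 3, 3],
})

capitals = cities[cities["capital"] == 1]
# how="left" mirrors the LEFT JOIN from cities to facts; the overlapping
# column names (id, name) get the suffixes below.
merged = capitals.merge(facts, how="left", left_on="facts_id", right_on="id",
                        suffixes=("_city", "_country"))
top = (merged
       .rename(columns={"name_city": "capital_city", "name_country": "country"})
       .nlargest(10, "population")[["capital_city", "country", "population"]])
print(top)
```

Dubai is larger than Abu Dhabi but is filtered out before the ranking because it is not a capital, just as `WHERE c.capital = 1` does in SQL.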


### Instructions

1. Using a join and a subquery, write a query that returns capital cities with populations of over 10 million ordered from largest to smallest. Include the following columns:
    - __`capital_city`__ - the name of the city.
    - __`country`__ - the name of the country the city is the capital of.
    - __`population`__ - the population of the city

```python
q1 = '''SELECT c.name capital_city, f.name country, c.population
    FROM facts f
    INNER JOIN (SELECT * FROM cities WHERE capital = 1 AND population > 10000000) c
    ON c.facts_id = f.id
    ORDER BY c.population DESC'''
pd.read_sql_query(q1, conn)
```

| | capital_city | country | population |
|---|---|---|---|
| 0 | Tokyo | Japan | 37217000 |
| 1 | New Delhi | India | 22654000 |
| 2 | Mexico City | Mexico | 20446000 |
| 3 | Beijing | China | 15594000 |
| 4 | Dhaka | Bangladesh | 15391000 |
| 5 | Buenos Aires | Argentina | 13528000 |
| 6 | Manila | Philippines | 11862000 |
| 7 | Moscow | Russia | 11621000 |
| 8 | Cairo | Egypt | 11169000 |
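
The derived table in the `FROM` clause can also be written as a common table expression (`WITH ... AS`), which SQLite supports since version 3.8.3. This is an alternative formulation swapped in for comparison, not the notebook's own approach. The sketch runs both forms against hypothetical toy tables (invented ids; populations copied from the outputs in this notebook) and checks that they return the same rows:

```python
import sqlite3

import pandas as pd

# Hypothetical toy tables; Jakarta is a capital below the 10 million cut-off.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, population INTEGER,
                     capital INTEGER, facts_id INTEGER);
INSERT INTO facts VALUES (1, 'Japan'), (2, 'Egypt'), (3, 'Indonesia');
INSERT INTO cities VALUES (10, 'Tokyo', 37217000, 1, 1),
                          (11, 'Cairo', 11169000, 1, 2),
                          (12, 'Jakarta', 9769000, 1, 3);
""")

subquery_version = pd.read_sql_query(
    """SELECT c.name AS capital_city, f.name AS country, c.population
       FROM facts f
       INNER JOIN (SELECT * FROM cities
                   WHERE capital = 1 AND population > 10000000) c
       ON c.facts_id = f.id
       ORDER BY c.population DESC""",
    conn,
)
cte_version = pd.read_sql_query(
    """WITH big_capitals AS (
           SELECT * FROM cities WHERE capital = 1 AND population > 10000000
       )
       SELECT c.name AS capital_city, f.name AS country, c.population
       FROM facts f
       INNER JOIN big_capitals c ON c.facts_id = f.id
       ORDER BY c.population DESC""",
    conn,
)
conn.close()

print(cte_version)
```

Naming the filtered set up front can make longer queries easier to read; for this query the two forms are interchangeable.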


### Instructions

1. Write a query that produces the output shown after the code below. The query should include:
    - The following columns, in order:
        - __`country`__, the name of the country.
        - __`urban_pop`__, the sum of the population in major urban areas belonging to that country.
        - __`total_pop`__, the total population of the country.
        - __`urban_pct`__, the share of the population living in urban areas (a fraction between 0 and 1), calculated by dividing __`urban_pop`__ by __`total_pop`__.
    - Only countries that have an __`urban_pct`__ greater than __`0.5`__.
    - Rows sorted by __`urban_pct`__ in ascending order.

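```python
# CAST(... as float) avoids integer division. SQLite accepts the SELECT-list
# alias urban_pct in WHERE; standard SQL (e.g. PostgreSQL) does not, so there
# the expression would have to be repeated.
q1 = '''SELECT
        f.name country,
        c.urban_pop,
        f.population total_pop,
        (c.urban_pop / CAST(f.population as float)) urban_pct
    FROM facts f
    INNER JOIN
    (SELECT facts_id, SUM(population) urban_pop FROM cities
    GROUP BY facts_id) c
    ON f.id = c.facts_id
    WHERE urban_pct > 0.5
    ORDER BY urban_pct ASC'''
pd.read_sql_query(q1, conn)
```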

| | country | urban_pop | total_pop | urban_pct |
|---|---|---|---|---|
| 0 | Uruguay | 1672000 | 3341893 | 0.500315 |
| 1 | Congo, Republic of the | 2445000 | 4755097 | 0.514185 |
| 2 | Brunei | 241000 | 429646 | 0.560927 |
| 3 | New Caledonia | 157000 | 271615 | 0.578024 |
| 4 | Virgin Islands | 60000 | 103574 | 0.579296 |
| 5 | Falkland Islands (Islas Malvinas) | 2000 | 3361 | 0.595061 |
| 6 | Djibouti | 496000 | 828324 | 0.5988 |
| 7 | Australia | 13789000 | 22751014 | 0.606083 |
| 8 | Iceland | 206000 | 331918 | 0.620635 |
| 9 | Israel | 5226000 | 8049314 | 0.649248 |
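
The `CAST(f.population as float)` in this query is what makes `urban_pct` a fraction: in SQLite, dividing one integer by another performs integer division and discards the remainder, so without the cast every ratio below 1 would come out as 0. A type name containing `FLOA`, such as `float`, gets REAL affinity, so the cast yields a floating-point value. A quick check using Uruguay's numbers from the output above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Uruguay's urban_pop and total_pop from the output above.
int_ratio = conn.execute("SELECT 1672000 / 3341893").fetchone()[0]
real_ratio = conn.execute(
    "SELECT 1672000 / CAST(3341893 AS float)"
).fetchone()[0]
conn.close()

print(int_ratio)   # integer division truncates to 0
print(real_ratio)  # about 0.5003, matching urban_pct
```

Multiplying by `1.0` (for example `c.urban_pop * 1.0 / f.population`) has the same effect.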


```python
# Close the sqlite3 connection
conn.close()
```
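
Calling `conn.close()` explicitly works, but the call is never reached if the code before it raises an exception. One option, sketched here as an alternative rather than the notebook's own pattern, is `contextlib.closing`, which closes the connection when the `with` block exits. Using a `sqlite3` connection directly as a context manager (`with sqlite3.connect(...) as conn:`) only commits or rolls back the current transaction; it does not close the connection. The in-memory database below is a placeholder for `factbook.db`.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# ":memory:" stands in for "factbook.db" so the sketch runs anywhere.
with closing(sqlite3.connect(":memory:")) as conn:
    result = pd.read_sql_query("SELECT 'Aruba' AS name", conn)
# conn.close() has been called here, even if the query had raised.

try:
    conn.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:  # raised when using a closed connection
    still_open = False

print(result, still_open)
```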