In [1]:
import pandas as pd
import sqlite3
import os

# Path to the directory containing your CSV files
directory = 'C:/Users/aBr/Downloads/countries/countries'

# Connect to the SQLite database
conn = sqlite3.connect('1st.db')

# Iterate over every CSV file in the directory
for filename in os.listdir(directory):
    if filename.endswith('.csv'):
        # Construct the full file path
        file_path = os.path.join(directory, filename)

        # Read the CSV file into a pandas DataFrame
        df = pd.read_csv(file_path)

        # Extract table name from the filename (remove '.csv')
        table_name = os.path.splitext(filename)[0]

        # Import the DataFrame into the SQLite database
        df.to_sql(table_name, conn, if_exists='replace', index=False)

# Close the connection
conn.close()
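
Before querying, it can help to confirm which tables the import created. The following is a minimal sketch using only the standard library; `list_tables` is a hypothetical helper (not part of the notebook above), and the demo runs on an in-memory database with a made-up two-row table rather than on `1st.db`.

```python
import sqlite3

def list_tables(conn):
    """Return {table_name: row_count} for every table in the database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # Table names come from the database itself; quote them as identifiers.
    return {name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names}

# Demo on an in-memory database with a hypothetical two-row table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE countries (code TEXT, country_name TEXT)")
demo.executemany("INSERT INTO countries VALUES (?, ?)",
                 [("AFG", "Afghanistan"), ("NLD", "Netherlands")])
table_counts = list_tables(demo)
demo.close()
print(table_counts)
```

Against the real database, the same helper could be called as `list_tables(sqlite3.connect('1st.db'))`, assuming the import cell has already run.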


In [3]:
# Load the SQL extension
%load_ext sql

# Connect to your SQLite database
%sql sqlite:///1st.db

Suppose you are interested in the relationship between fertility and unemployment rates. Your task in this exercise is to join tables to return the country name, year, fertility rate, and unemployment rate in a single result from the countries, populations and economies tables.

In [14]:
%%sql
SELECT country_name, year, fertility_rate
FROM countries AS c
INNER JOIN populations AS p
ON c.code = p.country_code
LIMIT 5;

 * sqlite:///1st.db
Done.


country_name,year,fertility_rate
Afghanistan,2010,5.746
Afghanistan,2015,4.653
Netherlands,2010,1.79
Netherlands,2015,1.71
Albania,2010,1.663


In [21]:
%%sql
SELECT country_name, e.year, fertility_rate, unemployment_rate
FROM countries AS c
INNER JOIN populations AS p
ON c.code = p.country_code
INNER JOIN economies AS e
-- Match on country code
ON c.code = e.code
LIMIT 4;

 * sqlite:///1st.db
Done.


country_name,year,fertility_rate,unemployment_rate
Afghanistan,2010,4.653,
Afghanistan,2015,4.653,
Afghanistan,2010,5.746,
Afghanistan,2015,5.746,
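
In the output above, each Afghanistan fertility rate appears under both 2010 and 2015. Joining economies on the country code alone pairs every populations row with every economies row for that country. The sketch below reproduces this on a small in-memory database and shows one way to avoid it, by also matching on year. The table schemas are assumptions inferred from the queries above; the Afghanistan values are copied from the outputs above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (code TEXT, country_name TEXT);
CREATE TABLE populations (country_code TEXT, year INTEGER, fertility_rate REAL);
CREATE TABLE economies (code TEXT, year INTEGER, unemployment_rate REAL);
INSERT INTO countries VALUES ('AFG', 'Afghanistan');
INSERT INTO populations VALUES ('AFG', 2010, 5.746), ('AFG', 2015, 4.653);
INSERT INTO economies VALUES ('AFG', 2010, NULL), ('AFG', 2015, NULL);
""")

base = """
SELECT country_name, e.year, fertility_rate, unemployment_rate
FROM countries AS c
INNER JOIN populations AS p ON c.code = p.country_code
INNER JOIN economies AS e ON c.code = e.code
"""

# Matching on code only: 2 population rows x 2 economy rows = 4 rows.
code_only = conn.execute(base + " ORDER BY e.year, fertility_rate").fetchall()

# Matching on code and year: one row per country and year.
code_and_year = conn.execute(
    base + " AND p.year = e.year ORDER BY e.year").fetchall()
conn.close()

for row in code_and_year:
    print(row)
```

With the extra `AND p.year = e.year` condition, the 2010 fertility rate is paired only with the 2010 economy row, and likewise for 2015.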


# Semi join

Let's say you are interested in identifying languages spoken in the Middle East. The languages table contains information about languages and countries, but it does not tell you what region the countries belong to. You can build up a semi join by filtering the countries table by a particular region, and then using this to further filter the languages table.



In [23]:
%%sql
/* Select country code for countries in the Middle East */
SELECT code
FROM countries
WHERE region = 'Middle East'
LIMIT 3;

 * sqlite:///1st.db
Done.


code
ARE
ARM
AZE


Now create the semi join: select the unique language names from the languages table, and use the query above as a subquery so that only languages spoken in the 'Middle East' are returned.

In [26]:
%%sql
SELECT DISTINCT name
FROM languages
/* Add syntax to use bracketed subquery below as a filter */
WHERE code IN
    (SELECT code
    FROM countries
    WHERE region = 'Middle East')
ORDER BY name
LIMIT 3;

 * sqlite:///1st.db
Done.


name
Arabic
Aramaic
Armenian
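
A semi join keeps rows from the left table that have at least one match in the right table, without adding columns or duplicating rows. As a minimal sketch, the same filter can be written with `IN` (as above) or with a correlated `EXISTS`. The toy rows here are illustrative only: the codes ARE and ARM come from the output above, while 'XXX', 'Elsewhere', and the language assignments are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (code TEXT, region TEXT);
CREATE TABLE languages (code TEXT, name TEXT);
INSERT INTO countries VALUES
    ('ARE', 'Middle East'), ('ARM', 'Middle East'), ('XXX', 'Elsewhere');
INSERT INTO languages VALUES
    ('ARE', 'Arabic'), ('ARM', 'Armenian'), ('XXX', 'Arabic'), ('XXX', 'Other');
""")

# Semi join written with IN and an uncorrelated subquery.
semi_in = conn.execute("""
SELECT DISTINCT name
FROM languages
WHERE code IN (SELECT code FROM countries WHERE region = 'Middle East')
ORDER BY name
""").fetchall()

# Equivalent semi join written with a correlated EXISTS.
semi_exists = conn.execute("""
SELECT DISTINCT l.name
FROM languages AS l
WHERE EXISTS (SELECT 1 FROM countries AS c
              WHERE c.code = l.code AND c.region = 'Middle East')
ORDER BY l.name
""").fetchall()
conn.close()

print(semi_in)
print(semi_exists)
```

Both forms return the same names on this toy data; 'Other' is excluded because it is only spoken in the made-up country outside the region.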


# Subquery challenge

Suppose you're interested in analyzing inflation and unemployment rates for certain countries in 2015. You are not interested in countries with "Republic" or "Monarchy" as their form of government, but are interested in all other forms of government, such as emirate federations, socialist states, and commonwealths.

You will use the field gov_form, which represents a country's form of government, to filter on these two conditions. You can review the different entries for gov_form in the countries table.

In [33]:
%%sql
SELECT code, inflation_rate, unemployment_rate
FROM economies
WHERE year = 2015 
  AND code NOT IN
/* Subquery returning country codes filtered on gov_form */
    (SELECT code
     FROM countries
     WHERE (gov_form LIKE '%Monarchy%' OR gov_form LIKE '%Republic%'))
ORDER BY inflation_rate;

 * sqlite:///1st.db
Done.


code,inflation_rate,unemployment_rate
AFG,-1.549,
CHE,-1.14,3.178
PRI,-0.751,12.0
ROU,-0.596,6.812
TLS,0.553,
MNE,1.204,
SRB,1.392,18.2
HKG,3.037,3.296
ARE,4.07,
MAC,4.564,1.825
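
The query above is an anti join built with `NOT IN`. One caveat worth knowing: if the subquery ever returns a NULL, `NOT IN` evaluates to NULL for every non-matching row and the result is empty. The sketch below illustrates this on entirely made-up rows (the codes 'AAA' and 'BBB' and the NULL code are hypothetical; whether `countries.code` can be NULL in the real data is not known) and shows `NOT EXISTS` as an alternative that is not affected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (code TEXT, gov_form TEXT);
CREATE TABLE economies (code TEXT, year INTEGER, inflation_rate REAL);
INSERT INTO countries VALUES
    ('AAA', 'Republic'), ('BBB', 'Federation'), (NULL, 'Constitutional Monarchy');
INSERT INTO economies VALUES ('AAA', 2015, 1.0), ('BBB', 2015, 2.0);
""")

# NOT IN: the subquery yields ('AAA', NULL), so 'BBB' NOT IN (...) is NULL.
not_in = conn.execute("""
SELECT code
FROM economies
WHERE year = 2015
  AND code NOT IN
    (SELECT code
     FROM countries
     WHERE gov_form LIKE '%Monarchy%' OR gov_form LIKE '%Republic%')
""").fetchall()

# NOT EXISTS: a NULL code in countries never matches, so 'BBB' is kept.
not_exists = conn.execute("""
SELECT e.code
FROM economies AS e
WHERE e.year = 2015
  AND NOT EXISTS
    (SELECT 1
     FROM countries AS c
     WHERE c.code = e.code
       AND (c.gov_form LIKE '%Monarchy%' OR c.gov_form LIKE '%Republic%'))
""").fetchall()
conn.close()

print(not_in)
print(not_exists)
```

If the code column contains no NULLs, the two forms return the same rows.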


# Challenge Program
Your task is to determine the top 10 capital cities in Europe and the Americas by city_perc, a metric you'll calculate. city_perc expresses the population of the city proper as a percentage of the total population of the wider metro area:

city_proper_pop / metroarea_pop * 100

In [34]:
%%sql
SELECT
    name,
    country_code,
    city_proper_pop,
    metroarea_pop,
    city_proper_pop / metroarea_pop * 100 AS city_perc
FROM cities
WHERE name IN
    (SELECT capital
     FROM countries
     WHERE (continent = 'Europe'
        OR continent LIKE '%America'))
-- Add filter condition such that metroarea_pop does not have null values
  AND metroarea_pop IS NOT NULL
-- Sort and limit the result
ORDER BY city_perc DESC
LIMIT 10;

 * sqlite:///1st.db
Done.


name,country_code,city_proper_pop,metroarea_pop,city_perc
Lima,PER,8852000,10750000.0,82.34418604651162
Bogota,COL,7878783,9800000.0,80.39574489795919
Moscow,RUS,12197596,16170000.0,75.43349412492269
Vienna,AUT,1863881,2600000.0,71.68773076923077
Montevideo,URY,1305082,1947604.0,67.00961797162051
Caracas,VEN,1943901,2923959.0,66.48181455348724
Rome,ITA,2877215,4353775.0,66.08552348249508
Brasilia,BRA,2556149,3919864.0,65.21014504584853
London,GBR,8673713,13879757.0,62.491821722815466
Budapest,HUN,1759407,2927944.0,60.09018615110126
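
The percentages above come out as fractions because metroarea_pop holds floating-point values (note the `.0` in the output), presumably because pandas reads a column with missing values as float; that is an assumption about how the table was loaded. If both columns were integers, SQLite would perform integer division and the result would be 0. A minimal sketch using the Lima figures from the output above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# First expression: both operands are integers, so the division truncates to 0.
# Second expression: one operand is REAL, so the division is floating point.
int_perc, real_perc = conn.execute(
    "SELECT 8852000 / 10750000 * 100, 8852000 / 10750000.0 * 100"
).fetchone()
conn.close()

print(int_perc)
print(real_perc)
```

When column types are uncertain, multiplying by `100.0` before dividing (or using `CAST(... AS REAL)`) is one way to make the floating-point division explicit.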
