# Data Engineering Capstone Project

Data Engineering Nanodegree conclusion project.

## Immigration in the US



In [4]:
import numpy as np
import pandas as pd
from datetime import datetime
import psycopg2

This project builds a database model for immigration data in the United States of America. Analyses of this data can support both government and business decision making.

#### Datasets

- I94 Immigration Data: This data comes from the US National Tourism and Trade Office, but for this notebook only a small sample will be used. [This is where the data comes from](https://travel.trade.gov/research/reports/i94/historical/2016.html).

- World Temperature Data: This dataset came from Kaggle. [You can read more about it here](https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data).

- U.S. City Demographic Data: This data comes from OpenDataSoft. [You can read more about it here](https://public.opendatasoft.com/explore/dataset/us-cities-demographics/export/).

- Airport Code Table: This is a simple table of airport codes and corresponding cities. [It comes from here](https://datahub.io/core/airport-codes#data).


## Exploring the Data




#### I94 Immigration

In [5]:
immigration = pd.read_csv('immigration_data_sample.csv')
print(immigration.dtypes)
immigration.head()

Unnamed: 0      int64
cicid         float64
i94yr         float64
i94mon        float64
i94cit        float64
i94res        float64
i94port        object
arrdate       float64
i94mode       float64
i94addr        object
depdate       float64
i94bir        float64
i94visa       float64
count         float64
dtadfile        int64
visapost       object
occup          object
entdepa        object
entdepd        object
entdepu       float64
matflag        object
biryear       float64
dtaddto        object
gender         object
insnum        float64
airline        object
admnum        float64
fltno          object
visatype       object
dtype: object


Unnamed: 0.1,Unnamed: 0,cicid,i94yr,i94mon,i94cit,i94res,i94port,arrdate,i94mode,i94addr,...,entdepu,matflag,biryear,dtaddto,gender,insnum,airline,admnum,fltno,visatype
0,2027561,4084316.0,2016.0,4.0,209.0,209.0,HHW,20566.0,1.0,HI,...,,M,1955.0,7202016,F,,JL,56582670000.0,00782,WT
1,2171295,4422636.0,2016.0,4.0,582.0,582.0,MCA,20567.0,1.0,TX,...,,M,1990.0,10222016,M,,*GA,94362000000.0,XBLNG,B2
2,589494,1195600.0,2016.0,4.0,148.0,112.0,OGG,20551.0,1.0,FL,...,,M,1940.0,7052016,M,,LH,55780470000.0,00464,WT
3,2631158,5291768.0,2016.0,4.0,297.0,297.0,LOS,20572.0,1.0,CA,...,,M,1991.0,10272016,M,,QR,94789700000.0,00739,B2
4,3032257,985523.0,2016.0,4.0,111.0,111.0,CHM,20550.0,3.0,NY,...,,M,1997.0,7042016,F,,,42322570000.0,LAND,WT


#### World temperature by city

In [6]:
world_temperature = pd.read_csv('GlobalLandTemperaturesByCity.csv')
print(world_temperature.dtypes)
world_temperature.head()

dt                                object
AverageTemperature               float64
AverageTemperatureUncertainty    float64
City                              object
Country                           object
Latitude                          object
Longitude                         object
dtype: object


Unnamed: 0,dt,AverageTemperature,AverageTemperatureUncertainty,City,Country,Latitude,Longitude
0,1743-11-01,6.068,1.737,Århus,Denmark,57.05N,10.33E
1,1743-12-01,,,Århus,Denmark,57.05N,10.33E
2,1744-01-01,,,Århus,Denmark,57.05N,10.33E
3,1744-02-01,,,Århus,Denmark,57.05N,10.33E
4,1744-03-01,,,Århus,Denmark,57.05N,10.33E


#### US cities

In [7]:
us_cities = pd.read_csv('us-cities-demographics.csv', sep=';')
print(us_cities.dtypes)
us_cities.head()

City                       object
State                      object
Median Age                float64
Male Population           float64
Female Population         float64
Total Population            int64
Number of Veterans        float64
Foreign-born              float64
Average Household Size    float64
State Code                 object
Race                       object
Count                       int64
dtype: object


Unnamed: 0,City,State,Median Age,Male Population,Female Population,Total Population,Number of Veterans,Foreign-born,Average Household Size,State Code,Race,Count
0,Silver Spring,Maryland,33.8,40601.0,41862.0,82463,1562.0,30908.0,2.6,MD,Hispanic or Latino,25924
1,Quincy,Massachusetts,41.0,44129.0,49500.0,93629,4147.0,32935.0,2.39,MA,White,58723
2,Hoover,Alabama,38.5,38040.0,46799.0,84839,4819.0,8229.0,2.58,AL,Asian,4759
3,Rancho Cucamonga,California,34.5,88127.0,87105.0,175232,5821.0,33878.0,3.18,CA,Black or African-American,24437
4,Newark,New Jersey,34.6,138040.0,143873.0,281913,5829.0,86253.0,2.73,NJ,White,76402


#### Airports

In [8]:
airports = pd.read_csv('airport-codes_csv.csv')
print(airports.dtypes)
airports.head()

ident            object
type             object
name             object
elevation_ft    float64
continent        object
iso_country      object
iso_region       object
municipality     object
gps_code         object
iata_code        object
local_code       object
coordinates      object
dtype: object


Unnamed: 0,ident,type,name,elevation_ft,continent,iso_country,iso_region,municipality,gps_code,iata_code,local_code,coordinates
0,00A,heliport,Total Rf Heliport,11.0,,US,US-PA,Bensalem,00A,,00A,"-74.93360137939453, 40.07080078125"
1,00AA,small_airport,Aero B Ranch Airport,3435.0,,US,US-KS,Leoti,00AA,,00AA,"-101.473911, 38.704022"
2,00AK,small_airport,Lowell Field,450.0,,US,US-AK,Anchor Point,00AK,,00AK,"-151.695999146, 59.94919968"
3,00AL,small_airport,Epps Airpark,820.0,,US,US-AL,Harvest,00AL,,00AL,"-86.77030181884766, 34.86479949951172"
4,00AR,closed,Newport Hospital & Clinic Heliport,237.0,,US,US-AR,Newport,,,,"-91.254898, 35.6087"


### Data Quality Checks

The data quality checks include:
 * Integrity constraints on the relational database.
 * try/except code blocks to double check on duplicates.
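
The duplicate check can also run in pandas before any row reaches the database. The following is a minimal sketch rather than the notebook's actual check: `find_duplicate_keys` is a hypothetical helper and the sample frame is made up for illustration.

```python
import pandas as pd

def find_duplicate_keys(df, key_columns):
    """Return the rows whose key columns collide with another row."""
    # keep=False marks every member of a duplicated group, not only the repeats
    mask = df.duplicated(subset=key_columns, keep=False)
    return df[mask]

# Made-up sample: two rows share the same state code
sample = pd.DataFrame({
    "state_code": ["MD", "MA", "MD"],
    "city": ["Silver Spring", "Quincy", "Baltimore"],
})

duplicates = find_duplicate_keys(sample, ["state_code"])
print(duplicates)
```

An empty result means the chosen columns are safe to use as a key; a non-empty one lists the rows that would violate a PRIMARY KEY constraint.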

### Cleaning Steps

- immigration: Several fields need changing. For example, two columns, i94cit and i94res, hold codes that correspond to location names. We use a JSON file to map these codes to names; the same mapping could also be stored as a separate table in the schema.
- world_temperature: The coordinates are in a different format from the one in the airports data frame, but the two datasets are not joined on coordinates, so we can ignore this.
- us_cities: Nothing special needs changing. Some columns could be INT instead of DOUBLE or FLOAT, but pandas' default integer dtype cannot hold NA values, so the conversion would require removing them or filling them with zeros, which would distort the data. I'd rather model the database with the FLOAT data type than restrict analysts' options for handling missing data.
- airports: The data is not in first normal form because the coordinates column packs two values into one field, so let's split it into separate latitude and longitude columns.
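
As a possible alternative to FLOAT columns, pandas offers a nullable integer dtype, `Int64` (capital I), which stores whole numbers alongside missing values. This notebook does not use it; the sketch below only illustrates the behaviour on made-up values.

```python
import numpy as np
import pandas as pd

# Made-up values shaped like the 'Number of Veterans' column
veterans = pd.Series([1562.0, np.nan, 4819.0])

# The nullable dtype keeps the missing entry as <NA> instead of raising
veterans_int = veterans.astype("Int64")
print(veterans_int)
```

Before loading such a column with a database driver, the `<NA>` entries would still need to be turned into `None`.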

In [9]:
# Data Cleaning

## immigration
#### getting location codes
locations = pd.read_json('locations.json', typ='series')
locations = pd.DataFrame(locations)
locations['id'] = locations.index
locations = locations.rename(columns = {'id': 'location_id', 0 : 'location'})

#### i94res
immigration['i94res'] = immigration['i94res'].astype(int)
df_immigration = pd.merge(locations, immigration,how='inner',left_on=['location_id'],right_on=['i94res'])
df_immigration = df_immigration.drop(columns=['i94res', 'location_id']).rename(columns = {'location': 'i94res'})

#### i94cit
df_immigration['i94cit'] = df_immigration['i94cit'].astype(int)
df_immigration = pd.merge(locations, df_immigration ,how='inner',left_on=['location_id'],right_on=['i94cit'])
df_immigration = df_immigration.drop(columns=['i94cit', 'location_id']).rename(columns = {'location': 'i94cit'})

#### convert float to int
df_immigration['i94yr'] = df_immigration['i94yr'].astype(int)
df_immigration['i94mon'] = df_immigration['i94mon'].astype(int)
df_immigration['i94mode'] = df_immigration['i94mode'].astype(int)
df_immigration['biryear'] = df_immigration['biryear'].astype(int)
df_immigration['cicid'] = df_immigration['cicid'].astype(int)
df_immigration['i94bir'] = df_immigration['i94bir'].astype(int)
df_immigration['i94visa'] = df_immigration['i94visa'].astype(int)
df_immigration['count'] = df_immigration['count'].astype(int)

#### converting dates (SAS dates count days from 1960-01-01)
#### uses datetime from the standard library; the pd.datetime alias is deprecated
df_immigration['arrdate'] = pd.to_timedelta(df_immigration['arrdate'], unit='d') + datetime(1960, 1, 1)
df_immigration['depdate'] = pd.to_timedelta(df_immigration['depdate'], unit='d') + datetime(1960, 1, 1)

pd.set_option('display.max_columns', 500)

print(df_immigration.dtypes)
df_immigration.head()

i94cit                object
i94res                object
Unnamed: 0             int64
cicid                  int64
i94yr                  int64
i94mon                 int64
i94port               object
arrdate       datetime64[ns]
i94mode                int64
i94addr               object
depdate       datetime64[ns]
i94bir                 int64
i94visa                int64
count                  int64
dtadfile               int64
visapost              object
occup                 object
entdepa               object
entdepd               object
entdepu              float64
matflag               object
biryear                int64
dtaddto               object
gender                object
insnum               float64
airline               object
admnum               float64
fltno                 object
visatype              object
dtype: object


Unnamed: 0.1,i94cit,i94res,Unnamed: 0,cicid,i94yr,i94mon,i94port,arrdate,i94mode,i94addr,depdate,i94bir,i94visa,count,dtadfile,visapost,occup,entdepa,entdepd,entdepu,matflag,biryear,dtaddto,gender,insnum,airline,admnum,fltno,visatype
0,"MEXICO Air Sea, and Not Reported (I-94, no lan...","MEXICO Air Sea, and Not Reported (I-94, no lan...",2171295,4422636,2016,4,MCA,2016-04-23,1,TX,2016-04-24,26,2,1,20160423,MTR,,G,R,,M,1990,10222016,M,,*GA,94362000000.0,XBLNG,B2
1,"MEXICO Air Sea, and Not Reported (I-94, no lan...","MEXICO Air Sea, and Not Reported (I-94, no lan...",1387092,2826530,2016,4,SNJ,2016-04-15,1,CA,2016-04-17,42,2,1,20160415,MEX,,G,O,,M,1974,10142016,F,,Y4,93617880000.0,00930,B2
2,"MEXICO Air Sea, and Not Reported (I-94, no lan...","MEXICO Air Sea, and Not Reported (I-94, no lan...",2888997,5835717,2016,4,DET,2016-04-30,1,FL,2016-05-07,35,2,1,20160430,MEX,,G,O,,M,1981,10292016,F,,AA,94957650000.0,01498,B2
3,"MEXICO Air Sea, and Not Reported (I-94, no lan...","MEXICO Air Sea, and Not Reported (I-94, no lan...",2360660,4805034,2016,4,HOU,2016-04-25,1,PA,2016-04-29,40,1,1,20160425,MEX,,G,O,,M,1976,10242016,F,,UA,94493820000.0,01085,B1
4,"MEXICO Air Sea, and Not Reported (I-94, no lan...","MEXICO Air Sea, and Not Reported (I-94, no lan...",1773904,3599863,2016,4,HOU,2016-04-19,1,TX,2016-07-04,42,3,1,20160419,MEX,,G,Q,,M,1974,D/S,F,,WN,93991160000.0,02831,F1
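
The `arrdate` and `depdate` conversion treats each value as a count of days since 1960-01-01, the SAS date epoch. A small worked example, with `sas_days_to_timestamp` as a hypothetical helper name:

```python
import pandas as pd

SAS_EPOCH = pd.Timestamp("1960-01-01")

def sas_days_to_timestamp(days):
    """Convert a SAS day count to a pandas Timestamp (NaN becomes NaT)."""
    return SAS_EPOCH + pd.to_timedelta(days, unit="D")

# 20567.0 is the raw arrdate of cicid 4422636 in the sample
arrival = sas_days_to_timestamp(20567.0)
print(arrival)
```

The result agrees with the converted `arrdate` shown for that record, 2016-04-23.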


In [10]:
## airports
# coordinates are stored as "longitude, latitude"
airports['longitude'] = airports['coordinates'].apply(lambda x: float(x.split(',')[0]))
airports['latitude'] = airports['coordinates'].apply(lambda x: float(x.split(',')[1]))
airports = airports.drop(columns=['coordinates'])
airports.head()

Unnamed: 0,ident,type,name,elevation_ft,continent,iso_country,iso_region,municipality,gps_code,iata_code,local_code,longitude,latitude
0,00A,heliport,Total Rf Heliport,11.0,,US,US-PA,Bensalem,00A,,00A,-74.933601,40.070801
1,00AA,small_airport,Aero B Ranch Airport,3435.0,,US,US-KS,Leoti,00AA,,00AA,-101.473911,38.704022
2,00AK,small_airport,Lowell Field,450.0,,US,US-AK,Anchor Point,00AK,,00AK,-151.695999,59.9492
3,00AL,small_airport,Epps Airpark,820.0,,US,US-AL,Harvest,00AL,,00AL,-86.770302,34.864799
4,00AR,closed,Newport Hospital & Clinic Heliport,237.0,,US,US-AR,Newport,,,,-91.254898,35.6087


# Data Model

## Schema

Even though they have some limitations, relational databases are a mature technology and are well suited to structured data such as this. Most of the analyses planned for this notebook do not need a cloud data warehouse such as AWS Redshift, so I opted to stick to the basics. The end of this document discusses scenarios that demand more speed and scalability, along with a reasonable solution for each. PostgreSQL is the RDBMS of choice because it integrates easily with Python.

The schema is described as follows:


## fact table
### immigrants

- id (PRIMARY KEY)
- country_cit 
- country_res
- cicid
- year
- month
- age(i94bir)
- birth_year
- gender
- transport_type(i94mode)
- state(i94addr)
- count
- record_date(DTADFILE)
- occup
- arrival_flag(entdepa)
- departure_flag(entdepd)          
- update_flag(entdepu)       
- match_flag
- airport_code
- admnum
- flight         
- visacode
- visatype
- visapost


## dimension tables

### admissions

- id (PRIMARY KEY) (FOREIGN KEY (immigrants.admnum))
- ins_num
- date(dtaddto)

### flights

- id (PRIMARY KEY) (FOREIGN KEY (immigrants.flight))
- airline
- arrdate
- depdate


### airports

- id (PRIMARY KEY) (FOREIGN KEY (immigrants.airport_code))
- type         
- name 
- elevation_ft 
- continent
- iso_country
- iso_region
- municipality
- gps_code
- iata_code
- local_code
- latitude
- longitude

### locations

- id (PRIMARY KEY)
- name

### demographics


- state_code (PRIMARY KEY) (FOREIGN KEY (immigrants.state))
- city 
- state  
- median_age  
- male_population
- female_population
- total_population  
- number_of_veterans
- foreign_born
- average_household_size
- race               

### transports

- id (PRIMARY KEY) (FOREIGN KEY (immigrants.transport_type))
- mode (1 = 'Air', 2 = 'Sea', 3 = 'Land', 9 = 'Not reported', which includes NA)

### visa

- id (PRIMARY KEY) (FOREIGN KEY (immigrants.visacode))
- category (1 = Business, 2 = Pleasure, 3 = Student)
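
To illustrate how the fact table joins to a dimension table, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for PostgreSQL. Only a few columns of `immigrants` and `transports` are included, and the fact rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Trimmed-down versions of the transports dimension and the immigrants fact table
cur.execute("CREATE TABLE transports (id INTEGER PRIMARY KEY, mode TEXT)")
cur.execute("""
    CREATE TABLE immigrants (
        id INTEGER PRIMARY KEY,
        cicid INTEGER NOT NULL,
        transport_type INTEGER REFERENCES transports (id)
    )
""")

cur.executemany(
    "INSERT INTO transports VALUES (?, ?)",
    [(1, "Air"), (2, "Sea"), (3, "Land"), (9, "Not reported")],
)
# Made-up fact rows
cur.executemany(
    "INSERT INTO immigrants VALUES (?, ?, ?)",
    [(1, 1001, 1), (2, 1002, 1), (3, 1003, 3)],
)

# Count arrivals per transport mode by joining fact and dimension
cur.execute("""
    SELECT t.mode, COUNT(*) AS arrivals
    FROM immigrants AS i
    JOIN transports AS t ON t.id = i.transport_type
    GROUP BY t.mode
    ORDER BY arrivals DESC, t.mode
""")
arrivals_by_mode = cur.fetchall()
print(arrivals_by_mode)
conn.close()
```

The same join pattern applies to the other dimension tables, each matched on the fact-table column named in its key description.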


------------- IMAGE OF SCHEMA -----------------------------


In [11]:
# CREATE DATABASE
try:
    default_db = 'studentdb'
    conn = psycopg2.connect(f"host=127.0.0.1 dbname={default_db} user=student password=student")
    conn.set_session(autocommit=True)
    cur = conn.cursor()
except psycopg2.Error as e:
    print(f"Connection to {default_db} failed")
    print(e)

try:
    db = 'immigration_us'
    cur.execute(f"DROP DATABASE IF EXISTS {db} ")
    cur.execute(f"CREATE DATABASE {db} WITH ENCODING 'utf8' TEMPLATE template0")
    conn.close()
except psycopg2.Error as e:
    print(f"{db} creation failed")
    print(e)

# CONNECT TO DATABASE
try:
    conn = psycopg2.connect(f"host=127.0.0.1 dbname={db} user=student password=student")
    cur = conn.cursor()
except psycopg2.Error as e:
    print(f"Connection to {db} failed")
    print(e)

In [16]:
# DROP TABLES
immigrants_table_drop = "DROP TABLE IF EXISTS immigrants"
admissions_table_drop = "DROP TABLE IF EXISTS admissions"
flights_table_drop = "DROP TABLE IF EXISTS flights"
airports_table_drop = "DROP TABLE IF EXISTS airports"
transports_table_drop = "DROP TABLE IF EXISTS transports"
demographics_table_drop = "DROP TABLE IF EXISTS demographics"
locations_table_drop = "DROP TABLE IF EXISTS locations"
visa_table_drop = "DROP TABLE IF EXISTS visa"

# CREATE TABLES
immigrants_table_create = ("""
    CREATE TABLE IF NOT EXISTS immigrants(   
    id             INT        PRIMARY KEY,
    cicid          INT        NOT NULL,
    country_cit    TEXT,
    country_res    TEXT,
    year           INT,
    month          INT,
    age            INT,
    birth_year     INT,
    gender         VARCHAR(1),
    transport_type INT,
    state          TEXT,
    count          INT,
    record_date    TEXT       NOT NULL,
    occup          TEXT,
    arrival_flag   VARCHAR(1),
    departure_flag VARCHAR(1),
    update_flag    VARCHAR(1),       
    match_flag     VARCHAR(1),
    airport_code   TEXT,
    admnum         BIGINT,
    flight         TEXT,
    visacode       INT,
    visatype       TEXT,
    visapost       TEXT
    );
""")

admissions_table_create = ("""
    CREATE TABLE IF NOT EXISTS admissions(
    id      BIGINT    PRIMARY KEY,
    ins_num INT,
    date    TEXT
    );
""")

flights_table_create = ("""
    CREATE TABLE IF NOT EXISTS flights(
    id      TEXT PRIMARY KEY,
    airline TEXT,
    arrdate TEXT, 
    depdate TEXT
    );
""")

airports_table_create = ("""
    CREATE TABLE IF NOT EXISTS airports(
    id            TEXT  PRIMARY KEY,
    type          TEXT,
    name          TEXT,
    elevation_ft  FLOAT,
    continent     TEXT,
    iso_country   TEXT,
    iso_region    TEXT,
    municipality  TEXT,
    gps_code      TEXT,
    iata_code     TEXT,
    local_code    TEXT,
    latitude      FLOAT,
    longitude     FLOAT
    );
""")


locations_table_create = ("""
    CREATE TABLE IF NOT EXISTS locations(
    id         TEXT   PRIMARY KEY,
    name       TEXT
    );
""")


demographics_table_create = ("""
    CREATE TABLE IF NOT EXISTS demographics(
    state_code             TEXT   PRIMARY KEY,
    city                   TEXT,
    state                  TEXT,
    median_age             FLOAT,
    male_population        FLOAT,
    female_population      FLOAT,
    total_population       INT,
    number_of_veterans     FLOAT,
    foreign_born           FLOAT,
    average_household_size FLOAT,
    race                   TEXT
    );
""")


transports_table_create = ("""
    CREATE TABLE IF NOT EXISTS transports(
    id    INT  PRIMARY KEY,
    mode  TEXT
    );
""")

visa_table_create = ("""
    CREATE TABLE IF NOT EXISTS visa(
    id        INT PRIMARY KEY,
    category  TEXT
    );
""")


create_table_queries = [immigrants_table_create, admissions_table_create, flights_table_create, 
                        airports_table_create, demographics_table_create, locations_table_create,
                        transports_table_create, visa_table_create]

drop_table_queries = [immigrants_table_drop, admissions_table_drop, flights_table_drop, 
                      airports_table_drop, demographics_table_drop, locations_table_drop,
                      transports_table_drop, visa_table_drop]

### Creating tables

In [17]:
def drop_tables(cur, conn):
    """
    drop_tables(cur, conn)
    Drops each table using the queries in `drop_table_queries` list.
    """
    for query in drop_table_queries:
        try:
            cur.execute(query)
        except psycopg2.Error as e:
            print(f"Error: Dropping table {query}")
            print(e)
        conn.commit()

def create_tables(cur, conn):
    """
    create_tables(cur, conn)
    Creates each table using the queries in `create_table_queries` list.
    """
    for query in create_table_queries:
        try:
            cur.execute(query)
            print(f"{query} created successfully")
        except psycopg2.Error as e:
            print("Error: Issue creating table")
            print(e)
        conn.commit()

drop_tables(cur, conn)
create_tables(cur, conn)




#### ETL pipeline


Extract the contents of each file, check for inconsistencies across datasets, and finally insert the data into the tables described by the schema.
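
One possible way to prepare a data frame for insertion is to turn it into a list of tuples, replacing missing values with `None` so the driver can send them as SQL NULL. This is a minimal sketch under that assumption: `dataframe_to_rows` is a hypothetical helper, and the sample frame is made up.

```python
import numpy as np
import pandas as pd

def dataframe_to_rows(df, columns):
    """Return one tuple per row, with missing values replaced by None."""
    rows = []
    for record in df[columns].itertuples(index=False, name=None):
        # None maps to SQL NULL, whereas NaN would be sent as a float value
        rows.append(tuple(None if pd.isna(value) else value for value in record))
    return rows

# Made-up sample with one missing value
sample = pd.DataFrame({"id": [1, 2], "gender": ["F", np.nan]})
rows = dataframe_to_rows(sample, ["id", "gender"])
print(rows)

# With an open cursor, the rows could then be loaded with, for example:
# cur.executemany(insert_statement, rows)
```

The column list passed in must follow the same order as the column list in the matching INSERT statement.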

In [10]:
# INSERT STATEMENTS

# 24 columns
immigrants_table_insert = ("""
    INSERT INTO immigrants (id, cicid, country_cit, country_res,
    year, month, age, birth_year, gender, transport_type, state, count,
    record_date, occup, arrival_flag, departure_flag, update_flag,
    match_flag, airport_code, admnum, flight, visacode, visatype, visapost)
    VALUES(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
""")

admissions_table_insert = ("""
    INSERT INTO admissions (id, ins_num, date)
    VALUES(%s, %s, %s)
    ON CONFLICT (id)
    DO UPDATE
    SET ins_num = EXCLUDED.ins_num, date = EXCLUDED.date;
""")

flights_table_insert = ("""
    INSERT INTO flights (id, airline, arrdate, depdate)
    VALUES(%s, %s, %s, %s)
    ON CONFLICT (id)
    DO NOTHING
""")

airports_table_insert = ("""
    INSERT INTO airports (id, type, name, elevation_ft, continent, \
    iso_country, iso_region, municipality, gps_code, iata_code,     \
    local_code, latitude, longitude)
    VALUES(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
    ON CONFLICT (id)
    DO NOTHING
""")


locations_table_insert = ("""
    INSERT INTO locations (id, name)
    VALUES(%s, %s)
    ON CONFLICT (id)
    DO NOTHING
""")

demographics_table_insert = ("""
    INSERT INTO demographics (state_code, city, state, median_age, male_population, \
    female_population, total_population, number_of_veterans, foreign_born,        \
    average_household_size, race)
    VALUES(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
    ON CONFLICT (state_code)
    DO NOTHING
""")

transports_table_insert = ("""
    INSERT INTO transports (id, mode)
    VALUES(%s, %s)
    ON CONFLICT (id)
    DO NOTHING
""")

visa_table_insert = ("""
    INSERT INTO visa (id, category)
    VALUES(%s, %s)
    ON CONFLICT (id)
    DO NOTHING
""")

## Running Pipelines to Model the Data 

### Data analysis

Possible analyses of the immigration data include:

- Find patterns by gender and/or age (for example, differences in visa type or airline chosen).
- Rank airlines and routes by number of immigrants.
- Track the flow of passengers flying to large urban centers.
- Find seasonal patterns of immigration.
- Check whether a city's annual temperature range (thermal amplitude) is related to emigration.
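
As an illustration of the airline ranking, the following sketch aggregates in pandas instead of SQL. The rows are made up; only the `airline` and `count` column names come from the immigration sample.

```python
import pandas as pd

# Made-up rows using the airline and count columns of the immigration sample
arrivals = pd.DataFrame({
    "airline": ["AA", "UA", "AA", "LH", "AA", "UA"],
    "count": [1, 1, 1, 1, 1, 1],
})

# Sum the count column per airline and sort from most to fewest arrivals
ranking = (
    arrivals.groupby("airline")["count"]
    .sum()
    .sort_values(ascending=False)
)
print(ranking)
```

Grouping by additional columns, such as the arrival month, would extend the same pattern to the seasonal analysis.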

In [None]:
# Data analysis
# %load_ext sql
# %sql postgresql://student:student@127.0.0.1/immigration_us
#%sql SELECT * FROM [...] LIMIT 5;

## Complete Project Write Up

A data model was created to store relational data on immigration in the USA. PostgreSQL was the choice of use for its easy integration with Python. As mentioned earlier, using a relational database may be enough for most applications, but the cost of its limitations, such as having to draw a complex schema beforehand, can slow down the development process. Moving to the cloud is always an option and some situations where this is reasonable are described below.

Possible decisions for alternate scenarios:

- "Data increases by 100x": Instead of a structured database on disk, a data lake could be used, since the project specifies that various types and sources of data (structured or unstructured) can be explored, and it is not possible to tell upfront which will be useful. A possible approach is to launch an EMR cluster and apply the schema on read.

- "The data populates a dashboard that must be updated on a daily basis by 7am every day": Schedule tasks using Apache Airflow.

- "The database needed to be accessed by 100+ people": A Redshift cluster can handle the traffic while keeping a PostgreSQL-style SQL interface, but it's important to monitor AWS billing to avoid unnecessary costs.
