## Exercise

Write a data pipeline that ingests the source data into a **fact** table called `air_travel_passengers`, supported by two **dimension** tables: `airlines` and `airports`.

To load the dimension tables, look up additional columns from the supporting files: `global_airlines.csv` and `global_airports.csv`.

<br>

Your data pipeline should look similar to:

<img src="./imgs/dm_air_travel_exercise.jpg" alt="Air Travel Pipeline" width="700" />

<br>

Your pipeline must meet the following requirements:

1. _airlines_ dimension:
    - Look up additional airline columns such as iata and icao codes, callsign, and country
    - Generate a new airline_id by **hashing** the airline name
1. _airports_ dimension:
    - Using the airport iata code, look up additional columns such as airport lat/lon, icao code, and timezone information
    - Set the iata code as the airport_id column
1. _air\_travel\_passengers_ fact:
    - Look up both airline_id and airport_id from their dimension tables
    - Add a new column called _report\_date_ set to the 1st of the report month/year (as a date data type)
    - Create a fact_id by hashing a **composite key** of: airline name, src, dest, year, and month
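The hashing and date requirements above can be sketched with the standard library alone. This is only one possible approach: the helper names `hash_key` and `first_of_month`, the `|` separator, and the sample row with its column names are assumptions made for illustration, not part of the exercise files.

```python
from datetime import date
from hashlib import md5


def hash_key(*parts) -> str:
    """Return the MD5 hex digest of the parts joined with a '|' separator."""
    # Cast each part to a trimmed string so 7 and "7" produce the same key.
    joined = "|".join(str(part).strip() for part in parts)
    return md5(joined.encode("utf-8")).hexdigest()


def first_of_month(year, month) -> date:
    """Return the first day of the given report month as a date."""
    return date(int(year), int(month), 1)


# Hypothetical sample row; the real column names come from the source file.
row = {
    "airline_name": "Delta Air Lines",
    "src": "ATL",
    "dest": "JFK",
    "year": 2019,
    "month": 7,
}

# Single-column key for the airlines dimension.
airline_id = hash_key(row["airline_name"])

# Composite key for the fact table: airline name, src, dest, year, and month.
fact_id = hash_key(
    row["airline_name"], row["src"], row["dest"], row["year"], row["month"]
)

# First day of the report month, stored as a date rather than a string.
report_date = first_of_month(row["year"], row["month"])
```

Joining the parts with a separator keeps values such as `("ab", "c")` and `("a", "bc")` from colliding on the same key. In a pandas pipeline, the same helpers could be applied row by row with `DataFrame.apply(..., axis=1)`.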

<br>

### Data Model

Using draw.io, create a data model of your target tables. You must show at least three final tables: `air_travel_passengers`, `airlines`, and `airports`.

See data model below:

<img src="./imgs/us_monthly_air_passangers.drawio.png" alt="Air Travel Data Model" width="400" />


### Data Pipeline

Develop your pipeline code. We recommend breaking down the pipeline into the following sections (code cells):

import os
import sys
import pandas as pd
import logging
from google.cloud import bigquery
from hashlib import md5
from typing import List

# **** SETUP ****

# change to match your filesystem
DATA_DIR = "../data/air_travel/"
DEFAULT_RECEIPTS_FILE = os.path.join(DATA_DIR, "us_monthly_air_passengers_sample.csv")
# change to match your gcloud project 
PROJECT_NAME = "deb-01-371820"
DATASET_NAME = "us_monthly_air_passengers"

# **** TABLE SCHEMAS ****

TABLE_METADATA = {
    'airlines': {
        'table_name': 'airlines',
        'schema': [
            # indexes are written only if named in the schema
            bigquery.SchemaField('airline_id', 'string', mode='REQUIRED'),
            bigquery.SchemaField('carrier_name', 'string', mode='REQUIRED'),
            bigquery.SchemaField('iata', 'string', mode='NULLABLE'),
            bigquery.SchemaField('icao', 'string', mode='NULLABLE'),
            bigquery.SchemaField('callsign', 'string', mode='NULLABLE'),
            bigquery.SchemaField('country', 'string', mode='NULLABLE'),
        ],
    },
    'airports': {
        'table_name': 'airports',
        'schema': [
            # indexes are written only if named in the schema
            bigquery.SchemaField('airport_id', 'string', mode='REQUIRED'),
            bigquery.SchemaField('name', 'string', mode='NULLABLE'),
            bigquery.SchemaField('city', 'string', mode='NULLABLE'),
            bigquery.SchemaField('country', 'string', mode='NULLABLE'),
            bigquery.SchemaField('icao', 'string', mode='NULLABLE'),
            bigquery.SchemaField('latitude', 'float', mode='NULLABLE'),
            bigquery.SchemaField('longitude', 'float', mode='NULLABLE'),
        ],
    },
}

filename = DEFAULT_RECEIPTS_FILE
