Creates five dictionaries and saves them as binary (pickle) files:
<ul>
    <li>Maps incorrect city name spellings to correct/standard spellings.
    <li>Maps city names to corresponding IATA codes.
    <li>Maps IATA codes to corresponding city names.
    <li>Maps carrier codes to corresponding passenger load factors.
    <li>Maps each carrier + aircraft type combination to its carrying capacity.
</ul>
<b>These dictionaries must be rebuilt every time a city code, a city name spelling correction, a carrier load factor, or a carrier + aircraft capacity is added or changed.</b>
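Although the output files use a `.txt` extension, they hold pickle binaries, so a downstream notebook would read them back with `pickle.load` in binary mode. The following is a minimal sketch, assuming the same file layout; the helper names `save_lookup` and `load_lookup` are hypothetical, and the round trip runs in a temporary directory rather than `./data/processed/`. Only unpickle files you created yourself, since `pickle.load` can execute arbitrary code from untrusted input.

```python
import os
import pickle
import tempfile


def save_lookup(obj, path):
    """Hypothetical helper: pickle a lookup dictionary to `path` in binary mode."""
    with open(path, 'wb') as handle:
        pickle.dump(obj, handle)


def load_lookup(path):
    """Hypothetical helper: read a pickled lookup dictionary back (binary mode)."""
    with open(path, 'rb') as handle:
        return pickle.load(handle)


# Round trip through a temporary directory instead of ./data/processed/.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'city_spelling_corrected_dict.txt')
    save_lookup({'Vizag': 'Visakhapatnam'}, path)
    restored = load_lookup(path)

print(restored)  # {'Vizag': 'Visakhapatnam'}
```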

In [18]:
import pandas as pd
import pickle

In [19]:
'''

This dictionary must be updated/extended every time the raw data contains an incorrect spelling of a city that
already exists in the city_to_codes and codes_to_city dictionaries.

The entry in the dictionary is in the form {'incorrect_spelling' : 'correct_spelling'}.

'''

city_names = {'Delhi' : 'New Delhi', 'Bhubaneshwar' : 'Bhubaneswar', 'Trivandrum' : 'Thiruvananthapuram', 
              'Pondicherry' : 'Puducherry', 'Porbander' : 'Porbandar', 'Tirupathi' : 'Tirupati', 
              'Tuticorin' : 'Thoothukudi', 'Vizag' : 'Visakhapatnam', 'Cuddapah': 'Kadapa', 
              'Jalgoan' : 'Jalgaon', 'Rajamundry' : 'Rajahmundry', 'Aizawal' : 'Aizawl', 
              'Trichy' : 'Tiruchirappally', 'Bathinda' : 'Bhatinda', 'Passighat' : 'Pasighat'}

with open('./data/processed/city_spelling_corrected_dict.txt', 'w+b') as handle:
    pickle.dump(city_names, handle)
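Applying the corrections to raw data before any IATA lookup might look like the minimal sketch below. It assumes the raw city names sit in a pandas Series (the sample values are hypothetical) and uses `Series.replace` with a dict, which replaces exact whole-value matches only, so an already correct 'New Delhi' is left alone rather than becoming 'New New Delhi'.

```python
import pandas as pd

# Subset of the city_names mapping defined above.
city_names = {'Delhi': 'New Delhi', 'Vizag': 'Visakhapatnam', 'Trichy': 'Tiruchirappally'}

# Hypothetical raw values; 'Mumbai' and 'New Delhi' are already standard and stay untouched.
raw = pd.Series(['Delhi', 'Mumbai', 'Vizag', 'New Delhi'])

# Exact-value replacement (regex=False by default), not substring replacement.
corrected = raw.replace(city_names)
print(corrected.tolist())  # ['New Delhi', 'Mumbai', 'Visakhapatnam', 'New Delhi']
```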

In [20]:
'''

These dictionaries must be regenerated every time a new city and its corresponding IATA code are added to the
airports-code.csv file. The need for such an addition is identified while processing raw data.

The entries in the dictionaries are in the form {'city' : 'IATA_code'} and {'IATA_code' : 'city'}, respectively.

'''

city_codes = pd.read_csv('./data/raw/airports-code.csv')

city_to_codes = dict(zip(city_codes.city, city_codes.iata_code))
with open('./data/processed/city_to_codes_dict.txt', 'w+b') as handle:
    pickle.dump(city_to_codes, handle)
    
codes_to_city = dict(zip(city_codes.iata_code, city_codes.city))
with open('./data/processed/codes_to_city_dict.txt', 'w+b') as handle:
    pickle.dump(codes_to_city, handle)
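Because both lookups are built with `dict(zip(...))`, a city or code that appears twice in airports-code.csv is silently overwritten by its last occurrence, and codes_to_city is then no longer the exact inverse of city_to_codes. The following is a minimal sketch of two checks, using a small hypothetical DataFrame in place of the CSV (with the `city` and `iata_code` columns used above): flag duplicates, and list raw-data cities that still lack a code, which is how the need for a new CSV row would surface. `find_unmapped` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical stand-in for ./data/raw/airports-code.csv.
city_codes = pd.DataFrame({'city': ['New Delhi', 'Mumbai', 'Kadapa'],
                           'iata_code': ['DEL', 'BOM', 'CDP']})
city_to_codes = dict(zip(city_codes.city, city_codes.iata_code))
codes_to_city = dict(zip(city_codes.iata_code, city_codes.city))

# Repeated keys would be silently collapsed by dict(zip(...)), so report them explicitly.
dup_cities = city_codes.city[city_codes.city.duplicated()].tolist()
dup_codes = city_codes.iata_code[city_codes.iata_code.duplicated()].tolist()


def find_unmapped(cities, lookup):
    """Return distinct city names (first-seen order) that have no IATA code yet."""
    return [c for c in dict.fromkeys(cities) if c not in lookup]


# Hypothetical cities from raw data, already spelling-corrected.
raw_cities = ['New Delhi', 'Mumbai', 'Pune', 'Kadapa', 'Pune']
missing = find_unmapped(raw_cities, city_to_codes)
print(dup_cities, dup_codes, missing)  # [] [] ['Pune']
```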

In [21]:
'''

This dictionary must be updated/extended when the passenger load factors of individual carriers change or a
new carrier is added. The need to include a carrier is identified when an unknown carrier is encountered
while processing raw data.

The load factor data is published on the DGCA website https://dgca.gov.in/digigov-portal/ and is updated
monthly. The current values are for January 2020. DGCA does not report Alliance Air separately, so Alliance Air
is assumed to have the same load factor as its parent, Air India.

The entry in the dictionary is in the form {'carrier' : passenger_load_factor}, with the load factor stored as a decimal fraction (e.g. 0.878).

'''

carrier_plf = {'IND' : 0.878, 'GOW' : 0.887, 'AAS' : 0.780, 'AI' : 0.780, 'I5' : 0.793, 'TRJ' : 0.798, 
               'OG' : 0.797, 'UK' : 0.811, 'SG' : 0.915}

with open('./data/processed/carrier_plf_dict.txt', 'w+b') as handle:
    pickle.dump(carrier_plf, handle)
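The docstring above notes that a missing carrier is identified while processing raw data. The following is a minimal sketch of that check, together with a sanity check on the stored values: a load factor is the fraction of seats filled, so it should lie in (0, 1]. The raw carrier list is hypothetical, and 'XYZ' stands in for an unrecognised code.

```python
carrier_plf = {'IND': 0.878, 'GOW': 0.887, 'AAS': 0.780, 'AI': 0.780, 'I5': 0.793,
               'TRJ': 0.798, 'OG': 0.797, 'UK': 0.811, 'SG': 0.915}

# Values outside (0, 1] would suggest a percentage entered by mistake (e.g. 87.8 instead of 0.878).
out_of_range = {c: v for c, v in carrier_plf.items() if not 0 < v <= 1}

# Hypothetical carrier codes from a raw schedule; 'XYZ' is not in the dictionary.
raw_carriers = ['IND', 'SG', 'XYZ', 'UK', 'IND']
unknown = sorted(set(raw_carriers) - set(carrier_plf))

print(out_of_range)  # {}
print(unknown)       # ['XYZ']
```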

In [22]:
'''

This dictionary must be updated/extended every time a new carrier starts service or an existing carrier adds a
new aircraft type to its fleet. The need for such an addition is identified while processing raw data.

The entry in the dictionary is in the form {'carrier-aircraft_type' : capacity}, where the key joins the operator
and aircraft columns of aircraft_capacity.csv with a hyphen.

'''

craft_capacity = pd.read_csv('./data/raw/aircraft_capacity.csv')
craft_capacity = dict(zip(craft_capacity.operator + '-' + craft_capacity.aircraft, craft_capacity.capacity))
with open('./data/processed/carrier_aircraft_combo_capacity_dict.txt', 'w+b') as handle:
    pickle.dump(craft_capacity, handle)
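Assuming the capacity and load factor dictionaries are combined downstream to estimate passengers per flight (roughly seats multiplied by load factor, since the load factor is the share of seats filled), a lookup might look like the sketch below. The notebook does not show this step: `capacity_key` and `estimated_passengers` are hypothetical helpers, the load factors are copied from the cell above, and the capacities are illustrative rather than read from aircraft_capacity.csv. An unknown carrier + aircraft combination raises `KeyError`, which signals that the CSV needs a new row.

```python
# Load factors copied from carrier_plf; capacities are hypothetical examples.
carrier_plf = {'IND': 0.878, 'AI': 0.780}
craft_capacity = {'IND-A320': 180, 'AI-A319': 122}


def capacity_key(operator, aircraft):
    """Build the lookup key the same way as above: operator + '-' + aircraft."""
    return operator + '-' + aircraft


def estimated_passengers(operator, aircraft):
    """Hypothetical estimate: seats on the aircraft times the carrier's load factor, rounded."""
    seats = craft_capacity[capacity_key(operator, aircraft)]
    return round(seats * carrier_plf[operator])


print(estimated_passengers('IND', 'A320'))  # 158
print(estimated_passengers('AI', 'A319'))   # 95
```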