# Cleaning Airport Departure Data

This notebook cleans data retrieved from the ITA National Travel and Tourism Office (https://travel.trade.gov/research/monthly/departures/). The data originally came as a mix of csv and xls files, which can be found in the "raw dowloaded data" folder. The files were combined manually into a single csv for ease of use; for some incomplete files, namely 1999/2000, the values were copied directly from the website. The combined file is "airport_departures_all.csv" in the "airline data" folder.

### Dependencies

First, we will load a number of useful packages.

In [1]:
#load dependencies
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt

### Load File
Now, we can load in the file for viewing.

In [2]:
#create filepath
filepath = os.path.join('..', 'airline data', 'airport_departures_all.csv')
print(filepath)

../airline data/airport_departures_all.csv


In [3]:
#load data into a data frame; the file has no header row, so header=None keeps the first line as data
raw_data = pd.read_csv(filepath, header=None)
raw_data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,255,256,257,258,259,260,261,262,263,264
0,Europe,500.1,490.1,710.6,645.1,918.1,1005.0,984.0,894.9,885.8,...,1064429,1212898,1684659,2139814,1867812,1627172,1676165,1201375,928855,1068009
1,Caribbean,312.8,341.1,357.7,343.3,325.9,348.0,409.7,381.8,244.9,...,822751,783076,714117,865847,959510,718092,356693,473086,567498,754622
2,Asia,259.6,222.6,266.4,243.4,293.1,282.5,271.3,268.5,247.5,...,513034,495884,477954,525875,504326,419586,420099,504961,507576,537118
3,South America,117.9,116.7,118.0,98.7,104.5,129.9,145.9,137.9,102.6,...,165836,148540,154436,193673,182796,161435,119148,137624,157642,222500
4,Central America,85.6,86.3,100.9,78.1,75.5,98.1,107.7,90.4,58.7,...,310182,252392,232366,336150,332634,222045,147044,169255,216656,319918


### Add column headers
We will add column headers: "Region" for the first column, followed by one column per month from January 1996 to December 2017.

In [4]:
#create an array of years (1996 - 2017); the stop value 2018 is exclusive
years = np.arange(1996, 2018, 1)

#create a list of months
months = ["January", "February", "March", "April", "May", "June", "July", "August", 
          "September", "October", "November", "December"]

In [5]:
#create a headers list
headers = []
headers.append("Region")

for year in years:
    for month in months:
        headers.append(month + " " + str(year))
        
headers

['Region',
 'January 1996',
 'February 1996',
 'March 1996',
 'April 1996',
 'May 1996',
 'June 1996',
 'July 1996',
 'August 1996',
 'September 1996',
 'October 1996',
 'November 1996',
 'December 1996',
 'January 1997',
 'February 1997',
 'March 1997',
 'April 1997',
 'May 1997',
 'June 1997',
 'July 1997',
 'August 1997',
 'September 1997',
 'October 1997',
 'November 1997',
 'December 1997',
 'January 1998',
 'February 1998',
 'March 1998',
 'April 1998',
 'May 1998',
 'June 1998',
 'July 1998',
 'August 1998',
 'September 1998',
 'October 1998',
 'November 1998',
 'December 1998',
 'January 1999',
 'February 1999',
 'March 1999',
 'April 1999',
 'May 1999',
 'June 1999',
 'July 1999',
 'August 1999',
 'September 1999',
 'October 1999',
 'November 1999',
 'December 1999',
 'January 2000',
 'February 2000',
 'March 2000',
 'April 2000',
 'May 2000',
 'June 2000',
 'July 2000',
 'August 2000',
 'September 2000',
 'October 2000',
 'November 2000',
 'December 2000',
 'January 2001',
 ...]
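
The nested loop above can also be written with `itertools.product`, together with a check that the number of labels matches the number of columns before they are assigned. This is a minimal sketch rather than the notebook's original cell: `build_headers` and `headers_check` are hypothetical names, and the expected count of 265 is taken from the `raw_data.head()` output above (columns 0 through 264).

```python
from itertools import product

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def build_headers(first_year, last_year, first_column="Region"):
    """Return [first_column, 'January <first_year>', ..., 'December <last_year>']."""
    years = range(first_year, last_year + 1)
    # product() varies the months fastest, matching the order of the nested loop
    labels = [f"{month} {year}" for year, month in product(years, MONTHS)]
    return [first_column] + labels


headers_check = build_headers(1996, 2017)

# 22 years * 12 months + 1 region column should equal the 265 columns in the file
assert len(headers_check) == 265
print(headers_check[:3], headers_check[-1])
```

Checking the length first turns a mismatch between the labels and the file into an explicit error before `raw_data.columns` is overwritten.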

In [6]:
#add column headers
raw_data.columns = headers
raw_data.head()

Unnamed: 0,Region,January 1996,February 1996,March 1996,April 1996,May 1996,June 1996,July 1996,August 1996,September 1996,...,March 2017,April 2017,May 2017,June 2017,July 2017,August 2017,September 2017,October 2017,November 2017,December 2017
0,Europe,500.1,490.1,710.6,645.1,918.1,1005.0,984.0,894.9,885.8,...,1064429,1212898,1684659,2139814,1867812,1627172,1676165,1201375,928855,1068009
1,Caribbean,312.8,341.1,357.7,343.3,325.9,348.0,409.7,381.8,244.9,...,822751,783076,714117,865847,959510,718092,356693,473086,567498,754622
2,Asia,259.6,222.6,266.4,243.4,293.1,282.5,271.3,268.5,247.5,...,513034,495884,477954,525875,504326,419586,420099,504961,507576,537118
3,South America,117.9,116.7,118.0,98.7,104.5,129.9,145.9,137.9,102.6,...,165836,148540,154436,193673,182796,161435,119148,137624,157642,222500
4,Central America,85.6,86.3,100.9,78.1,75.5,98.1,107.7,90.4,58.7,...,310182,252392,232366,336150,332634,222045,147044,169255,216656,319918
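
With one column per month, the labels are easy to read but awkward to filter or plot by date. One possible follow-up, which is not part of the original notebook, is to reshape the table to one row per region and month and parse the labels into dates. The sketch below uses a small stand-in frame with values copied from the output above; the names `sample`, `long_df`, `Month`, and `Departures` are assumptions for illustration.

```python
import pandas as pd

# Small stand-in for raw_data, using values copied from the output above
sample = pd.DataFrame({
    "Region": ["Europe", "Caribbean"],
    "January 1996": [500.1, 312.8],
    "February 1996": [490.1, 341.1],
})

# Wide to long: one row per (region, month) pair
long_df = sample.melt(id_vars="Region", var_name="Month", value_name="Departures")

# Parse labels such as "January 1996" into timestamps (first day of the month);
# %B relies on English month names being the active locale
long_df["Month"] = pd.to_datetime(long_df["Month"], format="%B %Y")

long_df = long_df.sort_values(["Region", "Month"]).reset_index(drop=True)
print(long_df)
```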


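### Clean Travel Spending Data
The same headers are applied to the travel spending file, and only its first three rows are kept.

In [7]:
#create filepath for the travel spending data
transpo_data = os.path.join('..', 'Transpo Data', 'Travel_Spending.csv')

#load the file, which has no header row
transpo_df = pd.read_csv(transpo_data, header=None)

#reuse the region/month headers built above
transpo_df.columns = headers

#keep only the first three rows (.loc slicing includes the end label)
clean_transpo = transpo_df.loc[0:2]

clean_transpo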

Unnamed: 0,Region,January 1996,February 1996,March 1996,April 1996,May 1996,June 1996,July 1996,August 1996,September 1996,...,March 2017,April 2017,May 2017,June 2017,July 2017,August 2017,September 2017,October 2017,November 2017,December 2017
0,Total U.S. Travel and Tourism,"$7,066","$7,078","$7,685","$7,053","$8,005","$7,739","$6,913","$7,156","$7,025",...,"$16,137","$16,227","$16,229","$16,305","$16,488","$16,283","$16,575","$16,435","$16,604","$16,568"
1,Travel,"$5,442","$5,434","$5,938","$5,487","$6,218","$6,017","$5,309","$5,493","$5,392",...,"$12,789","$12,942","$12,924","$12,917","$13,012","$12,975","$13,105","$13,001","$13,153","$13,163"
2,Passenger fares,"$1,624","$1,644","$1,747","$1,566","$1,787","$1,722","$1,604","$1,663","$1,633",...,"$3,348","$3,285","$3,305","$3,388","$3,476","$3,308","$3,470","$3,434","$3,451","$3,405"
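
The spending values above are stored as strings such as "$7,066", so they cannot be summed or plotted directly. A minimal sketch of one way to convert them to numbers follows; it is an optional extra step, not something the notebook does before saving. It assumes every value cell follows the "$" plus comma-separated digits pattern shown in the output, and it runs on a small stand-in frame (`sample`, a hypothetical name) rather than on `clean_transpo` itself.

```python
import pandas as pd

# Small stand-in for clean_transpo, using values copied from the output above
sample = pd.DataFrame({
    "Region": ["Total U.S. Travel and Tourism", "Travel", "Passenger fares"],
    "January 1996": ["$7,066", "$5,442", "$1,624"],
    "February 1996": ["$7,078", "$5,434", "$1,644"],
})

# Every column except the label column holds currency strings
value_cols = sample.columns.drop("Region")

numeric = sample.copy()
for col in value_cols:
    # Remove "$" and thousands separators, then convert to a number
    stripped = numeric[col].str.replace(r"[$,]", "", regex=True)
    numeric[col] = pd.to_numeric(stripped)

print(numeric.dtypes)
```

`pd.to_numeric` raises an error on any cell that does not match the assumed pattern, which makes unexpected values visible instead of silently passing them through.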


In [8]:
#save cleaned transportation data as a csv
clean_transpo_path = os.path.join('..', 'Transpo Data', 'Travel_Spending_cleaned.csv')

clean_transpo.to_csv(clean_transpo_path, index = False, encoding = 'utf-8')