# Project Proposal (Matt)

The ETL Project involved performing an Extract-Transform-Load (ETL) process on airline flight performance data and customer tweet data from February 2015. The members of the group are Sedra Kurdi, Myles Bridges, Natalie Myers, and Matthew Kennedy. The datasets chosen were found at https://www.kaggle.com/usdot/flight-delays and https://www.kaggle.com/crowdflower/twitter-airline-sentiment, and were in CSV and SQLite format respectively. The approach was to use the SQLAlchemy and Pandas modules in Python to extract and transform the data. SQLAlchemy was then used to load the transformed data into a PostgreSQL database.

# Project Code

In [1]:
import numpy as np
import pandas as pd
import datetime as dt
import sqlalchemy
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy import create_engine, inspect, func

## Data Extraction

### Flight Data (Myles & Sedra & Matt)

In [2]:
#Set File Paths for Flight Data CSV Files
flight_path = "Flight_Data/flights.csv"
airline_path = "Flight_Data/airlines.csv"
airport_path = "Flight_Data/airports.csv"

In [3]:
#Import Flight Information CSV to Pandas Data Frame
flights = pd.read_csv(flight_path)

#Display Flight Information Data Frame
flights

Unnamed: 0,YEAR,MONTH,DAY,DAY_OF_WEEK,AIRLINE,FLIGHT_NUMBER,TAIL_NUMBER,ORIGIN_AIRPORT,DESTINATION_AIRPORT,SCHEDULED_DEPARTURE,...,ARRIVAL_TIME,ARRIVAL_DELAY,DIVERTED,CANCELLED,CANCELLATION_REASON,AIR_SYSTEM_DELAY,SECURITY_DELAY,AIRLINE_DELAY,LATE_AIRCRAFT_DELAY,WEATHER_DELAY
0,2015,2,1,7,AA,2400,N3JKAA,LAX,DFW,5,...,452.0,-2.0,0,0,,,,,,
1,2015,2,1,7,AS,98,N794AS,ANC,SEA,5,...,501.0,32.0,0,0,,3.0,0.0,29.0,0.0,0.0
2,2015,2,1,7,AA,258,N3FEAA,LAX,MIA,20,...,849.0,45.0,0,0,,45.0,0.0,0.0,0.0,0.0
3,2015,2,1,7,DL,806,N962DN,SFO,MSP,20,...,548.0,-12.0,0,0,,,,,,
4,2015,2,1,7,NK,612,N604NK,LAS,MSP,25,...,515.0,-11.0,0,0,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
429186,2015,2,28,6,B6,1503,N593JB,JFK,SJU,2359,...,512.0,32.0,0,0,,0.0,0.0,3.0,29.0,0.0
429187,2015,2,28,6,US,1770,N801AW,SLC,PHL,2359,...,607.0,10.0,0,0,,,,,,
429188,2015,2,28,6,US,467,N601AW,PHX,MSP,2359,...,331.0,-28.0,0,0,,,,,,
429189,2015,2,28,6,F9,300,N223FR,DEN,TPA,2359,...,504.0,-7.0,0,0,,,,,,


In [4]:
#Import Airport Information CSV to Pandas Data Frame
airports = pd.read_csv(airport_path)

#Display Airport Information Data Frame
airports

Unnamed: 0,IATA_CODE,AIRPORT,CITY,STATE,COUNTRY,LATITUDE,LONGITUDE
0,ABE,Lehigh Valley International Airport,Allentown,PA,USA,40.65236,-75.44040
1,ABI,Abilene Regional Airport,Abilene,TX,USA,32.41132,-99.68190
2,ABQ,Albuquerque International Sunport,Albuquerque,NM,USA,35.04022,-106.60919
3,ABR,Aberdeen Regional Airport,Aberdeen,SD,USA,45.44906,-98.42183
4,ABY,Southwest Georgia Regional Airport,Albany,GA,USA,31.53552,-84.19447
...,...,...,...,...,...,...,...
317,WRG,Wrangell Airport,Wrangell,AK,USA,56.48433,-132.36982
318,WYS,Westerly State Airport,West Yellowstone,MT,USA,44.68840,-111.11764
319,XNA,Northwest Arkansas Regional Airport,Fayetteville/Springdale/Rogers,AR,USA,36.28187,-94.30681
320,YAK,Yakutat Airport,Yakutat,AK,USA,59.50336,-139.66023


In [5]:
#Import Airline Information CSV to Pandas Data Frame
airlines = pd.read_csv(airline_path)

#Display Airline Information Data Frame
airlines

Unnamed: 0,IATA_CODE,AIRLINE
0,UA,United Air Lines Inc.
1,AA,American Airlines Inc.
2,US,US Airways Inc.
3,F9,Frontier Airlines Inc.
4,B6,JetBlue Airways
5,OO,Skywest Airlines Inc.
6,AS,Alaska Airlines Inc.
7,NK,Spirit Air Lines
8,WN,Southwest Airlines Co.
9,DL,Delta Air Lines Inc.


### Tweet Data (Natalie)

In [6]:
sqlite_engine = create_engine("sqlite:///Tweet_Data/database.sqlite")

In [7]:
inspector = inspect(sqlite_engine)
inspector.get_table_names()

['Tweets']

In [8]:
data_tweets = pd.read_sql_table('Tweets',sqlite_engine)

data_tweets

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,negativereason,negativereason_confidence,airline,airline_sentiment_gold,name,negativereason_gold,retweet_count,text,tweet_coord,tweet_created,tweet_location,user_timezone
0,567588278875213824,neutral,1.0000,,,Delta,,JetBlueNews,,0,@JetBlue's new CEO seeks the right balance to ...,,2015-02-16 23:36:05 -0800,USA,Sydney
1,567590027375702016,negative,1.0000,Can't Tell,0.6503,Delta,,nesi_1992,,0,@JetBlue is REALLY getting on my nerves !! 😡😡 ...,,2015-02-16 23:43:02 -0800,undecided,Pacific Time (US & Canada)
2,567591480085463040,negative,1.0000,Late Flight,0.346,United,,CPoutloud,,0,@united yes. We waited in line for almost an h...,,2015-02-16 23:48:48 -0800,"Washington, DC",
3,567592368451248130,negative,1.0000,Late Flight,1,United,,brenduch,,0,@united the we got into the gate at IAH on tim...,,2015-02-16 23:52:20 -0800,,Buenos Aires
4,567594449874587648,negative,1.0000,Customer Service Issue,0.3451,Southwest,,VahidESQ,,0,@SouthwestAir its cool that my bags take a bit...,,2015-02-17 00:00:36 -0800,"Los Angeles, CA",Pacific Time (US & Canada)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14480,570309308937842688,neutral,0.6869,,,Delta,,Oneladyyouadore,,0,@JetBlue I hope so because I fly very often an...,,2015-02-24 11:48:29 -0800,Georgia,Quito
14481,570309340952993796,neutral,1.0000,,,US Airways,,DebbiMcGinnis,,0,@USAirways is a DM possible if you aren't foll...,,2015-02-24 11:48:37 -0800,Missourah,Hawaii
14482,570309345281486848,positive,0.6469,,,Delta,,jaxbra,,0,@JetBlue Yesterday on my way from EWR to FLL j...,,2015-02-24 11:48:38 -0800,"east brunswick, nj",Atlantic Time (Canada)
14483,570310144459972608,negative,1.0000,Customer Service Issue,1,US Airways,,GAKotsch,,0,@USAirways and when will one of these agents b...,,2015-02-24 11:51:48 -0800,,Atlantic Time (Canada)


## Data Transformation

### Flight Data (Myles & Sedra & Matt)

In [9]:
#Drop Latitude & Longitude Columns from Airport Information Data Frame
airports_df = airports.drop(['LATITUDE', 'LONGITUDE'], axis = 1)

#Display Airport Information Data Frame
airports_df

Unnamed: 0,IATA_CODE,AIRPORT,CITY,STATE,COUNTRY
0,ABE,Lehigh Valley International Airport,Allentown,PA,USA
1,ABI,Abilene Regional Airport,Abilene,TX,USA
2,ABQ,Albuquerque International Sunport,Albuquerque,NM,USA
3,ABR,Aberdeen Regional Airport,Aberdeen,SD,USA
4,ABY,Southwest Georgia Regional Airport,Albany,GA,USA
...,...,...,...,...,...
317,WRG,Wrangell Airport,Wrangell,AK,USA
318,WYS,Westerly State Airport,West Yellowstone,MT,USA
319,XNA,Northwest Arkansas Regional Airport,Fayetteville/Springdale/Rogers,AR,USA
320,YAK,Yakutat Airport,Yakutat,AK,USA


In [10]:
#Filter Airline Information Data Frame By Airline
airline_df = airlines[airlines['IATA_CODE'].isin(['UA', 'AA', 'DL', 'US', 'WN', 'VX'])]        

#Reset Data Frame Index
airline_df = airline_df.reset_index().drop(['index'], axis = 1)

#Display Airline Information Data Frame
airline_df

Unnamed: 0,IATA_CODE,AIRLINE
0,UA,United Air Lines Inc.
1,AA,American Airlines Inc.
2,US,US Airways Inc.
3,WN,Southwest Airlines Co.
4,DL,Delta Air Lines Inc.
5,VX,Virgin America


In [11]:
#Drop Columns with Large Amounts of Missing Data or Irrelevant Data from Flight Information Data Frame
flights_df = flights.drop(['CANCELLATION_REASON', 'AIR_SYSTEM_DELAY', 'SECURITY_DELAY', 'AIRLINE_DELAY',
                           'LATE_AIRCRAFT_DELAY', 'WEATHER_DELAY', 'DIVERTED', 'CANCELLED'], axis = 1)

#Filter Flight Information Data Frame by Month
flights_df = flights_df[flights_df['MONTH'] == 2]

#Filter Flight Information Data Frame by Day
flights_df = flights_df[(flights_df['DAY'] >= 16) & (flights_df['DAY'] <= 24)]

#Filter Flight Information Data Frame by Airline
flights_df = flights_df[flights_df['AIRLINE'].isin(['UA', 'AA', 'DL', 'US', 'WN', 'VX'])]

#Drop Rows with Missing Data from Flight Information Data Frame
flights_df = flights_df.dropna()

#Reset Data Frame Index
flights_df = flights_df.reset_index().drop(['index'], axis = 1)

#Create Column of Calculated Total Delay in Flight Information Data Frame
flights_df['TOTAL_DELAY'] = flights_df['DEPARTURE_DELAY'] + flights_df['ARRIVAL_DELAY']

#Create Column of Combined Date in Flight Information Data Frame
flights_df.loc[flights_df['DAY'] <= 9, 'DATE'] = flights_df['YEAR'].astype(str) + '-0' + flights_df['MONTH'].astype(str) + '-0' + flights_df['DAY'].astype(str)
flights_df.loc[flights_df['DAY'] > 9, 'DATE'] = flights_df['YEAR'].astype(str) + '-0' + flights_df['MONTH'].astype(str) + '-' + flights_df['DAY'].astype(str)

#Drop Year & Month & Day Columns from Flight Information Data Frame
flights_df = flights_df.drop(['YEAR', 'MONTH', 'DAY'], axis = 1)

#Create Blank List for Day of Week Information
day = []

#Loop Through Numeric Day of Week Values & Convert to Day of Week Name & Append to List
for item in flights_df['DAY_OF_WEEK'].astype(int):
    if item == 1:
        day.append('Monday')
    elif item == 2:
        day.append('Tuesday')
    elif item == 3:
        day.append('Wednesday')
    elif item == 4:
        day.append('Thursday')
    elif item == 5:
        day.append('Friday')
    elif item == 6:
        day.append('Saturday')
    elif item == 7:
        day.append('Sunday')

#Overwrite Day of Week Numeric Values with Day of Week Names
flights_df['DAY_OF_WEEK'] = day

#Display Flight Information Data Frame
flights_df

Unnamed: 0,DAY_OF_WEEK,AIRLINE,FLIGHT_NUMBER,TAIL_NUMBER,ORIGIN_AIRPORT,DESTINATION_AIRPORT,SCHEDULED_DEPARTURE,DEPARTURE_TIME,DEPARTURE_DELAY,TAXI_OUT,...,ELAPSED_TIME,AIR_TIME,DISTANCE,WHEELS_ON,TAXI_IN,SCHEDULED_ARRIVAL,ARRIVAL_TIME,ARRIVAL_DELAY,TOTAL_DELAY,DATE
0,Monday,AA,2400,N5ESAA,LAX,DFW,5,3.0,-2.0,13.0,...,166.0,147.0,1235,443.0,6.0,453,449.0,-4.0,-6.0,2015-02-16
1,Monday,DL,1745,N365NW,SMF,MSP,5,2359.0,-6.0,9.0,...,195.0,183.0,1517,511.0,3.0,529,514.0,-15.0,-21.0,2015-02-16
2,Monday,DL,2579,N693DL,DEN,ATL,15,20.0,5.0,49.0,...,195.0,135.0,1199,524.0,11.0,511,535.0,24.0,29.0,2015-02-16
3,Monday,US,2020,N917US,PHX,CLT,15,13.0,-2.0,12.0,...,208.0,186.0,1773,531.0,10.0,600,541.0,-19.0,-21.0,2015-02-16
4,Monday,AA,258,N3DEAA,LAX,MIA,20,20.0,0.0,16.0,...,278.0,254.0,2342,750.0,8.0,804,758.0,-6.0,-6.0,2015-02-16
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
82750,Tuesday,UA,1104,N33294,ANC,DEN,2357,2350.0,-7.0,11.0,...,291.0,272.0,2405,633.0,8.0,705,641.0,-24.0,-31.0,2015-02-24
82751,Tuesday,US,467,N601AW,PHX,MSP,2359,2358.0,-1.0,10.0,...,175.0,162.0,1276,350.0,3.0,359,353.0,-6.0,-7.0,2015-02-24
82752,Tuesday,US,1770,N102UW,SLC,PHL,2359,2352.0,-7.0,9.0,...,231.0,210.0,1927,531.0,12.0,557,543.0,-14.0,-21.0,2015-02-24
82753,Tuesday,UA,1130,N77530,SEA,IAH,2359,2.0,3.0,16.0,...,242.0,217.0,1874,555.0,9.0,609,604.0,-5.0,-2.0,2015-02-24


In [12]:
#Create Data Frame of Early & On-Time Flights
early_df = flights_df[flights_df['TOTAL_DELAY'] <= 0]

#Create Blank List for Scheduled Departure Times
scheduled_departure = []

#Loop Through Scheduled Departure Time Integers & Convert to Time Strings & Append to List
for item in early_df['SCHEDULED_DEPARTURE'].astype(int):
    if item == 2400:
        scheduled_departure.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        scheduled_departure.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        scheduled_departure.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        scheduled_departure.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        scheduled_departure.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M'))

#Create Blank list fo Departure Times
departure_time = []

#Loop Through Departure Time Integers & Convert to Time Strings & Append to List
for item in early_df['DEPARTURE_TIME'].astype(int):
    if item == 2400:
        departure_time.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        departure_time.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        departure_time.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        departure_time.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        departure_time.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M'))

#Create Blank List for Wheels Off Times
wheels_off = []

#Loop Through Wheels Off Time Integers & Convert to Time Strings & Append to List
for item in early_df['WHEELS_OFF'].astype(int):
    if item == 2400:
        wheels_off.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        wheels_off.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        wheels_off.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        wheels_off.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        wheels_off.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M'))     

#Create Blank List for Wheels On Times
wheels_on = []

#Loop Through Wheels On Time Integers & Convert to Time Strings & Append to List
for item in early_df['WHEELS_ON'].astype(int):
    if item == 2400:
        wheels_on.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        wheels_on.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        wheels_on.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        wheels_on.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        wheels_on.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M')) 

#Create Blank List for Scheduled Arrival Times
scheduled_arrival = []

#Loop Through Scheduled Arrival Time Integers & Convert to Time Strings & Append to List
for item in early_df['SCHEDULED_ARRIVAL'].astype(int):
    if item == 2400:
        scheduled_arrival.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        scheduled_arrival.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        scheduled_arrival.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        scheduled_arrival.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        scheduled_arrival.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M'))

#Create Blank List for Arrival Times
arrival_time = []

#Loop Through Arrival Time Integers & Convert to Time Strings & Append to List
for item in early_df['ARRIVAL_TIME'].astype(int):
    if item == 2400:
        arrival_time.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        arrival_time.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        arrival_time.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        arrival_time.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        arrival_time.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M'))

#
early_df['SCHEDULED_DEPARTURE'] = scheduled_departure
early_df['DEPARTURE_TIME'] = departure_time
early_df['WHEELS_OFF'] = wheels_off
early_df['WHEELS_ON'] = wheels_on
early_df['SCHEDULED_ARRIVAL'] = scheduled_arrival
early_df['ARRIVAL_TIME'] = arrival_time

#Create Blank List for Departure Delay Times
departure_delay = []

#Loop Through Departure Delay Time Integers & Convert to Time Strings & Append to List
for item in early_df['DEPARTURE_DELAY'].astype(int):
    if item < 0:
        departure_delay.append('-' + str(dt.timedelta(minutes = (-1 * item)))[:-3])
    else:
        departure_delay.append(str(dt.timedelta(minutes = item))[:-3])

#Create Blank List for Arrival Delay Time Integers
arrival_delay = []

#Loop Through Arrival Delay Time Integers & Convert to Time Strings & Append to List
for item in early_df['ARRIVAL_DELAY'].astype(int):
    if item < 0:
        arrival_delay.append('-' + str(dt.timedelta(minutes = (-1 * item)))[:-3])
    else:
        arrival_delay.append(str(dt.timedelta(minutes = item))[:-3])

#Create Blank List for Total Delay Time Integers
total_delay = []

#Loop Through Total Delay Time Integers & Convert to Time Strings & Append to List
for item in early_df['TOTAL_DELAY'].astype(int):
    if item < 0:
        total_delay.append('-' + str(dt.timedelta(minutes = (-1 * item)))[:-3])
    else:
        total_delay.append(str(dt.timedelta(minutes = item))[:-3])

#
early_df['DEPARTURE_DELAY'] = departure_delay
early_df['ARRIVAL_DELAY'] = arrival_delay
early_df['TOTAL_DELAY'] = total_delay

#Create Blank List for Taxi Out Times
taxi_out = []

#Loop Through Taxi Out Time Integers & Convert to Time Strings & Append to List
for item in early_df['TAXI_OUT'].astype(int):
    taxi_out.append(str(dt.timedelta(minutes = item))[:-3])

#Create Blank List for Scheduled Times
scheduled_time = []

#Loop Through Scheduled Time Integers & Convert to Time Strings & Append to List
for item in early_df['SCHEDULED_TIME'].astype(int):
    scheduled_time.append(str(dt.timedelta(minutes = item))[:-3])

#Create Blank List for Elapsed Times
elapsed_time = []

#Loop Through Elapsed Time Integers & Convert to Time Strings & Append to List
for item in early_df['ELAPSED_TIME'].astype(int):
    elapsed_time.append(str(dt.timedelta(minutes = item))[:-3])

#Create Blank List for Air Times
air_time = []

#Loop Through Air Time Integers & Convert to Time Strings & Append to List
for item in early_df['AIR_TIME'].astype(int):
    air_time.append(str(dt.timedelta(minutes = item))[:-3])

#Create Blank List for Taxi In Times
taxi_in = []

#Loop Through Scheduled Departure Time Integers & Convert to Time Strings & Append to List
for item in early_df['TAXI_IN'].astype(int):
    taxi_in.append(str(dt.timedelta(minutes = item))[:-3])

#
early_df['TAXI_OUT'] = taxi_out
early_df['SCHEDULED_TIME'] = scheduled_time
early_df['ELAPSED_TIME'] = elapsed_time
early_df['AIR_TIME'] = air_time
early_df['TAXI_IN'] = taxi_in

#Reset Data Frame Index
early_df = early_df.reset_index().drop(['index'], axis = 1)

#Display Early & On-Time Flights Data Frame
early_df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_gui

Unnamed: 0,DAY_OF_WEEK,AIRLINE,FLIGHT_NUMBER,TAIL_NUMBER,ORIGIN_AIRPORT,DESTINATION_AIRPORT,SCHEDULED_DEPARTURE,DEPARTURE_TIME,DEPARTURE_DELAY,TAXI_OUT,...,ELAPSED_TIME,AIR_TIME,DISTANCE,WHEELS_ON,TAXI_IN,SCHEDULED_ARRIVAL,ARRIVAL_TIME,ARRIVAL_DELAY,TOTAL_DELAY,DATE
0,Monday,AA,2400,N5ESAA,LAX,DFW,00:05,00:03,-0:02,0:13,...,2:46,2:27,1235,04:43,0:06,04:53,04:49,-0:04,-0:06,2015-02-16
1,Monday,DL,1745,N365NW,SMF,MSP,00:05,23:59,-0:06,0:09,...,3:15,3:03,1517,05:11,0:03,05:29,05:14,-0:15,-0:21,2015-02-16
2,Monday,US,2020,N917US,PHX,CLT,00:15,00:13,-0:02,0:12,...,3:28,3:06,1773,05:31,0:10,06:00,05:41,-0:19,-0:21,2015-02-16
3,Monday,AA,258,N3DEAA,LAX,MIA,00:20,00:20,0:00,0:16,...,4:38,4:14,2342,07:50,0:08,08:04,07:58,-0:06,-0:06,2015-02-16
4,Monday,US,1905,N173US,SFO,CLT,00:25,00:20,-0:05,0:11,...,4:55,4:12,2296,07:43,0:32,08:11,08:15,0:04,-0:01,2015-02-16
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41951,Tuesday,UA,1104,N33294,ANC,DEN,23:57,23:50,-0:07,0:11,...,4:51,4:32,2405,06:33,0:08,07:05,06:41,-0:24,-0:31,2015-02-24
41952,Tuesday,US,467,N601AW,PHX,MSP,23:59,23:58,-0:01,0:10,...,2:55,2:42,1276,03:50,0:03,03:59,03:53,-0:06,-0:07,2015-02-24
41953,Tuesday,US,1770,N102UW,SLC,PHL,23:59,23:52,-0:07,0:09,...,3:51,3:30,1927,05:31,0:12,05:57,05:43,-0:14,-0:21,2015-02-24
41954,Tuesday,UA,1130,N77530,SEA,IAH,23:59,00:02,0:03,0:16,...,4:02,3:37,1874,05:55,0:09,06:09,06:04,-0:05,-0:02,2015-02-24


In [13]:
early_groups = early_df.groupby('AIRLINE')

for name, group in early_groups:
    if name == 'UA':
        UA_Early = pd.DataFrame(group)
        
        UA_Early = UA_Early.reset_index().drop(['index'], axis = 1)
    elif name == 'AA':
        AA_Early = pd.DataFrame(group)
        
        AA_Early = AA_Early.reset_index().drop(['index'], axis = 1)
    elif name == 'US':
        US_Early = pd.DataFrame(group)
        
        US_Early = US_Early.reset_index().drop(['index'], axis = 1)
    elif name == 'WN':
        WN_Early = pd.DataFrame(group)
        
        WN_Early = WN_Early.reset_index().drop(['index'], axis = 1)
    elif name == 'DL':
        DL_Early = pd.DataFrame(group)
        
        DL_Early = DL_Early.reset_index().drop(['index'], axis = 1)
    elif name == 'VX':
        VX_Early = pd.DataFrame(group)
        
        VX_Early = VX_Early.reset_index().drop(['index'], axis = 1)

     DAY_OF_WEEK AIRLINE  FLIGHT_NUMBER TAIL_NUMBER ORIGIN_AIRPORT  \
0         Monday      UA           1200      N68805            SFO   
1         Monday      UA           1496      N17245            LAX   
2         Monday      UA           1162      N37287            BQN   
3         Monday      UA            288      N424UA            EWR   
4         Monday      UA            260      N435UA            ORD   
...          ...     ...            ...         ...            ...   
5670     Tuesday      UA           1557      N75426            LAX   
5671     Tuesday      UA           1604      N37293            HNL   
5672     Tuesday      UA           1104      N33294            ANC   
5673     Tuesday      UA           1130      N77530            SEA   
5674     Tuesday      UA           1159      N23708            DEN   

     DESTINATION_AIRPORT SCHEDULED_DEPARTURE DEPARTURE_TIME DEPARTURE_DELAY  \
0                    IAH               00:45          00:39           -0:06   
1

In [None]:
#Display UA Early & On -Time Flights Data Frame
UA_Early

In [None]:
#Display AA Early & On-Time Flights Data Frame
AA_Early

In [None]:
#Display US Early & On-Time Flights Data Frame
US_Early

In [None]:
#Display WN Early & On-Time Flights Data Frame
WN_Early

In [None]:
#Display DL Early & On-Time Flights Data Frame
DL_Early

In [None]:
#Display VX Early & On-Time Flights Data Frame
VX_Early

In [14]:
#Create Data Frame of Late Flights
late_df = flights_df[flights_df['TOTAL_DELAY'] > 0]

#Create Blank List for Scheduled Departure Times
scheduled_departure = []

#Loop Through Scheduled Departure Time Integers & Convert to Time Strings & Append to List
for item in late_df['SCHEDULED_DEPARTURE'].astype(int):
    if item == 2400:
        scheduled_departure.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        scheduled_departure.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        scheduled_departure.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        scheduled_departure.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        scheduled_departure.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M'))

#Create Blank List for Departure Times
departure_time = []

#Loop Through Departure Time Integers & Convert to Time Strings & Append to List
for item in late_df['DEPARTURE_TIME'].astype(int):
    if item == 2400:
        departure_time.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        departure_time.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        departure_time.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        departure_time.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        departure_time.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M'))

#Create Blank List for Wheels Off Times
wheels_off = []

#Loop Through Wheels Off Time Integers & Convert to Time Strings & Append to List
for item in late_df['WHEELS_OFF'].astype(int):
    if item == 2400:
        wheels_off.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        wheels_off.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        wheels_off.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        wheels_off.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        wheels_off.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M'))     

#Create Blank List for Wheels On Times
wheels_on = []

#Loop Through Wheels On Time Integers & Convert to Time Strings & Append to List
for item in late_df['WHEELS_ON'].astype(int):
    if item == 2400:
        wheels_on.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        wheels_on.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        wheels_on.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        wheels_on.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        wheels_on.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M')) 

#Create Blank List for Scheduled Arrival Times
scheduled_arrival = []

#Loop Through Scheduled Arrival Time Integers & Convert to Time Strings & Append to List
for item in late_df['SCHEDULED_ARRIVAL'].astype(int):
    if item == 2400:
        scheduled_arrival.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        scheduled_arrival.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        scheduled_arrival.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        scheduled_arrival.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        scheduled_arrival.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M'))

#Create Blank List for Arrival Times
arrival_time = []

#Loop Through Arrival Time Integers & Convert to Time Strings & Append to List
for item in late_df['ARRIVAL_TIME'].astype(int):
    if item == 2400:
        arrival_time.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        arrival_time.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        arrival_time.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        arrival_time.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        arrival_time.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M'))

#
late_df['SCHEDULED_DEPARTURE'] = scheduled_departure
late_df['DEPARTURE_TIME'] = departure_time
late_df['WHEELS_OFF'] = wheels_off
late_df['WHEELS_ON'] = wheels_on
late_df['SCHEDULED_ARRIVAL'] = scheduled_arrival
late_df['ARRIVAL_TIME'] = arrival_time

#Create Blank List for Departure Delay Times
departure_delay = []

#Loop Through Departure Delay Time Integers & Convert to Time Strings & Append to List
for item in late_df['DEPARTURE_DELAY'].astype(int):
    if item < 0:
        departure_delay.append('-' + str(dt.timedelta(minutes = (-1 * item)))[:-3])
    else:
        departure_delay.append(str(dt.timedelta(minutes = item))[:-3])

#Create Blank List for Arrival Delay Times
arrival_delay = []

#Loop Through Arrival Delay Time Integers & Convert to Time Strings & Append to List
for item in late_df['ARRIVAL_DELAY'].astype(int):
    if item < 0:
        arrival_delay.append('-' + str(dt.timedelta(minutes = (-1 * item)))[:-3])
    else:
        arrival_delay.append(str(dt.timedelta(minutes = item))[:-3])

#Create Blank List for Total Delay Time Integers
total_delay = []

#Loop Through Total Delay Time Integers & Convert to Time Strings & Append to List
for item in late_df['TOTAL_DELAY'].astype(int):
    if item < 0:
        total_delay.append('-' + str(dt.timedelta(minutes = (-1 * item)))[:-3])
    else:
        total_delay.append(str(dt.timedelta(minutes = item))[:-3])

#
late_df['DEPARTURE_DELAY'] = departure_delay
late_df['ARRIVAL_DELAY'] = arrival_delay
late_df['TOTAL_DELAY'] = total_delay

#Create Blank List for Taxi Out Times
taxi_out = []

#Loop Through Taxi Out Time Integers & Convert to Time Strings & Append to List
for item in late_df['TAXI_OUT'].astype(int):
    taxi_out.append(str(dt.timedelta(minutes = item))[:-3])

#Create Blank List for Scheduled Times
scheduled_time = []

#Loop Through Scheduled Time Integers & Convert to Time Strings & Append to List
for item in late_df['SCHEDULED_TIME'].astype(int):
    scheduled_time.append(str(dt.timedelta(minutes = item))[:-3])

#Create Blank List for Elapsed Times
elapsed_time = []

#Loop Through Elapsed Time Integers & Convert to Time Strings & Append to List
for item in late_df['ELAPSED_TIME'].astype(int):
    elapsed_time.append(str(dt.timedelta(minutes = item))[:-3])

#Create Blank List for Air Times
air_time = []

#Loop Through Air Time Integers & Convert to Time Strings & Append to List
for item in late_df['AIR_TIME'].astype(int):
    air_time.append(str(dt.timedelta(minutes = item))[:-3])

#Create Blank List for Taxi In Times
taxi_in = []

#Loop Through Taxi In Time Integers & Convert to Time Strings & Append to List
for item in late_df['TAXI_IN'].astype(int):
    taxi_in.append(str(dt.timedelta(minutes = item))[:-3])

#
late_df['TAXI_OUT'] = taxi_out
late_df['SCHEDULED_TIME'] = scheduled_time
late_df['ELAPSED_TIME'] = elapsed_time
late_df['AIR_TIME'] = air_time
late_df['TAXI_IN'] = taxi_in

#Reset Data Frame Index
late_df = late_df.reset_index().drop(['index'], axis = 1)

#Display Late Flights Data Frame
late_df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_gui

Unnamed: 0,DAY_OF_WEEK,AIRLINE,FLIGHT_NUMBER,TAIL_NUMBER,ORIGIN_AIRPORT,DESTINATION_AIRPORT,SCHEDULED_DEPARTURE,DEPARTURE_TIME,DEPARTURE_DELAY,TAXI_OUT,...,ELAPSED_TIME,AIR_TIME,DISTANCE,WHEELS_ON,TAXI_IN,SCHEDULED_ARRIVAL,ARRIVAL_TIME,ARRIVAL_DELAY,TOTAL_DELAY,DATE
0,Monday,DL,2579,N693DL,DEN,ATL,00:15,00:20,0:05,0:49,...,3:15,2:15,1199,05:24,0:11,05:11,05:35,0:24,0:29,2015-02-16
1,Monday,AA,1234,N3LVAA,LAS,ORD,00:27,01:09,0:42,0:14,...,3:13,2:52,1514,06:15,0:07,05:50,06:22,0:32,1:14,2015-02-16
2,Monday,DL,1722,N328NW,SLC,DTW,00:30,00:49,0:19,0:10,...,2:59,2:40,1481,05:39,0:09,05:58,05:48,-0:10,0:09,2015-02-16
3,Monday,US,1866,N561UW,LAX,CLT,00:35,00:31,-0:04,0:22,...,4:46,3:57,2125,07:50,0:27,08:04,08:17,0:13,0:09,2015-02-16
4,Monday,UA,1512,N33292,LAS,IAH,00:38,00:42,0:04,0:17,...,2:40,2:18,1222,05:17,0:05,05:23,05:22,-0:01,0:03,2015-02-16
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
40794,Tuesday,DL,1264,N6704Z,SLC,JFK,23:40,23:35,-0:05,0:23,...,4:35,3:38,1990,05:36,0:34,05:58,06:10,0:12,0:07,2015-02-24
40795,Tuesday,AA,2200,N860AA,MIA,MCO,23:49,23:50,0:01,0:25,...,1:07,0:33,192,00:48,0:09,00:49,00:57,0:08,0:09,2015-02-24
40796,Tuesday,US,2026,N918US,PHX,PHL,23:50,23:48,-0:02,0:16,...,4:23,3:55,2075,05:59,0:12,06:03,06:11,0:08,0:06,2015-02-24
40797,Tuesday,UA,1720,N76519,PHX,EWR,23:50,00:12,0:22,0:16,...,4:19,3:51,2133,06:19,0:12,06:29,06:31,0:02,0:24,2015-02-24


In [15]:
late_groups = late_df.groupby('AIRLINE')

for name, group in late_groups:
    if name == 'UA':
        UA_Late = pd.DataFrame(group)
        
        UA_Late = UA_Late.reset_index().drop(['index'], axis = 1)
    elif name == 'AA':
        AA_Late = pd.DataFrame(group)
        
        AA_Late = AA_Late.reset_index().drop(['index'], axis = 1)
    elif name == 'US':
        US_Late = pd.DataFrame(group)
        
        US_Late = US_Late.reset_index().drop(['index'], axis = 1)
    elif name == 'WN':
        WN_Late = pd.DataFrame(group)
        
        WN_Late = WN_Late.reset_index().drop(['index'], axis = 1)
    elif name == 'DL':
        DL_Late = pd.DataFrame(group)
        
        DL_Late = DL_Late.reset_index().drop(['index'], axis = 1)
    elif name == 'VX':
        VX_Late = pd.DataFrame(group)
        
        VX_Late = VX_Late.reset_index().drop(['index'], axis = 1)

     DAY_OF_WEEK AIRLINE  FLIGHT_NUMBER TAIL_NUMBER ORIGIN_AIRPORT  \
0         Monday      UA           1512      N33292            LAS   
1         Monday      UA           1204      N37273            SJU   
2         Monday      UA           1447      N27733            PHL   
3         Monday      UA           1545      N17233            DFW   
4         Monday      UA            572      N842UA            AUS   
...          ...     ...            ...         ...            ...   
6325     Tuesday      UA           1562      N69806            SFO   
6326     Tuesday      UA           1581      N14219            SFO   
6327     Tuesday      UA           1493      N76522            SFO   
6328     Tuesday      UA            383      N212UA            HNL   
6329     Tuesday      UA           1720      N76519            PHX   

     DESTINATION_AIRPORT SCHEDULED_DEPARTURE DEPARTURE_TIME DEPARTURE_DELAY  \
0                    IAH               00:38          00:42            0:04   
1

In [None]:
#Display UA Late Flights Data Frame
UA_Late

In [None]:
#Display AA Late Flights Data Frame
AA_Late

In [None]:
#Display US Late Flights Data Frame
UA_Late

In [None]:
#Display WN Late Flights Data Frame
WN_Late

In [None]:
#Display DL Late Flights Data Frame
DL_Late

In [None]:
#Display VX Late Flights Data Frame
VX_Late

In [16]:
#Create Cancelled Flights Data Frame
cancelled_df = flights[flights['CANCELLED'] == 1]

cancelled_df = cancelled_df[cancelled_df['MONTH'] == 2]

cancelled_df = cancelled_df[(cancelled_df['DAY'] >= 16) & (cancelled_df['DAY'] <= 24)]

cancelled_df = cancelled_df.drop(['CANCELLATION_REASON', 'AIR_SYSTEM_DELAY', 'SECURITY_DELAY', 'AIRLINE_DELAY',
                                  'LATE_AIRCRAFT_DELAY', 'WEATHER_DELAY', 'DIVERTED', 'CANCELLED'], axis = 1)

cancelled_df = cancelled_df.drop(['DEPARTURE_TIME', 'DEPARTURE_DELAY', 'TAXI_OUT', 'WHEELS_OFF', 'ELAPSED_TIME',
                                  'AIR_TIME', 'WHEELS_ON', 'TAXI_IN', 'ARRIVAL_TIME', 'ARRIVAL_DELAY'], axis = 1)

cancelled_df.loc[cancelled_df['DAY'] <= 9, 'DATE'] = cancelled_df['YEAR'].astype(str) + '-0' + cancelled_df['MONTH'].astype(str) + '-0' + cancelled_df['DAY'].astype(str)
cancelled_df.loc[cancelled_df['DAY'] > 9, 'DATE'] = cancelled_df['YEAR'].astype(str) + '-0' + cancelled_df['MONTH'].astype(str) + '-' + cancelled_df['DAY'].astype(str)

cancelled_df = cancelled_df.drop(['YEAR', 'MONTH', 'DAY'], axis = 1)

day = []

for item in cancelled_df['DAY_OF_WEEK'].astype(int):
    if item == 1:
        day.append('Monday')
    elif item == 2:
        day.append('Tuesday')
    elif item == 3:
        day.append('Wednesday')
    elif item == 4:
        day.append('Thursday')
    elif item == 5:
        day.append('Friday')
    elif item == 6:
        day.append('Saturday')
    elif item == 7:
        day.append('Sunday')
        
cancelled_df['DAY_OF_WEEK'] = day

cancelled_df = cancelled_df.dropna()

#Reset Data Frame Index
cancelled_df = cancelled_df.reset_index().drop(['index'], axis = 1)

#Create Blank List for Scheduled Departure Times
scheduled_departure = []

#Loop Through Scheduled Departure Time Integers & Convert to Time Strings & Append to List
for item in cancelled_df['SCHEDULED_DEPARTURE'].astype(int):
    if item == 2400:
        scheduled_departure.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        scheduled_departure.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        scheduled_departure.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        scheduled_departure.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        scheduled_departure.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M'))

#Create Blank List for Scheduled Arrival Times
scheduled_arrival = []

#Loop Through Scheduled Arrival Time Integers & Convert to Time Strings & Append to List
for item in cancelled_df['SCHEDULED_ARRIVAL'].astype(int):
    if item == 2400:
        scheduled_arrival.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        scheduled_arrival.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        scheduled_arrival.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        scheduled_arrival.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        scheduled_arrival.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M'))

#
cancelled_df['SCHEDULED_DEPARTURE'] = scheduled_departure
cancelled_df['SCHEDULED_ARRIVAL'] = scheduled_arrival

#Create Blank List for Scheduled Times
scheduled_time = []

#Loop Through Scheduled Time Integers & Convert to Time Strings & Append to List
for item in cancelled_df['SCHEDULED_TIME'].astype(int):
    scheduled_time.append(str(dt.timedelta(minutes = item))[:-3])

#
cancelled_df['SCHEDULED_TIME'] = scheduled_time

#Display Cancelled Flights Data Frame
cancelled_df

Unnamed: 0,DAY_OF_WEEK,AIRLINE,FLIGHT_NUMBER,TAIL_NUMBER,ORIGIN_AIRPORT,DESTINATION_AIRPORT,SCHEDULED_DEPARTURE,SCHEDULED_TIME,DISTANCE,SCHEDULED_ARRIVAL,DATE
0,Monday,OO,5501,N709SK,MFE,IAH,05:20,1:10,316,06:30,2015-02-16
1,Monday,DL,1780,N967DL,BNA,ATL,05:25,1:10,214,07:35,2015-02-16
2,Monday,MQ,3659,N636MQ,SGF,DFW,05:25,1:32,364,06:57,2015-02-16
3,Monday,EV,6022,N23139,BNA,ORD,05:35,1:48,409,07:23,2015-02-16
4,Monday,DL,2079,N329NW,BOS,DTW,05:45,2:14,632,07:59,2015-02-16
...,...,...,...,...,...,...,...,...,...,...,...
5120,Tuesday,EV,5022,N878AS,ATL,ILM,22:05,1:18,377,23:23,2015-02-24
5121,Tuesday,WN,1890,N714CB,BWI,DAY,22:15,1:30,406,23:45,2015-02-24
5122,Tuesday,AA,1413,N4WMAA,DFW,SAT,22:50,1:02,247,23:52,2015-02-24
5123,Tuesday,OO,5586,N118SY,SFO,DFW,23:00,3:25,1464,04:25,2015-02-24


In [17]:
cancelled_groups = cancelled_df.groupby('AIRLINE')

for name, group in cancelled_groups:
    if name == 'UA':
        UA_Cancelled = pd.DataFrame(group)
        
        UA_Cancelled = UA_Cancelled.reset_index().drop(['index'], axis = 1)
    elif name == 'AA':
        AA_Cancelled = pd.DataFrame(group)
        
        AA_Cancelled = AA_Cancelled.reset_index().drop(['index'], axis = 1)
    elif name == 'US':
        US_Cancelled = pd.DataFrame(group)
        
        US_Cancelled = US_Cancelled.reset_index().drop(['index'], axis = 1)
    elif name == 'WN':
        WN_Cancelled = pd.DataFrame(group)
        
        WN_Cancelled = WN_Cancelled.reset_index().drop(['index'], axis = 1)
    elif name == 'DL':
        DL_Cancelled = pd.DataFrame(group)
        
        DL_Cancelled = DL_Cancelled.reset_index().drop(['index'], axis = 1)
    elif name == 'VX':
        VX_Cancelled = pd.DataFrame(group)
        
        VX_Cancelled = VX_Cancelled.reset_index().drop(['index'], axis = 1)

  DAY_OF_WEEK AIRLINE  FLIGHT_NUMBER TAIL_NUMBER ORIGIN_AIRPORT  \
0      Monday      UA            627      N818UA            DCA   
1      Monday      UA           1429      N37470            IAD   
2   Wednesday      UA           1159      N13248            EWR   
3    Saturday      UA           1220      N27901            DEN   
4    Saturday      UA            217      N808UA            IAD   

  DESTINATION_AIRPORT SCHEDULED_DEPARTURE SCHEDULED_TIME  DISTANCE  \
0                 ORD               17:45           2:14       612   
1                 LAS               19:49           5:27      2065   
2                 SJU               20:00           3:53      1608   
3                 IAH               14:45           2:16       862   
4                 SFO               16:29           6:09      2419   

  SCHEDULED_ARRIVAL        DATE  
0             18:59  2015-02-16  
1             22:16  2015-02-16  
2             00:53  2015-02-18  
3             18:01  2015-02-21  
4     

In [None]:
#Display UA Cancelled Flights Data Frame
UA_Cancelled

In [None]:
#Display AA Cancelled Flights Data Frame
AA_Cancelled

In [None]:
#Display US Cancelled Flights Data Frame
US_Cancelled

In [None]:
#Display WN Cancelled Flights Data Frame
WN_Cancelled

In [None]:
#Display DL_Cancelled Flights Data Frame
DL_Cancelled

In [None]:
#Display VX Cancelled Flights Data Frame
VX_Cancelled

In [18]:
#Create Diverted Flights Data Frame
diverted_df = flights[flights['DIVERTED'] == 1]

diverted_df = diverted_df[diverted_df['MONTH'] == 2]

diverted_df = diverted_df[(diverted_df['DAY'] >= 16) & (diverted_df['DAY'] <= 24)]

diverted_df = diverted_df.drop(['CANCELLATION_REASON', 'AIR_SYSTEM_DELAY', 'SECURITY_DELAY', 'AIRLINE_DELAY',
                                  'LATE_AIRCRAFT_DELAY', 'WEATHER_DELAY', 'DIVERTED', 'CANCELLED'], axis = 1)

diverted_df = diverted_df.drop(['ELAPSED_TIME', 'AIR_TIME', 'ARRIVAL_DELAY'], axis = 1)

diverted_df.loc[diverted_df['DAY'] <= 9, 'DATE'] = diverted_df['YEAR'].astype(str) + '-0' + diverted_df['MONTH'].astype(str) + '-0' + diverted_df['DAY'].astype(str)
diverted_df.loc[diverted_df['DAY'] > 9, 'DATE'] = diverted_df['YEAR'].astype(str) + '-0' + diverted_df['MONTH'].astype(str) + '-' + diverted_df['DAY'].astype(str)

diverted_df = diverted_df.drop(['YEAR', 'MONTH', 'DAY'], axis = 1)

day = []

for item in diverted_df['DAY_OF_WEEK'].astype(int):
    if item == 1:
        day.append('Monday')
    elif item == 2:
        day.append('Tuesday')
    elif item == 3:
        day.append('Wednesday')
    elif item == 4:
        day.append('Thursday')
    elif item == 5:
        day.append('Friday')
    elif item == 6:
        day.append('Saturday')
    elif item == 7:
        day.append('Sunday')
        
diverted_df['DAY_OF_WEEK'] = day

diverted_df = diverted_df.dropna()

#Reset Data Frame Index
diverted_df = diverted_df.reset_index().drop(['index'], axis = 1)

#Create Blank List for Scheduled Departure Times
scheduled_departure = []

#Loop Through Scheduled Departure Time Integers & Convert to Time Strings & Append to List
for item in diverted_df['SCHEDULED_DEPARTURE'].astype(int):
    if item == 2400:
        scheduled_departure.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        scheduled_departure.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        scheduled_departure.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        scheduled_departure.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        scheduled_departure.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M'))

#Create Blank List for Departure Times
departure_time = []

#Loop Through Departure Time Integers & Convert to Time Strings & Append to List
for item in diverted_df['DEPARTURE_TIME'].astype(int):
    if item == 2400:
        departure_time.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        departure_time.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        departure_time.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        departure_time.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        departure_time.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M'))

#Create Blank List for Wheels Off Times
wheels_off = []

#Loop Through Wheels Off Time Integers & Convert to Time Strings & Append to List
for item in diverted_df['WHEELS_OFF'].astype(int):
    if item == 2400:
        wheels_off.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        wheels_off.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        wheels_off.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        wheels_off.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        wheels_off.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M'))     

#Create Blank List for Wheels On Times
wheels_on = []

#Loop Through Wheels On Time Integers & Convert to Time Strings & Append to List
for item in diverted_df['WHEELS_ON'].astype(int):
    if item == 2400:
        wheels_on.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        wheels_on.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        wheels_on.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        wheels_on.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        wheels_on.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M')) 

#Create Blank List for Scheduled Arrival Times
scheduled_arrival = []

#Loop Through Scheduled Arrival Time Integers & Convert to Time Strings & Append to List
for item in diverted_df['SCHEDULED_ARRIVAL'].astype(int):
    if item == 2400:
        scheduled_arrival.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        scheduled_arrival.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        scheduled_arrival.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        scheduled_arrival.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        scheduled_arrival.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M'))

#Create Blank List for Arrival Times
arrival_time = []

#Loop Through Arrival Time Integers & Convert to Time Strings & Append to List
for item in diverted_df['ARRIVAL_TIME'].astype(int):
    if item == 2400:
        arrival_time.append(dt.datetime.strptime('0000', '%H%M').strftime('%H:%M'))
    elif item >= 1000:
        arrival_time.append(dt.datetime.strptime(str(item), '%H%M').strftime('%H:%M'))
    elif item >= 100 and item <= 999:
        arrival_time.append(dt.datetime.strptime('0' + str(item), '%H%M').strftime('%H:%M'))
    elif item >= 10 and item <= 99:
        arrival_time.append(dt.datetime.strptime('00' + str(item), '%H%M').strftime('%H:%M'))
    elif item <= 9:
        arrival_time.append(dt.datetime.strptime('000' + str(item), '%H%M').strftime('%H:%M'))
        
#
diverted_df['SCHEDULED_DEPARTURE'] = scheduled_departure
diverted_df['DEPARTURE_TIME'] = departure_time
diverted_df['WHEELS_OFF'] = wheels_off
diverted_df['WHEELS_ON'] = wheels_on
diverted_df['SCHEDULED_ARRIVAL'] = scheduled_arrival
diverted_df['ARRIVAL_TIME'] = arrival_time

#Create Blank List for Departure Delay Times
departure_delay = []

#Loop Through Departure Delay Time Integers & Convert to Time Strings & Append to List
for item in diverted_df['DEPARTURE_DELAY'].astype(int):
    if item < 0:
        departure_delay.append('-' + str(dt.timedelta(minutes = (-1 * item)))[:-3])
    else:
        departure_delay.append(str(dt.timedelta(minutes = item))[:-3])

#
diverted_df['DEPARTURE_DELAY'] = departure_delay

#Create Blank List for Taxi Out Times
taxi_out = []

#Loop Through Taxi Out Time Integers & Convert to Time Strings & Append to List
for item in diverted_df['TAXI_OUT'].astype(int):
    taxi_out.append(str(dt.timedelta(minutes = item))[:-3])

#Create Blank List for Scheduled Times
scheduled_time = []

#Loop Through Scheduled Time Integers & Convert to Time Strings & Append to List
for item in diverted_df['SCHEDULED_TIME'].astype(int):
    scheduled_time.append(str(dt.timedelta(minutes = item))[:-3])

#Create Blank List for Taxi In Times
taxi_in = []

#Loop Through Taxi In Time Integers & Convert to Time Strings & Append to List
for item in diverted_df['TAXI_IN'].astype(int):
    taxi_in.append(str(dt.timedelta(minutes = item))[:-3])

#
diverted_df['TAXI_OUT'] = taxi_out
diverted_df['SCHEDULED_TIME'] = scheduled_time
diverted_df['TAXI_IN'] = taxi_in

#Display Diverted Flights Data Frame
diverted_df

Unnamed: 0,DAY_OF_WEEK,AIRLINE,FLIGHT_NUMBER,TAIL_NUMBER,ORIGIN_AIRPORT,DESTINATION_AIRPORT,SCHEDULED_DEPARTURE,DEPARTURE_TIME,DEPARTURE_DELAY,TAXI_OUT,WHEELS_OFF,SCHEDULED_TIME,DISTANCE,WHEELS_ON,TAXI_IN,SCHEDULED_ARRIVAL,ARRIVAL_TIME,DATE
0,Monday,NK,705,N588NK,LGA,FLL,06:30,07:00,0:30,0:14,07:14,3:03,1076,16:59,0:09,09:33,17:08,2015-02-16
1,Monday,EV,5076,N850AS,ATL,SGF,08:36,08:34,-0:02,0:10,08:44,1:55,563,15:21,0:06,09:31,15:27,2015-02-16
2,Monday,EV,2559,N908EV,DFW,SPS,08:55,08:49,-0:06,0:20,09:09,0:50,113,12:37,0:05,09:45,12:42,2015-02-16
3,Monday,AA,74,N436AA,DFW,HOU,09:20,09:27,0:07,0:17,09:44,1:06,247,12:48,0:04,10:26,12:52,2015-02-16
4,Monday,B6,411,N708JB,JFK,LAS,09:57,14:14,4:17,0:22,14:36,5:51,2248,22:35,0:09,12:48,22:44,2015-02-16
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
213,Tuesday,OO,5156,N727SK,BFL,IAH,06:45,06:35,-0:10,0:14,06:49,3:20,1428,16:32,0:10,12:05,16:42,2015-02-24
214,Tuesday,UA,236,N473UA,LAS,IAD,06:57,06:48,-0:09,0:12,07:00,4:25,2065,15:22,0:03,14:22,15:25,2015-02-24
215,Tuesday,WN,466,N770SA,FLL,BWI,12:20,12:18,-0:02,0:11,12:29,2:40,925,17:58,0:03,15:00,18:01,2015-02-24
216,Tuesday,DL,1543,N314NB,ALB,ATL,12:40,14:32,1:52,0:09,14:41,2:50,853,18:28,0:06,15:30,18:34,2015-02-24


In [19]:
diverted_groups = diverted_df.groupby('AIRLINE')

for name, group in diverted_groups:
    if name == 'UA':
        UA_Diverted = pd.DataFrame(group)
        
        UA_Diverted = UA_Diverted.reset_index().drop(['index'], axis = 1)
    elif name == 'AA':
        AA_Diverted = pd.DataFrame(group)
        
        AA_Diverted = AA_Diverted.reset_index().drop(['index'], axis = 1)
    elif name == 'US':
        US_Diverted = pd.DataFrame(group)
        
        US_Diverted = US_Diverted.reset_index().drop(['index'], axis = 1)
    elif name == 'WN':
        WN_Diverted = pd.DataFrame(group)
        
        WN_Diverted = WN_Diverted.reset_index().drop(['index'], axis = 1)
    elif name == 'DL':
        DL_Diverted = pd.DataFrame(group)
        
        DL_Diverted = DL_Diverted.reset_index().drop(['index'], axis = 1)
    elif name == 'VX':
        VX_Diverted = pd.DataFrame(group)
        
        VX_Diverted = VX_Diverted.reset_index().drop(['index'], axis = 1)
        
print(UA_Diverted)
print(AA_Diverted)
print(US_Diverted)
print(WN_Diverted)
print(DL_Diverted)
print(VX_Diverted)

  DAY_OF_WEEK AIRLINE  FLIGHT_NUMBER TAIL_NUMBER ORIGIN_AIRPORT  \
0      Monday      UA            536      N430UA            MIA   
1      Monday      UA           1261      N75425            IAH   
2     Tuesday      UA            253      N772UA            IAH   
3    Thursday      UA            687      N471UA            SFO   
4      Friday      UA           1142      N66831            MCO   
5      Friday      UA           1071      N76529            EWR   
6    Saturday      UA           1109      N36472            EWR   
7      Sunday      UA           1501      N14237            IAH   
8     Tuesday      UA            236      N473UA            LAS   

  DESTINATION_AIRPORT SCHEDULED_DEPARTURE DEPARTURE_TIME DEPARTURE_DELAY  \
0                 DEN               16:50          17:12            0:22   
1                 DEN               17:43          18:04            0:21   
2                 HNL               09:20          17:40            8:20   
3                 LAX    

### Tweet Data (Natalie)

In [20]:
data_tweets = data_tweets.drop(['negativereason', 'negativereason_confidence', 'airline_sentiment_gold', 
                                'negativereason_gold', 
                                'tweet_coord', 'tweet_location', 'user_timezone'],axis=1)
data_tweets = data_tweets.dropna()
data_tweets.sort_values("tweet_created")
data_tweets["tweet_id"] = data_tweets["tweet_id"].astype(str)
data_tweets

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_created
0,567588278875213824,neutral,1.0000,Delta,JetBlueNews,0,@JetBlue's new CEO seeks the right balance to ...,2015-02-16 23:36:05 -0800
1,567590027375702016,negative,1.0000,Delta,nesi_1992,0,@JetBlue is REALLY getting on my nerves !! 😡😡 ...,2015-02-16 23:43:02 -0800
2,567591480085463040,negative,1.0000,United,CPoutloud,0,@united yes. We waited in line for almost an h...,2015-02-16 23:48:48 -0800
3,567592368451248130,negative,1.0000,United,brenduch,0,@united the we got into the gate at IAH on tim...,2015-02-16 23:52:20 -0800
4,567594449874587648,negative,1.0000,Southwest,VahidESQ,0,@SouthwestAir its cool that my bags take a bit...,2015-02-17 00:00:36 -0800
...,...,...,...,...,...,...,...,...
14480,570309308937842688,neutral,0.6869,Delta,Oneladyyouadore,0,@JetBlue I hope so because I fly very often an...,2015-02-24 11:48:29 -0800
14481,570309340952993796,neutral,1.0000,US Airways,DebbiMcGinnis,0,@USAirways is a DM possible if you aren't foll...,2015-02-24 11:48:37 -0800
14482,570309345281486848,positive,0.6469,Delta,jaxbra,0,@JetBlue Yesterday on my way from EWR to FLL j...,2015-02-24 11:48:38 -0800
14483,570310144459972608,negative,1.0000,US Airways,GAKotsch,0,@USAirways and when will one of these agents b...,2015-02-24 11:51:48 -0800


In [21]:
data_tweets['airline'] = data_tweets["airline"].replace("United","UA")
data_tweets['airline'] = data_tweets["airline"].replace("Delta","DL")
data_tweets['airline'] = data_tweets["airline"].replace("Southwest","WN")
data_tweets['airline'] = data_tweets["airline"].replace("American","AA")
data_tweets['airline'] = data_tweets["airline"].replace("US Airways","US")
data_tweets['airline'] = data_tweets["airline"].replace("Virgin America","VX")
data_tweets

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_created
0,567588278875213824,neutral,1.0000,DL,JetBlueNews,0,@JetBlue's new CEO seeks the right balance to ...,2015-02-16 23:36:05 -0800
1,567590027375702016,negative,1.0000,DL,nesi_1992,0,@JetBlue is REALLY getting on my nerves !! 😡😡 ...,2015-02-16 23:43:02 -0800
2,567591480085463040,negative,1.0000,UA,CPoutloud,0,@united yes. We waited in line for almost an h...,2015-02-16 23:48:48 -0800
3,567592368451248130,negative,1.0000,UA,brenduch,0,@united the we got into the gate at IAH on tim...,2015-02-16 23:52:20 -0800
4,567594449874587648,negative,1.0000,WN,VahidESQ,0,@SouthwestAir its cool that my bags take a bit...,2015-02-17 00:00:36 -0800
...,...,...,...,...,...,...,...,...
14480,570309308937842688,neutral,0.6869,DL,Oneladyyouadore,0,@JetBlue I hope so because I fly very often an...,2015-02-24 11:48:29 -0800
14481,570309340952993796,neutral,1.0000,US,DebbiMcGinnis,0,@USAirways is a DM possible if you aren't foll...,2015-02-24 11:48:37 -0800
14482,570309345281486848,positive,0.6469,DL,jaxbra,0,@JetBlue Yesterday on my way from EWR to FLL j...,2015-02-24 11:48:38 -0800
14483,570310144459972608,negative,1.0000,US,GAKotsch,0,@USAirways and when will one of these agents b...,2015-02-24 11:51:48 -0800


In [22]:
data_tweets_new = data_tweets['tweet_created'].str.split(" ",n=2,expand=True)

In [23]:
data_tweets["tweet_date"] = data_tweets_new[0]
data_tweets

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_created,tweet_date
0,567588278875213824,neutral,1.0000,DL,JetBlueNews,0,@JetBlue's new CEO seeks the right balance to ...,2015-02-16 23:36:05 -0800,2015-02-16
1,567590027375702016,negative,1.0000,DL,nesi_1992,0,@JetBlue is REALLY getting on my nerves !! 😡😡 ...,2015-02-16 23:43:02 -0800,2015-02-16
2,567591480085463040,negative,1.0000,UA,CPoutloud,0,@united yes. We waited in line for almost an h...,2015-02-16 23:48:48 -0800,2015-02-16
3,567592368451248130,negative,1.0000,UA,brenduch,0,@united the we got into the gate at IAH on tim...,2015-02-16 23:52:20 -0800,2015-02-16
4,567594449874587648,negative,1.0000,WN,VahidESQ,0,@SouthwestAir its cool that my bags take a bit...,2015-02-17 00:00:36 -0800,2015-02-17
...,...,...,...,...,...,...,...,...,...
14480,570309308937842688,neutral,0.6869,DL,Oneladyyouadore,0,@JetBlue I hope so because I fly very often an...,2015-02-24 11:48:29 -0800,2015-02-24
14481,570309340952993796,neutral,1.0000,US,DebbiMcGinnis,0,@USAirways is a DM possible if you aren't foll...,2015-02-24 11:48:37 -0800,2015-02-24
14482,570309345281486848,positive,0.6469,DL,jaxbra,0,@JetBlue Yesterday on my way from EWR to FLL j...,2015-02-24 11:48:38 -0800,2015-02-24
14483,570310144459972608,negative,1.0000,US,GAKotsch,0,@USAirways and when will one of these agents b...,2015-02-24 11:51:48 -0800,2015-02-24


In [24]:
data_tweets_time= data_tweets_new[1].str.split(":",n=2, expand=True)
data_tweets["tweet_time"]=data_tweets_time[0]  + ":" + data_tweets_time[1]
data_tweets

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_created,tweet_date,tweet_time
0,567588278875213824,neutral,1.0000,DL,JetBlueNews,0,@JetBlue's new CEO seeks the right balance to ...,2015-02-16 23:36:05 -0800,2015-02-16,23:36
1,567590027375702016,negative,1.0000,DL,nesi_1992,0,@JetBlue is REALLY getting on my nerves !! 😡😡 ...,2015-02-16 23:43:02 -0800,2015-02-16,23:43
2,567591480085463040,negative,1.0000,UA,CPoutloud,0,@united yes. We waited in line for almost an h...,2015-02-16 23:48:48 -0800,2015-02-16,23:48
3,567592368451248130,negative,1.0000,UA,brenduch,0,@united the we got into the gate at IAH on tim...,2015-02-16 23:52:20 -0800,2015-02-16,23:52
4,567594449874587648,negative,1.0000,WN,VahidESQ,0,@SouthwestAir its cool that my bags take a bit...,2015-02-17 00:00:36 -0800,2015-02-17,00:00
...,...,...,...,...,...,...,...,...,...,...
14480,570309308937842688,neutral,0.6869,DL,Oneladyyouadore,0,@JetBlue I hope so because I fly very often an...,2015-02-24 11:48:29 -0800,2015-02-24,11:48
14481,570309340952993796,neutral,1.0000,US,DebbiMcGinnis,0,@USAirways is a DM possible if you aren't foll...,2015-02-24 11:48:37 -0800,2015-02-24,11:48
14482,570309345281486848,positive,0.6469,DL,jaxbra,0,@JetBlue Yesterday on my way from EWR to FLL j...,2015-02-24 11:48:38 -0800,2015-02-24,11:48
14483,570310144459972608,negative,1.0000,US,GAKotsch,0,@USAirways and when will one of these agents b...,2015-02-24 11:51:48 -0800,2015-02-24,11:51


In [25]:
data_tweets= data_tweets.drop(["tweet_created"],axis=1)
data_tweets

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_date,tweet_time
0,567588278875213824,neutral,1.0000,DL,JetBlueNews,0,@JetBlue's new CEO seeks the right balance to ...,2015-02-16,23:36
1,567590027375702016,negative,1.0000,DL,nesi_1992,0,@JetBlue is REALLY getting on my nerves !! 😡😡 ...,2015-02-16,23:43
2,567591480085463040,negative,1.0000,UA,CPoutloud,0,@united yes. We waited in line for almost an h...,2015-02-16,23:48
3,567592368451248130,negative,1.0000,UA,brenduch,0,@united the we got into the gate at IAH on tim...,2015-02-16,23:52
4,567594449874587648,negative,1.0000,WN,VahidESQ,0,@SouthwestAir its cool that my bags take a bit...,2015-02-17,00:00
...,...,...,...,...,...,...,...,...,...
14480,570309308937842688,neutral,0.6869,DL,Oneladyyouadore,0,@JetBlue I hope so because I fly very often an...,2015-02-24,11:48
14481,570309340952993796,neutral,1.0000,US,DebbiMcGinnis,0,@USAirways is a DM possible if you aren't foll...,2015-02-24,11:48
14482,570309345281486848,positive,0.6469,DL,jaxbra,0,@JetBlue Yesterday on my way from EWR to FLL j...,2015-02-24,11:48
14483,570310144459972608,negative,1.0000,US,GAKotsch,0,@USAirways and when will one of these agents b...,2015-02-24,11:51


In [26]:
#creating a groupby for airline codes
data_tweets_grouped = data_tweets.groupby(["airline"])

In [27]:
#creating separate dataframes for different airlines
AA_df = data_tweets_grouped.get_group("AA")

In [28]:
#Creating new dataframes based on type of review 

AA_grouped = AA_df.groupby("airline_sentiment")
AA_positive = AA_grouped.get_group("positive")
AA_positive=AA_positive.reset_index(drop=True)

AA_positive

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_date,tweet_time
0,568551906634797056,positive,0.6242,AA,byunsamuel,0,@AmericanAir Hopefully you ll see bad ones as ...,2015-02-19,15:25
1,569587686496825344,positive,0.3487,AA,KristenReenders,0,@AmericanAir thank you we got on a different f...,2015-02-22,12:01
2,569588473050611712,positive,1.0000,AA,Laurelinesblog,0,@AmericanAir Thanks! He is.,2015-02-22,12:04
3,569588651925098496,positive,1.0000,AA,jlhalldc,0,Thank you. “@AmericanAir: @jlhalldc Customer R...,2015-02-22,12:04
4,569589643487928321,positive,1.0000,AA,DrCaseyJRudkin,0,@AmericanAir Flight 236 was great. Fantastic c...,2015-02-22,12:08
...,...,...,...,...,...,...,...,...,...
302,570299824760860672,positive,0.6666,AA,COVRTER,0,"@AmericanAir Great, thanks. Followed.",2015-02-24,11:10
303,570300355843661824,positive,0.6712,AA,dcathomedad,0,@AmericanAir I might look into that. My wife t...,2015-02-24,11:12
304,570302358242115584,positive,0.7047,AA,JohnMHaaland,0,@AmericanAir thanks,2015-02-24,11:20
305,570305264613765122,positive,1.0000,AA,jamucsb,0,@AmericanAir thank you!,2015-02-24,11:32


In [29]:
#Creating new dataframes based on type of review 

AA_negative = AA_grouped.get_group("negative")
AA_negative=AA_negative.reset_index(drop=True)

AA_negative

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_date,tweet_time
0,568265091226800130,negative,0.9240,AA,beaubertke,0,"@AmericanAir Okay, I think 1565 has waited lon...",2015-02-18,20:25
1,568824537338417154,negative,1.0000,AA,KaiserSnowse,0,@AmericanAir - how long does it take to get cr...,2015-02-20,09:28
2,569047438880841728,negative,1.0000,AA,ohmal,0,@AmericanAir you need to work harder on the di...,2015-02-21,00:14
3,569587188687634433,negative,1.0000,AA,SraJackson,0,"@AmericanAir you have my money, you change my ...",2015-02-22,11:59
4,569587371693355008,negative,1.0000,AA,itsropes,0,@AmericanAir leaving over 20 minutes Late Flig...,2015-02-22,11:59
...,...,...,...,...,...,...,...,...,...
1859,570307390752608257,negative,1.0000,AA,barbararwill,0,"@AmericanAir no thanks. As I said, being deni...",2015-02-24,11:40
1860,570307434113310720,negative,0.6547,AA,LauraMolito,0,"@AmericanAir stranded for 24 hours in MIA, Pat...",2015-02-24,11:41
1861,570307948171423745,negative,0.6846,AA,SweeLoTmac,0,@AmericanAir why would I pay $200 to reactivat...,2015-02-24,11:43
1862,570307949614256128,negative,0.6316,AA,ELLLORRAC,0,@AmericanAir thanks for getting back to me. Bu...,2015-02-24,11:43


In [30]:
#Creating new dataframes based on type of review 

AA_neutral = AA_grouped.get_group("neutral")
AA_neutral=AA_neutral.reset_index(drop=True)

AA_neutral

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_date,tweet_time
0,569587140490866689,neutral,0.6771,AA,daviddtwu,0,@AmericanAir we have 8 ppl so we need 2 know h...,2015-02-22,11:58
1,569587242672398336,neutral,1.0000,AA,sanyabun,0,@AmericanAir Please bring American Airlines to...,2015-02-22,11:59
2,569587813856841728,neutral,0.6760,AA,Chad_SMFYM,0,"“@AmericanAir: @TilleyMonsta George, that does...",2015-02-22,12:01
3,569591730506371072,neutral,1.0000,AA,TrueChief77,0,"@AmericanAir guarantee no retribution? If so, ...",2015-02-22,12:17
4,569592270866878464,neutral,1.0000,AA,WishUpon_26,0,@AmericanAir i need someone to help me out,2015-02-22,12:19
...,...,...,...,...,...,...,...,...,...
428,570305051819941889,neutral,1.0000,AA,Chandrafaythe,0,@AmericanAir my flight got Cancelled Flightled...,2015-02-24,11:31
429,570305365159632899,neutral,0.3474,AA,penyu1818,0,"@AmericanAir DM the locator code, thanks.",2015-02-24,11:32
430,570306423818723328,neutral,0.6767,AA,sammy575,0,@AmericanAir is the new 9:45 time confirmed or...,2015-02-24,11:37
431,570306662575300611,neutral,0.6742,AA,gjeaviation,0,@AmericanAir 767 seconds from touchdown at Mad...,2015-02-24,11:37


In [31]:
#creating separate dataframes for different airlines
UA_df = data_tweets_grouped.get_group("UA")
#Creating new dataframes based on type of review 

UA_grouped = UA_df.groupby("airline_sentiment")
UA_positive = UA_grouped.get_group("positive")
UA_positive = UA_positive.reset_index(drop=True)

UA_positive

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_date,tweet_time
0,567701830805618688,positive,1.0000,UA,scherzva,0,"@united New Apple crâpe, amazing! Live from UA...",2015-02-17,07:07
1,567729696662511616,positive,1.0000,UA,mchooyah,0,@united Thank you for the new Club at O'Hare. ...,2015-02-17,08:58
2,567733177590874114,positive,1.0000,UA,ClaudiaStClair,0,@united I appreciate the follow up.,2015-02-17,09:11
3,567733228690083841,positive,1.0000,UA,jsumiyasu,0,@united @jsumiyasu I am thankful to the Unite...,2015-02-17,09:12
4,567733609130233856,positive,0.7097,UA,BhutanOrient,0,@united no worries about the tweets. We all sh...,2015-02-17,09:13
...,...,...,...,...,...,...,...,...,...
487,570299819610251265,positive,1.0000,UA,BK_TheBri,0,@united Thanks. It is on the same ticket.,2015-02-24,11:10
488,570299889688702976,positive,0.6634,UA,nydia376,0,@united thanks,2015-02-24,11:11
489,570306733010264064,positive,0.3441,UA,rombaa,0,@united thanks -- we filled it out. How's our ...,2015-02-24,11:38
490,570307847281614848,positive,1.0000,UA,CoreyAStewart,0,@united Thanks for taking care of that MR!! Ha...,2015-02-24,11:42


In [32]:
#Creating new dataframes based on type of review 

UA_negative = UA_grouped.get_group("negative")
UA_negative = UA_negative.reset_index(drop=True)

UA_negative

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_date,tweet_time
0,567591480085463040,negative,1.0000,UA,CPoutloud,0,@united yes. We waited in line for almost an h...,2015-02-16,23:48
1,567592368451248130,negative,1.0000,UA,brenduch,0,@united the we got into the gate at IAH on tim...,2015-02-16,23:52
2,567594579310825473,negative,1.0000,UA,brenduch,0,@united and don't hope for me having a nicer f...,2015-02-17,00:01
3,567595670463205376,negative,1.0000,UA,CRomerDome,0,@united I like delays less than you because I'...,2015-02-17,00:05
4,567614049425555457,negative,1.0000,UA,JustOGG,0,"@united, link to current status of flights/air...",2015-02-17,01:18
...,...,...,...,...,...,...,...,...,...
2628,570302023993831425,negative,0.6735,UA,slandail,0,@united Gate agent hooked me up with alternate...,2015-02-24,11:19
2629,570304912468402177,negative,0.6667,UA,andycheco,0,@united you think you boarded flight AU1066 to...,2015-02-24,11:31
2630,570306217001799680,negative,0.3475,UA,samidip,0,@united Your ERI-ORD express connections are h...,2015-02-24,11:36
2631,570307026263384064,negative,1.0000,UA,lsalazarll,0,@united Delayed due to lack of crew and now de...,2015-02-24,11:39


In [33]:
#Creating new dataframes based on type of review 

UA_neutral = UA_grouped.get_group("neutral")
UA_neutral = UA_neutral.reset_index(drop=True)

UA_neutral

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_date,tweet_time
0,567634106058821632,neutral,1.0000,UA,gwaki,0,@united even though technically after I land I...,2015-02-17,02:38
1,567686845903826947,neutral,0.6579,UA,AsianAdidasGirl,0,@united clicked on the link and got this? #con...,2015-02-17,06:07
2,567711860938772480,neutral,1.0000,UA,Jamie_Fisher886,0,@united how much does it cost to check in an a...,2015-02-17,07:47
3,567729246198837248,neutral,0.6702,UA,CWWMUK,0,@united Filled in the form you sent the link t...,2015-02-17,08:56
4,567730143536226304,neutral,0.6919,UA,napsareforkids,0,@united all day travel. #swag #ijustwanttosleep,2015-02-17,08:59
...,...,...,...,...,...,...,...,...,...
692,570298027103162368,neutral,0.6768,UA,nydia376,0,"@united no I don't, but I'm sure United have m...",2015-02-24,11:03
693,570299388670701568,neutral,0.6700,UA,LarkAfterDark,0,@united why not? Is it a law or a policy?,2015-02-24,11:09
694,570301670892146688,neutral,1.0000,UA,karenmcgregor86,0,@united flying gla-mco in a few weeks. How lon...,2015-02-24,11:18
695,570302375510056960,neutral,0.6761,UA,hmansfield,0,"@united I understand, but it's tough when ther...",2015-02-24,11:20


In [34]:
#creating separate dataframes for different airlines
US_df = data_tweets_grouped.get_group("US")
#Creating new dataframes based on type of review 

US_grouped = US_df.groupby("airline_sentiment")
US_positive = US_grouped.get_group("positive")
US_positive = US_positive.reset_index(drop=True)

US_positive

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_date,tweet_time
0,567725387904720896,positive,0.6664,US,Forsyth_Factor,0,"@USAirways thanks for the reply, hoping everyt...",2015-02-17,08:40
1,567727159096385536,positive,0.6729,US,DAngel082,0,@USAirways thanks I hope I get to my destination,2015-02-17,08:47
2,567727479633477632,positive,1.0000,US,northerninsgr,0,@USAirways please give Tara G a pat on the bac...,2015-02-17,08:49
3,567729241312489472,positive,0.6939,US,imayfan,0,@USAirways thanks,2015-02-17,08:56
4,567730323988172800,positive,0.6718,US,ShiningLghtPE,0,@USAirways will do. Hoping for a voucher for a...,2015-02-17,09:00
...,...,...,...,...,...,...,...,...,...
264,570290421336813568,positive,1.0000,US,KenzieC_,0,@USAirways thank you!!,2015-02-24,10:33
265,570293289045364736,positive,1.0000,US,ToxicLayge,0,"@USAirways getting sorted, thanks",2015-02-24,10:44
266,570295658185408512,positive,0.6778,US,KenzieC_,0,@USAirways thank you! I will be calling you! #...,2015-02-24,10:54
267,570302023968694272,positive,0.6652,US,RickAdamek,0,@USAirways Well I did miss it. But gate agents...,2015-02-24,11:19


In [35]:
#Creating new dataframes based on type of review 

US_negative = US_grouped.get_group("negative")
US_negative = US_negative.reset_index(drop=True)

US_negative

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_date,tweet_time
0,567670985403285504,negative,1.0000,US,sevnthstar,0,@USAirways @AmericanAir How r u supposed to ch...,2015-02-17,05:04
1,567679487383699456,negative,1.0000,US,DonnyYardas,0,@USAirways reservations had me on hold for 2 h...,2015-02-17,05:38
2,567698031081160704,negative,1.0000,US,MarkKersten,0,.@USAirways we have no choice but to pay anoth...,2015-02-17,06:52
3,567710245053407232,negative,1.0000,US,CharNewsJunkie,0,@USAirways I have been on hold with your Gold ...,2015-02-17,07:40
4,567712600772050945,negative,0.6716,US,sankeshw,0,@USAirways we are on the 2pm flight FLL to PHL...,2015-02-17,07:50
...,...,...,...,...,...,...,...,...,...
2258,570307109218340865,negative,0.7020,US,jeremyleewhite,0,@USAirways is not the new @AmericanAir is more...,2015-02-24,11:39
2259,570307605631012864,negative,1.0000,US,Matt_Bernanke,0,@USAirways you're killing me from the inside,2015-02-24,11:41
2260,570308799950692353,negative,1.0000,US,retardedlarry,0,@USAirways just hung up on me again. Another ...,2015-02-24,11:46
2261,570310144459972608,negative,1.0000,US,GAKotsch,0,@USAirways and when will one of these agents b...,2015-02-24,11:51


In [36]:
#Creating new dataframes based on type of review 

US_neutral = US_grouped.get_group("neutral")
US_neutral = US_neutral.reset_index(drop=True)

US_neutral

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_date,tweet_time
0,567643252753694721,neutral,1.0000,US,ashenfaced,0,@USAirways how's us 1797 looking today?,2015-02-17,03:14
1,567724943375626240,neutral,0.6383,US,TullamoreEims,0,@USAirways You need to contact me ASAP. #Furious,2015-02-17,08:39
2,567728017196457987,neutral,0.6379,US,laura_crom,0,@USAirways on your website and on your boards ...,2015-02-17,08:51
3,567732372997963777,neutral,1.0000,US,JasonPlizga,0,@USAirways flight #3900 fro ORF to PHL.,2015-02-17,09:08
4,567735186881003520,neutral,1.0000,US,portugrad,0,@USAirways How can I change without penalty an...,2015-02-17,09:19
...,...,...,...,...,...,...,...,...,...
376,570304740279640064,neutral,0.6665,US,eec_x3,0,@USAirways shout out to Cathy at the Vegas air...,2015-02-24,11:30
377,570306867135696897,neutral,1.0000,US,AshleyKAtherton,0,@USAirways agree! Richard P. Literally ripped ...,2015-02-24,11:38
378,570308156699611137,neutral,1.0000,US,AshleyKAtherton,0,@USAirways never received such horrible servic...,2015-02-24,11:43
379,570309000279023616,neutral,1.0000,US,AshleyKAtherton,0,@USAirways Fortunately you have staff like Lyn...,2015-02-24,11:47


In [37]:
#creating separate dataframes for different airlines
WN_df = data_tweets_grouped.get_group("WN")
#Creating new dataframes based on type of review 

WN_grouped = WN_df.groupby("airline_sentiment")
WN_positive = WN_grouped.get_group("positive")
WN_positive = WN_positive.reset_index(drop=True)

WN_positive

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_date,tweet_time
0,567655489119326209,positive,1.0000,WN,rjp1208,0,@SouthwestAir nice work on the update!,2015-02-17,04:03
1,567713338873118722,positive,0.6803,WN,DaxJeter,0,@SouthwestAir thanks do yall expect to be oper...,2015-02-17,07:53
2,567720839408533504,positive,1.0000,WN,RachFee,0,"@SouthwestAir Beautiful, thanks a ton!",2015-02-17,08:22
3,567722937545814018,positive,0.6834,WN,christooma,0,@SouthwestAir finally!,2015-02-17,08:31
4,567723096186945538,positive,1.0000,WN,spstpierre,0,@SouthwestAir + @twitter = outstanding custom...,2015-02-17,08:31
...,...,...,...,...,...,...,...,...,...
565,570293527982301184,positive,0.7004,WN,PiersonStone,0,"@SouthwestAir never mind, I moved my flight to...",2015-02-24,10:45
566,570294002077061120,positive,1.0000,WN,catjubs,0,@SouthwestAir thanks Southwest for saving our...,2015-02-24,10:47
567,570299642887442433,positive,0.6809,WN,tonybrancato,0,@SouthwestAir thx. Make it right. Help Meagan ...,2015-02-24,11:10
568,570302532955844608,positive,1.0000,WN,Shpressyourself,0,@SouthwestAir love them! Always get the best d...,2015-02-24,11:21


In [38]:
#Creating new dataframes based on type of review 

WN_negative = WN_grouped.get_group("negative")
WN_negative = WN_negative.reset_index(drop=True)

WN_negative

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_date,tweet_time
0,567594449874587648,negative,1.0000,WN,VahidESQ,0,@SouthwestAir its cool that my bags take a bit...,2015-02-17,00:00
1,567617081336950784,negative,1.0000,WN,mrshossruns,0,@SouthwestAir you guys there? Are we on hour 2...,2015-02-17,01:30
2,567663504102940672,negative,1.0000,WN,followkashyap,0,@SouthwestAir We have been stuck in SJU for se...,2015-02-17,04:35
3,567676626855419904,negative,1.0000,WN,WorkingWify,0,@SouthwestAir won't answer their phones #Horri...,2015-02-17,05:27
4,567688411289755648,negative,1.0000,WN,kabell87,0,@SouthwestAir flight was Cancelled Flightled a...,2015-02-17,06:13
...,...,...,...,...,...,...,...,...,...
1181,570305078470557697,negative,1.0000,WN,Tim535353,0,@SouthwestAir still no update text #2053 &amp;...,2015-02-24,11:31
1182,570305647759265793,negative,1.0000,WN,cindyjwhitaker,0,@SouthwestAir Very frustrated for the loooooon...,2015-02-24,11:33
1183,570307615189835777,negative,1.0000,WN,cindyjwhitaker,0,@SouthwestAir Hello - been on hold for extreme...,2015-02-24,11:41
1184,570309145276125185,negative,0.6361,WN,tomcblock,0,@SouthwestAir although I'm not happy you Cance...,2015-02-24,11:47


In [39]:
#Creating new dataframes based on type of review 

WN_neutral = WN_grouped.get_group("neutral")
WN_neutral = WN_neutral.reset_index(drop=True)

WN_neutral

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_date,tweet_time
0,567686758708817921,neutral,0.6890,WN,Tinygami,0,@SouthwestAir Is it a temporary site glitch or...,2015-02-17,06:07
1,567688325276770306,neutral,0.6957,WN,DubCook,0,"@SouthwestAir Guys, we've got to do something ...",2015-02-17,06:13
2,567717985092395008,neutral,0.9633,WN,RachFee,0,@southwestair - kind of early but any idea whe...,2015-02-17,08:11
3,567721738155622401,neutral,1.0000,WN,momof43s,0,“@SouthwestAir:Southwest mobile boarding passe...,2015-02-17,08:26
4,567721764285726720,neutral,1.0000,WN,matthewhyah,0,@SouthwestAir Can you link the article where i...,2015-02-17,08:26
...,...,...,...,...,...,...,...,...,...
659,570298362907508738,neutral,1.0000,WN,NardosKing2,0,@SouthwestAir I have been on hold for over 28 ...,2015-02-24,11:05
660,570299331707854848,neutral,0.6681,WN,MarsalaHolla,0,@SouthwestAir can you update me on the emergen...,2015-02-24,11:08
661,570302030742470658,neutral,0.6825,WN,HaleyOstrander,0,@SouthwestAir once again on Glassdoor's Best P...,2015-02-24,11:19
662,570302460235026433,neutral,1.0000,WN,brittanylinnes,0,@SouthwestAir can you follow me so I can send ...,2015-02-24,11:21


In [40]:
#creating separate dataframes for different airlines
DL_df = data_tweets_grouped.get_group("DL")
#Creating new dataframes based on type of review 

DL_grouped = DL_df.groupby("airline_sentiment")
DL_positive = DL_grouped.get_group("positive")
DL_positive = DL_positive.reset_index(drop=True)

DL_positive

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_date,tweet_time
0,567671602280923136,positive,1.0000,DL,twinkletaters,0,@JetBlue Thanks! Her flight leaves at 2 but sh...,2015-02-17,05:07
1,567680108002291712,positive,0.6645,DL,TravellerLukose,0,@JetBlue No worries. Delay was minor and dealt...,2015-02-17,05:40
2,567724178317402112,positive,0.6429,DL,JetBlueNews,0,@JetBlue to offer service from Daytona Beach t...,2015-02-17,08:36
3,567727329783582720,positive,1.0000,DL,JessonaJourney,0,@JetBlue Thank you!,2015-02-17,08:48
4,567727526777073664,positive,1.0000,DL,gigirey1,0,@JetBlue thank you!! Miss you all so so much!!...,2015-02-17,08:49
...,...,...,...,...,...,...,...,...,...
539,570287704728084480,positive,1.0000,DL,MmmRubin,0,@JetBlue flight 117. proud to fly Jet Blue!,2015-02-24,10:22
540,570288803849637888,positive,0.6551,DL,Fendog75,0,@JetBlue thanks great recap. I wouldn't have b...,2015-02-24,10:27
541,570289518194126849,positive,1.0000,DL,MmmRubin,0,@JetBlue great.,2015-02-24,10:29
542,570294460476760064,positive,0.6994,DL,Sujecuevas19,0,@JetBlue thank you for the information.,2015-02-24,10:49


In [41]:
#Creating new dataframes based on type of review 

DL_negative = DL_grouped.get_group("negative")
DL_negative = DL_negative.reset_index(drop=True)

DL_negative

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_date,tweet_time
0,567590027375702016,negative,1.0000,DL,nesi_1992,0,@JetBlue is REALLY getting on my nerves !! 😡😡 ...,2015-02-16,23:43
1,567695310860730369,negative,1.0000,DL,EBlaikie,0,@JetBlue sitting on the plane in JFK waiting t...,2015-02-17,06:41
2,567696188602712064,negative,1.0000,DL,elowthers,0,@JetBlue A flight delay due to pilots overslee...,2015-02-17,06:44
3,567719957686136832,negative,0.7201,DL,Darshan7Patel,0,@JetBlue please provide me your direct email f...,2015-02-17,08:19
4,567724950556258304,negative,1.0000,DL,JessonaJourney,0,@JetBlue Is today's JetBlue Flight 918 (NYC-&g...,2015-02-17,08:39
...,...,...,...,...,...,...,...,...,...
950,570297402281893888,negative,0.6913,DL,chuck_martin,0,@JetBlue ’s Marty St. George really has zero c...,2015-02-24,11:01
951,570299924555956227,negative,1.0000,DL,erinkphares,0,@JetBlue 2 aisles of empty #evermoreroom seats...,2015-02-24,11:11
952,570303683872886784,negative,1.0000,DL,heyheyman,0,"@JetBlue Hey guys, why did my last flight earn...",2015-02-24,11:26
953,570305363859406848,negative,1.0000,DL,Oneladyyouadore,0,@JetBlue everyone is here but our pilots are n...,2015-02-24,11:32


In [42]:
#Creating new dataframes based on type of review 

DL_neutral = DL_grouped.get_group("neutral")
DL_neutral = DL_neutral.reset_index(drop=True)

DL_neutral

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_date,tweet_time
0,567588278875213824,neutral,1.0000,DL,JetBlueNews,0,@JetBlue's new CEO seeks the right balance to ...,2015-02-16,23:36
1,567667301067915264,neutral,1.0000,DL,BritishAirNews,0,"@JetBlue CEO weighs profits, flyers - @Chronic...",2015-02-17,04:50
2,567690417265975296,neutral,0.6739,DL,JulianBOGJB,0,@JetBlue really caring??,2015-02-17,06:21
3,567716378681933825,neutral,1.0000,DL,JetBlueNews,0,"@JetBlue CEO weighs profits, flyers - @Chronic...",2015-02-17,08:05
4,567718292530688000,neutral,1.0000,DL,Karimilrodz,0,@JetBlue I hear that the new thing in your pla...,2015-02-17,08:12
...,...,...,...,...,...,...,...,...,...
718,570296110633381889,neutral,1.0000,DL,johnfmartin67,0,"@JetBlue Submitted, hoping for quick decision,...",2015-02-24,10:56
719,570304873620746240,neutral,0.6939,DL,kbosspotter,0,@JetBlue check DM please :),2015-02-24,11:30
720,570305098557091840,neutral,0.6472,DL,culinarymindz,0,@JetBlue update on Flight 462 would be appreci...,2015-02-24,11:31
721,570308513181904901,neutral,1.0000,DL,Oneladyyouadore,0,"@JetBlue flight 1041 to Savannah, GA",2015-02-24,11:45


In [43]:
#creating separate dataframes for different airlines
VX_df = data_tweets_grouped.get_group("VX")
#Creating new dataframes based on type of review 

VX_grouped = VX_df.groupby("airline_sentiment")
VX_positive = VX_grouped.get_group("positive")
VX_positive = VX_positive.reset_index(drop=True)

VX_positive

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_date,tweet_time
0,567727292739092480,positive,1.0000,VX,BBahreman,4,@VirginAmerica Flying LAX to SFO and after loo...,2015-02-17,08:48
1,567742578561650688,positive,1.0000,VX,ryan_kravontka,0,@VirginAmerica just got on the 1pm in Newark h...,2015-02-17,09:49
2,567753757702647810,positive,1.0000,VX,Perceptions,0,@VirginAmerica really wish you'd fly out of #F...,2015-02-17,10:33
3,567759012806029312,positive,1.0000,VX,kellitweets,0,@VirginAmerica thanks! Y'all have some of the ...,2015-02-17,10:54
4,567762071258152960,positive,1.0000,VX,onerockgypsy,0,@VirginAmerica you guys are perfect as always!...,2015-02-17,11:06
...,...,...,...,...,...,...,...,...,...
147,570289724453216256,positive,1.0000,VX,HyperCamiLax,0,@VirginAmerica I &lt;3 pretty graphics. so muc...,2015-02-24,10:30
148,570295459631263746,positive,1.0000,VX,YupitsTate,0,"@VirginAmerica it was amazing, and arrived an ...",2015-02-24,10:53
149,570299953286942721,positive,0.6559,VX,dhepburn,0,"@virginamerica Well, I didn't…but NOW I DO! :-D",2015-02-24,11:11
150,570300616901320704,positive,0.6745,VX,cjmcginnis,0,"@VirginAmerica yes, nearly every time I fly VX...",2015-02-24,11:13


In [44]:
#Creating new dataframes based on type of review 

VX_negative = VX_grouped.get_group("negative")
VX_negative = VX_negative.reset_index(drop=True)

VX_negative

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_date,tweet_time
0,567744381432516608,negative,1.0000,VX,texasjuls,0,@VirginAmerica my group got their Cancelled Fl...,2015-02-17,09:56
1,567745903474540545,negative,0.3573,VX,jonovoss,0,@VirginAmerica my flight (6000) scheduled for ...,2015-02-17,10:02
2,567748973910163457,negative,0.6778,VX,tan_talize,0,@VirginAmerica mood lighting on point🙌 Reclini...,2015-02-17,10:14
3,567770107062284288,negative,0.6515,VX,paulhting,0,@virginamerica Trying to make the change in ad...,2015-02-17,11:38
4,567772685472915456,negative,0.6579,VX,cayocrazy,0,@VirginAmerica Umm so no reason as to why this...,2015-02-17,11:48
...,...,...,...,...,...,...,...,...,...
176,570276917301137409,negative,1.0000,VX,heatherovieda,0,@VirginAmerica I flew from NYC to SFO last we...,2015-02-24,09:39
177,570282469121007616,negative,0.6842,VX,smartwatermelon,0,@VirginAmerica SFO-PDX schedule is still MIA.,2015-02-24,10:01
178,570300767074181121,negative,1.0000,VX,jnardino,0,@VirginAmerica seriously would pay $30 a fligh...,2015-02-24,11:14
179,570300817074462722,negative,1.0000,VX,jnardino,0,@VirginAmerica and it's a really big bad thing...,2015-02-24,11:14


In [45]:
#Creating new dataframes based on type of review 

VX_neutral = VX_grouped.get_group("neutral")
VX_neutral = VX_neutral.reset_index(drop=True)

VX_neutral

Unnamed: 0,tweet_id,airline_sentiment,airline_sentiment_confidence,airline,name,retweet_count,text,tweet_date,tweet_time
0,567726092518031360,neutral,1.0000,VX,PlayjtLV,0,“@VirginAmerica: Book out of town with fares f...,2015-02-17,08:43
1,567728227465310208,neutral,0.6376,VX,Todd1HHD,0,@VirginAmerica was wondering if you guys recie...,2015-02-17,08:52
2,567735941846937600,neutral,0.6694,VX,waytogopdx,0,@VirginAmerica still waiting to see @Starryey...,2015-02-17,09:22
3,567742148603170816,neutral,1.0000,VX,WWJAYD,0,@VirginAmerica morning. If I have a question r...,2015-02-17,09:47
4,567742937325260801,neutral,1.0000,VX,loungesong,0,@VirginAmerica Are there any sign up bonuses t...,2015-02-17,09:50
...,...,...,...,...,...,...,...,...,...
166,570258822297579520,neutral,1.0000,VX,rjlynch21086,0,@VirginAmerica will you be making BOS&gt;LAS n...,2015-02-24,08:27
167,570294189143031808,neutral,0.6769,VX,idk_but_youtube,0,@VirginAmerica did you know that suicide is th...,2015-02-24,10:48
168,570300248553349120,neutral,0.6340,VX,pilot,0,@VirginAmerica Really missed a prime opportuni...,2015-02-24,11:12
169,570301083672813571,neutral,0.6837,VX,yvonnalynn,0,@VirginAmerica I didn't today... Must mean I n...,2015-02-24,11:15


## Data Loading (Matt)

![Database_ERD.PNG](attachment:Database_ERD.PNG)

In [46]:
#Import Database Key
from db_keys import db_key

#Connect to PostgreSQL Database
pg_engine = create_engine('postgresql://' + db_key + '@localhost:5432/flights_tweets_db1')

In [47]:
#Create Airlines Table
pg_engine.execute('CREATE TABLE "Airlines" ("IATA_CODE" VARCHAR(2) PRIMARY KEY, "AIRLINE" VARCHAR(50) NOT NULL);')

<sqlalchemy.engine.result.ResultProxy at 0x1ecfd3360b8>

In [48]:
#Add Airlines Data Frame to Airlines Table
airline_df.to_sql("Airlines", pg_engine, if_exists = 'append', index = False)

In [49]:
#Create Airports Table
pg_engine.execute('CREATE TABLE "Airports" ("IATA_CODE" VARCHAR(3) PRIMARY KEY, "AIRPORT" VARCHAR(100) NOT NULL, "CITY" VARCHAR(50) NOT NULL, "STATE" VARCHAR(2) NOT NULL, "COUNTRY" VARCHAR(3) NOT NULL);')

<sqlalchemy.engine.result.ResultProxy at 0x1ecfc82a470>

In [50]:
#Add Airports Data Frame to Airports Table
airports_df.to_sql("Airports", pg_engine, if_exists = 'append', index = False)

In [51]:
#Create UA Early Table
pg_engine.execute('CREATE TABLE "UA_Early" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "DEPARTURE_TIME" VARCHAR(8), "DEPARTURE_DELAY" VARCHAR(8), "TAXI_OUT" VARCHAR(8), "WHEELS_OFF" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "ELAPSED_TIME" VARCHAR(8), "AIR_TIME" VARCHAR(8), "DISTANCE" INT, "WHEELS_ON" VARCHAR(8), "TAXI_IN" VARCHAR(8), "SCHEDULED_ARRIVAL" VARCHAR(8), "ARRIVAL_TIME" VARCHAR(8), "ARRIVAL_DELAY" VARCHAR(8), "TOTAL_DELAY" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

<sqlalchemy.engine.result.ResultProxy at 0x1ecfd336e10>

In [52]:
#Add UA Early Data Frame to UA Early Table
UA_Early.to_sql("UA_Early", pg_engine, if_exists = 'append', index = False)

In [53]:
#Create AA Early Table
pg_engine.execute('CREATE TABLE "AA_Early" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "DEPARTURE_TIME" VARCHAR(8), "DEPARTURE_DELAY" VARCHAR(8), "TAXI_OUT" VARCHAR(8), "WHEELS_OFF" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "ELAPSED_TIME" VARCHAR(8), "AIR_TIME" VARCHAR(8), "DISTANCE" INT, "WHEELS_ON" VARCHAR(8), "TAXI_IN" VARCHAR(8), "SCHEDULED_ARRIVAL" VARCHAR(8), "ARRIVAL_TIME" VARCHAR(8), "ARRIVAL_DELAY" VARCHAR(8), "TOTAL_DELAY" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

<sqlalchemy.engine.result.ResultProxy at 0x1ecfd3477f0>

In [54]:
#Add AA Early Data Frame to AA Early Table
AA_Early.to_sql("AA_Early", pg_engine, if_exists = 'append', index = False)

In [55]:
#Create DL Early Table
pg_engine.execute('CREATE TABLE "DL_Early" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "DEPARTURE_TIME" VARCHAR(8), "DEPARTURE_DELAY" VARCHAR(8), "TAXI_OUT" VARCHAR(8), "WHEELS_OFF" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "ELAPSED_TIME" VARCHAR(8), "AIR_TIME" VARCHAR(8), "DISTANCE" INT, "WHEELS_ON" VARCHAR(8), "TAXI_IN" VARCHAR(8), "SCHEDULED_ARRIVAL" VARCHAR(8), "ARRIVAL_TIME" VARCHAR(8), "ARRIVAL_DELAY" VARCHAR(8), "TOTAL_DELAY" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

<sqlalchemy.engine.result.ResultProxy at 0x1ecfd36d390>

In [56]:
#Add DL Early Data Frame to DL Early Table
DL_Early.to_sql("DL_Early", pg_engine, if_exists = 'append', index = False)

In [57]:
#Create US Early Table
pg_engine.execute('CREATE TABLE "US_Early" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "DEPARTURE_TIME" VARCHAR(8), "DEPARTURE_DELAY" VARCHAR(8), "TAXI_OUT" VARCHAR(8), "WHEELS_OFF" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "ELAPSED_TIME" VARCHAR(8), "AIR_TIME" VARCHAR(8), "DISTANCE" INT, "WHEELS_ON" VARCHAR(8), "TAXI_IN" VARCHAR(8), "SCHEDULED_ARRIVAL" VARCHAR(8), "ARRIVAL_TIME" VARCHAR(8), "ARRIVAL_DELAY" VARCHAR(8), "TOTAL_DELAY" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

<sqlalchemy.engine.result.ResultProxy at 0x1ecfd35c3c8>

In [58]:
#Add US Early Data Frame to US Early Table
US_Early.to_sql("US_Early", pg_engine, if_exists = 'append', index = False)

In [59]:
#Create WN Early Table
pg_engine.execute('CREATE TABLE "WN_Early" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "DEPARTURE_TIME" VARCHAR(8), "DEPARTURE_DELAY" VARCHAR(8), "TAXI_OUT" VARCHAR(8), "WHEELS_OFF" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "ELAPSED_TIME" VARCHAR(8), "AIR_TIME" VARCHAR(8), "DISTANCE" INT, "WHEELS_ON" VARCHAR(8), "TAXI_IN" VARCHAR(8), "SCHEDULED_ARRIVAL" VARCHAR(8), "ARRIVAL_TIME" VARCHAR(8), "ARRIVAL_DELAY" VARCHAR(8), "TOTAL_DELAY" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

<sqlalchemy.engine.result.ResultProxy at 0x1ecfd35c748>

In [60]:
#Add WN Early Data Frame to WN Early Table
WN_Early.to_sql("WN_Early", pg_engine, if_exists = 'append', index = False)

In [61]:
#Create VX Early Table
pg_engine.execute('CREATE TABLE "VX_Early" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "DEPARTURE_TIME" VARCHAR(8), "DEPARTURE_DELAY" VARCHAR(8), "TAXI_OUT" VARCHAR(8), "WHEELS_OFF" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "ELAPSED_TIME" VARCHAR(8), "AIR_TIME" VARCHAR(8), "DISTANCE" INT, "WHEELS_ON" VARCHAR(8), "TAXI_IN" VARCHAR(8), "SCHEDULED_ARRIVAL" VARCHAR(8), "ARRIVAL_TIME" VARCHAR(8), "ARRIVAL_DELAY" VARCHAR(8), "TOTAL_DELAY" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

<sqlalchemy.engine.result.ResultProxy at 0x1ecfd36d048>

In [62]:
#Add VX Early Data Frame to VX Early Table
VX_Early.to_sql("VX_Early", pg_engine, if_exists = 'append', index = False)

In [63]:
#Create UA Late Table
pg_engine.execute('CREATE TABLE "UA_Late" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "DEPARTURE_TIME" VARCHAR(8), "DEPARTURE_DELAY" VARCHAR(8), "TAXI_OUT" VARCHAR(8), "WHEELS_OFF" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "ELAPSED_TIME" VARCHAR(8), "AIR_TIME" VARCHAR(8), "DISTANCE" INT, "WHEELS_ON" VARCHAR(8), "TAXI_IN" VARCHAR(8), "SCHEDULED_ARRIVAL" VARCHAR(8), "ARRIVAL_TIME" VARCHAR(8), "ARRIVAL_DELAY" VARCHAR(8), "TOTAL_DELAY" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

<sqlalchemy.engine.result.ResultProxy at 0x1ecfd34e390>

In [64]:
#Add UA Late Data Frame to UA Late Table
UA_Late.to_sql("UA_Late", pg_engine, if_exists = 'append', index = False)

DataError: (psycopg2.errors.StringDataRightTruncation) value too long for type character varying(5)

[SQL: INSERT INTO "UA_Late" ("DAY_OF_WEEK", "AIRLINE", "FLIGHT_NUMBER", "TAIL_NUMBER", "ORIGIN_AIRPORT", "DESTINATION_AIRPORT", "SCHEDULED_DEPARTURE", "DEPARTURE_TIME", "DEPARTURE_DELAY", "TAXI_OUT", "WHEELS_OFF", "SCHEDULED_TIME", "ELAPSED_TIME", "AIR_TIME", "DISTANCE", "WHEELS_ON", "TAXI_IN", "SCHEDULED_ARRIVAL", "ARRIVAL_TIME", "ARRIVAL_DELAY", "TOTAL_DELAY", "DATE") VALUES (%(DAY_OF_WEEK)s, %(AIRLINE)s, %(FLIGHT_NUMBER)s, %(TAIL_NUMBER)s, %(ORIGIN_AIRPORT)s, %(DESTINATION_AIRPORT)s, %(SCHEDULED_DEPARTURE)s, %(DEPARTURE_TIME)s, %(DEPARTURE_DELAY)s, %(TAXI_OUT)s, %(WHEELS_OFF)s, %(SCHEDULED_TIME)s, %(ELAPSED_TIME)s, %(AIR_TIME)s, %(DISTANCE)s, %(WHEELS_ON)s, %(TAXI_IN)s, %(SCHEDULED_ARRIVAL)s, %(ARRIVAL_TIME)s, %(ARRIVAL_DELAY)s, %(TOTAL_DELAY)s, %(DATE)s)]
[parameters: ({'DAY_OF_WEEK': 'Monday', 'AIRLINE': 'UA', 'FLIGHT_NUMBER': 1512, 'TAIL_NUMBER': 'N33292', 'ORIGIN_AIRPORT': 'LAS', 'DESTINATION_AIRPORT': 'IAH', 'SCHEDULED_DEPARTURE': '00:38', 'DEPARTURE_TIME': '00:42', 'DEPARTURE_DELAY': '0:04', 'TAXI_OUT': '0:17', 'WHEELS_OFF': '00:59', 'SCHEDULED_TIME': '2:45', 'ELAPSED_TIME': '2:40', 'AIR_TIME': '2:18', 'DISTANCE': 1222, 'WHEELS_ON': '05:17', 'TAXI_IN': '0:05', 'SCHEDULED_ARRIVAL': '05:23', 'ARRIVAL_TIME': '05:22', 'ARRIVAL_DELAY': '-0:01', 'TOTAL_DELAY': '0:03', 'DATE': '2015-02-16'}, {'DAY_OF_WEEK': 'Monday', 'AIRLINE': 'UA', 'FLIGHT_NUMBER': 1204, 'TAIL_NUMBER': 'N37273', 'ORIGIN_AIRPORT': 'SJU', 'DESTINATION_AIRPORT': 'EWR', 'SCHEDULED_DEPARTURE': '01:48', 'DEPARTURE_TIME': '03:11', 'DEPARTURE_DELAY': '1:23', 'TAXI_OUT': '0:13', 'WHEELS_OFF': '03:24', 'SCHEDULED_TIME': '4:15', 'ELAPSED_TIME': '4:16', 'AIR_TIME': '3:51', 'DISTANCE': 1608, 'WHEELS_ON': '06:15', 'TAXI_IN': '0:12', 'SCHEDULED_ARRIVAL': '05:03', 'ARRIVAL_TIME': '06:27', 'ARRIVAL_DELAY': '1:24', 'TOTAL_DELAY': '2:47', 'DATE': '2015-02-16'}, {'DAY_OF_WEEK': 'Monday', 'AIRLINE': 'UA', 'FLIGHT_NUMBER': 1447, 'TAIL_NUMBER': 'N27733', 'ORIGIN_AIRPORT': 'PHL', 'DESTINATION_AIRPORT': 'IAH', 'SCHEDULED_DEPARTURE': '05:15', 'DEPARTURE_TIME': '05:19', 'DEPARTURE_DELAY': '0:04', 'TAXI_OUT': '0:09', 'WHEELS_OFF': '05:28', 'SCHEDULED_TIME': '3:53', 'ELAPSED_TIME': '3:47', 'AIR_TIME': '3:31', 'DISTANCE': 1325, 'WHEELS_ON': '07:59', 'TAXI_IN': '0:07', 'SCHEDULED_ARRIVAL': '08:08', 'ARRIVAL_TIME': '08:06', 'ARRIVAL_DELAY': '-0:02', 'TOTAL_DELAY': '0:02', 'DATE': '2015-02-16'}, {'DAY_OF_WEEK': 'Monday', 'AIRLINE': 'UA', 'FLIGHT_NUMBER': 1545, 'TAIL_NUMBER': 'N17233', 'ORIGIN_AIRPORT': 'DFW', 'DESTINATION_AIRPORT': 'IAH', 'SCHEDULED_DEPARTURE': '05:30', 'DEPARTURE_TIME': '05:36', 'DEPARTURE_DELAY': '0:06', 'TAXI_OUT': '0:13', 'WHEELS_OFF': '05:49', 'SCHEDULED_TIME': '1:10', 'ELAPSED_TIME': '1:04', 'AIR_TIME': '0:47', 'DISTANCE': 224, 'WHEELS_ON': '06:36', 'TAXI_IN': '0:04', 'SCHEDULED_ARRIVAL': '06:40', 'ARRIVAL_TIME': '06:40', 'ARRIVAL_DELAY': '0:00', 'TOTAL_DELAY': '0:06', 'DATE': '2015-02-16'}, {'DAY_OF_WEEK': 'Monday', 'AIRLINE': 'UA', 'FLIGHT_NUMBER': 572, 'TAIL_NUMBER': 'N842UA', 'ORIGIN_AIRPORT': 'AUS', 'DESTINATION_AIRPORT': 'IAH', 'SCHEDULED_DEPARTURE': '05:30', 'DEPARTURE_TIME': '05:34', 'DEPARTURE_DELAY': '0:04', 'TAXI_OUT': '0:11', 'WHEELS_OFF': '05:45', 'SCHEDULED_TIME': '0:56', 'ELAPSED_TIME': '0:51', 'AIR_TIME': '0:34', 'DISTANCE': 140, 'WHEELS_ON': '06:19', 'TAXI_IN': '0:06', 'SCHEDULED_ARRIVAL': '06:26', 'ARRIVAL_TIME': '06:25', 'ARRIVAL_DELAY': '-0:01', 'TOTAL_DELAY': '0:03', 'DATE': '2015-02-16'}, {'DAY_OF_WEEK': 'Monday', 'AIRLINE': 'UA', 'FLIGHT_NUMBER': 1215, 'TAIL_NUMBER': 'N69830', 'ORIGIN_AIRPORT': 'PDX', 'DESTINATION_AIRPORT': 'DEN', 'SCHEDULED_DEPARTURE': '05:35', 'DEPARTURE_TIME': '05:45', 'DEPARTURE_DELAY': '0:10', 'TAXI_OUT': '0:07', 'WHEELS_OFF': '05:52', 'SCHEDULED_TIME': '2:35', 'ELAPSED_TIME': '2:30', 'AIR_TIME': '2:11', 'DISTANCE': 991, 'WHEELS_ON': '09:03', 'TAXI_IN': '0:12', 'SCHEDULED_ARRIVAL': '09:10', 'ARRIVAL_TIME': '09:15', 'ARRIVAL_DELAY': '0:05', 'TOTAL_DELAY': '0:15', 'DATE': '2015-02-16'}, {'DAY_OF_WEEK': 'Monday', 'AIRLINE': 'UA', 'FLIGHT_NUMBER': 1203, 'TAIL_NUMBER': 'N33209', 'ORIGIN_AIRPORT': 'BOS', 'DESTINATION_AIRPORT': 'IAD', 'SCHEDULED_DEPARTURE': '05:40', 'DEPARTURE_TIME': '06:00', 'DEPARTURE_DELAY': '0:20', 'TAXI_OUT': '0:14', 'WHEELS_OFF': '06:14', 'SCHEDULED_TIME': '1:46', 'ELAPSED_TIME': '1:39', 'AIR_TIME': '1:20', 'DISTANCE': 413, 'WHEELS_ON': '07:34', 'TAXI_IN': '0:05', 'SCHEDULED_ARRIVAL': '07:26', 'ARRIVAL_TIME': '07:39', 'ARRIVAL_DELAY': '0:13', 'TOTAL_DELAY': '0:33', 'DATE': '2015-02-16'}, {'DAY_OF_WEEK': 'Monday', 'AIRLINE': 'UA', 'FLIGHT_NUMBER': 1543, 'TAIL_NUMBER': 'N54241', 'ORIGIN_AIRPORT': 'SMF', 'DESTINATION_AIRPORT': 'DEN', 'SCHEDULED_DEPARTURE': '05:41', 'DEPARTURE_TIME': '05:51', 'DEPARTURE_DELAY': '0:10', 'TAXI_OUT': '0:11', 'WHEELS_OFF': '06:02', 'SCHEDULED_TIME': '2:22', 'ELAPSED_TIME': '2:31', 'AIR_TIME': '2:07', 'DISTANCE': 909, 'WHEELS_ON': '09:09', 'TAXI_IN': '0:13', 'SCHEDULED_ARRIVAL': '09:03', 'ARRIVAL_TIME': '09:22', 'ARRIVAL_DELAY': '0:19', 'TOTAL_DELAY': '0:29', 'DATE': '2015-02-16'}  ... displaying 10 of 6330 total bound parameter sets ...  {'DAY_OF_WEEK': 'Tuesday', 'AIRLINE': 'UA', 'FLIGHT_NUMBER': 383, 'TAIL_NUMBER': 'N212UA', 'ORIGIN_AIRPORT': 'HNL', 'DESTINATION_AIRPORT': 'DEN', 'SCHEDULED_DEPARTURE': '22:55', 'DEPARTURE_TIME': '23:00', 'DEPARTURE_DELAY': '0:05', 'TAXI_OUT': '0:20', 'WHEELS_OFF': '23:20', 'SCHEDULED_TIME': '6:35', 'ELAPSED_TIME': '6:36', 'AIR_TIME': '6:10', 'DISTANCE': 3365, 'WHEELS_ON': '08:30', 'TAXI_IN': '0:06', 'SCHEDULED_ARRIVAL': '08:30', 'ARRIVAL_TIME': '08:36', 'ARRIVAL_DELAY': '0:06', 'TOTAL_DELAY': '0:11', 'DATE': '2015-02-24'}, {'DAY_OF_WEEK': 'Tuesday', 'AIRLINE': 'UA', 'FLIGHT_NUMBER': 1720, 'TAIL_NUMBER': 'N76519', 'ORIGIN_AIRPORT': 'PHX', 'DESTINATION_AIRPORT': 'EWR', 'SCHEDULED_DEPARTURE': '23:50', 'DEPARTURE_TIME': '00:12', 'DEPARTURE_DELAY': '0:22', 'TAXI_OUT': '0:16', 'WHEELS_OFF': '00:28', 'SCHEDULED_TIME': '4:39', 'ELAPSED_TIME': '4:19', 'AIR_TIME': '3:51', 'DISTANCE': 2133, 'WHEELS_ON': '06:19', 'TAXI_IN': '0:12', 'SCHEDULED_ARRIVAL': '06:29', 'ARRIVAL_TIME': '06:31', 'ARRIVAL_DELAY': '0:02', 'TOTAL_DELAY': '0:24', 'DATE': '2015-02-24'})]
(Background on this error at: http://sqlalche.me/e/9h9h)

In [None]:
#Create AA Late Table
pg_engine.execute('CREATE TABLE "AA_Late" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "DEPARTURE_TIME" VARCHAR(8), "DEPARTURE_DELAY" VARCHAR(8), "TAXI_OUT" VARCHAR(8), "WHEELS_OFF" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "ELAPSED_TIME" VARCHAR(8), "AIR_TIME" VARCHAR(8), "DISTANCE" INT, "WHEELS_ON" VARCHAR(8), "TAXI_IN" VARCHAR(8), "SCHEDULED_ARRIVAL" VARCHAR(8), "ARRIVAL_TIME" VARCHAR(8), "ARRIVAL_DELAY" VARCHAR(8), "TOTAL_DELAY" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add AA Late Data Frame to AA Late Table
AA_Late.to_sql("AA_Late", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create DL Late Table
pg_engine.execute('CREATE TABLE "DL_Late" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "DEPARTURE_TIME" VARCHAR(8), "DEPARTURE_DELAY" VARCHAR(8), "TAXI_OUT" VARCHAR(8), "WHEELS_OFF" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "ELAPSED_TIME" VARCHAR(8), "AIR_TIME" VARCHAR(8), "DISTANCE" INT, "WHEELS_ON" VARCHAR(8), "TAXI_IN" VARCHAR(8), "SCHEDULED_ARRIVAL" VARCHAR(8), "ARRIVAL_TIME" VARCHAR(8), "ARRIVAL_DELAY" VARCHAR(8), "TOTAL_DELAY" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add DL Late Data Frame to DL Late Table
DL_Late.to_sql("DL_Late", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create US Late Table
pg_engine.execute('CREATE TABLE "US_Late" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "DEPARTURE_TIME" VARCHAR(8), "DEPARTURE_DELAY" VARCHAR(8), "TAXI_OUT" VARCHAR(8), "WHEELS_OFF" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "ELAPSED_TIME" VARCHAR(8), "AIR_TIME" VARCHAR(8), "DISTANCE" INT, "WHEELS_ON" VARCHAR(8), "TAXI_IN" VARCHAR(8), "SCHEDULED_ARRIVAL" VARCHAR(8), "ARRIVAL_TIME" VARCHAR(8), "ARRIVAL_DELAY" VARCHAR(8), "TOTAL_DELAY" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add US Late Data Frame to US Late Table
US_Late.to_sql("US_Late", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create WN Late Table
pg_engine.execute('CREATE TABLE "WN_Late" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "DEPARTURE_TIME" VARCHAR(8), "DEPARTURE_DELAY" VARCHAR(8), "TAXI_OUT" VARCHAR(8), "WHEELS_OFF" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "ELAPSED_TIME" VARCHAR(8), "AIR_TIME" VARCHAR(8), "DISTANCE" INT, "WHEELS_ON" VARCHAR(8), "TAXI_IN" VARCHAR(8), "SCHEDULED_ARRIVAL" VARCHAR(8), "ARRIVAL_TIME" VARCHAR(8), "ARRIVAL_DELAY" VARCHAR(8), "TOTAL_DELAY" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add WN Late Data Frame to WN Late Table
WN_Late.to_sql("WN_Late", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create VX Late Table
pg_engine.execute('CREATE TABLE "VX_Late" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "DEPARTURE_TIME" VARCHAR(8), "DEPARTURE_DELAY" VARCHAR(8), "TAXI_OUT" VARCHAR(8), "WHEELS_OFF" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "ELAPSED_TIME" VARCHAR(8), "AIR_TIME" VARCHAR(8), "DISTANCE" INT, "WHEELS_ON" VARCHAR(8), "TAXI_IN" VARCHAR(8), "SCHEDULED_ARRIVAL" VARCHAR(8), "ARRIVAL_TIME" VARCHAR(8), "ARRIVAL_DELAY" VARCHAR(8), "TOTAL_DELAY" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add VX Late Data Frame to VX Late Table
VX_Late.to_sql("VX_Late", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create UA Diverted Table
pg_engine.execute('CREATE TABLE "UA_Diverted" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "DEPARTURE_TIME" VARCHAR(8), "DEPARTURE_DELAY" VARCHAR(8), "TAXI_OUT" VARCHAR(8), "WHEELS_OFF" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "DISTANCE" INT, "WHEELS_ON" VARCHAR(8), "TAXI_IN" VARCHAR(8), "SCHEDULED_ARRIVAL" VARCHAR(8), "ARRIVAL_TIME" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add UA Diverted Data Frame to UA Diverted Table
UA_Diverted.to_sql("UA_Diverted", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create AA Diverted Table
pg_engine.execute('CREATE TABLE "AA_Diverted" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "DEPARTURE_TIME" VARCHAR(8), "DEPARTURE_DELAY" VARCHAR(8), "TAXI_OUT" VARCHAR(8), "WHEELS_OFF" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "DISTANCE" INT, "WHEELS_ON" VARCHAR(8), "TAXI_IN" VARCHAR(8), "SCHEDULED_ARRIVAL" VARCHAR(8), "ARRIVAL_TIME" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add AA Diverted Data Frame to AA Diverted Table
AA_Diverted.to_sql("AA_Diverted", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create DL Diverted Table
pg_engine.execute('CREATE TABLE "DL_Diverted" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "DEPARTURE_TIME" VARCHAR(8), "DEPARTURE_DELAY" VARCHAR(8), "TAXI_OUT" VARCHAR(8), "WHEELS_OFF" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "DISTANCE" INT, "WHEELS_ON" VARCHAR(8), "TAXI_IN" VARCHAR(8), "SCHEDULED_ARRIVAL" VARCHAR(8), "ARRIVAL_TIME" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add DL Diverted Data Frame to DL Diverted Table
DL_Diverted.to_sql("DL_Diverted", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create US Diverted Table
pg_engine.execute('CREATE TABLE "US_Diverted" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "DEPARTURE_TIME" VARCHAR(8), "DEPARTURE_DELAY" VARCHAR(8), "TAXI_OUT" VARCHAR(8), "WHEELS_OFF" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "DISTANCE" INT, "WHEELS_ON" VARCHAR(8), "TAXI_IN" VARCHAR(8), "SCHEDULED_ARRIVAL" VARCHAR(8), "ARRIVAL_TIME" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add US Diverted Data Frame to US Diverted Table
US_Diverted.to_sql("US_Diverted", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create WN Diverted Table
pg_engine.execute('CREATE TABLE "WN_Diverted" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "DEPARTURE_TIME" VARCHAR(8), "DEPARTURE_DELAY" VARCHAR(8), "TAXI_OUT" VARCHAR(8), "WHEELS_OFF" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "DISTANCE" INT, "WHEELS_ON" VARCHAR(8), "TAXI_IN" VARCHAR(8), "SCHEDULED_ARRIVAL" VARCHAR(8), "ARRIVAL_TIME" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add WN Diverted Data Frame to WN Diverted Table
WN_Diverted.to_sql("WN_Diverted", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create VX Diverted Table
pg_engine.execute('CREATE TABLE "VX_Diverted" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "DEPARTURE_TIME" VARCHAR(8), "DEPARTURE_DELAY" VARCHAR(8), "TAXI_OUT" VARCHAR(8), "WHEELS_OFF" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "DISTANCE" INT, "WHEELS_ON" VARCHAR(8), "TAXI_IN" VARCHAR(8), "SCHEDULED_ARRIVAL" VARCHAR(8), "ARRIVAL_TIME" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add VX Diverted Data Frame to VX Diverted Table
VX_Diverted.to_sql("VX_Diverted", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create UA Cancelled Table
pg_engine.execute('CREATE TABLE "UA_Cancelled" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "DISTANCE" INT, "SCHEDULED_ARRIVAL" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add UA Cancelled Data Frame to UA Cancelled Table
UA_Cancelled.to_sql("UA_Cancelled", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create AA Cancelled Table
pg_engine.execute('CREATE TABLE "AA_Cancelled" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "DISTANCE" INT, "SCHEDULED_ARRIVAL" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add AA Cancelled Data Frame to AA Cancelled Table
AA_Cancelled.to_sql("AA_Cancelled", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create DL Cancelled Table
pg_engine.execute('CREATE TABLE "DL_Cancelled" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "DISTANCE" INT, "SCHEDULED_ARRIVAL" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add DL Cancelled Data Frame to DL Cancelled Table
DL_Cancelled.to_sql("DL_Cancelled", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create US Cancelled Table
pg_engine.execute('CREATE TABLE "US_Cancelled" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "DISTANCE" INT, "SCHEDULED_ARRIVAL" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add US Cancelled Data Frame to US Cancelled Table
US_Cancelled.to_sql("US_Cancelled", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create WN Cancelled Table
pg_engine.execute('CREATE TABLE "WN_Cancelled" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "DISTANCE" INT, "SCHEDULED_ARRIVAL" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add WN Cancelled Data Frame to WN Cancelled Table
WN_Cancelled.to_sql("WN_Cancelled", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create VX Cancelled Table
pg_engine.execute('CREATE TABLE "VX_Cancelled" ("DAY_OF_WEEK" VARCHAR(10), "AIRLINE" VARCHAR(2), "FLIGHT_NUMBER" INT, "TAIL_NUMBER" VARCHAR(6), "ORIGIN_AIRPORT" VARCHAR(3), "DESTINATION_AIRPORT" VARCHAR(3), "SCHEDULED_DEPARTURE" VARCHAR(8), "SCHEDULED_TIME" VARCHAR(8), "DISTANCE" INT, "SCHEDULED_ARRIVAL" VARCHAR(8), "DATE" VARCHAR(10), FOREIGN KEY ("ORIGIN_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("DESTINATION_AIRPORT") REFERENCES "Airports" ("IATA_CODE"), FOREIGN KEY ("AIRLINE") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add VX Cancelled Data Frame to VX Cancelled Table
VX_Cancelled.to_sql("VX_Cancelled", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create UA Positive Table
pg_engine.execute('CREATE TABLE "UA_Positive" ("tweet_id" VARCHAR(100) PRIMARY KEY, "airline_sentiment" VARCHAR(10), "airline_sentiment_confidence" FLOAT, "airline" VARCHAR(2), "name" VARCHAR(50), "retweet_count" INT, "text" VARCHAR(300), "tweet_date" VARCHAR(10), "tweet_time" VARCHAR(5), FOREIGN KEY ("airline") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add UA Positive Data Frame to UA Positive Table
UA_positive.to_sql("UA_Positive", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create AA Positive Table
pg_engine.execute('CREATE TABLE "AA_Positive" ("tweet_id" VARCHAR(100) PRIMARY KEY, "airline_sentiment" VARCHAR(10), "airline_sentiment_confidence" FLOAT, "airline" VARCHAR(2), "name" VARCHAR(50), "retweet_count" INT, "text" VARCHAR(300), "tweet_date" VARCHAR(10), "tweet_time" VARCHAR(5), FOREIGN KEY ("airline") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add AA Positive Data Frame to AA Positive Table
AA_positive.to_sql("AA_Positive", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create DL Positive Table
pg_engine.execute('CREATE TABLE "DL_Positive" ("tweet_id" VARCHAR(100) PRIMARY KEY, "airline_sentiment" VARCHAR(10), "airline_sentiment_confidence" FLOAT, "airline" VARCHAR(2), "name" VARCHAR(50), "retweet_count" INT, "text" VARCHAR(300), "tweet_date" VARCHAR(10), "tweet_time" VARCHAR(5), FOREIGN KEY ("airline") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add DL Positive Data Frame to DL Positive Table
DL_positive.to_sql("DL_Positive", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create US Positive Table
pg_engine.execute('CREATE TABLE "US_Positive" ("tweet_id" VARCHAR(100) PRIMARY KEY, "airline_sentiment" VARCHAR(10), "airline_sentiment_confidence" FLOAT, "airline" VARCHAR(2), "name" VARCHAR(50), "retweet_count" INT, "text" VARCHAR(300), "tweet_date" VARCHAR(10), "tweet_time" VARCHAR(5), FOREIGN KEY ("airline") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add US Positive Data Frame to US Positive Table
US_positive.to_sql("US_Positive", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create WN Positive Table
pg_engine.execute('CREATE TABLE "WN_Positive" ("tweet_id" VARCHAR(100) PRIMARY KEY, "airline_sentiment" VARCHAR(10), "airline_sentiment_confidence" FLOAT, "airline" VARCHAR(2), "name" VARCHAR(50), "retweet_count" INT, "text" VARCHAR(300), "tweet_date" VARCHAR(10), "tweet_time" VARCHAR(5), FOREIGN KEY ("airline") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add WN Positive Data Frame to WN Positive Table
WN_positive.to_sql("WN_Positive", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create VX Positive Table
pg_engine.execute('CREATE TABLE "VX_Positive" ("tweet_id" VARCHAR(100) PRIMARY KEY, "airline_sentiment" VARCHAR(10), "airline_sentiment_confidence" FLOAT, "airline" VARCHAR(2), "name" VARCHAR(50), "retweet_count" INT, "text" VARCHAR(300), "tweet_date" VARCHAR(10), "tweet_time" VARCHAR(5), FOREIGN KEY ("airline") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add VX Positive Data Frame to VX Positive Table
VX_positive.to_sql("VX_Positive", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create UA Neutral Table
pg_engine.execute('CREATE TABLE "UA_Neutral" ("tweet_id" VARCHAR(100) PRIMARY KEY, "airline_sentiment" VARCHAR(10), "airline_sentiment_confidence" FLOAT, "airline" VARCHAR(2), "name" VARCHAR(50), "retweet_count" INT, "text" VARCHAR(300), "tweet_date" VARCHAR(10), "tweet_time" VARCHAR(5), FOREIGN KEY ("airline") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add UA Neutral Data Frame to UA Neutral Table
UA_neutral.to_sql("UA_Neutral", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create AA Neutral Table
pg_engine.execute('CREATE TABLE "AA_Neutral" ("tweet_id" VARCHAR(100) PRIMARY KEY, "airline_sentiment" VARCHAR(10), "airline_sentiment_confidence" FLOAT, "airline" VARCHAR(2), "name" VARCHAR(50), "retweet_count" INT, "text" VARCHAR(300), "tweet_date" VARCHAR(10), "tweet_time" VARCHAR(5), FOREIGN KEY ("airline") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add AA Neutral Data Frame to AA Neutral Table
AA_neutral.to_sql("AA_Neutral", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create DL Neutral Table
pg_engine.execute('CREATE TABLE "DL_Neutral" ("tweet_id" VARCHAR(100) PRIMARY KEY, "airline_sentiment" VARCHAR(10), "airline_sentiment_confidence" FLOAT, "airline" VARCHAR(2), "name" VARCHAR(50), "retweet_count" INT, "text" VARCHAR(300), "tweet_date" VARCHAR(10), "tweet_time" VARCHAR(5), FOREIGN KEY ("airline") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add DL Neutral Data Frame to DL Neutral Table
DL_neutral.to_sql("DL_Neutral", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create US Neutral Table
pg_engine.execute('CREATE TABLE "US_Neutral" ("tweet_id" VARCHAR(100) PRIMARY KEY, "airline_sentiment" VARCHAR(10), "airline_sentiment_confidence" FLOAT, "airline" VARCHAR(2), "name" VARCHAR(50), "retweet_count" INT, "text" VARCHAR(300), "tweet_date" VARCHAR(10), "tweet_time" VARCHAR(5), FOREIGN KEY ("airline") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add US Neutral Data Frame to US Neutral Table
US_neutral.to_sql("US_Neutral", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create WN Neutral Table
pg_engine.execute('CREATE TABLE "WN_Neutral" ("tweet_id" VARCHAR(100) PRIMARY KEY, "airline_sentiment" VARCHAR(10), "airline_sentiment_confidence" FLOAT, "airline" VARCHAR(2), "name" VARCHAR(50), "retweet_count" INT, "text" VARCHAR(300), "tweet_date" VARCHAR(10), "tweet_time" VARCHAR(5), FOREIGN KEY ("airline") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add WN Neutral Data Frame to WN Neutral Table
WN_neutral.to_sql("WN_Neutral", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create VX Neutral Table
pg_engine.execute('CREATE TABLE "VX_Neutral" ("tweet_id" VARCHAR(100) PRIMARY KEY, "airline_sentiment" VARCHAR(10), "airline_sentiment_confidence" FLOAT, "airline" VARCHAR(2), "name" VARCHAR(50), "retweet_count" INT, "text" VARCHAR(300), "tweet_date" VARCHAR(10), "tweet_time" VARCHAR(5), FOREIGN KEY ("airline") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add VX Neutral Data Frame to VX Neutral Table
VX_neutral.to_sql("VX_Neutral", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create UA Negative Table
pg_engine.execute('CREATE TABLE "UA_Negative" ("tweet_id" VARCHAR(100) PRIMARY KEY, "airline_sentiment" VARCHAR(10), "airline_sentiment_confidence" FLOAT, "airline" VARCHAR(2), "name" VARCHAR(50), "retweet_count" INT, "text" VARCHAR(300), "tweet_date" VARCHAR(10), "tweet_time" VARCHAR(5), FOREIGN KEY ("airline") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add UA Negative Data Frame to UA Negative Table
UA_negative.to_sql("UA_Negative", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create AA Negative Table
pg_engine.execute('CREATE TABLE "AA_Negative" ("tweet_id" VARCHAR(100) PRIMARY KEY, "airline_sentiment" VARCHAR(10), "airline_sentiment_confidence" FLOAT, "airline" VARCHAR(2), "name" VARCHAR(50), "retweet_count" INT, "text" VARCHAR(300), "tweet_date" VARCHAR(10), "tweet_time" VARCHAR(5), FOREIGN KEY ("airline") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add AA Neutral Data Frame to AA Neutral Table
AA_negative.to_sql("AA_Negative", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create DL Negative Table
pg_engine.execute('CREATE TABLE "DL_Negative" ("tweet_id" VARCHAR(100) PRIMARY KEY, "airline_sentiment" VARCHAR(10), "airline_sentiment_confidence" FLOAT, "airline" VARCHAR(2), "name" VARCHAR(50), "retweet_count" INT, "text" VARCHAR(300), "tweet_date" VARCHAR(10), "tweet_time" VARCHAR(5), FOREIGN KEY ("airline") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add DL Neutral Data Frame to DL Neutral Table
DL_negative.to_sql("DL_Negative", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create US Negative Table
pg_engine.execute('CREATE TABLE "US_Negative" ("tweet_id" VARCHAR(100) PRIMARY KEY, "airline_sentiment" VARCHAR(10), "airline_sentiment_confidence" FLOAT, "airline" VARCHAR(2), "name" VARCHAR(50), "retweet_count" INT, "text" VARCHAR(300), "tweet_date" VARCHAR(10), "tweet_time" VARCHAR(5), FOREIGN KEY ("airline") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add US Neutral Data Frame to US Neutral Table
US_negative.to_sql("US_Negative", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create WN Negative Table
pg_engine.execute('CREATE TABLE "WN_Negative" ("tweet_id" VARCHAR(100) PRIMARY KEY, "airline_sentiment" VARCHAR(10), "airline_sentiment_confidence" FLOAT, "airline" VARCHAR(2), "name" VARCHAR(50), "retweet_count" INT, "text" VARCHAR(300), "tweet_date" VARCHAR(10), "tweet_time" VARCHAR(5), FOREIGN KEY ("airline") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add WN Neutral Data Frame to WN Neutral Table
WN_negative.to_sql("WN_Negative", pg_engine, if_exists = 'append', index = False)

In [None]:
#Create VX Negative Table
pg_engine.execute('CREATE TABLE "VX_Negative" ("tweet_id" VARCHAR(100) PRIMARY KEY, "airline_sentiment" VARCHAR(10), "airline_sentiment_confidence" FLOAT, "airline" VARCHAR(2), "name" VARCHAR(50), "retweet_count" INT, "text" VARCHAR(300), "tweet_date" VARCHAR(10), "tweet_time" VARCHAR(5), FOREIGN KEY ("airline") REFERENCES "Airlines" ("IATA_CODE"));')

In [None]:
#Add VX Neutral Data Frame to VX Neutral Table
VX_negative.to_sql("VX_Negative", pg_engine, if_exists = 'append', index = False)

# Project Analysis (Matt)

The purpose of this project was to perform an Extract-Transform-Load (ETL) process on Feruary 2015 airline flight performance and customer tweets, based on datasets obtained from Kaggle. The SQLAlchemy and Pandas modules in Python were used to import, clean, and process the data, while the former was used to output the transformed data into an SQL database. Discussed below are the specific actions taken to perform each step of the ETL process.

## Data Extraction

The extraction phase of the ETL process included importing and inspecting the data, with the latter including the identification all issues and inconsistencies with the source data. To begin, there were four files that needed to be imported, which included three CSV files and one SQLite database file. A Pandas data frame was created for each of the former, which contained flight, airline, and airport information respectively. Before importing the latter, a connection was established through the SQLite engine to the database file and the table name was exported. The tweet information table was then extracted from the database directly to a Pandas data frame for inspection.

With the import process complete, the next step was to thoroughly inspect each of the data frames. Upon the completion of this process, several issues and inconsistencies were found. First, the date range of the customer tweet data was narrower than the date range of the flight performance data. Second, several columns in the flight information and tweet information data frames were missing large amounts of data. Third, the date and time values were not consistent between the flight and tweet information data frames. Fourth, the tweet information data frame contained fewer airlines than the flight information data frame. Finally, the latitude and longitude information for certain airports was incomplete in the airport information data frame.

## Data Transformation

Having established the issues with the imported data, the next phase of the ETL process was to clean and transform the data into a format able to be loaded into a database. This was a multi-step process for each data frame, with some steps being more complex than others. The flight information data frame was filtered into individual data frames for early, late, diverted, and cancelled flights. Each of these data frames was then cleaned by dropping columns missing large amounts of or with irrelevant data, followed by removing rows with missing values. Next, the transformation process began by creating a column of calculated total delay (early and late dataframe only), and replacing the year, month, and day columns with a single date column. The numeric values representing the day of the week for each date were also replaced with the actual names of each day. Since it was noticed in the inspection of the flight information data frame that the times were in HHMM numeric format, the datetime module was used to convert those integer values to actual 24H HH:MM strings in the appropriate columns. Finally, each of the four data frames was split into groups based on airline and each of the groups was saved as a unique data frame. These many new data frames represented individual database tables for early, late, cancelled, and diverted flights for each airline.

In contrast to the lengthy process for the flight information data frame, the airline and airport information data frames didn't require much work. The former only needed to be filtered by airline, while the latter only needed two columns removed. Moving on to the tweet information data frame, 

## Data Loading

TBD