## How to use Itinerary Builder

Itinerary Builder is a module for querying a database of possible flight itineraries. It returns a dataframe with information on each one. The key outputs are the origins and destinations of each two-flight itinerary, along with their times, airlines, and durations. For each itinerary it also reports the next best second-leg flight, in case the connecting flight is missed.
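
As a rough illustration of the "next best second leg" idea, the sketch below picks the earliest flight on the same route that departs after the scheduled second leg. It is a minimal sketch under stated assumptions, not the module's implementation. The `Flight` tuple and the `next_best_second_leg` function are hypothetical, and all times are assumed to be local times at the connecting airport.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical flight record: (origin, destination, local departure time)
Flight = Tuple[str, str, datetime]


def next_best_second_leg(second_leg: Flight, schedule: List[Flight]) -> Optional[Flight]:
    """Return the earliest flight on the same route departing after `second_leg`, if any."""
    orig, dest, dep = second_leg
    later = [f for f in schedule if f[0] == orig and f[1] == dest and f[2] > dep]
    return min(later, key=lambda f: f[2], default=None)


# Example schedule for a single route (ATL -> ABQ)
schedule = [
    ("ATL", "ABQ", datetime(2019, 12, 1, 14, 11)),
    ("ATL", "ABQ", datetime(2019, 12, 1, 20, 28)),
    ("ATL", "ABQ", datetime(2019, 12, 2, 9, 54)),
]

print(next_best_second_leg(schedule[0], schedule))  # the 20:28 departure on 12/1
```

If the last flight of the period is missed, no fallback exists, so the sketch returns `None`.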

A machine learning model weights the time cost of a missed connection by the likelihood that the connection is missed.
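
A common way to weight a time cost by a probability is to take an expected value: the probability of missing the connection multiplied by the extra minutes lost when falling back to the next best second leg. The sketch below shows only that arithmetic. The module's actual model and formula are not described here, and `expected_time_loss` is a hypothetical helper.

```python
from datetime import datetime


def expected_time_loss(p_miss: float, planned_arrival: datetime, fallback_arrival: datetime) -> float:
    """Expected extra minutes = P(miss) * (fallback arrival - planned arrival)."""
    if not 0.0 <= p_miss <= 1.0:
        raise ValueError("p_miss must be between 0 and 1")
    extra_minutes = (fallback_arrival - planned_arrival).total_seconds() / 60
    return p_miss * extra_minutes


# Planned arrival 15:30; the next best second leg arrives at 21:52 (382 minutes later)
loss = expected_time_loss(0.25, datetime(2019, 12, 1, 15, 30), datetime(2019, 12, 1, 21, 52))
print(loss)  # 95.5
```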

Users of the module must supply a connection time assumption. This is the minimum time between connecting flights that the user finds acceptable (for example, at least 45 minutes, passed as `tc = 45` in the examples below).
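
Here is a minimal sketch of how a minimum connection time could be applied when pairing flights. It keeps only first-leg/second-leg pairs that meet at the same airport with at least `tc` minutes between arrival and departure. The `Leg` type and the `build_connections` function are hypothetical. The sketch also assumes that arrival and departure times at the connecting airport share the same local time zone.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Tuple


class Leg(NamedTuple):
    orig: str
    dest: str
    dep: datetime  # local departure time at `orig`
    arr: datetime  # local arrival time at `dest`


def build_connections(first_legs: List[Leg], second_legs: List[Leg], tc: int) -> List[Tuple[Leg, Leg]]:
    """Pair legs that meet at the same airport with at least `tc` minutes to connect."""
    min_gap = timedelta(minutes=tc)
    return [
        (a, b)
        for a in first_legs
        for b in second_legs
        if a.dest == b.orig and b.dep - a.arr >= min_gap
    ]


first_legs = [
    Leg("OMA", "ATL", datetime(2019, 12, 1, 7, 0), datetime(2019, 12, 1, 10, 21)),
    Leg("OMA", "ATL", datetime(2019, 12, 1, 10, 31), datetime(2019, 12, 1, 13, 44)),
]
second_legs = [Leg("ATL", "ABQ", datetime(2019, 12, 1, 14, 11), datetime(2019, 12, 1, 15, 30))]

# With tc = 45, only the 10:21 arrival connects; the 13:44 arrival leaves just 27 minutes.
pairs = build_connections(first_legs, second_legs, tc=45)
print(len(pairs))  # 1
```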

Examples of each function's usage are given below.

In [1]:
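```python
from itineraryBuilder import *
```

In [2]:
```python
import pandas as pd

# Display every column and row of the wide result dataframes
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
```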

### Itinerary Builder main function call

With the default options, the resulting dataframe is ordered by risk, as shown below. Note that `arr_no_later_date` restricts the results: when it equals `flight_date`, itineraries that arrive at their destination the following day are omitted.

In [3]:
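```python
origin = 'ATL'
destination = 'SEA'
flight_date = '12/3/2019'

# Time bounds are HHMM strings without leading zeros ('700' = 07:00).
# dne/dnl appear to bound departure and ane/anl arrival (no earlier / no later).
dne = '700'
dnl = '1000'
ane = '1'
anl = '2000'
tc = 45  # minimum connection time, in minutes
arr_no_later_date = '12/3/2019'

df = itineraryBuilder('faa_2019_12', origin, destination, flight_date, arr_no_later_date, tc, dne, dnl, ane, anl)
df.head(10)
```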

Output: the first 10 rows of a dataframe with the following columns.

- First leg: `FIRST_LEG_AIRLINE`, `FIRST_LEG_ORIG`, `FIRST_LEG_ORIG_CITY`, `FIRST_LEG_DEST`, `FIRST_LEG_DEST_CITY`, `FIRST_LEG_DATE`, `FIRST_LEG_DEP_TIME`, `FIRST_LEG_ARR_TIME`
- Second leg: `SECOND_LEG_AIRLINE`, `SECOND_LEG_ORIG`, `SECOND_LEG_ORIG_CITY`, `SECOND_LEG_DEST`, `SECOND_LEG_DEST_CITY`, `SECOND_LEG_DATE`, `SECOND_LEG_DEP_TIME`, `SECOND_LEG_ARR_TIME`
- Next best second leg: `NEXT_BEST_SECOND_LEG_DATE`, `NEXT_BEST_SECOND_LEG_DEP_TIME`, `NEXT_BEST_SECOND_LEG_ARR_TIME`
- Time zones: `FIRST_LEG_ORIG_TZ`, `FIRST_LEG_DEST_TZ`, `SECOND_LEG_ORIG_TZ`, `SECOND_LEG_DEST_TZ`
- Timestamps: `FIRST_LEG_DEP_TIMESTAMP`, `FIRST_LEG_ARR_TIMESTAMP`, `SECOND_LEG_DEP_TIMESTAMP`, `SECOND_LEG_ARR_TIMESTAMP`
- First-leg predictions: `FIRST_LEG_PRED15`, `FIRST_LEG_PRED30`, `FIRST_LEG_PRED45`, `FIRST_LEG_PRED60`, `FIRST_LEG_PRED75`, `FIRST_LEG_PRED90`, `FIRST_LEG_PRED105`, `FIRST_LEG_PRED120`
- Next best second-leg timestamps: `NEXT_BEST_SECOND_LEG_DEP_TIMESTAMP`, `NEXT_BEST_SECOND_LEG_ARR_TIMESTAMP`
- Overnight flags: `overnight_bool_1`, `overnight_bool_2`, `overnight_bool_3`
- Durations and risk: `FIRST_FLIGHT_DURATION`, `SECOND_FLIGHT_DURATION`, `CONNECT_TIME`, `TRIP_TIME`, `RISK_MISSED_CONNECTION`, `NEXT_FLIGHT_TIMELOSS`, `TOTAL_RISK`
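
The `arr_no_later_date` cutoff amounts to a date comparison on each itinerary's final arrival. Below is a minimal sketch. It assumes the comparison uses the calendar date of the final arrival, and that dates follow the `M/D/YYYY` form used in these examples. `parse_mdy` and `arrives_by` are hypothetical helpers.

```python
from datetime import date, datetime


def parse_mdy(s: str) -> date:
    """Parse dates such as '12/3/2019' (month/day/year, zero padding optional)."""
    return datetime.strptime(s, "%m/%d/%Y").date()


def arrives_by(final_arrival_date: str, arr_no_later_date: str) -> bool:
    """True if the itinerary's final arrival falls on or before the cutoff date."""
    return parse_mdy(final_arrival_date) <= parse_mdy(arr_no_later_date)


arrivals = ["12/3/2019", "12/4/2019"]
print([d for d in arrivals if arrives_by(d, "12/3/2019")])  # ['12/3/2019']
print([d for d in arrivals if arrives_by(d, "12/4/2019")])  # ['12/3/2019', '12/4/2019']
```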


In this example, `arr_no_later_date` is set to the following day. Itineraries arriving on 12/4/2019 are therefore included as well, so more results are returned.

In [4]:
```python
origin = 'SAN'
destination = 'ANC'
flight_date = '12/3/2019'
arr_no_later_date = '12/4/2019'

dne = '1'
dnl = '2359'
ane = '1'
anl = '2359'
tc = 45

df = itineraryBuilder('faa_2019_12', origin, destination, flight_date, arr_no_later_date, tc, dne, dnl, ane, anl)
df.head(20)
```

Output: the first 20 rows, with the same columns as in the first example.


You can also order the results by trip duration, as shown. The other ordering options are `'earliest_arrival'` and `'min_connection_time'`.

In [5]:
```python
df = itineraryBuilder('faa_2019_12', origin, destination, flight_date, arr_no_later_date, tc, dne, dnl, ane, anl, orderby='duration')
df.head(10)
```

Output: the first 10 rows, sorted by trip duration, with the same columns as in the first example.


### queryFlights function call

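`queryFlights` takes many of the same parameters as `itineraryBuilder`, and `itineraryBuilder` in fact calls it. Instead of `arr_no_later_date` and `tc`, it takes arrival bounds given by `ane_date`/`ane` and `anl_date`/`anl`. It returns the initial query result, with dates and times as string values.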
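
Because `queryFlights` returns dates and times as strings (for example `'12/1/2019'` and `'548'` for 05:48), the two must be combined and parsed before durations can be computed. The following is a minimal sketch that assumes local HHMM strings without leading zeros. The helper `to_timestamp` is hypothetical. Fixed UTC offsets stand in for proper time-zone handling; they are valid here only because these December dates fall in standard time.

```python
from datetime import datetime, timedelta, timezone


def to_timestamp(date_str: str, hhmm: str, utc_offset_hours: int) -> datetime:
    """Combine e.g. '12/1/2019' and '548' into a timezone-aware datetime (05:48 local)."""
    local = datetime.strptime(f"{date_str} {hhmm.zfill(4)}", "%m/%d/%Y %H%M")
    return local.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))


# OMA -> ATL, departing 07:00 Central (UTC-6), arriving 10:21 Eastern (UTC-5)
dep = to_timestamp("12/1/2019", "700", -6)
arr = to_timestamp("12/1/2019", "1021", -5)
print(arr - dep)  # 2:21:00
```

Subtracting the local clock times alone (10:21 minus 07:00) would overstate this flight by an hour.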

In [8]:
```python
origin = 'OMA'
destination = 'ABQ'
flight_date = '12/1/2019'

dne = '1'
dnl = '2359'
# Arrival bounds: ane (HHMM) on ane_date through anl (HHMM) on anl_date
ane = '1207'
anl = '534'
ane_date = '12/1/2019'
anl_date = '12/2/2019'

df = queryFlights('faa_2019_12', origin, destination, flight_date, ane_date, anl_date, dne, dnl, ane, anl)
df
```

| | FIRST_LEG_AIRLINE | FIRST_LEG_ORIG | FIRST_LEG_ORIG_CITY | FIRST_LEG_DEST | FIRST_LEG_DEST_CITY | FIRST_LEG_DATE | FIRST_LEG_DEP_TIME | FIRST_LEG_ARR_TIME | FIRST_LEG_PRED15 | FIRST_LEG_PRED30 | FIRST_LEG_PRED45 | FIRST_LEG_PRED60 | FIRST_LEG_PRED75 | FIRST_LEG_PRED90 | FIRST_LEG_PRED105 | FIRST_LEG_PRED120 | SECOND_LEG_AIRLINE | SECOND_LEG_ORIG | SECOND_LEG_ORIG_CITY | SECOND_LEG_DEST | SECOND_LEG_DEST_CITY | SECOND_LEG_DATE | SECOND_LEG_DEP_TIME | SECOND_LEG_ARR_TIME | NEXT_BEST_SECOND_LEG_DATE | NEXT_BEST_SECOND_LEG_DEP_TIME | NEXT_BEST_SECOND_LEG_ARR_TIME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DL | OMA | Omaha, NE | ATL | Atlanta, GA | 12/1/2019 | 1835 | 2144 | 0.409947 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | DL | ATL | Atlanta, GA | ABQ | Albuquerque, NM | 12/1/2019 | 1411 | 1530 | 12/1/2019 | 2028 | 2152 |
| 1 | DL | OMA | Omaha, NE | ATL | Atlanta, GA | 12/1/2019 | 1031 | 1344 | 0.136991 | 0.03245 | 0.007325 | 0.015796 | 0.0 | 0.018902 | 0.017825 | 0.008911 | DL | ATL | Atlanta, GA | ABQ | Albuquerque, NM | 12/1/2019 | 1411 | 1530 | 12/1/2019 | 2028 | 2152 |
| 2 | DL | OMA | Omaha, NE | ATL | Atlanta, GA | 12/1/2019 | 548 | 905 | 0.468975 | 0.487372 | 0.55621 | 0.582898 | 0.496602 | 0.204529 | 0.0 | 0.0 | DL | ATL | Atlanta, GA | ABQ | Albuquerque, NM | 12/1/2019 | 1411 | 1530 | 12/1/2019 | 2028 | 2152 |
| 3 | DL | OMA | Omaha, NE | ATL | Atlanta, GA | 12/1/2019 | 1310 | 1620 | 0.708821 | 0.774287 | 0.507276 | 0.019376 | 0.0 | 0.0 | 0.0 | 0.0 | DL | ATL | Atlanta, GA | ABQ | Albuquerque, NM | 12/1/2019 | 1411 | 1530 | 12/1/2019 | 2028 | 2152 |
| 4 | DL | OMA | Omaha, NE | ATL | Atlanta, GA | 12/1/2019 | 700 | 1021 | 0.032795 | 0.05 | 0.029705 | 0.028336 | 0.03 | 0.027666 | 0.0 | 0.0 | DL | ATL | Atlanta, GA | ABQ | Albuquerque, NM | 12/1/2019 | 1411 | 1530 | 12/1/2019 | 2028 | 2152 |
| 5 | DL | OMA | Omaha, NE | ATL | Atlanta, GA | 12/1/2019 | 1835 | 2144 | 0.409947 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | DL | ATL | Atlanta, GA | ABQ | Albuquerque, NM | 12/1/2019 | 2028 | 2152 | 12/2/2019 | 954 | 1120 |
| 6 | DL | OMA | Omaha, NE | ATL | Atlanta, GA | 12/1/2019 | 1031 | 1344 | 0.136991 | 0.03245 | 0.007325 | 0.015796 | 0.0 | 0.018902 | 0.017825 | 0.008911 | DL | ATL | Atlanta, GA | ABQ | Albuquerque, NM | 12/1/2019 | 2028 | 2152 | 12/2/2019 | 954 | 1120 |
| 7 | DL | OMA | Omaha, NE | ATL | Atlanta, GA | 12/1/2019 | 548 | 905 | 0.468975 | 0.487372 | 0.55621 | 0.582898 | 0.496602 | 0.204529 | 0.0 | 0.0 | DL | ATL | Atlanta, GA | ABQ | Albuquerque, NM | 12/1/2019 | 2028 | 2152 | 12/2/2019 | 954 | 1120 |
| 8 | DL | OMA | Omaha, NE | ATL | Atlanta, GA | 12/1/2019 | 1310 | 1620 | 0.708821 | 0.774287 | 0.507276 | 0.019376 | 0.0 | 0.0 | 0.0 | 0.0 | DL | ATL | Atlanta, GA | ABQ | Albuquerque, NM | 12/1/2019 | 2028 | 2152 | 12/2/2019 | 954 | 1120 |
| 9 | DL | OMA | Omaha, NE | ATL | Atlanta, GA | 12/1/2019 | 700 | 1021 | 0.032795 | 0.05 | 0.029705 | 0.028336 | 0.03 | 0.027666 | 0.0 | 0.0 | DL | ATL | Atlanta, GA | ABQ | Albuquerque, NM | 12/1/2019 | 2028 | 2152 | 12/2/2019 | 954 | 1120 |
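
In the call above, the arrival bounds span two dates: from 12:07 on 12/1/2019 to 05:34 on 12/2/2019. Comparing times alone would fail here, because 1207 is greater than 534. The sketch below shows a date-aware check. It assumes inclusive bounds and local HHMM strings, and `in_arrival_window` is a hypothetical helper that is not part of the module.

```python
from datetime import datetime


def _local(date_str: str, hhmm: str) -> datetime:
    """Parse '12/1/2019' plus an HHMM string such as '534' into a naive local datetime."""
    return datetime.strptime(f"{date_str} {hhmm.zfill(4)}", "%m/%d/%Y %H%M")


def in_arrival_window(arr_date: str, arr_time: str,
                      ane_date: str, ane: str, anl_date: str, anl: str) -> bool:
    """True if the arrival lies between (ane_date, ane) and (anl_date, anl), inclusive."""
    return _local(ane_date, ane) <= _local(arr_date, arr_time) <= _local(anl_date, anl)


window = ("12/1/2019", "1207", "12/2/2019", "534")
print(in_arrival_window("12/1/2019", "2152", *window))  # True
print(in_arrival_window("12/2/2019", "1120", *window))  # False
```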


### getValidDestinations function call

Use this function to get a list of airports that can be reached from the origin with exactly two flights within a two-day period.
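
If schedules are ignored, the "exactly two flights" condition becomes a two-step reachability question on the route graph. The sketch below works over a made-up list of (origin, destination) routes. `two_leg_destinations` is hypothetical, and unlike the documented function it does not limit results to a two-day period.

```python
from typing import Iterable, Set, Tuple


def two_leg_destinations(routes: Iterable[Tuple[str, str]], origin: str) -> Set[str]:
    """Airports reachable from `origin` through one connecting airport (two flights)."""
    routes = list(routes)
    hubs = {dest for orig, dest in routes if orig == origin}
    return {dest for orig, dest in routes if orig in hubs and dest != origin}


# Made-up route list for illustration
routes = [
    ("OMA", "ATL"), ("OMA", "DEN"),
    ("ATL", "ABQ"), ("DEN", "ABQ"), ("ATL", "ALB"),
    ("ATL", "OMA"), ("SEA", "ANC"),
]
print(sorted(two_leg_destinations(routes, "OMA")))  # ['ABQ', 'ALB']
```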

In [7]:
```python
# Uses the `origin` and `flight_date` variables defined earlier
df2 = getValidDestinations('faa_2019_12', origin, flight_date)
df2
```

| | AIRPORT | CITY |
|---|---|---|
| 0 | ABR | Aberdeen, SD |
| 1 | BQN | Aguadilla, PR |
| 2 | CAK | Akron, OH |
| 3 | ABY | Albany, GA |
| 4 | ALB | Albany, NY |
| 5 | ABQ | Albuquerque, NM |
| 6 | AEX | Alexandria, LA |
| 7 | ABE | Allentown/Bethlehem/Easton, PA |
| 8 | APN | Alpena, MI |
| 9 | AMA | Amarillo, TX |
