**Important note**! Before you turn in this lab notebook, make sure everything runs as expected:

- First, restart the kernel -- in the menubar, select Kernel → Restart.
- Then run all cells -- in the menubar, select Cell → Run All.

Make sure you fill in any place that says "YOUR CODE HERE" or "YOUR ANSWER HERE."

# Air and Ground Travel Times

In this problem, you will analyze travel times between cities in the United States. You will use several datasets, including a subset of data from a [Kaggle dataset](https://www.kaggle.com/giovamata/airlinedelaycauses/data).

The goals of this problem are to find the average flight time between two cities and to find the round-trip time between two cities using ground travel.

The first set of code cells sets up the problem and loads the Kaggle dataset into a Pandas dataframe (`flighttimes`). The `flighttimes` table contains the departure and arrival times of over 1 million flights. There are four columns in the table: **DepTime**, which is the departure time of the flight (in 24-hour time); **ArrTime**, which is the arrival time of the flight (in 24-hour time); **Origin**, which is the three-letter airport code for the origin (or departure) airport; and **Dest**, which is the three-letter airport code for the destination airport.

In [1]:
# Load flighttimes dataset
from cse6040utils import download_all, canonicalize_tibble, tibbles_are_equivalent
import pandas as pd
import numpy as np

datasets = {'FlightInfo.csv': '64ac75c61dc09a3a7bb2a856a27f9584',
            'airports.csv': '07349facc5ac5e73a34f084f1a261148',
            'city_average_times_soln.csv': 'fccce0d257ba51d9518469e67963696b',
            'city_ids.csv': 'b78508ea9768a41fc2bcfa3f10056a6d',
            'city_travel_times_soln.csv': '104213b86e5082c22176d3b811cbf094',
            'flight_times_soln.csv': '967dd2f4e66999c76889c6e159be0169',
            'flights.csv': 'd9313f61c4689f20184bccb9e89afd6d',
            'ground_distances_cities.csv': 'ac64e84c460ea41244d7b4f254311b1c'}
datapaths = download_all(datasets, local_suffix='flight-paths/', url_suffix='flight-paths/')

print('Loading flighttimes dataset as DataFrame...')
# Data preprocessing on flighttimes dataset
FlightInfo = pd.read_csv(datapaths['FlightInfo.csv'])
def to_hhmm(t):
    """Convert a numeric 24-hour time such as 754 or 2003 to an 'HH:MM' string."""
    s = str(int(t))
    if len(s) == 3:
        return '0{}:{}'.format(s[0], s[1:])
    if len(s) == 4:
        return '{}:{}'.format(s[:2], s[2:])
    return np.nan  # any other length is treated as missing and dropped below

FlightInfo['DepTime'] = [to_hhmm(t) for t in FlightInfo['DepTime']]
FlightInfo['ArrTime'] = [to_hhmm(t) for t in FlightInfo['ArrTime']]

flighttimes = FlightInfo.dropna()

print('flighttimes dataset successfully loaded as Pandas DataFrame!')
print('The First 5 Lines of the flighttimes Dataset: ')
print(flighttimes.head())

'FlightInfo.csv' is ready!
'airports.csv' is ready!
'city_average_times_soln.csv' is ready!
'city_ids.csv' is ready!
'city_travel_times_soln.csv' is ready!
'flight_times_soln.csv' is ready!
'flights.csv' is ready!
'ground_distances_cities.csv' is ready!
Loading flighttimes dataset as DataFrame...
flighttimes dataset successfully loaded as Pandas DataFrame!
The First 5 Lines of the flighttimes Dataset: 
  DepTime ArrTime Origin Dest
0   20:03   22:11    IAD  TPA
1   07:54   10:02    IAD  TPA
2   06:28   08:04    IND  BWI
3   18:29   19:59    IND  BWI
4   19:40   21:21    IND  JAX


To find the average flight time for each route, we must first compute the duration of every flight in the data.

**Exercise 0** (2 points). Create a dataframe, `flight_times`, that includes the time in minutes between the `ArrTime` and `DepTime` of each flight in the `flighttimes` dataset. The final result should have three columns:

* **`'Origin'`**: the origin airport three-letter code;
* **`'Dest'`**: the destination airport three-letter code; and
* **`'Time'`**: the time between `ArrTime` and `DepTime` in minutes. 

Note that some of the **Time** values may be negative, or even zero. In such cases, the most likely explanation is a "wraparound" effect, where `ArrTime` appears to occur before `DepTime`. **For simplicity, any such negative (or zero) values should be removed from the final DataFrame.**
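
As a rough illustration of the idea (not the required solution), the sketch below converts each `HH:MM` string to minutes since midnight, subtracts, and keeps only strictly positive differences. The helper name `to_minutes` is hypothetical, and the third row of the toy table, including its placeholder airport codes `AAA` and `BBB`, is made up to show a wraparound case.

```python
import pandas as pd

def to_minutes(hhmm):
    """Convert a Series of zero-padded 'HH:MM' strings to minutes since midnight."""
    parts = hhmm.str.split(':', expand=True).astype(int)
    return parts[0] * 60 + parts[1]

# Toy table; the last row is invented and "wraps around" midnight.
toy = pd.DataFrame({'DepTime': ['20:03', '06:28', '23:30'],
                    'ArrTime': ['22:11', '08:04', '00:45'],
                    'Origin':  ['IAD', 'IND', 'AAA'],
                    'Dest':    ['TPA', 'BWI', 'BBB']})

toy['Time'] = to_minutes(toy['ArrTime']) - to_minutes(toy['DepTime'])

# Keep only strictly positive durations, then keep the three required columns.
toy_times = toy.loc[toy['Time'] > 0, ['Origin', 'Dest', 'Time']].reset_index(drop=True)
print(toy_times)
```

The wraparound row gets a negative difference and is filtered out, which mirrors the simplification requested above.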

In [2]:
flighttimes.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1007071 entries, 0 to 1048574
Data columns (total 4 columns):
DepTime    1007071 non-null object
ArrTime    1007071 non-null object
Origin     1007071 non-null object
Dest       1007071 non-null object
dtypes: object(4)
memory usage: 38.4+ MB


In [3]:
from datetime import datetime, time 
from dateutil import parser

In [4]:
flighttimes[flighttimes['ArrTime'] != flighttimes['DepTime']].shape

(1006568, 4)

In [5]:
flighttimes[flighttimes['ArrTime'] > flighttimes['DepTime']].shape

(981478, 4)

In [6]:
#
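# Keep only flights that arrive strictly later than they depart.
# Zero-padded 'HH:MM' strings sort chronologically, so a plain string comparison works,
# and the strict inequality already excludes equal times.
flight_times = flighttimes[flighttimes['ArrTime'] > flighttimes['DepTime']].copy()
flight_times.reset_index(drop=True, inplace=True)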
display(flight_times.head())
#

Unnamed: 0,DepTime,ArrTime,Origin,Dest
0,20:03,22:11,IAD,TPA
1,07:54,10:02,IAD,TPA
2,06:28,08:04,IND,BWI
3,18:29,19:59,IND,BWI
4,19:40,21:21,IND,JAX


In [7]:
# Split each 'HH:MM' string into integer hours and minutes.
# The loop variable is named `t` so it does not shadow `time` imported from datetime.
flight_times['DepHr'] = [int(t.split(':')[0]) for t in flight_times['DepTime']]
flight_times['DepMin'] = [int(t.split(':')[1]) for t in flight_times['DepTime']]

flight_times['ArrHr'] = [int(t.split(':')[0]) for t in flight_times['ArrTime']]
flight_times['ArrMin'] = [int(t.split(':')[1]) for t in flight_times['ArrTime']]

flight_times['DepHr(inMin)'] = [hr * 60 for hr in flight_times['DepHr']]
flight_times['ArrHr(inMin)'] = [hr * 60 for hr in flight_times['ArrHr']]

flight_times['Time'] = (flight_times['ArrHr(inMin)'] + flight_times['ArrMin']) - (flight_times['DepHr(inMin)'] + flight_times['DepMin'])

flight_times = flight_times[['Origin', 'Dest', 'Time']]

In [8]:
## TEST CODE EXERCISE 0 - flight_times

flight_times_soln = pd.read_csv(datapaths['flight_times_soln.csv'])

print('===== First 5 Lines of Your Solution =====')
print(flight_times.head())

print('\n')
print('===== First 5 Lines of Instructor Solution =====')
print(flight_times_soln.head())

print('\n Checking if DataFrames Match...')
assert tibbles_are_equivalent(flight_times, flight_times_soln) == True, print('\n DataFrames do not match')
print("\n(Passed!)")

===== First 5 Lines of Your Solution =====
  Origin Dest  Time
0    IAD  TPA   128
1    IAD  TPA   128
2    IND  BWI    96
3    IND  BWI    90
4    IND  JAX   101


===== First 5 Lines of Instructor Solution =====
  Origin Dest  Time
0    IAD  TPA   128
1    IAD  TPA   128
2    IND  BWI    96
3    IND  BWI    90
4    IND  JAX   101

 Checking if DataFrames Match...

(Passed!)


For the next part of this problem, we will load in a dataset containing the city names corresponding to each of the airports.

In [9]:
# Load airports dataset into Pandas DataFrame
airports = pd.read_csv(datapaths['airports.csv'])
print(airports.head())

  Code                                 Name         City State
0  ABE  Lehigh Valley International Airport    Allentown    PA
1  ABI             Abilene Regional Airport      Abilene    TX
2  ABQ    Albuquerque International Sunport  Albuquerque    NM
3  ABR            Aberdeen Regional Airport     Aberdeen    SD
4  ABY   Southwest Georgia Regional Airport       Albany    GA


**Exercise 1** (2 points). Replace the airport codes from Exercise 0 with their city and state using the airports dataset. Store the result in a DataFrame named `city_travel_times`. The final result should have three columns:

* **`'origin_city'`**: the origin city of the flight, in the form "City, State";
* **`'destination_city'`**: the destination city of the flight, in the form "City, State"; and
* **`'Time'`**: same column and values as the previous exercise, i.e., the time between `'ArrTime'` and `'DepTime'` in minutes.

This final dataframe should only have the rows in which the origin and destination are present in your `flight_times` dataframe as well as the `airports` dataframe.

Note that some airports have the same city and state. For the purposes of this problem, you do NOT have to differentiate between those airports. For example, `IAD` and `DCA` will both have the same `origin_city`, `"Washington, DC"`. 
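
One possible way to do the replacement, sketched on toy data, is to build a lookup from airport code to "City, State" and apply it with `Series.map`. The toy tables below are stand-ins for `airports` and `flight_times`; the code `XXX` and the times 131 and 90 are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for `airports` and `flight_times` (illustrative rows only).
toy_airports = pd.DataFrame({'Code':  ['IAD', 'DCA', 'TPA'],
                             'City':  ['Washington', 'Washington', 'Tampa'],
                             'State': ['DC', 'DC', 'FL']})
toy_flights = pd.DataFrame({'Origin': ['IAD', 'DCA', 'IAD'],
                            'Dest':   ['TPA', 'TPA', 'XXX'],  # 'XXX' is not in toy_airports
                            'Time':   [128, 131, 90]})

# Lookup Series: airport code -> "City, State".
city_of = pd.Series((toy_airports['City'] + ', ' + toy_airports['State']).values,
                    index=toy_airports['Code'])

toy_city_times = pd.DataFrame({
    'origin_city': toy_flights['Origin'].map(city_of),
    'destination_city': toy_flights['Dest'].map(city_of),
    'Time': toy_flights['Time'],
})

# Codes missing from the lookup map to NaN; dropping those rows keeps only known airports.
toy_city_times = toy_city_times.dropna().reset_index(drop=True)
print(toy_city_times)
```

Because unknown codes become `NaN` rather than raising an error, a single `dropna()` handles the "present in both dataframes" requirement in this sketch.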

In [10]:
display(flight_times.head())
display(flight_times.Origin.nunique())
display(flight_times.Dest.nunique())

Unnamed: 0,Origin,Dest,Time
0,IAD,TPA,128
1,IAD,TPA,128
2,IND,BWI,96
3,IND,BWI,90
4,IND,JAX,101


296

296

In [11]:
display(airports.head())
display(airports.Code.nunique())

Unnamed: 0,Code,Name,City,State
0,ABE,Lehigh Valley International Airport,Allentown,PA
1,ABI,Abilene Regional Airport,Abilene,TX
2,ABQ,Albuquerque International Sunport,Albuquerque,NM
3,ABR,Aberdeen Regional Airport,Aberdeen,SD
4,ABY,Southwest Georgia Regional Airport,Albany,GA


712

In [12]:
# Create a dictionary for city, state corresponding to airport codes
from collections import defaultdict

airport_city_by_code = defaultdict(str)

for i in range(len(airports)):
    code = airports.loc[i, 'Code']
    airport_city_by_code[code] = airports.loc[i, 'City'] + ', ' + airports.loc[i, 'State']
    
assert len(airport_city_by_code) == len(airports['Code'].unique().tolist())


# Create the city_travel_times dataframe.
# Every flight is kept: each row is a separate flight, so repeated
# (Origin, Dest) pairs are not duplicates to be dropped.
city_travel_times = flight_times.copy()

origin_city = []
destination_city = []
for i in range(len(city_travel_times)):
    orig = city_travel_times.loc[i, 'Origin']
    dest = city_travel_times.loc[i, 'Dest']
    
    origin_city.append(airport_city_by_code[orig])
    destination_city.append(airport_city_by_code[dest])
    
city_travel_times['origin_city'] = origin_city
city_travel_times['destination_city'] = destination_city
# city_travel_times.drop(['Origin', 'Dest'], axis=1, inplace=True)

display(city_travel_times.head())
display(city_travel_times.shape)

Unnamed: 0,Origin,Dest,Time,origin_city,destination_city
0,IAD,TPA,128,"Washington, DC","Tampa, FL"
1,IAD,TPA,128,"Washington, DC","Tampa, FL"
2,IND,BWI,96,"Indianapolis, IN","BWI Airport, MD"
3,IND,BWI,90,"Indianapolis, IN","BWI Airport, MD"
4,IND,JAX,101,"Indianapolis, IN","Jacksonville, FL"


(981478, 5)

In [13]:
# Final dataframe should only have the rows in which the origin and destination 
# are present in your flight_times dataframe as well as the airports dataframe.

flight_times_dests = flight_times['Dest'].unique().tolist()
flight_times_origins = flight_times['Origin'].unique().tolist()

city_travel_times = city_travel_times[city_travel_times['Origin'].isin(flight_times_dests)]
city_travel_times = city_travel_times[city_travel_times['Dest'].isin(flight_times_origins)]


airport_codes = airports['Code'].unique().tolist()

city_travel_times = city_travel_times[city_travel_times['Origin'].isin(airport_codes)]
city_travel_times = city_travel_times[city_travel_times['Dest'].isin(airport_codes)]
city_travel_times.drop(['Origin', 'Dest'], axis=1, inplace=True)


In [14]:
## TEST CODE EXERCISE 1 - city_travel_times
city_travel_times_soln = pd.read_csv(datapaths['city_travel_times_soln.csv'])

print('===== First 5 Lines of Your Solution =====')
print(city_travel_times.head())

print('\n')
print('====== First 5 Lines of Instructor Solution =====')
print(city_travel_times_soln.head())

print('\n Checking if DataFrames Match...')
assert tibbles_are_equivalent(city_travel_times, city_travel_times_soln) == True, print("\n DataFrames do not match")
print("\n(Passed!)")

===== First 5 Lines of Your Solution =====
   Time       origin_city  destination_city
0   128    Washington, DC         Tampa, FL
1   128    Washington, DC         Tampa, FL
2    96  Indianapolis, IN   BWI Airport, MD
3    90  Indianapolis, IN   BWI Airport, MD
4   101  Indianapolis, IN  Jacksonville, FL


====== First 5 Lines of Instructor Solution =====
      origin_city destination_city  Time
0  Washington, DC        Tampa, FL   128
1  Washington, DC        Tampa, FL   128
2  Washington, DC        Tampa, FL   126
3  Washington, DC        Tampa, FL   137
4  Washington, DC        Tampa, FL   133

 Checking if DataFrames Match...

(Passed!)


Finally, we will get the average flight time for each unique city to city flight.

**Exercise 2** (2 points). Create a new DataFrame, `city_average_times`, which lists the average flight time for each unique city-to-city route in the `flighttimes` dataset. The final result should be a DataFrame with three columns:

* **`'origin_city'`**: the origin city of the flight, in the form "City, State";
* **`'destination_city'`**: the destination city of the flight, in the form "City, State"; and
* **`'average_time'`**: the average flight time between the origin and destination city.

Round the results to two decimal places.
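
A minimal sketch of the group-and-average step on a toy table, assuming the column names described in this exercise. The four rows are illustrative, and `toy_avg` is a hypothetical name.

```python
import pandas as pd

# Toy stand-in for `city_travel_times` (illustrative rows only).
toy = pd.DataFrame({
    'origin_city':      ['Washington, DC', 'Washington, DC', 'Washington, DC', 'Indianapolis, IN'],
    'destination_city': ['Tampa, FL', 'Tampa, FL', 'Tampa, FL', 'BWI Airport, MD'],
    'Time':             [128, 128, 126, 96],
})

# One row per (origin, destination) pair, holding the mean of 'Time'.
toy_avg = (toy.groupby(['origin_city', 'destination_city'], as_index=False)['Time']
              .mean()
              .rename(columns={'Time': 'average_time'}))

# Round to two decimal places, as the exercise asks.
toy_avg['average_time'] = toy_avg['average_time'].round(2)
print(toy_avg)
```

Passing `as_index=False` keeps the two city columns as ordinary columns, so no separate `reset_index` call is needed.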

In [15]:
city_travel_times.head(3)

Unnamed: 0,Time,origin_city,destination_city
0,128,"Washington, DC","Tampa, FL"
1,128,"Washington, DC","Tampa, FL"
2,96,"Indianapolis, IN","BWI Airport, MD"


In [16]:
#
city_average_times = city_travel_times.groupby(['origin_city', 'destination_city']).mean()
city_average_times.rename(columns={'Time':'average_time'},
                          inplace=True)
city_average_times.reset_index(inplace=True)
city_average_times.head()
#


Unnamed: 0,origin_city,destination_city,average_time
0,"Abilene, TX","Dallas, TX",59.84434
1,"Adak Island, AK","Anchorage, AK",220.043478
2,"Aguadilla, PR","New York, NY",200.634146
3,"Aguadilla, PR","Newark, NJ",220.542857
4,"Aguadilla, PR","Orlando, FL",143.209302


In [17]:
## TEST CODE EXERCISE 2 - city_average_times
city_average_times_soln = pd.read_csv(datapaths['city_average_times_soln.csv'])

print('===== First 5 Lines of Your Solution =====')
print(city_average_times.head())

print('\n')
print('====== First 5 Lines of Instructor Solution =====')
print(city_average_times_soln.head())

print('\n Checking if DataFrames Match...')
assert city_average_times.shape == city_average_times_soln.shape, print("Dimensions of your solution do not match the instructor's solution")
soln = pd.merge(city_average_times, city_average_times_soln, how="right", on=["origin_city", "destination_city"])
soln_time = soln["average_time_y"] - soln["average_time_x"]
tolerance = 1
assert max(abs(soln_time)) <=tolerance, print("Your average time is beyond the tolerances provided")
print("\n(Passed!)")

===== First 5 Lines of Your Solution =====
       origin_city destination_city  average_time
0      Abilene, TX       Dallas, TX     59.844340
1  Adak Island, AK    Anchorage, AK    220.043478
2    Aguadilla, PR     New York, NY    200.634146
3    Aguadilla, PR       Newark, NJ    220.542857
4    Aguadilla, PR      Orlando, FL    143.209302


====== First 5 Lines of Instructor Solution =====
       origin_city destination_city  average_time
0      Abilene, TX       Dallas, TX         59.84
1  Adak Island, AK    Anchorage, AK        220.04
2    Aguadilla, PR     New York, NY        200.63
3    Aguadilla, PR       Newark, NJ        220.54
4    Aguadilla, PR      Orlando, FL        143.21

 Checking if DataFrames Match...

(Passed!)


Next, let's look at ground travel times. In the code cell below, we load a DataFrame, `ground_distances_cities`, that shows the average travel times (in hours) from one city to another if you did not take a plane. Note that the travel time from `A -> B` may not be the same as the travel time from `B -> A` because of traffic, waiting for trains, and so on.

(Also note: these are not true ground travel times. They are made up based on the distances between cities in terms of latitude/longitude. If you use these times to plan your next road trip, you may be in for a rude surprise!)

In [18]:
ground_distances_cities = pd.read_csv(datapaths['ground_distances_cities.csv'])
ground_distances_cities["Average_Travel_Time"] = ground_distances_cities["Average_Travel_Time"].round(2)
print(ground_distances_cities.head())

  Starting_City      Ending_City  Average_Travel_Time
0   Abilene, TX  Adak Island, AK                80.39
1   Abilene, TX    Aguadilla, PR                43.40
2   Abilene, TX       Albany, GA                19.51
3   Abilene, TX       Albany, NY                29.87
4   Abilene, TX  Albuquerque, NM                12.41


Next, we will assign each city a unique id and make a new DataFrame that shows the starting and ending cities in terms of their ids:

In [19]:
# Read city ids
city_ids = pd.read_csv(datapaths['city_ids.csv'], index_col="City")
city_codes_dict = city_ids.to_dict()["id"]
print('The First 5 Lines of the city_ids Dataset: ')
print(city_ids.head())

The First 5 Lines of the city_ids Dataset: 
                 id
City               
Abilene, TX       0
Adak Island, AK   1
Aguadilla, PR     2
Albany, GA        3
Albany, NY        4


In [20]:
city_ids.shape

(293, 1)

The name of the new DataFrame being generated is `gnd_travel_ids`. The first five lines of the DataFrame can be seen by running the code cell below:

In [21]:
# Generate gnd_travel_ids DataFrame

gnd_travel_ids = ground_distances_cities.copy()
gnd_travel_ids['Starting_City'] = gnd_travel_ids['Starting_City'].map(city_codes_dict)
gnd_travel_ids['Ending_City'] = gnd_travel_ids['Ending_City'].map(city_codes_dict)
print('The First 5 Lines of gnd_travel_ids: ')
print(gnd_travel_ids.head())

The First 5 Lines of gnd_travel_ids: 
   Starting_City  Ending_City  Average_Travel_Time
0              0            1                80.39
1              0            2                43.40
2              0            3                19.51
3              0            4                29.87
4              0            5                12.41


Now, we will put the values in the `gnd_travel_ids` DataFrame into a square table, in which the rows represent the origins and the columns represent the destinations.

**Exercise 3** (1 point). Create a **pandas `DataFrame`**, named `travel_matrix`, in which the entry at row `origin_id` and column `destination_id` is the `Average_Travel_Time` for that origin and destination. For instance, the entry for origin 0 and destination 1, `travel_matrix.loc[0, 1]`, should be approximately 80.39. There are 293 distinct city ids (ranging from 0 to 292).

(Note: The function `pivot_table()` in pandas may be helpful here. Also, the diagonal entries in the table represent the same origin and destination. Such entries must be equal to zero since the direct travel time from the origin to itself is 0. In the square table, any missing values must be filled by zero as well.)
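
To make the hint concrete, here is a small sketch of the long-to-square reshaping on a three-city table. The names `toy_long` and `toy_matrix` are hypothetical, and the table covers only city ids 0 to 2.

```python
import pandas as pd

# Toy long-format table for three city ids (no origin == destination rows).
toy_long = pd.DataFrame({'Starting_City': [0, 0, 1, 1, 2, 2],
                         'Ending_City':   [1, 2, 0, 2, 0, 1],
                         'Average_Travel_Time': [80.39, 43.40, 82.39, 115.49, 40.40, 117.49]})

# Passing `values` as a plain string (not a list) keeps the column labels single-level.
toy_matrix = toy_long.pivot_table(index='Starting_City',
                                  columns='Ending_City',
                                  values='Average_Travel_Time')

# (i, i) pairs never occur in the long table, so the diagonal is NaN until filled.
toy_matrix = toy_matrix.fillna(0)

# Drop the axis names so the result is a plain square table indexed by city id.
toy_matrix.index.name = None
toy_matrix.columns.name = None
print(toy_matrix)
```

With integer row and column labels, `toy_matrix.loc[0, 1]` reads as "from city 0 to city 1".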

In [22]:
#
travel_matrix = pd.pivot_table(gnd_travel_ids, values=['Average_Travel_Time'], 
                               index=['Starting_City'], 
                               columns=['Ending_City'],)

# travel_matrix.reset_index(drop=True, inplace=True)

travel_matrix.rename(index={'Starting_City' : None}, 
                     columns={'Ending_City' : None},
                     inplace=True)
travel_matrix.fillna(0, inplace=True)
travel_matrix.head(3)
#

Unnamed: 0_level_0,Average_Travel_Time,Average_Travel_Time,Average_Travel_Time,Average_Travel_Time,Average_Travel_Time,Average_Travel_Time,Average_Travel_Time,Average_Travel_Time,Average_Travel_Time,Average_Travel_Time,Average_Travel_Time,Average_Travel_Time,Average_Travel_Time,Average_Travel_Time,Average_Travel_Time,Average_Travel_Time,Average_Travel_Time,Average_Travel_Time,Average_Travel_Time,Average_Travel_Time,Average_Travel_Time
Ending_City,0,1,2,3,4,5,6,7,8,9,...,283,284,285,286,287,288,289,290,291,292
Starting_City,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
0,0.0,80.39,43.4,19.51,29.87,12.41,11.22,32.6,11.46,60.96,...,26.78,35.38,3.98,12.7,26.86,31.63,48.6,32.2,56.29,20.93
1,82.39,0.0,115.49,98.66,111.25,82.03,87.57,102.83,80.77,32.22,...,67.92,108.5,81.18,89.48,104.3,111.44,52.52,63.39,42.76,70.95
2,40.4,117.49,0.0,24.48,29.15,50.81,37.47,26.66,42.41,100.21,...,60.2,27.51,42.98,36.85,25.1,29.09,79.5,70.32,88.32,55.54


In [23]:
# .to_records() helped
upd_travel_mat = pd.DataFrame(travel_matrix.to_records())
upd_travel_mat.reset_index(drop=True, inplace=True)
upd_travel_mat.drop('Starting_City', axis=1, inplace=True)

col_names = []
for col in upd_travel_mat.columns:
    stripped = col.strip("(Average_Travel_Time' ,)")
    col_names.append(int(stripped))
    
upd_travel_mat.head()
upd_travel_mat.rename(columns=dict(zip(upd_travel_mat.columns, col_names)), inplace=True)
travel_matrix = upd_travel_mat.copy()

In [24]:
travel_matrix.sample(5)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,283,284,285,286,287,288,289,290,291,292
274,21.13,92.69,28.72,12.27,22.15,24.92,14.83,15.72,20.04,67.94,...,33.72,18.47,17.79,11.37,19.8,24.77,53.0,44.11,64.87,34.95
269,7.79,89.68,40.75,18.98,25.22,20.72,3.57,27.9,16.91,65.49,...,23.48,28.67,10.53,10.43,22.11,25.94,52.76,36.6,60.58,21.63
21,72.06,36.78,105.03,88.74,96.73,70.88,85.63,92.9,75.82,16.17,...,56.83,98.38,71.19,73.2,92.12,94.06,37.54,50.86,26.77,67.18
139,24.57,105.74,25.84,12.39,20.86,33.97,20.74,25.29,29.62,78.44,...,36.6,27.37,20.21,23.42,20.45,21.61,63.84,54.58,74.63,42.83
147,9.85,87.21,36.57,10.14,24.16,15.26,7.38,30.66,16.89,74.71,...,29.07,30.37,15.53,14.62,25.87,27.69,57.19,34.87,58.95,25.53


In [25]:
## TEST CODE PART 1, EXERCISE 3 - travel_matrix_1

assert type(travel_matrix) is pd.DataFrame, "`type(travel_matrix) == {}` instead of `pd.DataFrame`.".format(type(travel_matrix))

# Test 1 - All Diagonals in the Matrix are 0
print('Test 1: Are all Diagonals 0?')
travel_mat = np.array(travel_matrix)
assert np.all(np.diag(travel_mat) == 0) == True
print('Yes, all Diagonals are 0! \n')
# Test 2 - Dimensions
print('Test 2: Are the dimensions correct?')
row, col = travel_matrix.shape
assert row == col == 293
print('Yes, dimensions are correct! \n')


# Test 3 - Select Values in Matrix are the same
tol = 1
print('Test 3: Checking if Select Values are the Same...')
assert abs(travel_matrix[1][0] - 80.38) < tol
assert abs(travel_matrix[0][1] - 82.38) < tol
assert abs(travel_matrix[30][50] - 24.47) < tol
assert abs(travel_matrix[50][30] - 29.47) < tol
assert abs(travel_matrix[260][118] - 96.85) < tol
assert abs(travel_matrix[118][260] - 95.85) < tol
assert abs(travel_matrix[3][292] - 36.43) < tol
assert abs(travel_matrix[292][3] - 32.43) < tol
assert abs(travel_matrix[279][256] - 15.82) < tol
assert abs(travel_matrix[256][279] - 18.82) < tol
print('Great! Select Values are the Same!')

print('\n(Passed!)')

Test 1: Are all Diagonals 0?
Yes, all Diagonals are 0! 

Test 2: Are the dimensions correct?
Yes, dimensions are correct! 

Test 3: Checking if Select Values are the Same...
Great! Select Values are the Same!

(Passed!)


**Exercise 4** (3 points). Now write some code to compute a **2-D NumPy array** named `round_trip`, which contains the time taken to complete a round trip between every origin and destination that appears in the table `gnd_travel_ids`. Your table should be a square matrix. Entry `(i, j)` of the matrix must contain the total time to go from `i` to `j` and back to `i`.
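
One way to think about the round trip, assuming the one-way times are already arranged as a square matrix with origins as rows and destinations as columns: entry `(i, j)` of the transpose holds the `j -> i` time, so adding the matrix to its transpose gives `i -> j -> i`. The sketch below uses a three-city array; `one_way` and `toy_round_trip` are hypothetical names.

```python
import numpy as np

# One-way times for three cities (row = origin, column = destination).
one_way = np.array([[ 0.00,  80.39,  43.40],
                    [82.39,   0.00, 115.49],
                    [40.40, 117.49,   0.00]])

# Entry (i, j) of the transpose is the j -> i time, so the sum is i -> j -> i.
toy_round_trip = one_way + one_way.T
print(toy_round_trip)
```

The result is symmetric by construction, since the round trip `i -> j -> i` uses the same two legs as `j -> i -> j`.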

In [26]:
#
display(gnd_travel_ids.head())
display(gnd_travel_ids.shape)
#

Unnamed: 0,Starting_City,Ending_City,Average_Travel_Time
0,0,1,80.39
1,0,2,43.4
2,0,3,19.51
3,0,4,29.87
4,0,5,12.41


(85556, 3)

In [27]:
display(gnd_travel_ids['Starting_City'].nunique())
display(gnd_travel_ids['Ending_City'].nunique())

293

293

In [28]:
# Create a dictionary with round trip values
# keys are tuples of starting, and ending_city

round_trip_times = defaultdict(float)
avg_roundtrip_times = []

for i in range(len(gnd_travel_ids)):
    a = gnd_travel_ids.loc[i, 'Starting_City']
    b = gnd_travel_ids.loc[i, 'Ending_City']
    a_to_b = gnd_travel_ids.loc[i, 'Average_Travel_Time']
    
    idx = np.where((gnd_travel_ids['Starting_City'] == b) & (gnd_travel_ids['Ending_City'] == a))[0][0]
    b_to_a = gnd_travel_ids.loc[idx, 'Average_Travel_Time']
    
    round_trip_times[(a, b)] = a_to_b + b_to_a
    avg_roundtrip_times.append(a_to_b + b_to_a)

In [29]:
my_travel_times = gnd_travel_ids.copy()
my_travel_times['RoundTrip_Times'] = avg_roundtrip_times

round_trip = pd.pivot_table(data=my_travel_times, 
                            index='Starting_City',
                            columns='Ending_City', 
                            values='RoundTrip_Times')

round_trip.reset_index(drop=True, inplace=True)
round_trip = pd.DataFrame(round_trip.to_records())
round_trip.drop('index', axis=1, inplace=True)
round_trip.fillna(0, inplace=True)

# Need to format the indices, which are of str type
display(round_trip.index)
display(round_trip.columns)

round_trip.index = [int(i) for i in range(len(round_trip))]
round_trip.rename(columns=dict(zip(round_trip.columns, round_trip.index)), inplace=True)

round_trip.head()

RangeIndex(start=0, stop=293, step=1)

Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
       ...
       '283', '284', '285', '286', '287', '288', '289', '290', '291', '292'],
      dtype='object', length=293)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,283,284,285,286,287,288,289,290,291,292
0,0.0,162.78,83.8,42.02,66.74,20.82,24.44,65.2,23.92,123.92,...,44.56,65.76,14.96,25.4,54.72,70.26,97.2,63.4,106.58,43.86
1,162.78,0.0,232.98,196.32,221.5,159.06,182.14,207.66,165.54,69.44,...,141.84,216.0,162.36,172.96,208.6,220.88,107.04,122.78,81.52,141.9
2,83.8,232.98,0.0,50.96,62.3,101.62,72.94,53.32,82.82,198.42,...,116.4,59.02,79.96,75.7,49.2,56.18,157.0,131.64,180.64,112.08
3,42.02,196.32,50.96,0.0,37.58,55.38,31.72,40.28,42.78,148.34,...,71.92,34.34,32.02,41.16,26.72,43.04,123.5,90.62,139.24,68.86
4,66.74,221.5,62.3,37.58,0.0,74.4,53.9,17.32,67.8,169.78,...,84.74,16.36,69.4,54.34,33.84,13.76,136.32,109.72,140.92,101.06


In [34]:
## TEST CODE PART 2 OF 2, EXERCISE 4 - travel_matrix_2
import random
n_test = 1000
for _ in range(n_test):
    origin = random.randint(0, 292)
    dest = random.randint(0, 292)
    round_travel_time = round_trip.loc[origin, dest]
    o1 = gnd_travel_ids["Starting_City"] == origin
    d1 = gnd_travel_ids["Ending_City"] == dest
    d2 = gnd_travel_ids["Starting_City"] == dest
    o2 = gnd_travel_ids["Ending_City"] == origin
    if origin != dest:
        time = gnd_travel_ids[o1 & d1]["Average_Travel_Time"].values[0] + gnd_travel_ids[o2 & d2]["Average_Travel_Time"].values[0]
        assert time == round_travel_time

print('\n(Passed!)')


(Passed!)


**Fin!** You've reached the end of this problem. Don't forget to restart the
kernel and run the entire notebook from top-to-bottom to make sure you did
everything correctly. If that is working, try submitting this problem. (Recall
that you *must* submit and pass the autograder to get credit for your work!)