# Create Squareform Distance & Travel Time Matrices Using Google Maps Distance Matrix API


__Goal:__ Query `Google Maps Distance Matrix API` for distance and duration values. Create a `pandas` DataFrame with the query results. Build a square form distance matrix DataFrame, a travel time matrix in seconds DataFrame, and a travel time matrix in `hh:mm` format DataFrame, and save each one to its own comma-separated values (CSV) file.


We use the `Google Maps Distance Matrix API` to determine travel distance and duration values. To use this API, you need a Google Maps API key. These keys are available with a Google Account.

You must set up a billing account to use the Google Maps Platform. At the time of writing, the platform included $200 in free usage every month, which was enough to create multiple square form distance and duration matrices for free. Check the current pricing before running large batches of queries.

1. Go to the [Google Cloud Platform Console](https://console.cloud.google.com/getting-started).
2. Click the `navigation menu` button (the three horizontal lines in the upper left-hand corner).
3. Select `Billing`.
4. Set up your billing account.

Now, let's create a `distance-matrix` project.

1. Click the `navigation menu` button (the three horizontal lines in the upper left-hand corner).
2. Select `Home`.
3. Click on the project drop-down in the top navigation bar.
4. Click `NEW PROJECT`.
5. Enter `distance-matrix` in the `Project name` field.
6. Click `Create`.

Now, let's enable the necessary APIs.

1. Click on the project drop-down in the top navigation bar.
2. Select the `distance-matrix` project.
3. Click the `navigation menu` button (the three horizontal lines in the upper left-hand corner).
4. Select `APIs & Services`.
5. Click `+ ENABLE APIS & SERVICES`.
6. Search for and select the `Distance Matrix API`.
7. Click `ENABLE`.

Finally, let's create an API key.

1. Click the `navigation menu` button (the three horizontal lines in the upper left-hand corner).
2. Select `APIs & Services > Credentials`.
3. Click `+ CREATE CREDENTIALS`.
4. Select `API key`.
5. Copy your API key. You will use it shortly.

Now we move to a Jupyter notebook. To begin, we import the necessary libraries.

In [1]:
# Import necessary libraries
import googlemaps
from itertools import combinations
import numpy as np
import pandas as pd
from scipy.spatial.distance import squareform

We define a constant for the number of meters in a mile and a constant for our Google Maps API key (be sure to replace `YOUR_API_KEY` with your actual API key).

In [2]:
# Define meters in mile constant
METERS_IN_MILE = 1609.34

# Enter Google Maps API key (never publish a real key in a shared notebook)
GOOGLE_MAPS_API_KEY = "YOUR_API_KEY"
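A key typed directly into a notebook is easy to leak when the notebook is shared. As a minimal sketch of one alternative, assuming the key has been exported in an environment variable beforehand, we could read it at runtime. The variable name `GOOGLE_MAPS_API_KEY` and the helper `load_api_key` are illustrative choices, not part of the original notebook.

```python
import os


def load_api_key(env_var="GOOGLE_MAPS_API_KEY"):
    """Return the API key stored in an environment variable (hypothetical helper)."""
    key = os.environ.get(env_var)

    # Fail early with a clear message instead of sending an empty key to the API
    if not key:
        raise RuntimeError("Set the {} environment variable first.".format(env_var))

    return key
```

With this in place, the constant could be assigned with `GOOGLE_MAPS_API_KEY = load_api_key()` instead of a string literal.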

We need to create a list of locations to include in our distance and travel time matrices. In our case, we have a [venues text file](hard_data/all_venues.txt) containing the venue names we want to include.

In [3]:
# Print contents of venues text file
with open('hard_data/all_venues.txt', 'r') as f:
    
    print(f.read())

Amherst College
Bates College
Bowdoin College
Colby College
Connecticut College
Hamilton College
Middlebury College
Trinity College
Tufts University
Wesleyan University
Williams College



We use a list comprehension to read each line of the file, strip extraneous whitespace, and build a list of venue names from the stripped lines.

In [5]:
# Load venue names from file
with open('hard_data/all_venues.txt', 'r') as f:
    
    # Create list of venue names
    venue_names = [line.strip() for line in f]
    
# Preview list of venue names
venue_names

['Amherst College',
 'Bates College',
 'Bowdoin College',
 'Colby College',
 'Connecticut College',
 'Hamilton College',
 'Middlebury College',
 'Trinity College',
 'Tufts University',
 'Wesleyan University',
 'Williams College']
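If the venues file ends with blank lines, a plain `strip` would add empty strings to the list, and those would later be sent to the API as venue names. The sketch below filters them out. The helper name `read_venue_names` and the in-memory sample file are assumptions made for illustration.

```python
import io


def read_venue_names(lines):
    """Return stripped, non-empty venue names from an iterable of text lines."""
    stripped = (line.strip() for line in lines)
    return [name for name in stripped if name]


# Simulated file contents with trailing blank lines (hypothetical sample)
sample_file = io.StringIO("Amherst College\nBates College\n\n\n")
venue_sample = read_venue_names(sample_file)
print(venue_sample)
```

The same helper accepts an open file handle, since iterating over a file yields its lines.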

Looks good. We are now ready to harness the power of the `Google Maps Distance Matrix API`. 

We instantiate a `Google Maps API` session and initialize a data dictionary to hold location and travel distance and duration values. We will eventually use this dictionary to create a `pandas` DataFrame.

In [6]:
# Instantiate Google Maps API session
gmaps = googlemaps.Client(GOOGLE_MAPS_API_KEY)

# Initialize data dictionary to hold values
distance_duration_data = {
    'Venue 1': [],
    'Venue 2': [],
    'Distance (mi)': [],
    'Duration (s)': []
}

Let's check out an example pair of venues and query `Google Distance Matrix API` for the travel distance and duration values.

In [7]:
# Set example values
venue_1 = venue_names[0]
venue_2 = venue_names[1]

# Query Google Maps API for driving distance and duration
trip = gmaps.distance_matrix(
    origins=[venue_1],
    destinations=[venue_2],
    mode="driving",
    units="metric")

# Preview trip
trip

{'destination_addresses': ['2 Andrews Rd, Lewiston, ME 04240, USA'],
 'origin_addresses': ['Amherst, MA 01002, USA'],
 'rows': [{'elements': [{'distance': {'text': '332 km', 'value': 331964},
     'duration': {'text': '3 hours 24 mins', 'value': 12246},
     'status': 'OK'}]}],
 'status': 'OK'}

The client returns the JSON response as nested Python dictionaries and lists, but how do we extract the distance value? We take it one step at a time.

In [8]:
trip['rows']

[{'elements': [{'distance': {'text': '332 km', 'value': 331964},
    'duration': {'text': '3 hours 24 mins', 'value': 12246},
    'status': 'OK'}]}]

In [9]:
trip['rows'][0]

{'elements': [{'distance': {'text': '332 km', 'value': 331964},
   'duration': {'text': '3 hours 24 mins', 'value': 12246},
   'status': 'OK'}]}

In [10]:
trip['rows'][0]['elements']

[{'distance': {'text': '332 km', 'value': 331964},
  'duration': {'text': '3 hours 24 mins', 'value': 12246},
  'status': 'OK'}]

In [11]:
trip['rows'][0]['elements'][0]

{'distance': {'text': '332 km', 'value': 331964},
 'duration': {'text': '3 hours 24 mins', 'value': 12246},
 'status': 'OK'}

In [12]:
trip['rows'][0]['elements'][0]['distance']

{'text': '332 km', 'value': 331964}

In [13]:
trip['rows'][0]['elements'][0]['distance']['value']

331964

Success. 

We convert the distance from meters to miles and round to the nearest mile.

In [14]:
# Extract driving distance in meters
distance_m = trip['rows'][0]['elements'][0]['distance']['value']

# Convert meters to miles
distance_mi = round(distance_m / METERS_IN_MILE)

# Preview distance in miles
distance_mi

206

We use a similar process to extract the duration value in seconds.

In [15]:
# Extract driving duration in seconds
duration_s = trip['rows'][0]['elements'][0]['duration']['value']

# Preview duration in seconds
duration_s

12246
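The chain of lookups above can be wrapped in a small helper that also checks the `status` fields before reading values, since an element without a route has no `distance` or `duration` entry. This is a sketch under the assumption of a single origin and a single destination. The name `extract_distance_duration` is hypothetical, and the sample response is the one shown in the example query.

```python
def extract_distance_duration(trip):
    """Return (distance in meters, duration in seconds) for a one-to-one query."""
    # Top-level status covers the request as a whole
    if trip.get("status") != "OK":
        raise ValueError("Request failed with status {}".format(trip.get("status")))

    # Element-level status covers this specific origin and destination pair
    element = trip["rows"][0]["elements"][0]
    if element.get("status") != "OK":
        raise ValueError("No result for this pair: {}".format(element.get("status")))

    return element["distance"]["value"], element["duration"]["value"]


# Response copied from the example query output
sample_trip = {
    "destination_addresses": ["2 Andrews Rd, Lewiston, ME 04240, USA"],
    "origin_addresses": ["Amherst, MA 01002, USA"],
    "rows": [{"elements": [{"distance": {"text": "332 km", "value": 331964},
                            "duration": {"text": "3 hours 24 mins", "value": 12246},
                            "status": "OK"}]}],
    "status": "OK",
}

sample_distance_m, sample_duration_s = extract_distance_duration(sample_trip)
print(sample_distance_m, sample_duration_s)
```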

Looks good, but we need to find the distance and duration values for multiple combinations of locations. Let's take a step back.

We can use the `combinations` function from the standard library `itertools` module to create a list of two-venue tuples without repeated pairs. We skip the reverse of each pair because we assume direction does not matter for travel: we treat the drive from venue 1 to venue 2 as having the same distance and duration as the drive from venue 2 to venue 1. This is a simplification, since one-way streets and traffic can make the two directions differ slightly.

In [16]:
# Preview list of location combinations
list(combinations(venue_names, 2))[0:15]

[('Amherst College', 'Bates College'),
 ('Amherst College', 'Bowdoin College'),
 ('Amherst College', 'Colby College'),
 ('Amherst College', 'Connecticut College'),
 ('Amherst College', 'Hamilton College'),
 ('Amherst College', 'Middlebury College'),
 ('Amherst College', 'Trinity College'),
 ('Amherst College', 'Tufts University'),
 ('Amherst College', 'Wesleyan University'),
 ('Amherst College', 'Williams College'),
 ('Bates College', 'Bowdoin College'),
 ('Bates College', 'Colby College'),
 ('Bates College', 'Connecticut College'),
 ('Bates College', 'Hamilton College'),
 ('Bates College', 'Middlebury College')]
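It helps to know how many queries the loop will make before running it. For n venues, the number of unordered pairs is n(n - 1) / 2. The short check below confirms the count for our 11 venues, using integer placeholders in place of the venue names.

```python
from itertools import combinations
from math import comb

# Number of venues in the venues file
venue_count = 11

# Build every unordered pair of venue positions
pairs = list(combinations(range(venue_count), 2))

# Both counts should agree: 11 * 10 / 2 = 55
print(len(pairs), comb(venue_count, 2))
```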

We can now iterate over the list of venue combinations, query `Google Maps API` for distance and duration values, and store the extracted query results in our data dictionary. If the distance and duration values cannot be found for a venue combination, we raise an exception that names the two venues.

In [17]:
# Collect driving distance and duration data for each venue to venue combination
for (venue_1, venue_2) in combinations(venue_names, 2):

    try:

        # Query Google Maps API for driving distance and duration
        trip = gmaps.distance_matrix(
            origins=[venue_1],
            destinations=[venue_2],
            mode="driving",
            units="metric")

        # Extract driving distance in meters and convert to miles
        distance_m = trip['rows'][0]['elements'][0]['distance']['value']
        distance_mi = round(distance_m / METERS_IN_MILE)

        # Extract driving duration in seconds
        duration_s = trip['rows'][0]['elements'][0]['duration']['value']

        # Add values to data dictionary
        distance_duration_data['Venue 1'].append(venue_1)
        distance_duration_data['Venue 2'].append(venue_2)
        distance_duration_data['Distance (mi)'].append(distance_mi)
        distance_duration_data['Duration (s)'].append(duration_s)

    except Exception as exc:

        # Re-raise with the venue pair named, keeping the original error as the cause
        raise RuntimeError("Error finding the distance between {} and {}.".format(venue_1, venue_2)) from exc
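The loop above sends one request per pair. Because `distance_matrix` accepts a list of destinations, one possible variation is to send a single request per origin, covering all the venues that come after it in the list. The sketch below is an assumption-laden illustration rather than the notebook's method. The function name `collect_by_origin` is hypothetical, `client` is expected to behave like `googlemaps.Client`, and the service limits how many elements one request may contain, so check the current limits before using it with long venue lists.

```python
def collect_by_origin(client, venues, meters_in_mile=1609.34):
    """Collect pairwise records with one request per origin (hypothetical helper)."""
    records = []

    for i, origin in enumerate(venues[:-1]):
        # Only query venues that come later in the list, mirroring combinations()
        destinations = venues[i + 1:]
        response = client.distance_matrix(
            origins=[origin],
            destinations=destinations,
            mode="driving",
            units="metric")

        # One row per origin, one element per destination, in request order
        elements = response["rows"][0]["elements"]
        for destination, element in zip(destinations, elements):
            if element["status"] != "OK":
                raise ValueError("No result for {} and {}.".format(origin, destination))
            records.append({
                "Venue 1": origin,
                "Venue 2": destination,
                "Distance (mi)": round(element["distance"]["value"] / meters_in_mile),
                "Duration (s)": element["duration"]["value"],
            })

    return records
```

The records come back in the same order that `combinations` produces, so `pd.DataFrame(records)` would have the same row order as the DataFrame built from the data dictionary.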

Once we have iterated over all venue combinations, we create and preview a distance and duration DataFrame.

In [28]:
# Create distance and duration DataFrame
distance_duration_df = pd.DataFrame(distance_duration_data)

# Preview DataFrame
distance_duration_df.head()

| | Venue 1 | Venue 2 | Distance (mi) | Duration (s) |
|---|---|---|---|---|
| 0 | Amherst College | Bates College | 206 | 12246 |
| 1 | Amherst College | Bowdoin College | 199 | 11876 |
| 2 | Amherst College | Colby College | 246 | 14078 |
| 3 | Amherst College | Connecticut College | 99 | 6048 |
| 4 | Amherst College | Hamilton College | 206 | 11888 |


Not bad.

We ultimately want to create a square form distance matrix in miles DataFrame, travel time matrix in seconds DataFrame, and travel time matrix in `hh:mm` format DataFrame. We have distance values in miles. We have duration values in seconds, but we still need to calculate duration values in `hh:mm` format. 

Let's look at an example duration value in seconds.

In [29]:
distance_duration_df['Duration (s)'][0]

12246

We use the `divmod` function to divide our duration in seconds by 60 and store the resulting quotient in a `minutes` variable and the remainder in a `seconds` variable.

In [30]:
# Determine number of full minutes and remainder seconds
minutes, seconds = divmod(distance_duration_df['Duration (s)'][0], 60)

# Preview minutes and seconds
minutes, seconds

(204, 6)

We use the `divmod` function again to divide the `minutes` value by 60 and store the resulting quotient in an `hours` variable and the remainder in a `minutes` variable.

In [31]:
# Determine number of full hours and remainder minutes
hours, minutes = divmod(minutes, 60)

# Preview hours and minutes
hours, minutes

(3, 24)

We now have our `hours` and `minutes` values, and we can use string formatting to print the result in `hh:mm` format.

In [32]:
# Convert hours and minutes to HH:MM format
hhmm = "{:d}:{:02d}".format(hours, minutes)

# Preview HH:MM format
hhmm

'3:24'

All set. 

To make our lives easier, we can wrap the seconds to `hh:mm` conversion in a function.

In [33]:
def convert_seconds_to_hhmm(seconds):
    """
    Convert a value from seconds to hours and minutes in HH:MM format.
    
    """

    # Determine number of full minutes and remainder seconds
    minutes, seconds = divmod(seconds, 60)

    # Determine number of full hours and remainder minutes
    hours, minutes = divmod(minutes, 60)

    # Convert hours and minutes to HH:MM format
    hhmm = "{:d}:{:02d}".format(hours, minutes)

    return hhmm
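Note that this conversion discards the leftover seconds, so a duration of 3599 seconds is reported as `0:59`. If rounding to the nearest minute is preferred, a variant along the following lines could be used. The function name `convert_seconds_to_hhmm_rounded` is hypothetical and is not part of the original notebook.

```python
def convert_seconds_to_hhmm_rounded(seconds):
    """Convert seconds to H:MM, rounding to the nearest minute (hypothetical variant)."""
    # Adding 30 before integer division rounds half a minute and above upward
    total_minutes = (int(seconds) + 30) // 60

    # Split total minutes into full hours and remainder minutes
    hours, minutes = divmod(total_minutes, 60)

    return "{:d}:{:02d}".format(hours, minutes)


print(convert_seconds_to_hhmm_rounded(12246))  # 3:24
print(convert_seconds_to_hhmm_rounded(3599))   # 1:00
```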

We now create a new duration in `hh:mm` format column by applying this function to each value in the `Duration (s)` column. 

In [34]:
# Add column for duration in hh:mm format
distance_duration_df['Duration (hh:mm)'] = distance_duration_df['Duration (s)'].apply(convert_seconds_to_hhmm)

# Preview DataFrame
distance_duration_df.head()

| | Venue 1 | Venue 2 | Distance (mi) | Duration (s) | Duration (hh:mm) |
|---|---|---|---|---|---|
| 0 | Amherst College | Bates College | 206 | 12246 | 3:24 |
| 1 | Amherst College | Bowdoin College | 199 | 11876 | 3:17 |
| 2 | Amherst College | Colby College | 246 | 14078 | 3:54 |
| 3 | Amherst College | Connecticut College | 99 | 6048 | 1:40 |
| 4 | Amherst College | Hamilton College | 206 | 11888 | 3:18 |


Perfect.

We ultimately want to use our distance and travel time matrices in correspondence with another DataFrame that identifies the same list of venues with a unique integer ID rather than a string venue name. For this to work, we need the venue identifiers in our matrices to align with the venue identifiers in the other DataFrame. This will be our final step before we create distance and travel time matrices. 

The venue name to venue ID mapping in the other DataFrame assigns consecutive IDs to an alphabetically-sorted (A to Z) list of venue names. We can build a corresponding venue name to ID mapping dictionary.

In [35]:
# Sort list of venue names
venue_names.sort()

# Create dictionary for venue name to venue id mapping
name_to_id_mapping = {venue: idx for idx, venue in enumerate(venue_names)}

# Preview dictionary
name_to_id_mapping

{'Amherst College': 0,
 'Bates College': 1,
 'Bowdoin College': 2,
 'Colby College': 3,
 'Connecticut College': 4,
 'Hamilton College': 5,
 'Middlebury College': 6,
 'Trinity College': 7,
 'Tufts University': 8,
 'Wesleyan University': 9,
 'Williams College': 10}

We then use the DataFrame `replace` method to apply the mapping of venue names to IDs in the `Venue 1` and `Venue 2` columns of our distance and duration DataFrame.

In [36]:
# Map venue name to venue id in DataFrame
distance_duration_df = distance_duration_df.replace({
    'Venue 1': name_to_id_mapping,
    'Venue 2': name_to_id_mapping})

# Preview DataFrame
distance_duration_df.head()

| | Venue 1 | Venue 2 | Distance (mi) | Duration (s) | Duration (hh:mm) |
|---|---|---|---|---|---|
| 0 | 0 | 1 | 206 | 12246 | 3:24 |
| 1 | 0 | 2 | 199 | 11876 | 3:17 |
| 2 | 0 | 3 | 246 | 14078 | 3:54 |
| 3 | 0 | 4 | 99 | 6048 | 1:40 |
| 4 | 0 | 5 | 206 | 11888 | 3:18 |
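One thing to keep in mind is that `replace` leaves a value unchanged when it has no entry in the dictionary, so a misspelled venue name would silently remain a string. As an alternative sketch, `Series.map` returns `NaN` for unmapped values, which makes such cases easy to detect. The small DataFrame and mapping below are illustrative samples built from values shown earlier.

```python
import pandas as pd

# Small sample using venue names and distances from the preview above
sample_df = pd.DataFrame({
    "Venue 1": ["Amherst College", "Amherst College"],
    "Venue 2": ["Bates College", "Bowdoin College"],
    "Distance (mi)": [206, 199],
})
sample_mapping = {"Amherst College": 0, "Bates College": 1, "Bowdoin College": 2}

# Map each venue column; names missing from the mapping would become NaN
for column in ["Venue 1", "Venue 2"]:
    sample_df[column] = sample_df[column].map(sample_mapping)

# Confirm every venue name was found in the mapping
assert not sample_df[["Venue 1", "Venue 2"]].isna().any().any()
print(sample_df)
```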


Looks good.

While mapping is on the mind, let's build a reciprocal dictionary that maps venue IDs to venue names. This will come into play during testing.

In [37]:
# Create dictionary for venue id to venue name mapping
id_to_name_mapping = {idx: venue for idx, venue in enumerate(venue_names)}

# Preview dictionary
id_to_name_mapping

{0: 'Amherst College',
 1: 'Bates College',
 2: 'Bowdoin College',
 3: 'Colby College',
 4: 'Connecticut College',
 5: 'Hamilton College',
 6: 'Middlebury College',
 7: 'Trinity College',
 8: 'Tufts University',
 9: 'Wesleyan University',
 10: 'Williams College'}

Now we are ready to create a square form distance matrix DataFrame. 

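To begin, we create a list of distance values. The list follows the row order of our distance and duration DataFrame, which is the order produced by `combinations`: (0, 1), (0, 2), and so on up to (9, 10). This is the condensed ordering that `squareform` expects, and it lines up with the venue IDs because the venue names were already in alphabetical order when we generated the combinations.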

In [46]:
# Create list of distance values sorted by venue
distances = distance_duration_df['Distance (mi)'].tolist()

# Preview distance list
distances[0:5]

[206, 199, 246, 99, 206]

We now create a list of venue IDs. We know this list is sorted by venue since the `NumPy` function `unique` returns a sorted array of the unique values.

In [47]:
# Create list of venue ids (sorted by venue) as plain Python integers
venue_id_list = np.unique(distance_duration_df[['Venue 1', 'Venue 2']].values).tolist()

# Preview list of venue ids
venue_id_list

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Let's apply the `SciPy` `squareform` function to our distance list and check out the result.

In [48]:
squareform(distances)

array([[  0, 206, 199, 246,  99, 206, 153,  53,  87,  67,  60],
       [206,   0,  21,  49, 240, 406, 261, 234, 140, 247, 249],
       [199,  21,   0,  50, 233, 400, 255, 227, 134, 240, 242],
       [246,  49,  50,   0, 280, 446, 301, 274, 180, 286, 288],
       [ 99, 240, 233, 280,   0, 267, 245,  52, 111,  40, 158],
       [206, 406, 400, 446, 267,   0, 170, 220, 281, 235, 144],
       [153, 261, 255, 301, 245, 170,   0, 198, 183, 213, 108],
       [ 53, 234, 227, 274,  52, 220, 198,   0, 108,  16, 111],
       [ 87, 140, 134, 180, 111, 281, 183, 108,   0, 119, 130],
       [ 67, 247, 240, 286,  40, 235, 213,  16, 119,   0, 126],
       [ 60, 249, 242, 288, 158, 144, 108, 111, 130, 126,   0]])
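To see how `squareform` places each value, it can help to look at a three-venue case. The sketch below uses the distances among the first three venues (206, 199, and 21 miles, taken from the matrix above) and checks that the i-th condensed value lands at the position given by the i-th pair from `combinations`.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import squareform

# Condensed distances in pair order: (0, 1), (0, 2), (1, 2)
condensed = [206, 199, 21]
square = squareform(condensed)
print(square)

# Each condensed value should appear at [i, j] and [j, i] for its pair
pair_order = list(combinations(range(3), 2))
for (i, j), value in zip(pair_order, condensed):
    assert square[i, j] == value and square[j, i] == value

# Applying squareform to the square matrix recovers the condensed vector
assert np.array_equal(squareform(square), condensed)
```

This is why the row order of the distance and duration DataFrame matters: if the rows were shuffled, the values would land in the wrong cells.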

Not bad, but we can do better.

We create a distance matrix DataFrame by using the `squareform` function in combination with `pandas`. We assign the list of venue IDs as the index and columns of our DataFrame, and we preview the result. 

In [49]:
# Create distance matrix DataFrame
distance_matrix = pd.DataFrame(
    squareform(distances),
    index=venue_id_list,
    columns=venue_id_list)

# Preview distance matrix DataFrame
distance_matrix

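| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 206 | 199 | 246 | 99 | 206 | 153 | 53 | 87 | 67 | 60 |
| 1 | 206 | 0 | 21 | 49 | 240 | 406 | 261 | 234 | 140 | 247 | 249 |
| 2 | 199 | 21 | 0 | 50 | 233 | 400 | 255 | 227 | 134 | 240 | 242 |
| 3 | 246 | 49 | 50 | 0 | 280 | 446 | 301 | 274 | 180 | 286 | 288 |
| 4 | 99 | 240 | 233 | 280 | 0 | 267 | 245 | 52 | 111 | 40 | 158 |
| 5 | 206 | 406 | 400 | 446 | 267 | 0 | 170 | 220 | 281 | 235 | 144 |
| 6 | 153 | 261 | 255 | 301 | 245 | 170 | 0 | 198 | 183 | 213 | 108 |
| 7 | 53 | 234 | 227 | 274 | 52 | 220 | 198 | 0 | 108 | 16 | 111 |
| 8 | 87 | 140 | 134 | 180 | 111 | 281 | 183 | 108 | 0 | 119 | 130 |
| 9 | 67 | 247 | 240 | 286 | 40 | 235 | 213 | 16 | 119 | 0 | 126 |
| 10 | 60 | 249 | 242 | 288 | 158 | 144 | 108 | 111 | 130 | 126 | 0 |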


It looks like our distance matrix is all set.

We use a similar series of steps to create a travel time matrix in seconds DataFrame. To begin, we create a list of travel time in seconds values sorted by venue.

In [50]:
# Create list of travel time in seconds values sorted by venue
durations_s = distance_duration_df['Duration (s)'].tolist()

# Preview travel time in seconds list
durations_s[0:5]

[12246, 11876, 14078, 6048, 11888]

We then use the list of venue IDs created for our distance matrix in combination with `pandas` to create a travel time in seconds matrix DataFrame.

In [51]:
# Create travel time in seconds matrix DataFrame
duration_matrix_s = pd.DataFrame(
    squareform(durations_s),
    index=venue_id_list,
    columns=venue_id_list)

# Preview travel time in seconds matrix DataFrame
duration_matrix_s

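| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 12246 | 11876 | 14078 | 6048 | 11888 | 10498 | 3664 | 6314 | 4469 | 5173 |
| 1 | 12246 | 0 | 2356 | 3109 | 13325 | 22484 | 15980 | 13177 | 8045 | 13879 | 15772 |
| 2 | 11876 | 2356 | 0 | 3163 | 13081 | 22239 | 15736 | 12933 | 7801 | 13635 | 15528 |
| 3 | 14078 | 3109 | 3163 | 0 | 15201 | 24359 | 17856 | 15052 | 9921 | 15754 | 17647 |
| 4 | 6048 | 13325 | 13081 | 15201 | 0 | 14848 | 15203 | 3158 | 6622 | 2388 | 9519 |
| 5 | 11888 | 22484 | 22239 | 24359 | 14848 | 0 | 12569 | 12379 | 15630 | 13184 | 9181 |
| 6 | 10498 | 15980 | 15736 | 17856 | 15203 | 12569 | 0 | 12739 | 11680 | 13544 | 7771 |
| 7 | 3664 | 13177 | 12933 | 15052 | 3158 | 12379 | 12739 | 0 | 6410 | 1461 | 7066 |
| 8 | 6314 | 8045 | 7801 | 9921 | 6622 | 15630 | 11680 | 6410 | 0 | 7080 | 9822 |
| 9 | 4469 | 13879 | 13635 | 15754 | 2388 | 13184 | 13544 | 1461 | 7080 | 0 | 7803 |
| 10 | 5173 | 15772 | 15528 | 17647 | 9519 | 9181 | 7771 | 7066 | 9822 | 7803 | 0 |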


Perfect.

Now we are ready to create our final DataFrame: a travel time matrix in `hh:mm` format. Repeating a similar series of steps, we begin by creating a list of travel time in `hh:mm` format values sorted by venue.

In [44]:
# Create list of travel time in hh:mm format values sorted by venue
durations_hhmm = distance_duration_df['Duration (hh:mm)'].tolist()

# Preview travel time in hh:mm format list
durations_hhmm[0:5]

['3:24', '3:17', '3:54', '1:40', '3:18']

We then use the list of venue IDs created for our distance matrix in combination with `pandas` to create a travel time matrix in `hh:mm` format DataFrame.

In [45]:
# Create travel time in hh:mm format matrix DataFrame
duration_matrix_hhmm = pd.DataFrame(
    squareform(durations_hhmm),
    index=venue_id_list,
    columns=venue_id_list)

# Preview travel time in hh:mm format matrix DataFrame
duration_matrix_hhmm

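| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | 3:24 | 3:17 | 3:54 | 1:40 | 3:18 | 2:54 | 1:01 | 1:45 | 1:14 | 1:26 |
| 1 | 3:24 | | 0:39 | 0:51 | 3:42 | 6:14 | 4:26 | 3:39 | 2:14 | 3:51 | 4:22 |
| 2 | 3:17 | 0:39 | | 0:52 | 3:38 | 6:10 | 4:22 | 3:35 | 2:10 | 3:47 | 4:18 |
| 3 | 3:54 | 0:51 | 0:52 | | 4:13 | 6:45 | 4:57 | 4:10 | 2:45 | 4:22 | 4:54 |
| 4 | 1:40 | 3:42 | 3:38 | 4:13 | | 4:07 | 4:13 | 0:52 | 1:50 | 0:39 | 2:38 |
| 5 | 3:18 | 6:14 | 6:10 | 6:45 | 4:07 | | 3:29 | 3:26 | 4:20 | 3:39 | 2:33 |
| 6 | 2:54 | 4:26 | 4:22 | 4:57 | 4:13 | 3:29 | | 3:32 | 3:14 | 3:45 | 2:09 |
| 7 | 1:01 | 3:39 | 3:35 | 4:10 | 0:52 | 3:26 | 3:32 | | 1:46 | 0:24 | 1:57 |
| 8 | 1:45 | 2:14 | 2:10 | 2:45 | 1:50 | 4:20 | 3:14 | 1:46 | | 1:58 | 2:43 |
| 9 | 1:14 | 3:51 | 3:47 | 4:22 | 0:39 | 3:39 | 3:45 | 0:24 | 1:58 | | 2:10 |
| 10 | 1:26 | 4:22 | 4:18 | 4:54 | 2:38 | 2:33 | 2:09 | 1:57 | 2:43 | 2:10 | |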


Looks good.

We now save our distance and travel time matrix DataFrames to CSV files so we can access them elsewhere.

In [None]:
# Save distance matrix DataFrame to CSV
distance_matrix.to_csv('data/venue_one_way_distance_matrix.csv')

# Save duration matrix in seconds DataFrame to CSV
duration_matrix_s.to_csv('data/venue_one_way_duration_matrix_seconds.csv')

# Save duration matrix in HH:MM DataFrame to CSV
duration_matrix_hhmm.to_csv('data/venue_one_way_duration_matrix_hhmm.csv')
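When these files are read back with `pandas`, the first column needs to be restored as the index, and the column labels arrive as strings rather than integers. The sketch below shows one way to handle both, using an in-memory buffer in place of a file and a three-venue sample matrix built from values shown earlier.

```python
import io

import pandas as pd

# Three-venue sample matrix with integer venue IDs as index and columns
sample_matrix = pd.DataFrame(
    [[0, 206, 199], [206, 0, 21], [199, 21, 0]],
    index=[0, 1, 2],
    columns=[0, 1, 2])

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
sample_matrix.to_csv(buffer)
buffer.seek(0)

# Restore the first column as the index, then convert column labels to integers
loaded = pd.read_csv(buffer, index_col=0)
loaded.columns = loaded.columns.astype(int)

print(loaded.loc[0, 1])
```

After the conversion, lookups by venue ID such as `loaded.loc[0, 1]` work the same way on the reloaded matrix as on the original.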

Done!

One final thought. Suppose we want to spot-check several distance values from the distance matrix to confirm that we built the square form matrix correctly. We import the necessary library and define the number of tests to run and the acceptable error threshold in miles. We also initialize a test success boolean.

In [54]:
# Import necessary library
import random

# Define number of tests to run
num_tests = 3

# Define maximum acceptable error threshold in miles
max_acceptable_error = 5

# Initialize test success boolean --> True until proven False
test_success = True

We then run the specified number of tests. We will break down this code piece by piece.

In [55]:
# Run specified number of tests
for test in range(num_tests):

    # Select random venue 1 and venue 2 locations from distance matrix
    random_venue_1 = random.choice(list(distance_matrix.index))
    random_venue_2 = random.choice(list(distance_matrix.index))

    # Find distance value in distance matrix DataFrame (label-based lookup by venue id)
    dm_distance = distance_matrix.loc[random_venue_1, random_venue_2]

    # Query Google Maps API for distance value in miles (reuses the gmaps client created earlier)
    trip = gmaps.distance_matrix(
        origins=[id_to_name_mapping[random_venue_1]],
        destinations=[id_to_name_mapping[random_venue_2]],
        mode="driving",
        units="metric")
    gmaps_distance = round(trip['rows'][0]['elements'][0]['distance']['value'] / METERS_IN_MILE)

    # Check if distance value falls outside error threshold
    if abs(dm_distance - gmaps_distance) > max_acceptable_error:

        print("Test - FAILURE")
        print("...Venue 1: {}".format(random_venue_1))
        print("...Venue 2: {}".format(random_venue_2))
        print("...Distance Matrix: {}".format(dm_distance))
        print("...Google Maps API Query: {}".format(gmaps_distance))

        # Update test success boolean and stop tests
        test_success = False
        break

if test_success:
    
    print("Test - SUCCESS")

Test - SUCCESS


For each test, we select a random `venue 1` and `venue 2` value from our distance matrix. These random values will be integers as our final distance matrix identified venues using unique integer IDs rather than string venue names.

We then find the corresponding distance value in our distance matrix, and we query `Google Maps Distance Matrix API` for the travel distance. Note that to query `Google Maps API`, we need to map each random venue ID to the corresponding venue name. To do this, we use the venue ID to name mapping dictionary that we created earlier.

We calculate the absolute difference between our distance matrix distance and our Google Maps API distance. If this difference exceeds our acceptable error threshold, then our test fails. We print an error message, update our test success boolean, and stop running tests. If we make it through our specified number of tests without failure, then our test succeeds, and we print a message accordingly.

We can use a similar block of code to test several travel time values in seconds or `hh:mm` format from the travel time matrices. Note that our acceptable error threshold would be in seconds rather than miles.
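Some properties of a square form matrix can also be checked without calling the API at all: the matrix should be square and symmetric, with zeros on the diagonal and the same labels on rows and columns. The following is a minimal sketch of such a check. The function name `check_square_matrix` is hypothetical, and the sample uses the distances among the first three venues.

```python
import numpy as np
import pandas as pd


def check_square_matrix(matrix):
    """Return True if a numeric matrix DataFrame is square, symmetric, and zero on the diagonal."""
    values = matrix.to_numpy()

    # A non-square matrix cannot be a valid square form matrix
    if values.ndim != 2 or values.shape[0] != values.shape[1]:
        return False

    is_symmetric = bool(np.array_equal(values, values.T))
    has_zero_diagonal = bool(np.all(np.diag(values) == 0))
    labels_match = list(matrix.index) == list(matrix.columns)

    return is_symmetric and has_zero_diagonal and labels_match


# Three-venue sample built from distances shown in the distance matrix
sample_matrix = pd.DataFrame(
    [[0, 206, 199], [206, 0, 21], [199, 21, 0]],
    index=[0, 1, 2],
    columns=[0, 1, 2])

print(check_square_matrix(sample_matrix))
```

This check applies to the distance matrix and the travel time matrix in seconds. The `hh:mm` matrix holds strings, so its diagonal would need to be compared against empty strings instead of zeros.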