In [1]:
from geopy.distance import geodesic
from scipy.spatial import distance

import networkx as nx
import numpy as np
import pandas as pd
import random
import time
import warnings
warnings.filterwarnings('ignore')

## Brief Description of Model

## Initial Data and Setting "User-defined" Variables

In [2]:
california_cities = pd.read_csv('cal_cities_lat_long.csv')

california_cities.columns = ['city', 'latitude', 'longitude']

california_cities['coords'] = list(zip(california_cities['latitude'], california_cities['longitude']))

Road trips usually include breaks. With more time, the team would have the website ask users how often they expect to stop and how long each break would last. We might also acquire traffic data so that the model could estimate more accurately how many miles per hour a user can expect to travel.

Because of the project's time constraints, the prototype assumes that a user takes a 10-minute break after every 2 hours of driving. It also assumes an average driving speed of approximately 55 miles per hour.

In [3]:
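break_interval = 120  # minutes of driving between breaks
break_time = 10       # minutes per break

break_percentage = break_time / break_interval  # break time as a fraction of driving time
average_speed = 55    # miles per hour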

In [4]:
def calculate_travel_time(total_distance):
    """Estimate travel time in hours for a distance in miles, including breaks."""
    est_travel_time = (total_distance / average_speed) * (1 + break_percentage)
    return est_travel_time
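
As a quick check of the formula above, here is a minimal standalone sketch that takes the break settings as parameters instead of globals, which is roughly what a user-configurable version would need. The function name `estimate_travel_time` and its parameter names are hypothetical and are not used elsewhere in the notebook.

```python
def estimate_travel_time(total_distance_miles, average_speed_mph=55,
                         break_interval_min=120, break_time_min=10):
    """Return estimated trip time in hours, with break time added proportionally."""
    if average_speed_mph <= 0 or break_interval_min <= 0:
        raise ValueError("speed and break interval must be positive")
    driving_hours = total_distance_miles / average_speed_mph
    break_fraction = break_time_min / break_interval_min
    return driving_hours * (1 + break_fraction)


# 110 miles at 55 mph is 2 hours of driving plus 10 minutes of break time.
example_hours = estimate_travel_time(110)
print(f"{example_hours:.2f} hours")  # 2.17 hours
```

Note that the model spreads break time evenly over the drive (about 8.3% overhead) rather than counting whole breaks, so a trip shorter than 2 hours still picks up a small amount of break time.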

In [5]:
california_cities

Unnamed: 0,city,latitude,longitude,coords
0,Adelanto,34.582769,-117.409214,"(34.582769, -117.409214)"
1,Agoura Hills,34.153339,-118.761675,"(34.153339, -118.761675)"
2,Alameda,37.765206,-122.241636,"(37.765206, -122.241636)"
3,Albany,37.886869,-122.297747,"(37.886869, -122.297747)"
4,Alhambra,34.095286,-118.127014,"(34.095286, -118.127014)"
...,...,...,...,...
454,Woodland,38.678517,-121.773297,"(38.678517, -121.773297)"
455,Yorba Linda,33.888625,-117.813111,"(33.888625, -117.813111)"
456,Yreka,41.735419,-122.634472,"(41.735419, -122.634472)"
457,Yuba City,39.140447,-121.616911,"(39.140447, -121.616911)"


The next cell draws a random sample of 10 cities to simulate planning a road trip.

In [6]:
cities_sample = california_cities.sample(n=10)
coordinates = list(zip(cities_sample['latitude'], cities_sample['longitude']))

## Random Route (Baseline Model)

A useful way to assess the model is to compare it against a baseline that visits the same cities in a random order.

In [7]:
def calculate_random_route(route_coords):
    start_time = time.time()
    df = route_coords

    random_route = df.sample(frac=1).reset_index(drop=True)
    random_route = pd.concat([random_route, random_route.iloc[[0]]], ignore_index=True)

    def calculate_distance(coord1, coord2):
        return geodesic(coord1, coord2).miles

    distances = []
    for i in range(len(random_route) - 1):
        dist = calculate_distance((random_route.loc[i, 'latitude'], random_route.loc[i, 'longitude']),
                                (random_route.loc[i+1, 'latitude'], random_route.loc[i+1, 'longitude']))
        distances.append(dist)

    rolling_distances = pd.Series(distances).cumsum()

    random_route['next_city'] = random_route['city'].shift(-1)
    random_route['coords'] = list(zip(random_route['latitude'], random_route['longitude']))
    # The appended closing row has no onward leg, so its distance is 0
    random_route['distance_to_next'] = distances + [0.0]
    random_route['segment_travel_time'] = random_route['distance_to_next'].apply(calculate_travel_time)
    random_route_time_array = np.array(random_route['segment_travel_time'])
    random_route['rolling_travel_time'] = np.cumsum(random_route_time_array)
    random_route['total_rolling_distance'] = [0] + rolling_distances.tolist()

    # Copy the selected columns so the rounding below does not write to a view of random_route
    final_df = random_route[['city',
                             'coords',
                             'next_city',
                             'distance_to_next',
                             'total_rolling_distance',
                             'segment_travel_time',
                             'rolling_travel_time']].copy()
    final_df['distance_to_next'] = final_df['distance_to_next'].round(0)
    final_df['total_rolling_distance'] = final_df['total_rolling_distance'].round(0)
    final_df['segment_travel_time'] = final_df['segment_travel_time'].round(2)
    final_df['rolling_travel_time'] = final_df['rolling_travel_time'].round(2)

    
    print('Random Road Trip Route')

    display(final_df)

    print(f'Total Distance: {round(rolling_distances.iloc[-1])} miles')
    print(f'Estimated Travel Time: {round(calculate_travel_time(rolling_distances.iloc[-1]))} hours')
    end_time = time.time()

    # final_df.to_json('random_route_5.json', orient='records', lines=True)

    print(f'Runtime for Random Route: {round(end_time - start_time, 2)} seconds')
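
A single random shuffle can be unusually good or bad, so one way to make the baseline more stable is to average the round-trip length over many random orderings. The sketch below illustrates that idea under some assumptions: it uses a haversine great-circle distance as a dependency-free stand-in for `geopy`'s `geodesic`, and the helper names, `n_trials`, and `seed` are hypothetical.

```python
import math
import random


def haversine_miles(coord1, coord2):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*coord1, *coord2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(a))  # 3958.8 = mean Earth radius in miles


def round_trip_miles(route):
    """Length of the closed tour that visits `route` in order and returns to the start."""
    return sum(haversine_miles(route[i], route[(i + 1) % len(route)])
               for i in range(len(route)))


def mean_random_tour_miles(coords, n_trials=1000, seed=0):
    """Average closed-tour length over `n_trials` random visiting orders."""
    rng = random.Random(seed)  # seeded so the baseline is reproducible
    total = 0.0
    for _ in range(n_trials):
        total += round_trip_miles(rng.sample(coords, len(coords)))
    return total / n_trials


# Five coordinates taken from the first rows of the city table shown earlier.
sample_coords = [(34.582769, -117.409214), (34.153339, -118.761675),
                 (37.765206, -122.241636), (37.886869, -122.297747),
                 (34.095286, -118.127014)]
print(round(mean_random_tour_miles(sample_coords)), "miles on average")
```

Comparing the optimized route against this average, rather than against one draw, would make the reported savings less dependent on luck.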

## Optimal Route

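Note that both the optimal route function and the random route function calculate not only the distance and time for each leg between consecutive cities but also the rolling (cumulative) distance and time. The route itself comes from networkx's approximate traveling-salesman solver, so "optimal" here means the best route that the approximation finds.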

In [8]:
def calculate_optimal_route(df):
    coords_sample = df
    start_time = time.time()
    def calculate_distance_matrix(coords_sample):
        num_coords = len(coords_sample)
        distance_matrix = [[0] * num_coords for _ in range(num_coords)]

        for i in range(num_coords):
            for j in range(i + 1, num_coords):
                distance = geodesic(coords_sample[i], coords_sample[j]).miles
                distance_matrix[i][j] = distance
                distance_matrix[j][i] = distance
                
        return distance_matrix

    distance_matrix = calculate_distance_matrix(coords_sample)

    G = nx.Graph()

    # add nodes
    for i, coord in enumerate(coords_sample):
        G.add_node(i, pos=coord)

    # add edges with weights (the graph is undirected, so each pair is added once)
    for i in range(len(coords_sample)):
        for j in range(i + 1, len(coords_sample)):
            G.add_edge(i, j, weight=distance_matrix[i][j])

    tsp_path = nx.approximation.traveling_salesman_problem(G, weight='weight')


    total_distance = 0
    segment_distances = [0]
    for i in range(len(tsp_path) - 1):
        distance_traveled = G[tsp_path[i]][tsp_path[i + 1]]['weight']
        segment_distances.append(distance_traveled)
        total_distance += distance_traveled
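    # With the default cycle=True the path already ends at its start node, so the loop
    # above includes the return leg and this guard is a no-op (the graph has no self-loops)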
    if G.has_edge(tsp_path[-1], tsp_path[0]):
        distance_traveled = G[tsp_path[-1]][tsp_path[0]]['weight']
        segment_distances.append(distance_traveled)
        total_distance += distance_traveled

    # Map node indices to coordinates and city names
    route_coords = [coords_sample[node] for node in tsp_path]
    route_names = []
    for coord in route_coords:
        city_name = california_cities[california_cities['coords'] == coord]['city'].values[0]
        route_names.append(city_name)


    segment_travel_times = [calculate_travel_time(dist) for dist in segment_distances]
    rolling_travel_time = np.cumsum(segment_travel_times)
    rolling_distance = np.cumsum(segment_distances)

    optimal_route = pd.DataFrame({'city': route_names,
                                'coordinates': route_coords,
                                'segment_distance': segment_distances,
                                'rolling_distance': rolling_distance,
                                'segment_travel_time': segment_travel_times,
                                'rolling_travel_time': rolling_travel_time
                                })

    optimal_route['segment_distance'] = optimal_route['segment_distance'].round()
    optimal_route['rolling_distance'] = optimal_route['rolling_distance'].round()
    optimal_route['segment_travel_time'] = optimal_route['segment_travel_time'].round(2)
    optimal_route['rolling_travel_time'] = optimal_route['rolling_travel_time'].round(2)

    # Output the route and total distance
    print('Optimal Road Trip Route')
    display(optimal_route)
    print(f'Total Distance: {round(total_distance)} miles')

    print('Estimated Travel Time:', round(calculate_travel_time(total_distance)), 'hours')
    end_time = time.time()

    # optimal_route.to_json('optimal_route_5.json', orient='records', lines=True)
    print(f'Runtime for Optimized Route: {round(end_time - start_time, 2)} seconds')
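
Because `nx.approximation.traveling_salesman_problem` returns an approximate tour, it can be useful to check it against an exact answer on small inputs. The sketch below is a brute-force search, not the method used in this notebook: it fixes the start node and tries every ordering of the remaining stops, which is feasible for ten cities (9! = 362,880 orderings) but grows far too quickly beyond that. The function name and the toy distance matrix are hypothetical.

```python
from itertools import permutations


def exact_shortest_tour(distance_matrix):
    """Brute-force the shortest closed tour that starts and ends at node 0."""
    n = len(distance_matrix)
    best_length, best_tour = float("inf"), None
    for order in permutations(range(1, n)):
        tour = (0, *order, 0)
        # Sum the legs between consecutive stops, including the return to node 0
        length = sum(distance_matrix[a][b] for a, b in zip(tour, tour[1:]))
        if length < best_length:
            best_length, best_tour = length, tour
    return best_tour, best_length


# Hypothetical symmetric distances (miles) between four stops.
toy_matrix = [[0, 10, 15, 20],
              [10, 0, 35, 25],
              [15, 35, 0, 30],
              [20, 25, 30, 0]]
tour, length = exact_shortest_tour(toy_matrix)
print(tour, length)  # (0, 1, 3, 2, 0) 80
```

Passing the `distance_matrix` built inside `calculate_optimal_route` to a function like this would show how far the approximate total is from the true minimum for a given sample.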

## Find Nearest Interesting Activity

As noted at the end of the first notebook, the model also identifies which of ten popular attractions lies closest to any city on a user's road trip itinerary.

With more time, one improvement would be to report the closest city on the road trip for <u>EACH</u> of the ten attractions. This would give users a fuller picture of which attractions they might have time to visit.

In [9]:
attractions_names = ['Alcatraz Island', 'Balboa Park', 'Disneyland', 'Hearst Castle', 'Joshua Tree National Park',
                     'Legoland California', 'Malibu Beach', 'Redwoods National Park', 'Universal Studios Hollywood', 'Yosemite National Park']

attractions_latitudes = [37.8265991, 32.7325629, 33.8120294, 35.6852218, 33.8875175, 33.1264746, 34.0318786, 37.8488593, 34.1373322, 37.6727756]
attractions_longitudes = [-122.4228001, -117.1472597, -117.9190063, -121.1679822, -115.8082581, -117.3113757, -118.6880654, -119.5570877, -118.3532224, -119.7282411]

attractions_df = pd.DataFrame({'attraction': attractions_names, 'latitude': attractions_latitudes, 'longitude': attractions_longitudes})

display(attractions_df)

Unnamed: 0,attraction,latitude,longitude
0,Alcatraz Island,37.826599,-122.4228
1,Balboa Park,32.732563,-117.14726
2,Disneyland,33.812029,-117.919006
3,Hearst Castle,35.685222,-121.167982
4,Joshua Tree National Park,33.887518,-115.808258
5,Legoland California,33.126475,-117.311376
6,Malibu Beach,34.031879,-118.688065
7,Redwoods National Park,37.848859,-119.557088
8,Universal Studios Hollywood,34.137332,-118.353222
9,Yosemite National Park,37.672776,-119.728241


In [10]:
def calculate_nearest_attractions(df):
    start_time = time.time()
    coords_sample_df = df
    # beginning with an empty distance matrix
    distance_matrix_interesting = np.zeros((len(coords_sample_df), len(attractions_df)))

    # iterating through the coords_sample_df (sample of city coordinates, simulating a planned trip)
    # and the attractions_df to find the distance between all locations
    for i in range(len(coords_sample_df)):
        for j in range(len(attractions_df)):
            city_coords = (coords_sample_df.iloc[i]['latitude'], coords_sample_df.iloc[i]['longitude'])
            place_coords = (attractions_df.iloc[j]['latitude'], attractions_df.iloc[j]['longitude'])
            # using  geopy.distance.geodesic to calculate the distance between lat/lon in miles
            distance_matrix_interesting[i, j] = geodesic(city_coords, place_coords).miles

    # creating the dataframe of the distance matrix for easier inspection
    attraction_distance_df = pd.DataFrame(distance_matrix_interesting, index=coords_sample_df['city'], columns=attractions_df['attraction'])

    display(attraction_distance_df.round())

    # take the global minimum over the whole matrix (all cities and all attractions);
    # going through NumPy avoids relying on how DataFrame.min treats axis=None
    best_match = attraction_distance_df.to_numpy().min()

    # function to iterate through the provided dataframe looking for the provided value
    def search_dataframe(df, value):
        found_item = []
        for column in df.columns:
            for index in df.index:
                if df.loc[index, column] == value:
                    found_item.append((column, index, value))
                    print('The closest attraction on the trip is', column)
                    print(f'It is approximately {round(value)} miles away from {index}')
        if not found_item:
            print('Value not found in dataframe')

    search_dataframe(attraction_distance_df, best_match)
    end_time = time.time()
    print(f'Runtime for Attraction Search: {round(end_time - start_time, 2)} seconds')
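
The per-attraction improvement suggested at the start of this section could look roughly like the sketch below. It works on a plain dictionary of distances rather than the DataFrame, and the function name is hypothetical; the distances are three rows and three columns copied from the nearest-attraction example output in this notebook.

```python
# Distances in miles: city -> attraction -> distance (values from the example output).
distances = {
    "Orinda": {"Alcatraz Island": 14.0, "Disneyland": 368.0, "Hearst Castle": 161.0},
    "Pismo Beach": {"Alcatraz Island": 210.0, "Disneyland": 180.0, "Hearst Castle": 48.0},
    "Arcadia": {"Alcatraz Island": 354.0, "Disneyland": 24.0, "Hearst Castle": 207.0},
}


def nearest_city_per_attraction(city_to_attraction_miles):
    """Map each attraction to a (closest city, distance in miles) pair."""
    nearest = {}
    for city, row in city_to_attraction_miles.items():
        for attraction, miles in row.items():
            # Keep this city only if it beats the best distance seen so far
            if attraction not in nearest or miles < nearest[attraction][1]:
                nearest[attraction] = (city, miles)
    return nearest


for attraction, (city, miles) in nearest_city_per_attraction(distances).items():
    print(f"{attraction}: {city} ({miles:.0f} miles)")
# Alcatraz Island: Orinda (14 miles)
# Disneyland: Arcadia (24 miles)
# Hearst Castle: Pismo Beach (48 miles)
```

With the DataFrame built inside `calculate_nearest_attractions`, calling `attraction_distance_df.idxmin()` should give the same closest city for each attraction column.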

## Road Trip Example

Let's now pretend that a road trip has been planned. Suppose we want to visit the following ten cities:

In [11]:
cities_sample['city']

393          Soledad
414    Thousand Oaks
385       Sebastopol
449         Williams
121        Elk Grove
13           Arcadia
305      Pismo Beach
168    Hermosa Beach
284           Orinda
187          Isleton
Name: city, dtype: object

Now that we have our cities picked, we will throw caution to the wind and pick a route at random!

Afterwards, we will optimize that route in order to minimize the distance traveled. 

Additionally, we will try to find the closest of our top 10 favorite attractions to any of the cities we will be in.

### Random Route Example

In [12]:
calculate_random_route(cities_sample)

Random Road Trip Route


Unnamed: 0,city,coords,next_city,distance_to_next,total_rolling_distance,segment_travel_time,rolling_travel_time
0,Isleton,"(38.161861, -121.611622)",Soledad,432.0,0.0,8.51,8.51
1,Soledad,"(32.991156, -117.271147)",Arcadia,91.0,432.0,1.78,10.3
2,Arcadia,"(34.139728, -118.035344)",Pismo Beach,164.0,523.0,3.23,13.52
3,Pismo Beach,"(35.142753, -120.641283)",Williams,289.0,686.0,5.69,19.21
4,Williams,"(39.154614, -122.149419)",Elk Grove,66.0,975.0,1.31,20.52
5,Elk Grove,"(38.4088, -121.371617)",Orinda,57.0,1042.0,1.13,21.65
6,Orinda,"(37.877147, -122.179689)",Sebastopol,50.0,1099.0,0.99,22.64
7,Sebastopol,"(38.402136, -122.823881)",Thousand Oaks,398.0,1150.0,7.84,30.48
8,Thousand Oaks,"(34.107231, -118.057847)",Hermosa Beach,26.0,1548.0,0.51,30.99
9,Hermosa Beach,"(33.862236, -118.399519)",Isleton,347.0,1573.0,6.83,37.82


Total Distance: 1920 miles
Estimated Travel Time: 38 hours
Runtime for Random Route: 0.02 seconds


### Optimized Route Example

In [13]:
calculate_optimal_route(coordinates)

Optimal Road Trip Route


Unnamed: 0,city,coordinates,segment_distance,rolling_distance,segment_travel_time,rolling_travel_time
0,Solana Beach,"(32.991156, -117.271147)",0.0,0.0,0.0,0.0
1,Hermosa Beach,"(33.862236, -118.399519)",89.0,89.0,1.75,1.75
2,Pismo Beach,"(35.142753, -120.641283)",155.0,244.0,3.06,4.81
3,Orinda,"(37.877147, -122.179689)",207.0,451.0,4.08,8.89
4,Isleton,"(38.161861, -121.611622)",37.0,488.0,0.72,9.61
5,Elk Grove,"(38.4088, -121.371617)",21.0,509.0,0.42,10.03
6,Williams,"(39.154614, -122.149419)",66.0,576.0,1.31,11.34
7,Sebastopol,"(38.402136, -122.823881)",63.0,639.0,1.25,12.59
8,Arcadia,"(34.139728, -118.035344)",397.0,1036.0,7.82,20.41
9,Temple City,"(34.107231, -118.057847)",3.0,1039.0,0.05,20.46


Total Distance: 1128 miles
Estimated Travel Time: 22 hours
Runtime for Optimized Route: 0.03 seconds


### Nearest Attraction Example

In [14]:
calculate_nearest_attractions(cities_sample)

city,Alcatraz Island,Balboa Park,Disneyland,Hearst Castle,Joshua Tree National Park,Legoland California,Malibu Beach,Redwoods National Park,Universal Studios Hollywood,Yosemite National Park
Soledad,442.0,19.0,68.0,290.0,105.0,10.0,109.0,359.0,101.0,351.0
Thousand Oaks,354.0,108.0,22.0,207.0,130.0,80.0,37.0,271.0,17.0,263.0
Sebastopol,45.0,505.0,419.0,209.0,501.0,478.0,380.0,182.0,386.0,176.0
Williams,93.0,524.0,437.0,245.0,506.0,496.0,402.0,167.0,405.0,166.0
Elk Grove,70.0,458.0,371.0,188.0,440.0,430.0,337.0,106.0,339.0,103.0
Arcadia,354.0,110.0,24.0,207.0,129.0,81.0,38.0,270.0,18.0,261.0
Pismo Beach,210.0,261.0,180.0,48.0,289.0,236.0,135.0,196.0,148.0,182.0
Hermosa Beach,354.0,106.0,28.0,201.0,149.0,81.0,20.0,282.0,19.0,273.0
Orinda,14.0,454.0,368.0,161.0,451.0,427.0,329.0,143.0,335.0,135.0
Isleton,50.0,451.0,364.0,173.0,439.0,423.0,328.0,114.0,332.0,108.0


The closest attraction on the trip is Legoland California
It is approximately 10 miles away from Soledad
Runtime for Attraction Search: 0.05 seconds


Wonderful! Not only does the model work, its runtime is extremely short!

The following documents one specific example of the model's output (the cities are sampled at random, so the figures differ from the run shown above):

A random route to ten particular cities and back would result in traveling 2361 miles, which would take about 47 hours (including breaks)!

When we used the optimizer to generate a shorter route, we cut travel distance by about 43% and travel time by about 45%! We ended up with 1335 miles traveled across 26 hours.

One of the ten cities was only 4 miles away from the famous Balboa Park! That is definitely worth the short detour!
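
The savings in this example can be checked with a few lines of arithmetic. The helper name `percent_reduction` is hypothetical; the inputs are the distances and times quoted above.

```python
def percent_reduction(baseline, improved):
    """Percentage by which `improved` is smaller than `baseline`."""
    return 100 * (baseline - improved) / baseline


print(f"Distance saved: {percent_reduction(2361, 1335):.1f}%")  # 43.5%
print(f"Time saved: {percent_reduction(47, 26):.1f}%")          # 44.7%
```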

## Conclusion

Overall, the team thinks that the prototype has the potential to expand into a fantastic tool for road trip enthusiasts! In addition to the ideas for improvement already mentioned throughout the notebook, it would be interesting to investigate what supplemental data the website could provide about each city on a road trip itinerary.

So far we have the violent crime and Chipotle data, and it might be nice to have data on the cost of accommodation (e.g. hotel, Airbnb) in each city. This data would help users make informed decisions about where to stay overnight. Also, more restaurant data wouldn't hurt! Perhaps data about the locations of both cheap fast-food restaurants and high-end restaurants?