### Route Visualization using Geopy, Polyline, and Folium
In this notebook, we create an interactive route visualization using Geopy, Polyline, and Folium.

Originally, our project proposal said we would use the Google Maps API, but through experimentation it proved difficult to work with. You can obtain an API key from Google, but many features cannot be accessed unless you provide billing information. With billing information on file there is a free tier, but it limits the number of geocoding requests you can make, and requests over that threshold are charged to your account. That's not something I think we want to deal with for a school project, so I looked for alternative methods.

### Modules
The main modules we'll use throughout the code are Pandas and Requests.
- **Pandas**: We use Pandas to create the dataframe containing all of the addresses and their corresponding longitude and latitude coordinates. This dataframe will also be used later in selecting which specific locations we want to visualize.
- **Requests**: This module allows us to make HTTP requests in Python. We use Requests to query OSRM for the route data between our locations (the coordinates of each address come from geocoding, covered below).

In [1]:
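# main modules to use
import pandas as pd # create the locations dataframe
import requests # send HTTP requests

# run the web-scraping notebooks so that the helper functions used below (such as visit_tourist,
# visit_hotel, and find_location) are defined here; %run looks for the .ipynb files in the current
# working directory and reports "File ... not found" if they are missing
%run webscraping_code_touristsite_all.ipynb
%run webscraping_code_hotel_all.ipynb
%run webscraping_code_restaurant_all.ipynb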

### Obtaining the Locations
We create dataframes from the CSV files produced by web scraping; they contain all of the scraped attractions and hotels. We then take user input and build the list of attractions and hotels that the user wants to visit. I will add restaurants later.

In [None]:
# Attractions dataframe
df = pd.read_csv("2520_touristsite.csv")
df = df.drop(columns='Unnamed: 0') # drop the leftover index column saved in the CSV
df

In [37]:
tourist_site_names = list(df['Tourist Site Name'].unique())
tourist_site_names

['The Getty Center',
 'Griffith Observatory',
 'Universal Studios Hollywood',
 'Petersen Automotive Museum',
 'The Wizarding World of Harry Potter',
 'Battleship USS Iowa Museum',
 'The Broad',
 'Staples Center',
 'Griffith Park',
 'The Grove',
 'La Brea Tar Pits and Museum',
 'Walt Disney Concert Hall',
 'Natural History Museum of Los Angeles County',
 'Runyon Canyon Park',
 'Venice Canals Walkway',
 'The Nethercutt Collection',
 'Dodger Stadium',
 'Union Station',
 'Hollywood Sign',
 'Bradbury Building',
 'Lake Hollywood Park',
 'Universal CityWalk Hollywood',
 'Los Angeles County Museum of Art',
 'Madame Tussauds Hollywood',
 'Abbot Kinney Boulevard',
 'Angels Flight Railway',
 'University of California, Los Angeles (UCLA)',
 'Hollywood Forever Cemetery',
 'Citadel Outlets',
 'Pantages Theatre',
 'The Hollywood Museum',
 'OUE Skyspace LA',
 'Los Angeles Central Library',
 'Olvera Street',
 'Pierce Brothers Westwood Village Memorial Park',
 'Dolby Theatre',
 'The Greek Theatre',
 'Ci

In [38]:
want_to_go_name = input("Which tourist sites do you want to go to? (*please input the names of the tourist sites separated by a period) ")

Which tourist sites do you want to go to? (*please input the names of the tourist sites separated by a period) Stahl House.Museum of Illusions
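The raw input string is passed to `visit_tourist` from the scraping notebook, whose code isn't shown here. As a hypothetical sketch of the parsing this prompt implies, the helper below splits on periods, trims whitespace, and checks each name against the scraped list of site names. `parse_site_names` and the short `known` list are illustrative assumptions, not the scraping notebook's actual code.

```python
def parse_site_names(raw, known_names):
    """Hypothetical helper: split period-separated input and match it against known site names."""
    # split on the period separator and drop empty pieces (e.g. a trailing period)
    requested = [name.strip() for name in raw.split(".") if name.strip()]
    known = set(known_names)
    matched = [name for name in requested if name in known]
    unmatched = [name for name in requested if name not in known]
    return matched, unmatched


# illustrative subset of scraped names
known = ["Stahl House", "Museum of Illusions", "The Broad"]
matched, unmatched = parse_site_names("Stahl House.Museum of Illusions.Getty", known)
print(matched)    # ['Stahl House', 'Museum of Illusions']
print(unmatched)  # ['Getty']
```

One caveat of using a period as the separator: a site whose name itself contains a period would be split into two unrecognized pieces.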


In [126]:
# Alternative to %run (requires the `ipynb` package): import just the helpers this notebook calls.
# The module path is the notebook's file name without the .ipynb extension.
from ipynb.fs.full.webscraping_code_touristsite_all import visit_tourist, find_location

In [39]:
visit_html = visit_tourist(df, want_to_go_name)

In [40]:
# list of attractions
locations = find_location(visit_html)
locations

['1635 Woods Dr, Los Angeles, CA 90069-1633"',
 '6751 Hollywood Blvd, Los Angeles, CA 90028-4623"']

In [41]:
# Hotels dataframe
hotels = pd.read_csv("450_hotel.csv")
hotels = hotels.drop(columns='Unnamed: 0') # drop the leftover index column saved in the CSV
hotels

|     | Rank | Hotel Name | Rate | Site Link |
|-----|------|------------|------|-----------|
| 0 | 1 | Hollywood Hotel | 4 of 5 bubbles | https://www.tripadvisor.com//Hotel_Review-g326... |
| 1 | 2 | Hilton Los Angeles/Universal City | 4 of 5 bubbles | https://www.tripadvisor.com//Hotel_Review-g326... |
| 2 | 3 | Hotel Erwin | 4.5 of 5 bubbles | https://www.tripadvisor.com//Hotel_Review-g326... |
| 3 | 4 | Hotel Figueroa | 4.5 of 5 bubbles | https://www.tripadvisor.com//Hotel_Review-g326... |
| 4 | 5 | Loews Hollywood Hotel | 4.5 of 5 bubbles | https://www.tripadvisor.com//Hotel_Review-g326... |
| ... | ... | ... | ... | ... |
| 445 | 208 | One Ten Motel | 3.5 of 5 bubbles | https://www.tripadvisor.com//Hotel_Review-g326... |
| 446 | 216 | Casa Luan Motel | 4.5 of 5 bubbles | https://www.tripadvisor.com//Hotel_Review-g326... |
| 447 | 223 | Howard Johnson Plaza Wilshire Royale | 4 of 5 bubbles | https://www.tripadvisor.com//Hotel_Review-g326... |
| 448 | 224 | Travelodge Hotel At LAX | 5 of 5 bubbles | https://www.tripadvisor.com//Hotel_Review-g326... |


In [42]:
hotel_want_to_go_name = input("Which hotel do you want to stay at? ")
hotel_visit_html = visit_hotel(hotels, hotel_want_to_go_name)

Which hotel do you want to stay at? Loews Hollywood Hotel


In [43]:
# list of hotels
hotel_location = find_location(hotel_visit_html)
hotel_location

['1755 North Highland Avenue, Los Angeles, CA 90028-4403',
 '1755 North Highland Avenue, Los Angeles, CA 90028-4403']

### Geocoding
Another module we will be using is **Geopy**, which is a Python client for geocoding. What is geocoding?

**Geocoding** is the process of taking a text-based description of a location (usually an address) and transforming it into geographic coordinates, i.e., a latitude/longitude pair. We will use geocoding to transform the addresses of our hotels and attractions (obtained from web scraping) into coordinates since, as far as I know, TripAdvisor doesn't provide coordinates.

From the Geopy module, we will use **Nominatim**, which is a geocoder for **OpenStreetMap (OSM)** data. OSM is a map of the world that is open data, licensed under the Open Database License. As such, we will credit OSM at the end of the notebook. You can check out OSM here:

https://www.openstreetmap.org/

In [44]:
from geopy.geocoders import Nominatim # Geopy is a Python client for geocoding. We use the Nominatim geocoder for OpenStreetMap (OSM) data.

We create a sample locations dataframe using Nominatim geocoding. First, we clean up the list of addresses using the location_cleaner function. This makes it much more likely that each address can be geocoded, since OSM sometimes has trouble recognizing addresses that have extra details; addresses that still can't be recognized are dropped.

Then, we create the attractions dataframe and hotels dataframe by adding their address and longitude/latitude coordinates.

In [45]:
# import re

In [46]:
# Cleans the addresses by removing the ZIP code at the end and then removing the last word of the address with each
# iteration of the loop, until OSM recognizes the address. OSM sometimes has trouble recognizing addresses that have
# extra details. Addresses that are never recognized are left out of the returned list.
# Note: this uses the global `geolocator` created in the next cell.
def location_cleaner(locations):
    locations_copy = []
    for location in locations:
        location = location.rsplit(' ', 1)[0] # remove the ZIP code
        for _ in range(len(location.split()), 0, -1):
            if geolocator.geocode(location) is not None:
                locations_copy.append(location)
                break
            location = location.rsplit(' ', 1)[0] # remove the last word and try again
    return locations_copy
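The truncation idea in location_cleaner can be tried offline by passing the geocoder in as a function. In this sketch, `fake_geocode` is a hypothetical stand-in for `geolocator.geocode` that only "recognizes" one hard-coded address, and `clean_address` is a hypothetical single-address version of the loop; neither is part of geopy.

```python
# Hypothetical stand-in for geolocator.geocode: "recognizes" only the addresses in KNOWN.
KNOWN = {"1755 North Highland Avenue, Los Angeles, CA"}


def fake_geocode(query):
    return query if query in KNOWN else None


def clean_address(address, geocode):
    """Hypothetical single-address version of the location_cleaner loop."""
    candidate = address.rsplit(" ", 1)[0]  # drop the trailing ZIP code first
    while True:
        if geocode(candidate) is not None:
            return candidate
        if " " not in candidate:
            return None  # nothing recognizable is left
        candidate = candidate.rsplit(" ", 1)[0]  # remove the last word and retry


print(clean_address("1755 North Highland Avenue, Los Angeles, CA 90028-4403", fake_geocode))
print(clean_address("Nowhere Road 12345", fake_geocode))  # None
```

The second call shows why location_cleaner can return fewer addresses than it was given: an address the geocoder never recognizes is dropped. Every attempt in the real function is a Nominatim request, so it is worth keeping Nominatim's usage policy (at most one request per second) in mind; geopy provides `RateLimiter` in `geopy.extra.rate_limiter` for throttling calls.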

In [47]:
geolocator = Nominatim(user_agent = "pic16b")

In [48]:
df = pd.DataFrame(columns = ["Address", "Longitude", "Latitude"]) # creating the Pandas dataframe

locations = location_cleaner(locations)
for location in locations:
    loc = geolocator.geocode(location) # geocoding the location
    df.loc[len(df.index)] = [loc, loc.longitude, loc.latitude] # adding the location, longitude, and latitude to the dataframe
    
df

|     | Address | Longitude | Latitude |
|-----|---------|-----------|----------|
| 0 | (Case Study House No. 22, 1635, Woods Drive, H... | -118.370138 | 34.1004 |
| 1 | (6751, Hollywood Boulevard, Whitley Heights Hi... | -118.337259 | 34.101472 |


We can see that we have obtained the longitude and latitude for each address and successfully added the data to a Pandas dataframe. A quick Google search suggests that our coordinates are accurate to about 3 decimal places (roughly 100 m).
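To put "about 3 decimal places" in perspective, the haversine formula gives the great-circle distance between two latitude/longitude points. The sketch below assumes a spherical Earth with a mean radius of 6,371 km (an approximation) and measures a 0.001-degree step north-south and east-west near the Stahl House's latitude. `haversine_m` is a hypothetical helper, not something the notebook uses.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# a 0.001-degree step at about 34.1 degrees north
print(round(haversine_m(34.1004, -118.3701, 34.1014, -118.3701)))  # north-south: about 111 m
print(round(haversine_m(34.1004, -118.3701, 34.1004, -118.3711)))  # east-west: about 92 m
```

East-west steps are shorter because lines of longitude converge toward the poles (the distance scales with the cosine of the latitude). Either way, three decimal places locate a point to within roughly 100 m, which is enough for placing markers on a city map.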

In [91]:
# may or may not need a dataframe depending on if we allow a user to stay at multiple hotels
hotel = pd.DataFrame(columns = ["Address", "Longitude", "Latitude"]) # creating the Pandas dataframe

hotel_locations = location_cleaner(hotel_location)
print(hotel_locations)
for location in hotel_locations:
    loc = geolocator.geocode(location) # geocoding the location
    # DataFrame.append was removed in pandas 2.0, so we add the row with .loc, as we did for the attractions
    hotel.loc[len(hotel.index)] = [loc, loc.longitude, loc.latitude] # adding the location, longitude, and latitude to the dataframe

hotel

['1755 North Highland Avenue, Los Angeles, CA', '1755 North Highland Avenue, Los Angeles, CA']


|     | Address | Longitude | Latitude |
|-----|---------|-----------|----------|
| 0 | (1755, North Highland Avenue, Hollywood, Los A... | -118.338602 | 34.102842 |
| 1 | (1755, North Highland Avenue, Hollywood, Los A... | -118.338602 | 34.102842 |


In [93]:
hotel = hotel.drop_duplicates(subset = ["Longitude", "Latitude"])
hotel

|     | Address | Longitude | Latitude |
|-----|---------|-----------|----------|
| 0 | (1755, North Highland Avenue, Hollywood, Los A... | -118.338602 | 34.102842 |


In [94]:
hotel.iloc[0]["Address"]

Location(1755, North Highland Avenue, Hollywood, Los Angeles, California, 90028, United States, (34.10284167850925, -118.3386019049562, 0.0))

### Creating the Route Visualization
Now that we have our addresses and coordinates within a Pandas dataframe, we can begin to construct the route visualization. For this section, I mainly referenced this article: https://www.thinkdatascience.com/post/2020-03-03-osrm/osrm/

In the article, Mr. Yan makes use of **Open Source Routing Machine (OSRM)**, a C++ routing engine that finds shortest paths in road networks. Essentially, you use the requests module to send an HTTP request and get back route data for the routes between your specified locations. You can learn more about OSRM and how to construct requests here: http://project-osrm.org/docs/v5.7.0/api/?language=Python#general-options
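Following the OSRM documentation linked above, a route request has the form `{server}/route/v1/{profile}/{coordinates}?{options}`, where the coordinates are `longitude,latitude` pairs joined by semicolons. The helper below is a hypothetical sketch (not from the article) that only builds the URL string, so it can be checked without contacting a server. `overview` and `steps` are two of the documented route options.

```python
from urllib.parse import urlencode


def build_osrm_route_url(coords, profile="car", base="http://router.project-osrm.org", **options):
    """Hypothetical helper. coords: (longitude, latitude) pairs in visiting order."""
    if len(coords) < 2:
        raise ValueError("the route service needs at least two coordinates")
    # OSRM wants "lon,lat" pairs separated by semicolons
    path = ";".join(f"{lon},{lat}" for lon, lat in coords)
    url = f"{base}/route/v1/{profile}/{path}"
    if options:
        url += "?" + urlencode(options)  # e.g. overview=full&steps=true
    return url


url = build_osrm_route_url(
    [(-118.300293, 34.118219), (-118.361879, 34.138321)],  # Griffith Observatory -> Universal Studios
    overview="full",
    steps="true",
)
print(url)
```

Adding a third `(longitude, latitude)` pair to the list is all it takes to request a route through an extra stop; the resulting URL could then be passed to `requests.get`.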

For the route visualization, we also import the **Folium** and **Polyline** modules. 
- **Folium**: The folium module takes location and route data and plots it on an interactive map. 
- **Polyline**: The polyline module is the Python implementation of Google's Encoded Polyline Algorithm Format, which allows you to store a series of coordinates as an encoded string. You can read about it here: https://developers.google.com/maps/documentation/utilities/polylinealgorithm
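To see what `polyline.decode` does, here is a minimal from-scratch sketch of the algorithm as Google's documentation describes it: each coordinate is multiplied by 10^5 and rounded, stored as a difference from the previous point, zig-zag encoded (shift left one bit, invert if negative), and split into 5-bit chunks that are offset by 63 to produce printable characters. This is for illustration only; the notebook keeps using the `polyline` package. The test string is the worked example from Google's polyline documentation.

```python
def decode_polyline(encoded, precision=5):
    """Decode an encoded polyline string into a list of (lat, lon) tuples."""
    factor = 10 ** precision
    coords = []
    index = lat = lon = 0
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # one value for latitude, then one for longitude
            result, shift = 0, 0
            while True:
                chunk = ord(encoded[index]) - 63  # undo the printable-ASCII offset
                index += 1
                result |= (chunk & 0x1F) << shift  # 5 bits at a time, lowest bits first
                shift += 5
                if chunk < 0x20:  # continuation bit clear: last chunk of this value
                    break
            # undo the zig-zag step: odd results encode negative values
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]  # each point is stored as an offset from the previous one
        lon += deltas[1]
        coords.append((lat / factor, lon / factor))
    return coords


def encode_polyline(coords, precision=5):
    """Encode (lat, lon) tuples; the inverse of decode_polyline."""
    factor = 10 ** precision
    out = []
    prev_lat = prev_lon = 0
    for lat, lon in coords:
        ilat, ilon = round(lat * factor), round(lon * factor)
        for delta in (ilat - prev_lat, ilon - prev_lon):
            value = ~(delta << 1) if delta < 0 else delta << 1  # zig-zag encode the sign
            while value >= 0x20:
                out.append(chr((0x20 | (value & 0x1F)) + 63))  # set the continuation bit
                value >>= 5
            out.append(chr(value + 63))
        prev_lat, prev_lon = ilat, ilon
    return "".join(out)


example = "_p~iF~ps|U_ulLnnqC_mqNvxq`@"  # worked example from Google's documentation
print(decode_polyline(example))  # [(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]
```

Storing differences instead of absolute coordinates is what keeps the string short: consecutive route points are close together, so most deltas fit in a few characters.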

In [50]:
import folium # visualizes data on map; wondering if I can replace this with plotly express
import polyline # Python implementation of Google’s Encoded Polyline Algorithm Format

Let's attempt to get the route between the Griffith Observatory and Universal Studios Hollywood. We specify our request URL to be the default OSRM URL, followed by the mode of transportation (car) and the coordinates for our locations, each written as longitude,latitude. The first pair of coordinates corresponds to the Griffith Observatory, and the second pair corresponds to Universal Studios Hollywood.

In [119]:
# The following code will be based on the code in the above article. I don't fully understand what it's doing yet, but I'm 
# hoping to get some more practice with it so I can understand it better.

# we specify the long/lat coordinates of our starting point (Griffith Observatory) and our ending point (Universal Studios
# Hollywood), as well as the mode of transportation; hopefully we can add more waypoints and get the route between multiple locations
# url = "http://router.project-osrm.org/route/v1/car/-118.300293,34.118219;-118.361879,34.138321"
# r = requests.get(url) # getting the request
# res = r.json()
# res

We have successfully run the request! In the results, we get some interesting data but also what looks like a lot of gibberish. The gibberish is the route's geometry, which OSRM returns by default as a string encoded with Google's Encoded Polyline Algorithm. Now, let's break down the more interesting parts of the data (bear with me, this is new to me as well):
- **Waypoints**: A waypoint is an intermediate point or place on a route. In our case, we get a list of all the locations we travel to in our route, each one snapped to the nearest road. For this example, we only have two locations, our starting point and ending point, so only two locations show up in the waypoints list. This leads me to believe that it is very possible to construct a route between multiple sightseeing locations and a hotel location, which is one of the goals of our project.
- **Routes**: The routes list contains the route used to travel through the different locations. By default OSRM returns a single route (it only returns more when alternative routes are requested), so in our example there is one route. Adding more locations does not add more routes; instead, the route gets one more entry in its `legs` list, with one leg per pair of consecutive locations.
    - **Geometry**: This encoding gives us the line representing the route. 
    - **Distance**: The distance specified per entry in the routes list gives the distance in meters that the route covers.
    - **Duration**: The duration specified per entry in the routes list gives the duration of the trip in seconds.
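To make these fields concrete, here is a sketch that walks a response-shaped dictionary. The dictionary is hand-built, not a real OSRM reply: it keeps only the fields discussed above, and its numbers are copied from the Day 1 route that get_route returns later in this notebook. It also shows a detail that is easy to miss: OSRM gives each `location` as [longitude, latitude], while Folium expects (latitude, longitude). `summarize` is a hypothetical helper.

```python
# Hand-built dictionary with the same shape as (part of) an OSRM route response.
sample_response = {
    "code": "Ok",
    "waypoints": [
        {"location": [-118.338725, 34.102842]},  # hotel
        {"location": [-118.33726, 34.101554]},   # intermediate stop
        {"location": [-118.370134, 34.099999]},  # final stop
    ],
    "routes": [{"distance": 4409.3, "duration": 504.9}],
}


def summarize(res):
    """Hypothetical helper: flip waypoints to (lat, lon) and convert units."""
    stops = [(wp["location"][1], wp["location"][0]) for wp in res["waypoints"]]
    best = res["routes"][0]  # a single route unless alternatives are requested
    return {
        "stops": stops,
        "distance_km": best["distance"] / 1000,  # meters -> kilometers
        "duration_min": best["duration"] / 60,   # seconds -> minutes
    }


summary = summarize(sample_response)
print(summary["stops"][0])  # (34.102842, -118.338725)
print(summary["distance_km"], summary["duration_min"])
```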

In [120]:
# We use the polyline module to decode the encoding into coordinates. Each decoded tuple is a (latitude, longitude)
# point along the route's line; drawing straight segments between consecutive points traces the path on the map,
# which is why new points tend to appear wherever the road bends or we turn onto a different street.

# polyline.decode('yj~nEh|brUzG~AeAhH~IwBjL}Tx_@eXag@wIkDoGpFyHzkBo{Ahk@yT{@iGmTyhAuwD{iGu_AcjBy@whHacAJU}kBkd@a@aT_kBqZyI{Xf`@{\\hJlGdJZ|VnG_L|MkAqW}AeGyOz\\iJ`Y{^jZlHpSjkBbNXSxgBc_@~s@cz@rWshAfgAsVj{@qNpQcC}D')

### Creating the Travel Plan
From our attractions dataframe we can create a sample travel plan for the user. This will be a random itinerary for the user depending on how long they plan to travel. We will determine how many attractions they need to visit per day in order to visit all of the places they want. We will also create separate visualizations for each of the days. Each route will start at the user's hotel and end at the final location (which is randomly determined - I might have to experiment to see if I can generate the shortest route **per day**, though I feel this might be a lot of extra work to complete). 
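The number of attractions per day comes from a simple rule: each day takes the remaining number of attractions divided (rounding down) by the remaining number of days. The sketch below isolates that rule from the random selection. `attractions_per_day` is a hypothetical name; it mirrors the counting loop in our `locations_per_day` function rather than replacing it.

```python
def attractions_per_day(number_of_locs, travel_length):
    """Hypothetical helper mirroring the counting loop in locations_per_day."""
    counts = []
    for days_left in range(travel_length, 0, -1):
        today = number_of_locs // days_left  # equal share of what is left, rounded down
        counts.append(today)
        number_of_locs -= today
    return counts


print(attractions_per_day(7, 3))  # [2, 2, 3]
print(attractions_per_day(2, 3))  # [0, 1, 1]
```

The counts always add up to the number of attractions, and no two days differ by more than one. The second call shows the edge case mentioned in the note at the end of this notebook: with more days than attractions, some days get zero attractions.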

In [54]:
import random

In [100]:
# Determines the number of attractions that the user must visit per day and randomly chooses attractions for each day.
def locations_per_day(df, travel_length):
    travel_length = int(travel_length)
    loc_per_day = []
    number_of_locs = df.shape[0]
    while travel_length != 0:
        loc_per_day.append(number_of_locs // travel_length)
        number_of_locs -= (number_of_locs // travel_length)
        travel_length -= 1
    random.shuffle(loc_per_day) # obtain a random ordering of locations per day
    
    coordinates = [0] * len(loc_per_day)
    used_coordinates = []
    addresses = [0] * len(loc_per_day)
    
    # creates nested lists of tuples within a list that indicate the coordinates of the places you visit per day
    for i in range(len(coordinates)):
        coordinates[i] = []
        addresses[i] = []
        for j in range(0, loc_per_day[i]):
            row = df.sample()
            long_lat = (row.iloc[0]["Longitude"], row.iloc[0]["Latitude"])
            while long_lat in used_coordinates:
                row = df.sample()
                long_lat = (row.iloc[0]["Longitude"], row.iloc[0]["Latitude"])
            coordinates[i].append(long_lat)
            used_coordinates.append(long_lat)
            addresses[i].append(row.iloc[0]["Address"])
            
    
    return (coordinates, addresses)

In [101]:
# user input
travel_length = input("How many days will your travel be? ")
coordinates, addresses = locations_per_day(df, travel_length)
coordinates

How many days will your travel be? 1


[[(-118.3372585, 34.1014721), (-118.37013841063524, 34.1004003)]]

In [102]:
addresses

[[Location(6751, Hollywood Boulevard, Whitley Heights Historic District, Hollywood, Los Angeles, California, 90028, United States, (34.1014721, -118.3372585, 0.0)),
  Location(Case Study House No. 22, 1635, Woods Drive, Hollywood Hills West, Los Angeles, California, 90069, United States, (34.1004003, -118.37013841063524, 0.0))]]
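`locations_per_day` picks rows with `df.sample()` and re-draws whenever it hits a coordinate it has already used. If two rows ever share identical coordinates, that re-draw loop can never finish, because there are fewer unique coordinates than picks. An alternative, sketched below under the assumption that we only need the (longitude, latitude) tuples, is to shuffle everything once and cut the shuffled list into chunks of the per-day sizes, which uses each entry exactly once. With pandas, `df.sample(frac=1)` returns a shuffled copy of the dataframe that could be sliced the same way. `split_into_days` is a hypothetical helper, not part of the notebook.

```python
import random


def split_into_days(items, counts, seed=None):
    """Hypothetical helper: shuffle once, then cut into consecutive chunks of the given sizes."""
    if sum(counts) != len(items):
        raise ValueError("counts must add up to the number of items")
    rng = random.Random(seed)  # a seed makes the itinerary reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    days, start = [], 0
    for count in counts:
        days.append(shuffled[start:start + count])
        start += count
    return days


# (longitude, latitude) tuples taken from earlier cells
coords = [(-118.3372585, 34.1014721), (-118.37013841063524, 34.1004003), (-118.300293, 34.118219)]
days = split_into_days(coords, [1, 2], seed=0)
print(days)
```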

#### Old get_route Function
Obsolete, do not run!

In [118]:
# Putting what we did before into a function. We get the route through three locations provided we have the coordinates
# for the locations. Ideally, I will get the coordinates from the Pandas dataframe, though I'm still figuring that out.
def get_route_old(lon_0, lat_0, lon_1, lat_1, lon_2, lat_2):
    
    loc = "{},{};{},{};{},{}".format(lon_0, lat_0, lon_1, lat_1, lon_2, lat_2)
    url = "http://router.project-osrm.org/route/v1/car/"
    r = requests.get(url + loc) # same thing as what we did before, getting the request
    print(r.status_code)
    if r.status_code != 200: # HTTP 200 means OK; any other status code means the request failed
        return {}
  
    res = r.json()   
    routes = polyline.decode(res['routes'][0]['geometry']) # the geometry is the polyline encoding of the whole route (all stops), so index 0 is enough
    start_point = [res['waypoints'][0]['location'][1], res['waypoints'][0]['location'][0]] # 0th waypoint corresponds to the starting location
    end_point = [res['waypoints'][-1]['location'][1], res['waypoints'][-1]['location'][0]] # last waypoint corresponds to the ending location
    distance = res['routes'][0]['distance']
    
    out = {'route':routes,
           'start_point':start_point,
           'end_point':end_point,
           'distance':distance
          } # returning a dictionary with the routes, starting point, ending point, and distance

    return out

#### New get_route Function
This function generates the route for a single day, given that day's list of coordinates (one inner list of the nested `coordinates` list); we call it once per day. The route begins at the hotel, passes through the tourist sites as waypoints, and ends at the final tourist site. The user can specify the mode of transportation.

In [189]:
def get_route(coordinates, hotel, transportation):
    # OSRM expects "lon,lat" pairs separated by semicolons, starting with the hotel
    points = [(hotel.iloc[0]["Longitude"], hotel.iloc[0]["Latitude"])] + list(coordinates)
    # points.append(points[0]) # unable to perform a round trip
    loc = ";".join("{},{}".format(lon, lat) for lon, lat in points)
    # routing.openstreetmap.de has a separate endpoint per mode: routed-car, routed-bike, routed-foot
    url = "https://routing.openstreetmap.de/routed-" + transportation + "/route/v1/driving/"
    r = requests.get(url + loc + "?steps=true") # same thing as what we did before, getting the request
    if r.status_code != 200:
        print("Failed")
        return {}
  
    res = r.json()
    routes = polyline.decode(res['routes'][0]['geometry']) # the geometry specifies the polyline encoding
    # OSRM returns each location as [longitude, latitude]; Folium expects [latitude, longitude]
    start_point = [res['waypoints'][0]['location'][1], res['waypoints'][0]['location'][0]] # first waypoint is the hotel
    end_point = [res['waypoints'][-1]['location'][1], res['waypoints'][-1]['location'][0]] # last waypoint is the final tourist site
    waypoints = [(wp['location'][1], wp['location'][0]) for wp in res['waypoints'][1:-1]] # tourist sites in between
    distance = res['routes'][0]['distance'] # meters
    duration = res['routes'][0]['duration'] # seconds

    out = {'route':routes,
           'start_point':start_point,
           'waypoints': waypoints,
           'end_point':end_point,
           'distance':distance,
           'duration':duration
          } # returning a dictionary with the route, starting point, waypoints, ending point, distance, and duration

    return out

In [190]:
transportation = str(input("What mode of transportation would you like to use? Specify car, bike, or foot. "))

What mode of transportation would you like to use? Specify car, bike, or foot. car


In [191]:
# obtain the list of routes for the entire travel
route_list = []
for i in range(len(coordinates)):
    test_route = get_route(coordinates[i], hotel, transportation)
    route_list.append(test_route)
route_list

[{'route': [(34.10284, -118.33873),
   (34.10155, -118.33872),
   (34.10155, -118.33726),
   (34.10156, -118.33701),
   (34.10054, -118.33699),
   (34.10054, -118.33871),
   (34.09798, -118.33868),
   (34.09808, -118.36044),
   (34.09797, -118.36672),
   (34.0996, -118.36712),
   (34.10001, -118.36702),
   (34.09996, -118.36768),
   (34.10057, -118.36749),
   (34.10059, -118.36837),
   (34.10095, -118.36817),
   (34.10105, -118.36828),
   (34.10105, -118.36898),
   (34.10078, -118.36912),
   (34.09988, -118.36829),
   (34.09958, -118.3683),
   (34.09997, -118.36935),
   (34.09989, -118.36979),
   (34.1, -118.37013)],
  'start_point': [34.102842, -118.338725],
  'waypoints': [(34.101554, -118.33726)],
  'end_point': [34.099999, -118.370134],
  'distance': 4409.3,
  'duration': 504.9}]

In [182]:
for point in route_list[0]["route"]: # each entry is one (latitude, longitude) point along the route
    print(point)

(34.10284, -118.33873)
(34.10155, -118.33872)
(34.10155, -118.33726)
(34.10156, -118.33701)
(34.10054, -118.33699)
(34.10054, -118.33871)
(34.09798, -118.33868)
(34.09808, -118.36044)
(34.09797, -118.36672)
(34.0996, -118.36712)
(34.10001, -118.36702)
(34.09996, -118.36768)
(34.10057, -118.36749)
(34.10059, -118.36837)
(34.10095, -118.36817)
(34.10105, -118.36828)
(34.10105, -118.36898)
(34.10078, -118.36912)
(34.09988, -118.36829)
(34.09958, -118.3683)
(34.09997, -118.36935)
(34.09989, -118.36979)
(34.1, -118.37013)


In [176]:
# list of colors for plotting different routes
list_colors = [
    "red",
    "orange",
    "yellow",
    "green",
    "blue",
    "purple"
]

### Creating the Maps
We plot an individual route and specify the route color (this is mostly to differentiate the routes). We use a green play icon to indicate the starting point, a red stop icon to indicate the ending point, and a blue icon to indicate waypoints.
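get_map centers the map on the midpoint of the start and end points with a fixed `zoom_start=15`, so on a day whose stops are spread out, part of the route could fall outside the initial view. One option, sketched here, is to compute the bounding box of the decoded route points and pass it to Folium's `Map.fit_bounds`, which takes `[[south, west], [north, east]]` in latitude/longitude order. `route_bounds` is a hypothetical helper; only the bounds arithmetic is shown, with the Folium call left as a comment.

```python
def route_bounds(points):
    """Hypothetical helper: bounding box [[south, west], [north, east]] of (lat, lon) points."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]


# a few of the decoded Day 1 route points printed earlier in this notebook
day1_points = [(34.10284, -118.33873), (34.09798, -118.33868), (34.09808, -118.36044), (34.1, -118.37013)]
bounds = route_bounds(day1_points)
print(bounds)  # [[34.09798, -118.37013], [34.10284, -118.33868]]

# inside get_map, after creating `m`, one could then call:
# m.fit_bounds(route_bounds(route['route']))
```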

In [177]:
# Using folium to draw the route on an interactive map. Since none of us are really familiar with folium, it would be worth
# seeing if we can accomplish the same results but using plotly express instead.
def get_map(route, route_color, addresses):
    
    m = folium.Map(location=[(route['start_point'][0] + route['end_point'][0])/2, 
                             (route['start_point'][1] + route['end_point'][1])/2], 
                   zoom_start=15, tooltip = "Hover")

    folium.PolyLine(
        route['route'],
        weight=8,
        color=route_color,
        opacity=0.6
    ).add_to(m)

    folium.Marker(
        location=route['start_point'],
        icon=folium.Icon(icon='play', color='green'),
        popup = hotel.iloc[0]["Address"]
    ).add_to(m)

    folium.Marker(
        location=route['end_point'],
        icon=folium.Icon(icon='stop', color='red'),
        popup = addresses[len(addresses) - 1]
    ).add_to(m)
    
    for i in range(len(route['waypoints'])):
        folium.Marker(
            location=route['waypoints'][i],
            icon=folium.Icon(icon='circle', color='blue'),
            popup = addresses[i]
        ).add_to(m)

    return m

In [178]:
# Getting the map for our test route
maps = []
for i in range(len(route_list)):
    maps.append(get_map(route_list[i], list_colors[i % len(list_colors)], addresses[i])) # cycle through the colors if there are more days than colors

In [179]:
for i in range(0, len(maps)):
    print("Day {}".format(i + 1))
    print("The traveling distance is {} m".format(route_list[i]["distance"]))
    print("The traveling duration is {} min".format(route_list[i]["duration"] / 60))
    display(maps[i])

Day 1
The traveling distance is 4409.3 m
The traveling duration is 8.415 min


This code was made possible by OSM (© OpenStreetMap contributors).

Copyright info for OSM: https://www.openstreetmap.org/copyright

**Note**: This code is not completely up-to-date with the route-visualization code in the Flask webapp. This code provides most of the framework; however, the code in the webapp has been adjusted to fix the issue of entering a number of days greater than the number of tourist attractions.