### Route Visualization using Geopy, Polyline, Folium
Using this code, we can create an interactive route visualization using Geopy, Polyline, and Folium. 

Our project proposal said we would use the Google Maps API, but in experimentation it proved difficult to work with. You can obtain an API key from Google, but many features cannot be accessed unless you provide billing information. With billing information on file there is a free tier, but it allows only a limited number of geocode requests, and requests over that threshold are charged to your billing account. That's not something I think we want to deal with for a school project, so I looked for alternatives.

### Modules
The modules we'll be using throughout the code are Pandas and Requests.
- **Pandas**: We use Pandas to create the dataframe containing all of the addresses and their corresponding longitude and latitude coordinates. This dataframe will also be used later in selecting which specific locations we want to visualize.
- **requests**: This module allows us to make HTTP requests in Python. We use requests to query OSRM and get route data between our locations (the coordinates themselves come from geocoding, described below).

In [1]:
# main modules to use
import pandas as pd # create the locations dataframe
import requests # send http requests

### Geocoding
Another module we will be using is **Geopy**, which is a Python client for geocoding. What is geocoding?

**Geocoding** is the process of taking a text-based description of a location (usually the address) and transforming it into a longitude/latitude coordinate pair. We will use geocoding to transform the addresses of our hotel/attraction locations (obtained from webscraping) into longitude/latitude coordinates (as far as I know, TripAdvisor doesn't provide coordinates).

From the Geopy module, we will use **Nominatim**, which is a geocoder for **OpenStreetMap (OSM)** data. OSM is a map of the world that is open data, licensed under the Open Database License. As such, we will credit OSM at the end of the notebook. You can check out OSM here:

https://www.openstreetmap.org/

In [4]:
from geopy.geocoders import Nominatim # Geopy is a Python client for geocoding. We use the Nominatim geocoder for OpenStreetMap (OSM) data.

We create a sample locations dataframe using Nominatim geocoding. I will update this code later once we have a working web scraper. For now, I am using the top 3 attractions in Los Angeles from TripAdvisor: the Getty Center, the Griffith Observatory, and Universal Studios Hollywood. 

Something I noticed about OSM is that not all addresses work. For example, the address for the Getty Center provided by TripAdvisor is *1200 Getty Center Dr N Sepulveda Blvd & Getty Center Dr, Los Angeles, CA*. However, if you take this address and plug it into OSM's website, you will get no results. This causes problems for the geocoder, and you will not get the location. For this example, I simplified the address for the Getty Center as *1200 Getty Center, Los Angeles*, but for the long term we will have to find some workaround that can apply to all addresses. I think the solution might be some combination of regex and try/except blocks.
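
One possible shape for that workaround is sketched below: generate a few progressively simpler versions of the address and try each one until the geocoder returns a result. The helper names `candidate_addresses` and `geocode_with_fallback` and the street-suffix regex are assumptions made for illustration; they are not part of Geopy, and I have not checked which simplified addresses OSM actually resolves. The `geocode` argument is any callable that takes an address string, such as `geolocator.geocode`.

```python
import re

# Hypothetical heuristic: a short list of common street-type suffixes
STREET_SUFFIX = re.compile(r"\b(Dr|Rd|Blvd|St|Ave)\b\.?", re.IGNORECASE)


def candidate_addresses(address):
    """Return the original address plus a simplified variant, if one applies."""
    street, _, rest = address.partition(",")
    candidates = [address]
    match = STREET_SUFFIX.search(street)
    # If the street part keeps going after the first suffix (for example a
    # cross street), cut it off there and keep the city/state part unchanged.
    if match and match.end() < len(street):
        shortened = street[:match.end()].strip()
        candidates.append(f"{shortened},{rest}" if rest else shortened)
    return candidates


def geocode_with_fallback(geocode, address):
    """Try each candidate in turn; return (address_used, result) or (None, None)."""
    for candidate in candidate_addresses(address):
        try:
            result = geocode(candidate)
        except Exception:  # broad on purpose in this sketch
            result = None
        if result is not None:
            return candidate, result
    return None, None
```

With the real geocoder this would be called as `geocode_with_fallback(geolocator.geocode, address)`. In practice it would be better to catch `geopy.exc.GeopyError` rather than every exception.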

In [15]:
# Nominatim geocoding and creating the Pandas dataframe

geolocator = Nominatim(user_agent = "my-app") # use custom user_agent to avoid violating Nominatim's usage policy
df = pd.DataFrame(columns = ["Address", "Longitude", "Latitude"]) # creating the Pandas dataframe
locations = ["1200 Getty Center Dr N Sepulveda Blvd & Getty Center Dr, Los Angeles, CA", 
             "2800 E. Observatory Rd., Los Angeles, CA", 
             "100 Universal City Plaza, Los Angeles, CA"] # top attractions in LA pulled from TripAdvisor

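for location in locations:
    if location == "1200 Getty Center Dr N Sepulveda Blvd & Getty Center Dr, Los Angeles, CA":
        location = "1200 Getty Center, Los Angeles" # simplified address that OSM can resolve
    loc = geolocator.geocode(location) # geocoding the location; returns None if nothing is found
    if loc is None:
        print("Could not geocode:", location) # skip addresses that OSM cannot resolve
        continue
    df.loc[len(df.index)] = [location, loc.longitude, loc.latitude] # adding the location, longitude, and latitude to the dataframe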
    
df

| | Address | Longitude | Latitude |
|---|---|---|---|
| 0 | 1200 Getty Center, Los Angeles | -118.475712 | 34.076951 |
| 1 | 2800 E. Observatory Rd., Los Angeles, CA | -118.300293 | 34.118219 |
| 2 | 100 Universal City Plaza, Los Angeles, CA | -118.361879 | 34.138321 |


We can see that we have obtained the longitude and latitude for each address and successfully added the data to a Pandas dataframe. Comparing against a quick Google search, our coordinates agree to about 3 decimal places, which is on the order of 100 meters.

### Creating the Route Visualization
Now that we have our addresses and coordinates within a Pandas dataframe, we can begin to construct the route visualization. For this section, I mainly referenced this article: https://www.thinkdatascience.com/post/2020-03-03-osrm/osrm/

In the article, the author, Michael Yan, uses **Open Source Routing Machine (OSRM)**, a C++ routing engine that finds shortest paths in road networks. Essentially, you use the requests module to send an HTTP request and get back route data for the routes between your specified locations. You can learn more about OSRM and how to construct requests here: http://project-osrm.org/docs/v5.7.0/api/?language=Python#general-options

For the route visualization, we also import the **Folium** and **Polyline** modules. 
- **Folium**: The folium module takes location and route data and plots it on an interactive map. 
- **Polyline**: The polyline module is the Python implementation of Google's Encoded Polyline Algorithm Format, which allows you to store a series of coordinates as an encoded string. You can read about it here: https://developers.google.com/maps/documentation/utilities/polylinealgorithm

In [8]:
import folium # visualizes data on map; wondering if I can replace this with plotly express
import polyline # Python implementation of Google’s Encoded Polyline Algorithm Format

Let's attempt to get the route between the Griffith Observatory and Universal Studios Hollywood. The request url starts with the OSRM demo server address, followed by the service (route), the API version (v1), the mode of transportation (car), and the coordinates for our locations. Coordinates are written as longitude,latitude and separated by a semicolon. The first pair corresponds to the Griffith Observatory, and the second pair corresponds to Universal Studios Hollywood.
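
The url for a request like this can also be assembled with a small helper, which makes it easy to pass more than two stops. This is only a sketch: `build_route_url` is a hypothetical name, and the base url is the one used in this notebook.

```python
def build_route_url(coords, profile="car",
                    base="http://router.project-osrm.org/route/v1"):
    """Build an OSRM route url from a list of (longitude, latitude) pairs."""
    if len(coords) < 2:
        raise ValueError("a route needs at least two coordinates")
    # OSRM expects longitude first, then latitude, with stops separated by ";"
    path = ";".join(f"{lon},{lat}" for lon, lat in coords)
    return f"{base}/{profile}/{path}"


stops = [(-118.300293, 34.118219),   # Griffith Observatory
         (-118.361879, 34.138321)]   # Universal Studios Hollywood
url = build_route_url(stops)
print(url)
```

Adding a third `(longitude, latitude)` pair to `stops` would simply append another `;lon,lat` segment to the url.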

In [9]:
# The following code will be based on the code in the above article. I don't fully understand what it's doing yet, but I'm 
# hoping to get some more practice with it so I can understand it better.

url = "http://router.project-osrm.org/route/v1/car/-118.300293,34.118219;-118.361879,34.138321" # we specify the long/lat coordinates of our starting point and our ending point, as well as mode of transportation; hopefully we can add more waypoints and get the route between multiple locations
r = requests.get(url) # getting the request
res = r.json()
res

{'code': 'Ok',
 'waypoints': [{'hint': '8BVQiP___38FAAAABQAAAJgAAAADAAAASbJlQAAAAACSqNJC1c3gPwUAAAAFAAAAmAAAAAMAAAC6QwAAKeHy-AeeCAJ74fL4S5oIAg4AzwbIxWTp',
   'distance': 106.310396,
   'location': [-118.300375, 34.119175],
   'name': 'West Observatory Road'},
  {'hint': 'SzlBiP___38AAAAAHQAAAEgAAAArAAAARO68Pa9_oEELyklCJcPxQQAAAAAdAAAASAAAACsAAAC6QwAAivHx-KLqCALp8PH40egIAgIArwbIxWTp',
   'distance': 53.674346,
   'location': [-118.361718, 34.138786],
   'name': 'Universal Hollywood Drive'}],
 'routes': [{'legs': [{'steps': [],
     'weight': 924.9,
     'distance': 11467.6,
     'summary': '',
     'duration': 924.9}],
   'weight_name': 'routability',
   'geometry': '{{foEjp`qUuC[iDeCoBlAwD}@iDr@g@}I}EiB?qAdN_AtMiHhJkJ|B{KxIsFbGKdHdI`IKpSjkBbNXZvbBo@`DwSnd@kJnNuHbEga@tIeNxFwIfHcR`U_NnJw[l\\wGvLqFnZiFbRqNpQmD}Ah@_B',
   'weight': 924.9,
   'distance': 11467.6,
   'duration': 924.9}]}

We have successfully run the request! The response contains some interesting data along with strings that look like gibberish. The `geometry` string is the route line encoded with Google's Encoded Polyline Algorithm Format, and each `hint` is an opaque identifier that OSRM uses internally. Now, let's break down the more interesting parts of the data (bear with me, this is new to me as well):
- **Waypoints**: Each waypoint is one of our input coordinates snapped to the nearest point on the road network, listed in the order we supplied them. The `distance` field is how far (in meters) the snapped point is from the coordinate we sent, and `name` is the street it was snapped to. For this example, we only have two locations, our starting point and ending point, so only two entries show up in the waypoints list. Because a request can contain more than two coordinates, it should be possible to construct a route between multiple sightseeing locations and a hotel location, which is one of the goals of our project.
- **Routes**: The routes list contains the routes found between our locations; in our example there is only one. As far as I can tell from the OSRM documentation, adding more locations does not add more routes. Instead, the single route gains more **legs**, one for each pair of consecutive waypoints. More than one route only appears if you ask OSRM for alternatives.
    - **Geometry**: This encoding gives us the line representing the route. 
    - **Distance**: The distance specified per entry in the routes list gives the distance in meters that the route covers.
    - **Duration**: The duration specified per entry in the routes list gives the duration of the trip in seconds.
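
To make the fields above concrete, here is a small sketch that pulls the distance and duration out of a response dictionary and converts them to kilometers and minutes. `summarize_route` is a hypothetical helper, and the sample dictionary is a trimmed copy of the response shown earlier.

```python
def summarize_route(res):
    """Extract a few human-readable numbers from an OSRM route response."""
    route = res["routes"][0]  # first (and here only) route
    return {
        "distance_km": route["distance"] / 1000,   # meters -> kilometers
        "duration_min": route["duration"] / 60,    # seconds -> minutes
        "snapped_stops": [wp["name"] for wp in res["waypoints"]],
    }


# Trimmed copy of the response above (hints, legs, and geometry left out)
sample = {
    "code": "Ok",
    "waypoints": [
        {"name": "West Observatory Road", "location": [-118.300375, 34.119175]},
        {"name": "Universal Hollywood Drive", "location": [-118.361718, 34.138786]},
    ],
    "routes": [{"distance": 11467.6, "duration": 924.9}],
}
summary = summarize_route(sample)
print(summary)
```

For this trip that works out to roughly 11.5 km and a bit over 15 minutes of driving.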

In [10]:
# We use the polyline module to decode the encoding into (latitude, longitude) pairs. These are points along the route
# line, in travel order; connecting them draws the route. Note that the order is the reverse of the (longitude, latitude)
# order used in the OSRM url.

polyline.decode('{{foEjp`qUuC[iDeCoBlAwD}@iDr@g@}I}EiB?qAdN_AtMiHhJkJ|B{KxIsFbGKdHdI`IKpSjkBbNXZvbBo@`DwSnd@kJnNuHbEga@tIeNxFwIfHcR`U_NnJw[l\\wGvLqFnZiFbRqNpQmD}Ah@_B')

[(34.11918, -118.30038),
 (34.11993, -118.30024),
 (34.12078, -118.29957),
 (34.12134, -118.29996),
 (34.12226, -118.29965),
 (34.12311, -118.29991),
 (34.12331, -118.29816),
 (34.12442, -118.29763),
 (34.12442, -118.29722),
 (34.12199, -118.2969),
 (34.11964, -118.29541),
 (34.11783, -118.29359),
 (34.1172, -118.29153),
 (34.11547, -118.29031),
 (34.11417, -118.29025),
 (34.1127, -118.29188),
 (34.11109, -118.29182),
 (34.1078, -118.30916),
 (34.10538, -118.30929),
 (34.10524, -118.32525),
 (34.10548, -118.32606),
 (34.1088, -118.33206),
 (34.11062, -118.33454),
 (34.11217, -118.33552),
 (34.11765, -118.33723),
 (34.12008, -118.33848),
 (34.1218, -118.33996),
 (34.12486, -118.34349),
 (34.12726, -118.34533),
 (34.13186, -118.35004),
 (34.13326, -118.35224),
 (34.13447, -118.35664),
 (34.13564, -118.3597),
 (34.13813, -118.36267),
 (34.139, -118.3622),
 (34.13879, -118.36172)]
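
To see where these coordinates come from, here is a minimal pure-Python sketch of the decoding step described in Google's Encoded Polyline Algorithm Format. It is for illustration only (the polyline module is what this notebook actually uses), `decode_polyline` is a hypothetical name, and the sketch assumes the default precision of 5 decimal places.

```python
def decode_polyline(encoded, precision=5):
    """Decode an encoded polyline string into a list of (lat, lon) tuples."""
    factor = 10 ** precision
    coords = []
    index = 0
    lat = lon = 0  # running totals; each encoded value is a delta from the previous point
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # one value for latitude, then one for longitude
            shift = 0
            result = 0
            while True:
                byte = ord(encoded[index]) - 63  # undo the ASCII offset
                index += 1
                result |= (byte & 0x1F) << shift  # low 5 bits carry data
                shift += 5
                if byte < 0x20:  # continuation bit not set: last chunk of this value
                    break
            # The lowest bit stores the sign; the remaining bits store the magnitude
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lon += deltas[1]
        coords.append((lat / factor, lon / factor))
    return coords


# First ten characters of the geometry string above: exactly one point
first_point = decode_polyline('{{foEjp`qU')
print(first_point)
```

Decoding just the first ten characters of our geometry string gives the first coordinate in the list above, `(34.11918, -118.30038)`; every later point is stored as an offset from the one before it.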

In [11]:
# Putting what we did before into a function. We get the route between the two locations provided we have the coordinates
# for the locations. Ideally, I will get the coordinates from the Pandas dataframe, though I'm still figuring that out.
def get_route(pickup_lon, pickup_lat, dropoff_lon, dropoff_lat):
    
    loc = "{},{};{},{}".format(pickup_lon, pickup_lat, dropoff_lon, dropoff_lat)
    url = "http://router.project-osrm.org/route/v1/car/"
    r = requests.get(url + loc) # same thing as what we did before, sending the GET request
    if r.status_code != 200: # 200 is the HTTP "OK" status; anything else means the request failed
        return {}
  
    res = r.json()   
    routes = polyline.decode(res['routes'][0]['geometry']) # the geometry is the polyline encoding of the first (and here only) route; it covers the whole trip
    start_point = [res['waypoints'][0]['location'][1], res['waypoints'][0]['location'][0]] # 0th waypoint corresponds to the starting location
    end_point = [res['waypoints'][1]['location'][1], res['waypoints'][1]['location'][0]] # 1st waypoint corresponds to the ending location
    distance = res['routes'][0]['distance']
    
    out = {'route':routes,
           'start_point':start_point,
           'end_point':end_point,
           'distance':distance
          } # returning a dictionary with the routes, starting point, ending point, and distance

    return out
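
One way to feed `get_route` from the dataframe is to turn its rows into a list of (longitude, latitude) pairs and then walk through consecutive stops. The sketch below shows only the pairing step, using plain lists so it runs on its own; `consecutive_legs` is a hypothetical helper, and the comments show how it might connect to `df` and `get_route`.

```python
def consecutive_legs(stops):
    """Pair each stop with the next one: [a, b, c] -> [(a, b), (b, c)]."""
    return list(zip(stops, stops[1:]))


# With the dataframe from earlier, the stops could be built as:
#   stops = list(zip(df["Longitude"], df["Latitude"]))
stops = [(-118.475712, 34.076951),   # Getty Center
         (-118.300293, 34.118219),   # Griffith Observatory
         (-118.361879, 34.138321)]   # Universal Studios Hollywood

legs = consecutive_legs(stops)
for (lon1, lat1), (lon2, lat2) in legs:
    # In the notebook, this is where get_route(lon1, lat1, lon2, lat2) would be called
    print(f"{lon1},{lat1} -> {lon2},{lat2}")
```

Three stops produce two legs, so a hotel plus several attractions would turn into one `get_route` call per consecutive pair.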

In [16]:
# We specify our starting and ending coordinates using the rows from the Pandas dataframe. The first pair corresponds to the 
# Griffith Observatory and the second pair corresponds to Universal Studios Hollywood.

#pickup_lon, pickup_lat, dropoff_lon, dropoff_lat = -118.300293,34.118219,-118.361879,34.138321
pickup_lon, pickup_lat, dropoff_lon, dropoff_lat = df.iloc[1]["Longitude"], df.iloc[1]["Latitude"], df.iloc[2]["Longitude"], df.iloc[2]["Latitude"]
test_route = get_route(pickup_lon, pickup_lat, dropoff_lon, dropoff_lat)
test_route

{'route': [(34.11918, -118.30038),
  (34.11993, -118.30024),
  (34.12078, -118.29957),
  (34.12134, -118.29996),
  (34.12226, -118.29965),
  (34.12311, -118.29991),
  (34.12331, -118.29816),
  (34.12442, -118.29763),
  (34.12442, -118.29722),
  (34.12199, -118.2969),
  (34.11964, -118.29541),
  (34.11783, -118.29359),
  (34.1172, -118.29153),
  (34.11547, -118.29031),
  (34.11417, -118.29025),
  (34.1127, -118.29188),
  (34.11109, -118.29182),
  (34.1078, -118.30916),
  (34.10538, -118.30929),
  (34.10524, -118.32525),
  (34.10548, -118.32606),
  (34.1088, -118.33206),
  (34.11062, -118.33454),
  (34.11217, -118.33552),
  (34.11765, -118.33723),
  (34.12008, -118.33848),
  (34.1218, -118.33996),
  (34.12486, -118.34349),
  (34.12726, -118.34533),
  (34.13186, -118.35004),
  (34.13326, -118.35224),
  (34.13447, -118.35664),
  (34.13564, -118.3597),
  (34.13813, -118.36267),
  (34.139, -118.3622),
  (34.13879, -118.36172)],
 'start_point': [34.119175, -118.300375],
 'end_point': [34.1387

In [17]:
# Using folium to draw the route on an interactive map. Since none of us are really familiar with folium, it would be worth
# seeing if we can accomplish the same results but using plotly express instead.
def get_map(route):
    
    m = folium.Map(location=[(route['start_point'][0] + route['end_point'][0])/2, 
                             (route['start_point'][1] + route['end_point'][1])/2], 
                   zoom_start=13)

    folium.PolyLine(
        route['route'],
        weight=8,
        color='blue',
        opacity=0.6
    ).add_to(m)

    folium.Marker(
        location=route['start_point'],
        icon=folium.Icon(icon='play', color='green')
    ).add_to(m)

    folium.Marker(
        location=route['end_point'],
        icon=folium.Icon(icon='stop', color='red')
    ).add_to(m)

    return m
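
Centering on the midpoint of the start and end points can cut off a route that swings away from the straight line between them, as ours does. A possible refinement is to compute the bounding box of the decoded route and pass it to the map's `fit_bounds` method. The helper `route_bounds` below is a hypothetical name, and the sample points are a few of the decoded coordinates from above.

```python
def route_bounds(points):
    """Return [[south, west], [north, east]] for a list of (lat, lon) points."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]


# A few of the decoded route points: the start, the southernmost point, and the end
sample_points = [(34.11918, -118.30038), (34.10524, -118.32525), (34.13879, -118.36172)]
bounds = route_bounds(sample_points)
print(bounds)

# Inside get_map, after creating m, this could be used as:
#   m.fit_bounds(route_bounds(route['route']))
```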

In [18]:
# Getting the map for our test route
get_map(test_route)

This code was made possible by OSM (© OpenStreetMap contributors), as well as Michael Yan at Think.Data.Science.

Copyright info for OSM: https://www.openstreetmap.org/copyright