# Travel Planner Project
The following notebook is the code that I used to get a portion of the data needed for our group's travel planner project (Data-X / IEOR 135).

Topics involved:
- Webscraping with BeautifulSoup and Selenium (scraping tripadvisor.com)
- Pandas
- Geolocation

### Doing some webscraping with BeautifulSoup

In [1]:
# imports for webscraping
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import requests
import re

!pip install geopy
import geopy 
from geopy.geocoders import Nominatim


Collecting geopy
[?25l  Downloading https://files.pythonhosted.org/packages/53/fc/3d1b47e8e82ea12c25203929efb1b964918a77067a874b2c7631e2ec35ec/geopy-1.21.0-py2.py3-none-any.whl (104kB)
[K    100% |████████████████████████████████| 112kB 2.1MB/s ta 0:00:01
[?25hCollecting geographiclib<2,>=1.49 (from geopy)
  Downloading https://files.pythonhosted.org/packages/8b/62/26ec95a98ba64299163199e95ad1b0e34ad3f4e176e221c40245f211e425/geographiclib-1.50-py3-none-any.whl
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-1.50 geopy-1.21.0
[33mYou are using pip version 18.0, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.[0m


In [16]:
def get_top_attractions(link, city_name, df):
    page = requests.get(link)
    soup = BeautifulSoup(page.content, 'html.parser')

    attractions = soup.find_all('div', {'class': lambda x: x and 'attractions-attraction-overview-pois-PoiInfo__info' in x})

    for att in attractions:
        place = att.find("h3").text
        rank_within_city = place.split('.')[0]
        place = place.split('.')[1].strip()
        genre = att.find("span", {"class": "_21qUqkJx"}).text
        link_to_att_reviews = att.find('a', {'class': lambda x: x and 'attractions-attraction-overview-pois-PoiInfo__name' in x})["href"]
        link_to_att_reviews = "www.tripadvisor.com" + link_to_att_reviews
        num_reviews = att.find("span", {'class': lambda x: x and 'reviewCount' in x}).text
        price = att.find("span", {'class': lambda x: x and 'attractions-attraction-overview-pois-PriceFrom__amount' in x})
        if price == None:
            price = "No Price"
        else:
            price = price.text
        row = [place, city_name, genre, num_reviews, price]
        row_dict = {"Attraction Name": place, 
                    "City": city_name, 
                    "Type": genre, 
                    "Number of Reviews": num_reviews, 
                    "Price": price,
                    "Rank": rank_within_city,
                    "Link to Attraction Reviews": link_to_att_reviews}

        df = df.append(row_dict, ignore_index = True)
    return df


In [17]:
att_columns = ["Attraction Name", "City", "Type", "Number of Reviews", "Price", "Rank", "Link to Attraction Reviews"]
top_attractions = pd.DataFrame(data=[], columns=att_columns)

link_dict = {"San Francisco": "https://www.tripadvisor.com/Attractions-g60713-Activities-San_Francisco_California.html",
             "New Orleans": "https://www.tripadvisor.com/Attractions-g60864-Activities-New_Orleans_Louisiana.html",
             "Los Angeles": "https://www.tripadvisor.com/Attractions-g32655-Activities-Los_Angeles_California.html"}
# "Las Vegas": "https://www.tripadvisor.com/Attractions-g45963-Activities-Las_Vegas_Nevada.html",


for city in link_dict:
    top_attractions = get_top_attractions(link_dict[city], city, top_attractions)

In [18]:
top_attractions

Unnamed: 0,Attraction Name,City,Type,Number of Reviews,Price,Rank,Link to Attraction Reviews
0,Alcatraz Island,San Francisco,Sights & Landmarks,"55,267 reviews",$105.00,1,www.tripadvisor.com/Attraction_Review-g60713-d...
1,Golden Gate Bridge,San Francisco,Sights & Landmarks,"49,248 reviews",No Price,2,www.tripadvisor.com/Attraction_Review-g60713-d...
2,Oracle Park,San Francisco,Sights & Landmarks,"7,082 reviews",No Price,3,www.tripadvisor.com/Attraction_Review-g60713-d...
3,Palace of Fine Arts Theatre,San Francisco,Concerts & Shows,"4,414 reviews",No Price,4,www.tripadvisor.com/Attraction_Review-g60713-d...
4,Golden Gate Park,San Francisco,Nature & Parks,"9,502 reviews",$15.00,5,www.tripadvisor.com/Attraction_Review-g60713-d...
5,Twin Peaks,San Francisco,Sights & Landmarks,"6,378 reviews",No Price,6,www.tripadvisor.com/Attraction_Review-g60713-d...
6,Exploratorium,San Francisco,Museums,"3,491 reviews",$19.95,7,www.tripadvisor.com/Attraction_Review-g60713-d...
7,California Academy of Sciences,San Francisco,Museums,"5,953 reviews",$15.00,8,www.tripadvisor.com/Attraction_Review-g60713-d...
8,Walt Disney Family Museum,San Francisco,Museums,"2,650 reviews",$25.00,9,www.tripadvisor.com/Attraction_Review-g60713-d...
9,Lands End,San Francisco,Nature & Parks,"2,822 reviews",No Price,10,www.tripadvisor.com/Attraction_Review-g60713-d...


In [5]:
# add in location (latitude and longitude)

def get_lat(attraction_name, city_name):
    geolocator = Nominatim(user_agent="lol")
    location = geolocator.geocode(attraction_name + ", " + city_name)
    if location == None:
        return "No Latitude Found"
    lat = location.latitude
    return lat

def get_long(attraction_name, city_name):
    geolocator = Nominatim(user_agent="lol")
    location = geolocator.geocode(attraction_name + ", " + city_name)
    if location == None:
        return "No Longitude Found"
    long = location.longitude
    return long

def get_address(attraction_name, city_name):
    geolocator = Nominatim(user_agent="lol")
    location = geolocator.geocode(attraction_name + ", " + city_name)
    if location == None:
        return "No Address Found"
    address = location.address
    return address 


top_attractions["Latitude"] = top_attractions["Attraction Name"].apply(get_lat)
top_attractions["Longitude"] = top_attractions["Attraction Name"].apply(get_long)
top_attractions["Address"] = top_attractions["Attraction Name"].apply(get_address, args=(city_name,))
top_attractions

Unnamed: 0,Attraction Name,City,Type,Number of Reviews,Price,Rank,Latitude,Longitude,Address
0,Alcatraz Island,San Francisco,Sights & Landmarks,"55,264 reviews",$105.00,1,37.826721,-122.422759,"Alcatraz Island, Parade Ground, San Francisco,..."
1,Golden Gate Bridge,San Francisco,Sights & Landmarks,"49,247 reviews",No Price,2,37.830321,-122.47975,"Golden Gate Bridge, San Francisco, San Francis..."
2,Oracle Park,San Francisco,Sights & Landmarks,"7,081 reviews",No Price,3,37.778612,-122.390267,"Oracle Park, 24, Willie Mays Plaza, South Beac..."
3,Palace of Fine Arts Theatre,San Francisco,Concerts & Shows,"4,414 reviews",No Price,4,37.802752,-122.45122,"Presidio Dance Theatre, Presidio Parkway, Mari..."
4,Golden Gate Park,San Francisco,Nature & Parks,"9,499 reviews",$15.00,5,37.769368,-122.482184,"Golden Gate Park, Richmond District, San Franc..."
5,Twin Peaks,San Francisco,Sights & Landmarks,"6,377 reviews",No Price,6,37.75464,-122.44648,"Twin Peaks, Christmas Tree Point Road, Cole Va..."
6,Exploratorium,San Francisco,Museums,"3,491 reviews",$19.95,7,37.800906,-122.398523,"Exploratorium, Herb Caen Way, Northeast Waterf..."
7,California Academy of Sciences,San Francisco,Museums,"5,951 reviews",$15.00,8,37.769825,-122.466087,"California Academy of Sciences, 55, Music Conc..."
8,Walt Disney Family Museum,San Francisco,Museums,"2,756 reviews",$25.00,9,37.801363,-122.458721,"Walt Disney Family Museum, 104, Montgomery Str..."
9,Lands End,San Francisco,Outdoor Activities,"2,822 reviews",No Price,10,50.066263,-5.714822,"Land's End, Sennen Cove, Cornwall, South West ..."


We see that some of the cities have very "weird" addresses, some of which don't even seem like English and are likely from other parts of the world. Let's incorporate the attraction name and city name together to put into the geolocator to find the correct address for each attraction.

In [21]:
# add in location (latitude and longitude) a different way to guarantee the right cities

lat_list = []
long_list = []
add_list = []
for index, row in top_attractions.iterrows():
    geolocator = Nominatim(user_agent="lol")
    location = geolocator.geocode(row["Attraction Name"] + ", " + row["City"])
    if location == None:
        lat_list.append("No Address Found")
        long_list.append("No Address Found")
        add_list.append("No Address Found")
    else:
        lat = location.latitude
        long = location.longitude
        address = location.address
        lat_list.append(lat)
        long_list.append(long)
        add_list.append(address)



top_attractions["Latitude"] = lat_list
top_attractions["Longitude"] = long_list
top_attractions["Address"] = add_list
top_attractions

Unnamed: 0,Attraction Name,City,Type,Number of Reviews,Price,Rank,Link to Attraction Reviews,Latitude,Longitude,Address
0,Alcatraz Island,San Francisco,Sights & Landmarks,"55,267 reviews",$105.00,1,www.tripadvisor.com/Attraction_Review-g60713-d...,37.8267,-122.423,"Alcatraz Island, Parade Ground, San Francisco,..."
1,Golden Gate Bridge,San Francisco,Sights & Landmarks,"49,248 reviews",No Price,2,www.tripadvisor.com/Attraction_Review-g60713-d...,37.8303,-122.48,"Golden Gate Bridge, San Francisco, San Francis..."
2,Oracle Park,San Francisco,Sights & Landmarks,"7,082 reviews",No Price,3,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7786,-122.39,"Oracle Park, 24, Willie Mays Plaza, South Beac..."
3,Palace of Fine Arts Theatre,San Francisco,Concerts & Shows,"4,414 reviews",No Price,4,www.tripadvisor.com/Attraction_Review-g60713-d...,No Address Found,No Address Found,No Address Found
4,Golden Gate Park,San Francisco,Nature & Parks,"9,502 reviews",$15.00,5,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7694,-122.482,"Golden Gate Park, Richmond District, San Franc..."
5,Twin Peaks,San Francisco,Sights & Landmarks,"6,378 reviews",No Price,6,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7546,-122.446,"Twin Peaks, Christmas Tree Point Road, Cole Va..."
6,Exploratorium,San Francisco,Museums,"3,491 reviews",$19.95,7,www.tripadvisor.com/Attraction_Review-g60713-d...,37.8009,-122.399,"Exploratorium, Herb Caen Way, Northeast Waterf..."
7,California Academy of Sciences,San Francisco,Museums,"5,953 reviews",$15.00,8,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7698,-122.466,"California Academy of Sciences, 55, Music Conc..."
8,Walt Disney Family Museum,San Francisco,Museums,"2,650 reviews",$25.00,9,www.tripadvisor.com/Attraction_Review-g60713-d...,37.8014,-122.459,"Walt Disney Family Museum, 104, Montgomery Str..."
9,Lands End,San Francisco,Nature & Parks,"2,822 reviews",No Price,10,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7839,-122.507,"Lands End, San Francisco, San Francisco City a..."


In [22]:
# palace of fine arts in SF
# battleship USS Iowa museum in LA
# the grove seems off LA

# palace of fine arts
top_attractions.at[3, "Latitude"] = 37.8020
top_attractions.at[3, "Longitude"] = -122.4486
top_attractions.at[3, "Address"] = "Palace of Fine Arts, 3601, Lyon St, San Francisco, California, 94123, United States of America"

# battleship USS Iowa museum
top_attractions.at[24, "Latitude"] = 33.7423
top_attractions.at[24, "Longitude"] = -118.2773
top_attractions.at[24, "Address"] = "Battleship USS Iowa Museum, 250, S Harbor Blvd, Los Angeles, California, 90731, United States of America"

# the grove
top_attractions.at[28, "Latitude"] = 34.0722
top_attractions.at[28, "Longitude"] = -118.3581
top_attractions.at[28, "Address"] = "The Grove, 189, The Grove Dr, Los Angeles, California, 90036, United States of America"

top_attractions

Unnamed: 0,Attraction Name,City,Type,Number of Reviews,Price,Rank,Link to Attraction Reviews,Latitude,Longitude,Address
0,Alcatraz Island,San Francisco,Sights & Landmarks,"55,267 reviews",$105.00,1,www.tripadvisor.com/Attraction_Review-g60713-d...,37.8267,-122.423,"Alcatraz Island, Parade Ground, San Francisco,..."
1,Golden Gate Bridge,San Francisco,Sights & Landmarks,"49,248 reviews",No Price,2,www.tripadvisor.com/Attraction_Review-g60713-d...,37.8303,-122.48,"Golden Gate Bridge, San Francisco, San Francis..."
2,Oracle Park,San Francisco,Sights & Landmarks,"7,082 reviews",No Price,3,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7786,-122.39,"Oracle Park, 24, Willie Mays Plaza, South Beac..."
3,Palace of Fine Arts Theatre,San Francisco,Concerts & Shows,"4,414 reviews",No Price,4,www.tripadvisor.com/Attraction_Review-g60713-d...,37.802,-122.449,"Palace of Fine Arts, 3601, Lyon St, San Franci..."
4,Golden Gate Park,San Francisco,Nature & Parks,"9,502 reviews",$15.00,5,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7694,-122.482,"Golden Gate Park, Richmond District, San Franc..."
5,Twin Peaks,San Francisco,Sights & Landmarks,"6,378 reviews",No Price,6,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7546,-122.446,"Twin Peaks, Christmas Tree Point Road, Cole Va..."
6,Exploratorium,San Francisco,Museums,"3,491 reviews",$19.95,7,www.tripadvisor.com/Attraction_Review-g60713-d...,37.8009,-122.399,"Exploratorium, Herb Caen Way, Northeast Waterf..."
7,California Academy of Sciences,San Francisco,Museums,"5,953 reviews",$15.00,8,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7698,-122.466,"California Academy of Sciences, 55, Music Conc..."
8,Walt Disney Family Museum,San Francisco,Museums,"2,650 reviews",$25.00,9,www.tripadvisor.com/Attraction_Review-g60713-d...,37.8014,-122.459,"Walt Disney Family Museum, 104, Montgomery Str..."
9,Lands End,San Francisco,Nature & Parks,"2,822 reviews",No Price,10,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7839,-122.507,"Lands End, San Francisco, San Francisco City a..."


In [23]:
# download as csv
top_attractions.to_csv("top_ten_attraction_tripadvisor.csv", index=False)

In [63]:
# from google.colab import drive
# drive.mount('/content/gdrive')

In [64]:
# from google.colab import files
# files.download('top_ten_attraction_tripadvisor.csv')

## Using Selenium to get the top 30 Attractions with Dynamic Website Scraping

Trying to get more of the top attractions from the tripadvisor site (30 if we are able to click the "see more" button)

In [24]:
!pip install selenium
import selenium

Collecting selenium
[?25l  Downloading https://files.pythonhosted.org/packages/80/d6/4294f0b4bce4de0abf13e17190289f9d0613b0a44e5dd6a7f5ca98459853/selenium-3.141.0-py2.py3-none-any.whl (904kB)
[K    100% |████████████████████████████████| 911kB 6.6MB/s eta 0:00:01
Installing collected packages: selenium
Successfully installed selenium-3.141.0
[33mYou are using pip version 18.0, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.[0m


In [25]:
from selenium import webdriver 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC 
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import ElementNotInteractableException

In [26]:
option = webdriver.ChromeOptions()
option.add_argument('--incognito')
option.add_argument('--ignore-certificate-errors')
option.add_argument("--test-type")

In [40]:
def get_top_attractions_selenium(browser, link, city_name, df):
    browser.get(link)
    seemore_button = browser.find_element_by_class_name('attractions-attraction-overview-main-TopPOIs__see_more--2Vsb-')
    seemore_button.click()
    browser.implicitly_wait(5)
    soup = BeautifulSoup(browser.page_source, 'html.parser')
    attractions = soup.find_all('div', {'class': lambda x: x and 'attractions-attraction-overview-pois-PoiInfo__info' in x})

    for att in attractions:
        place = att.find("h3").text
        rank_within_city = place.split('.')[0]
        place = place.split('.')[1].strip()
        genre = att.find("span", {"class": "_21qUqkJx"}).text
        link_to_att_reviews = att.find('a', {'class': lambda x: x and 'attractions-attraction-overview-pois-PoiInfo__name' in x})["href"]
        link_to_att_reviews = "www.tripadvisor.com" + link_to_att_reviews
        num_reviews = att.find("span", {'class': lambda x: x and 'reviewCount' in x}).text
        price = att.find("span", {'class': lambda x: x and 'attractions-attraction-overview-pois-PriceFrom__amount' in x})
        if price == None:
            price = "No Price"
        else:
            price = price.text
        row = [place, city_name, genre, num_reviews, price]
        row_dict = {"Attraction Name": place, 
                    "City": city_name, 
                    "Type": genre, 
                    "Number of Reviews": num_reviews, 
                    "Price": price,
                    "Rank": rank_within_city,
                    "Link to Attraction Reviews": link_to_att_reviews}

        df = df.append(row_dict, ignore_index = True)
    return df


In [41]:
browser = webdriver.Chrome(executable_path='chromedriver', options=option)
att_columns = ["Attraction Name", "City", "Type", "Number of Reviews", "Price", "Rank", "Link to Attraction Reviews"]
top_attractions_30_sf = pd.DataFrame(data=[], columns=att_columns)
top_attractions_30_no = pd.DataFrame(data=[], columns=att_columns)
top_attractions_30_la = pd.DataFrame(data=[], columns=att_columns)


link_dict = {"San Francisco": "https://www.tripadvisor.com/Attractions-g60713-Activities-San_Francisco_California.html",
             "New Orleans": "https://www.tripadvisor.com/Attractions-g60864-Activities-New_Orleans_Louisiana.html",
             "Los Angeles": "https://www.tripadvisor.com/Attractions-g32655-Activities-Los_Angeles_California.html"}

# did separately otherwise it was only getting top 30 for the last one
top_attractions_30_sf = get_top_attractions_selenium(browser, link_dict["San Francisco"], "San Francisco", top_attractions_30_sf)
top_attractions_30_no = get_top_attractions_selenium(browser, link_dict["New Orleans"], "New Orleans", top_attractions_30_no)
top_attractions_30_la = get_top_attractions_selenium(browser, link_dict["Los Angeles"], "Los Angeles", top_attractions_30_la)


top_attractions_30_sf

Unnamed: 0,Attraction Name,City,Type,Number of Reviews,Price,Rank,Link to Attraction Reviews
0,Alcatraz Island,San Francisco,Nature & Parks,"55,267 reviews",$105.00,1,www.tripadvisor.com/Attraction_Review-g60713-d...
1,Golden Gate Bridge,San Francisco,Sights & Landmarks,"49,248 reviews",No Price,2,www.tripadvisor.com/Attraction_Review-g60713-d...
2,Oracle Park,San Francisco,Sights & Landmarks,"7,082 reviews",No Price,3,www.tripadvisor.com/Attraction_Review-g60713-d...
3,Palace of Fine Arts Theatre,San Francisco,Concerts & Shows,"4,414 reviews",No Price,4,www.tripadvisor.com/Attraction_Review-g60713-d...
4,Golden Gate Park,San Francisco,Nature & Parks,"9,502 reviews",$15.00,5,www.tripadvisor.com/Attraction_Review-g60713-d...
5,Twin Peaks,San Francisco,Sights & Landmarks,"6,378 reviews",No Price,6,www.tripadvisor.com/Attraction_Review-g60713-d...
6,Exploratorium,San Francisco,Museums,"3,491 reviews",$19.95,7,www.tripadvisor.com/Attraction_Review-g60713-d...
7,California Academy of Sciences,San Francisco,Museums,"5,953 reviews",$15.00,8,www.tripadvisor.com/Attraction_Review-g60713-d...
8,Walt Disney Family Museum,San Francisco,Museums,"2,756 reviews",$25.00,9,www.tripadvisor.com/Attraction_Review-g60713-d...
9,Lands End,San Francisco,Nature & Parks,"2,822 reviews",No Price,10,www.tripadvisor.com/Attraction_Review-g60713-d...


In [42]:
top_attractions_30_no

Unnamed: 0,Attraction Name,City,Type,Number of Reviews,Price,Rank,Link to Attraction Reviews
0,The National WWII Museum,New Orleans,Museums,"29,174 reviews",$29.77,1,www.tripadvisor.com/Attraction_Review-g60864-d...
1,Garden District,New Orleans,Sights & Landmarks,"8,747 reviews",No Price,2,www.tripadvisor.com/Attraction_Review-g60864-d...
2,Frenchmen Street,New Orleans,Sights & Landmarks,"11,832 reviews",No Price,3,www.tripadvisor.com/Attraction_Review-g60864-d...
3,New Orleans City Park,New Orleans,Nature & Parks,"4,220 reviews",No Price,4,www.tripadvisor.com/Attraction_Review-g60864-d...
4,Jackson Square,New Orleans,Sights & Landmarks,"15,923 reviews",No Price,5,www.tripadvisor.com/Attraction_Review-g60864-d...
5,French Quarter,New Orleans,Sights & Landmarks,"22,147 reviews",No Price,6,www.tripadvisor.com/Attraction_Review-g60864-d...
6,Preservation Hall,New Orleans,Concerts & Shows,"5,768 reviews",No Price,7,www.tripadvisor.com/Attraction_Review-g60864-d...
7,Audubon Zoo,New Orleans,Nature & Parks,"2,868 reviews",No Price,8,www.tripadvisor.com/Attraction_Review-g60864-d...
8,Blaine Kern's Mardi Gras World,New Orleans,Museums,"4,219 reviews",No Price,9,www.tripadvisor.com/Attraction_Review-g60864-d...
9,St,New Orleans,Sights & Landmarks,"3,780 reviews",No Price,10,www.tripadvisor.com/Attraction_Review-g60864-d...


In [43]:
top_attractions_30_la

Unnamed: 0,Attraction Name,City,Type,Number of Reviews,Price,Rank,Link to Attraction Reviews
0,The Getty Center,Los Angeles,Museums,"14,495 reviews",$150.00,1,www.tripadvisor.com/Attraction_Review-g32655-d...
1,Griffith Observatory,Los Angeles,Museums,"20,234 reviews",No Price,2,www.tripadvisor.com/Attraction_Review-g32655-d...
2,Universal Studios Hollywood,Los Angeles,Water & Amusement Parks,"36,452 reviews",$109.99,3,www.tripadvisor.com/Attraction_Review-g32655-d...
3,Petersen Automotive Museum,Los Angeles,Museums,"2,324 reviews",$16.00,4,www.tripadvisor.com/Attraction_Review-g32655-d...
4,Battleship USS Iowa Museum,Los Angeles,Museums,"1,963 reviews",$29.95,5,www.tripadvisor.com/Attraction_Review-g32655-d...
5,The Broad,Los Angeles,Museums,"1,707 reviews",No Price,6,www.tripadvisor.com/Attraction_Review-g32655-d...
6,Staples Center,Los Angeles,Sights & Landmarks,"3,437 reviews",No Price,7,www.tripadvisor.com/Attraction_Review-g32655-d...
7,Griffith Park,Los Angeles,Nature & Parks,"3,165 reviews",No Price,8,www.tripadvisor.com/Attraction_Review-g32655-d...
8,The Grove,Los Angeles,Shopping,"2,475 reviews",No Price,9,www.tripadvisor.com/Attraction_Review-g32655-d...
9,La Brea Tar Pits and Museum,Los Angeles,Museums,"3,158 reviews",No Price,10,www.tripadvisor.com/Attraction_Review-g32655-d...


In [48]:
# combine all three into one
top_attractions_30 = pd.concat([top_attractions_30_sf, top_attractions_30_no, top_attractions_30_la])
top_attractions_30 = top_attractions_30.reset_index()
top_attractions_30

Unnamed: 0,index,Attraction Name,City,Type,Number of Reviews,Price,Rank,Link to Attraction Reviews
0,0,Alcatraz Island,San Francisco,Nature & Parks,"55,267 reviews",$105.00,1,www.tripadvisor.com/Attraction_Review-g60713-d...
1,1,Golden Gate Bridge,San Francisco,Sights & Landmarks,"49,248 reviews",No Price,2,www.tripadvisor.com/Attraction_Review-g60713-d...
2,2,Oracle Park,San Francisco,Sights & Landmarks,"7,082 reviews",No Price,3,www.tripadvisor.com/Attraction_Review-g60713-d...
3,3,Palace of Fine Arts Theatre,San Francisco,Concerts & Shows,"4,414 reviews",No Price,4,www.tripadvisor.com/Attraction_Review-g60713-d...
4,4,Golden Gate Park,San Francisco,Nature & Parks,"9,502 reviews",$15.00,5,www.tripadvisor.com/Attraction_Review-g60713-d...
5,5,Twin Peaks,San Francisco,Sights & Landmarks,"6,378 reviews",No Price,6,www.tripadvisor.com/Attraction_Review-g60713-d...
6,6,Exploratorium,San Francisco,Museums,"3,491 reviews",$19.95,7,www.tripadvisor.com/Attraction_Review-g60713-d...
7,7,California Academy of Sciences,San Francisco,Museums,"5,953 reviews",$15.00,8,www.tripadvisor.com/Attraction_Review-g60713-d...
8,8,Walt Disney Family Museum,San Francisco,Museums,"2,756 reviews",$25.00,9,www.tripadvisor.com/Attraction_Review-g60713-d...
9,9,Lands End,San Francisco,Nature & Parks,"2,822 reviews",No Price,10,www.tripadvisor.com/Attraction_Review-g60713-d...


In [50]:
# getting the location for the above data frame
# add in location (latitude and longitude) a different way to guarantee the right cities

def get_location_info(df):
    lat_list = []
    long_list = []
    add_list = []
    for index, row in df.iterrows():
        geolocator = Nominatim(user_agent="lol")
        location = geolocator.geocode(row["Attraction Name"] + ", " + row["City"])
        if location == None:
            lat_list.append("No Address Found")
            long_list.append("No Address Found")
            add_list.append("No Address Found")
        else:
            lat = location.latitude
            long = location.longitude
            address = location.address
            lat_list.append(lat)
            long_list.append(long)
            add_list.append(address)
    df["Latitude"] = lat_list
    df["Longitude"] = long_list
    df["Address"] = add_list
    
get_location_info(top_attractions_30)

In [55]:
top_attractions_30 = top_attractions_30.drop("index", axis=1)
top_attractions_30

Unnamed: 0,Attraction Name,City,Type,Number of Reviews,Price,Rank,Link to Attraction Reviews,Latitude,Longitude,Address
0,Alcatraz Island,San Francisco,Nature & Parks,"55,267 reviews",$105.00,1,www.tripadvisor.com/Attraction_Review-g60713-d...,37.8267,-122.423,"Alcatraz Island, Parade Ground, San Francisco,..."
1,Golden Gate Bridge,San Francisco,Sights & Landmarks,"49,248 reviews",No Price,2,www.tripadvisor.com/Attraction_Review-g60713-d...,37.8303,-122.48,"Golden Gate Bridge, San Francisco, San Francis..."
2,Oracle Park,San Francisco,Sights & Landmarks,"7,082 reviews",No Price,3,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7786,-122.39,"Oracle Park, 24, Willie Mays Plaza, South Beac..."
3,Palace of Fine Arts Theatre,San Francisco,Concerts & Shows,"4,414 reviews",No Price,4,www.tripadvisor.com/Attraction_Review-g60713-d...,No Address Found,No Address Found,No Address Found
4,Golden Gate Park,San Francisco,Nature & Parks,"9,502 reviews",$15.00,5,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7694,-122.482,"Golden Gate Park, Richmond District, San Franc..."
5,Twin Peaks,San Francisco,Sights & Landmarks,"6,378 reviews",No Price,6,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7546,-122.446,"Twin Peaks, Christmas Tree Point Road, Cole Va..."
6,Exploratorium,San Francisco,Museums,"3,491 reviews",$19.95,7,www.tripadvisor.com/Attraction_Review-g60713-d...,37.8009,-122.399,"Exploratorium, Herb Caen Way, Northeast Waterf..."
7,California Academy of Sciences,San Francisco,Museums,"5,953 reviews",$15.00,8,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7698,-122.466,"California Academy of Sciences, 55, Music Conc..."
8,Walt Disney Family Museum,San Francisco,Museums,"2,756 reviews",$25.00,9,www.tripadvisor.com/Attraction_Review-g60713-d...,37.8014,-122.459,"Walt Disney Family Museum, 104, Montgomery Str..."
9,Lands End,San Francisco,Nature & Parks,"2,822 reviews",No Price,10,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7839,-122.507,"Lands End, San Francisco, San Francisco City a..."


### Fill in the missing address values manually

In [60]:
# find all the indices and places with missing values
top_attractions_30[(top_attractions_30["Address"] == "No Address Found") & (top_attractions_30["Type"] != "Transportation")]

Unnamed: 0,Attraction Name,City,Type,Number of Reviews,Price,Rank,Link to Attraction Reviews,Latitude,Longitude,Address
3,Palace of Fine Arts Theatre,San Francisco,Concerts & Shows,"4,414 reviews",No Price,4,www.tripadvisor.com/Attraction_Review-g60713-d...,No Address Found,No Address Found,No Address Found
11,Ferry Building Marketplace,San Francisco,Shopping,"7,358 reviews",No Price,12,www.tripadvisor.com/Attraction_Review-g60713-d...,No Address Found,No Address Found,No Address Found
14,San Francisco Museum of Modern Art (SFMOMA),San Francisco,Museums,"2,036 reviews",$25.00,15,www.tripadvisor.com/Attraction_Review-g60713-d...,No Address Found,No Address Found,No Address Found
24,Angel Island State Park,San Francisco,Nature & Parks,974 reviews,No Price,25,www.tripadvisor.com/Attraction_Review-g60713-d...,No Address Found,No Address Found,No Address Found
28,16 Avenue Tiled Steps,San Francisco,Sights & Landmarks,635 reviews,No Price,29,www.tripadvisor.com/Attraction_Review-g60713-d...,No Address Found,No Address Found,No Address Found
45,Lafayette Cemetery No,New Orleans,Sights & Landmarks,"2,849 reviews",No Price,16,www.tripadvisor.com/Attraction_Review-g60864-d...,No Address Found,No Address Found,No Address Found
46,Old New Orleans Rum Distillery,New Orleans,Food & Drink,"1,196 reviews",No Price,17,www.tripadvisor.com/Attraction_Review-g60864-d...,No Address Found,No Address Found,No Address Found
49,Old River Road Plantation Adventure,New Orleans,Nature & Parks,542 reviews,No Price,20,www.tripadvisor.com/Attraction_Review-g60864-d...,No Address Found,No Address Found,No Address Found
50,The Sydney and Walda Besthoff Sculpture Garden...,New Orleans,Museums,"1,336 reviews",No Price,21,www.tripadvisor.com/Attraction_Review-g60864-d...,No Address Found,No Address Found,No Address Found
54,New Orleans Jazz & Heritage Festival,New Orleans,Events,160 reviews,No Price,25,www.tripadvisor.com/Attraction_Review-g60864-d...,No Address Found,No Address Found,No Address Found


In [61]:
# palace of fine arts
top_attractions_30.at[3, "Latitude"] = 37.8020
top_attractions_30.at[3, "Longitude"] = -122.4486
top_attractions_30.at[3, "Address"] = "Palace of Fine Arts, 3601, Lyon St, San Francisco, California, 94123, United States of America"

# ferry building marketplace
top_attractions_30.at[11, "Latitude"] = 37.7958
top_attractions_30.at[11, "Longitude"] = -122.3938
top_attractions_30.at[11, "Address"] = "1 Ferry Building, San Francisco, CA 94111"

# sfmoma
top_attractions_30.at[14, "Latitude"] = 37.7857
top_attractions_30.at[14, "Longitude"] = -122.4011
top_attractions_30.at[14, "Address"] = "151 3rd St, San Francisco, CA 94103"

# angel island state park
top_attractions_30.at[24, "Latitude"] = 37.8609
top_attractions_30.at[24, "Longitude"] = -122.4326
top_attractions_30.at[24, "Address"] = "Tiburon, CA 94920"

# 16 avenue tiled steps
top_attractions_30.at[28, "Latitude"] = 37.7563
top_attractions_30.at[28, "Longitude"] = -122.4732
top_attractions_30.at[28, "Address"] = "16th Ave, San Francisco, CA 94122"

# lafayette cemetery no
top_attractions_30.at[45, "Latitude"] = 29.9288
top_attractions_30.at[45, "Longitude"] = -90.0854
top_attractions_30.at[45, "Address"] = "1427 Washington Ave, New Orleans, LA 70130"

# old new orleans rum distillery
top_attractions_30.at[46, "Latitude"] = 29.9865
top_attractions_30.at[46, "Longitude"] = -90.0592
top_attractions_30.at[46, "Address"] = "2815 Frenchmen St, New Orleans, LA 70122"

# old river road plantation adventure
top_attractions_30.at[49, "Latitude"] = 29.9617
top_attractions_30.at[49, "Longitude"] = -90.0810
top_attractions_30.at[49, "Address"] = "2041 Canal St, New Orleans, LA 70112"

# The Sydney and Walda Besthoff Sculpture Garden
top_attractions_30.at[50, "Latitude"] = 29.9863
top_attractions_30.at[50, "Longitude"] = -90.0948
top_attractions_30.at[50, "Address"] = "1 Collins Diboll Cir, New Orleans, LA 70124"

# harrah's casino new orleans
top_attractions_30.at[58, "Latitude"] = 29.9496
top_attractions_30.at[58, "Longitude"] = -90.0646
top_attractions_30.at[58, "Address"] = "8 Canal St, New Orleans, LA 70130"

# The Historic New Orleans Collection
top_attractions_30.at[59, "Latitude"] = 29.9572
top_attractions_30.at[59, "Longitude"] = -90.0660
top_attractions_30.at[59, "Address"] = "520 Royal St, New Orleans, LA 70130"

# battleship USS Iowa museum
top_attractions_30.at[64, "Latitude"] = 33.7423
top_attractions_30.at[64, "Longitude"] = -118.2773
top_attractions_30.at[64, "Address"] = "Battleship USS Iowa Museum, 250, S Harbor Blvd, Los Angeles, California, 90731, United States of America"

# Venice Canals Walkway
top_attractions_30.at[73, "Latitude"] = 33.9835
top_attractions_30.at[73, "Longitude"] = -118.4677
top_attractions_30.at[73, "Address"] = "Venice, CA 90292"

# The Nethercutt Collection
top_attractions_30.at[74, "Latitude"] = 34.3074
top_attractions_30.at[74, "Longitude"] = -118.4640
top_attractions_30.at[74, "Address"] = "15151 Bledsoe St, Sylmar, CA 91342"

# Universal CityWalk Hollywood
top_attractions_30.at[80, "Latitude"] = 34.1362
top_attractions_30.at[80, "Longitude"] = -118.3552
top_attractions_30.at[80, "Address"] = "100 Universal City Plaza, Universal City, CA 91608"

# University of California, Los Angeles (UCLA)
top_attractions_30.at[85, "Latitude"] = 34.0689
top_attractions_30.at[85, "Longitude"] = -118.4452
top_attractions_30.at[85, "Address"] = "Los Angeles, CA 90095"

# Hollywood Forever Cemetery
top_attractions_30.at[86, "Latitude"] = 34.0889
top_attractions_30.at[86, "Longitude"] = -118.3191
top_attractions_30.at[86, "Address"] = "6000 Santa Monica Blvd, Los Angeles, CA 90038"

# Pantages Theatre
top_attractions_30.at[88, "Latitude"] = 34.1020
top_attractions_30.at[88, "Longitude"] = -118.3258
top_attractions_30.at[88, "Address"] = "6233 Hollywood Blvd, Los Angeles, CA 90028"

# # the grove
# top_attractions.at[28, "Latitude"] = 34.0722
# top_attractions.at[28, "Longitude"] = -118.3581
# top_attractions.at[28, "Address"] = "The Grove, 189, The Grove Dr, Los Angeles, California, 90036, United States of America"

top_attractions_30

Unnamed: 0,Attraction Name,City,Type,Number of Reviews,Price,Rank,Link to Attraction Reviews,Latitude,Longitude,Address
0,Alcatraz Island,San Francisco,Nature & Parks,"55,267 reviews",$105.00,1,www.tripadvisor.com/Attraction_Review-g60713-d...,37.8267,-122.423,"Alcatraz Island, Parade Ground, San Francisco,..."
1,Golden Gate Bridge,San Francisco,Sights & Landmarks,"49,248 reviews",No Price,2,www.tripadvisor.com/Attraction_Review-g60713-d...,37.8303,-122.48,"Golden Gate Bridge, San Francisco, San Francis..."
2,Oracle Park,San Francisco,Sights & Landmarks,"7,082 reviews",No Price,3,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7786,-122.39,"Oracle Park, 24, Willie Mays Plaza, South Beac..."
3,Palace of Fine Arts Theatre,San Francisco,Concerts & Shows,"4,414 reviews",No Price,4,www.tripadvisor.com/Attraction_Review-g60713-d...,37.802,-122.449,"Palace of Fine Arts, 3601, Lyon St, San Franci..."
4,Golden Gate Park,San Francisco,Nature & Parks,"9,502 reviews",$15.00,5,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7694,-122.482,"Golden Gate Park, Richmond District, San Franc..."
5,Twin Peaks,San Francisco,Sights & Landmarks,"6,378 reviews",No Price,6,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7546,-122.446,"Twin Peaks, Christmas Tree Point Road, Cole Va..."
6,Exploratorium,San Francisco,Museums,"3,491 reviews",$19.95,7,www.tripadvisor.com/Attraction_Review-g60713-d...,37.8009,-122.399,"Exploratorium, Herb Caen Way, Northeast Waterf..."
7,California Academy of Sciences,San Francisco,Museums,"5,953 reviews",$15.00,8,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7698,-122.466,"California Academy of Sciences, 55, Music Conc..."
8,Walt Disney Family Museum,San Francisco,Museums,"2,756 reviews",$25.00,9,www.tripadvisor.com/Attraction_Review-g60713-d...,37.8014,-122.459,"Walt Disney Family Museum, 104, Montgomery Str..."
9,Lands End,San Francisco,Nature & Parks,"2,822 reviews",No Price,10,www.tripadvisor.com/Attraction_Review-g60713-d...,37.7839,-122.507,"Lands End, San Francisco, San Francisco City a..."


In [62]:
# download as csv
top_attractions_30.to_csv("top_thirty_attraction_tripadvisor_sf_no_la.csv", index=False)