# **This notebook is for the IBM Data Science Capstone Project on Coursera**

by: Alex Henner

## Introduction

There are many people who enjoy traveling to new states, but have a certain palate and would like the ability to pair their tastes with the best cities to travel to in order to get the best experience. In this paper we will explore using a recommendation system to find the top 5 state capitals to visit based on your palate. We will be gathering venue data using Foursquare and will be using the amount of venues in each category based off individual Capitals. We will be focusing our data on the State Capitals.

## Data

In [103]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import numpy as np

#!pip install geopy
from geopy.geocoders import Nominatim

import time

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

#!pip install folium
import folium

import requests
from pandas.io.json import json_normalize

Grabbing the State Capitals as well as the Area

In [104]:
#get a table with the State Capitals
wiki = pd.read_html('https://en.wikipedia.org/wiki/List_of_capitals_in_the_United_States', header=1)
state_cap = wiki[1]
state_cap.drop(index=50, axis=0, inplace=True)
state_cap.drop(['Since', 'Proper', 'MSA/µSA', 'Rank in State (city proper)', 'CSA'], axis=1, inplace=True)

#finding the center latitude/longitude for each State Capital
geolocator = Nominatim(user_agent="ny_explorer")
latitudes = []
longitudes = []

for state, capital in zip(state_cap['State'], state_cap['Capital']):
    location = geolocator.geocode(f'{capital}, {state}')
    latitudes.append(location.latitude)
    longitudes.append(location.longitude)

#adding the latitude and longitude for each Capital
state_cap['Latitude'] = latitudes
state_cap['Longitude'] = longitudes 

state_cap.head()

Unnamed: 0,State,Capital,Area (mi2),Latitude,Longitude
0,Alabama,Montgomery,159.8,32.366966,-86.300648
1,Alaska,Juneau,2716.7,58.30195,-134.419734
2,Arizona,Phoenix,517.6,33.448437,-112.074142
3,Arkansas,Little Rock,116.2,34.746481,-92.289595
4,California,Sacramento,97.9,38.581061,-121.493895


### Visualize the State Capitals

In [105]:
#getting location for center of USA
location = geolocator.geocode('United States')
latitude = location.latitude
longitude = location.longitude - 6  #to get Hawaii in the frame

In [106]:
map_usa = folium.Map(location=[latitude, longitude], zoom_start=3)

#printing State Capitals to map
for lat, lng, capital, state in zip(state_cap['Latitude'], state_cap['Longitude'], state_cap['Capital'], state_cap['State']):
    label = f'{capital}, {state}'
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_usa) 

map_usa

In [107]:
CLIENT_ID = '0FMTRSXF00IKBQUDFHO4JRM10IHI40E2GESZU0JPW0N10YIR' # your Foursquare ID
CLIENT_SECRET = 'ASA3QD3F0I1AVAO14HDHO3V4XESSWHNIO1GYONSWFTMBNVF3' # your Foursquare Secret
ACCESS_TOKEN = 'T5N1J1WSVGVA4I5XRJK2ZOSYTLS31CJ21BVJ23JOWGGNKYV0' # your FourSquare Access Token
VERSION = '20180604'
LIMIT = 100

In [108]:
def getNearbyVenues(capital, state, latitudes, longitudes, radius=100000):
    
    venues_list=[]
    search_querys = ['coffee', 'breakfast', 'dinner', 'lunch', 'food', 'resturant', 'take out', "Jamaican", "American", "Japanese", "Mediterranean", "Korean", "Italian",  "hamburger", "Asian", "fast food"]
    
    
    for capital, state, lat, lng, rad in zip(capital, state, latitudes, longitudes, radius):
        #print(f'Getting locations from {capital}, {state}')
        time.sleep(1)
        count = 0
        for search_query in search_querys:
            
            # finding radius using sq mi area of Capital or the max limit for Foursquare, whichever is lower
            radius = min(int(((float(rad)**0.5) / 2) * 1609.34), 100000)  #Assuming sq mi is actual square centered at location and change to meters

            # format url
            time.sleep(0.1)
            url = url = f'https://api.foursquare.com/v2/venues/search?client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&ll={lat},{lng}&v={VERSION}&query={search_query}&radius={radius}&limit={LIMIT}'
            
            # make the GET request
            try:
                results = requests.get(url).json()["response"]['venues']
            except:
                results = requests.get(url).json()
                print(results['meta'])
                continue
            
            # return only relevant information for each nearby venue
            for v in results:
                try:
                    venues_list.append([ capital,
                        state,
                        lat, 
                        lng, 
                        v['id'],
                        v['name'], 
                        v['location']['lat'], 
                        v['location']['lng'],  
                        v['categories'][0]['name']])
                except:
                    venues_list.append([ capital,
                        state,
                        lat, 
                        lng, 
                        v['id'],
                        v['name'], 
                        v['location']['lat'], 
                        v['location']['lng'],  
                        np.nan])
                finally:
                    count += 1
        #print(f'Found {count} venues\n\n')
                    
    nearby_venues = pd.DataFrame(venues_list)
    #nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Capital',
                  'State',
                  'Capital Latitude', 
                  'Capital Longitude', 
                  'ID',
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [109]:
capital_venues = getNearbyVenues(state_cap['Capital'], state_cap['State'], state_cap['Latitude'], state_cap['Longitude'], state_cap['Area (mi2)'])
capital_venues.head()

{'code': 200, 'requestId': '6009b5061aa0e03d6315e8d3'}


Unnamed: 0,Capital,State,Capital Latitude,Capital Longitude,ID,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Montgomery,Alabama,32.366966,-86.300648,5891e7c72ec36420ba56ec49,Prevail Union Craft Coffee,32.377642,-86.308226,Coffee Shop
1,Montgomery,Alabama,32.366966,-86.300648,4f95762be4b0a510a824e32c,La' Rosa Coffee & Deli,32.375826,-86.295269,Deli / Bodega
2,Montgomery,Alabama,32.366966,-86.300648,4f32204619836c91c7b89a27,Stop and Sip Coffee,32.37555,-86.2955,Food
3,Montgomery,Alabama,32.366966,-86.300648,4bc8f746fb84c9b602301a3e,The Coffee Bean,32.380147,-86.213099,Coffee Shop
4,Montgomery,Alabama,32.366966,-86.300648,5231dce911d2a5199006eee7,The Coffee House,32.350234,-86.286101,Coffee Shop


In [111]:
x1 = capital_venues.shape[0]
#drop any venue duplicates picked up by the different search queries
capital_venues.drop_duplicates(subset=['ID'], inplace=True)
x2 = capital_venues.shape[0]
print(f'Started off with {x1} venues, deleted {x1 - x2} duplicates, leaving {x2} venues')

Started off with 11768 venues, deleted 1532 duplicates, leaving 10236 venues


In [112]:
#saving
capital_venues.to_csv('capital_venues.csv')

In [113]:
#opening so to not have to rerun
capital_venues = pd.read_csv('capital_venues.csv')
capital_venues.drop('Unnamed: 0',1, inplace=True)
capital_venues.head()

Unnamed: 0,Capital,State,Capital Latitude,Capital Longitude,ID,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Montgomery,Alabama,32.366966,-86.300648,5891e7c72ec36420ba56ec49,Prevail Union Craft Coffee,32.377642,-86.308226,Coffee Shop
1,Montgomery,Alabama,32.366966,-86.300648,4f95762be4b0a510a824e32c,La' Rosa Coffee & Deli,32.375826,-86.295269,Deli / Bodega
2,Montgomery,Alabama,32.366966,-86.300648,4f32204619836c91c7b89a27,Stop and Sip Coffee,32.37555,-86.2955,Food
3,Montgomery,Alabama,32.366966,-86.300648,4bc8f746fb84c9b602301a3e,The Coffee Bean,32.380147,-86.213099,Coffee Shop
4,Montgomery,Alabama,32.366966,-86.300648,5231dce911d2a5199006eee7,The Coffee House,32.350234,-86.286101,Coffee Shop


In [114]:
#list of venues not related to food
drop_venues = ['Tourist Information Center', 'Gun Range','Comic Shop', 'Train Station', 'Hotel', 'Motel', 'Event Space', 'Pet Service', 'Comedy Club', 'Concert Hall', 'Farm', 'Church', 'General Travel', 'Train', 'Gourmet Shop', 'Non-Profit', 'Conference Room', 'Building', 'Other Great Outdoors', 'Parking', 'Gift Shop', 'College Quad', 'Auditorium', 'Health & Beauty Service', 'Scenic Lookout', 'Clothing Store', 'Office', 'Toy / Game Store', 'Government Building', 'Tour Provider', 'Grocery Store', 'Food Service', 'Convenience Store', 'Liquor Store', 'Supermarket', 'Gas Station', 'Miscellaneous Shop', 'Massage Studio', 'Business Center', 'Medical Center', 
               'Paper / Office Supplies Store', 'Car Wash', 'Arcade', 'Construction & Landscaping', 'Market', 'Bank', 'Tattoo Parlor', 'Financial or Legal Service', 'Campaign Office', 'General College & University', 'River', 'Trail', 'Monument / Landmark', 'Bike Trail', 'High School', 'Capitol Building', 'College Classroom', 'Sculpture Garden', 'Factory', 'Intersection', 'Club House', "Doctor's Office", 'Cosmetics Shop', 'Art Gallery', 'Community College', 'Insurance Office', 'Furniture / Home Store', 'Thrift / Vintage Store', 'Assisted Living', 'Bridge', 'Outdoor Supply Store', 'Plane', 'Automotive Shop', 'Garden',
               'Boat or Ferry', 'Historic Site', 'Spiritual Center', 'Residential Building (Apartment / Condo)', 'Martial Arts School', 'Shopping Plaza', 'General Entertainment', 'Beach', 'Farmers Market', 'Coworking Space', 'Jewelry Store', 'IT Services', 'Theater', 'Travel Agency', 'Warehouse Store', 'Kitchen Supply Store', 'Bridal Shop', 'Hospital', 'Spa', 'Smoke Shop', 'Hobby Shop', 'Housing Development', 'Gym / Fitness Center', 'Basketball Court', 'Discount Store', 'Sporting Goods Shop', "Men's Store", 'Auto Dealership', 'Business Service', 'Storage Facility', 'Hardware Store', 'Cultural Center', 'Check Cashing Service', 
               'Print Shop', 'Pharmacy', 'Park', 'Cruise Ship', 'Lingerie Store', 'Other Repair Shop', 'Home Service', 'Professional & Other Places', 'Warehouse', 'Coffee Roaster', 'Distribution Center', 'Travel Lounge', 'Performing Arts Venue', 'Pedestrian Plaza', 'Yoga Studio', 'Island', 'Plaza', 'Locksmith', 'College Residence Hall', 'Athletics & Sports', 'Recreation Center', 'Real Estate Office', 'Lounge', 'Airport Gate', 'Lawyer', 'Dance Studio', 'Botanical Garden', 'National Park', 'Resort', 'Lake', 'Convention Center', 'Airport Service', 'Social Club', "Dentist's Office", 'Credit Union', 'Moving Target', 'Trade School',
               'Acupuncturist', 'Candy Store', 'Night Market', 'Department Store', 'Ski Chalet', 'Well', 'Exhibit', 'Arts & Entertainment', 'Music Venue', 'Student Center', 'Outlet Store', 'Gym', 'Nursery School', 'Airport Terminal', 'Research Laboratory', 'Library', 'Airport', 'Electronics Store', 'Road', 'Salon / Barbershop', 'Shoe Store', 'Recycling Facility', 'Chiropractor', 'Tech Startup', 'Marijuana Dispensary', 'Herbs & Spices Store', 'Gaming Cafe', 'School', 'Science Museum', 'Pet Store', 'Mobile Phone Shop', 'Other Nightlife', 'Tailor Shop', 'Advertising Agency',  'College Administrative Building',  'Pool', 'Design Studio',
               'Boutique', 'Laundry Service', 'Transportation Service', 'Art Museum', 'College & University', 'Zoo Exhibit',  'EV Charging Station', 'Shop & Service', 'Field', 'Nightclub', 'Rehab Center', 'Bus Stop', 'Shopping Mall', 'Arts & Crafts Store', 'Physical Therapist', 'Racetrack', 'College Auditorium', 'Antique Shop', 'Community Center', 'Garden Center', "Women's Store", 'Outdoors & Recreation', 'Taxi', 'Auto Workshop', 'Auto Garage', 'College Academic Building', 'Nightlife Spot', 'Bike Shop', 'Tennis Court', 'Meeting Room', 'Shrine', 'Indie Theater', 'Medical Lab', 'College Lab', 'Water Park', 'Bus Line', 'Baseball Field',
               'Fraternity House', 'Daycare', 'Sports Club', 'Museum', 'Embassy / Consulate', 'Sorority House', 'University', 'Animal Shelter', 'Zoo', 'Vehicle Inspection Station', 'Harbor / Marina', 'Line / Queue', 'Other Event', 'Hotel Pool', 'ATM', 'College Theater', 'Carpet Store', 'Airport Lounge', 'Cemetery', 'Language School', 'Outdoor Event Space', 'College Communications Building', 'Fruit & Vegetable Store', 'Flea Market', 'Accessories Store', 'Gun Shop', 'Pier', 'History Museum', 'Fair', 'Post Office', 'Hockey Arena', 'Event Service', 'Medical School', 'Skate Park', 'Playground', 'Bookstore', 'Police Station', 'Military Base',
               'Paintball Field', 'Alternative Healer', 'Vape Store', 'College Science Building', 'Veterinarian', 'Lighting Store', 'Mental Health Office', 'College History Building', "Veterans' Organization", 'Corporate Amenity', 'Tunnel', 'Metro Station', 'Music School', 'Outdoor Gym', 'Big Box Store', 'Hospital Ward', 'Movie Theater', 'Rock Club', 'Organic Grocery', 'Kids Store', 'Dog Run', 'Campground', 'Disc Golf', 'Surf Spot', 'Radio Station', 'Elementary School', 'Neighborhood', 'Circus', 'Outlet Mall', 'Bus Station', 'Child Care Service', 'Flower Shop', 'Rest Area', 'Butcher', 'Hot Spring', 'Dry Cleaner', 'Travel & Transport',
               'College Basketball Court', 'Prison', 'Mattress Store', 'Temple', 'Board Shop', 'College Library', 'Courthouse', 'Video Game Store', 'City', 'Bathing Area', 'College Rec Center', 'College Bookstore', 'Casino', 'Pawn Shop', 'Video Store', 'Baby Store', 'Skating Rink', 'Jazz Club', 'Rental Service', 'Baggage Claim', 'College Gym', 'Vacation Rental', 'Funeral Home', 'Emergency Room', 'Multiplex', 'Theme Park', 'Stadium', 'Stables', 'Boxing Gym', 'Escape Room', 'College Technology Building', 'Public Art', 'Memorial Site', 'Wine Shop', 'Amphitheater', 'Track', 'Nail Salon', 'Hostel', 'College Arts Building', 'Roof Deck', 'Pool Hall',
               'Waterfall', 'Optical Shop', 'Internet Cafe', 'Picnic Area', 'Shipping Store', 'Distillery', 'Fish Market', 'Health Food Store', 'Cheese Shop']

capital_venues.drop(capital_venues[capital_venues['Venue Category'].isin(drop_venues)].index, inplace=True)
capital_venues.dropna(0, inplace=True)


In [115]:
def venue_change(category):
    if category in ['Moroccan Restaurant', 'Ethiopian Restaurant']:
        category = 'African Restaurant'
    elif category in ['Café', 'Snack Place', 'Bakery', 'Tea Room', 'Bubble Tea Shop', 'Corporate Coffee Shop', 'Juice Bar']:
        category = 'Coffee Shop'
    elif category in ['Street Food Gathering']:
        category = 'Food Truck'
    elif category in ['Breakfast Spot', 'Bed & Breakfast', 'Bagel Shop', 'Donut Shop', 'Creperie']:
        category = 'Breakfast'
    elif category in ['Taco Place', 'Burrito Place', 'Mexican Restaurant', 'Argentinian Restaurant', 'Cuban Restaurant', 'Brazilian Restaurant', 'Portuguese Restaurant', 'Caribbean Restaurant']:
        category = 'Latin American Restaurant'
    elif category in ['Food & Drink Shop', 'Food Court', 'Bistro', 'Food Stand', 'Airport Food Court', 'Food', 'Restaurant']:
        category = 'Fast Food Restaurant'
    elif category in ['Theme Restaurant', 'New American Restaurant', 'Diner', 'Steakhouse', 'Burger Joint', 'Hot Dog Joint', 'BBQ Joint', 'Tex-Mex Restaurant', 'Hawaiian Restaurant', 'Wings Joint']:
        category = 'American Restaurant'
    elif category in ['Cafeteria', 'College Cafeteria', 'Corporate Cafeteria',]:
        category = 'Buffet'
    elif category in ['Andhra Restaurant', 'North Indian Restaurant', 'Sri Lankan Restaurant']:
        category = 'Indian Restaurant'
    elif category in ['Pizza Place']:
        category = 'Italian Restaurant'
    elif category in ['Irish Pub', 'Fish & Chips Shop','Wine Bar', 'Brewery', 'Winery', 'Sake Bar', 'Karaoke Bar',  'Strip Club',  'Dive Bar',  'Sports Bar',  'Speakeasy', 'Pub', 'Hookah Bar', 'Gay Bar', 'Cocktail Bar', 'Whisky Bar', 'Hotel Bar', 'Piano Bar', 'Tiki Bar', 'Beer Bar', 'Gastropub', 'Beer Garden']:
        category = 'Bar'
    elif category in ['Deli / Bodega']:
        category = 'Sandwich Place'
    elif category in ['Dumpling Restaurant', 'Szechuan Restaurant', 'Malay Restaurant', 'Shanghai Restaurant', 'Dim Sum Restaurant']:
        category = 'Chinese Restaurant'
    elif category in ['Mongolian Restaurant', 'Thai Restaurant', 'Filipino Restaurant', 'Vietnamese Restaurant', 'Noodle House']:
        category = 'Asian Restaurant'
    elif category in ['Salad Place']:
        category = 'Vegetarian / Vegan Restaurant'
    elif category in ['Tapas Restaurant']:
        category = 'Spanish Restaurant'
    elif category in ['Greek Restaurant']:
        category = 'Mediterranean Restaurant'
    elif category in ['Korean BBQ Restaurant']:
        category = 'Korean Restaurant'
    elif category in ['Sushi Restaurant', 'Shabu-Shabu Restaurant', 'Hotpot Restaurant', 'Ramen Restaurant']:
        category = 'Japanese Restaurant'
    elif category in ['Halal Restaurant', 'Israeli Restaurant', 'Turkish Restaurant', 'Turkish Home Cooking Restaurant', 'Doner Restaurant', 'Falafel Restaurant', 'Persian Restaurant', 'Afghan Restaurant']:
        category = 'Middle Eastern Restaurant'
    elif category in ['Ice Cream Shop', 'Cupcake Shop', 'Frozen Yogurt Shop', 'Chocolate Shop', 'Gelato Shop',]:
        category = 'Dessert Shop'
    elif category in ['Mac & Cheese Joint', 'Fried Chicken Joint']:
        category = 'Southern / Soul Food Restaurant'
    elif category in ['Soup Place']:
        category = 'Comfort Food Restaurant'
    elif category in ['Cajun / Creole Restaurant']:
        category = 'French Restaurant'
    elif category in ['Swiss Restaurant']:
        category = 'Modern European Restaurant'
    return category


In [116]:
venue_changed_names = capital_venues['Venue Category'].apply(venue_change)
capital_venues['Venue Category'] = venue_changed_names

capital_venues.groupby('Venue Category').count()

Unnamed: 0_level_0,Capital,State,Capital Latitude,Capital Longitude,ID,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
African Restaurant,14,14,14,14,14,14,14,14
American Restaurant,383,383,383,383,383,383,383,383
Asian Restaurant,283,283,283,283,283,283,283,283
Bar,141,141,141,141,141,141,141,141
Breakfast,343,343,343,343,343,343,343,343
Buffet,52,52,52,52,52,52,52,52
Chinese Restaurant,128,128,128,128,128,128,128,128
Coffee Shop,1439,1439,1439,1439,1439,1439,1439,1439
Comfort Food Restaurant,13,13,13,13,13,13,13,13
Dessert Shop,40,40,40,40,40,40,40,40


In [117]:
capital_venues.to_csv('capital_venues1.csv')

In [120]:
capital_venues = pd.read_csv('capital_venues1.csv')
capital_venues.drop('Unnamed: 0',1, inplace=True)

In [121]:
capital_venues.groupby(['Capital', 'Venue Category']).count()

Unnamed: 0_level_0,Unnamed: 1_level_0,State,Capital Latitude,Capital Longitude,ID,Venue,Venue Latitude,Venue Longitude
Capital,Venue Category,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Albany,African Restaurant,1,1,1,1,1,1,1
Albany,American Restaurant,7,7,7,7,7,7,7
Albany,Asian Restaurant,4,4,4,4,4,4,4
Albany,Bar,3,3,3,3,3,3,3
Albany,Breakfast,1,1,1,1,1,1,1
Albany,Buffet,2,2,2,2,2,2,2
Albany,Chinese Restaurant,2,2,2,2,2,2,2
Albany,Coffee Shop,15,15,15,15,15,15,15
Albany,Dessert Shop,1,1,1,1,1,1,1
Albany,Fast Food Restaurant,11,11,11,11,11,11,11


### I will be using this data to create a recommendation system for users to let them know their top five Capitals that they should visit