# Introduction/Business Problem

Whenever I plan a trip to a new city, I look up my specific interests and try to map how far apart they are. I can spend hours deciding which area of a city to visit because planning the commute between attractions is so tedious. My solution is a tool that uses Foursquare location data and lets customers input their desired attractions (e.g., restaurants, movies, sports). The tool improves the customer's visit by recommending the neighborhoods in the chosen city with the highest density of the requested attractions. This saves the customer time and helps them get the most out of the city they choose to visit. For this project, I will be exploring the city of Pittsburgh, PA.

# Data

To create the tool described in the Introduction/Business Problem section, I will use the location data provided by Foursquare. As in the New York and Toronto examples, I will retrieve the density of different attraction types in each neighborhood. The tool filters the columns used to decide which neighborhoods to recommend. For example, if the user inputs 'restaurants' and 'parks' as search criteria, the tool keeps only those columns and adds up their densities. The new column holding the summed densities is sorted in descending order, and the top three neighborhoods are suggested as recommended places to visit.
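
As a rough illustration of the filter, sum, and sort steps described above, the following sketch ranks neighborhoods on made-up density values. The neighborhood labels, the numbers, and the helper name `recommend` are all assumptions for illustration; none of them come from Foursquare.

```python
import pandas as pd

# Made-up densities (share of venues per category); not real Foursquare data.
toy = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Restaurant": [0.10, 0.30, 0.00, 0.20],
    "Park": [0.05, 0.10, 0.35, 0.00],
    "Bar": [0.50, 0.00, 0.10, 0.10],
})

def recommend(densities, categories, top_n=3):
    """Keep only the requested category columns, sum them, and return the top rows."""
    categories = list(categories)
    picked = densities[["Neighborhood"] + categories].copy()
    picked["Total"] = picked[categories].sum(axis=1)
    return picked.sort_values("Total", ascending=False).head(top_n)

top = recommend(toy, ["Restaurant", "Park"])
print(top)
```

Neighborhood A has the most bars, but bars were not requested, so it drops out of the top three.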

# Code

In [3]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests
from sklearn.cluster import KMeans
#import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

print('done')

done


Scrape the names of neighborhoods in Pittsburgh and put them into a list

In [4]:
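url = 'https://www.visitpittsburgh.com/neighborhoods/'

# download the page and parse the first table on it
soup = BeautifulSoup(requests.get(url).content, "html.parser")
table_contents = []
table = soup.find('table')
for cell in table.find_all('td'):
    # take the cell text rather than slicing the '<td>...</td>' markup
    table_contents.append(cell.get_text(strip=True))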

print(table_contents)

['Allegheny Center', 'Allegheny West', 'Allentown', 'Arlington', 'Arlington Heights', 'Banksville', 'Bedford Dwellings', 'Beechview', 'Beltzhoover', 'Bloomfield', 'Bluff', 'Bon Air', 'Brighton Heights', 'Brookline', 'California Kirkbride', 'Carrick', 'Central Lawrenceville', 'Central North Side', 'Central Oakland', 'Chartiers City', 'Chateau', 'Crafton Heights', 'Crawford-Roberts', 'Downtown', 'Duquesne Heights', 'East Allegheny', 'East Carnegie', 'East Hills', 'East Liberty', 'Elliott', 'Esplen', 'Fairywood', 'Fineview', 'Friendship', 'Garfield', 'Glen Hazel', 'Greenfield', 'Hays', 'Hazelwood', 'Highland Park', 'Homewood North', 'Homewood South', 'Homewood West', 'Knoxville', 'Larimer', 'Lincoln-Lemington-Belmar', 'Lincoln Place', 'Lower Lawrenceville', 'Manchester', 'Marshall-Shadeland', 'Middle Hill', 'Morningside', 'Mount Oliver', 'Mount Washington', 'New Homestead', 'North Oakland', 'North Shore', 'Northview Heights', 'Oakwood', 'Overbrook', 'Perry North', 'Perry South', 'Point Br

Use the list of neighborhoods to find each latitude and longitude

In [5]:
# create the geocoder once and reuse it for every lookup
geolocator = Nominatim(user_agent="foursquare_agent")

lats = []
longs = []

for area in table_contents[0:88]:
    address = str(area) + ', Pittsburgh, PA'
    location = geolocator.geocode(address)
    # geocode() returns None when the address cannot be matched
    lats.append(location.latitude if location else None)
    longs.append(location.longitude if location else None)
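
The loop above sends one request per neighborhood. A geocoder lookup can come back empty, and the public Nominatim service asks clients to stay at or below about one request per second. The sketch below shows one way to handle both; `geocode_all` and `fake_geocode` are hypothetical helpers written for this example, and the fake geocoder stands in for a real one so the code runs offline.

```python
import time
from collections import namedtuple

# Stand-in for the object a geocoder returns; only the two fields used here.
Location = namedtuple("Location", ["latitude", "longitude"])

def geocode_all(names, geocode, suffix=", Pittsburgh, PA", delay=1.0):
    """Geocode each name, recording (None, None) when no match is found."""
    coords = []
    for i, name in enumerate(names):
        if i:  # pause between requests, not before the first one
            time.sleep(delay)
        location = geocode(name + suffix)
        if location is None:
            coords.append((None, None))
        else:
            coords.append((location.latitude, location.longitude))
    return coords

# Fake geocoder (assumption for illustration) that knows a single address.
def fake_geocode(address):
    known = {"Allentown, Pittsburgh, PA": Location(40.422618, -79.993086)}
    return known.get(address)

result = geocode_all(["Allentown", "Nowhere"], fake_geocode, delay=0)
print(result)
```

With geopy, the bound method `geolocator.geocode` could be passed as the `geocode` argument in place of the fake.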


Create a DataFrame with neighborhoods and latitude/longitude

In [6]:
# define the dataframe columns
data = {'Neighborhood': table_contents[0:88],
        'Latitude' : lats,
        'Longitude' : longs} 

# instantiate the dataframe
neighborhoods = pd.DataFrame(data)
neighborhoods

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Allegheny Center,40.452937,-80.006344
1,Allegheny West,40.452273,-80.015741
2,Allentown,40.422618,-79.993086
3,Arlington,40.415179,-79.970018
4,Arlington Heights,40.416659,-79.962402
...,...,...,...
83,Troy Hill,40.468958,-79.978664
84,Upper Hill,40.453068,-79.959823
85,Upper Lawrenceville,40.482037,-79.951025
86,West End,40.440919,-80.034066


In [7]:
import os

# read the Foursquare credentials from the environment instead of hard-coding them
# (the environment variable names below are an assumption; use whatever names you set)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

# confirm the values are set without echoing the secret
print('Credentials loaded')

Credentials loaded


In [8]:
def getNearbyVenues(names, latitudes, longitudes, radius=750):
    """Return one row per venue found within `radius` meters of each neighborhood."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # build the query for the Foursquare explore endpoint
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT}

        # make the GET request and fail loudly on an HTTP error
        response = requests.get(url, params=params)
        response.raise_for_status()
        results = response.json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
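
Each row built in `getNearbyVenues` takes the first category name of a venue. The sketch below shows that flattening step for a single item, with a fallback for a venue that carries no category. The sample items are hand-written to mirror only the fields the function reads; they are not a real API response, and `venue_row` is a hypothetical helper.

```python
def venue_row(neighborhood, lat, lng, item):
    """Flatten one 'items' entry into the seven-field row used above."""
    venue = item["venue"]
    categories = venue.get("categories") or []
    # fall back to None when the venue has no category listed
    category = categories[0]["name"] if categories else None
    return (
        neighborhood, lat, lng,
        venue["name"],
        venue["location"]["lat"],
        venue["location"]["lng"],
        category,
    )

# Hand-written sample items (assumption), shaped like the fields read above.
items = [
    {"venue": {"name": "Sample Cafe",
               "location": {"lat": 40.45, "lng": -80.00},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Unlabeled Spot",
               "location": {"lat": 40.46, "lng": -80.01},
               "categories": []}},
]

rows = [venue_row("Allegheny Center", 40.452937, -80.006344, it) for it in items]
for row in rows:
    print(row)
```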

In [9]:
df = getNearbyVenues(names=neighborhoods['Neighborhood'],
                     latitudes=neighborhoods['Latitude'],
                     longitudes=neighborhoods['Longitude'])

Allegheny Center
Allegheny West
Allentown
Arlington
Arlington Heights
Banksville
Bedford Dwellings
Beechview
Beltzhoover
Bloomfield
Bluff
Bon Air
Brighton Heights
Brookline
California Kirkbride
Carrick
Central Lawrenceville
Central North Side
Central Oakland
Chartiers City
Chateau
Crafton Heights
Crawford-Roberts
Downtown
Duquesne Heights
East Allegheny
East Carnegie
East Hills
East Liberty
Elliott
Esplen
Fairywood
Fineview
Friendship
Garfield
Glen Hazel
Greenfield
Hays
Hazelwood
Highland Park
Homewood North
Homewood South
Homewood West
Knoxville
Larimer
Lincoln-Lemington-Belmar
Lincoln Place
Lower Lawrenceville
Manchester
Marshall-Shadeland
Middle Hill
Morningside
Mount Oliver
Mount Washington
New Homestead
North Oakland
North Shore
Northview Heights
Oakwood
Overbrook
Perry North
Perry South
Point Breeze
Point Breeze North
Polish Hill
Regent Square
Ridgemont
Shadyside
Sheraden
South Oakland
South Shore
South Side Flats
South Side Slopes
Spring Garden
Spring Hill-City View
Squirrel Hil

### Analyze each neighborhood

In [10]:
# one hot encoding
pitt_onehot = pd.get_dummies(df[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
pitt_onehot['Neighborhood'] = df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [pitt_onehot.columns[-1]] + list(pitt_onehot.columns[:-1])
pitt_onehot = pitt_onehot[fixed_columns]


# re-insert the neighborhood labels as the first column
neigh = df['Neighborhood']
pitt_onehot.drop(labels=['Neighborhood'], axis=1, inplace=True)
pitt_onehot.insert(0, 'Neighborhood', neigh)


pitt_onehot.head(10)

Unnamed: 0,Neighborhood,Zoo Exhibit,ATM,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse,Wine Bar,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Allegheny Center,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Allegheny Center,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Allegheny Center,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Allegheny Center,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
4,Allegheny Center,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Allegheny Center,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Allegheny Center,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,Allegheny Center,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Allegheny Center,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Allegheny Center,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Group by neighborhood and find the mean occurrence of each category

In [11]:
pitt_grouped = pitt_onehot.groupby('Neighborhood').mean().reset_index()
pitt_grouped

for i in pitt_grouped.columns:
    print(i)

Neighborhood
Zoo Exhibit
ATM
American Restaurant
Antique Shop
Arcade
Argentinian Restaurant
Art Gallery
Art Museum
Arts & Crafts Store
Asian Restaurant
Athletics & Sports
Auto Garage
Auto Workshop
BBQ Joint
Bagel Shop
Bakery
Bank
Bar
Baseball Field
Baseball Stadium
Bed & Breakfast
Beer Garden
Beer Store
Belgian Restaurant
Big Box Store
Bike Shop
Boat or Ferry
Bookstore
Botanical Garden
Boutique
Bowling Alley
Brazilian Restaurant
Breakfast Spot
Brewery
Bridge
Bubble Tea Shop
Building
Burger Joint
Burrito Place
Bus Line
Bus Station
Bus Stop
Business Service
Butcher
Café
Cambodian Restaurant
Camera Store
Candy Store
Caribbean Restaurant
Carpet Store
Casino
Check Cashing Service
Cheese Shop
Chinese Restaurant
Climbing Gym
Clothing Store
Cocktail Bar
Coffee Shop
College Academic Building
College Arts Building
College Baseball Diamond
College Basketball Court
College Gym
Comic Shop
Concert Hall
Construction & Landscaping
Convenience Store
Cosmetics Shop
Cuban Restaurant
Cycle Studio
Dance St
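
To make "mean occurrence" concrete, here is a minimal sketch of the one-hot and grouping steps above on a handful of made-up venues. The neighborhood labels and venue rows are assumptions for illustration only.

```python
import pandas as pd

# Made-up venues (assumption for illustration), two neighborhoods.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Bar", "Bar", "Park", "Zoo", "Park", "Park"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of each indicator column is the share of venues in that category.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1, so the values are shares of a neighborhood's returned venues rather than raw counts. A neighborhood whose venues all fall into one category, like B here, scores 1.0 for that category no matter how few venues it has.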

In [20]:
fun_list = ['Park','Restaurant','Bar', 'Dive Bar']


def search_tool(df, my_list, include_all=True):
    # convert the mean frequencies of the requested categories to percentages
    scores = (df[my_list] * 100).round(2)
    # overall score: average percentage across the requested categories
    scores['Total'] = scores[my_list].mean(axis=1).round(2)
    if include_all:
        # keep only neighborhoods where every requested category is present
        scores = scores[(scores[my_list] != 0).all(axis=1)]
    scores.insert(0, 'Neighborhood', df['Neighborhood'])
    # highest-scoring neighborhoods first
    return scores.sort_values(by='Total', ascending=False).head()

search_tool(pitt_grouped, fun_list)

Unnamed: 0,Neighborhood,Park,Restaurant,Bar,Dive Bar,Total
72,South Side Slopes,3.33,1.67,15.0,1.67,5.42
71,South Side Flats,3.03,1.52,9.09,6.06,4.92


In [21]:
search_tool(pitt_grouped, fun_list, include_all=False)  

Unnamed: 0,Neighborhood,Park,Restaurant,Bar,Dive Bar,Total
54,New Homestead,100.0,0.0,0.0,0.0,25.0
81,Swisshelm Park,16.67,0.0,16.67,0.0,8.34
86,West End,8.33,8.33,8.33,0.0,6.25
7,Beechview,16.67,0.0,8.33,0.0,6.25
2,Allentown,7.69,0.0,7.69,7.69,5.77


# Results/Discussion/Conclusion

The main outcome of this project is a simple tool that uses Foursquare API data to give customers a recommendation tailored to their inputs. The tool finds the neighborhoods with the highest share of the activities the customer searched for and recommends them. A key feature, shown in the code blocks above, is the "include_all" option. It lets the customer choose between recommendations that include all of their inputs and recommendations that include only some of their desired attractions. The first example, with include_all=True, has lower total scores for the neighborhoods, but every neighborhood returned includes all the attractions the customer was looking for. The second example, with include_all=False, has higher total scores but does not include all the attractions the customer is looking for. This option lets the customer decide where to visit depending on how important it is that they visit certain attractions. In conclusion, I was able to import data using the Foursquare API and local neighborhood data on Pittsburgh to give cust