# Introduction/Business Problem

Whenever I plan a trip to a new city, I find myself looking up my specific interests and trying to map how far apart they are. I can spend hours deciding which area of a city to visit because planning the commute between attractions is so tedious. My solution is a tool that uses Foursquare location data and lets customers input the kinds of attractions they want (e.g., restaurants, movies, sports). The tool then recommends the neighborhoods in the chosen city with the highest density of the requested attractions. This should save customers time and help them get the most out of their visit. For this project, I will be exploring the city of Pittsburgh, PA.

# Data

To create the tool described in the Introduction/Business Problem section, I will use the location data provided by Foursquare. As in the New York and Toronto examples, I will retrieve information on the density of different attractions in each neighborhood. The tool will keep only the columns that match the user's search criteria and use them to decide which neighborhoods to recommend. For example, if the user inputs 'restaurants' and 'parks', the tool will select only those columns and add up their densities. The resulting column of summed densities will be sorted in descending order, and the top three neighborhoods will be suggested as places to visit.
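The filter, sum, and sort steps described above can be sketched on a toy table. The neighborhood names and frequency values below are made up for illustration, and `recommend` is a hypothetical helper rather than the tool built later in this notebook.

```python
import pandas as pd

# Made-up per-neighborhood category frequencies, for illustration only
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Restaurant': [0.10, 0.30, 0.00, 0.20],
    'Park': [0.05, 0.00, 0.25, 0.20],
    'Museum': [0.50, 0.00, 0.00, 0.00],
})


def recommend(freqs, categories, top_n=3):
    """Sum the requested category columns and return the top_n neighborhoods."""
    scored = freqs[['Neighborhood']].copy()
    scored['Total'] = freqs[categories].sum(axis=1)
    return scored.sort_values('Total', ascending=False).head(top_n)


top = recommend(toy, ['Restaurant', 'Park'])
print(top)
```

Neighborhood A has the largest single value (Museum), but it is left out of the top three here because only the requested columns contribute to the total.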

# Code

In [111]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests
from sklearn.cluster import KMeans
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

print('done')

done


Scrape the names of neighborhoods in Pittsburgh and put them into a list

In [112]:
url = 'https://www.visitpittsburgh.com/neighborhoods/'

soup = BeautifulSoup(requests.get(url).content, "html.parser")
# collect the text of every table cell (one neighborhood name per cell)
table_contents = []
table = soup.find('table')
for row in table.find_all('td'):
    table_contents.append(row.get_text(strip=True))

print(table_contents)

['Allegheny Center', 'Allegheny West', 'Allentown', 'Arlington', 'Arlington Heights', 'Banksville', 'Bedford Dwellings', 'Beechview', 'Beltzhoover', 'Bloomfield', 'Bluff', 'Bon Air', 'Brighton Heights', 'Brookline', 'California Kirkbride', 'Carrick', 'Central Lawrenceville', 'Central North Side', 'Central Oakland', 'Chartiers City', 'Chateau', 'Crafton Heights', 'Crawford-Roberts', 'Downtown', 'Duquesne Heights', 'East Allegheny', 'East Carnegie', 'East Hills', 'East Liberty', 'Elliott', 'Esplen', 'Fairywood', 'Fineview', 'Friendship', 'Garfield', 'Glen Hazel', 'Greenfield', 'Hays', 'Hazelwood', 'Highland Park', 'Homewood North', 'Homewood South', 'Homewood West', 'Knoxville', 'Larimer', 'Lincoln-Lemington-Belmar', 'Lincoln Place', 'Lower Lawrenceville', 'Manchester', 'Marshall-Shadeland', 'Middle Hill', 'Morningside', 'Mount Oliver', 'Mount Washington', 'New Homestead', 'North Oakland', 'North Shore', 'Northview Heights', 'Oakwood', 'Overbrook', 'Perry North', 'Perry South', 'Point Br

Use the list of neighborhoods to find each latitude and longitude

In [113]:
address =  (str(table_contents[0]) + ', Pittsburgh, PA')

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude


lats = []
longs = []

# reuse the geolocator created above and look up each neighborhood in turn
for area in table_contents[0:88]:
    address = str(area) + ', Pittsburgh, PA'
    location = geolocator.geocode(address)
    lats.append(location.latitude)
    longs.append(location.longitude)


40.4529367 -80.0063438
90
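`geolocator.geocode` returns `None` when it finds no match, which would make the loop above fail with an `AttributeError`. A minimal sketch of a more defensive loop follows. `geocode_all` is a hypothetical helper, and `fake_geocode` is a stand-in used here only so the example runs without network access.

```python
import time
from collections import namedtuple

# Stand-in for geopy's Location object (only the two attributes used here)
Location = namedtuple('Location', ['latitude', 'longitude'])


def fake_geocode(address):
    """Hypothetical offline lookup that mimics geolocator.geocode."""
    known = {'Allegheny Center, Pittsburgh, PA': Location(40.4529367, -80.0063438)}
    return known.get(address)  # None when there is no match


def geocode_all(names, geocode, suffix=', Pittsburgh, PA', delay=0.0):
    """Geocode each name; record (None, None) for names with no match."""
    coords = {}
    for name in names:
        location = geocode(name + suffix)
        if location is None:
            coords[name] = (None, None)
        else:
            coords[name] = (location.latitude, location.longitude)
        time.sleep(delay)  # pause between requests to stay polite to the service
    return coords


coords = geocode_all(['Allegheny Center', 'Not A Real Place'], fake_geocode)
print(coords)
```

In the notebook, `geolocator.geocode` would be passed in place of `fake_geocode`, with a `delay` of about one second, since the public Nominatim service asks for at most one request per second.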


Create a dataframe with neighborhoods and latitude/longitude

In [114]:
# define the dataframe columns
data = {'Neighborhood': table_contents[0:88],
        'Latitude' : lats,
        'Longitude' : longs} 

# instantiate the dataframe
neighborhoods = pd.DataFrame(data)
neighborhoods

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Allegheny Center | 40.452937 | -80.006344 |
| 1 | Allegheny West | 40.452273 | -80.015741 |
| 2 | Allentown | 40.422618 | -79.993086 |
| 3 | Arlington | 40.415179 | -79.970018 |
| 4 | Arlington Heights | 40.416659 | -79.962402 |
| ... | ... | ... | ... |
| 83 | Troy Hill | 40.468958 | -79.978664 |
| 84 | Upper Hill | 40.453068 | -79.959823 |
| 85 | Upper Lawrenceville | 40.482037 | -79.951025 |
| 86 | West End | 40.440919 | -80.034066 |


In [115]:
CLIENT_ID = 'BDHUOEU2XMAHR2WOJZIDN2X1U1LA3P3YI0K0PQM02FQDBPBR' # your Foursquare ID
CLIENT_SECRET = 'RZIM1GZ1PKT24OCFGG05IEZS203SX4BW51DY2ID4G33PRQ10' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: BDHUOEU2XMAHR2WOJZIDN2X1U1LA3P3YI0K0PQM02FQDBPBR
CLIENT_SECRET:RZIM1GZ1PKT24OCFGG05IEZS203SX4BW51DY2ID4G33PRQ10


In [116]:



def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return one row per venue found within `radius` meters of each neighborhood center."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and keep the list of recommended items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant fields for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
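
The list comprehension above indexes `categories[0]` directly, which would raise an `IndexError` for a venue with no category. Below is a sketch of a more forgiving parser. `extract_venues` is a hypothetical helper, and `sample` is a made-up payload that only mirrors the keys the function above reads.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style response into rows of venue fields."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # fall back to a placeholder label when a venue has no category
        category = categories[0]['name'] if categories else 'Uncategorized'
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows


# Made-up payload with the same nesting as the keys accessed above
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Cafe X', 'location': {'lat': 40.45, 'lng': -80.00},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Mystery Spot', 'location': {'lat': 40.46, 'lng': -80.01},
               'categories': []}},
]}]}}

rows = extract_venues(sample, 'Allegheny Center', 40.452937, -80.006344)
for row in rows:
    print(row)
```

An empty or error payload yields an empty list instead of an exception, so one failed neighborhood would not stop the whole run.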

In [None]:
df = getNearbyVenues(names=neighborhoods['Neighborhood'],
                     latitudes=neighborhoods['Latitude'],
                     longitudes=neighborhoods['Longitude'])

Allegheny Center
Allegheny West
Allentown
Arlington
Arlington Heights
Banksville
Bedford Dwellings
Beechview
Beltzhoover
Bloomfield
Bluff
Bon Air
Brighton Heights
Brookline
California Kirkbride
Carrick
Central Lawrenceville
Central North Side
Central Oakland
Chartiers City
Chateau
Crafton Heights
Crawford-Roberts
Downtown
Duquesne Heights
East Allegheny
East Carnegie
East Hills
East Liberty
Elliott
Esplen
Fairywood
Fineview
Friendship
Garfield
Glen Hazel
Greenfield
Hays
Hazelwood
Highland Park
Homewood North
Homewood South
Homewood West


### Analyze each neighborhood

In [110]:
# one hot encoding
pitt_onehot = pd.get_dummies(df[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
pitt_onehot['Neighborhood'] = df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [pitt_onehot.columns[-1]] + list(pitt_onehot.columns[:-1])
pitt_onehot = pitt_onehot[fixed_columns]

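
# If a venue category is itself named 'Neighborhood', the assignment above
# overwrites that dummy column in place instead of appending a new one, so the
# last column may not be the neighborhood name. Dropping the column and
# re-inserting it guarantees that 'Neighborhood' ends up first.
neigh = df['Neighborhood']
pitt_onehot.drop(labels=['Neighborhood'], axis=1, inplace=True)
pitt_onehot.insert(0, 'Neighborhood', neigh)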


pitt_onehot.head(10)

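| | Neighborhood | Zoo | Accessories Store | Adult Boutique | American Restaurant | Antique Shop | Arcade | Art Gallery | Art Museum | Arts & Crafts Store | ... | Tunnel | Turkish Restaurant | Vegetarian / Vegan Restaurant | Video Game Store | Video Store | Vietnamese Restaurant | Wine Bar | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Allegheny Center | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Allegheny Center | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Allegheny Center | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Allegheny Center | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Allegheny Center | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | Allegheny Center | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | Allegheny Center | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | Allegheny Center | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | Allegheny Center | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | Allegheny Center | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |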


### Group by neighborhood and find the mean occurrence of each category

In [106]:
pitt_grouped = pitt_onehot.groupby('Neighborhood').mean().reset_index()
pitt_grouped

pitt_grouped.columns

Index(['Neighborhood', 'Zoo', 'Accessories Store', 'Adult Boutique',
       'American Restaurant', 'Antique Shop', 'Arcade', 'Art Gallery',
       'Art Museum', 'Arts & Crafts Store',
       ...
       'Tunnel', 'Turkish Restaurant', 'Vegetarian / Vegan Restaurant',
       'Video Game Store', 'Video Store', 'Vietnamese Restaurant', 'Wine Bar',
       'Wings Joint', "Women's Store", 'Yoga Studio'],
      dtype='object', length=223)
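
Because each row of the one-hot table is a single venue, the group mean is the share of a neighborhood's returned venues that fall in each category. A toy illustration with made-up venues:

```python
import pandas as pd

# Made-up venues: neighborhood 'A' has four venues, 'B' has two
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Bar', 'Bar', 'Zoo', 'Park', 'Bar'],
})

onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# mean of the 0/1 indicators = fraction of venues in each category
shares = onehot.groupby('Neighborhood').mean().reset_index()
print(shares)
```

Both toy neighborhoods contain exactly one park, yet the 'Park' value is 0.25 for A and 0.5 for B, because the mean is taken relative to each neighborhood's own venue total.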

In [109]:
fun_list = ['American Restaurant']


def search_tool(df, my_list):
    """Rank neighborhoods by the summed mean frequency of the requested categories."""
    # copy the selection so that adding columns does not trigger SettingWithCopyWarning
    result = df[my_list].copy()
    result['Total'] = result.sum(axis=1)
    result.insert(0, 'Neighborhood', df['Neighborhood'])
    final_df = result.sort_values(by=['Total'], ascending=False)
    return final_df.head(5)

search_tool(pitt_grouped, fun_list)

| | Neighborhood | American Restaurant | Total |
|---|---|---|---|
| 4 | Arlington Heights | 0.5 | 0.5 |
| 75 | Summer Hill | 0.333333 | 0.333333 |
| 3 | Arlington | 0.333333 | 0.333333 |
| 55 | Overbrook | 0.142857 | 0.142857 |
| 16 | Central North Side | 0.142857 | 0.142857 |
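
Arlington Heights ranks first with 0.5, but that figure only says that half of the venues returned for it were American restaurants, which can happen when very few venues were returned at all. One possible alternative, sketched here with made-up data and a hypothetical `rank_by_count` helper, ranks neighborhoods by raw venue counts instead:

```python
import pandas as pd

# Made-up venue list: 'Small' has 2 venues, 'Big' has 10
venues = pd.DataFrame({
    'Neighborhood': ['Small'] * 2 + ['Big'] * 10,
    'Venue Category': ['American Restaurant', 'Park']
                      + ['American Restaurant'] * 3 + ['Bar'] * 7,
})


def rank_by_count(venues, categories, top_n=5):
    """Count venues in the requested categories per neighborhood, largest first."""
    wanted = venues[venues['Venue Category'].isin(categories)]
    counts = wanted.groupby('Neighborhood').size().rename('Count')
    return counts.sort_values(ascending=False).head(top_n).reset_index()


ranking = rank_by_count(venues, ['American Restaurant'])
print(ranking)
```

In this toy data, 'Small' would win on share (1 of 2 venues against 3 of 10), while 'Big' wins on count. Which measure better matches "density" depends on what the traveler wants, so this is offered as an option rather than a correction.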
