# Battle of the Neighborhoods (Week 2) - An Analysis of Fitness in San Francisco

## 1 Introduction:

According to a poll conducted by US News in 2019, San Francisco was considered the 4th fittest city in the United States. With the growing trend toward healthy living and fitness, now is a better time than ever to invest in fitness centers. In addition, accessible public transportation has made it easier to reach different parts of the city.

As a new fitness center startup in the San Francisco area, we need to analyze the city's neighborhoods to find the most promising area in which to set up a new fitness center. The aim of this proposal is to analyze the neighborhoods using the Foursquare API to see which areas have the right conditions for a chain of fitness centers.

This will help guide our company in making key strategic business decisions.

## 2 Business Problem:

The purpose of this proposal is to find the optimal location to open a new fitness center. To do this, we need to answer the following key questions:

1. Using the Foursquare API, can we get a visual map of different locations with the nearest venues?
2. Among those venues, how common are gyms or fitness centers in each neighborhood?
3. Can we conduct an analysis where we can isolate specific neighborhoods that can be targeted (i.e., Clustering)?

## 3 Data:

To complete this analysis, we need to collect data regarding San Francisco's different neighborhoods.  We also need to collect data on the different venues surrounding the neighborhoods.

The following data is used in this analysis:

1. For San Francisco postal code and neighborhood data, we will use the following dataset:
   Data Source: http://www.healthysf.org/bdi/outcomes/zipmap.htm
   To get the latitude and longitude of each location, we will use the pgeocode package.
   
   
2. We will use the Foursquare API to gather venue data for each San Francisco neighborhood. This will help identify the nearest venues for each location.

## 4 Methodology:

We will begin by importing all of the necessary libraries for this analysis.

In [89]:
import pandas as pd
import numpy as np
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim
import folium
from sklearn.cluster import KMeans
import geocoder
!pip install pgeocode
import pgeocode



Let's begin scraping the data using Beautiful Soup and then create a dataframe.

In [90]:
# scrape the data from the url
url = "http://www.healthysf.org/bdi/outcomes/zipmap.htm"
san_fran_url = requests.get(url).text
soup = BeautifulSoup(san_fran_url,'lxml')
table = soup.find_all("table")

# move the data into a dataframe
san_fran_df = pd.read_html(str(table))

# clean the dataframe to fit
san_fran_df = pd.DataFrame(san_fran_df[4])
san_fran_df.columns = san_fran_df.iloc[0]
san_fran_df = san_fran_df.iloc[1:]
san_fran_df.drop(index = san_fran_df.index[21],axis = 0, inplace = True)
san_fran_df = san_fran_df.iloc[:,0:2]
san_fran_df.head()

Unnamed: 0,Zip Code,Neighborhood
1,94102,Hayes Valley/Tenderloin/North of Market
2,94103,South of Market
3,94107,Potrero Hill
4,94108,Chinatown
5,94109,Polk/Russian Hill (Nob Hill)


### After scraping the data, we need to get the latitude and longitude for each zip code

In [91]:
# Will use the pgeocode library to get the latitude and longitude coordinates for each neighborhood
nomi_object = pgeocode.Nominatim('us')
latitude = []
longitude = []

for index,row in san_fran_df.iterrows():
    zipcode = nomi_object.query_postal_code(row["Zip Code"])
    latitude.append(zipcode.latitude)
    longitude.append(zipcode.longitude)

san_fran_df["Latitude"] = latitude
san_fran_df["Longitude"] = longitude

san_fran_df.head()

Unnamed: 0,Zip Code,Neighborhood,Latitude,Longitude
1,94102,Hayes Valley/Tenderloin/North of Market,37.7813,-122.4167
2,94103,South of Market,37.7725,-122.4147
3,94107,Potrero Hill,37.7621,-122.3971
4,94108,Chinatown,37.7929,-122.4079
5,94109,Polk/Russian Hill (Nob Hill),37.7917,-122.4186
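
Before mapping, it may be worth checking that every zip code actually resolved to coordinates. To our understanding, pgeocode returns NaN values for postal codes it cannot find, so a quick check along the following lines could catch gaps early. This is a minimal sketch on a toy frame: `coords_df`, the `00000` zip code, and its missing coordinates are made up for illustration and are not part of the dataset.

```python
import pandas as pd

# Toy frame standing in for san_fran_df; the second row is a hypothetical
# failed lookup, not real data.
coords_df = pd.DataFrame({
    "Zip Code": ["94102", "00000"],
    "Latitude": [37.7813, float("nan")],
    "Longitude": [-122.4167, float("nan")],
})

# Rows where either coordinate is missing
missing = coords_df[coords_df[["Latitude", "Longitude"]].isna().any(axis=1)]
print("Unresolved zip codes:", missing["Zip Code"].tolist())

# Keep only the rows that can be placed on a map
clean_df = coords_df.dropna(subset=["Latitude", "Longitude"]).reset_index(drop=True)
```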


#### Now that we have our coordinates we can build a visual map of San Francisco using the Folium package

In [109]:
address = 'San Francisco, California'

geolocator = Nominatim(user_agent="san_francisco")
location = geolocator.geocode(address)
san_latitude = location.latitude
san_longitude = location.longitude

map_san_fran = folium.Map(location=[san_latitude,san_longitude], zoom_start = 10)

for lat,lng, neighborhood in zip(san_fran_df['Latitude'],san_fran_df['Longitude'],san_fran_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label,parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'green',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(map_san_fran)
    
map_san_fran

#### Using the Foursquare API, we will begin to explore each of these neighborhoods

In [93]:
# Enter in the Foursquare credentials

client_id = '******'
client_secret = '******'
version = '20180605'

radius = 500
LIMIT = 100

venue_list = []

for neighborhood,lat, long  in zip(san_fran_df['Neighborhood'],san_fran_df['Latitude'],san_fran_df['Longitude']):
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        client_id,
        client_secret,
        version,
        lat,
        long,
        radius, 
        LIMIT)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    for venue in results : 
        venue_list.append((
        neighborhood,
        lat,
        long,
        venue['venue']['location']['lat'], 
        venue['venue']['location']['lng'],
        venue['venue']['name'],
        venue['venue']['categories'][0]['name']))
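
The loop above indexes straight into the response, so a venue without a category or a response without groups would raise an error. A more defensive parsing step could look like the sketch below. The helper name `parse_explore_items` and the sample payload are hypothetical; only the nesting (`response`, `groups`, `items`, `venue`) is taken from the request code above.

```python
def parse_explore_items(payload, neighborhood, lat, lng):
    """Flatten an explore-style response into venue tuples, skipping incomplete items."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Skip venues that lack a category or coordinates
        if not categories or "lat" not in location or "lng" not in location:
            continue
        rows.append((
            neighborhood, lat, lng,
            location["lat"], location["lng"],
            venue.get("name"), categories[0]["name"],
        ))
    return rows


# Hand-written sample payload (an assumption, shaped like the response used above)
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Philz Coffee",
               "location": {"lat": 37.781266, "lng": -122.416901},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Uncategorized Place",
               "location": {"lat": 37.78, "lng": -122.41},
               "categories": []}},
]}]}}

rows = parse_explore_items(sample, "Hayes Valley/Tenderloin/North of Market", 37.7813, -122.4167)
print(rows)
```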

In [94]:
# This will create a nearby venues dataframe with the San Francisco neighborhood data

nearby_venues = pd.DataFrame(venue_list)
nearby_venues.columns = ["Neighborhood",'Neighborhood Latitude', 'Neighborhood Longitude', "Venue Latitude", "Venue Longitude","Venue Name", "Venue Category"]
nearby_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue Latitude,Venue Longitude,Venue Name,Venue Category
0,Hayes Valley/Tenderloin/North of Market,37.7813,-122.4167,37.780178,-122.416505,Asian Art Museum,Art Museum
1,Hayes Valley/Tenderloin/North of Market,37.7813,-122.4167,37.782751,-122.415656,Ales Unlimited: Beer Basement,Beer Bar
2,Hayes Valley/Tenderloin/North of Market,37.7813,-122.4167,37.783084,-122.41765,Saigon Sandwich,Sandwich Place
3,Hayes Valley/Tenderloin/North of Market,37.7813,-122.4167,37.781266,-122.416901,Philz Coffee,Coffee Shop
4,Hayes Valley/Tenderloin/North of Market,37.7813,-122.4167,37.782896,-122.418897,Brenda's French Soul Food,Southern / Soul Food Restaurant
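
As a sanity check on the 500 meter search radius, we can estimate how far a returned venue sits from its neighborhood coordinates. The sketch below uses the haversine formula with the first row of the table above (the neighborhood point and the Asian Art Museum). The function name `haversine_m` and the Earth radius constant are our own choices, not part of the original analysis.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Neighborhood point vs. venue point from the first row of nearby_venues
distance = haversine_m(37.7813, -122.4167, 37.780178, -122.416505)
print(f"{distance:.0f} m from the neighborhood point, within radius: {distance <= 500}")
```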


#### Next, we conduct EDA on the nearby venues dataframe

In [95]:
# To get a count of the number of venues per neighborhood
nearby_venues.groupby('Neighborhood')["Venue Name"].count()

Neighborhood
Bayview-Hunters Point                       19
Castro/Noe Valley                           75
Chinatown                                   88
Haight-Ashbury                              35
Hayes Valley/Tenderloin/North of Market     67
Ingelside-Excelsior/Crocker-Amazon          45
Inner Mission/Bernal Heights                58
Inner Richmond                              66
Lake Merced                                 14
Marina                                     100
North Beach/Chinatown                       72
Outer Richmond                              34
Parkside/Forest Hill                        38
Polk/Russian Hill (Nob Hill)               100
Potrero Hill                                33
South of Market                            100
St. Francis Wood/Miraloma/West Portal        5
Sunset                                      31
Twin Peaks-Glen Park                        16
Visitacion Valley/Sunnydale                  8
Western Addition/Japantown                 100
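
Since the business question is about gyms and fitness centers specifically, a natural follow-up is to count only fitness-related venues per neighborhood. The sketch below shows one way to do that on a toy frame shaped like `nearby_venues`. The rows are invented for illustration and are not results, and the keyword pattern (including "Pilates") is an assumption about which categories count as fitness.

```python
import pandas as pd

# Toy rows shaped like nearby_venues; illustrative only, not real results
toy_venues = pd.DataFrame({
    "Neighborhood": ["Marina", "Marina", "Marina", "Chinatown", "Chinatown"],
    "Venue Category": ["Gym / Fitness Center", "Italian Restaurant",
                       "Yoga Studio", "Bakery", "Gym"],
})

# Assumed keyword list for fitness-related categories
fitness_pattern = r"Gym|Fitness|Yoga|Pilates"
is_fitness = toy_venues["Venue Category"].str.contains(fitness_pattern, case=False, regex=True)

# Number of fitness venues per neighborhood, most saturated first
fitness_counts = (
    toy_venues[is_fitness]
    .groupby("Neighborhood")["Venue Category"]
    .count()
    .sort_values(ascending=False)
)
print(fitness_counts)
```

Neighborhoods with many venues overall but few fitness venues would be the ones to look at more closely.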


In [96]:
# To get the number of unique categories
len(nearby_venues['Venue Category'].unique())

234

In [97]:
# This will encode the Venue Category column
san_fran_hot = pd.get_dummies(nearby_venues[['Venue Category']], prefix = "", prefix_sep="")

# Add the Neighborhood column into the dataframe
san_fran_hot['Neighborhood'] = nearby_venues['Neighborhood']

# move the Neighborhood column to the front of the data set
fixed_columns = list(san_fran_hot.columns[-1:]) + list(san_fran_hot.columns[:-1])
san_fran_hot = san_fran_hot[fixed_columns]

san_fran_hot

Unnamed: 0,Neighborhood,ATM,Accessories Store,Adult Boutique,African Restaurant,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Yemeni Restaurant,Yoga Studio
0,Hayes Valley/Tenderloin/North of Market,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Hayes Valley/Tenderloin/North of Market,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Hayes Valley/Tenderloin/North of Market,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Hayes Valley/Tenderloin/North of Market,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Hayes Valley/Tenderloin/North of Market,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1099,Visitacion Valley/Sunnydale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1100,Visitacion Valley/Sunnydale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1101,Visitacion Valley/Sunnydale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1102,Visitacion Valley/Sunnydale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [98]:
# This will get the frequency of each venue category per neighborhood
san_fran_freq = san_fran_hot.groupby(['Neighborhood']).mean().reset_index()
san_fran_freq.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Adult Boutique,African Restaurant,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Yemeni Restaurant,Yoga Studio
0,Bayview-Hunters Point,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Castro/Noe Valley,0.013333,0.013333,0.013333,0.0,0.0,0.0,0.0,0.0,0.013333,...,0.0,0.0,0.0,0.0,0.0,0.026667,0.013333,0.0,0.0,0.026667
2,Chinatown,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,...,0.0,0.0,0.034091,0.0,0.011364,0.011364,0.0,0.0,0.0,0.011364
3,Haight-Ashbury,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571
4,Hayes Valley/Tenderloin/North of Market,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.104478,0.0,0.0,0.014925,0.0,0.0,0.014925,0.0
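
To make the frequency step concrete: taking the mean of one-hot columns within a neighborhood gives the share of that neighborhood's venues in each category, so every row sums to 1. The toy example below reproduces the two steps on made-up data (neighborhoods "A" and "B" are placeholders). Passing `dtype=int` is our own choice so the encoded columns are 0/1 integers.

```python
import pandas as pd

# Placeholder data: neighborhood A has 3 venues, B has 1
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Venue Category": ["Gym", "Bakery", "Bakery", "Gym"],
})

# One-hot encode the category, then attach the neighborhood label
one_hot = pd.get_dummies(toy[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
one_hot["Neighborhood"] = toy["Neighborhood"]

# Mean of 0/1 columns per neighborhood = share of venues in each category
freq = one_hot.groupby("Neighborhood").mean().reset_index()
print(freq)
```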


#### Let's take a look at the top 10 venues for each neighborhood

In [99]:
top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))


# create a new dataframe
neighborhoods_venues = pd.DataFrame(columns=columns)
neighborhoods_venues['Neighborhood'] = san_fran_freq['Neighborhood']

for ind in np.arange(san_fran_freq.shape[0]):
    row_categories = san_fran_freq.iloc[ind,:].iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    neighborhoods_venues.iloc[ind, 1:] = row_categories_sorted.index.values[0:top_venues]

neighborhoods_venues

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bayview-Hunters Point,Park,Southern / Soul Food Restaurant,Light Rail Station,Chinese Restaurant,Pharmacy,Theater,Grocery Store,BBQ Joint,Mexican Restaurant,Gym
1,Castro/Noe Valley,Gay Bar,Thai Restaurant,Coffee Shop,Yoga Studio,Pharmacy,Flower Shop,Mediterranean Restaurant,New American Restaurant,Clothing Store,Convenience Store
2,Chinatown,Chinese Restaurant,Bakery,Hotel,Coffee Shop,Vietnamese Restaurant,Dim Sum Restaurant,Tea Room,Cocktail Bar,Szechuan Restaurant,Bank
3,Haight-Ashbury,Coffee Shop,Boutique,Park,Ice Cream Shop,Bakery,Bus Stop,Breakfast Spot,Bubble Tea Shop,Mexican Restaurant,Burrito Place
4,Hayes Valley/Tenderloin/North of Market,Vietnamese Restaurant,Sandwich Place,Hotel,Thai Restaurant,Theater,Hotel Bar,Coffee Shop,Beer Bar,Concert Hall,Bakery
5,Ingelside-Excelsior/Crocker-Amazon,Pizza Place,Mexican Restaurant,Bus Station,Café,Sandwich Place,Latin American Restaurant,Furniture / Home Store,Bar,Bubble Tea Shop,Restaurant
6,Inner Mission/Bernal Heights,Mexican Restaurant,Grocery Store,Coffee Shop,Art Gallery,Café,Bookstore,Performing Arts Venue,Fish Market,Jewish Restaurant,Record Shop
7,Inner Richmond,Bakery,Thai Restaurant,Sushi Restaurant,Pet Store,Breakfast Spot,Burger Joint,Wine Shop,Japanese Restaurant,Korean Restaurant,Burmese Restaurant
8,Lake Merced,Coffee Shop,Café,Rental Car Location,Paper / Office Supplies Store,Mexican Restaurant,Sandwich Place,Smoke Shop,Pizza Place,Nightclub,Burger Joint
9,Marina,Italian Restaurant,Gym / Fitness Center,Cosmetics Shop,Bank,Spa,Salad Place,Sushi Restaurant,French Restaurant,Wine Bar,Mexican Restaurant
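
The column labels above are built by catching an exception once the suffix list runs out, which works for ten columns. If the number of top venues ever grows past 20, that approach would label the 21st column "21th". A small helper like the one below is one possible alternative; the name `ordinal` is hypothetical and not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, top_venues + 1)
]
print(columns)
```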


#### Use KMeans clustering to analyze the neighborhoods using 2 clusters

In [100]:
from sklearn.cluster import KMeans

In [101]:
kclusters = 2
san_fran_clusters = san_fran_freq.drop(columns=['Neighborhood'])
kmeans = KMeans(n_clusters = kclusters, random_state = 0).fit(san_fran_clusters)
neighborhoods_venues.insert(0,'Cluster Labels', kmeans.labels_)
sf_new_df = san_fran_df
sf_new_df = sf_new_df.merge(neighborhoods_venues, on = 'Neighborhood')

In [102]:
sf_new_df

Unnamed: 0,Zip Code,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,94102,Hayes Valley/Tenderloin/North of Market,37.7813,-122.4167,1,Vietnamese Restaurant,Sandwich Place,Hotel,Thai Restaurant,Theater,Hotel Bar,Coffee Shop,Beer Bar,Concert Hall,Bakery
1,94103,South of Market,37.7725,-122.4147,1,Nightclub,Coffee Shop,Food Truck,Gay Bar,Cocktail Bar,Sushi Restaurant,Pizza Place,Rental Car Location,Clothing Store,Thai Restaurant
2,94107,Potrero Hill,37.7621,-122.3971,1,Breakfast Spot,Coffee Shop,Café,Brewery,Cosmetics Shop,Office,Grocery Store,Sushi Restaurant,Bookstore,Sandwich Place
3,94108,Chinatown,37.7929,-122.4079,1,Chinese Restaurant,Bakery,Hotel,Coffee Shop,Vietnamese Restaurant,Dim Sum Restaurant,Tea Room,Cocktail Bar,Szechuan Restaurant,Bank
4,94109,Polk/Russian Hill (Nob Hill),37.7917,-122.4186,1,Grocery Store,Thai Restaurant,Italian Restaurant,Massage Studio,Vietnamese Restaurant,Bakery,French Restaurant,Café,Wine Bar,Gym / Fitness Center
5,94110,Inner Mission/Bernal Heights,37.7509,-122.4153,1,Mexican Restaurant,Grocery Store,Coffee Shop,Art Gallery,Café,Bookstore,Performing Arts Venue,Fish Market,Jewish Restaurant,Record Shop
6,94112,Ingelside-Excelsior/Crocker-Amazon,37.7195,-122.4411,1,Pizza Place,Mexican Restaurant,Bus Station,Café,Sandwich Place,Latin American Restaurant,Furniture / Home Store,Bar,Bubble Tea Shop,Restaurant
7,94114,Castro/Noe Valley,37.7587,-122.433,1,Gay Bar,Thai Restaurant,Coffee Shop,Yoga Studio,Pharmacy,Flower Shop,Mediterranean Restaurant,New American Restaurant,Clothing Store,Convenience Store
8,94115,Western Addition/Japantown,37.7856,-122.4358,1,Cosmetics Shop,Ice Cream Shop,Bakery,Pizza Place,Spa,Tea Room,Creperie,Shopping Mall,Coffee Shop,Seafood Restaurant
9,94116,Parkside/Forest Hill,37.7441,-122.4863,1,Chinese Restaurant,Dumpling Restaurant,Light Rail Station,Café,Sandwich Place,Korean Restaurant,Burrito Place,Gym / Fitness Center,Sushi Restaurant,Gastropub
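
The analysis fixes the number of clusters at 2. One common way to check whether that choice is reasonable is to compare silhouette scores across several values of k. The sketch below illustrates the idea on synthetic data, not on the San Francisco frequencies: `features` is a made-up stand-in for `san_fran_clusters`, and `n_init=10` is our own setting.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the frequency matrix: two well-separated groups
features = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(15, 4)),
    rng.normal(loc=5.0, scale=0.5, size=(15, 4)),
])

# Fit KMeans for several k and record the silhouette score of each fit
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(features)
    scores[k] = silhouette_score(features, labels)

best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

With only 21 neighborhoods in the real data, the scores would be noisy, so this would serve as a rough guide rather than a definitive answer.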


#### Now we will map these 2 clusters

In [111]:
map_clusters = folium.Map(location = [san_latitude,san_longitude], zoom_start =11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat,lon,poi,cluster  in zip(sf_new_df['Latitude'],sf_new_df['Longitude'],sf_new_df['Neighborhood'],sf_new_df['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
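
The color setup in the cell above can be read as "one hex color per cluster label". A more direct way to express that mapping is sketched below, assuming the labels are the integers 0 to `kclusters - 1` as KMeans produces them. The dictionary `cluster_colors` is a hypothetical name introduced here.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 2

# Sample kclusters evenly spaced colors from the rainbow colormap
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Map each cluster label directly to its own color
cluster_colors = {label: palette[label] for label in range(kclusters)}
print(cluster_colors)
```

Looking colors up by label keeps the legend stable if the number of clusters changes later.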

#### Let's take a closer look at each of the clusters

##### Cluster #1

In [112]:
sf_new_df.loc[sf_new_df['Cluster Labels'] == 0, sf_new_df.columns[[1] + list(range(5, sf_new_df.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,St. Francis Wood/Miraloma/West Portal,Park,Bus Line,Trail,Monument / Landmark,Tree,Food Court,Flower Shop,Food,Food & Drink Shop,Yoga Studio
17,Twin Peaks-Glen Park,Playground,Trail,Scenic Lookout,Salon / Barbershop,Dim Sum Restaurant,Shopping Mall,Food,Grocery Store,Bakery,Coffee Shop
20,Visitacion Valley/Sunnydale,BBQ Joint,Playground,Music Venue,Park,Performing Arts Venue,Trail,Scenic Lookout,Garden,Ethiopian Restaurant,Event Space


##### Cluster #2

In [113]:
sf_new_df.loc[sf_new_df['Cluster Labels'] == 1, sf_new_df.columns[[1] + list(range(5, sf_new_df.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Hayes Valley/Tenderloin/North of Market,Vietnamese Restaurant,Sandwich Place,Hotel,Thai Restaurant,Theater,Hotel Bar,Coffee Shop,Beer Bar,Concert Hall,Bakery
1,South of Market,Nightclub,Coffee Shop,Food Truck,Gay Bar,Cocktail Bar,Sushi Restaurant,Pizza Place,Rental Car Location,Clothing Store,Thai Restaurant
2,Potrero Hill,Breakfast Spot,Coffee Shop,Café,Brewery,Cosmetics Shop,Office,Grocery Store,Sushi Restaurant,Bookstore,Sandwich Place
3,Chinatown,Chinese Restaurant,Bakery,Hotel,Coffee Shop,Vietnamese Restaurant,Dim Sum Restaurant,Tea Room,Cocktail Bar,Szechuan Restaurant,Bank
4,Polk/Russian Hill (Nob Hill),Grocery Store,Thai Restaurant,Italian Restaurant,Massage Studio,Vietnamese Restaurant,Bakery,French Restaurant,Café,Wine Bar,Gym / Fitness Center
5,Inner Mission/Bernal Heights,Mexican Restaurant,Grocery Store,Coffee Shop,Art Gallery,Café,Bookstore,Performing Arts Venue,Fish Market,Jewish Restaurant,Record Shop
6,Ingelside-Excelsior/Crocker-Amazon,Pizza Place,Mexican Restaurant,Bus Station,Café,Sandwich Place,Latin American Restaurant,Furniture / Home Store,Bar,Bubble Tea Shop,Restaurant
7,Castro/Noe Valley,Gay Bar,Thai Restaurant,Coffee Shop,Yoga Studio,Pharmacy,Flower Shop,Mediterranean Restaurant,New American Restaurant,Clothing Store,Convenience Store
8,Western Addition/Japantown,Cosmetics Shop,Ice Cream Shop,Bakery,Pizza Place,Spa,Tea Room,Creperie,Shopping Mall,Coffee Shop,Seafood Restaurant
9,Parkside/Forest Hill,Chinese Restaurant,Dumpling Restaurant,Light Rail Station,Café,Sandwich Place,Korean Restaurant,Burrito Place,Gym / Fitness Center,Sushi Restaurant,Gastropub
