<heading><b> Opening a New Fitness Center in the Bay Area</b></heading>

1. Build a dataframe of the counties in the San Francisco Bay Area, California (referred to as neighborhoods below) by scraping the data from a Wikipedia category page
2. Get the geographical coordinates of the neighborhoods
3. Obtain the venue data for the neighborhoods from the Foursquare API
4. Explore and cluster the neighborhoods
5. Select the best cluster in which to open a new fitness center

<b>Importing Libraries</b>

In [35]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents
import urllib.request

from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


<b> Scraping data from the Wikipedia page into a DataFrame</b>

In [36]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Counties_in_the_San_Francisco_Bay_Area").text

In [37]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [38]:

# create a list to store neighborhood data
neighborhoodList = []
        


In [39]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)

In [50]:

# create a new DataFrame from the list
bayarea_df = pd.DataFrame({"Neighborhood": neighborhoodList})

bayarea_df

Unnamed: 0,Neighborhood
0,"► Alameda County, California‎ (15 C, 12 P)"
1,"► Contra Costa County, California‎ (14 C, 11 P)"
2,"► Marin County, California‎ (14 C, 20 P)"
3,"► Napa County, California‎ (11 C, 22 P)"
4,"► San Francisco‎ (24 C, 10 P)"
5,"► San Mateo County, California‎ (12 C, 14 P)"
6,"► Santa Clara County, California‎ (15 C, 12 P)"
7,"► Solano County, California‎ (12 C, 9 P)"
8,"► Sonoma County, California‎ (17 C, 18 P, 1 F)"


In [51]:
# use string manipulation to remove "►" and the last 12 characters from every value in the column
# note: the slice assumes the trailing count, e.g. "(15 C, 12 P)", is about 12 characters long; a longer suffix leaves residue

pattern = "►"
bayarea_df['Neighborhood'] = bayarea_df['Neighborhood'].str.replace(pattern, '', regex=True)

bayarea_df['Neighborhood'] = [x[:-12] for x in bayarea_df['Neighborhood']]
bayarea_df.head()

Unnamed: 0,Neighborhood
0,"Alameda County, California‎"
1,"Contra Costa County, California‎"
2,"Marin County, California‎"
3,"Napa County, California‎"
4,San Francisco‎
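
The fixed-width slice works for most rows, but the Sonoma entry has a longer suffix, `(17 C, 18 P, 1 F)`, so `(17 C` survives in later tables. A minimal sketch of a more tolerant cleanup, assuming the labels always end in a parenthesized count, is shown here. The helper name `clean_label` is hypothetical, and the sample strings are copied from the scraped output above, including the invisible left-to-right mark (`\u200e`) that Wikipedia inserts after each name.

```python
import re

def clean_label(text):
    """Strip the arrow marker, the invisible left-to-right mark, and the trailing '(… C, … P)' count."""
    text = text.replace("►", "").replace("\u200e", "")
    # remove one trailing parenthesized group of any length
    text = re.sub(r"\s*\([^)]*\)\s*$", "", text)
    return text.strip()

raw_labels = [
    "► Alameda County, California\u200e (15 C, 12 P)",
    "► San Francisco\u200e (24 C, 10 P)",
    "► Solano County, California\u200e (12 C, 9 P)",
    "► Sonoma County, California\u200e (17 C, 18 P, 1 F)",
]
cleaned = [clean_label(label) for label in raw_labels]
print(cleaned)
```

In the notebook this could be applied with `bayarea_df['Neighborhood'].map(clean_label)`. Doing so would change the neighborhood strings, and therefore the geocoding queries, in every later cell.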


In [52]:
# check the shape (rows, columns) of the dataframe
bayarea_df.shape


(9, 1)

<b> Get the geographical coordinates</b>

In [54]:

# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, San Francisco Bay Area, California'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
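
The loop in `get_latlng` keeps calling the geocoder until it returns a result, so a query that never resolves would run forever. A bounded-retry variant might look like the sketch below. The names `get_latlng_with_retries`, `geocode_fn`, `max_attempts`, and `delay_seconds` are hypothetical and are not part of the `geocoder` package. The lookup is passed in as a function so the retry logic can be tried without network access.

```python
import time

def get_latlng_with_retries(query, geocode_fn, max_attempts=5, delay_seconds=1.0):
    """Return the result of geocode_fn(query), or None after max_attempts failed lookups."""
    for attempt in range(max_attempts):
        coords = geocode_fn(query)
        if coords is not None:
            return coords
        # pause between attempts, but not after the final one
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    return None

# stand-in geocoder for illustration: fails twice, then succeeds
fake_results = iter([None, None, [37.77712, -122.41964]])
print(get_latlng_with_retries("San Francisco", lambda q: next(fake_results), delay_seconds=0))
```

With the real service, `geocode_fn` could be `lambda q: geocoder.arcgis(q).latlng`, which mirrors the call used in `get_latlng`.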

In [55]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in bayarea_df["Neighborhood"].tolist() ]

In [56]:
coords

[[37.86948000000007, -122.25928999999996],
 [38.037579250097885, -122.26204370123158],
 [37.88789894816959, -122.53040271864795],
 [38.39687427791226, -122.36099084372239],
 [37.777120000000025, -122.41963999999996],
 [37.52758087251249, -122.51334269637053],
 [37.30327894927281, -121.89362006124473],
 [38.085719745615705, -122.13936127704916],
 [38.31046996515037, -122.4932698823946]]

In [57]:

# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [58]:
# merge the coordinates into the original dataframe
bayarea_df['Latitude'] = df_coords['Latitude']
bayarea_df['Longitude'] = df_coords['Longitude']

# check the neighborhoods and the coordinates
print(bayarea_df.shape)
bayarea_df

(9, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,"Alameda County, California‎",37.86948,-122.25929
1,"Contra Costa County, California‎",38.037579,-122.262044
2,"Marin County, California‎",37.887899,-122.530403
3,"Napa County, California‎",38.396874,-122.360991
4,San Francisco‎,37.77712,-122.41964
5,"San Mateo County, California‎",37.527581,-122.513343
6,"Santa Clara County, California‎",37.303279,-121.89362
7,"Solano County, California‎",38.08572,-122.139361
8,"Sonoma County, California‎ (17 C",38.31047,-122.49327


In [59]:
# save the DataFrame as CSV file
bayarea_df.to_csv("bayarea_df.csv", index=False)

<b> Create a map of the Bay Area with neighborhoods</b>

In [61]:
# get the coordinates of the San Francisco Bay Area
address = 'San Francisco Bay Area, California'

geolocator = Nominatim(user_agent="my_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of San Francisco Bay Area, California are {}, {}.'.format(latitude, longitude))


The geographical coordinates of San Francisco Bay Area, California are 37.7884969, -122.3558473.


In [65]:
# create map of Bay Area using latitude and longitude values
map_bayarea = folium.Map(location=[latitude, longitude], zoom_start=9)

# add markers to map
for lat, lng, neighborhood in zip(bayarea_df['Latitude'], bayarea_df['Longitude'], bayarea_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_bayarea)  
    
map_bayarea

In [66]:
# save the map as HTML file
map_bayarea.save('map_bayarea.html')

<b> Use Foursquare API to explore the neighborhoods </b>

In [67]:
# define Foursquare Credentials and Version
CLIENT_ID = 'OSXJARTKNTZCRXXURBVKCKWAHBLPW4MBYC2JJCCRW5P2BPTH' # your Foursquare ID
CLIENT_SECRET = 'ASAYSXHPTWFXEH5C1G3SJRTTELHCCWYARJ110FNPUL4V24JB' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: OSXJARTKNTZCRXXURBVKCKWAHBLPW4MBYC2JJCCRW5P2BPTH
CLIENT_SECRET:ASAYSXHPTWFXEH5C1G3SJRTTELHCCWYARJ110FNPUL4V24JB
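
Hard-coding the credentials and printing them stores them in the notebook and its output. One alternative is to read them from environment variables and echo only a masked form. This is a sketch, not the approach used here: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_foursquare_credentials` and `mask` are hypothetical.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare credentials from environment variables (hypothetical names)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.")
    return client_id, client_secret

def mask(value, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    return value[:visible] + "*" * (len(value) - visible)

# illustration with placeholder values instead of real credentials
demo_env = {"FOURSQUARE_CLIENT_ID": "EXAMPLEID123", "FOURSQUARE_CLIENT_SECRET": "EXAMPLESECRET456"}
CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(CLIENT_ID))
print("CLIENT_SECRET:", mask(CLIENT_SECRET))
```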


<b> Now, let's get the top 100 venues that are within a radius of 2000 meters.</b>

In [68]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(bayarea_df['Latitude'], bayarea_df['Longitude'], bayarea_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
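
The loop above indexes `venue['venue']['categories'][0]` directly, which raises an `IndexError` for any venue whose category list is empty. The sketch below moves the flattening step into a small helper that tolerates that case. The helper name `extract_venue_rows` is hypothetical, the nested keys are the ones used in the loop above, and the sample items are made up for illustration.

```python
def extract_venue_rows(items, neighborhood, lat, lng):
    """Flatten 'explore' items into tuples with the same fields as the venues list."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # fall back to None when a venue has no category instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((
            neighborhood, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# made-up sample items in the same shape as the response used above
sample_items = [
    {"venue": {"name": "Example Gym", "location": {"lat": 37.87, "lng": -122.26},
               "categories": [{"name": "Gym"}]}},
    {"venue": {"name": "Uncategorized Spot", "location": {"lat": 37.86, "lng": -122.25},
               "categories": []}},
]
rows = extract_venue_rows(sample_items, "Alameda County, California", 37.86948, -122.25929)
print(rows)
```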

In [71]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(566, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,"Alameda County, California‎",37.86948,-122.25929,Zellerbach Hall,37.869167,-122.260498,College Theater
1,"Alameda County, California‎",37.86948,-122.25929,Campanile (Sather Tower),37.871932,-122.258097,Scenic Lookout
2,"Alameda County, California‎",37.86948,-122.25929,Woo Hon Fai Hall,37.868679,-122.255867,Performing Arts Venue
3,"Alameda County, California‎",37.86948,-122.25929,Top Dog,37.867986,-122.257397,Hot Dog Joint
4,"Alameda County, California‎",37.86948,-122.25929,Tacos Sinaloa,37.867242,-122.258914,Taco Place


In [72]:
# check how many venues were returned for each neighborhood
venues_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Alameda County, California‎",100,100,100,100,100,100
"Contra Costa County, California‎",25,25,25,25,25,25
"Marin County, California‎",91,91,91,91,91,91
"Napa County, California‎",53,53,53,53,53,53
San Francisco‎,100,100,100,100,100,100
"San Mateo County, California‎",27,27,27,27,27,27
"Santa Clara County, California‎",100,100,100,100,100,100
"Solano County, California‎",15,15,15,15,15,15
"Sonoma County, California‎ (17 C",55,55,55,55,55,55


In [75]:
# unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

## print out the list of categories
venues_df['VenueCategory'].unique()[:200]

There are 182 unique categories.


array(['College Theater', 'Scenic Lookout', 'Performing Arts Venue',
       'Hot Dog Joint', 'Taco Place', 'Japanese Restaurant',
       'Ice Cream Shop', 'Pool', 'Concert Hall', 'Record Shop',
       'Bookstore', 'Shipping Store', 'Bubble Tea Shop',
       'Toy / Game Store', 'Bakery', 'Art Museum', 'Amphitheater',
       'Italian Restaurant', 'Chinese Restaurant', 'Accessories Store',
       'Salad Place', 'Pizza Place', 'College Gym',
       'College Arts Building', 'Burger Joint', 'Coffee Shop', 'Brewery',
       'Brazilian Restaurant', 'Frozen Yogurt Shop', 'Café',
       'Beer Garden', 'Theater', 'Food & Drink Shop', 'Thai Restaurant',
       'College Library', 'Music Venue', 'Breakfast Spot',
       'Indian Restaurant', 'American Restaurant', 'Tea Room',
       'Southern / Soul Food Restaurant', 'Gift Shop',
       'Mexican Restaurant', 'Tattoo Parlor', 'College Bookstore',
       'Vietnamese Restaurant', 'Yoga Studio', 'Hotel',
       'Asian Restaurant', 'Comic Shop', 'Sandwich

In [83]:
# check if the results contain the "Gym" category
"Gym" in venues_df['VenueCategory'].unique()

True

<b> Analyzing each Neighborhood</b>

In [85]:
# one hot encoding
ba_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
ba_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [ba_onehot.columns[-1]] + list(ba_onehot.columns[:-1])
ba_onehot = ba_onehot[fixed_columns]

print(ba_onehot.shape)
ba_onehot.head()

(566, 183)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Airport Service,American Restaurant,Amphitheater,Art Gallery,Art Museum,Asian Restaurant,Auto Dealership,Automotive Shop,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Bike Rental / Bike Share,Board Shop,Boat or Ferry,Bookstore,Bowling Green,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burmese Restaurant,Burrito Place,Butcher,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Bookstore,College Gym,College Library,College Theater,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cupcake Shop,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Disc Golf,Discount Store,Dive Bar,Dog Run,Donut Shop,Electronics Store,Event Space,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Food Truck,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Hawaiian Restaurant,Historic Site,History Museum,Hobby Shop,Home Service,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Insurance Office,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Latin American Restaurant,Lighthouse,Lingerie Store,Liquor Store,Lounge,Marijuana Dispensary,Market,Martial Arts School,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Motorcycle Shop,Mountain,Movie Theater,Multiplex,Music Venue,Nature Preserve,New American Restaurant,Opera House,Park,Pedestrian Plaza,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Ramen Restaurant,Record Shop,Rental Car Location,Rental Service,Resort,Restaurant,Rugby Pitch,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shopping Mall,Skate Park,Smoke Shop,Smoothie Shop,Snack Place,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Stables,Supermarket,Sushi Restaurant,Taco Place,Tattoo Parlor,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Tour Provider,Toy / Game Store,Track Stadium,Trail,Video Store,Vietnamese Restaurant,Vineyard,Wine Bar,Wine Shop,Winery,Wings Joint,Yoga Studio
0,"Alameda County, California‎",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Alameda County, California‎",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Alameda County, California‎",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Alameda County, California‎",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Alameda County, California‎",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [86]:
# group rows by neighborhood, taking the mean frequency of occurrence of each category

bayarea_grouped = ba_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(bayarea_grouped.shape)
bayarea_grouped

(9, 183)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Airport Service,American Restaurant,Amphitheater,Art Gallery,Art Museum,Asian Restaurant,Auto Dealership,Automotive Shop,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Bike Rental / Bike Share,Board Shop,Boat or Ferry,Bookstore,Bowling Green,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burmese Restaurant,Burrito Place,Butcher,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Bookstore,College Gym,College Library,College Theater,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cupcake Shop,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Disc Golf,Discount Store,Dive Bar,Dog Run,Donut Shop,Electronics Store,Event Space,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Food Truck,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Hawaiian Restaurant,Historic Site,History Museum,Hobby Shop,Home Service,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Insurance Office,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Latin American Restaurant,Lighthouse,Lingerie Store,Liquor Store,Lounge,Marijuana Dispensary,Market,Martial Arts School,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Motorcycle Shop,Mountain,Movie Theater,Multiplex,Music Venue,Nature Preserve,New American Restaurant,Opera House,Park,Pedestrian Plaza,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Ramen Restaurant,Record Shop,Rental Car Location,Rental Service,Resort,Restaurant,Rugby Pitch,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shopping Mall,Skate Park,Smoke Shop,Smoothie Shop,Snack Place,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Stables,Supermarket,Sushi Restaurant,Taco Place,Tattoo Parlor,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Tour Provider,Toy / Game Store,Track Stadium,Trail,Video Store,Vietnamese Restaurant,Vineyard,Wine Bar,Wine Shop,Winery,Wings Joint,Yoga Studio
0,"Alameda County, California‎",0.0,0.02,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.02,0.01,0.01,0.02,0.02,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.03,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.04,0.01,0.0,0.0,0.02,0.04,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.02,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.02,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.01,0.01,0.02,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03
1,"Contra Costa County, California‎",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0
2,"Marin County, California‎",0.010989,0.0,0.010989,0.032967,0.0,0.0,0.0,0.010989,0.010989,0.010989,0.010989,0.0,0.032967,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.010989,0.0,0.0,0.0,0.010989,0.0,0.0,0.010989,0.0,0.010989,0.0,0.021978,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.065934,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.010989,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.010989,0.021978,0.0,0.0,0.0,0.010989,0.021978,0.0,0.0,0.032967,0.0,0.0,0.010989,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043956,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.021978,0.010989,0.0,0.010989,0.010989,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.010989,0.0,0.0,0.0,0.010989,0.043956,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032967,0.0,0.010989,0.021978,0.021978,0.043956,0.0,0.010989,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.010989,0.010989,0.010989,0.010989,0.0,0.010989,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.021978,0.010989,0.0,0.0,0.021978,0.0,0.0,0.010989,0.0,0.0,0.010989,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.010989
3,"Napa County, California‎",0.0,0.0,0.0,0.056604,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.056604,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.113208,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.037736,0.0,0.037736,0.0,0.0,0.0,0.0,0.037736,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.037736,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.056604,0.0,0.037736,0.0,0.0
4,San Francisco‎,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.09,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.03,0.02,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.02,0.01,0.01,0.01,0.01,0.01,0.0,0.03,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.03
5,"San Mateo County, California‎",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.037037,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.037037,0.074074,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.259259,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037
6,"Santa Clara County, California‎",0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.04,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.01,0.0,0.03,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.01,0.04,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0
7,"Solano County, California‎",0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"Sonoma County, California‎ (17 C",0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.018182,0.0,0.0,0.0,0.036364,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.036364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.036364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.018182,0.018182,0.0,0.0,0.018182,0.018182,0.0,0.0,0.018182,0.018182,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.036364,0.0,0.0,0.018182,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.018182,0.018182,0.018182,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.018182,0.0,0.0,0.0,0.109091,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.036364,0.0,0.0,0.018182,0.0,0.0,0.018182,0.0,0.018182,0.0,0.0,0.0,0.018182,0.018182,0.0,0.018182,0.018182,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.018182,0.036364,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
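
The wide tables above are hard to read, so the same two steps (one-hot encoding, then a per-neighborhood mean) are sketched here on a toy dataset. The neighborhood names `A` and `B` and the six venues are invented for illustration. Each value in the grouped result is the share of that neighborhood's venues falling in the category, which is how the frequencies in `bayarea_grouped` should be read.

```python
import pandas as pd

# toy venue list: neighborhood A has four venues, B has two
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "VenueCategory": ["Gym", "Café", "Café", "Yoga Studio", "Gym", "Gym"],
})

# one column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(toy["VenueCategory"], dtype=float)
onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# the mean of the indicator columns is the category's share of venues
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Because the values are shares rather than counts, a neighborhood with only a handful of returned venues (for example the 15 venues for Solano County) can show a large frequency from a single venue.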


In [89]:
len(bayarea_grouped[bayarea_grouped["Gym / Fitness Center"] > 0])

4

In [90]:
len(bayarea_grouped[bayarea_grouped["Gym"] > 0])

4

In [91]:
len(bayarea_grouped[bayarea_grouped["College Gym"] > 0])

1

In [92]:
len(bayarea_grouped[bayarea_grouped["Yoga Studio"] > 0])

4

<b>Create a new DataFrame for fitness related data only</b>

In [93]:
bayarea_fitness = bayarea_grouped[["Neighborhoods","Gym / Fitness Center","Gym","College Gym","Yoga Studio"]]

In [94]:
bayarea_fitness.head()

Unnamed: 0,Neighborhoods,Gym / Fitness Center,Gym,College Gym,Yoga Studio
0,"Alameda County, California‎",0.0,0.01,0.01,0.03
1,"Contra Costa County, California‎",0.0,0.0,0.0,0.0
2,"Marin County, California‎",0.010989,0.0,0.0,0.010989
3,"Napa County, California‎",0.0,0.0,0.0,0.0
4,San Francisco‎,0.01,0.02,0.0,0.03


<b> Clustering Neighborhoods </b>

In [95]:
# run k-means to cluster the neighborhoods into 3 clusters

# set number of clusters
kclusters = 3

bayarea_clustering = bayarea_fitness.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bayarea_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 0, 0, 0, 2, 2, 1, 1, 0], dtype=int32)
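
The number of clusters is fixed at 3 here. One common sanity check, which this notebook does not perform, is to compare the k-means inertia (within-cluster sum of squared distances) across several values of k and look for the point where the improvement levels off. The sketch below uses the nine rows of fitness frequencies shown in this notebook's tables. Setting `n_init=10` explicitly and trying k from 1 to 5 are assumptions made for the illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# fitness-venue frequencies per neighborhood, copied from the tables in this notebook
# columns: Gym / Fitness Center, Gym, College Gym, Yoga Studio
X = np.array([
    [0.0,      0.01,     0.01, 0.03],      # Alameda
    [0.0,      0.0,      0.0,  0.0],       # Contra Costa
    [0.010989, 0.0,      0.0,  0.010989],  # Marin
    [0.0,      0.0,      0.0,  0.0],       # Napa
    [0.01,     0.02,     0.0,  0.03],      # San Francisco
    [0.0,      0.0,      0.0,  0.037037],  # San Mateo
    [0.03,     0.04,     0.0,  0.0],       # Santa Clara
    [0.066667, 0.0,      0.0,  0.0],       # Solano
    [0.0,      0.018182, 0.0,  0.0],       # Sonoma
])

# fit k-means for several cluster counts and record the inertia of each fit
inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(k, round(inertia, 6))
```

With only nine points, any choice of k is a rough grouping rather than a statistically strong result.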

In [97]:
# create a new dataframe that holds the fitness-related frequencies and the cluster label for each neighborhood
bayarea_merged = bayarea_fitness.copy()

# add clustering labels
bayarea_merged["Cluster Labels"] = kmeans.labels_

In [99]:
bayarea_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)

bayarea_merged.head()

Unnamed: 0,Neighborhood,Gym / Fitness Center,Gym,College Gym,Yoga Studio,Cluster Labels
0,"Alameda County, California‎",0.0,0.01,0.01,0.03,2
1,"Contra Costa County, California‎",0.0,0.0,0.0,0.0,0
2,"Marin County, California‎",0.010989,0.0,0.0,0.010989,0
3,"Napa County, California‎",0.0,0.0,0.0,0.0,0
4,San Francisco‎,0.01,0.02,0.0,0.03,2


In [100]:
# join bayarea_merged with bayarea_df to add latitude/longitude for each neighborhood
bayarea_merged = bayarea_merged.join(bayarea_df.set_index("Neighborhood"), on="Neighborhood")

print(bayarea_merged.shape)
bayarea_merged.head() # check the last columns!

(9, 8)


Unnamed: 0,Neighborhood,Gym / Fitness Center,Gym,College Gym,Yoga Studio,Cluster Labels,Latitude,Longitude
0,"Alameda County, California‎",0.0,0.01,0.01,0.03,2,37.86948,-122.25929
1,"Contra Costa County, California‎",0.0,0.0,0.0,0.0,0,38.037579,-122.262044
2,"Marin County, California‎",0.010989,0.0,0.0,0.010989,0,37.887899,-122.530403
3,"Napa County, California‎",0.0,0.0,0.0,0.0,0,38.396874,-122.360991
4,San Francisco‎,0.01,0.02,0.0,0.03,2,37.77712,-122.41964


In [101]:
# sort the results by Cluster Labels
print(bayarea_merged.shape)
bayarea_merged.sort_values(["Cluster Labels"], inplace=True)
bayarea_merged

(9, 8)


Unnamed: 0,Neighborhood,Gym / Fitness Center,Gym,College Gym,Yoga Studio,Cluster Labels,Latitude,Longitude
1,"Contra Costa County, California‎",0.0,0.0,0.0,0.0,0,38.037579,-122.262044
2,"Marin County, California‎",0.010989,0.0,0.0,0.010989,0,37.887899,-122.530403
3,"Napa County, California‎",0.0,0.0,0.0,0.0,0,38.396874,-122.360991
8,"Sonoma County, California‎ (17 C",0.0,0.018182,0.0,0.0,0,38.31047,-122.49327
6,"Santa Clara County, California‎",0.03,0.04,0.0,0.0,1,37.303279,-121.89362
7,"Solano County, California‎",0.066667,0.0,0.0,0.0,1,38.08572,-122.139361
0,"Alameda County, California‎",0.0,0.01,0.01,0.03,2,37.86948,-122.25929
4,San Francisco‎,0.01,0.02,0.0,0.03,2,37.77712,-122.41964
5,"San Mateo County, California‎",0.0,0.0,0.0,0.037037,2,37.527581,-122.513343


<b> Visualizing clusters </b>

In [105]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=9)

# set color scheme for the clusters (one color per cluster label)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(bayarea_merged['Latitude'], bayarea_merged['Longitude'], bayarea_merged['Neighborhood'], bayarea_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # index by the label itself; cluster-1 would wrap label 0 to the last color
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [106]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

<b> Examine Clusters </b>

In [107]:
bayarea_merged.loc[bayarea_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Gym / Fitness Center,Gym,College Gym,Yoga Studio,Cluster Labels,Latitude,Longitude
1,"Contra Costa County, California‎",0.0,0.0,0.0,0.0,0,38.037579,-122.262044
2,"Marin County, California‎",0.010989,0.0,0.0,0.010989,0,37.887899,-122.530403
3,"Napa County, California‎",0.0,0.0,0.0,0.0,0,38.396874,-122.360991
8,"Sonoma County, California‎ (17 C",0.0,0.018182,0.0,0.0,0,38.31047,-122.49327


In [109]:
bayarea_merged.loc[bayarea_merged['Cluster Labels'] == 1 ]

Unnamed: 0,Neighborhood,Gym / Fitness Center,Gym,College Gym,Yoga Studio,Cluster Labels,Latitude,Longitude
6,"Santa Clara County, California‎",0.03,0.04,0.0,0.0,1,37.303279,-121.89362
7,"Solano County, California‎",0.066667,0.0,0.0,0.0,1,38.08572,-122.139361


In [110]:
bayarea_merged.loc[bayarea_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Gym / Fitness Center,Gym,College Gym,Yoga Studio,Cluster Labels,Latitude,Longitude
0,"Alameda County, California‎",0.0,0.01,0.01,0.03,2,37.86948,-122.25929
4,San Francisco‎,0.01,0.02,0.0,0.03,2,37.77712,-122.41964
5,"San Mateo County, California‎",0.0,0.0,0.0,0.037037,2,37.527581,-122.513343
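
The three tables above can be condensed into one per-cluster summary, which makes the comparison between clusters easier to follow. The sketch below rebuilds the clustered table from the values printed in this notebook, using shortened neighborhood names, and averages each fitness column within a cluster. The extra column `All fitness venues` is an illustrative addition, not something computed elsewhere in the notebook.

```python
import pandas as pd

fitness_cols = ["Gym / Fitness Center", "Gym", "College Gym", "Yoga Studio"]

# values copied from the cluster tables in this notebook
clustered = pd.DataFrame(
    [
        ["Contra Costa", 0.0,      0.0,      0.0,  0.0,      0],
        ["Marin",        0.010989, 0.0,      0.0,  0.010989, 0],
        ["Napa",         0.0,      0.0,      0.0,  0.0,      0],
        ["Sonoma",       0.0,      0.018182, 0.0,  0.0,      0],
        ["Santa Clara",  0.03,     0.04,     0.0,  0.0,      1],
        ["Solano",       0.066667, 0.0,      0.0,  0.0,      1],
        ["Alameda",      0.0,      0.01,     0.01, 0.03,     2],
        ["San Francisco", 0.01,    0.02,     0.0,  0.03,     2],
        ["San Mateo",    0.0,      0.0,      0.0,  0.037037, 2],
    ],
    columns=["Neighborhood"] + fitness_cols + ["Cluster Labels"],
)

# mean frequency of each fitness category within a cluster
summary = clustered.groupby("Cluster Labels")[fitness_cols].mean()
# combined share of all four fitness categories
summary["All fitness venues"] = summary.sum(axis=1)
print(summary.round(4))
```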


<b> Observations </b>

Fitness centers are most concentrated in cluster 1, with a moderate number in cluster 2. Cluster 0, by contrast, has very few or no fitness centers in its neighborhoods. These neighborhoods represent a strong opportunity for new fitness centers, because there is little to no competition from existing gyms or yoga studios. Fitness centers in cluster 1, meanwhile, are likely to face intense competition because of the high concentration of existing centers.

Therefore, this project recommends that property developers open new fitness centers, gyms, or yoga studios in the cluster 0 neighborhoods, where there is little to no competition. Developers with a unique selling proposition that sets them apart can also consider the cluster 2 neighborhoods, where competition is moderate. Lastly, developers are advised to avoid the cluster 1 neighborhoods, which already have a high concentration of fitness centers and intense competition.