# San Francisco Neighborhoods



### Introduction/Business Problem
This project analyzes data from public sources to identify family-friendly neighborhoods: those with more parks, sports facilities, and cultural spots, and with fewer bars and less urban development. <br>
San Francisco is a popular tourist destination and a major financial center. Families with children often prefer to live outside the city. My objective is to find San Francisco neighborhoods with a 'good' work-life balance.

### Data
Neighborhoods of San Francisco (real estate site): http://www.houseofkinoko.com/district-guide/ <br>
Venues in each neighborhood: https://api.foursquare.com <br>
I will use the geopy library to find the coordinates of each neighborhood. Unfortunately, geopy sometimes returns incorrect coordinates. In those cases I will use Google to find a significant point in the neighborhood (a landmark or a street address) and geocode that instead.


### Methodology
Use K-means clustering to group the neighborhoods into 4 clusters. Then analyze each cluster and pick the one that best matches my goal.
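
As a rough illustration of the clustering step, the sketch below runs scikit-learn's `KMeans` on a tiny made-up matrix of venue-category frequencies. The rows, columns, and values are invented for illustration; only the choice of 4 clusters comes from the plan above.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy frequency matrix: rows are hypothetical neighborhoods, columns are
# hypothetical venue categories (park, playground, bar, nightclub).
toy_freq = np.array([
    [0.60, 0.30, 0.10, 0.00],
    [0.55, 0.35, 0.10, 0.00],
    [0.05, 0.05, 0.60, 0.30],
    [0.00, 0.10, 0.55, 0.35],
    [0.25, 0.25, 0.25, 0.25],
    [0.30, 0.20, 0.25, 0.25],
    [0.00, 1.00, 0.00, 0.00],
    [0.05, 0.95, 0.00, 0.00],
])

# random_state makes the run repeatable; n_init is set explicitly
# because its default differs between scikit-learn versions.
kmeans = KMeans(n_clusters=4, random_state=0, n_init=10).fit(toy_freq)

print(kmeans.labels_)                 # one cluster label per row
print(kmeans.cluster_centers_.shape)  # one centroid per cluster
```

The centroids can then be read as "average venue profiles", which is one way to decide which cluster looks family friendly.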

### Import libraries


In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from bs4 import BeautifulSoup  # to read html

import requests # library to handle HTTP requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')


Libraries imported.


### Select SF neighborhoods 

#### Get list of neighborhoods

In [2]:
# Get districts and neighborhoods of SF
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import requests
from bs4 import BeautifulSoup

# read html
url = 'http://www.houseofkinoko.com/district-guide/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}
response = requests.get(url, headers=headers)
htmltext = response.text

# extract name of neighborhoods from html
soup = BeautifulSoup(htmltext, 'html.parser')  # name the parser explicitly to avoid the "no parser specified" warning
info_html = soup.find_all('div', attrs={'class':'vc_row wpb_row vc_inner vc_row-fluid vc_custom_1496726763337 vc_row-has-fill'})
district_neighborhoods = []
for item in info_html:
    district = item.find('h3', attrs={'class':'districtHeader'}).text.replace(u'\xa0', ' ')
    print(district)
    neighborhoods = item.find_all('li')
    for neighbor in neighborhoods:
        neighbor = neighbor.text.replace(u'\xa0', ' ').strip()
        district_neighborhoods.append([district, neighbor])
sf_df = pd.DataFrame(district_neighborhoods, columns=['District', 'Neighborhood'])
sf_df.head()

DISTRICT 1 – NORTHWEST
DISTRICT 2 – CENTRAL WEST
DISTRICT 3 – SOUTHWEST
DISTRICT 4 – TWIN PEAKS WEST
DISTRICT 5 – CENTRAL
DISTRICT 6 – CENTRAL NORTH
DISTRICT 7 – NORTH
DISTRICT 8 – NORTHEAST
DISTRICT 9 – CENTRAL EAST
DISTRICT 10 – SOUTH EAST


Unnamed: 0,District,Neighborhood
0,DISTRICT 1 – NORTHWEST,Central Richmond
1,DISTRICT 1 – NORTHWEST,Inner Richmond
2,DISTRICT 1 – NORTHWEST,Jordan Park – Laurel Heights
3,DISTRICT 1 – NORTHWEST,Lake Mountain
4,DISTRICT 1 – NORTHWEST,Lone Mountain


In [5]:
# combine districts that share the same neighborhood
sf_neighborhoods = sf_df.groupby('Neighborhood').agg({'District': ', '.join}).reset_index()

# fix errors: replace 'Buena Vista' with 'Inner Sunset'
# (.loc with a boolean mask is used because .at accepts only a single row label)
mask = sf_neighborhoods.Neighborhood == 'Buena Vista'
sf_neighborhoods.loc[mask, 'Neighborhood'] = 'Inner Sunset'
sf_neighborhoods.loc[mask, 'District'] = 'DISTRICT 2 – CENTRAL WEST'
sf_neighborhoods.sort_values('Neighborhood', inplace=True)
sf_neighborhoods.reset_index(inplace=True, drop=True)

print(' San Francisco has 10 districts with {0} neighborhoods.'.format(len(sf_neighborhoods)))
sf_neighborhoods


 San Francisco has 10 districts with 88 neighborhoods.


Unnamed: 0,Neighborhood,District
0,Alamo Square,DISTRICT 6 – CENTRAL NORTH
1,Anza Vista,DISTRICT 6 – CENTRAL NORTH
2,Ashbury Heights,DISTRICT 5 – CENTRAL
3,Balboa Terrace,DISTRICT 4 – TWIN PEAKS WEST
4,Bayview,DISTRICT 10 – SOUTH EAST
5,Bayview Heights,DISTRICT 10 – SOUTH EAST
6,Bernal Heights,DISTRICT 9 – CENTRAL EAST
7,Candlestick Point,DISTRICT 10 – SOUTH EAST
8,Central Richmond,DISTRICT 1 – NORTHWEST
9,Central Sunset,DISTRICT 2 – CENTRAL WEST
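
The grouping step above can be tried on toy data. This is a minimal sketch with made-up district and neighborhood names; it shows how `', '.join` merges the districts of a neighborhood that is listed twice, and how a boolean mask with `.loc` updates matching rows.

```python
import pandas as pd

# Toy rows: "Shared Hill" is a made-up neighborhood listed under two districts.
toy = pd.DataFrame(
    [
        ["DISTRICT A", "Shared Hill"],
        ["DISTRICT B", "Shared Hill"],
        ["DISTRICT A", "Lone Park"],
    ],
    columns=["District", "Neighborhood"],
)

# One row per neighborhood; its district names are joined with ", ".
combined = toy.groupby("Neighborhood").agg({"District": ", ".join}).reset_index()

# A boolean mask with .loc updates every matching row (zero, one, or many).
mask = combined["Neighborhood"] == "Lone Park"
combined.loc[mask, "District"] = "DISTRICT C"

print(combined)
```

A neighborhood that appears under two districts ends up with a combined label, which is why a per-district count can contain a row such as "DISTRICT 8 – NORTHEAST, DISTRICT 9 – CENTRAL EAST".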


In [6]:
sf_neighborhoods.groupby('District').count()

District,Neighborhood
DISTRICT 1 – NORTHWEST,7
DISTRICT 10 – SOUTH EAST,12
DISTRICT 2 – CENTRAL WEST,7
DISTRICT 3 – SOUTHWEST,9
DISTRICT 4 – TWIN PEAKS WEST,15
DISTRICT 5 – CENTRAL,11
DISTRICT 6 – CENTRAL NORTH,6
DISTRICT 7 – NORTH,4
DISTRICT 8 – NORTHEAST,8
"DISTRICT 8 – NORTHEAST, DISTRICT 9 – CENTRAL EAST",1


#### Get location of SF neighborhoods (latitude and longitude)
Coordinates come from geopy. Unfortunately, geopy does not locate some neighborhoods correctly. For those, I used Google to find a notable place in the neighborhood (a landmark or a street address) and geocoded that instead.

In [7]:
# get longitude and latitude for each neighborhood
geolocator = Nominatim(user_agent="sf_explorer")
# Replace the neighborhood name with an alias that geocodes correctly
sf_neighborhoods['Alias_n'] = sf_neighborhoods['Neighborhood']
def replace_name(old_name, new_name):
    # .loc with a boolean mask; .at accepts only a single row label
    sf_neighborhoods.loc[sf_neighborhoods['Alias_n'] == old_name, 'Alias_n'] = new_name

replace_name("Ashbury Heights", "Ashbury Terrace")
replace_name("Balboa Terrace","300 San Leandro Way")
replace_name("Bayview","Hilltop Park")
replace_name("Bayview Heights", "Bayview Park Road")
replace_name("Central Richmond", "Holy Virgin Cathedral")
replace_name("Central Sunset","2350 Moraga Street")
replace_name("Central Waterfront – Dogpatch","Dogpatch")
replace_name("Cole Valley – Parnassus Heights", "Cole Valley")
replace_name("Corona Heights", "Lower Terrace")
replace_name("Diamond Heights","Glen Canyon Park")
replace_name("Eureka Valley – Dolores Heights", "Eureka Valley")
replace_name("Golden Gate Heights","Golden Gate Heights Park")
replace_name("Financial District – Barbary Coast", "Transamerica Pyramid")
replace_name("Forest Hill Extension","Garcia Ave")
replace_name("Ingleside Heights","Brotherhood Way Open Space")
replace_name("Ingleside Terrace","Urbano Dr")
replace_name("Inner Mission","Franklin Square")
replace_name("Inner Parkside","700 Taraval St")
replace_name("Inner Richmond", "Cinderella")
replace_name("Jordan Park – Laurel Heights", "450 Euclid Ave")
replace_name("Lakeside","Winston Dr")
replace_name("Lake Mountain","Mountain Lake")
replace_name("Lake Shore","Lake Merced Blvd")
replace_name("Little Hollywood","Little Hollywood Park")
replace_name("Marina", "Marina District")
replace_name("Merced Manor","Merced Manor Reservoir")
replace_name("Midtown Terrace","Midtown Terrace Playground")
replace_name("Miraloma Park","Miraloma")
replace_name("Mission Terrace","Capistrano Ave")
replace_name("Monterey Heights","Maywood Dr")
replace_name("Mount Davidson Manor", "Kenwood Way")
replace_name("North Panhandle", "1500 Fulton St.")
replace_name("North Waterfront", "Fisherman's Wharf")
replace_name("Outer Mission","Moneta way")
replace_name("Outer Richmond","5100 Anza Street")
replace_name("Outer Parkside","3600 Taraval St")
replace_name("Outer Sunset","3600 Moraga St")
replace_name("Parkside","2000 Taraval St")
replace_name("Saint Francis Woods","Saint Francis Wood")
replace_name("Sherwood Forest","Robinhood Dr")
replace_name("Sunnyside", "Sunnyside Elementary School")
replace_name("Westwood Highlands","Brentwood Ave")
replace_name("Westwood Park","Miramar Ave")
replace_name("Yerba Buena","Yerba Buena Garden")
replace_name("Van Ness – Civic Center", "Civic Center")

In [8]:
def get_lat_lon(neighborhood):
    location = geolocator.geocode("{0}, San Francisco, California".format(neighborhood))
    if location is not None:
        return location.raw
    else:
        return None
    
sf_neighborhoods['Info'] = sf_neighborhoods['Alias_n'].apply(lambda x: get_lat_lon(x))
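
The lookup can also be written so that failed names are collected instead of silently becoming `None`. The sketch below is one possible shape, not the notebook's method: `geocode_all` is a hypothetical helper, and `geocode` stands for any callable that takes a query string and returns a result or `None` (for example `geolocator.geocode`). A dictionary stands in for the geocoder here so the example runs offline, and its coordinates are placeholders.

```python
def geocode_all(names, geocode, aliases=None):
    """Geocode each name (or its alias) and report which names failed.

    Returns (found, missing): a dict of name -> result, and a list of
    names for which the geocoder returned None.
    """
    aliases = aliases or {}
    found, missing = {}, []
    for name in names:
        # Use the hand-picked alias when one exists, else the name itself.
        query = "{0}, San Francisco, California".format(aliases.get(name, name))
        result = geocode(query)
        if result is None:
            missing.append(name)
        else:
            found[name] = result
    return found, missing


# Offline stand-in for a geocoder; the coordinates are placeholders.
fake_index = {"Mountain Lake, San Francisco, California": (37.0, -122.0)}

found, missing = geocode_all(
    ["Lake Mountain", "Nowhere Heights"],  # the second name is made up
    fake_index.get,
    aliases={"Lake Mountain": "Mountain Lake"},
)
print(found)
print(missing)
```

Checking `missing` before moving on makes it obvious which neighborhoods still need an alias.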

In [9]:
sf_neighborhoods['Latitude'] = sf_neighborhoods['Info'].apply(lambda x: x['lat'] if x is not None else None)
sf_neighborhoods['Longitude'] = sf_neighborhoods['Info'].apply(lambda x: x['lon'] if x is not None else None)
sf_neighborhoods.drop('Info', axis=1, inplace=True)
sf_neighborhoods.drop('Alias_n', axis=1, inplace=True)
sf_neighborhoods.shape

(88, 4)

In [10]:
sf_neighborhoods.head()

Unnamed: 0,Neighborhood,District,Latitude,Longitude
0,Alamo Square,DISTRICT 6 – CENTRAL NORTH,37.7763599,-122.43470002366266
1,Anza Vista,DISTRICT 6 – CENTRAL NORTH,37.7808364,-122.4431489
2,Ashbury Heights,DISTRICT 5 – CENTRAL,37.7649565,-122.4447648
3,Balboa Terrace,DISTRICT 4 – TWIN PEAKS WEST,37.73173,-122.468857
4,Bayview,DISTRICT 10 – SOUTH EAST,37.7329005,-122.38365140907936
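
Because geopy sometimes returns a wrong match, a quick plausibility check on the coordinates can help. The sketch below computes the great-circle (haversine) distance from each point to the city center and flags anything too far away. The 15 km threshold and the helper names are assumptions made for this example; the center is the value Nominatim returns for 'San Francisco, California' in this notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    # float() also accepts the string values found in Nominatim's raw output.
    lat1, lon1, lat2, lon2 = (radians(float(v)) for v in (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


SF_CENTER = (37.7790262, -122.4199061)
MAX_KM = 15.0  # assumed threshold, not taken from the notebook


def looks_plausible(lat, lon):
    """True if the point lies within MAX_KM of the city center."""
    return haversine_km(lat, lon, *SF_CENTER) <= MAX_KM


# Alamo Square, using the coordinates from the table above.
distance = haversine_km("37.7763599", "-122.43470002366266", *SF_CENTER)
print(round(distance, 2), looks_plausible("37.7763599", "-122.43470002366266"))
```

A point that fails the check is a candidate for a hand-picked alias.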


### SF Map

In [12]:
address = 'San Francisco, California'
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of San Francisco are {}, {}.'.format(latitude, longitude))

The geographical coordinates of San Francisco are 37.7790262, -122.4199061.


In [13]:
# create map of San Francisco using latitude and longitude values
map_sf = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, district, neighborhood in zip(sf_neighborhoods['Latitude'], sf_neighborhoods['Longitude'], sf_neighborhoods['District'], sf_neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='#006666',
        weight=1,
        fill=True,
        fill_color='#CDFFFF',
        fill_opacity=0.5,
        parse_html=False).add_to(map_sf)  
    
map_sf

#### Screenshot
![sf](sf_neighborhoods.jpg)

### Get information about venues for each neighborhood

#### Define Foursquare Credentials and Version

In [20]:
CLIENT_ID = '' #  Foursquare ID
CLIENT_SECRET = '' #  Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: [redacted]
CLIENT_SECRET: [redacted]
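
Printing credentials stores them in the saved notebook output. One way to avoid that is to read them from environment variables and print only a masked form. This is a sketch under assumptions: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, and a plain dictionary stands in for the environment in the demo.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical, chosen for this sketch.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID", "")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET", "")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret


def mask(secret, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Demo with dummy values instead of the real environment.
demo_env = {"FOURSQUARE_CLIENT_ID": "ABCD1234", "FOURSQUARE_CLIENT_SECRET": "WXYZ5678"}
cid, secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(cid))
print("CLIENT_SECRET:", mask(secret))
```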


#### Get the top 100 venues around each neighborhood within a radius of 500 meters

In [21]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
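
`getNearbyVenues` indexes straight into the response, so a missing `groups` list or a venue without categories raises an exception. The sketch below shows a more defensive way to flatten one response. It assumes the same JSON layout that the function above relies on; `extract_venue_rows` and the "Uncategorized" label are names made up for this example, and the second venue in the sample payload is invented.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Flatten one explore-style response into row tuples.

    Assumes the layout used above:
    payload["response"]["groups"][0]["items"][i]["venue"].
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # "Uncategorized" is a placeholder label chosen for this sketch.
        category = categories[0].get("name", "Uncategorized") if categories else "Uncategorized"
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Alamo Square",
               "location": {"lat": 37.776045, "lng": -122.434363},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Unnamed Spot",  # invented venue with no category
               "location": {"lat": 37.0, "lng": -122.0},
               "categories": []}},
]}]}}

rows = extract_venue_rows(sample, "Alamo Square", 37.7763599, -122.4347)
print(rows)
print(extract_venue_rows({}, "Empty", 0.0, 0.0))  # no groups -> no rows
```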

In [22]:
sf_venues = getNearbyVenues(names=sf_neighborhoods['Neighborhood'],
                                   latitudes=sf_neighborhoods['Latitude'],
                                   longitudes=sf_neighborhoods['Longitude']
                                  )
print(sf_venues.shape)
sf_venues.head()


(3372, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Alamo Square,37.7763599,-122.43470002366266,Alamo Square,37.776045,-122.434363,Park
1,Alamo Square,37.7763599,-122.43470002366266,Alamo Square Dog Park,37.775878,-122.43574,Dog Run
2,Alamo Square,37.7763599,-122.43470002366266,Painted Ladies,37.77612,-122.433389,Historic Site
3,Alamo Square,37.7763599,-122.43470002366266,Lucinda’s Deli,37.774757,-122.436239,Sandwich Place
4,Alamo Square,37.7763599,-122.43470002366266,The Independent,37.775573,-122.437835,Rock Club


#### Check how many venues were returned for each neighborhood

In [24]:
sf_venues.groupby('Neighborhood').Venue.count()

Neighborhood
Alamo Square                           75
Anza Vista                             22
Ashbury Heights                        40
Balboa Terrace                         19
Bayview                                 1
Bayview Heights                         4
Bernal Heights                         69
Candlestick Point                      10
Central Richmond                       59
Central Sunset                         15
Central Waterfront – Dogpatch          59
Clarendon Heights                       5
Cole Valley – Parnassus Heights        47
Corona Heights                         16
Cow Hollow                             94
Crocker Amazon                          3
Diamond Heights                        15
Downtown San Francisco                100
Duboce Triangle                        64
Eureka Valley – Dolores Heights        87
Excelsior                              46
Financial District – Barbary Coast    100
Forest Hill                             5
Forest Hill Extension
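
The counts vary widely: some neighborhoods return 100 venues and Bayview returns 1. A neighborhood with a single venue gets a frequency vector with one entry equal to 1.0, which may distort the clustering. The sketch below lists neighborhoods under a cut-off, using a few counts copied from the output above. The cut-off of 5 is an assumption for illustration, not a value from the notebook.

```python
import pandas as pd

# A few counts copied from the output above.
venue_counts = pd.Series(
    {
        "Alamo Square": 75,
        "Bayview": 1,
        "Bayview Heights": 4,
        "Crocker Amazon": 3,
        "Forest Hill": 5,
        "Cow Hollow": 94,
    },
    name="Venue",
)

MIN_VENUES = 5  # assumed cut-off

# Neighborhoods with fewer venues than the cut-off, smallest first.
sparse = venue_counts[venue_counts < MIN_VENUES].sort_values()
print(sparse)
```

Such neighborhoods could be reviewed by hand, queried with a larger radius, or left out before clustering.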

#### Check how many unique categories can be curated from all the returned venues

In [25]:
print('There are {} unique categories.'.format(len(sf_venues['Venue Category'].unique())))

There are 335 unique categories.


In [26]:
#list of all categories
sf_categ = sf_venues['Venue Category'].unique()
sf_categ

array(['Park', 'Dog Run', 'Historic Site', 'Sandwich Place', 'Rock Club',
       'Bakery', 'New American Restaurant', 'Gift Shop', 'Bar',
       'Seafood Restaurant', 'BBQ Joint', 'Market', 'Food Truck',
       'Ice Cream Shop', 'Boutique', 'Souvlaki Shop', 'Pizza Place',
       'Cocktail Bar', 'Bubble Tea Shop', 'Farmers Market',
       'Sushi Restaurant', 'Record Shop', 'Frozen Yogurt Shop',
       "Men's Store", 'Spiritual Center',
       'Southern / Soul Food Restaurant', 'Nightclub', 'Roller Rink',
       'Antique Shop', 'Marijuana Dispensary', 'Burrito Place', 'Diner',
       'Korean Restaurant', 'Mediterranean Restaurant',
       'Italian Restaurant', 'Hunan Restaurant', 'Toy / Game Store',
       'Comic Shop', 'Yoga Studio', 'Restaurant', 'Wine Bar',
       'Gym / Fitness Center', 'Hotel', 'Pakistani Restaurant',
       'Pet Store', 'Indian Restaurant', 'Arcade', 'Café', 'Coffee Shop',
       'Grocery Store', 'Fried Chicken Joint', 'Ethiopian Restaurant',
       'Liquor Store',

#### Analyze each Neighborhood

In [27]:
# one hot encoding
sf_onehot = pd.get_dummies(sf_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
sf_onehot['Neighborhood'] = sf_venues['Neighborhood'] 
columns = sf_onehot.columns
print("Number of columns before moving 'Neighborhood':", len(columns))

# move neighborhood column to the first column
idx = columns.get_loc("Neighborhood")
columns = list(columns) # convert to list
print("Idx of column 'Neighborhood'", idx)
print("Move 'Neighborhood' to the first column")
fixed_columns = [columns[idx]] + columns[:idx] + columns[idx+1:]
sf_onehot = sf_onehot[fixed_columns]
print("Number of columns after moving 'Neighborhood':", len(sf_onehot.columns))

sf_onehot.head()

Number of columns before moving 'Neighborhood': 335
Idx of column 'Neighborhood' 207
Move 'Neighborhood' to the first column
Number of columns after moving 'Neighborhood': 335


Unnamed: 0,Neighborhood,ATM,Acai House,Accessories Store,Adult Boutique,Alternative Healer,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Bath House,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bistro,Board Shop,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Green,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cable Car,Café,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,College Gym,Comedy Club,Comic Shop,Community Center,Concert Hall,Convenience Store,Cosmetics Shop,Credit Union,Cultural Center,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dive Shop,Dog Run,Donut Shop,Dry Cleaner,Dumpling Restaurant,Egyptian Restaurant,Electronics Store,Elementary School,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Eye Doctor,Factory,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Financial or Legal Service,Flea Market,Flower Shop,Food & Drink Shop,Food Stand,Food Truck,Football Stadium,Fountain,Frame Store,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Herbs & Spices Store,High School,Hill,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Home Service,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Hunan Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Jiangsu Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Library,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Marijuana Dispensary,Market,Martial Arts School,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Movie Theater,Moving Target,Museum,Music School,Music Store,Music Venue,Nail Salon,New American Restaurant,Newsstand,Nightclub,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Other Nightlife,Outdoor Sculpture,Pakistani Restaurant,Paper / Office Supplies Store,Park,Parking,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Peruvian Restaurant,Pet Café,Pet Store,Pharmacy,Photography Lab,Physical Therapist,Picnic Area,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Pool Hall,Pop-Up Shop,Pub,Public Art,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Research Laboratory,Reservoir,Residential Building (Apartment / Condo),Restaurant,Road,Rock Club,Roller Rink,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Stationery Store,Steakhouse,Street Food Gathering,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Tour Provider,Tourist Information Center,Toy / Game Store,Track,Trade School,Trail,Train Station,Trattoria/Osteria,Travel Agency,Tree,Tunnel,Turkish Restaurant,Tuscan Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Wagashi Place,Warehouse,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio
0,Alamo Square,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Alamo Square,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Alamo Square,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Alamo Square,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Alamo Square,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
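
The column count stays at 335 before and after the 'Neighborhood' column is added, and that column sits at index 207 rather than at the end. This suggests that one of the venue categories is itself named "Neighborhood", so the assignment overwrites that category column instead of adding a new one. The sketch below reproduces the effect on made-up data and shows one possible way around it (keeping the labels in the index); it is an illustration, not the notebook's method.

```python
import pandas as pd

# Made-up venues; one of them has the category "Neighborhood".
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B"],
    "Venue Category": ["Park", "Neighborhood", "Bar"],
})

onehot = pd.get_dummies(toy_venues[["Venue Category"]], prefix="", prefix_sep="")
print(sorted(onehot.columns))  # already contains a column called 'Neighborhood'

# Assigning the labels under the same name replaces that category column,
# so the number of columns does not change.
onehot["Neighborhood"] = toy_venues["Neighborhood"]
print(len(onehot.columns))

# One way to avoid the clash: keep the labels in the index, not in a column.
safe = pd.get_dummies(toy_venues["Venue Category"]).astype(float)
safe.index = toy_venues["Neighborhood"]
freq = safe.groupby(level=0).mean()
print(freq)
```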


#### Group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [28]:
sf_grouped = sf_onehot.groupby('Neighborhood').mean().reset_index()
sf_grouped.head()

Unnamed: 0,Neighborhood,ATM,Acai House,Accessories Store,Adult Boutique,Alternative Healer,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Bath House,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bistro,Board Shop,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Green,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cable Car,Café,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,College Gym,Comedy Club,Comic Shop,Community Center,Concert Hall,Convenience Store,Cosmetics Shop,Credit Union,Cultural Center,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dive Shop,Dog Run,Donut Shop,Dry Cleaner,Dumpling Restaurant,Egyptian Restaurant,Electronics Store,Elementary School,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Eye Doctor,Factory,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Financial or Legal Service,Flea Market,Flower Shop,Food & Drink Shop,Food Stand,Food Truck,Football Stadium,Fountain,Frame Store,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Herbs & Spices Store,High School,Hill,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Home Service,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Hunan Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Jiangsu Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Library,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Marijuana Dispensary,Market,Martial Arts School,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Movie Theater,Moving Target,Museum,Music School,Music Store,Music Venue,Nail Salon,New American Restaurant,Newsstand,Nightclub,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Other Nightlife,Outdoor Sculpture,Pakistani Restaurant,Paper / Office Supplies Store,Park,Parking,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Peruvian Restaurant,Pet Café,Pet Store,Pharmacy,Photography Lab,Physical Therapist,Picnic Area,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Pool Hall,Pop-Up Shop,Pub,Public Art,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Research Laboratory,Reservoir,Residential Building (Apartment / Condo),Restaurant,Road,Rock Club,Roller Rink,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Stationery Store,Steakhouse,Street Food Gathering,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Tour Provider,Tourist Information Center,Toy / Game Store,Track,Trade School,Trail,Train Station,Trattoria/Osteria,Travel Agency,Tree,Tunnel,Turkish Restaurant,Tuscan Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Wagashi Place,Warehouse,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio
0,Alamo Square,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.026667,0.013333,0.053333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.013333,0.013333,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.013333,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.0,0.0,0.013333,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.013333,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.013333,0.013333,0.026667,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.013333,0.013333,0.0,0.0,0.0,0.013333,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.026667,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.013333,0.013333,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.013333,0.0,0.0,0.013333,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.0,0.0,0.013333
1,Anza Vista,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.136364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Ashbury Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.075,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.025,0.0,0.0,0.025,0.0,0.0,0.025,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.025,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.025,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0
3,Balboa Terrace,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bayview,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Print each neighborhood along with the top 5 most common venues

In [30]:
num_top_venues = 5

for hood in sf_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = sf_grouped[sf_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alamo Square----
              venue  freq
0               Bar  0.05
1         BBQ Joint  0.04
2  Sushi Restaurant  0.03
3             Hotel  0.03
4       Record Shop  0.03


----Anza Vista----
                     venue  freq
0                     Café  0.14
1              Bus Station  0.09
2  Health & Beauty Service  0.09
3              Coffee Shop  0.09
4               Donut Shop  0.05


----Ashbury Heights----
               venue  freq
0     Breakfast Spot  0.08
1               Park  0.05
2  Convenience Store  0.05
3     Scenic Lookout  0.05
4        Coffee Shop  0.05


----Balboa Terrace----
                 venue  freq
0      Bubble Tea Shop  0.11
1                  ATM  0.05
2               Bakery  0.05
3   Light Rail Station  0.05
4  Japanese Restaurant  0.05


----Bayview----
                     venue  freq
0  Health & Beauty Service   1.0
1                      ATM   0.0
2                     Park   0.0
3                Pet Store   0.0
4                 Pet Café   0.0



                 venue  freq
0                Trail  0.33
1   Light Rail Station  0.22
2                 Lake  0.11
3  Monument / Landmark  0.11
4           Playground  0.11


----Miraloma Park----
                 venue  freq
0                 Park  0.29
1                 Tree  0.14
2             Bus Line  0.14
3  Monument / Landmark  0.14
4           Playground  0.14


----Mission Bay----
         venue  freq
0   Food Truck  0.16
1          Gym  0.07
2         Café  0.05
3  Coffee Shop  0.05
4     Pharmacy  0.04


----Mission Dolores----
                  venue  freq
0              Boutique  0.05
1  Gym / Fitness Center  0.04
2        Clothing Store  0.04
3                   Spa  0.03
4           Coffee Shop  0.03


----Mission Terrace----
                    venue  freq
0      Light Rail Station  0.08
1      Mexican Restaurant  0.08
2  Thrift / Vintage Store  0.05
3            Liquor Store  0.05
4                    Bank  0.05


----Monterey Heights----
       venue  freq
0       Pa

#### Create a dataframe with the top 10 venues for each neighborhood

In [45]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    final_row_categories = list(row_categories_sorted[row_categories_sorted != 0].index.values) 
    len_categories = len(final_row_categories)
    # return only categories that actually occur in the neighborhood; pad with '' when there are fewer than num_top_venues
    return final_row_categories[:num_top_venues] if len_categories >= num_top_venues else final_row_categories+['']*(num_top_venues-len_categories)
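
To make the padding behaviour of this helper concrete, here is a minimal sketch on a toy row. The row and its values are hypothetical (real rows come from `sf_grouped`), and the explicit `astype(float)` cast is my addition for the toy data, not part of the notebook's function.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Mirrors the notebook's helper: skip the 'Neighborhood' label,
    # rank the remaining categories by frequency, keep the non-zero ones.
    row_categories = row.iloc[1:].astype(float)  # cast added for this sketch
    row_categories_sorted = row_categories.sort_values(ascending=False)
    present = list(row_categories_sorted[row_categories_sorted != 0].index.values)
    # Pad with empty strings when fewer categories exist than requested.
    return present[:num_top_venues] + [''] * max(0, num_top_venues - len(present))

# Hypothetical neighborhood with three venue categories actually present.
toy_row = pd.Series({'Neighborhood': 'Toy Heights',
                     'Bar': 0.0, 'Park': 0.5, 'Trail': 0.3, 'Café': 0.2})

top = return_most_common_venues(toy_row, 5)
print(top)
```

Categories with zero frequency never appear in the result, which is why a neighborhood such as Bayview, with a single venue category, ends up with nine empty cells in the top-10 table.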

In [46]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions 4 and up have no entry in `indicators`, so they take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = sf_grouped['Neighborhood']

for ind in np.arange(sf_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sf_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alamo Square,Bar,BBQ Joint,Hotel,Seafood Restaurant,Ethiopian Restaurant,Pizza Place,Record Shop,Bakery,Food Truck,Liquor Store
1,Anza Vista,Café,Coffee Shop,Health & Beauty Service,Bus Station,Shop & Service,Sandwich Place,Southern / Soul Food Restaurant,Big Box Store,Donut Shop,Arts & Crafts Store
2,Ashbury Heights,Breakfast Spot,Convenience Store,Scenic Lookout,Coffee Shop,Park,Hill,Plaza,Bus Station,Snack Place,Café
3,Balboa Terrace,Bubble Tea Shop,ATM,Bakery,Japanese Restaurant,Korean Restaurant,Light Rail Station,Gym,Fountain,Park,Pharmacy
4,Bayview,Health & Beauty Service,,,,,,,,,


### Cluster Neighborhoods
#### Run k-means to cluster the neighborhoods into 4 clusters

In [47]:
# set number of clusters
kclusters = 4

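# drop the label column; axis is passed by keyword because recent pandas versions no longer accept it positionally
sf_grouped_clustering = sf_grouped.drop('Neighborhood', axis=1)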

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sf_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0,
       0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 2, 0, 3, 0, 0, 2, 2, 0],
      dtype=int32)
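
The label array is hard to read by eye, so a quick tally of cluster sizes helps. The sketch below rebuilds the labels from the positions of the non-zero entries in the printed array (an illustrative reconstruction; in the notebook one would tally `kmeans.labels_` directly).

```python
from collections import Counter

n_neighborhoods = 88  # length of the printed label array

# Rebuild the labels: everything is cluster 0 except the positions below,
# which are read off the printed array.
labels = [0] * n_neighborhoods
for idx in (11, 13, 25, 37, 46, 72, 80, 85, 86):
    labels[idx] = 2
labels[4] = 1
labels[82] = 3

sizes = Counter(labels)
for cluster in sorted(sizes):
    share = sizes[cluster] / n_neighborhoods
    print(f"cluster {cluster}: {sizes[cluster]} neighborhoods ({share:.1%})")
```

The split is very uneven: one large cluster, one mid-sized cluster, and two single-neighborhood clusters. This may indicate that the two singletons act as outliers, which is worth keeping in mind when reading the cluster descriptions.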

#### Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood

In [48]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

sf_merged = sf_neighborhoods.copy()

# join sf_neighborhoods with neighborhoods_venues_sorted to add the cluster label and top venues to each neighborhood's coordinates
sf_merged = sf_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

sf_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alamo Square,DISTRICT 6 – CENTRAL NORTH,37.7763599,-122.43470002366266,0,Bar,BBQ Joint,Hotel,Seafood Restaurant,Ethiopian Restaurant,Pizza Place,Record Shop,Bakery,Food Truck,Liquor Store
1,Anza Vista,DISTRICT 6 – CENTRAL NORTH,37.7808364,-122.4431489,0,Café,Coffee Shop,Health & Beauty Service,Bus Station,Shop & Service,Sandwich Place,Southern / Soul Food Restaurant,Big Box Store,Donut Shop,Arts & Crafts Store
2,Ashbury Heights,DISTRICT 5 – CENTRAL,37.7649565,-122.4447648,0,Breakfast Spot,Convenience Store,Scenic Lookout,Coffee Shop,Park,Hill,Plaza,Bus Station,Snack Place,Café
3,Balboa Terrace,DISTRICT 4 – TWIN PEAKS WEST,37.73173,-122.468857,0,Bubble Tea Shop,ATM,Bakery,Japanese Restaurant,Korean Restaurant,Light Rail Station,Gym,Fountain,Park,Pharmacy
4,Bayview,DISTRICT 10 – SOUTH EAST,37.7329005,-122.38365140907936,1,Health & Beauty Service,,,,,,,,,
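
One thing to watch with this join: a neighborhood that has no row in the venues table gets `NaN` in every joined column, and the cluster label column silently becomes float. A minimal sketch with toy frames (all names and values below are hypothetical) shows how one might check for that before mapping:

```python
import pandas as pd

# Hypothetical stand-in for sf_neighborhoods.
toy_neighborhoods = pd.DataFrame({
    'Neighborhood': ['Toy A', 'Toy B', 'Toy C'],
    'Latitude': [37.70, 37.71, 37.72],
    'Longitude': [-122.40, -122.41, -122.42],
})

# Hypothetical stand-in for neighborhoods_venues_sorted; 'Toy C' returned no venues.
toy_venues_sorted = pd.DataFrame({
    'Cluster Labels': [0, 2],
    'Neighborhood': ['Toy A', 'Toy B'],
    '1st Most Common Venue': ['Bar', 'Trail'],
})

merged = toy_neighborhoods.join(toy_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# Rows without venue data have NaN cluster labels.
unmatched = merged[merged['Cluster Labels'].isna()]
print(unmatched['Neighborhood'].tolist())

# Drop them and restore integer labels so they can be used as list indices.
clean = merged.dropna(subset=['Cluster Labels']).astype({'Cluster Labels': int})
print(clean['Cluster Labels'].tolist())
```

In this notebook the label column shows up as integers in the output above, which suggests every neighborhood matched; the check only matters if some neighborhood returns no venues.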


In [49]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
print(rainbow)

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(sf_merged['Latitude'], sf_merged['Longitude'], sf_merged['Neighborhood'], sf_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

['#8000ff', '#2adddd', '#d4dd80', '#ff0000']
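
Because the markers are colored with `rainbow[cluster-1]`, the palette is shifted by one position and cluster 0 wraps around to the last color. The sketch below uses the palette printed above to spell out which color each cluster gets; the `shifted` and `direct` names are mine, for illustration only.

```python
# Palette as printed by the notebook.
rainbow = ['#8000ff', '#2adddd', '#d4dd80', '#ff0000']
kclusters = 4

# Mapping actually used by the map cell: index cluster-1, so cluster 0 -> rainbow[-1].
shifted = {cluster: rainbow[cluster - 1] for cluster in range(kclusters)}

# Alternative mapping with no shift, shown for comparison only.
direct = {cluster: rainbow[cluster] for cluster in range(kclusters)}

for cluster in range(kclusters):
    print(cluster, shifted[cluster], direct[cluster])
```

Both mappings assign four distinct colors, so the map is valid either way; the shift only matters when matching marker colors to cluster numbers.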


#### Screenshot
![sf_clusters](sf_clusters.jpg)

### Examine clusters

#### Cluster 0.
Most neighborhoods in Cluster 0 have many bars, restaurants, and shops. If you are young, have no family, and divide your time between work and friends, most neighborhoods of the city will suit you. This environment is also very friendly for tourists.

In [54]:
cluster_0 = sf_merged.loc[sf_merged['Cluster Labels'] == 0, sf_merged.columns[[0,1] + list(range(5, sf_merged.shape[1]))]]
print(len(cluster_0))
cluster_0

77


Unnamed: 0,Neighborhood,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alamo Square,DISTRICT 6 – CENTRAL NORTH,Bar,BBQ Joint,Hotel,Seafood Restaurant,Ethiopian Restaurant,Pizza Place,Record Shop,Bakery,Food Truck,Liquor Store
1,Anza Vista,DISTRICT 6 – CENTRAL NORTH,Café,Coffee Shop,Health & Beauty Service,Bus Station,Shop & Service,Sandwich Place,Southern / Soul Food Restaurant,Big Box Store,Donut Shop,Arts & Crafts Store
2,Ashbury Heights,DISTRICT 5 – CENTRAL,Breakfast Spot,Convenience Store,Scenic Lookout,Coffee Shop,Park,Hill,Plaza,Bus Station,Snack Place,Café
3,Balboa Terrace,DISTRICT 4 – TWIN PEAKS WEST,Bubble Tea Shop,ATM,Bakery,Japanese Restaurant,Korean Restaurant,Light Rail Station,Gym,Fountain,Park,Pharmacy
5,Bayview Heights,DISTRICT 10 – SOUTH EAST,American Restaurant,Café,Bistro,Park,,,,,,
6,Bernal Heights,DISTRICT 9 – CENTRAL EAST,Coffee Shop,Playground,Mexican Restaurant,Italian Restaurant,Yoga Studio,Peruvian Restaurant,Food Truck,Cocktail Bar,Gourmet Shop,Grocery Store
7,Candlestick Point,DISTRICT 10 – SOUTH EAST,Football Stadium,American Restaurant,Food & Drink Shop,Soccer Field,Stadium,Café,Bistro,Picnic Area,,
8,Central Richmond,DISTRICT 1 – NORTHWEST,Korean Restaurant,Sushi Restaurant,Thai Restaurant,Café,Vietnamese Restaurant,Burger Joint,Dim Sum Restaurant,Restaurant,Deli / Bodega,Indian Restaurant
9,Central Sunset,DISTRICT 2 – CENTRAL WEST,Chinese Restaurant,Dim Sum Restaurant,Donut Shop,Bar,Coffee Shop,Sandwich Place,Playground,Vietnamese Restaurant,Bank,Bubble Tea Shop
10,Central Waterfront – Dogpatch,DISTRICT 9 – CENTRAL EAST,Gym / Fitness Center,Wine Bar,Park,Café,Coffee Shop,Brewery,Restaurant,Sandwich Place,Bakery,Gift Shop


#### Cluster 1.
This cluster contains a single neighborhood, Bayview, for which the data holds only one venue category (Health & Beauty Service). Bayview is the most isolated neighborhood, and it is very industrial.

In [51]:
sf_merged.loc[sf_merged['Cluster Labels'] == 1, sf_merged.columns[[0,1] + list(range(5, sf_merged.shape[1]))]]

Unnamed: 0,Neighborhood,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Bayview,DISTRICT 10 – SOUTH EAST,Health & Beauty Service,,,,,,,,,


#### Cluster 2.
This cluster has neighborhoods with many parks, trails, and sports facilities, as well as good public transportation.

In [57]:
cluster_2 = sf_merged.loc[sf_merged['Cluster Labels'] == 2, sf_merged.columns[[0,1] + list(range(5, sf_merged.shape[1]))]]
print(len(cluster_2))
cluster_2

9


Unnamed: 0,Neighborhood,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Clarendon Heights,DISTRICT 5 – CENTRAL,Trail,Playground,Monument / Landmark,Reservoir,,,,,,
13,Corona Heights,DISTRICT 5 – CENTRAL,Park,Trail,Museum,Scenic Lookout,Tennis Court,Hill,Monument / Landmark,Grocery Store,Dog Run,Shoe Store
25,Golden Gate Heights,DISTRICT 2 – CENTRAL WEST,Trail,Park,Playground,Tennis Court,Scenic Lookout,,,,,
37,Lake Mountain,DISTRICT 1 – NORTHWEST,Trail,Intersection,Dog Run,Tennis Court,Park,Tunnel,Art Gallery,Scenic Lookout,Bus Station,Playground
46,Midtown Terrace,DISTRICT 4 – TWIN PEAKS WEST,Trail,Light Rail Station,Bus Stop,Playground,Lake,Monument / Landmark,,,,
72,Sherwood Forest,DISTRICT 4 – TWIN PEAKS WEST,Trail,Bus Stop,Tree,Park,Monument / Landmark,,,,,
80,Twin Peaks,DISTRICT 5 – CENTRAL,Trail,Scenic Lookout,Bus Station,Reservoir,Garden,Tailor Shop,Hill,Bus Stop,,
85,Westwood Highlands,DISTRICT 4 – TWIN PEAKS WEST,Bus Line,Scenic Lookout,,,,,,,,
86,Westwood Park,DISTRICT 4 – TWIN PEAKS WEST,Gym / Fitness Center,Park,Bus Line,Scenic Lookout,,,,,,


#### Cluster 3.
This cluster is similar to Cluster 1: it contains a single isolated neighborhood, Visitacion Valley, with only one venue category (Vietnamese Restaurant). Unlike Bayview, it is more residential than commercial.

In [53]:
sf_merged.loc[sf_merged['Cluster Labels'] == 3, sf_merged.columns[[0,1] + list(range(5, sf_merged.shape[1]))]]

Unnamed: 0,Neighborhood,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
82,Visitacion Valley,DISTRICT 10 – SOUTH EAST,Vietnamese Restaurant,,,,,,,,,


### Conclusion
As we can see, Cluster 2 is the best fit for families. Its neighborhoods have many parks and sports spots. Most of them are located in the center of the city (seven of the nine belong to DISTRICT 4 or DISTRICT 5), in close proximity to cultural sites, restaurants, and shopping. This is a great combination for living with children in the city.
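
The district claim can be checked with a short tally. The mapping below is copied from the Cluster 2 table; the dictionary name is mine, and in the notebook itself something like `cluster_2['District'].value_counts()` would presumably give the same counts.

```python
from collections import Counter

# District of each Cluster 2 neighborhood, copied from the Cluster 2 table.
cluster_2_districts = {
    'Clarendon Heights': 'DISTRICT 5 – CENTRAL',
    'Corona Heights': 'DISTRICT 5 – CENTRAL',
    'Golden Gate Heights': 'DISTRICT 2 – CENTRAL WEST',
    'Lake Mountain': 'DISTRICT 1 – NORTHWEST',
    'Midtown Terrace': 'DISTRICT 4 – TWIN PEAKS WEST',
    'Sherwood Forest': 'DISTRICT 4 – TWIN PEAKS WEST',
    'Twin Peaks': 'DISTRICT 5 – CENTRAL',
    'Westwood Highlands': 'DISTRICT 4 – TWIN PEAKS WEST',
    'Westwood Park': 'DISTRICT 4 – TWIN PEAKS WEST',
}

district_counts = Counter(cluster_2_districts.values())
for district, count in district_counts.most_common():
    print(f"{district}: {count}")
```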
