# 1. Introduction

The San Francisco Bay Area is a populous region in Northern California, with nearly 7.8 million people across nine counties. It is a major job hub for high-tech workers, and its population has grown by over 600,000 since 2010, according to a report by KQED News [1].

Newcomers face many challenges when they first arrive, and one of the most important and frequent questions is: where should I live? Within a one- to two-hour commute of major job centers such as San Francisco or Silicon Valley, there are many cities to choose from.

In this project, I use publicly available social and geographic data to help answer this question, applying the data science tools I have learned during this course.

# 2. Data Sources
## 2.1 Target City Selection 
There are roughly 100 cities in the Bay Area. To limit the scope of this project, I select 25 candidate cities based on area and population density (people per square mile). First, I get a list of Bay Area cities from a Wikipedia page [2], which includes both the area in square miles and the population of each city. I drop the smaller half of the cities by area (those below the median) and sort the remaining cities by population per square mile. The top 25 cities in this list are the target cities for this study.

Next, I select a somewhat arbitrary set of criteria for classifying each city: school ranking, housing cost, neighborhood, and crime rate. One could argue that other indicators, such as commute time, should also be included; the criteria above were chosen mainly for data availability and out of personal interest.

The following subsections discuss the data source for each criterion.

## 2.2 School Ranking 
Scraping this data directly was difficult with the methods I know, and to avoid investing too much time in this part of the data collection, I looked up the school rankings for all 25 cities by browsing www.niche.com [5].

This is the only dataset collected manually: for each city, I averaged the rankings of its three highest-ranked high schools according to the website. I saved the results to a CSV file and uploaded it to the Jupyter notebook for further analysis.
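
As a rough illustration of the averaging step, the sketch below computes the mean ranking of the three best-ranked high schools per city. The table layout, the column names (`city`, `school`, `rank`) and all values are hypothetical placeholders, not the data collected from niche.com.

```python
import pandas as pd

# Hypothetical long-format table: one row per high school (placeholder values)
schools = pd.DataFrame({
    "city": ["A", "A", "A", "A", "B", "B", "B"],
    "school": ["a1", "a2", "a3", "a4", "b1", "b2", "b3"],
    "rank": [10, 40, 20, 90, 300, 100, 200],
})

def avg_top3_rank(df):
    """Mean of the three best (numerically smallest) ranks for each city."""
    return (
        df.sort_values("rank")
          .groupby("city")["rank"]
          .apply(lambda s: s.head(3).mean())
    )

top3 = avg_top3_rank(schools)
print(top3)
```

A lower average means better-ranked schools, which matches the ascending sort used for the school table in Section 4.2.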

## 2.3 Housing Cost 
For housing cost, I use the 2018 median home price for each city from the Vital Signs website [3]. The data is in CSV format; it is loaded into the notebook, and the median price (`MedPrice`) is extracted as the housing cost for this project.

## 2.4 Neighborhood 
I use venue data from Foursquare to characterize the neighborhood around each city [6]. In most cities, the most popular businesses are similar venues such as coffee shops and restaurants. Hundreds of different venue categories are reported; to avoid diluting the other criteria, such as housing and schools, I classify the neighborhoods into 5 labels and use only this summary label in the overall classification at the end.

## 2.5 Crime Rate
The crime rate data comes from Wikipedia's "California locations by crime rate" page [4]. For many people deciding where to live, it may be the most important indicator.

# 3. Methodology 
This project focuses on grouping cities around the Bay Area to provide extra information for deciding where to live. The four metrics discussed above are used as features. For the grouping, I use the k-means clustering algorithm (imported in the notebook as `KMeans`) to put cities with similar metrics together and display them on a map. A final table with all the data and cluster labels is included in the report.
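
The notebook imports `KMeans` from scikit-learn for the grouping step. Because the features live on very different scales (home prices in the millions of dollars, crime rates in single digits), a distance-based method such as k-means is usually applied after standardizing each column. The sketch below shows one possible way to do this, using the values for four cities that appear in the result tables in Section 4. The use of `StandardScaler`, `n_clusters=2` and the three-column subset are assumptions made for illustration, not necessarily the settings used in this project.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Values copied from the tables in Section 4 (median price, school ranking, crime rate)
features = pd.DataFrame(
    {
        "MedPrice": [2289033.0, 1875867.0, 1857625.0, 1437700.0],
        "Avg Ranking of top 3 highschools": [22, 26, 30, 45],
        "Crime Rate": [0.66, 1.98, 1.12, 2.25],
    },
    index=["Cupertino", "Mountain View", "Sunnyvale", "San Mateo"],
)

# Standardize so that no single column dominates the Euclidean distance
scaled = StandardScaler().fit_transform(features)

# n_clusters=2 is an arbitrary choice for this tiny example
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)
features["Cluster"] = kmeans.labels_
print(features)
```

Without the scaling step, the price column would effectively decide the clusters on its own, since its numeric range is several orders of magnitude larger than the others.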

# 4. Data Analysis and Results 
Among the target cities, San Jose is the largest by area (about 176 square miles), and San Francisco is the most densely populated (about 17,000 people per square mile). The list also includes notable Silicon Valley cities such as Mountain View and Santa Clara.

In [None]:
import random # library for random number generation
import numpy as np # library for vectorized computation
import pandas as pd # library to process data as dataframes

### Scrape Bay Area cities from Wikipedia


In [None]:
wiki_url = "https://en.wikipedia.org/wiki/List_of_cities_and_towns_in_the_San_Francisco_Bay_Area"
wiki_data = pd.read_html(wiki_url)[1]  # the table at index 1 holds the city list
wiki_data = wiki_data.droplevel(0, axis=1)  # flatten the two-level column header
wiki_data = wiki_data.rename(columns={'Population (2010)[8][9]': 'population'})
wiki_data['population_per_sq_mi'] = wiki_data['population'] / wiki_data['sq mi']

# keep only cities at or above the median area (drops the smaller half)
threshold_area = wiki_data['sq mi'].median()
target_cities = wiki_data[wiki_data['sq mi'] >= threshold_area]

# rank by population density and keep the 25 densest cities
target_cities = target_cities.sort_values(by='population_per_sq_mi', ascending=False).reset_index(drop=True)[:25]
target_cities = target_cities[['Name', 'sq mi', 'population_per_sq_mi']]
display(target_cities)

Unnamed: 0,Name,sq mi,population_per_sq_mi
0,San Francisco,46.87,17180.179219
1,Berkeley,10.47,10752.626552
2,San Mateo,12.13,8013.767519
3,Oakland,55.79,7003.477326
4,South San Francisco,9.14,6961.925602
5,Alameda,10.61,6956.833176
6,Sunnyvale,21.99,6370.213734
7,San Leandro,13.34,6368.065967
8,Santa Clara,18.41,6326.344378
9,Mountain View,12.0,6172.166667


In [None]:
target_cities.describe()

Unnamed: 0,sq mi,population_per_sq_mi
count,25.0,25.0
mean,27.234,5749.71775
std,33.312016,2972.325941
min,9.14,3480.797836
25%,13.34,3955.45829
50%,17.84,4914.64312
75%,28.35,6370.213734
max,176.53,17180.179219


## 4.1 Get Housing Cost Data


In [None]:
from google.colab import files
uploaded = files.upload()

Saving Vital_Signs__Home_Prices___by_city.csv to Vital_Signs__Home_Prices___by_city (2).csv


In [None]:
home_price = pd.read_csv("Vital_Signs__Home_Prices___by_city.csv")
home_price.keys()

Index(['City', 'County', 'Year', 'MedPrice', 'MedPrice_IA', 'PercentChngPrice',
       'PercentChngPriceIA', 'Source'],
      dtype='object')

In [None]:
#home_price[['City','County', 'MedPrice', 'Year' ]][home_price['Year']==2018]

In [None]:
# .copy() avoids pandas' SettingWithCopyWarning when renaming a filtered slice
target_home_price = home_price[home_price['Year'] == 2018].copy()
target_home_price = target_home_price.rename(columns={'City': 'Name'})
target_home_price = target_home_price[['Name', 'MedPrice']]


In [None]:
target_home_price.head()
target_with_home_price = pd.merge(target_cities, target_home_price, on="Name")
display(target_with_home_price[['Name','MedPrice']].sort_values('MedPrice', ascending=False).reset_index(drop=True))


Unnamed: 0,Name,MedPrice
0,Cupertino,2289033.0
1,Mountain View,1875867.0
2,Sunnyvale,1857625.0
3,Redwood City,1636433.0
4,San Mateo,1437700.0
5,Santa Clara,1385692.0
6,San Francisco,1337250.0
7,Berkeley,1258050.0
8,Milpitas,1094092.0
9,San Jose,1067425.0
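
`pd.merge` defaults to an inner join, so any city whose name is spelled differently in the two tables is silently dropped. One quick way to check for this is a left join with `indicator=True`. The frames below are small stand-ins (one name is deliberately given a trailing space to trigger a mismatch), not the actual notebook data.

```python
import pandas as pd

# Stand-in tables; "Sunnyvale " with a trailing space is a deliberate mismatch
cities = pd.DataFrame({"Name": ["Cupertino", "Sunnyvale", "San Mateo"]})
prices = pd.DataFrame({
    "Name": ["Cupertino", "Sunnyvale ", "San Mateo"],
    "MedPrice": [2289033.0, 1857625.0, 1437700.0],
})

# A left join keeps every target city; the _merge column records where each row matched
checked = pd.merge(cities, prices, on="Name", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Name"].tolist()
print("Cities without a price:", unmatched)
```

If the list is non-empty, the names can be cleaned (for example with `str.strip()`) before the real merge.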


## 4.2 Get school data


In [None]:
from google.colab import files
school_data_uploaded = files.upload()

Saving 25 city school.csv to 25 city school (2).csv


In [None]:
school_data = pd.read_csv("25 city school (1).csv")

In [None]:
target_with_school = pd.merge(target_with_home_price, school_data, on="Name")
display(target_with_school[['Name','Avg Ranking of top 3 highschools']].sort_values('Avg Ranking of top 3 highschools', ascending=True).reset_index(drop=True))


Unnamed: 0,Name,Avg Ranking of top 3 highschools
0,Cupertino,22
1,Mountain View,26
2,Sunnyvale,30
3,San Mateo,45
4,San Ramon,70
5,San Jose,70
6,Berkeley,80
7,Redwood City,90
8,San Francisco,120
9,Alameda,130


## 4.3 Get Crime Rate Data


In [None]:
wiki_url = "https://en.wikipedia.org/wiki/California_locations_by_crime_rate"
wiki_data = pd.read_html(wiki_url)[2]
print(f"{wiki_data.shape}")
# keep the city name and violent crime rate, renamed to match the other tables
wiki_data = wiki_data[["City/Agency", "Violent crime rateper 1,000 persons"]]
wiki_data = wiki_data.rename(columns={'City/Agency': 'Name', 'Violent crime rateper 1,000 persons': 'Crime Rate'})

#display(wiki_data)


(459, 8)


In [None]:
target_include_crime = pd.merge(target_with_home_price, wiki_data, on="Name")

In [None]:
display(target_include_crime[['Name', 'Crime Rate']].sort_values('Crime Rate', ascending=True).reset_index(drop=True))

Unnamed: 0,Name,Crime Rate
0,San Ramon,0.31
1,Cupertino,0.66
2,Sunnyvale,1.12
3,Santa Clara,1.34
4,Milpitas,1.59
5,Brentwood,1.83
6,Alameda,1.88
7,Mountain View,1.98
8,San Mateo,2.25
9,South San Francisco,2.34


## 4.4 Data Collection: Venues from Foursquare API

In [None]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [None]:
import os

# Read the Foursquare credentials from environment variables instead of hard-coding them.
# The variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are an assumed convention.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180604'
LIMIT = 300
print('Credentials loaded.')

Credentials loaded.


## Get coordinates for each city


In [None]:
#! pip install pgeocode

In [None]:
target_cities.head()


Unnamed: 0,Name,sq mi,population_per_sq_mi
0,San Francisco,46.87,17180.179219
1,Berkeley,10.47,10752.626552
2,San Mateo,12.13,8013.767519
3,Oakland,55.79,7003.477326
4,South San Francisco,9.14,6961.925602


In [None]:

from geopy.geocoders import Nominatim

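def get_coordinates(location):
  # recent geopy versions require an identifying user agent (the name here is arbitrary)
  geo_locator = Nominatim(user_agent="bayarea_city_explorer")

  # geocode() returns None when no match is found
  lat_lng_coords = geo_locator.geocode(location)
  if lat_lng_coords is None:
    print(f"no coordinates found for {location}")
    return (0, 0)

  latitude = lat_lng_coords.latitude
  longitude = lat_lng_coords.longitude
  return (latitude, longitude)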

bayarea_data=target_cities

coordinate_list = []
for loc in bayarea_data['Name']:
  lat, lon = get_coordinates(loc)
  coordinate_list.append([lat, lon])

coordinate_pd = pd.DataFrame(coordinate_list, columns=["Latitude", "Longitude"])
bayarea_data = pd.concat([bayarea_data, coordinate_pd], axis=1)
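
Bare city names can be ambiguous for a geocoder: in the spot check further down, the coordinates returned for "Santa Clara" appear to fall near Morgan Hill rather than in the city itself. One possible mitigation is to qualify each query with the state and country. The helper name `build_query` and the suffix values are assumptions for illustration; the network call itself is omitted here.

```python
def build_query(city, state="California", country="USA"):
    """Return a fully qualified place name for geocoding."""
    return f"{city.strip()}, {state}, {country}"

# Example city names taken from the target list
queries = [build_query(name) for name in ["Santa Clara", "San Mateo", " Brentwood "]]
print(queries)
```

Each string could then be passed to `get_coordinates` in place of the bare name; whether this resolves every city correctly would still need to be checked against the returned coordinates.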




In [None]:
cu = bayarea_data[bayarea_data['Name'] =='Cupertino']

## Get venues from Foursquare API



In [None]:
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=10000):
    print(names.shape)
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
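        # make the GET request; skip this city on failure so that results
        # from the previous city are not reused
        try:
          results = requests.get(url).json()["response"]['groups'][0]['items']
        except (requests.RequestException, ValueError, KeyError, IndexError):
          print(f"not able to handle request for {name}")
          continue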
          
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Name', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
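
The list comprehension above relies on the nested layout of the `explore` response (`response`, then `groups[0]`, then `items`, then `venue`). The sketch below isolates that parsing step so it can be tried without network access. The sample payload is hand-written from the fields the function reads: the first venue uses values from the first row of the spot-check output, the second is a made-up entry with no category. `parse_venues` is a hypothetical helper name, and falling back to `None` for venues without a category is an added safeguard, not part of the original function.

```python
def parse_venues(city, lat, lng, payload):
    """Flatten one explore-style response into rows of venue attributes."""
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append((
            city, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"] if categories else None,
        ))
    return rows

# Hand-written sample mirroring only the fields used above
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "In-N-Out Burger",
               "location": {"lat": 37.152432, "lng": -121.654779},
               "categories": [{"name": "Fast Food Restaurant"}]}},
    # made-up venue with an empty category list
    {"venue": {"name": "Unnamed Spot",
               "location": {"lat": 37.2, "lng": -121.7},
               "categories": []}},
]}]}}

rows = parse_venues("Santa Clara", 37.233325, -121.6746, sample)
print(rows)
```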

In [None]:
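# spot-check a single city before querying all 25
sample_city = bayarea_data[bayarea_data['Name'] == 'Santa Clara']
getNearbyVenues(names=sample_city['Name'],
                latitudes=sample_city['Latitude'],
                longitudes=sample_city['Longitude'])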

(1,)
Santa Clara


Unnamed: 0,Name,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Santa Clara,37.233325,-121.6746,In-N-Out Burger,37.152432,-121.654779,Fast Food Restaurant
1,Santa Clara,37.233325,-121.6746,Santa Teresa Golf Course,37.219602,-121.777023,Golf Course
2,Santa Clara,37.233325,-121.6746,Coyote Valley Sporting Clays,37.154191,-121.708315,Gun Range
3,Santa Clara,37.233325,-121.6746,Jamba Juice,37.232273,-121.775014,Juice Bar
4,Santa Clara,37.233325,-121.6746,Mod Pizza,37.154615,-121.65079,Pizza Place
5,Santa Clara,37.233325,-121.6746,Massage Envy - Morgan Hill,37.156424,-121.651812,Spa
6,Santa Clara,37.233325,-121.6746,Nick The Greek,37.232303,-121.774626,Greek Restaurant
7,Santa Clara,37.233325,-121.6746,Peet's Coffee & Tea,37.150863,-121.656328,Coffee Shop
8,Santa Clara,37.233325,-121.6746,Five Guys,37.151351,-121.656227,Burger Joint
9,Santa Clara,37.233325,-121.6746,Coyote Creek Trailhead,37.166401,-121.649346,Trail


In [None]:
bayarea_venues = getNearbyVenues(names=bayarea_data['Name'],
                                   latitudes=bayarea_data['Latitude'],
                                   longitudes=bayarea_data['Longitude']
                                  )


(25,)
San Francisco
Berkeley
San Mateo
Oakland
South San Francisco
Alameda
Sunnyvale
San Leandro
Santa Clara
Mountain View
San Jose
Cupertino
Milpitas
Napa
Santa Rosa
Petaluma
Concord
San Ramon
Redwood City
Vallejo
Pittsburg
Antioch
Union City
San Rafael
Brentwood


In [None]:
display(bayarea_venues.shape)

(2258, 7)

In [None]:
# one hot encoding
bayarea_onehot = pd.get_dummies(bayarea_venues[['Venue Category']], prefix="", prefix_sep="")

# add the city name column back to the dataframe
bayarea_onehot['Name'] = bayarea_venues['Name'] 

# move the city name column to the first position
fixed_columns = [bayarea_onehot.columns[-1]] + list(bayarea_onehot.columns[:-1])
bayarea_onehot = bayarea_onehot[fixed_columns]

bayarea_onehot.head()

Unnamed: 0,Name,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,Airport Lounge,Airport Service,American Restaurant,Amphitheater,Animal Shelter,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Beach,Beer Bar,Beer Garden,Beer Store,Big Box Store,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Buffet,Burger Joint,Burmese Restaurant,Burrito Place,Bus Station,Butcher,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Fabric Shop,Fair,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Financial or Legal Service,Fish & Chips Shop,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Herbs & Spices Store,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hostel,Hot Dog Joint,Hotel,Hotel Pool,Hunan Restaurant,Ice 
Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Island,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Marijuana Dispensary,Market,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Monument / Landmark,Motorcycle Shop,Motorsports Shop,Mountain,Movie Theater,Moving Target,Multiplex,Museum,Music Venue,Nail Salon,Nature Preserve,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Other Great Outdoors,Outdoor Supply Store,Outdoors & Recreation,Pakistani Restaurant,Park,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Pool,Print Shop,Pub,Racetrack,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Resort,Rest Area,Restaurant,River,Road,Rock Club,Roof Deck,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Soccer Stadium,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Stables,State / Provincial Park,Steakhouse,Supermarket,Sushi Restaurant,Swim School,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Tibetan Restaurant,Tourist Information Center,Toy / Game Store,Trade School,Trail,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Vineyard,Warehouse Store,Wine Bar,Wine Shop,Winery,Wings Joint,Yoga Studio,Zoo,Zoo Exhibit
0,San Francisco,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,San Francisco,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,San Francisco,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,San Francisco,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,San Francisco,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
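
The one-hot table is hard to read on its own; it becomes more useful once the rows are averaged per city and the most frequent categories are listed. The example below does this on a tiny hand-made venue list (the city names are real, the venue rows are placeholders). It uses the same `get_dummies` pattern as above, with `dtype=int` added so the averages are computed on plain integers; the helper `top_categories` is a hypothetical name.

```python
import pandas as pd

# Placeholder venue rows for two cities
venues = pd.DataFrame({
    "Name": ["Alameda", "Alameda", "Alameda", "Antioch", "Antioch"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Pizza Place", "Coffee Shop"],
})

onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "Name", venues["Name"])

# Mean of the indicator columns = share of each category among a city's venues
grouped = onehot.groupby("Name").mean()

def top_categories(freq_row, n=2):
    """Return the n most frequent category names for one city."""
    return freq_row.sort_values(ascending=False).head(n).index.tolist()

top = {city: top_categories(grouped.loc[city]) for city in grouped.index}
print(top)
```

Categories with equal frequency may come back in either order, so the ranking is only meaningful where the shares actually differ.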


In [None]:
bayarea_onehot['Name'].unique()

array(['San Francisco', 'Berkeley', 'San Mateo', 'Oakland',
       'South San Francisco', 'Alameda', 'Sunnyvale', 'San Leandro',
       'Santa Clara', 'Mountain View', 'San Jose', 'Cupertino',
       'Milpitas', 'Napa', 'Santa Rosa', 'Petaluma', 'Concord',
       'San Ramon', 'Redwood City', 'Vallejo', 'Pittsburg', 'Antioch',
       'Union City', 'San Rafael', 'Brentwood'], dtype=object)

In [None]:
bayarea_grouped = bayarea_onehot.groupby('Name').mean().reset_index()
bayarea_grouped

Unnamed: 0,Name,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,Airport Lounge,Airport Service,American Restaurant,Amphitheater,Animal Shelter,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Beach,Beer Bar,Beer Garden,Beer Store,Big Box Store,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Buffet,Burger Joint,Burmese Restaurant,Burrito Place,Bus Station,Butcher,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Fabric Shop,Fair,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Financial or Legal Service,Fish & Chips Shop,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Herbs & Spices Store,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hostel,Hot Dog Joint,Hotel,Hotel Pool,Hunan Restaurant,Ice 
Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Island,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Marijuana Dispensary,Market,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Monument / Landmark,Motorcycle Shop,Motorsports Shop,Mountain,Movie Theater,Moving Target,Multiplex,Museum,Music Venue,Nail Salon,Nature Preserve,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Other Great Outdoors,Outdoor Supply Store,Outdoors & Recreation,Pakistani Restaurant,Park,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Pool,Print Shop,Pub,Racetrack,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Resort,Rest Area,Restaurant,River,Road,Rock Club,Roof Deck,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Soccer Stadium,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Stables,State / Provincial Park,Steakhouse,Supermarket,Sushi Restaurant,Swim School,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Tibetan Restaurant,Tourist Information Center,Toy / Game Store,Trade School,Trail,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Vineyard,Warehouse Store,Wine Bar,Wine Shop,Winery,Wings Joint,Yoga Studio,Zoo,Zoo Exhibit
0,Alameda,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.12,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.09,0.0,0.0,0.0,0.02,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0
1,Antioch,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0
2,Berkeley,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.05,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.01,0.0,0.0,0.01,0.06,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Brentwood,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.07,0.0,0.02,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.06,0.05,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Concord,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0
5,Cupertino,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.04,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.08,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.01,0.03,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.06,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0
6,Milpitas,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.05,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.05,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.02,0.06,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
7,Mountain View,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.04,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.07,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.17,0.0,0.0,0.0,0.0,0.0,0.03,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.04,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0
8,Napa,0.0,0.01,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.05,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0
9,Oakland,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.04,0.0,0.0,0.0,0.0,0.02,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.04,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.02,0.07,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.03,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0


In [None]:
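num_top_venues = 5

# for each city, print its most frequent venue categories
for hood in bayarea_grouped['Name']:
    print("----"+hood+"----")
    # transpose the city's single row so that each venue category becomes a row
    temp = bayarea_grouped[bayarea_grouped['Name'] == hood].T.reset_index()
    display(temp.head())
    temp.columns = ['venue','freq']
    # skip the first row, which holds the city name rather than a frequency;
    # copy so the assignments below do not write into a slice of the original frame
    temp = temp.iloc[1:].copy()
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')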

----Alameda----


Unnamed: 0,index,0
0,Name,Alameda
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0


            venue  freq
0     Coffee Shop  0.12
1            Park  0.09
2   Grocery Store  0.05
3           Trail  0.05
4  Ice Cream Shop  0.03


----Antioch----


Unnamed: 0,index,1
0,Name,Antioch
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0.01


                  venue  freq
0           Coffee Shop  0.09
1    Mexican Restaurant  0.07
2  Fast Food Restaurant  0.07
3           Pizza Place  0.04
4        Ice Cream Shop  0.04


----Berkeley----


Unnamed: 0,index,2
0,Name,Berkeley
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0


           venue  freq
0    Coffee Shop  0.06
1           Park  0.06
2    Pizza Place  0.06
3  Grocery Store  0.05
4          Trail  0.04


----Brentwood----


Unnamed: 0,index,3
0,Name,Brentwood
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0.01


                venue  freq
0         Coffee Shop  0.10
1       Grocery Store  0.07
2            Pharmacy  0.06
3         Pizza Place  0.05
4  Mexican Restaurant  0.05


----Concord----


Unnamed: 0,index,4
0,Name,Concord
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0


                 venue  freq
0           Donut Shop  0.07
1  American Restaurant  0.05
2    Convenience Store  0.04
3   Mexican Restaurant  0.04
4       Sandwich Place  0.04


----Cupertino----


Unnamed: 0,index,5
0,Name,Cupertino
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0


             venue  freq
0             Park  0.09
1    Grocery Store  0.08
2   Sandwich Place  0.06
3      Coffee Shop  0.05
4  Bubble Tea Shop  0.04


----Milpitas----


Unnamed: 0,index,6
0,Name,Milpitas
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0


                venue  freq
0  Mexican Restaurant  0.06
1         Pizza Place  0.06
2               Trail  0.05
3       Grocery Store  0.05
4              Bakery  0.05


----Mountain View----


Unnamed: 0,index,7
0,Name,Mountain View
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0


               venue  freq
0               Park  0.17
1      Grocery Store  0.07
2     Sandwich Place  0.04
3  Indian Restaurant  0.04
4     Farmers Market  0.04


----Napa----


Unnamed: 0,index,8
0,Name,Napa
1,ATM,0
2,Accessories Store,0.01
3,Adult Boutique,0
4,Afghan Restaurant,0


                 venue  freq
0                Hotel  0.09
1  American Restaurant  0.07
2   Italian Restaurant  0.05
3        Grocery Store  0.05
4          Coffee Shop  0.05


----Oakland----


Unnamed: 0,index,9
0,Name,Oakland
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0


            venue  freq
0     Coffee Shop  0.07
1         Brewery  0.04
2     Beer Garden  0.04
3             Bar  0.04
4  Ice Cream Shop  0.03


----Petaluma----


Unnamed: 0,index,10
0,Name,Petaluma
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0


            venue  freq
0     Coffee Shop  0.07
1     Pizza Place  0.06
2            Park  0.06
3   Grocery Store  0.06
4  Ice Cream Shop  0.04


----Pittsburg----


Unnamed: 0,index,11
0,Name,Pittsburg
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0


                 venue  freq
0       Ice Cream Shop  0.07
1  American Restaurant  0.06
2                  Bar  0.04
3           Taco Place  0.04
4                 Park  0.04


----Redwood City----


Unnamed: 0,index,12
0,Name,Redwood City
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0.01


            venue  freq
0  Sandwich Place  0.06
1     Coffee Shop  0.05
2            Park  0.05
3   Grocery Store  0.05
4             Gym  0.04


----San Francisco----


Unnamed: 0,index,13
0,Name,San Francisco
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0.01
4,Afghan Restaurant,0


           venue  freq
0    Coffee Shop  0.08
1           Park  0.08
2         Bakery  0.06
3    Yoga Studio  0.05
4  Grocery Store  0.04


----San Jose----


Unnamed: 0,index,14
0,Name,San Jose
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0


                venue  freq
0  Mexican Restaurant  0.09
1      Sandwich Place  0.06
2         Pizza Place  0.03
3                 Bar  0.03
4         Coffee Shop  0.03


----San Leandro----


Unnamed: 0,index,15
0,Name,San Leandro
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0


                venue  freq
0  Mexican Restaurant  0.05
1        Burger Joint  0.05
2                Park  0.04
3      Ice Cream Shop  0.04
4       Deli / Bodega  0.03


----San Mateo----


Unnamed: 0,index,16
0,Name,San Mateo
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0.01


                 venue  freq
0                 Park  0.08
1        Grocery Store  0.07
2  Japanese Restaurant  0.06
3       Sandwich Place  0.05
4                Trail  0.04


----San Rafael----


Unnamed: 0,index,17
0,Name,San Rafael
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0


                 venue  freq
0   Mexican Restaurant  0.10
1        Grocery Store  0.08
2          Pizza Place  0.04
3  American Restaurant  0.04
4       Sandwich Place  0.04


----San Ramon----


Unnamed: 0,index,18
0,Name,San Ramon
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0


                      venue  freq
0                Restaurant  0.30
1                      Food  0.10
2  Mediterranean Restaurant  0.10
3             Historic Site  0.10
4        Spanish Restaurant  0.05


----Santa Clara----


Unnamed: 0,index,19
0,Name,Santa Clara
1,ATM,0.0172414
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0


         venue  freq
0  Golf Course  0.10
1  Coffee Shop  0.09
2        Trail  0.07
3  Pizza Place  0.05
4         Park  0.05


----Santa Rosa----


Unnamed: 0,index,20
0,Name,Santa Rosa
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0


                   venue  freq
0                   Lake  0.22
1               Mountain  0.22
2             Hotel Pool  0.11
3       Department Store  0.11
4  Outdoors & Recreation  0.11


----South San Francisco----


Unnamed: 0,index,21
0,Name,South San Francisco
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0


                 venue  freq
0   Mexican Restaurant  0.08
1                 Park  0.06
2  Filipino Restaurant  0.05
3        Grocery Store  0.05
4       Sandwich Place  0.03


----Sunnyvale----


Unnamed: 0,index,22
0,Name,Sunnyvale
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0


            venue  freq
0            Park  0.11
1   Grocery Store  0.07
2  Sandwich Place  0.06
3     Pizza Place  0.05
4     Coffee Shop  0.05


----Union City----


Unnamed: 0,index,23
0,Name,Union City
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0


                venue  freq
0  Mexican Restaurant  0.08
1                Park  0.07
2  Chinese Restaurant  0.06
3              Bakery  0.06
4         Coffee Shop  0.05


----Vallejo----


Unnamed: 0,index,24
0,Name,Vallejo
1,ATM,0
2,Accessories Store,0
3,Adult Boutique,0
4,Afghan Restaurant,0


                venue  freq
0         Coffee Shop  0.07
1  Mexican Restaurant  0.05
2      Breakfast Spot  0.05
3      Ice Cream Shop  0.04
4                Park  0.04




In [None]:
def return_most_common_venues(row, num_top_venues):
    # drop the first entry (the city name) and sort the venue frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

# ordinal suffixes for 1st, 2nd and 3rd; every later rank uses 'th'
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Name']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Name'] = bayarea_grouped['Name']

# fill in the top venue categories for each city, row by row
for ind in np.arange(bayarea_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bayarea_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alameda,Coffee Shop,Park,Grocery Store,Trail,Ice Cream Shop,Cosmetics Shop,Golf Course,Italian Restaurant,Pizza Place,Sushi Restaurant
1,Antioch,Coffee Shop,Fast Food Restaurant,Mexican Restaurant,Pizza Place,Ice Cream Shop,Lingerie Store,Sandwich Place,Burger Joint,Chinese Restaurant,American Restaurant
2,Berkeley,Pizza Place,Park,Coffee Shop,Grocery Store,Brewery,Bakery,Trail,Scenic Lookout,New American Restaurant,Japanese Restaurant
3,Brentwood,Coffee Shop,Grocery Store,Pharmacy,Pizza Place,Mexican Restaurant,Park,Sandwich Place,Salon / Barbershop,Ice Cream Shop,Fast Food Restaurant
4,Concord,Donut Shop,American Restaurant,Pizza Place,Sandwich Place,Convenience Store,Café,Mexican Restaurant,Bar,Seafood Restaurant,Farm
5,Cupertino,Park,Grocery Store,Sandwich Place,Coffee Shop,Bakery,Bubble Tea Shop,Pizza Place,Japanese Restaurant,Supermarket,Korean Restaurant
6,Milpitas,Pizza Place,Mexican Restaurant,Bakery,Grocery Store,Trail,Sandwich Place,Gym,Fast Food Restaurant,Park,Burger Joint
7,Mountain View,Park,Grocery Store,Sandwich Place,Indian Restaurant,Farmers Market,Trail,Mexican Restaurant,Fast Food Restaurant,Playground,Pizza Place
8,Napa,Hotel,American Restaurant,Grocery Store,Italian Restaurant,Coffee Shop,Mexican Restaurant,Sushi Restaurant,Restaurant,French Restaurant,Bakery
9,Oakland,Coffee Shop,Brewery,Beer Garden,Bar,Mexican Restaurant,Bakery,Café,Music Venue,Ice Cream Shop,Mediterranean Restaurant
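
The column names in this table are built from a three-item suffix list with a fallback to `th`, which works for ranks 1 to 10 but would label rank 21 as `21th`. A small helper could make the suffix logic explicit; this is only a sketch, and the function name `ordinal` is an assumption, not something the notebook defines.

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th and the rest of the teens always take 'th'
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Name"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)]
print(columns)
```

For `num_top_venues = 10` this yields the same column names as the cell above.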


In [None]:
# set number of clusters
kclusters = 5

# keep only the venue-frequency columns as clustering features
bayarea_grouped_clustering = bayarea_grouped.drop(columns='Name')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bayarea_grouped_clustering)

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# join the cluster labels and top venues onto bayarea_data, which holds latitude/longitude for each city
bayarea_merged = bayarea_data.join(neighborhoods_venues_sorted.set_index('Name'), on='Name')

# drop cities for which the venue columns were not populated
bayarea_merged = bayarea_merged.dropna()

display(bayarea_merged.head()) # check the last columns!

Unnamed: 0,Name,sq mi,population_per_sq_mi,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,San Francisco,46.87,17180.179219,37.779026,-122.419906,2,Coffee Shop,Park,Bakery,Yoga Studio,Grocery Store,Pizza Place,Ice Cream Shop,Art Museum,Brewery,Boutique
1,Berkeley,10.47,10752.626552,37.870839,-122.272864,2,Pizza Place,Park,Coffee Shop,Grocery Store,Brewery,Bakery,Trail,Scenic Lookout,New American Restaurant,Japanese Restaurant
2,San Mateo,12.13,8013.767519,37.496904,-122.333057,2,Park,Grocery Store,Japanese Restaurant,Sandwich Place,Trail,Gym,Brewery,Playground,Burger Joint,Dessert Shop
3,Oakland,55.79,7003.477326,37.804456,-122.271356,0,Coffee Shop,Brewery,Beer Garden,Bar,Mexican Restaurant,Bakery,Café,Music Venue,Ice Cream Shop,Mediterranean Restaurant
4,South San Francisco,9.14,6961.925602,37.654949,-122.408125,1,Mexican Restaurant,Park,Filipino Restaurant,Grocery Store,Sandwich Place,Bakery,Fast Food Restaurant,Vietnamese Restaurant,Sushi Restaurant,Deli / Bodega
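
The cell above fixes `kclusters = 5` without comparing alternatives. One common sanity check is to compare silhouette scores across several values of k. The sketch below runs on synthetic data; the helper name `silhouette_by_k` and the toy points are assumptions for illustration, and the result says nothing about which k is best for the Bay Area data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def silhouette_by_k(X, k_values, random_state=0):
    """Fit k-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores

# synthetic stand-in for the feature matrix: three tight, well-separated groups
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X_toy = np.vstack([c + rng.normal(scale=0.1, size=(5, 2)) for c in centers])

scores = silhouette_by_k(X_toy, k_values=range(2, 6))
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

In the notebook, `X_toy` would be replaced by `bayarea_grouped_clustering`. With only 25 cities the scores are likely to be noisy, so they are better read as a rough guide than as a decision rule.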


In [None]:
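# hand-assigned names for cluster labels 0-4
clusters = ['Restaurant', 'Restaurant', "park1", "historical site", 'lake']
for i, cluster_name in enumerate(clusters):
    print(f"cluster label: {cluster_name} ")
    # show the city name plus the cluster label and top-venue columns
    display(bayarea_merged.loc[bayarea_merged['Cluster Labels'] == i, bayarea_merged.columns[[0] + list(range(5, bayarea_merged.shape[1]))]])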

cluster label: Restaurant 


Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Oakland,0,Coffee Shop,Brewery,Beer Garden,Bar,Mexican Restaurant,Bakery,Café,Music Venue,Ice Cream Shop,Mediterranean Restaurant
8,Santa Clara,0,Golf Course,Coffee Shop,Trail,Pizza Place,Park,Mexican Restaurant,Gun Range,Café,American Restaurant,Lake
13,Napa,0,Hotel,American Restaurant,Grocery Store,Italian Restaurant,Coffee Shop,Mexican Restaurant,Sushi Restaurant,Restaurant,French Restaurant,Bakery
16,Concord,0,Donut Shop,American Restaurant,Pizza Place,Sandwich Place,Convenience Store,Café,Mexican Restaurant,Bar,Seafood Restaurant,Farm
20,Pittsburg,0,Ice Cream Shop,American Restaurant,Taco Place,Bar,Park,Hotel,Scenic Lookout,Bakery,Italian Restaurant,Coffee Shop


cluster label: Restaurant 


Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,South San Francisco,1,Mexican Restaurant,Park,Filipino Restaurant,Grocery Store,Sandwich Place,Bakery,Fast Food Restaurant,Vietnamese Restaurant,Sushi Restaurant,Deli / Bodega
7,San Leandro,1,Burger Joint,Mexican Restaurant,Park,Ice Cream Shop,Vietnamese Restaurant,Breakfast Spot,Pizza Place,Trail,Sushi Restaurant,Deli / Bodega
10,San Jose,1,Mexican Restaurant,Sandwich Place,Grocery Store,Pizza Place,Bar,Coffee Shop,Pub,Cocktail Bar,Sushi Restaurant,Spa
12,Milpitas,1,Pizza Place,Mexican Restaurant,Bakery,Grocery Store,Trail,Sandwich Place,Gym,Fast Food Restaurant,Park,Burger Joint
15,Petaluma,1,Coffee Shop,Pizza Place,Park,Grocery Store,Ice Cream Shop,Mexican Restaurant,Brewery,Sandwich Place,Vineyard,Burger Joint
19,Vallejo,1,Coffee Shop,Mexican Restaurant,Breakfast Spot,Grocery Store,Theme Park Ride / Attraction,Ice Cream Shop,Fast Food Restaurant,Park,Burger Joint,Thai Restaurant
21,Antioch,1,Coffee Shop,Fast Food Restaurant,Mexican Restaurant,Pizza Place,Ice Cream Shop,Lingerie Store,Sandwich Place,Burger Joint,Chinese Restaurant,American Restaurant
22,Union City,1,Mexican Restaurant,Park,Chinese Restaurant,Bakery,Coffee Shop,Vietnamese Restaurant,Breakfast Spot,Pizza Place,Sushi Restaurant,Trail
23,San Rafael,1,Mexican Restaurant,Grocery Store,Pizza Place,Sandwich Place,American Restaurant,Park,Bakery,Café,Restaurant,Coffee Shop
24,Brentwood,1,Coffee Shop,Grocery Store,Pharmacy,Pizza Place,Mexican Restaurant,Park,Sandwich Place,Salon / Barbershop,Ice Cream Shop,Fast Food Restaurant


cluster label: park1 


Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,San Francisco,2,Coffee Shop,Park,Bakery,Yoga Studio,Grocery Store,Pizza Place,Ice Cream Shop,Art Museum,Brewery,Boutique
1,Berkeley,2,Pizza Place,Park,Coffee Shop,Grocery Store,Brewery,Bakery,Trail,Scenic Lookout,New American Restaurant,Japanese Restaurant
2,San Mateo,2,Park,Grocery Store,Japanese Restaurant,Sandwich Place,Trail,Gym,Brewery,Playground,Burger Joint,Dessert Shop
5,Alameda,2,Coffee Shop,Park,Grocery Store,Trail,Ice Cream Shop,Cosmetics Shop,Golf Course,Italian Restaurant,Pizza Place,Sushi Restaurant
6,Sunnyvale,2,Park,Grocery Store,Sandwich Place,Pizza Place,Coffee Shop,Fast Food Restaurant,Bubble Tea Shop,Mexican Restaurant,Supermarket,Indian Restaurant
9,Mountain View,2,Park,Grocery Store,Sandwich Place,Indian Restaurant,Farmers Market,Trail,Mexican Restaurant,Fast Food Restaurant,Playground,Pizza Place
11,Cupertino,2,Park,Grocery Store,Sandwich Place,Coffee Shop,Bakery,Bubble Tea Shop,Pizza Place,Japanese Restaurant,Supermarket,Korean Restaurant
18,Redwood City,2,Sandwich Place,Grocery Store,Park,Coffee Shop,Gym,Brewery,Playground,Japanese Restaurant,Gastropub,Caribbean Restaurant


cluster label: historical site 


Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,San Ramon,3,Restaurant,Historic Site,Mediterranean Restaurant,Food,Italian Restaurant,Theater,Spanish Restaurant,Food & Drink Shop,Buffet,Trail


cluster label: lake 


Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Santa Rosa,4,Lake,Mountain,Restaurant,Hotel Pool,Outdoors & Recreation,Department Store,Fried Chicken Joint,Fabric Shop,Electronics Store,Ethiopian Restaurant
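
The cluster names printed above are typed in by hand. As a cross-check, the highest-weighted venue categories of each k-means centroid can be listed programmatically. This is a sketch on made-up numbers; the helper `describe_clusters` is hypothetical, and in the notebook its inputs would presumably be `kmeans.cluster_centers_` and `bayarea_grouped_clustering.columns`.

```python
import numpy as np

def describe_clusters(centers, feature_names, top_n=3):
    """Map each cluster index to the names of its top_n largest centroid features."""
    names = np.asarray(feature_names)
    # sort each centroid's features by descending weight and keep the first top_n
    order = np.argsort(-np.asarray(centers), axis=1)[:, :top_n]
    return {i: names[idx].tolist() for i, idx in enumerate(order)}

# toy centroids over three venue categories (hypothetical values)
toy_centers = np.array([
    [0.10, 0.02, 0.05],
    [0.01, 0.20, 0.03],
])
toy_features = ["Coffee Shop", "Lake", "Park"]

summary = describe_clusters(toy_centers, toy_features, top_n=2)
print(summary)
```

Comparing such a summary with the hand-written names can show whether two clusters that share the same name, as labels 0 and 1 do here, actually differ in their dominant categories.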


In [None]:
neighborhood = bayarea_merged[['Name', 'Cluster Labels']]


In [None]:
target_with_neighborhood = pd.merge(target_include_crime, neighborhood, on="Name")

In [None]:
target_with_neighborhood.head()

Unnamed: 0,Name,sq mi,population_per_sq_mi,MedPrice,Crime Rate,Cluster Labels
0,San Francisco,46.87,17180.179219,1337250.0,7.95,2
1,Berkeley,10.47,10752.626552,1258050.0,3.66,2
2,San Mateo,12.13,8013.767519,1437700.0,2.25,2
3,Oakland,55.79,7003.477326,739217.0,16.85,0
4,South San Francisco,9.14,6961.925602,1014283.0,2.34,1


In [None]:
# set number of clusters
kclusters = 5

# use every column except the city name as a clustering feature
target_with_neighborhood_clustering = target_with_neighborhood.drop(columns='Name')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(target_with_neighborhood_clustering)

# add clustering labels
target_with_neighborhood.insert(0, 'Labels', kmeans.labels_)

target_with_neighborhood.head()

Unnamed: 0,Labels,Name,sq mi,population_per_sq_mi,MedPrice,Crime Rate,Cluster Labels
0,3,San Francisco,46.87,17180.179219,1337250.0,7.95,2
1,3,Berkeley,10.47,10752.626552,1258050.0,3.66,2
2,3,San Mateo,12.13,8013.767519,1437700.0,2.25,2
3,2,Oakland,55.79,7003.477326,739217.0,16.85,0
4,0,South San Francisco,9.14,6961.925602,1014283.0,2.34,1


In [None]:
target_with_neighborhood.sort_values(by="Labels")

Unnamed: 0,Labels,Name,sq mi,population_per_sq_mi,MedPrice,Crime Rate,Cluster Labels
12,0,Milpitas,13.59,4914.64312,1094092.0,1.59,1
22,0,Union City,19.47,3570.416025,903742.0,2.83,1
4,0,South San Francisco,9.14,6961.925602,1014283.0,2.34,1
5,0,Alameda,10.61,6956.833176,977150.0,1.88,2
10,0,San Jose,176.53,5358.53396,1067425.0,3.21,1
17,0,San Ramon,18.06,3994.905869,1040308.0,0.31,3
23,0,San Rafael,16.47,3504.128719,1008283.0,3.26,1
6,1,Sunnyvale,21.99,6370.213734,1857625.0,1.12,2
18,1,Redwood City,19.42,3955.45829,1636433.0,2.37,2
9,1,Mountain View,12.0,6172.166667,1875867.0,1.98,2
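
The second k-means run is fitted on raw columns whose scales differ by several orders of magnitude (`MedPrice` is in the millions, `Crime Rate` is mostly in single digits), so Euclidean distances are likely dominated by price. `Cluster Labels` is also a category code rather than a quantity. A possible preprocessing step, sketched here on the five rows printed earlier, is to standardize the numeric columns and one-hot encode the venue cluster. The variable names below are assumptions, and this is not what the notebook actually ran.

```python
import numpy as np
import pandas as pd

# the five rows shown in the target_with_neighborhood.head() output
sample = pd.DataFrame({
    "Name": ["San Francisco", "Berkeley", "San Mateo", "Oakland", "South San Francisco"],
    "sq mi": [46.87, 10.47, 12.13, 55.79, 9.14],
    "population_per_sq_mi": [17180.179219, 10752.626552, 8013.767519, 7003.477326, 6961.925602],
    "MedPrice": [1337250.0, 1258050.0, 1437700.0, 739217.0, 1014283.0],
    "Crime Rate": [7.95, 3.66, 2.25, 16.85, 2.34],
    "Cluster Labels": [2, 2, 2, 0, 1],
})

numeric_cols = ["sq mi", "population_per_sq_mi", "MedPrice", "Crime Rate"]
numeric = sample[numeric_cols]

# z-score each numeric column so that no single unit dominates the distance
scaled = (numeric - numeric.mean()) / numeric.std(ddof=0)

# treat the venue cluster as a category: one indicator column per label
venue_dummies = pd.get_dummies(sample["Cluster Labels"], prefix="venue_cluster").astype(float)

features = pd.concat([scaled, venue_dummies], axis=1)
features.index = sample["Name"]
print(features.round(2))
```

The resulting `features` frame could then be passed to `KMeans(...).fit(features)` in place of the unscaled frame. The cluster assignments would probably change, so the tables above would need to be regenerated.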


#Finally, display the final results on a map

In [None]:
display_data = pd.merge(bayarea_data, target_with_neighborhood, on="Name") 
display_data.head()

Unnamed: 0,Name,sq mi_x,population_per_sq_mi_x,Latitude,Longitude,Labels,sq mi_y,population_per_sq_mi_y,MedPrice,Crime Rate,Cluster Labels
0,San Francisco,46.87,17180.179219,37.779026,-122.419906,3,46.87,17180.179219,1337250.0,7.95,2
1,Berkeley,10.47,10752.626552,37.870839,-122.272864,3,10.47,10752.626552,1258050.0,3.66,2
2,San Mateo,12.13,8013.767519,37.496904,-122.333057,3,12.13,8013.767519,1437700.0,2.25,2
3,Oakland,55.79,7003.477326,37.804456,-122.271356,2,55.79,7003.477326,739217.0,16.85,0
4,South San Francisco,9.14,6961.925602,37.654949,-122.408125,0,9.14,6961.925602,1014283.0,2.34,1


In [None]:
neighborhoods = display_data
# create map of SF bay area, centered on the latitude and longitude of row 2 (San Mateo)
latitude = neighborhoods["Latitude"][2]
longitude = neighborhoods["Longitude"][2]
map_bayarea = folium.Map(location=[latitude, longitude], zoom_start=10)

# one marker color per k-means label (kclusters = 5)
colors = ['purple', 'green', 'red', 'orange', 'blue']

# add markers to map
for lat, lng, city, label_code in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Name'], neighborhoods['Labels']):
    label = '{}, Label: {}'.format(city, label_code)
    popup = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=popup,
        color=colors[label_code],
        fill=True,
        fill_color=colors[label_code],
        fill_opacity=0.7).add_to(map_bayarea)

map_bayarea
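
Two details of the map cell are hard-coded: the map center (row 2) and a color list with exactly five entries, which would raise an `IndexError` if `kclusters` were increased. The sketch below shows one way to derive both from the data, using the coordinates and labels printed in the `display_data.head()` output. The names `COLORS`, `color_for` and `map_center` are hypothetical.

```python
import pandas as pd

COLORS = ['purple', 'green', 'red', 'orange', 'blue']

def color_for(label) -> str:
    """Pick a marker color for a cluster label, wrapping around if labels exceed the palette."""
    return COLORS[int(label) % len(COLORS)]

def map_center(df: pd.DataFrame) -> list:
    """Use the mean latitude/longitude of all cities as the map center."""
    return [df["Latitude"].mean(), df["Longitude"].mean()]

# the five rows shown in the display_data.head() output
cities = pd.DataFrame({
    "Name": ["San Francisco", "Berkeley", "San Mateo", "Oakland", "South San Francisco"],
    "Latitude": [37.779026, 37.870839, 37.496904, 37.804456, 37.654949],
    "Longitude": [-122.419906, -122.272864, -122.333057, -122.271356, -122.408125],
    "Labels": [3, 3, 3, 2, 0],
})

center = map_center(cities)
marker_colors = [color_for(label) for label in cities["Labels"]]
print(center, marker_colors)
```

`center` could be passed as `location=` to `folium.Map`, and `color_for(label_code)` could replace `colors[label_code]` inside the marker loop. Wrapping around reuses colors, so a longer palette would still be needed to tell more than five clusters apart.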