### Description of the problem and background
I have received a job offer in San Francisco, CA. The city is a central hub for start-up companies, and a few of my friends from college already live there. They have assured me that the city's public transportation is good (Muni and BART trains can be used to commute anywhere in the city).

Personally, though, I like to stay away from crowded city life and would rather live in a neighborhood that is peaceful and quiet after work hours. I am also a photographer and like to go on long hikes during the weekends.

For this exercise, I will explore the boroughs and neighborhoods of San Francisco and pick one neighborhood that is close to a beach, has hiking trails or offers photogenic scenery, and also has a bus or train stop for commuting to work.

## Description of Data – How it will be used to solve problems

### I am using two sets of information here:
1. The first data set was downloaded in csv format from the San Francisco, CA city government website. It lists all the boroughs (as Plan Districts) and neighborhoods of the city, along with other information. After downloading it, I imported it with pandas into a data frame and kept only the columns I need to see the names of the boroughs and their associated neighborhoods in and around San Francisco, CA.

Here’s a link to the source of the data set: https://catalog.data.gov/dataset?organization=city-of-san-francisco&tags=housing


2. I use the names of the neighborhoods/boroughs from this data set to call the Foursquare API and get location info and associated venues around these neighborhoods. My goal is to find a neighborhood that has public transportation (bus/Muni train) and hiking trails, is close to the beach and has plenty of photogenic views.

(I am only submitting the code that matches my criteria for choosing a location to live. Anyone can use this code, change the names of the boroughs/neighborhoods and find a location that interests them. For example, your priorities could be coffee shops and lots of restaurants.)


### Importing a csv file, downloaded from San Francisco housing department

In [327]:
#defining the path of csv file
path = 'C:\\Users\\Rakib\\Desktop\\Week 4\\IBM\\SF_Development_Pipeline_2017_Q3.csv'

#importing necessary package
import pandas as pd

#reading the data
neighborhoods = pd.read_csv(path)

#printing first few rows
neighborhoods.head(5)

Unnamed: 0,NEIGHBORHOOD,PLAN_DISTRICT,Latitude,Longitude
0,Bayview,South Bayshore,37.719383,-122.384934
1,Bernal Heights,Bernal Heights,37.737316,-122.409401
2,Bernal Heights,Central,37.743984,-122.424019
3,Bernal Heights,South Central,37.7337,-122.428124
4,Castro/Upper Market,Buena Vista,37.767509,-122.429436


In [328]:
#renaming columns
neighborhoods.columns = ['Neighborhood', 'Borough', 'Latitude', 'Longitude']

In [329]:
#rearranging column orders
neighborhoods = neighborhoods[['Borough', 'Neighborhood', 'Latitude', 'Longitude']]

#printing first 10 rows
neighborhoods.head(10)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,South Bayshore,Bayview,37.719383,-122.384934
1,Bernal Heights,Bernal Heights,37.737316,-122.409401
2,Central,Bernal Heights,37.743984,-122.424019
3,South Central,Bernal Heights,37.7337,-122.428124
4,Buena Vista,Castro/Upper Market,37.767509,-122.429436
5,Central,Castro/Upper Market,37.764538,-122.431213
6,Northeast,Chinatown,37.791782,-122.408913
7,South Central,Crocker Amazon,37.71566,-122.440109
8,Downtown,Downtown/Civic Center,37.777302,-122.418297
9,Buena Vista,Downtown/Civic Center,37.775215,-122.419609


In [333]:
#exploring the data
neighborhoods.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56 entries, 0 to 55
Data columns (total 4 columns):
Borough         56 non-null object
Neighborhood    56 non-null object
Latitude        56 non-null float64
Longitude       56 non-null float64
dtypes: float64(2), object(2)
memory usage: 1.8+ KB


In [334]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 15 boroughs and 56 neighborhoods.


In [335]:
#exploring count of values
neighborhoods['Borough'].value_counts()

 Central             6
 South Central       6
 Richmond            6
 Northeast           5
 Inner Sunset        5
 Ingleside           4
 Buena Vista         4
 Western Addition    4
 South of Market     3
 Downtown            3
 South Bayshore      3
 Mission             2
 Outer Sunset        2
 Marina              2
 Bernal Heights      1
Name: Borough, dtype: int64

## This dataframe has 15 boroughs and 56 neighborhoods

### Three boroughs, 'Central', 'South Central' and 'Richmond', have 6 neighborhoods each

#### For this exercise, I went through each of those boroughs; however, I am submitting the code for Richmond only. If you have different priorities for finding a location to live, you can use the same code by changing the name of the borough/neighborhood.

In [336]:
#subsetting Richmond borough data for further exploration
#(borough names in this file carry a leading space, hence " Richmond")
df_RM = neighborhoods.loc[neighborhoods["Borough"] == " Richmond"].reset_index(drop=True)
df_RM.head(10)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Richmond,Inner Richmond,37.781105,-122.477593
1,Richmond,Outer Richmond,37.772121,-122.497284
2,Richmond,Pacific Heights,37.790627,-122.447212
3,Richmond,Presidio Heights,37.786533,-122.457779
4,Richmond,Seacliff,37.787296,-122.492645
5,Richmond,Western Addition,37.78508,-122.445526
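
The `value_counts()` output earlier prints each borough name with a leading space, which is why the filter above has to match `" Richmond"` exactly. Below is a minimal sketch of one way to avoid that, assuming stray whitespace is the only problem. The small `toy` frame is made up for illustration and stands in for `neighborhoods`.

```python
import pandas as pd

# Toy stand-in for the neighborhoods frame (illustrative values only)
toy = pd.DataFrame({
    "Borough": [" Richmond", " Richmond", " Marina"],
    "Neighborhood": ["Seacliff", "Inner Richmond", "Marina"],
})

# Strip leading/trailing whitespace once, so later filters can use clean names
toy["Borough"] = toy["Borough"].str.strip()

richmond = toy.loc[toy["Borough"] == "Richmond"].reset_index(drop=True)
print(richmond)
```

With the names cleaned up front, the same filter works for any borough without having to remember the extra space.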


### The Richmond borough has 6 neighborhoods, with their latitude and longitude from the housing data set

### However, I will also use the geopy geocoder (Nominatim) to get a latitude and longitude later in this exercise.

### Import the necessary packages to create a map

In [337]:
from geopy.geocoders import Nominatim 
import folium 
import requests 
import random 

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# transforming JSON into a pandas dataframe
# (pandas >= 1.0 exposes json_normalize at the top level)
from pandas import json_normalize

print('Libraries imported.')

Libraries imported.


In [339]:
# creating a map of Richmond, centered on the mean latitude and longitude from the Housing data
map_RM = folium.Map(location=[df_RM['Latitude'].mean(), df_RM['Longitude'].mean()], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_RM['Latitude'], df_RM['Longitude'], df_RM['Neighborhood']):
    label = folium.Popup(label)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_RM)  
    
map_RM

## Let's define my Foursquare credentials

### Once the exercise is done, I will DELETE these credentials before final submission for privacy purposes.

In [380]:
CLIENT_ID = '(HIDDEN)' # my Foursquare ID -- hidden for privacy
CLIENT_SECRET = '(HIDDEN)' # my Foursquare Secret -- hidden for privacy
VERSION = 'HIDDEN'
LIMIT = 100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: (HIDDEN)
CLIENT_SECRET:(HIDDEN)


In [341]:
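# Getting latitude and longitude from the Nominatim geocoder (geopy), not Foursquare
address = 'Richmond, CA'

# recent geopy versions require an explicit user_agent; this name is an arbitrary placeholder
geolocator = Nominatim(user_agent='sf_neighborhood_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
# NOTE: 'Richmond, CA' is ambiguous; the coordinates printed below lie in Richmond,
# British Columbia, Canada, not in San Francisco's Richmond District
print('The geographical coordinates of Richmond are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Richmond are 49.1632727, -123.1376687.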
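
The coordinates printed above are far from the neighborhood coordinates in the housing data (latitude around 37.7 to 37.8, longitude around -122.4 to -122.5), so the geocoder resolved 'Richmond, CA' to a different Richmond. A small sanity check can catch this before the coordinates are used as a map center. This is only a sketch: `in_bounding_box` and the `SF_BOUNDS` values are my own rough assumptions, not part of the data set.

```python
# Rough bounding box around San Francisco (assumed values for illustration)
SF_BOUNDS = {"lat_min": 37.70, "lat_max": 37.84, "lon_min": -122.52, "lon_max": -122.35}

def in_bounding_box(lat, lon, bounds=SF_BOUNDS):
    """Return True if the point lies inside the rectangular bounds."""
    return (bounds["lat_min"] <= lat <= bounds["lat_max"]
            and bounds["lon_min"] <= lon <= bounds["lon_max"])

# Coordinates returned by the geocoder for 'Richmond, CA'
print(in_bounding_box(49.1632727, -123.1376687))
# Coordinates of Inner Richmond from the housing data
print(in_bounding_box(37.781105, -122.477593))
```

If the check fails, a more specific query or the mean of the coordinates already in `df_RM` is a safer choice for centering the map.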


### As we know, there are 6 neighborhoods in the Richmond borough, so I am going to explore each one of them

In [342]:
# Let's locate the first neighborhood in Richmond Borough. 
df_RM.loc[0, 'Neighborhood']

'Inner Richmond'

In [343]:
# Get Latitude and Longitude of this Neighborhood from housing data

neighborhood_latitude = df_RM.loc[0, 'Latitude'] 
neighborhood_longitude = df_RM.loc[0, 'Longitude']

neighborhood_name = df_RM.loc[0, 'Neighborhood']

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Inner Richmond are 37.78110504, -122.4775925.


## Let's create a Foursquare API call to explore venues in this neighborhood from Richmond Borough

In [344]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL 

'https://api.foursquare.com/v2/venues/explore?&client_id=(HIDDEN)&client_secret=(HIDDEN)&v=20180604&ll=37.78110504,-122.4775925&radius=500&limit=100'
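
Displaying the full request URL also displays the client ID and secret. Below is a minimal sketch of building the same query string with the standard library and masking the credentials before showing it. `build_explore_url` and `redact` are hypothetical helper names, the credential strings are placeholders, and the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl, urlunsplit

BASE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL; urlencode escapes every value safely."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return BASE_URL + "?" + urlencode(params)

def redact(url, secret_keys=("client_id", "client_secret")):
    """Return a copy of the URL with credential values masked for display."""
    parts = urlsplit(url)
    query = [(k, "HIDDEN" if k in secret_keys else v) for k, v in parse_qsl(parts.query)]
    return urlunsplit(parts._replace(query=urlencode(query)))

# Placeholder credentials, for illustration only
url = build_explore_url("my-id", "my-secret", "20180604", 37.78110504, -122.4775925, 500, 100)
print(redact(url))
```

Note that `urlencode` percent-encodes the comma in the `ll` value; it decodes back to the same `lat,lng` pair.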

In [345]:
# Get the results in json format
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c1e9bac4434b91ea4655b08'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-5a5adf30c47cf954ca358fd4-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/mediterranean_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d1c0941735',
         'name': 'Mediterranean Restaurant',
         'pluralName': 'Mediterranean Restaurants',
         'primary': True,
         'shortName': 'Mediterranean'}],
       'id': '5a5adf30c47cf954ca358fd4',
       'location': {'address': '1801 Clement St',
        'cc': 'US',
        'city': 'San Francisco',
        'country': 'United States',
        'crossStreet': '19th Avenue',
        'distance': 154,
        'formattedAddress': ['1801 Clement St (19th Avenue)',
         'San Francisco, CA 9412

In [346]:
# Create a function that extracts the category of each venue from the JSON response
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [347]:
#create a data frame that puts all the venues with latitude and longitude imported from Foursquare
# API call

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Lokma,Mediterranean Restaurant,37.782246,-122.4786
1,Moscow & Tbilisi Russian Bakery,Bakery,37.780432,-122.479049
2,Joe's Ice Cream,Ice Cream Shop,37.780492,-122.477739
3,Hong Kong Lounge 穗香酒家,Dim Sum Restaurant,37.780558,-122.476644
4,Argonne Playground,Playground,37.779445,-122.477779


In [348]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

76 venues were returned by Foursquare.
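
The `radius=500` parameter is a distance in meters around the neighborhood coordinates. As a quick plausibility check, the great-circle distance between the neighborhood center and a returned venue can be computed with the haversine formula. This is a sketch with a hypothetical helper `haversine_m`, assuming a mean Earth radius of 6,371 km; the coordinates are those of Inner Richmond and of Lokma from the table above.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, earth_radius_m=6371000.0):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

center = (37.78110504, -122.4775925)   # Inner Richmond, from the housing data
lokma = (37.782246, -122.4786)         # first venue in the table above

distance = haversine_m(*center, *lokma)
print(round(distance), distance <= 500)
```

The result should land close to the `'distance': 154` value that Foursquare reports for the first venue in the JSON response, well inside the 500 m radius.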


# Summary So Far

## I have used the Foursquare API to find venues around one selected neighborhood and placed them into a pandas data frame

### The next step is to import ALL neighborhoods in the Richmond borough and the venues around them.

### Let's explore the neighborhoods in Richmond using Foursquare API calls

In [353]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
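
Inside `getNearbyVenues`, the expression `v['venue']['categories'][0]['name']` raises an `IndexError` for any venue that comes back with an empty category list, a case the earlier `get_category_type` function guards against. Below is a sketch of the same extraction with that guard, run on a hand-made list that mimics the shape of the response shown earlier. `flatten_items` and the second sample item are hypothetical.

```python
def flatten_items(name, lat, lng, items):
    """Turn explore 'items' into flat tuples, tolerating venues without categories."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hand-made sample in the shape of response['groups'][0]['items']
sample_items = [
    {"venue": {"name": "Lokma",
               "location": {"lat": 37.782246, "lng": -122.4786},
               "categories": [{"name": "Mediterranean Restaurant"}]}},
    {"venue": {"name": "Uncategorized Place",   # made-up venue with no category
               "location": {"lat": 37.78, "lng": -122.47},
               "categories": []}},
]

rows = flatten_items("Inner Richmond", 37.781105, -122.477593, sample_items)
for row in rows:
    print(row)
```

Rows with a `None` category can then be dropped or labeled before the one-hot encoding step, depending on how they should count.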

In [354]:
# collect the nearby venues for every neighborhood in the Richmond borough
RM_venues = getNearbyVenues(names=df_RM['Neighborhood'],
                                   latitudes=df_RM['Latitude'],
                                   longitudes=df_RM['Longitude']
                                  )

Inner Richmond
Outer Richmond
Pacific Heights
Presidio Heights
Seacliff
Western Addition


In [355]:
print(RM_venues.shape)
RM_venues.head()

(249, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Inner Richmond,37.781105,-122.477593,Lokma,37.782246,-122.4786,Mediterranean Restaurant
1,Inner Richmond,37.781105,-122.477593,Moscow & Tbilisi Russian Bakery,37.780432,-122.479049,Bakery
2,Inner Richmond,37.781105,-122.477593,Joe's Ice Cream,37.780492,-122.477739,Ice Cream Shop
3,Inner Richmond,37.781105,-122.477593,Hong Kong Lounge 穗香酒家,37.780558,-122.476644,Dim Sum Restaurant
4,Inner Richmond,37.781105,-122.477593,Argonne Playground,37.779445,-122.477779,Playground


In [356]:
RM_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Inner Richmond,76,76,76,76,76,76
Outer Richmond,29,29,29,29,29,29
Pacific Heights,37,37,37,37,37,37
Presidio Heights,28,28,28,28,28,28
Seacliff,6,6,6,6,6,6
Western Addition,73,73,73,73,73,73


## I have imported all the venues in the Richmond borough, arranged them by neighborhood and put them into a pandas data frame.

In [357]:
print('There are {} unique categories.'.format(len(RM_venues['Venue Category'].unique())))

There are 101 unique categories.


In [358]:
# one hot encoding
RM_onehot = pd.get_dummies(RM_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
RM_onehot['Neighborhood'] = RM_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [RM_onehot.columns[-1]] + list(RM_onehot.columns[:-1])
RM_onehot = RM_onehot[fixed_columns]

RM_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bagel Shop,Bakery,Beach,...,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Trail,Tunnel,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio,Zoo
0,Inner Richmond,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Inner Richmond,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
2,Inner Richmond,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Inner Richmond,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Inner Richmond,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [308]:
RM_onehot.shape

(249, 102)

## In the data frame above we can see all the different kinds of venues available in and around the 6 neighborhoods in the Richmond borough.

### The goal here is to find a neighborhood that meets my criteria: a bus stop/Muni/public transportation, a hiking trail, and either a nearby beach or some photography opportunities.

In [359]:
RM_grouped = RM_onehot.groupby('Neighborhood').mean().reset_index()
RM_grouped

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bagel Shop,Bakery,Beach,...,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Trail,Tunnel,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio,Zoo
0,Inner Richmond,0.0,0.013158,0.013158,0.013158,0.0,0.0,0.013158,0.026316,0.0,...,0.052632,0.013158,0.026316,0.0,0.0,0.052632,0.0,0.013158,0.0,0.0
1,Outer Richmond,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,...,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.034483,0.034483
2,Pacific Heights,0.0,0.027027,0.0,0.0,0.027027,0.027027,0.0,0.0,0.0,...,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0
3,Presidio Heights,0.035714,0.107143,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,...,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.035714,0.0,0.0
4,Seacliff,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,...,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0
5,Western Addition,0.0,0.041096,0.0,0.0,0.013699,0.0,0.0,0.013699,0.0,...,0.013699,0.0,0.0,0.0,0.013699,0.0,0.013699,0.0,0.027397,0.0


In [360]:
RM_grouped.shape

(6, 102)
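
Taking the mean of the one-hot columns per neighborhood gives, for each category, the share of that neighborhood's venues that fall into the category. Below is a dependency-free sketch of the same calculation. The six Seacliff categories are reconstructed from the frequencies shown for Seacliff in this notebook (one category appearing twice, four appearing once), so treat the list as illustrative.

```python
from collections import Counter

def category_shares(categories):
    """Share of venues per category, equivalent to the mean of one-hot columns."""
    counts = Counter(categories)
    total = len(categories)
    return {category: count / total for category, count in counts.items()}

# Illustrative reconstruction of the 6 venues returned for Seacliff
seacliff = ["Scenic Lookout", "Scenic Lookout", "Trail", "Beach", "Bus Stop", "Golf Course"]

shares = category_shares(seacliff)
for category, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print("{:<15} {:.2f}".format(category, share))
```

Because the shares are relative to each neighborhood's own venue count, a neighborhood with only a handful of venues can show large frequencies for a single venue.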

In [311]:
num_top_venues = 5

for hood in RM_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = RM_grouped[RM_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Inner Richmond----
                venue  freq
0       Grocery Store  0.08
1                Café  0.07
2  Chinese Restaurant  0.07
3    Sushi Restaurant  0.05
4   Korean Restaurant  0.05


----Outer Richmond----
                 venue  freq
0                 Café  0.14
1   Chinese Restaurant  0.10
2  Sporting Goods Shop  0.07
3               Bakery  0.07
4  Japanese Restaurant  0.07


----Pacific Heights----
                    venue  freq
0  Furniture / Home Store  0.08
1                    Park  0.08
2           Deli / Bodega  0.05
3                   Trail  0.05
4                     Spa  0.05


----Presidio Heights----
                 venue  freq
0       Cosmetics Shop  0.14
1  American Restaurant  0.11
2          Golf Course  0.07
3               Bakery  0.07
4          Coffee Shop  0.07


----Seacliff----
            venue  freq
0  Scenic Lookout  0.33
1           Trail  0.17
2           Beach  0.17
3        Bus Stop  0.17
4     Golf Course  0.17


----Western Addition----
 

# Bingo! I am interested in the Seacliff neighborhood


### Please note that I reached this conclusion after exploring multiple boroughs. For capstone submission purposes I am only submitting the code that helped me reach my goal.

In [364]:
# Let's sort the data in the order of available venues

# import necessary package
import numpy as np

#define a function
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#list top 10 venues
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
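
The `try`/`except` above works for the first ten columns because only 1, 2 and 3 need a special suffix. If `num_top_venues` were raised above 20, names such as '21th' would appear. Below is a sketch of a general ordinal helper; `ordinal` is a hypothetical name and not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th, 21st."""
    if 10 <= n % 100 <= 20:
        suffix = "th"            # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

num_top_venues = 10
columns = ["Neighborhood"] + ["{} Most Common Venue".format(ordinal(i))
                              for i in range(1, num_top_venues + 1)]
print(columns[1:4], columns[-1])
```

For `num_top_venues = 10` this produces the same column names as the loop above.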

In [367]:
# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = RM_grouped['Neighborhood']

# iterate over the neighborhoods and place their top venues in the data frame
for ind in np.arange(RM_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(RM_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Inner Richmond,Grocery Store,Café,Chinese Restaurant,Korean Restaurant,Sushi Restaurant,Vietnamese Restaurant,Pizza Place,Thai Restaurant,Playground,Dim Sum Restaurant
1,Outer Richmond,Café,Chinese Restaurant,Bakery,Japanese Restaurant,Sporting Goods Shop,Record Shop,Ramen Restaurant,Playground,Park,New American Restaurant
2,Pacific Heights,Furniture / Home Store,Park,Gym,Scenic Lookout,Trail,Deli / Bodega,Spa,Clothing Store,Paper / Office Supplies Store,Outdoor Sculpture
3,Presidio Heights,Cosmetics Shop,American Restaurant,Golf Course,Coffee Shop,Bakery,Accessories Store,Breakfast Spot,French Restaurant,Italian Restaurant,Miscellaneous Shop
4,Seacliff,Scenic Lookout,Bus Stop,Trail,Golf Course,Beach,Zoo,Event Space,Cosmetics Shop,Cultural Center,Deli / Bodega
5,Western Addition,Café,Cosmetics Shop,American Restaurant,Coffee Shop,Gym,Spa,Furniture / Home Store,Deli / Bodega,Liquor Store,Sandwich Place


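## Please note above that Seacliff has a Scenic Lookout, Bus Stop, Trail, Golf Course and Beach. It sounds like a dream place for me to live. (Only 6 venues in 5 categories were returned for Seacliff, so the entries from the 6th column onward, such as Zoo, have a frequency of zero there and are only padding.)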
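
Because the ranking sorts every category column, a neighborhood with few venues gets its top-10 list padded with categories whose frequency is zero. Below is a sketch of ranking only the categories that actually occur and then checking them against a wish list. `top_categories`, `matches_wishlist` and the `WISHLIST` set are hypothetical, and the frequencies are the rounded Seacliff values printed earlier plus one zero-valued entry.

```python
def top_categories(frequencies, n=10):
    """Rank categories by frequency, dropping those that never occur."""
    present = {cat: freq for cat, freq in frequencies.items() if freq > 0}
    return sorted(present, key=present.get, reverse=True)[:n]

def matches_wishlist(frequencies, wishlist):
    """Return the wish-list categories that occur at least once."""
    return sorted(cat for cat in wishlist if frequencies.get(cat, 0) > 0)

# Rounded Seacliff frequencies from the output above; Zoo is 0.0 for Seacliff
seacliff = {"Scenic Lookout": 0.33, "Trail": 0.17, "Beach": 0.17,
            "Bus Stop": 0.17, "Golf Course": 0.17, "Zoo": 0.0}

# My priorities; swap in e.g. {"Coffee Shop", "Café"} for different ones
WISHLIST = {"Bus Stop", "Trail", "Beach", "Scenic Lookout"}

print(top_categories(seacliff))
print(matches_wishlist(seacliff, WISHLIST))
```

Running the same check for each neighborhood would turn the visual comparison of the table into an explicit filter.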

### Let's cluster these 6 neighborhoods (even though I think I have found the neighborhood I am most interested in, I am continuing the work for capstone purposes)

In [375]:
#importing necessary packages
from sklearn.cluster import KMeans 
from sklearn.datasets import make_blobs  # samples_generator was removed in newer scikit-learn; not used below

import matplotlib.pyplot as plt
%matplotlib inline

import matplotlib.cm as cm

In [376]:
# set number of clusters
kclusters = 3

RM_grouped_clustering = RM_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(RM_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 2, 2, 1, 2])

In [377]:
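# work on a copy so the original df_RM is left unchanged
RM_merged = df_RM.copy()

# add clustering labels (df_RM and RM_grouped list the neighborhoods in the same order)
RM_merged['Cluster Labels'] = kmeans.labels_

# merge RM_merged with neighborhoods_venues_sorted to add the most common venues for each neighborhood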
RM_merged = RM_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

RM_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Richmond,Inner Richmond,37.781105,-122.477593,0,Grocery Store,Café,Chinese Restaurant,Korean Restaurant,Sushi Restaurant,Vietnamese Restaurant,Pizza Place,Thai Restaurant,Playground,Dim Sum Restaurant
1,Richmond,Outer Richmond,37.772121,-122.497284,0,Café,Chinese Restaurant,Bakery,Japanese Restaurant,Sporting Goods Shop,Record Shop,Ramen Restaurant,Playground,Park,New American Restaurant
2,Richmond,Pacific Heights,37.790627,-122.447212,2,Furniture / Home Store,Park,Gym,Scenic Lookout,Trail,Deli / Bodega,Spa,Clothing Store,Paper / Office Supplies Store,Outdoor Sculpture
3,Richmond,Presidio Heights,37.786533,-122.457779,2,Cosmetics Shop,American Restaurant,Golf Course,Coffee Shop,Bakery,Accessories Store,Breakfast Spot,French Restaurant,Italian Restaurant,Miscellaneous Shop
4,Richmond,Seacliff,37.787296,-122.492645,1,Scenic Lookout,Bus Stop,Trail,Golf Course,Beach,Zoo,Event Space,Cosmetics Shop,Cultural Center,Deli / Bodega
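
Assigning `kmeans.labels_` by position relies on `df_RM` and `RM_grouped` listing the neighborhoods in the same order, which holds here because both are alphabetical. Below is a sketch of attaching the labels by neighborhood name instead, so that a reordered frame cannot end up mislabeled. The labels are copied from the array printed earlier, and the reversed frame is a made-up stand-in for `df_RM`.

```python
import pandas as pd

# Neighborhood order of RM_grouped and the cluster labels printed earlier
grouped_names = ["Inner Richmond", "Outer Richmond", "Pacific Heights",
                 "Presidio Heights", "Seacliff", "Western Addition"]
labels = [0, 0, 2, 2, 1, 2]

# Look-up table keyed by neighborhood name
label_by_name = pd.Series(labels, index=grouped_names, name="Cluster Labels")

# Stand-in for df_RM, deliberately in a different (reversed) order
df = pd.DataFrame({"Neighborhood": grouped_names[::-1]})
df["Cluster Labels"] = df["Neighborhood"].map(label_by_name)
print(df)
```

Seacliff keeps its own cluster label regardless of the row order, which is the property the positional assignment cannot guarantee.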


In [378]:
k_means_labels = kmeans.labels_
k_means_labels

array([0, 0, 2, 2, 1, 2])

In [None]:
# create map, centered on the mean latitude and longitude of the Richmond neighborhoods
import matplotlib.colors as mcolors

map_clusters = folium.Map(location=[RM_merged['Latitude'].mean(), RM_merged['Longitude'].mean()], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [mcolors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(RM_merged['Latitude'], RM_merged['Longitude'], RM_merged['Neighborhood'], RM_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Thank you for reviewing my assignment. 

In short, my goal was to find a neighborhood in San Francisco that has a bus stop or other public transportation, an ocean view, hiking trails and plenty of photography opportunities, away from the city crowds.

I started with a data set from the SF housing government website to identify the names of the boroughs and neighborhoods in the city.

I then used the Foursquare API to find the venues around the different neighborhoods.

Finally, I picked one neighborhood for my future home.

# Thank you