### Problem Statement:
When a young family moves to a new area, a number of factors are looked into to choose the right neighborhood/town for the family to move to.
This project assesses the areas in San Francisco city where the family is moving to and tries to identify the neighborhood that meets the family priorities, which are-
a. Housing cost and availability
b.Crime rate
c.Schools in the vicinity
d. Venues such as parks, restaurants

## Data Sources:
Crime data – The crime data is taken from police incidents reports from 2018 and 2019.https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-2018-to-Present/wg3w-h783 
Crimes grouped by neighborhood are charted and the neighborhoods with lowest crime rates are chosen.
Housing data – The current houses for sale are obtained from http://www.redfin.com. The data included is limited to 2 bedroom houses.
Public school listing – The list of public schools with their locations are available at https://data.sfgov.org/Economy-and-Community/Schools/tpp3-epx2 . Only elementary schools are included in the analysis. 
Low crime neighborhoods are assessed further for available houses and elementary schools in the vicinity of the housing.
Venue details and distribution across neighborhoods is sourced from Foursquare API.
A combination of all the above data is used to narrow and select neighborhood that meets the family’s priorities.

#import all libraries required for the analysis

In [47]:
import pandas as pd
import numpy as np
import requests

import json 
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

import matplotlib.pyplot as plt # for plotting 
import matplotlib.cm as cm
import matplotlib.colors as colors
import seaborn as sns

from sklearn.cluster import KMeans #for clustering

In [48]:
!conda install -c conda-forge folium=0.5.0 --yes #for map visuals
import folium #

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



#get San Francisco (SF) coordinates

In [49]:
address2 = 'San Francisco, USA'

geolocator = Nominatim(user_agent="sf")
location2 = geolocator.geocode(address2)

latitude2 = location2.latitude
longitude2 = location2.longitude

print('The geograpical coordinate of San Francisco, CA are {}, {}.'.format(latitude2, longitude2))

The geograpical coordinate of San Francisco, CA are 37.7790262, -122.4199061.


#gather housing, neighborhood names and schools data and transform to dataframes

In [50]:

#get SFO house data
sfo_homes = pd.read_csv('/Users/Bhaven/Documents/Personal/Shylaja/Coursera/Capstone-Project/sf_homes_for_sale.csv')

#elementary schools data
sfo_schools = pd.read_csv('/Users/Bhaven/Documents/Personal/Shylaja/Coursera/Capstone-Project/sfo_schools.csv')

#crime data
sfo_crime_2018 = pd.read_csv('/Users/Bhaven/Documents/Personal/Shylaja/Coursera/Capstone-Project/sfo_crimes_2018.csv', encoding = 'ISO-8859-1')

#analyze crime data

In [51]:
#keep only category and location data
sfo_crime_2018 = sfo_crime_2018[['Incident Category', 'Analysis Neighborhood', 'Latitude', 'Longitude']]

sfo_crime_2018

Unnamed: 0,Incident Category,Analysis Neighborhood,Latitude,Longitude
0,Missing Person,Lakeshore,37.726950,-122.476039
1,Stolen Property,Mission,37.752440,-122.415172
2,Non-Criminal,Financial District/South Beach,37.784560,-122.407337
3,Lost Property,,,
4,Miscellaneous Investigation,Pacific Heights,37.787112,-122.440250
...,...,...,...,...
313935,Suspicious Occ,South of Market,37.777494,-122.416292
313936,Lost Property,South of Market,37.780699,-122.403921
313937,Warrant,Mission,37.769199,-122.417783
313938,Larceny Theft,,,


#aggregate crime data by neighborhood

In [52]:
#group by neighborhood
sfo_crime_grouped = sfo_crime_2018.groupby('Analysis Neighborhood')\
       .agg({'Incident Category':'size', 'Latitude':'mean', 'Longitude' : 'mean'}) \
       .rename(columns={'Incident Category':'Count','Latitude':'Latitude', 'Longitude':'Longitude'}) \
       .reset_index()

sfo_crime_grouped.sort_values(by='Count', ascending = False)

Unnamed: 0,Analysis Neighborhood,Count,Latitude,Longitude
18,Mission,34363,37.761487,-122.416917
35,Tenderloin,30893,37.783283,-122.414454
5,Financial District/South Beach,28104,37.789309,-122.401061
33,South of Market,25972,37.778262,-122.40726
0,Bayview Hunters Point,17097,37.733019,-122.390965
40,Western Addition,9771,37.782292,-122.42845
22,North Beach,9675,37.805058,-122.411099
2,Castro/Upper Market,9015,37.763161,-122.432725
34,Sunset/Parkside,8680,37.749499,-122.490828
20,Nob Hill,8535,37.789999,-122.416184


#map crime, housing and schools by neighborhoods

In [53]:
#neighborhood map
sf_nb_geo = '/Users/Bhaven/Documents/Personal/Shylaja/Coursera/Capstone-Project/sf_planning_neighborhoods.geojson'

In [54]:
map_sfo_top = folium.Map(location=[latitude2, longitude2], zoom_start=12)

# crime heat map
map_sfo_top.choropleth(
        geo_data=sf_nb_geo,
        data=sfo_crime_grouped,
        columns=['Analysis Neighborhood','Count'],
        key_on='feature.properties.neighborho',
        fill_color='YlOrRd',
        fill_opacity='0.7',
        line_opacity='0.7')

map_sfo_top

#Conclusion - east, south east of the city has high crime and no affordable housing within budget.

In [55]:
#choose the top 20 neighborhoods with least crime
sfo_crime_top20 = sfo_crime_grouped.sort_values(by='Count', ascending = True).head(20)

sfo_crime_top20 

Unnamed: 0,Analysis Neighborhood,Count,Latitude,Longitude
17,McLaren Park,290,37.719865,-122.414827
32,Seacliff,330,37.786285,-122.484768
14,Lincoln Park,370,37.781395,-122.49841
29,Presidio,730,37.802747,-122.456258
36,Treasure Island,956,37.824153,-122.372701
6,Glen Park,1496,37.738044,-122.433462
37,Twin Peaks,1511,37.751949,-122.446489
30,Presidio Heights,1714,37.785568,-122.450474
21,Noe Valley,2725,37.749092,-122.432114
23,Oceanview/Merced/Ingleside,3059,37.717666,-122.460268


#gather venue data from foursquare. we use the top15 neighborhoods identified so far based on crime and housing

#Foursquare credentials

In [56]:
CLIENT_ID = 'J41BIAY1KQ2HXLKN22DF3RK2AESXSW231JNNZWTQA5YHFZJK' # Foursquare ID
CLIENT_SECRET = 'ZFYKD3J1AS2GMB55VBIKCNEGIL20C1JQZCAQWDPE0UTDGYSI' # Foursquare Secret
VERSION = '20180605' # Foursquare API version

#get FourSquare venues data for the SF

In [57]:
LIMIT=100
radius=500
sfo = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude2, 
    longitude2, 
    radius, 
    LIMIT)

#get the results

In [58]:
sfo_results = requests.get(sfo).json()
sfo_results

{'meta': {'code': 200, 'requestId': '5e43254f9fcb92001bae04a9'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Civic Center',
  'headerFullLocation': 'Civic Center, San Francisco',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 134,
  'suggestedBounds': {'ne': {'lat': 37.7835262045, 'lng': -122.41422325588267},
   'sw': {'lat': 37.774526195499995, 'lng': -122.42558894411734}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4aa48566f964a520024720e3',
       'name': 'Louise M. Davies Symphony Hall',
       'location': {'address': '201 Van Ness Ave',
        'crossStreet': 'btwn Grove & Hayes St',
        'lat': 37.777976164

#function that extracts the category of the venue

In [59]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

#format SF venue results into dataframe

In [60]:
sfo_venues = sfo_results['response']['groups'][0]['items']
    
sfo_venues = json_normalize(sfo_venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
sfo_venues =sfo_venues.loc[:, filtered_columns]

# filter the category for each row
sfo_venues['venue.categories'] = sfo_venues.apply(get_category_type, axis=1)

# clean columns
sfo_venues.columns = [col.split(".")[-1] for col in sfo_venues.columns]

sfo_venues.shape

(100, 4)

#function to get venues from all neighborhoods

In [61]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
#        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#gather venues for the top15 neighborhoods

In [62]:
sfo_venues = getNearbyVenues(names=sfo_crime_top20['Analysis Neighborhood'],
                                   latitudes=sfo_crime_top20['Latitude'],
                                   longitudes=sfo_crime_top20['Longitude']
                                  )
sfo_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,McLaren Park,37.719865,-122.414827,John McLaren Park,37.718556,-122.417307,Park
1,McLaren Park,37.719865,-122.414827,John McLaren Dog Run,37.719981,-122.418961,Dog Run
2,McLaren Park,37.719865,-122.414827,Louis Sutter Playground,37.722388,-122.413928,Baseball Field
3,McLaren Park,37.719865,-122.414827,Philosopher's Way,37.718133,-122.412717,Trail
4,Seacliff,37.786285,-122.484768,Seacliff,37.788259,-122.486401,Neighborhood
...,...,...,...,...,...,...,...
709,Potrero Hill,37.758627,-122.395346,Caltrain #428,37.754811,-122.395215,Train
710,Potrero Hill,37.758627,-122.395346,Dogpatch Arts Plaza,37.761404,-122.391799,Park
711,Potrero Hill,37.758627,-122.395346,Kate's Closet,37.762590,-122.395763,Clothing Store
712,Potrero Hill,37.758627,-122.395346,Parklet,37.762718,-122.396034,Park


In [63]:
sfo_venues.groupby('Neighborhood').count()


Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Glen Park,25,25,25,25,25,25
Golden Gate Park,56,56,56,56,56,56
Inner Richmond,78,78,78,78,78,78
Inner Sunset,59,59,59,59,59,59
Japantown,100,100,100,100,100,100
Lakeshore,15,15,15,15,15,15
Lincoln Park,20,20,20,20,20,20
Lone Mountain/USF,26,26,26,26,26,26
McLaren Park,4,4,4,4,4,4
Noe Valley,80,80,80,80,80,80


In [64]:
# one hot encoding
sfo_onehot = pd.get_dummies(sfo_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
sfo_onehot['Neighborhood'] = sfo_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [sfo_onehot.columns[-1]] + list(sfo_onehot.columns[:-1])
sfo_onehot = sfo_onehot[fixed_columns]

sfo_onehot.head()

Unnamed: 0,Zoo,Accessories Store,Alternative Healer,American Restaurant,Antique Shop,Aquarium,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,...,Tunnel,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [65]:
sfo_grouped = sfo_onehot.groupby('Neighborhood').mean().reset_index()
sfo_grouped

Unnamed: 0,Neighborhood,Zoo,Accessories Store,Alternative Healer,American Restaurant,Antique Shop,Aquarium,Arcade,Art Gallery,Art Museum,...,Tunnel,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Glen Park,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Golden Gate Park,0.017857,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.017857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Inner Richmond,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.012821,0.0,...,0.0,0.012821,0.0,0.0,0.0,0.038462,0.012821,0.0,0.0,0.0
3,Inner Sunset,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,...,0.0,0.0,0.0,0.016949,0.0,0.033898,0.0,0.0,0.0,0.0
4,Japantown,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,...,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0
5,Lakeshore,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Lincoln Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,...,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0
7,Lone Mountain/USF,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.038462
8,McLaren Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Noe Valley,0.0,0.0,0.0,0.0125,0.0125,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.025,0.0125,0.0125


In [66]:
num_top_venues = 5

for hood in sfo_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = sfo_grouped[sfo_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Glen Park----
                venue  freq
0               Trail  0.08
1         Coffee Shop  0.08
2  Italian Restaurant  0.04
3         Cheese Shop  0.04
4    Sushi Restaurant  0.04


----Golden Gate Park----
            venue  freq
0          Garden  0.16
1  Science Museum  0.07
2         Exhibit  0.05
3       Gift Shop  0.05
4            Park  0.05


----Inner Richmond----
                 venue  freq
0   Chinese Restaurant  0.08
1  Japanese Restaurant  0.08
2    Korean Restaurant  0.05
3               Bakery  0.05
4     Sushi Restaurant  0.05


----Inner Sunset----
              venue  freq
0       Pizza Place  0.05
1            Bakery  0.05
2    Sandwich Place  0.05
3  Sushi Restaurant  0.05
4    Cosmetics Shop  0.03


----Japantown----
                 venue  freq
0            Gift Shop  0.06
1               Bakery  0.04
2             Tea Room  0.04
3  Japanese Restaurant  0.04
4        Shopping Mall  0.03


----Lakeshore----
                venue  freq
0                Café  

In [67]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [68]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = sfo_grouped['Neighborhood']

for ind in np.arange(sfo_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sfo_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Glen Park,Coffee Shop,Trail,Bakery,Café,French Restaurant,Bubble Tea Shop,Breakfast Spot,Park,Bookstore,Cheese Shop
1,Golden Gate Park,Garden,Science Museum,Exhibit,Park,Gift Shop,Food Truck,Lake,Fountain,Tea Room,Sculpture Garden
2,Inner Richmond,Japanese Restaurant,Chinese Restaurant,Bakery,Sushi Restaurant,Korean Restaurant,Thai Restaurant,Bar,Vietnamese Restaurant,BBQ Joint,Asian Restaurant
3,Inner Sunset,Pizza Place,Sandwich Place,Sushi Restaurant,Bakery,Vietnamese Restaurant,Gym,Chinese Restaurant,Thai Restaurant,Korean Restaurant,Mediterranean Restaurant
4,Japantown,Gift Shop,Bakery,Japanese Restaurant,Tea Room,Café,Grocery Store,Ice Cream Shop,Shopping Mall,Boutique,Ramen Restaurant
5,Lakeshore,Sandwich Place,Café,Snack Place,Coffee Shop,Performing Arts Venue,Fish Market,Tennis Court,Gym,Cocktail Bar,Mexican Restaurant
6,Lincoln Park,Outdoor Sculpture,Golf Course,Playground,Burmese Restaurant,Martial Arts Dojo,Sandwich Place,Café,Moroccan Restaurant,Sushi Restaurant,Bar
7,Lone Mountain/USF,Coffee Shop,Mexican Restaurant,Bank,Café,Salon / Barbershop,Supermarket,Car Wash,Gas Station,Pub,Pool Hall
8,McLaren Park,Dog Run,Park,Baseball Field,Trail,Yoga Studio,Field,French Restaurant,Fountain,Food Truck,Food & Drink Shop
9,Noe Valley,Coffee Shop,Bookstore,Bakery,Gift Shop,Gym,Italian Restaurant,Breakfast Spot,Pizza Place,Mexican Restaurant,Bagel Shop


In [69]:
# set number of clusters
kclusters = 3

sfo_grouped_clustering = sfo_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sfo_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 1, 1, 1, 1, 1, 1, 0, 1], dtype=int32)

#add clustering data and crime count along with top10 venue categories

In [70]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

sfo_merged = sfo_crime_top20

sfo_merged = sfo_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Analysis Neighborhood')
#drop null
sfo_merged.dropna(inplace=True)
sfo_merged['Cluster Labels'] = sfo_merged['Cluster Labels'].astype(int)
sfo_merged

Unnamed: 0,Analysis Neighborhood,Count,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,McLaren Park,290,37.719865,-122.414827,0,Dog Run,Park,Baseball Field,Trail,Yoga Studio,Field,French Restaurant,Fountain,Food Truck,Food & Drink Shop
32,Seacliff,330,37.786285,-122.484768,1,Grocery Store,Restaurant,Café,Chinese Restaurant,Coffee Shop,Pizza Place,Sushi Restaurant,Pet Store,Thai Restaurant,Burrito Place
14,Lincoln Park,370,37.781395,-122.49841,1,Outdoor Sculpture,Golf Course,Playground,Burmese Restaurant,Martial Arts Dojo,Sandwich Place,Café,Moroccan Restaurant,Sushi Restaurant,Bar
29,Presidio,730,37.802747,-122.456258,1,Museum,Food Truck,Café,Gym,Sporting Goods Shop,Street Food Gathering,Field,Gift Shop,Scenic Lookout,Theater
36,Treasure Island,956,37.824153,-122.372701,0,Park,Island,Bus Station,Flea Market,Grocery Store,Gym,Athletics & Sports,Breakfast Spot,American Restaurant,Eastern European Restaurant
6,Glen Park,1496,37.738044,-122.433462,1,Coffee Shop,Trail,Bakery,Café,French Restaurant,Bubble Tea Shop,Breakfast Spot,Park,Bookstore,Cheese Shop
37,Twin Peaks,1511,37.751949,-122.446489,2,Scenic Lookout,Trail,Mountain,Yoga Studio,Fast Food Restaurant,Food Truck,Food & Drink Shop,Food,Flower Shop,Flea Market
30,Presidio Heights,1714,37.785568,-122.450474,1,Coffee Shop,Café,New American Restaurant,Spa,Cosmetics Shop,American Restaurant,Italian Restaurant,Bed & Breakfast,Furniture / Home Store,Bookstore
21,Noe Valley,2725,37.749092,-122.432114,1,Coffee Shop,Bookstore,Bakery,Gift Shop,Gym,Italian Restaurant,Breakfast Spot,Pizza Place,Mexican Restaurant,Bagel Shop
23,Oceanview/Merced/Ingleside,3059,37.717666,-122.460268,1,Playground,Japanese Restaurant,Bus Station,Liquor Store,Chinese Restaurant,Dive Bar,Dog Run,Fountain,Food Truck,Food & Drink Shop


#based on crime data and desired venue data, the top neighborhoods of choice are - SeaCliff, Lincoln Park,Glen Park, Oceanview, Portola

In [71]:
from folium.features import DivIcon

map_clusters2 = folium.Map(location=[latitude2, longitude2], zoom_start=12)

for lat, lon, text in zip(sfo_merged['Latitude'], sfo_merged['Longitude'], sfo_merged['Analysis Neighborhood']):
    folium.map.Marker(
        [lat, lon],
        icon=DivIcon(
        icon_size=(150,36),
        icon_anchor=(0,0),
        html='<div style="font-size: 8pt">%s</div>' % text,
        )
    ).add_to(map_clusters2)
    
# create map with clusters
kclusters=3

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(sfo_merged['Latitude'], sfo_merged['Longitude'], sfo_merged['Analysis Neighborhood'], sfo_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=6,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7
).add_to(map_clusters2)
       

# add markers to map for homes
for lat, lng, label in zip(sfo_homes['LATITUDE'], sfo_homes['LONGITUDE'], '$ ' + sfo_homes['PRICE'].astype(str)+ sfo_homes['ADDRESS']):      
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_clusters2)  
    
# add markers to map for schools
for lat, lng, label in zip(sfo_schools['Latitude'], sfo_schools['Longitude'], sfo_schools['Campus Address']):      
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=8,
        popup=label,
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=0.7,
        parse_html=False).add_to(map_clusters2)  


map_clusters2

#Based on a. low crime rate, b.availability of housing, c. elementary schools in vicinity, d. diversity of restaurants and e. park/trail - the top 3 neighborhoods for the family are  
Lincoln Park, Oceanview/Merced/Ingleside and Seacliff