# Capstone Final

## Table of Contents

* [Introduction/Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results](#results)
* [Discussion](#discussion)
* [Conclusion](#conclusion)

## Introduction/Business Problem <a name="introduction"></a>
Founded in 1776 and incorporated in 1850, San Francisco, CA is the 5th most populated city in the United States and one of the world’s most recognizable cities. Although the Financial District, Union Square, Fisherman’s Wharf and the Golden Gate Bridge are known around the world, San Francisco is also characterized by its numerous culturally rich streetscapes featuring mixed-use neighborhoods anchored around central commercial corridors to which residents and visitors alike can walk. With all of these options, it can be difficult to decide where to go and what to see when visiting, so it’s best to plan ahead. 

When looking for activities to do in any city as large as San Francisco it can be difficult to decide where to go and what to do in an order which maximizes activities and minimizes time lost due to traveling between activities. This project will assist those in San Francisco at doing just that, seeing the most attractions in the least amount of time.

## Data <a name="data"></a>

The data in this project comes from several sources:

•	The neighborhood geocoding data comes from  https://github.com/codeforamerica/click_that_hood/blob/master/public/data/san-francisco.geojson. This gives the outlines and names of the neighborhoods in San Francisco which will be used to generate maps.

•	The top San Francisco landmarks were found on google at: https://www.google.com/destination/map/topsights?q=things+to+do+in+san+francisco&rlz=1C1CHFX_enUS551US551&output=search&dest_mid=/m/0d6lp&sa=X These were manually compiled into a list due to API restrictions. The data will be used to locate landmarks within the neighborhoods listed above.

•	The venues data was located using the Foursquare API. This will be used to find attractions near and around the landmarks listed above.


## Install initial Packages

In [1]:
!conda install -c conda-forge folium
!conda install -c conda-forge geopy --yes
import pandas as pd
import numpy as np
import folium
from geopy.geocoders import Nominatim
import json
import requests
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
print("done")

Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /opt/conda/envs/DSX-Python35:

The following NEW packages will be INSTALLED:

    altair:  2.2.2-py35_1 conda-forge
    branca:  0.3.1-py_0   conda-forge
    folium:  0.8.3-py_0   conda-forge
    vincent: 0.4.4-py_1   conda-forge

altair-2.2.2-p 100% |################################| Time: 0:00:00  46.47 MB/s
branca-0.3.1-p 100% |################################| Time: 0:00:00  37.38 MB/s
vincent-0.4.4- 100% |################################| Time: 0:00:00  42.85 MB/s
folium-0.8.3-p 100% |################################| Time: 0:00:00  50.08 MB/s
Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /opt/conda/envs/DSX-Python35:

The following NEW packages will be INSTALLED:

    geographiclib: 1.49-py_0   conda-forge
    geopy:         1.19.0-py_0 conda-forge

geographiclib- 100% |###################

## Gather Data

I'll be using 3 data sources for this. The first is Google. Google had the best list of places to see in San Francisco vs all the other sites I searched. The list wasn't too long and wasn't too short and only included places, not events such as 10k runs or festivals. Unfortunately doesn't allow webscraping and their API doesn't allow for the gathering of data from the page which lists the landmarks individually. Thus, I have compiled the list manually.

### Landmark Info

In [2]:
## List of places from Google
places = ['Golden Gate Bridge', 'Alcatraz Island', "Fisherman's Wharf", 'Golden Gate Park', 'Pier 39', 'Union Square', 'Lombard Street', 'Chinatown', 'Coit Tower', 'Palace of Fine Arts', 
            'Presidio of San Francisco', 'Twin Peaks', 'California Academy of Sciences', 'North Beach', 'Painted Ladies', 'Japanese Tea Garden', 'Haight-Ashbury', 'Exploratorium', 'de Young Museum', 'Embarcadero', 
          'Ghirardelli Square', 'Crissy Field', 'Baker Beach', 'Nob Hill', 'Alamo Square', 'Aquarium of the Bay', 'Conservatory of Flowers', 'San Francisco City Hall', 'San Francisco Zoo', 
          'Cable Car Museum', 'Walt Disney Family Museum', 'Oracle Park', 'Sutro Baths', 'Legion of Honor', 'Mission Dolores Park', 'San Francisco Botanical Garden', 'San Francisco Civic Center']

#Put places into a dataframe and display
places_df = pd.DataFrame({'Landmark': places})
places_df

Unnamed: 0,Landmark
0,Golden Gate Bridge
1,Alcatraz Island
2,Fisherman's Wharf
3,Golden Gate Park
4,Pier 39
5,Union Square
6,Lombard Street
7,Chinatown
8,Coit Tower
9,Palace of Fine Arts


#### Get Lat and Long of landmarks using Nominatim

In [3]:
data = [] #create a blank list

for index, row in places_df.iterrows(): #loop through each row of the dataframe
    address = (row['Landmark'] + " San Francisco, CA, USA") #add "San Francisco, CA, USA to the end of the Landmark name to assist with getting the correct coordinates"
#    print(address)
    geolocator = Nominatim(user_agent="San_Francisco_Landmarks")#name of project per Nominatim ToS
    location = geolocator.geocode(address)
    if location is None: #If the geocoder can't find a site, it will return None and generate an error. This forces that data to be 0,0 so that it will continue running
        latitude = 0
        longitude = 0
    else: 
        latitude = location.latitude
        longitude = location.longitude
    data.append([row['Landmark'], latitude, longitude])
landmarks_df = pd.DataFrame(data, columns=['Landmark', 'Latitude', 'Longitude']) #place the new lat/long data along with the landmarks from the list into a dataframe
landmarks_df

Unnamed: 0,Landmark,Latitude,Longitude
0,Golden Gate Bridge,37.830321,-122.47975
1,Alcatraz Island,37.826746,-122.422741
2,Fisherman's Wharf,37.809167,-122.416599
3,Golden Gate Park,37.769368,-122.482184
4,Pier 39,37.809785,-122.410266
5,Union Square,37.787936,-122.407517
6,Lombard Street,37.802076,-122.418809
7,Chinatown,37.794301,-122.406376
8,Coit Tower,37.802379,-122.405834
9,Palace of Fine Arts,37.802919,-122.448403


### Neighborhood info

To get the neighborhood info, I'm using a geojson file of San Francisco found at https://raw.githubusercontent.com/codeforamerica/click_that_hood/master/public/data/san-francisco.geojson

#### Get geojson data

In [4]:
#download the geojson file
!wget --quiet https://raw.githubusercontent.com/codeforamerica/click_that_hood/master/public/data/san-francisco.geojson -O sanfran.json 
    
print('GeoJSON file downloaded!')

GeoJSON file downloaded!


#### Pull neighborhood names out of json

In [5]:
d = [] #create blank list

with open('sanfran.json') as f:
    data = json.load(f) #load the json file

for feature in data['features']:
#    print(feature['properties']['name'])
    d.append(feature['properties']['name'])    #pull the name of each geocoded area into the list

neighborhoods = pd.DataFrame(d, columns=['Neighborhood']) #conver the list to a dataframe
neighborhoods

Unnamed: 0,Neighborhood
0,Seacliff
1,Marina
2,Pacific Heights
3,Nob Hill
4,Presidio Heights
5,Downtown/Civic Center
6,Excelsior
7,Bernal Heights
8,Western Addition
9,Chinatown


#### Get coords of neighborhoods

In [6]:
data = [] #create a blank list

for index, row in neighborhoods.iterrows():   #loop through each row of the dataframe
    address = (row['Neighborhood'] + " San Francisco, CA, USA") #append city/state/country to assist in locating
#    print(address)
    geolocator = Nominatim(user_agent="San_Francisco_Landmarks") #name of project per Nominatim ToS
    location = geolocator.geocode(address)
    if location is None:  #Stops python from erroring should the location not be found
        latitude = 0
        longitude = 0
    else: 
        latitude = location.latitude
        longitude = location.longitude
    data.append([row['Neighborhood'], latitude, longitude]) #place data into list
neighborhoods_df = pd.DataFrame(data, columns=['Neighborhood', 'Latitude', 'Longitude']) #convert list into dataframe
neighborhoods_df

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Seacliff,37.788541,-122.486916
1,Marina,37.799793,-122.435205
2,Pacific Heights,37.792717,-122.435644
3,Nob Hill,37.794479,-122.415592
4,Presidio Heights,37.788751,-122.453027
5,Downtown/Civic Center,37.787514,-122.407159
6,Excelsior,37.721794,-122.435382
7,Bernal Heights,37.741001,-122.414214
8,Western Addition,37.779559,-122.42981
9,Chinatown,37.794301,-122.406376


Some rows came out to 0, meaning the geocoding couldn't find them. For my purposes, I'll remove those rows.

In [7]:
neighborhoods_df.drop([14,17,21,24,26,34,35], axis=0, inplace=True) #delete the rows where lat/long is 0.
neighborhoods_df.reset_index(drop=True) #set the index back to 0-X sequentially
neighborhoods_df

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Seacliff,37.788541,-122.486916
1,Marina,37.799793,-122.435205
2,Pacific Heights,37.792717,-122.435644
3,Nob Hill,37.794479,-122.415592
4,Presidio Heights,37.788751,-122.453027
5,Downtown/Civic Center,37.787514,-122.407159
6,Excelsior,37.721794,-122.435382
7,Bernal Heights,37.741001,-122.414214
8,Western Addition,37.779559,-122.42981
9,Chinatown,37.794301,-122.406376


#### Get coords of San Francisco

In [8]:
#Find coordinates of San Francisco itself
address = ('San Francisco, CA, USA')
geolocator = Nominatim(user_agent="San_Francisco_Landmarks")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

sanfran_lat = latitude
sanfran_lon = longitude
print(sanfran_lat, sanfran_lon)

37.7792808 -122.4192363


## Methodology <a name="methodology"></a>


### Verify the data looks reasonable

#### View Map of San Francisco Neighborhoods

In [9]:
sanfran_lat = 37.76 #The geocoded information with the zoom level I'm using was cutting off part of the map, so I had to improvise
sanfran_geo = r'sanfran.json' #the geojson we loaded earlier

sanfran_map = folium.Map(location = [sanfran_lat, sanfran_lon], zoom_start = 13)

sanfran_map.choropleth(
    geo_data=sanfran_geo,
    key_on='feature.properties.name', #each section is divided by name
    fill_color='BuGn', #just a random choice, the opacity of 20% makes it so that it really doesn't matter what color was used
    fill_opacity=0.2, 
    line_opacity=1,
)

for lat, lon, sanfran in zip(neighborhoods_df['Latitude'], neighborhoods_df['Longitude'],  #loop through the neighborhood data
                                 neighborhoods_df['Neighborhood']):
    folium.CircleMarker( #create markers to go onto the map
        [lat, lon],
        radius = 7, 
        popup = folium.Popup(sanfran),
        color = 'black',
        fill = True,
        fill_color = '#7dba00',
        fill_opacity = 0.7).add_to(sanfran_map)
        
#icon=folium.Icon(color='red', icon='star')).add_to(sanfran_map)  #I was going to use map markers and was playing around with icons, but those turned out to be to large to be useful
# display map
sanfran_map



#### Now add the landmarks

In [10]:
for lat, lon, sanfran in zip(landmarks_df['Latitude'], landmarks_df['Longitude'], #loop through landmark data
                                 landmarks_df['Landmark']):
    #place markers of each landmark on the map
    folium.Circle(
        [lat, lon],
        radius = 25, 
        popup = folium.Popup(sanfran),
        color = 'red'
        ).add_to(sanfran_map)

sanfran_map

The districts with the most landmarks are Golden Gate Park, Presidio and North Beach. Russian Hill, Nob Hill, and Chinatown have fewer landmarks, but they're close enough to each other that we can group them together.  As well as Haight Ashbury being close enough to Golden Gate Park. We'll get the Foursquare data for these districts

In [11]:
#create new dataframe with only those districts with the highest concentration of landmarks
districts = neighborhoods_df[neighborhoods_df['Neighborhood'].isin(['Golden Gate Park', 'Presidio', 'North Beach', 'Russian Hill', 'Nob Hill', 'Chinatown', 'Haight Ashbury'])]
districts.reset_index(drop=True) #reset the index to be sequential 0-X

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Nob Hill,37.794479,-122.415592
1,Chinatown,37.794301,-122.406376
2,North Beach,37.801175,-122.409002
3,Haight Ashbury,37.770015,-122.446952
4,Russian Hill,37.797707,-122.414971
5,Golden Gate Park,37.769368,-122.482184
6,Presidio,37.798746,-122.464589


### Foursquare Data

#### Credentials

In [12]:
# The code was removed by Watson Studio for sharing.

In [13]:
LIMIT = 100 #The documentation says it limits to 50, but I've seen other examples where people did 500. No matter what number I set my limit it, it came out to 100, so I just left it at 100
radius = 1609  #1 mile in meters

def getNearbyVenues(names, latitudes, longitudes, radius=radius):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
       # print(url)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [14]:
#run the data for each Neighborhood through the above function
sanfran_venues = getNearbyVenues(names=districts['Neighborhood'],
                                   latitudes=districts['Latitude'],
                                   longitudes=districts['Longitude']
                                  )

Nob Hill
Chinatown
North Beach
Haight Ashbury
Russian Hill
Golden Gate Park
Presidio


In [15]:
print(sanfran_venues.shape) #check to see how many rows

(700, 7)


So by gathering the foursquare data, we have venues. I'll add these to the map to see how it looks

In [16]:
sanfran_venues.head() #take a quick look at the data

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Nob Hill,37.794479,-122.415592,Hot Sauce and Panko,37.794576,-122.41808,Wings Joint
1,Nob Hill,37.794479,-122.415592,Le Beau Market,37.792934,-122.416205,Grocery Store
2,Nob Hill,37.794479,-122.415592,Keiko à Nob Hill,37.793251,-122.414214,French Restaurant
3,Nob Hill,37.794479,-122.415592,Cafe Meuse,37.795476,-122.41836,Wine Bar
4,Nob Hill,37.794479,-122.415592,Collis P. Huntington Park,37.792162,-122.412154,Park


Since San Francisco is relatively small in area, there are bound to be overlappings between districts which are close together, so we'll remove venues with the same name

In [17]:
#remove duplicates based on Venue name
sanfran_venues = sanfran_venues.sort_values('Venue', ascending=True)
sanfran_venues = sanfran_venues.drop_duplicates(subset='Venue', keep='first')
print(sanfran_venues.head())
print(sanfran_venues.shape)

         Neighborhood  Neighborhood Latitude  Neighborhood Longitude  \
227       North Beach              37.801175             -122.409002   
526  Golden Gate Park              37.769368             -122.482184   
287       North Beach              37.801175             -122.409002   
380    Haight Ashbury              37.770015             -122.446952   
98           Nob Hill              37.794479             -122.415592   

                              Venue  Venue Latitude  Venue Longitude  \
227                       15 Romolo       37.798134      -122.406380   
526            22nd & Irving Market       37.763323      -122.480479   
287         343 Sansome Roof Garden       37.793650      -122.401489   
380              4505 Burgers & BBQ       37.776125      -122.438142   
98   Akiko’s Restaurant & Sushi Bar       37.790623      -122.404657   

       Venue Category  
227      Cocktail Bar  
526     Grocery Store  
287            Garden  
380         BBQ Joint  
98   Sushi Res

This brings us down to 485 venues

In [18]:
#create map showing all venues on top of the map we created earlier
for lat, lon, sanfran in zip(sanfran_venues['Venue Latitude'], sanfran_venues['Venue Longitude'], 
                                 sanfran_venues['Venue']):
    folium.Circle(
        [lat, lon],
        radius = 25, 
        popup = folium.Popup(sanfran),
        color = 'blue'
        ).add_to(sanfran_map)

sanfran_map

## Data Analysis <a name="analysis"></a>

### Grouping

#### First we'll group venues by neighborhood

In [19]:
sanfran_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Chinatown,55,55,55,55,55,55
Golden Gate Park,99,99,99,99,99,99
Haight Ashbury,100,100,100,100,100,100
Nob Hill,47,47,47,47,47,47
North Beach,46,46,46,46,46,46
Presidio,100,100,100,100,100,100
Russian Hill,38,38,38,38,38,38


#### Check for unique categories

In [20]:
print('There are {} uniques categories.'.format(len(sanfran_venues['Venue Category'].unique())))

There are 174 uniques categories.


#### Analyze Each Neighborhood

In [21]:

# one hot encoding
sanfran_onehot = pd.get_dummies(sanfran_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
sanfran_onehot['Neighborhood'] = sanfran_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [sanfran_onehot.columns[-1]] + list(sanfran_onehot.columns[:-1])
sanfran_onehot = sanfran_onehot[fixed_columns]

sanfran_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,American Restaurant,Amphitheater,Aquarium,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Track Stadium,Trade School,Trail,Trattoria/Osteria,Tunnel,Vietnamese Restaurant,Waterfall,Wine Bar,Wine Shop,Wings Joint
227,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
526,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
287,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
380,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
98,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [22]:
sanfran_onehot.shape #check number of rows again

(485, 174)

#### group by neighborhood

In [23]:
sanfran_grouped = sanfran_onehot.groupby('Neighborhood').mean().reset_index()
sanfran_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,American Restaurant,Amphitheater,Aquarium,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,...,Track Stadium,Trade School,Trail,Trattoria/Osteria,Tunnel,Vietnamese Restaurant,Waterfall,Wine Bar,Wine Shop,Wings Joint
0,Chinatown,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.018182,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.018182,0.0
1,Golden Gate Park,0.0,0.0,0.010101,0.0,0.010101,0.0,0.010101,0.010101,0.0,...,0.0,0.0,0.020202,0.0,0.0,0.040404,0.010101,0.0,0.0,0.0
2,Haight Ashbury,0.02,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.01,...,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0
3,Nob Hill,0.021277,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.021277,0.0,0.0
4,North Beach,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.065217,0.0,0.0
5,Presidio,0.01,0.0,0.02,0.01,0.0,0.0,0.02,0.0,0.0,...,0.0,0.0,0.07,0.0,0.01,0.0,0.0,0.0,0.0,0.0
6,Russian Hill,0.052632,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,...,0.0,0.0,0.026316,0.026316,0.0,0.0,0.0,0.0,0.026316,0.026316


In [24]:
sanfran_grouped.shape

(7, 174)

#### Find top 5 venues per neighborhood

In [25]:
num_top_venues = 5

for hood in sanfran_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = sanfran_grouped[sanfran_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Chinatown----
                venue  freq
0         Coffee Shop  0.05
1            Boutique  0.05
2         Pizza Place  0.05
3  Seafood Restaurant  0.04
4               Hotel  0.04


----Golden Gate Park----
                venue  freq
0                Park  0.06
1              Bakery  0.05
2  Chinese Restaurant  0.04
3          Playground  0.04
4              Garden  0.04


----Haight Ashbury----
                venue  freq
0                Park  0.06
1         Coffee Shop  0.04
2  Mexican Restaurant  0.03
3      Breakfast Spot  0.03
4                Café  0.03


----Nob Hill----
                venue  freq
0                Park  0.09
1    Sushi Restaurant  0.06
2  Italian Restaurant  0.06
3         Coffee Shop  0.06
4           Speakeasy  0.04


----North Beach----
                     venue  freq
0              Pizza Place  0.09
1                 Wine Bar  0.07
2  New American Restaurant  0.07
3       Seafood Restaurant  0.04
4           Scenic Lookout  0.04


----Presidio----


#### Now we'll put that data into a dataframe

In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [27]:
#find the top 5 venues per neighborhood
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = sanfran_grouped['Neighborhood']

for ind in np.arange(sanfran_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sanfran_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Chinatown,Pizza Place,Coffee Shop,Boutique,Cocktail Bar,Sushi Restaurant
1,Golden Gate Park,Park,Bakery,Garden,Vietnamese Restaurant,Chinese Restaurant
2,Haight Ashbury,Park,Coffee Shop,Breakfast Spot,Bookstore,Pizza Place
3,Nob Hill,Park,Coffee Shop,Sushi Restaurant,Italian Restaurant,Cocktail Bar
4,North Beach,Pizza Place,Wine Bar,New American Restaurant,Seafood Restaurant,Scenic Lookout
5,Presidio,Scenic Lookout,Trail,Park,Historic Site,Café
6,Russian Hill,Yoga Studio,Grocery Store,Hotel,Sandwich Place,Beer Bar


### Cluster Neighborhoods

In [28]:
# set number of clusters
kclusters = 5

sanfran_grouped_clustering = sanfran_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sanfran_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 0, 0, 4, 1, 3, 2], dtype=int32)

In [29]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

sanfran_merged = districts

#merge sanfran_grouped with sanfran_data to add latitude/longitude for each neighborhood
sanfran_merged = sanfran_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

sanfran_merged

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,Nob Hill,37.794479,-122.415592,4,Park,Coffee Shop,Sushi Restaurant,Italian Restaurant,Cocktail Bar
9,Chinatown,37.794301,-122.406376,1,Pizza Place,Coffee Shop,Boutique,Cocktail Bar,Sushi Restaurant
10,North Beach,37.801175,-122.409002,1,Pizza Place,Wine Bar,New American Restaurant,Seafood Restaurant,Scenic Lookout
11,Haight Ashbury,37.770015,-122.446952,0,Park,Coffee Shop,Breakfast Spot,Bookstore,Pizza Place
23,Russian Hill,37.797707,-122.414971,2,Yoga Studio,Grocery Store,Hotel,Sandwich Place,Beer Bar
28,Golden Gate Park,37.769368,-122.482184,0,Park,Bakery,Garden,Vietnamese Restaurant,Chinese Restaurant
33,Presidio,37.798746,-122.464589,3,Scenic Lookout,Trail,Park,Historic Site,Café


#### Create Map

In [30]:

# create new map with clusters
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

map_clusters.choropleth(
    geo_data=sanfran_geo,
    key_on='feature.properties.name',
    fill_color='BuGn', 
    fill_opacity=0.2, 
    line_opacity=1,
)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(sanfran_merged['Latitude'], sanfran_merged['Longitude'], sanfran_merged['Neighborhood'], sanfran_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters



#### Examine Clusters

In [31]:
sanfran_merged.loc[sanfran_merged['Cluster Labels'] == 0, sanfran_merged.columns[[0] + list(range(4, sanfran_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
11,Haight Ashbury,Park,Coffee Shop,Breakfast Spot,Bookstore,Pizza Place
28,Golden Gate Park,Park,Bakery,Garden,Vietnamese Restaurant,Chinese Restaurant


In [32]:
sanfran_merged.loc[sanfran_merged['Cluster Labels'] == 1, sanfran_merged.columns[[0] + list(range(4, sanfran_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
9,Chinatown,Pizza Place,Coffee Shop,Boutique,Cocktail Bar,Sushi Restaurant
10,North Beach,Pizza Place,Wine Bar,New American Restaurant,Seafood Restaurant,Scenic Lookout


In [33]:
sanfran_merged.loc[sanfran_merged['Cluster Labels'] == 2, sanfran_merged.columns[[0] + list(range(4, sanfran_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
23,Russian Hill,Yoga Studio,Grocery Store,Hotel,Sandwich Place,Beer Bar


In [34]:
sanfran_merged.loc[sanfran_merged['Cluster Labels'] == 3, sanfran_merged.columns[[0] + list(range(4, sanfran_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
33,Presidio,Scenic Lookout,Trail,Park,Historic Site,Café


In [35]:
sanfran_merged.loc[sanfran_merged['Cluster Labels'] == 4, sanfran_merged.columns[[0] + list(range(4, sanfran_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,Nob Hill,Park,Coffee Shop,Sushi Restaurant,Italian Restaurant,Cocktail Bar


#### Cluster the Landmarks

In [36]:
# set number of clusters
kclusters = 5

sanfran_landmark_clustering = landmarks_df.drop('Landmark', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sanfran_landmark_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([3, 1, 1, 2, 1, 1, 1, 1, 1, 3], dtype=int32)

#### Add Clustering labels

In [37]:
# add clustering labels
landmarks_df.insert(0, 'Cluster Labels', kmeans.labels_)

landmarks_merged = landmarks_df

landmarks_merged

Unnamed: 0,Cluster Labels,Landmark,Latitude,Longitude
0,3,Golden Gate Bridge,37.830321,-122.47975
1,1,Alcatraz Island,37.826746,-122.422741
2,1,Fisherman's Wharf,37.809167,-122.416599
3,2,Golden Gate Park,37.769368,-122.482184
4,1,Pier 39,37.809785,-122.410266
5,1,Union Square,37.787936,-122.407517
6,1,Lombard Street,37.802076,-122.418809
7,1,Chinatown,37.794301,-122.406376
8,1,Coit Tower,37.802379,-122.405834
9,3,Palace of Fine Arts,37.802919,-122.448403


#### Now adding these to the map

In [38]:
markers_colors = []
for lat, lon, poi, cluster in zip(landmarks_merged['Latitude'], landmarks_merged['Longitude'], landmarks_merged['Landmark'], landmarks_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Add venues

In [39]:
for lat, lon, sanfran in zip(sanfran_venues['Venue Latitude'], sanfran_venues['Venue Longitude'], 
                                 sanfran_venues['Venue']):
    folium.Circle(
        [lat, lon],
        radius = 25, 
        popup = folium.Popup(sanfran),
        color = 'blue'
        ).add_to(map_clusters)

map_clusters

These categories are too vague, you don't know if you're looking at a grocery store or a Nightclub. That being the case, I've manually assigned broader categories to the individual categories and place them in the venues list below: 

In [40]:
#create list of broad categories based on the categories in the dataframe. This was done externally in Excel with a vlookup.
broad_cat = ['Bar', 'Retail', 'Attraction', 'Restaurant', 'Restaurant', 'Retail', 'Retail', 'Sports', 'Retail', 'Attraction', 'Retail', 'Trail', 'Retail', 'Park', 'Restaurant', 'Attraction', 'Retail', 'Restaurant', 'Retail', 
'Restaurant', 'Restaurant', 'Attraction', 'Restaurant', 'Restaurant', 'Restaurant', 'Restaurant', 'Trail', 'Attraction', 'Park', 'Attraction', 'Trail', 'Attraction', 'Trail', 'Retail', 'Bar', 'Retail', 
'Retail', 'Restaurant', 'Retail', 'Restaurant', 'Restaurant', 'Restaurant', 'Restaurant', 'Restaurant', 'Restaurant', 'Restaurant', 'Restaurant', 'Retail', 'Retail', 'Bar', 'Restaurant', 'Park', 'Attraction', 
'Sports', 'Restaurant', 'Retail', 'Bar', 'Restaurant', 'Restaurant', 'Restaurant', 'Attraction', 'Restaurant', 'Restaurant', 'Retail', 'Restaurant', 'Retail', 'Restaurant', 'Restaurant', 'Restaurant', 'Retail', 
'Restaurant', 'Restaurant', 'Retail', 'Restaurant', 'Attraction', 'Retail', 'Attraction', 'Bar', 'Attraction', 'Park', 'Retail', 'Bar', 'Garden', 'Attraction', 'Park', 'Park', 'Attraction', 'Restaurant', 'Park', 
'Park', 'Park', 'Attraction', 'Park', 'Garden', 'Restaurant', 'Retail', 'Restaurant', 'Restaurant', 'Bar', 'Park', 'Restaurant', 'Restaurant', 'Attraction', 'Retail', 'Attraction', 'Retail', 'Bar', 'Retail', 
'Trail', 'Restaurant', 'Retail', 'Restaurant', 'Restaurant', 'Bar', 'Bar', 'Attraction', 'Attraction', 'Park', 'Sports', 'Restaurant', 'Attraction', 'Attraction', 'Park', 'Retail', 'Restaurant', 'Park', 'Park', 
'Retail', 'Attraction', 'Restaurant', 'Restaurant', 'Restaurant', 'Retail', 'Attraction', 'Retail', 'Attraction', 'Park', 'Attraction', 'Park', 'Attraction', 'Attraction', 'Park', 'Attraction', 'Attraction', 
'Retail', 'Retail', 'Retail', 'Park', 'Attraction', 'Retail', 'Restaurant', 'Retail', 'Trail', 'Restaurant', 'Retail', 'Retail', 'Restaurant', 'Attraction', 'Restaurant', 'Retail', 'Park', 'Retail', 'Retail', 
'Park', 'Attraction', 'Retail', 'Restaurant', 'Restaurant', 'Restaurant', 'Retail', 'Restaurant', 'Restaurant', 'Restaurant', 'Bar', 'Restaurant', 'Attraction', 'Restaurant', 'Attraction', 'Attraction', 'Park', 
'Restaurant', 'Restaurant', 'Attraction', 'Attraction', 'Restaurant', 'Restaurant', 'Restaurant', 'Park', 'Restaurant', 'Garden', 'Park', 'Retail', 'Attraction', 'Park', 'Restaurant', 'Restaurant', 'Restaurant', 
'Bar', 'Sports', 'Park', 'Restaurant', 'Restaurant', 'Restaurant', 'Park', 'Sports', 'Restaurant', 'Restaurant', 'Retail', 'Restaurant', 'Sports', 'Attraction', 'Restaurant', 'Retail', 'Restaurant', 'Restaurant',
'Retail', 'Retail', 'Restaurant', 'Park', 'Bar', 'Attraction', 'Restaurant', 'Retail', 'Attraction', 'Bar', 'Attraction', 'Restaurant', 'Attraction', 'Restaurant', 'Trail', 'Retail', 'Retail', 'Retail', 'Bar', 
'Restaurant', 'Retail', 'Restaurant', 'Restaurant', 'Restaurant', 'Restaurant', 'Bar', 'Attraction', 'Retail', 'Restaurant', 'Park', 'Park', 'Restaurant', 'Restaurant', 'Attraction', 'Restaurant', 'Retail', 
'Attraction', 'Retail', 'Attraction', 'Park', 'Park', 'Park', 'Restaurant', 'Retail', 'Restaurant', 'Retail', 'Park', 'Attraction', 'Restaurant', 'Retail', 'Restaurant', 'Retail', 'Retail', 'Restaurant', 
'Restaurant', 'Restaurant', 'Restaurant', 'Restaurant', 'Retail', 'Restaurant', 'Restaurant', 'Retail', 'Restaurant', 'Restaurant', 'Attraction', 'Restaurant', 'Attraction', 'Retail', 'Attraction', 'Attraction', 
'Restaurant', 'Trail', 'Restaurant', 'Restaurant', 'Restaurant', 'Restaurant', 'Restaurant', 'Restaurant', 'Retail', 'Park', 'Restaurant', 'Retail', 'Restaurant', 'Sports', 'Sports', 'Park', 'Attraction', 
'Restaurant', 'Retail', 'Attraction', 'Attraction', 'Attraction', 'Attraction', 'Attraction', 'Retail', 'Park', 'Attraction', 'Park', 'Trail', 'Restaurant', 'Sports', 'Retail', 'Retail', 'Sports', 'Park', 
'Park', 'Retail', 'Retail', 'Restaurant', 'Restaurant', 'Restaurant', 'Retail', 'Restaurant', 'Retail', 'Restaurant', 'Retail', 'Restaurant', 'Retail', 'Park', 'Park', 'Restaurant', 'Garden', 'Retail', 'Bar', 
'Restaurant', 'Restaurant', 'Restaurant', 'Restaurant', 'Restaurant', 'Restaurant', 'Retail', 'Restaurant', 'Restaurant', 'Restaurant', 'Retail', 'Attraction', 'Attraction', 'Retail', 'Restaurant', 'Trail', 'Retail', 
'Attraction', 'Retail', 'Retail', 'Restaurant', 'Restaurant', 'Garden', 'Attraction', 'Retail', 'Retail', 'Restaurant', 'Restaurant', 'Restaurant', 'Retail', 'Restaurant', 'Retail', 'Attraction', 'Restaurant', 'Park', 
'Attraction', 'Retail', 'Restaurant', 'Attraction', 'Attraction', 'Trail', 'Restaurant', 'Park', 'Restaurant', 'Retail', 'Park', 'Restaurant', 'Restaurant', 'Retail', 'Restaurant', 'Retail', 'Restaurant', 'Park', 
'Restaurant', 'Restaurant', 'Bar', 'Retail', 'Retail', 'Sports', 'Retail', 'Retail', 'Restaurant', 'Bar', 'Retail', 'Retail', 'Restaurant', 'Bar', 'Restaurant', 'Retail', 'Retail', 'Bar', 'Restaurant', 'Restaurant', 
'Retail', 'Retail', 'Sports', 'Retail', 'Bar', 'Park', 'Restaurant', 'Retail', 'Attraction', 'Restaurant', 'Attraction', 'Attraction', 'Retail', 'Restaurant', 'Restaurant', 'Restaurant', 'Restaurant', 'Retail', 
'Restaurant', 'Retail', 'Park', 'Restaurant', 'Restaurant', 'Restaurant', 'Restaurant', 'Attraction', 'Restaurant', 'Sports', 'Restaurant', 'Bar', 'Park', 'Retail', 'Bar', 'Bar', 'Attraction', 'Bar', 'Restaurant', 
'Trail', 'Retail', 'Park', 'Restaurant', 'Park', 'Attraction', 'Retail', 'Retail', 'Retail', 'Retail', 'Retail', 'Bar', 'Retail', 'Attraction', 'Attraction', 'Restaurant', 'Retail', 'Attraction', 'Retail', 'Retail', 
'Retail', 'Restaurant', 'Restaurant', 'Restaurant', 'Restaurant', 'Restaurant', 'Bar', 'Restaurant', 'Garden', 'Attraction', 'Attraction', 'Restaurant', 'Restaurant', 'Bar', 'Restaurant', 'Bar']

sanfran_venues['Broad'] = broad_cat #add a column to our existing dataframe
sanfran_venues.head()



Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Broad
227,North Beach,37.801175,-122.409002,15 Romolo,37.798134,-122.40638,Cocktail Bar,Bar
526,Golden Gate Park,37.769368,-122.482184,22nd & Irving Market,37.763323,-122.480479,Grocery Store,Retail
287,North Beach,37.801175,-122.409002,343 Sansome Roof Garden,37.79365,-122.401489,Garden,Attraction
380,Haight Ashbury,37.770015,-122.446952,4505 Burgers & BBQ,37.776125,-122.438142,BBQ Joint,Restaurant
98,Nob Hill,37.794479,-122.415592,Akiko’s Restaurant & Sushi Bar,37.790623,-122.404657,Sushi Restaurant,Restaurant


#### Showing a view with venues and their categories

In [41]:

# create map
map_clusters2 = folium.Map(location=[latitude, longitude], zoom_start=13)

map_clusters2.choropleth(
    geo_data=sanfran_geo,
    key_on='feature.properties.name',
    fill_color='BuGn', 
    fill_opacity=0.2, 
    line_opacity=1,
)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map for neighborhoods, bordered in black
markers_colors = []
for lat, lon, poi, cluster in zip(sanfran_merged['Latitude'], sanfran_merged['Longitude'], sanfran_merged['Neighborhood'], sanfran_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters2)

#add markers to the map for landmarks
markers_colors = []
for lat, lon, poi, cluster in zip(landmarks_merged['Latitude'], landmarks_merged['Longitude'], landmarks_merged['Landmark'], landmarks_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters2)
    
#add blue markers to the map for restaurants
df_rest = sanfran_venues.loc[sanfran_venues['Broad'] == 'Restaurant']

for lat, lon, sanfran in zip(df_rest['Venue Latitude'], df_rest['Venue Longitude'], 
                                 df_rest['Venue']):
    folium.Circle(
        [lat, lon],
        radius = 25, 
        popup = folium.Popup(sanfran),
        color = 'blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7).add_to(map_clusters2)

#add red markers to the map for retail stores
df_ret = sanfran_venues.loc[sanfran_venues['Broad'] == 'Retail']

for lat, lon, sanfran in zip(df_ret['Venue Latitude'], df_ret['Venue Longitude'], 
                                 df_ret['Venue']):
    folium.Circle(
        [lat, lon],
        radius = 25, 
        popup = folium.Popup(sanfran),
        color = 'red',
        fill=True,
        fill_color='red',
        fill_opacity=0.7).add_to(map_clusters2)

#add black markers to the map for Attractions
df_att = sanfran_venues.loc[sanfran_venues['Broad'] == 'Attraction']
    
for lat, lon, sanfran in zip(df_att['Venue Latitude'], df_att['Venue Longitude'], 
                                 df_att['Venue']):
    folium.Circle(
        [lat, lon],
        radius = 25, 
        popup = folium.Popup(sanfran),
        color = 'black',
        fill=True,
        fill_color='black',
        fill_opacity=0.7).add_to(map_clusters2)
    
df_bar = sanfran_venues.loc[sanfran_venues['Broad'] == 'Bar']
    
#add yellow markers to the map for bars
for lat, lon, sanfran in zip(df_bar['Venue Latitude'], df_bar['Venue Longitude'], 
                                 df_bar['Venue']):
    folium.Circle(
        [lat, lon],
        radius = 25, 
        popup = folium.Popup(sanfran),
        color = 'yellow',
        fill=True,
        fill_color='yellow',
        fill_opacity=0.7
        ).add_to(map_clusters2)
    
#Though using a choropleth map, I don't need a "heatmap" or "density" legend which is what would be created by default
#so, using the fact that folium is html, we create a legend as an html div, styling it appropriately and using bootstrap
#icons. This is then added in as a chiled element of the map and appears in the position which we provided

legend_html = '<div style="position: fixed; bottom: 500px; left: 50px; width: 150px; height: 150px; border:2px solid grey; z-index:9999; font-size:14px; background-color:#AAD3DF;">&nbsp; Legend \
<br> &nbsp; Restaurant &nbsp; <i class="fa fa-circle fa-2x" style="color:blue"></i>\
<br> &nbsp; Retail &nbsp; <i class="fa fa-circle fa-2x" style="color:red"></i>\
<br> &nbsp; Attraction &nbsp; <i class="fa fa-circle fa-2x" style="color:black"></i>\
<br> &nbsp; Bar &nbsp; <i class="fa fa-circle fa-2x" style="color:yellow"></i></div>'

#add html to map
map_clusters2.get_root().html.add_child(folium.Element(legend_html))
map_clusters2



## Results <a name="results"></a>

Walking Tours
Day 1 – Golden Gate Park
1.	Begin the day on Irving street where there are over 25 restaurants within walking distance of Golden Gate Park
2.	After breakfast, stroll through Golden Gate Park, for those interested there is disc golf just to the north of JFK drive and west of Crossover drive.
3.	If you’re not into disc golf, you can head to the east of Crossover Drive and take in sights such as Prayerbrook Falls, Huntington Falls and the Chinese Pavilion
4.	For lunch continue east and drop in on Sam’s ChowderMobile or swing by the Jack Hirose Tea House
5.	Following lunch, head just a few feet east to the de Young Museum where you’ll find the Hamon Observation Deck, sculpture garden, and the Pool of Enchantment.
6.	As the day winds down there are two options, you can either walk back to Irving street for shopping and dining, or if you’re in the mood for something a bit stronger, you can head east to Haight-Ashbury and visit one of the many bars.

Day 2 – Haight Ashbury
1.	Day two is all about shopping. Beginning at the northeast of the district head down Castro street where you’ll find over 20 restaurants and shops.
2.	Next, head down Haight street to the center of the hippie counterculture. Once you cross Central Avenue you’ll be surrounded by over 25 more restaurants and shops sure to be full of flower power.
3.	For the more adventurous, you can visit Buena Vista Park Summit and Corona Heights summit which are 173 and 158m tall respectively.
4.	As the day winds down, you can stay on Haight Street with their many restaurants and bars, or continue back to Castro street which has twice as many bars.
 
 
Day 3 – Chinatown/Nob Hill/Russian Hill/North Beach
1.	Begin the day in Chinatown where there are over 25 restaurants to choose from
2.	Check out the 343 Sansome Roof Garden
3.	Head west to Nob Hill where you’ll find the Labyrinth at Grace Cathedral, the Spoke Art Gallery and the San Francisco Cable Car Museum
4.	From here, it’s a short walk north Russian Hill where you can see the World Famous Lombard street, the curviest road in the world.
5.	Now head to North Beach,  near the intersection of Columbus Avenue and Broadway are over 15 restaurants to choose from
6.	Heading North to the bay you can see the Alcatraz Overlook, the Sea Lions at Pier 39, and visit the Exploratorium
7.	Go back to the south of Broadway on Columbus avenue where there are 5 bars and more than 15 restaurants to choose from.

Day 4 – Presidio
1.	There aren’t a lot of restaurant choices in the Presidio, so start off at Starbucks at the northeast corner of the district
2.	From here, head down Presidio Boulevard where you’ll find attractions such as Free Shakespeare in the Park and the Walt Disney Family Museum.
3.	Stay in the area for lunch, the Transit Café and Picnic in the Presidio are your best options and both are near the Walt Disney museum.
4.	Go Northwest towards the bay, here is where you can see the spectacular view of the Golden Gate Bridge from the Golden Gate Overlook
5.	Visit Fort Point National Historic Site just below the bridge
6.	Walk just south to the Fort Point Lighthouse.
7.	From here, it’s a short walk to Crissy Field Overlook which borders the West Bluff Picnic Area and Beach
8.	Finally, you’ll need to catch an Uber or Lyft or Taxi to go back to one of the other areas you’ve visited and go to that restaurant or bar you missed, there’s not much else around the Presidio.


## Discussion <a name="discussion"></a>

I built this walking tour guide as I had seen that San Francisco was named the 2nd most walkable city in the US behind New York. And while that may be the case, I’m rather disappointed in the number of venues that were located around the areas with the most landmarks. Typically, in areas with high tourist traffic, you’ll also have a large collection of restaurants, attractions, retail, and bars, but that doesn’t seem to be the case in San Francisco. A reason for this it seems is because it’s the 2nd most densely populated city in the US at only 46.9 square miles. When you take into account that the Presidio is 2.347 square miles, Golden Gate Park is 1.58 square miles, and Lake Merced Park is 1.016 square miles, you’re left with 883,000 people living in 42 square miles, or 21,000 people per square mile. Compare this to Los Angeles, the most populated city in California, which has a population of 4 million within 469 square miles. Nearly 5 times as many people, but 10 times the area, giving it a population density of only 8,528 people per square mile. Once this is considered it can be easy to see why there are such limited choices; there’s no room to put anything further.

It was much more difficult to make these tours than I thought it would be. For starters, while everything is within a 1 mile radius of the coordinates that I got from Nominatim, I still think it’d be better to either bike or drive to a lot of these attractions. Again with Golden Gate Park, all the restaurants and stores are one block over from the street which borders the park to the south. There are also groups of shops and restaurants 4 blocks to the north, but the only attraction there is a Nursery, and for my taste, that’s not an attraction.
    
I was also surprised by the most common venue list. Chinatown’s 5 most common venues are Pizza Place, Coffee Shop, Boutique, Cocktail Bar, and Sushi Restaurant, none of which I identify with being something I’d expect to see in Chinatown. If I were to run this in more detail in the future I think I’d like to see the review scores per category and rank by that as I believe it would turn out differently for several of the districts. 
    
I also question the Foursquare data somewhat. While I’m not completely certain how the k-means clustering decides how to cluster, nor how the geocoding decides what the coordinates of any given neighborhood are, I find it strange that some clusters appear in areas where there are no landmarks nearby from the given landmark data. I also find it questionable at the number of venues found. Yes, it was limited to 100, but there were a lot of duplicates amongst the districts which are near each other. Even given that, there are some things that appear on the map if you zoom in all the way that were not picked up in the Foursquare data.


## Conclusion <a name="conclusion"></a>

In conclusion, it never occurred to me just how small San Francisco was. I’ve been to Los Angeles and it feels like as your driving through that it takes hours to get from one side to the other (which it probably does if there’s traffic). But San Francisco is only 8 miles across at its widest point. With San Francisco being the 2nd most walkable city I feel like these walking tours are good as a general guide for what there is to see/do in San Francisco in a short period of time. And with only 68 days a year with rainfall on average, there’s only an 18% chance on any given day that your walking tour will be rained out. However, with what I feel is a lack of a good portion of tourist attractions, if I were to go it would probably only be for a couple of days; see the Golden Gate bridge, Lombard Street, the Painted Ladies, and I’d be good to go. 