# Analysis of San Francisco Neighborhoods to open an Indian Restaurant

### Introduction

This analysis looks for the best possible location to open an Indian restaurant in San Francisco. California has the largest Indian-American population of any US state, and San Francisco is home to a sizable Indian community, which makes the city an interesting prospect for an Indian restaurant. I use the Foursquare API to explore neighborhoods in San Francisco, querying its venue search endpoint for Indian restaurants in the city. I then use the Folium library to visualize the neighborhoods and the clusters of restaurants that emerge. This should help decide whether it is better to set up in an area with a high or a low concentration of existing Indian restaurants, given the competition from restaurants serving the same cuisine. Alongside this, I use San Francisco crime data reported by the city's police districts, together with the latitude and longitude of each incident. The sections below describe the work in detail.

In [None]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Let's download the data on police department incidents and read it into a pandas dataframe using the pandas read_csv() method.

In [None]:
df_incidents = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/Data_Files/Police_Department_Incidents_-_Previous_Year__2016_.csv')

print('Dataset downloaded and read into a pandas dataframe!')

Let's take a look at the first five items in our dataset.

In [None]:
df_incidents.head()

So each row consists of 13 features:

1. IncidntNum: Incident number
2. Category: Category of crime or incident
3. Descript: Description of the crime or incident
4. DayOfWeek: The day of the week on which the incident occurred
5. Date: The date on which the incident occurred
6. Time: The time of day at which the incident occurred
7. PdDistrict: The police department district
8. Resolution: The resolution of the crime, i.e. whether the perpetrator was arrested or not
9. Address: The closest address to where the incident took place
10. X: The longitude value of the crime location
11. Y: The latitude value of the crime location
12. Location: A tuple of the latitude and longitude values
13. PdId: The police department ID

Let's find out how many entries there are in our dataset.

In [None]:
df_incidents.shape

So the dataframe consists of 150,500 crimes, which took place in the year 2016.

In [None]:
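import matplotlib.pyplot as plt
%matplotlib inline
plt.figure(figsize=(10,12)) # set the figure size to 10 by 12 inches
# count the incidents in each police district and draw one bar per district
df_incidents.groupby('PdDistrict').size().plot(kind='bar')
plt.ylabel('Number of incidents')
#plt.show()
plt.savefig('Desktop/crimes_by_pd.png')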

In [None]:
df_incidents['PdDistrict'].unique()

In [None]:
#Counting the number of nan rows in PdDistrict column
df_incidents['PdDistrict'].isna().sum()

We see there is just 1 row with a NaN value in the PdDistrict column, so we remove it from our dataset.

In [None]:
df_incidents = df_incidents.dropna(subset = ['PdDistrict']) # keep only rows with a known police district
df_incidents.shape
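
A note on `dropna`: the following is a minimal sketch, using a made-up three-row frame (the `toy` data is hypothetical and not taken from the incidents file), of how the in-place and non-in-place forms differ. With `inplace=True` the method modifies the frame and returns `None`, so its return value should not be assigned to a variable that is used later.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the incidents table
toy = pd.DataFrame({
    'PdDistrict': ['SOUTHERN', np.nan, 'MISSION'],
    'X': [-122.40, -122.41, -122.42],
})

# Non-in-place form: returns a new frame, the original is untouched
cleaned = toy.dropna(subset=['PdDistrict'])

# In-place form: mutates the frame it is called on and returns None
toy_copy = toy.copy()
returned = toy_copy.dropna(subset=['PdDistrict'], inplace=True)

print(len(toy), len(cleaned), len(toy_copy), returned)
```

Either form removes the row; the point is only that one of them hands back a frame and the other does not.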

In [None]:
df_day = df_incidents.groupby(['DayOfWeek','PdDistrict'], sort=True).size().reset_index(name='Count')
print(df_day)

In [None]:
df_day.boxplot('Count','PdDistrict',rot = 30, figsize=(10,12))
plt.savefig('Desktop/boxplot_PdDistrict.png')
df_day.boxplot('Count', 'DayOfWeek', rot = 30, figsize=(10,12))
plt.savefig('Desktop/boxplot_DayOfWeek.png')

In [None]:
pd_loc = df_incidents.groupby('PdDistrict', as_index=False)['X'].mean()
pd_y = df_incidents.groupby('PdDistrict', as_index=False)['Y'].mean()

In [None]:
pd_loc

In [None]:
pd_y

In [None]:
pd_loc['Y'] = pd_y['Y']
pd_loc.head()
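
The district centroids can also be computed in a single step. The sketch below assumes a small hypothetical frame with the same `PdDistrict`, `X` (longitude) and `Y` (latitude) columns as the incidents data; the values are invented for illustration.

```python
import pandas as pd

# Hypothetical toy incidents; X is longitude, Y is latitude
toy = pd.DataFrame({
    'PdDistrict': ['MISSION', 'MISSION', 'SOUTHERN'],
    'X': [-122.42, -122.40, -122.41],
    'Y': [37.76, 37.78, 37.77],
})

# Mean of both coordinate columns per district in one groupby
centroids = toy.groupby('PdDistrict', as_index=False)[['X', 'Y']].mean()
print(centroids)
```

Selecting both columns before calling `mean()` avoids building two intermediate frames and copying one column across.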

Let's check the data grouped by the 'Resolution' of the crime to see how many crimes were committed under each category

In [None]:
df_incidents.groupby(['Resolution']).count()

We see that 107780 cases have a resolution of 'NONE'. To reduce computational cost, we consider only the cases in this dataset whose resolution is 'ARREST, BOOKED'.

In [None]:
df = df_incidents[df_incidents['Resolution'] == "ARREST, BOOKED"]

df = df.loc[:, ['PdDistrict', 'X', 'Y']]

In [None]:
df.shape
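
As a small illustration of this filtering step, the sketch below counts resolutions and keeps the booked arrests on a hypothetical five-row frame (all values are invented). It also shows that the filtered frame keeps the row labels of the original, which matters for any later label-based lookup.

```python
import pandas as pd

# Hypothetical toy incidents with the columns used in this notebook
toy = pd.DataFrame({
    'Resolution': ['NONE', 'ARREST, BOOKED', 'NONE', 'ARREST, BOOKED', 'UNFOUNDED'],
    'PdDistrict': ['MISSION', 'SOUTHERN', 'MISSION', 'MISSION', 'TARAVAL'],
    'X': [-122.42, -122.40, -122.41, -122.43, -122.48],
    'Y': [37.76, 37.78, 37.77, 37.75, 37.74],
})

# Number of incidents per resolution
counts = toy['Resolution'].value_counts()

# Keep only booked arrests and the three columns needed for mapping
booked = toy.loc[toy['Resolution'] == 'ARREST, BOOKED', ['PdDistrict', 'X', 'Y']]

print(counts)
print(booked.index.tolist())  # labels of the surviving rows, not 0..n-1
```

If contiguous labels are wanted after filtering, `reset_index(drop=True)` is one option.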

#### Use geopy library to get the latitude and longitude values of San Francisco City.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent sf_explorer, as shown below.

In [None]:
address = 'San Francisco, US'

geolocator = Nominatim(user_agent="sf_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of San Francisco City are {}, {}.'.format(latitude, longitude))

Create a map of San Francisco with crime locations superimposed on top.

In [None]:
# create map of San Francisco using latitude and longitude values
map_sf = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, pdd in zip(df['Y'], df['X'], df['PdDistrict']):
    label = '{}'.format(pdd)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=1,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.4,
        parse_html=False).add_to(map_sf)  
    
#add markers of police districts to this
for lat, lng, pdd in zip(pd_loc['Y'], pd_loc['X'], pd_loc['PdDistrict']):
    label = '{}'.format(pdd)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color='#03090e',
        fill_opacity=0.7,
        parse_html=False).add_to(map_sf)
    
map_sf

### Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

##### Define Foursquare Credentials and Version

In [None]:
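import os

# Read the credentials from environment variables instead of hard-coding them
# (the names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are an assumed convention)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '') # your Foursquare Secret
VERSION = '20200605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + '*' * len(CLIENT_SECRET)) # do not print the secret itself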

##### Let's explore the first Police District in our dataframe.

In [None]:
df.iloc[0]['PdDistrict'] # first row by position; row labels are no longer contiguous after filtering

Get the district's latitude and longitude values.

In [None]:
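first_row = df.iloc[0] # first row by position; row labels are no longer contiguous after filtering
df_latitude = first_row['Y'] # latitude value
df_longitude = first_row['X'] # longitude value

df_name = first_row['PdDistrict'] # Police District name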

print('Latitude and longitude values of {} are {}, {}.'.format(df_name, 
                                                               df_latitude, 
                                                               df_longitude))

##### Let's search for Indian Restaurants in SF

In [None]:
#Create the url using the appropriate search criteria. 
search_query = 'Indian'

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 10000 # Define radius of search

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&query={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION,
    search_query,
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url # display URL
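
As an alternative to formatting the query string by hand, the parameters can be kept in a dictionary and encoded with the standard library. This is only a sketch: the credential strings are placeholders, the coordinates are illustrative values near San Francisco, and `example_url` is a name introduced here so that it does not overwrite `url`.

```python
from urllib.parse import urlencode

# Placeholder values; real credentials come from your Foursquare account
params = {
    'client_id': 'YOUR_CLIENT_ID',
    'client_secret': 'YOUR_CLIENT_SECRET',
    'v': '20200605',              # API version date
    'query': 'Indian',            # search term
    'll': '37.7793,-122.4193',    # "lat,lng" (illustrative coordinates)
    'radius': 10000,              # search radius in metres
    'limit': 100,                 # maximum number of venues to return
}

base = 'https://api.foursquare.com/v2/venues/search'
example_url = base + '?' + urlencode(params)  # urlencode escapes special characters
print(example_url)
```

Keeping the parameters in a dictionary makes it harder to misplace a separator and lets each value be escaped consistently.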

Send the GET request and examine the results

In [None]:
results = requests.get(url).json()
results

##### Get relevant part of JSON and transform it into a pandas dataframe

In [None]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

In [None]:
dataframe.shape

##### Define information of interest and filter dataframe

In [None]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # copy so that later column assignments do not touch dataframe

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # fall back to the key used by the explore endpoint
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered = dataframe_filtered[dataframe_filtered['categories'] == "Indian Restaurant"]

dataframe_filtered
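
To make the flattening and filtering steps easier to follow, here is a minimal sketch on three hypothetical venue records. The records only mimic the fields used in this notebook (`id`, `name`, `categories`, `location`); the venue names and coordinates are invented, and the real response contains many more fields.

```python
import pandas as pd

# Hypothetical venue records shaped like the fields used in this notebook
sample_venues = [
    {'id': 'a1', 'name': 'Curry House',
     'categories': [{'name': 'Indian Restaurant'}],
     'location': {'lat': 37.78, 'lng': -122.41}},
    {'id': 'b2', 'name': 'Indian Rock Park',
     'categories': [{'name': 'Park'}],
     'location': {'lat': 37.75, 'lng': -122.44}},
    {'id': 'c3', 'name': 'Unlabelled Spot',
     'categories': [],
     'location': {'lat': 37.76, 'lng': -122.42}},
]

# Nested dictionaries become dotted columns such as 'location.lat'
flat = pd.json_normalize(sample_venues)

# Replace each list of categories by the name of its first entry, or None if empty
flat['categories'] = flat['categories'].apply(
    lambda cats: cats[0]['name'] if cats else None)

# Keep only the last part of each dotted column name
flat.columns = [col.split('.')[-1] for col in flat.columns]

indian_only = flat[flat['categories'] == 'Indian Restaurant']
print(indian_only[['name', 'lat', 'lng']])
```

The second record shows why the category filter is needed: a text search for "Indian" can match venues that are not restaurants.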

##### Let's visualize the Indian restaurants that are nearby

In [None]:
ind_rest_map = folium.Map(location=[latitude, longitude], zoom_start=12) # generate map centred on San Francisco

# add a blue circle marker to represent the lat and long of San Francisco
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='blue',
    popup='San Francisco',
    fill = True,
    fill_color = 'blue',
    fill_opacity = 0.6
).add_to(ind_rest_map)

# add the Indian restaurants as red circle markers
for lat, lng, name in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.name):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='red',
        popup=name,
        fill = True,
        fill_color='red',
        fill_opacity=0.6
    ).add_to(ind_rest_map)

# display map
ind_rest_map

In [None]:
ind_rest_map = folium.Map(location=[latitude, longitude], zoom_start=12) # generate map centred on San Francisco

# add a yellow circle marker to represent the lat and long of San Francisco
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='yellow',
    popup='San Francisco',
    fill = True,
    fill_color = 'yellow',
    fill_opacity = 0.6
).add_to(ind_rest_map)

# add markers to map
for lat, lng, pdd in zip(df['Y'], df['X'], df['PdDistrict']):
    label = '{}'.format(pdd)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=1,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.4,
        parse_html=False).add_to(ind_rest_map)  
    
#add markers of police districts to this
for lat, lng, pdd in zip(pd_loc['Y'], pd_loc['X'], pd_loc['PdDistrict']):
    label = '{}'.format(pdd)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color='#03090e',
        fill_opacity=0.7,
        parse_html=False).add_to(ind_rest_map)

# add the Indian restaurants as red circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.name):
    folium.features.CircleMarker(
        [lat, lng],
        radius=8,
        color='red',
        popup=label,
        fill = True,
        fill_color='red',
        fill_opacity=0.6
    ).add_to(ind_rest_map)  
    
# display map
ind_rest_map
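
One possible way to put a number on what the combined map shows is to assign each restaurant to its nearest police district centroid and count restaurants per district. The sketch below is an assumption about how that could be done, not part of the analysis above: it uses the haversine great-circle distance, and the centroids and restaurants are hypothetical values.

```python
from collections import Counter
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km is the mean Earth radius

# Hypothetical district centroids as (latitude, longitude)
centroids = {'MISSION': (37.76, -122.42), 'RICHMOND': (37.78, -122.47)}

# Hypothetical restaurants as (name, latitude, longitude)
restaurants = [('A', 37.761, -122.421), ('B', 37.779, -122.465), ('C', 37.765, -122.415)]

def nearest_district(lat, lng):
    """Name of the centroid closest to the given point."""
    return min(centroids, key=lambda d: haversine_km(lat, lng, *centroids[d]))

per_district = Counter(nearest_district(lat, lng) for _, lat, lng in restaurants)
print(per_district)
```

With the real data, the centroids would come from `pd_loc` and the restaurants from `dataframe_filtered`; a centroid is only a rough stand-in for a district's actual boundary.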