<a href="https://colab.research.google.com/github/kozinofsky/Coursera_Capstone/blob/master/The_Battle_of_Neighborhoods_koz.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# IBM Capstone Project - The Battle of the Neighborhoods (Week 2)

## Introduction: Business Problem
Company X asked my team to find the best location for its new Greek restaurant in San Francisco. The research must satisfy three requirements, listed in order of priority:

1. **Safety**. The location should be safe, because X cares about its employees and customers.
2. **Easy competition**. The location should not have many existing Greek restaurants nearby.
3. **High popularity**. The place should attract many visitors, so that it can bring in as many customers as possible.



## Data Section:
To address the first requirement, **safety**, I will use the official police incident report for 2016, which is already available in a convenient **csv** format. It lets me count the number of crimes recorded for each Neighborhood.

In [5]:
# The file name is "Police_Department_Incidents_-_Previous_Year__2016_.csv"
from google.colab import files
uploaded = files.upload()

Saving Police_Department_Incidents_-_Previous_Year__2016_.csv to Police_Department_Incidents_-_Previous_Year__2016_.csv


For the second requirement, **easy competition**, I will use the **Foursquare API** to find the Greek restaurants in the safest Neighborhoods.
The requests need two constants, **CLIENT_ID** and **CLIENT_SECRET**,
which I obtained when I created an account on https://foursquare.com/

Let's load the uploaded file with the crime statistics.

In [7]:
import pandas as pd

df_san_crimes = pd.read_csv('Police_Department_Incidents_-_Previous_Year__2016_.csv')

df_san_crimes.head()



Unnamed: 0,IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Address,X,Y,Location,PdId
0,120058272,WEAPON LAWS,POSS OF PROHIBITED WEAPON,Friday,01/29/2016 12:00:00 AM,11:00,SOUTHERN,"ARREST, BOOKED",800 Block of BRYANT ST,-122.403405,37.775421,"(37.775420706711, -122.403404791479)",12005827212120
1,120058272,WEAPON LAWS,"FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE",Friday,01/29/2016 12:00:00 AM,11:00,SOUTHERN,"ARREST, BOOKED",800 Block of BRYANT ST,-122.403405,37.775421,"(37.775420706711, -122.403404791479)",12005827212168
2,141059263,WARRANTS,WARRANT ARREST,Monday,04/25/2016 12:00:00 AM,14:59,BAYVIEW,"ARREST, BOOKED",KEITH ST / SHAFTER AV,-122.388856,37.729981,"(37.7299809672996, -122.388856204292)",14105926363010
3,160013662,NON-CRIMINAL,LOST PROPERTY,Tuesday,01/05/2016 12:00:00 AM,23:50,TENDERLOIN,NONE,JONES ST / OFARRELL ST,-122.412971,37.785788,"(37.7857883766888, -122.412970537591)",16001366271000
4,160002740,NON-CRIMINAL,LOST PROPERTY,Friday,01/01/2016 12:00:00 AM,00:30,MISSION,NONE,16TH ST / MISSION ST,-122.419672,37.76505,"(37.7650501214668, -122.419671780296)",16000274071000


Let's add a column "Count" with the value 1 in every row. We will sum it later to get the number of crimes in each Neighborhood.

In [8]:
df_san_crimes['Count'] = 1
df_san_crimes.head()

Unnamed: 0,IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Address,X,Y,Location,PdId,Count
0,120058272,WEAPON LAWS,POSS OF PROHIBITED WEAPON,Friday,01/29/2016 12:00:00 AM,11:00,SOUTHERN,"ARREST, BOOKED",800 Block of BRYANT ST,-122.403405,37.775421,"(37.775420706711, -122.403404791479)",12005827212120,1
1,120058272,WEAPON LAWS,"FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE",Friday,01/29/2016 12:00:00 AM,11:00,SOUTHERN,"ARREST, BOOKED",800 Block of BRYANT ST,-122.403405,37.775421,"(37.775420706711, -122.403404791479)",12005827212168,1
2,141059263,WARRANTS,WARRANT ARREST,Monday,04/25/2016 12:00:00 AM,14:59,BAYVIEW,"ARREST, BOOKED",KEITH ST / SHAFTER AV,-122.388856,37.729981,"(37.7299809672996, -122.388856204292)",14105926363010,1
3,160013662,NON-CRIMINAL,LOST PROPERTY,Tuesday,01/05/2016 12:00:00 AM,23:50,TENDERLOIN,NONE,JONES ST / OFARRELL ST,-122.412971,37.785788,"(37.7857883766888, -122.412970537591)",16001366271000,1
4,160002740,NON-CRIMINAL,LOST PROPERTY,Friday,01/01/2016 12:00:00 AM,00:30,MISSION,NONE,16TH ST / MISSION ST,-122.419672,37.76505,"(37.7650501214668, -122.419671780296)",16000274071000,1
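
As a cross-check on the helper-column approach, the same per-district totals can be obtained by counting rows directly. The sketch below uses a tiny made-up frame (its three rows are illustrative, not taken from the police file) and assumes the district column is still called `PdDistrict`.

```python
import pandas as pd

# Illustrative rows only; the real frame comes from the police CSV.
sample = pd.DataFrame({
    "PdDistrict": ["SOUTHERN", "SOUTHERN", "BAYVIEW"],
    "Category": ["WEAPON LAWS", "WEAPON LAWS", "WARRANTS"],
})

# Approach used in this notebook: add a column of ones and sum it per group.
with_ones = sample.assign(Count=1).groupby("PdDistrict")["Count"].sum()

# Equivalent shortcut: count the rows in each group directly.
by_size = sample.groupby("PdDistrict").size()

print(with_ones.to_dict())
print(by_size.to_dict())
```

Both series hold the same numbers, so the "Count" column is a convenience rather than a necessity.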


Let's rename the district column so the data is clearer to analyze.

In [0]:
df_san_crimes.rename(columns={'PdDistrict':'Neighborhood'}, inplace = True)

Now let's group the crimes by Neighborhood and get the total number for each. Let's also get rid of unnecessary information (that is, the excess columns).

In [10]:
# Sum the helper column per Neighborhood; selecting 'Count' first keeps only the column we need
df_total_district = df_san_crimes.groupby('Neighborhood', as_index=False)['Count'].sum()
# Sort from the lowest to the highest number of crimes and renumber the rows
df_total_district = df_total_district.sort_values(by='Count', ascending=True).reset_index(drop=True)
df_total_district

Unnamed: 0,Neighborhood,Count
0,PARK,8699
1,RICHMOND,8922
2,TENDERLOIN,9942
3,TARAVAL,11325
4,INGLESIDE,11594
5,BAYVIEW,14303
6,CENTRAL,17666
7,MISSION,19503
8,NORTHERN,20100
9,SOUTHERN,28445


To split the Neighborhoods into 3 groups with low, medium and high crime rates, I need to find the **max** and **min** crime counts:

In [11]:
max_crime = df_total_district['Count'].max()
print("Max crime rate = " + str(max_crime))
min_crime = df_total_district['Count'].min()
print("Minimum crime rate = " + str(min_crime))
# Width of each of the three equal bands (low, medium, high)
delta = (max_crime - min_crime) / 3
print("Delta = " + format(delta, '.0f'))
print("The safest Neighborhoods have crime rate lying between " + str(min_crime) + " and " + format(min_crime + delta, '.0f'))

Max crime rate = 28445
Minimum crime rate = 8699
Delta = 6582
The safest Neighborhoods have crime rate lying between 8699 and 15281
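
The three-band split described above can be sketched in plain Python. The function name `crime_tier` is hypothetical, and treating the upper edge of each band as inclusive is an assumption (no district total falls exactly on an edge, so it does not affect the result here). The totals are copied from the table above.

```python
def crime_tier(count, min_crime, max_crime):
    """Return 'low', 'medium' or 'high' using three equal-width bands."""
    delta = (max_crime - min_crime) / 3
    if count <= min_crime + delta:
        return "low"
    if count <= min_crime + 2 * delta:
        return "medium"
    return "high"

# District totals copied from the table above.
totals = {
    "PARK": 8699, "RICHMOND": 8922, "TENDERLOIN": 9942, "TARAVAL": 11325,
    "INGLESIDE": 11594, "BAYVIEW": 14303, "CENTRAL": 17666,
    "MISSION": 19503, "NORTHERN": 20100, "SOUTHERN": 28445,
}
lowest, highest = min(totals.values()), max(totals.values())
tiers = {name: crime_tier(n, lowest, highest) for name, n in totals.items()}

for name, tier in tiers.items():
    print(name, tier)
```

With these totals the band edges are 15281 and 21863, so six districts land in the low band, three in the medium band and one in the high band.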


We are interested in the safest Neighborhoods, so we keep only the Neighborhoods with a crime rate between 8699 and 15281.

In [12]:
df_safe_neighborhoods = df_total_district[df_total_district['Count']<15281]
print(df_safe_neighborhoods)

  Neighborhood  Count
0         PARK   8699
1     RICHMOND   8922
2   TENDERLOIN   9942
3      TARAVAL  11325
4    INGLESIDE  11594
5      BAYVIEW  14303


Let's visualize the data on a folium map, using the crime totals per Neighborhood in df_total_district.

But before creating the map, we need the coordinates of the borders of each Neighborhood, because I am going to create a choropleth map, which I think is the best fit for visualizing crime intensity.
Let's load the file with the geographical data.

In [6]:
uploaded = files.upload()

Saving san-francisco.geojson to san-francisco.geojson


In [159]:
import folium

san_location = [37.775420706711, -122.403404791479] # latitude and longitude used as the map center for San Francisco
san_geo = 'san-francisco.geojson' # path to the GeoJSON file with the district borders

san_map = folium.Map(location=san_location, zoom_start=13)

# Map.choropleth() is deprecated and missing from recent folium releases; folium.Choropleth is its replacement
folium.Choropleth(geo_data=san_geo,
                  data=df_total_district,
                  columns=['Neighborhood', 'Count'],
                  key_on='feature.properties.DISTRICT',
                  fill_color='YlOrRd',
                  fill_opacity=0.7,
                  line_opacity=0.2,
                  legend_name='Crime Rate in San Francisco'
                 ).add_to(san_map)
san_map
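
The choropleth pairs each row of the table with a polygon by comparing the Neighborhood value with the feature's `properties.DISTRICT`, so names that differ (even only in letter case) will not be coloured as intended. A minimal sketch of a sanity check follows; the inline GeoJSON is a hypothetical, heavily trimmed stand-in for san-francisco.geojson, and the table names are illustrative.

```python
import json

# Hypothetical, heavily trimmed stand-in for san-francisco.geojson.
geojson_text = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"DISTRICT": "PARK"}, "geometry": null},
  {"type": "Feature", "properties": {"DISTRICT": "RICHMOND"}, "geometry": null}
]}
"""
geo = json.loads(geojson_text)

# Names used as the join key on each side.
geo_names = {feature["properties"]["DISTRICT"] for feature in geo["features"]}
data_names = {"PARK", "RICHMOND", "Tenderloin"}  # illustrative values

# Anything listed here has no polygon to attach to.
unmatched = sorted(data_names - geo_names)
print(unmatched)
```

An empty list means every row of the table has a matching polygon.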



Let's make the border of each Neighborhood more visible.

In [15]:
!pip install geopandas


In [160]:
import geopandas

gdf = geopandas.read_file(san_geo)

folium.GeoJson(
    gdf,
).add_to(san_map)

san_map


Let's get the centroid of each Neighborhood. It will help us later in the research.

In [38]:
gdf['centroid_lon'] = gdf['geometry'].centroid.x
gdf['centroid_lat'] = gdf['geometry'].centroid.y
# Build a separate table, so that gdf keeps its geometry for the maps drawn later
centroids_df = gdf.drop(['OBJECTID', 'COMPANY', 'geometry'], axis=1)
centroids_df = centroids_df.rename(columns={'DISTRICT':'Neighborhood'})

centroids_df

Unnamed: 0,Neighborhood,centroid_lon,centroid_lat
0,CENTRAL,-122.409866,37.798491
1,SOUTHERN,-122.391915,37.79226
2,BAYVIEW,-122.389887,37.737144
3,MISSION,-122.422636,37.757564
4,PARK,-122.448333,37.764442
5,RICHMOND,-122.478983,37.778167
6,INGLESIDE,-122.431617,37.727905
7,TARAVAL,-122.483012,37.736183
8,NORTHERN,-122.430508,37.791176
9,TENDERLOIN,-122.412554,37.78389
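
To make the idea of a centroid concrete, here is a minimal sketch of the area-weighted centroid of a simple polygon, computed with the shoelace formula. It is not geopandas' implementation, the function name `polygon_centroid` is hypothetical, and it treats the coordinates as planar, which is only an approximation for longitude and latitude.

```python
def polygon_centroid(points):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Centroid = sum / (6 * area) and area = area2 / 2
    return cx / (3 * area2), cy / (3 * area2)

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
centroid = polygon_centroid(square)
print(centroid)  # (1.0, 1.0)
```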


Now let's add the centroids to the map as labels.

In [162]:
for lat, lng, label in zip(centroids_df.centroid_lat, centroids_df.centroid_lon, centroids_df.Neighborhood):
  folium.Marker(
      [lat, lng-0.0025],
      icon=folium.DivIcon(label)
  ).add_to(san_map)
san_map
  

Now I will start on the second part of the research: finding out which of the safest Neighborhoods have a low number of existing Greek restaurants. To do this, I first need the centroid of each Neighborhood in the DataFrame **df_safe_neighborhoods**.

In [50]:
safe_dis_df = pd.merge(df_safe_neighborhoods, centroids_df, on='Neighborhood')

safe_dis_df


Unnamed: 0,Neighborhood,Count,centroid_lon,centroid_lat
0,PARK,8699,-122.448333,37.764442
1,RICHMOND,8922,-122.478983,37.778167
2,TENDERLOIN,9942,-122.412554,37.78389
3,TARAVAL,11325,-122.483012,37.736183
4,INGLESIDE,11594,-122.431617,37.727905
5,BAYVIEW,14303,-122.389887,37.737144


Now we have all the data needed to send requests to the Foursquare API.
We have:
CLIENT_ID (declared in a hidden cell, because it is sensitive information)
CLIENT_SECRET

The only values left to set are "radius" (the search radius around each centroid, in meters), "VERSION" (a date in the format 'YearMonthDay', e.g. '20120609'; usually today's date), the coordinates of each centroid and, most importantly, the request itself.

After that we save the result in a new variable. We will do this for each Neighborhood in the table above and join all the results into one DataFrame to make further research easier. Let's do it!

In [102]:
import requests

VERSION = '20120609'
radius = 1500
LIMIT = 100 # my Foursquare account is a free one with a daily quota, so each request asks for no more than 100 results

# One [latitude, longitude] pair per safe Neighborhood, in table order
centroids_lat_long = safe_dis_df[['centroid_lat', 'centroid_lon']].values.tolist()

centroids_lat_long

[[37.76444227049715, -122.44833303758082],
 [37.77816734557196, -122.47898343791394],
 [37.783889676868924, -122.41255405465894],
 [37.736183115189256, -122.4830118963146],
 [37.72790488613459, -122.43161665269517],
 [37.73714403652807, -122.3898868766557]]
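
Before sending anything, it can help to see how one request URL is assembled from these values. The sketch below builds the query string with the standard library instead of hand-written string formatting, which takes care of encoding and avoids stray separators. The helper name `build_search_url` and the placeholder credentials are hypothetical; the endpoint and parameter names are the ones used in this notebook.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"

def build_search_url(lat, lon, category_id, client_id, client_secret,
                     version="20120609", radius=1500, limit=100):
    """Assemble a venue-search URL with properly encoded query parameters."""
    query = {
        "categoryId": category_id,
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lon}",
        "v": version,
        "radius": radius,
        "limit": limit,
    }
    return SEARCH_ENDPOINT + "?" + urlencode(query)

# Placeholder credentials; the real ones stay in the hidden cell.
url = build_search_url(37.76444227049715, -122.44833303758082,
                       "4bf58dd8d48988d10e941735", "MY_ID", "MY_SECRET")

# Decode the query string again to confirm what will be sent.
sent = parse_qs(urlsplit(url).query)
print(sent["ll"], sent["radius"], sent["limit"])
```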

Now that we have the list of centroid coordinates, let's run the main loop of requests.

In [153]:
# The category ID for "Greek restaurant" is 4bf58dd8d48988d10e941735; it was taken from the Foursquare website:
# https://developer.foursquare.com/docs/build-with-foursquare/categories/
url_template = 'https://api.foursquare.com/v2/venues/search?categoryId=4bf58dd8d48988d10e941735&client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'

df_result = pd.DataFrame()

for i in range(len(centroids_lat_long)):
    lat, lon = centroids_lat_long[i]
    url = url_template.format(CLIENT_ID, CLIENT_SECRET, lat, lon, VERSION, radius, LIMIT)
    venues = requests.get(url).json()['response']['venues']
    df_greek_rest = pd.json_normalize(venues)
    df_greek_rest['Neighborhood'] = safe_dis_df['Neighborhood'][i]
    print(safe_dis_df['Neighborhood'][i])
    print(df_greek_rest)
    # DataFrame.append() was removed in pandas 2.0; pd.concat() does the same job
    df_result = pd.concat([df_result, df_greek_rest], ignore_index=True)

df_result


PARK
                         id        name  ... venuePage.id  Neighborhood
0  5748e1f0498e1c0c6d214098      Souvla  ...          NaN          PARK
1  4a10e48af964a52007771fe3  Park Gyros  ...    486224657          PARK
2  4d40d13acb84b60c78c489ab     Palmyra  ...    100010971          PARK

[3 rows x 38 columns]
RICHMOND
Empty DataFrame
Columns: [Neighborhood]
Index: []
TENDERLOIN
                          id  ... Neighborhood
0   5b4e4222ea1e44002ce09019  ...   TENDERLOIN
1   4aab1d37f964a520fc5820e3  ...   TENDERLOIN
2   501e259ee4b067f6ad3265f2  ...   TENDERLOIN
3   4f3203e819833175d609e567  ...   TENDERLOIN
4   49fb9750f964a5205f6e1fe3  ...   TENDERLOIN
5   4a79ddb7f964a520d6e71fe3  ...   TENDERLOIN
6   5904e16c396de02ff8c46d88  ...   TENDERLOIN
7   44e36f32f964a52041371fe3  ...   TENDERLOIN
8   4a947cccf964a520c42120e3  ...   TENDERLOIN
9   533cc33f498e604ebad95d8c  ...   TENDERLOIN
10  49eb82e8f964a520eb661fe3  ...   TENDERLOIN
11  5428707b498e49927eea28b9  ...   TENDERLOIN
12 

Unnamed: 0,id,name,categories,verified,referralId,venueChains,hasPerk,location.address,location.crossStreet,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,stats.tipCount,stats.usersCount,stats.checkinsCount,stats.visitsCount,beenHere.count,beenHere.lastCheckinExpiredAt,beenHere.marked,beenHere.unconfirmedCount,hereNow.count,hereNow.summary,hereNow.groups,delivery.id,delivery.url,delivery.provider.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.icon.name,venuePage.id,Neighborhood
0,5748e1f0498e1c0c6d214098,Souvla,"[{'id': '52e81612bcbc57f1066b79f3', 'name': 'S...",0.0,v-1588443483,[],0.0,531 Divisadero St,btwn Hayes & Fell St,37.774577,-122.437809,"[{'label': 'display', 'lat': 37.77457655200334...",1459.0,94117.0,US,San Francisco,CA,United States,"[531 Divisadero St (btwn Hayes & Fell St), San...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Nobody here,[],,,,,,,,PARK
1,4a10e48af964a52007771fe3,Park Gyros,"[{'id': '4bf58dd8d48988d1c0941735', 'name': 'M...",1.0,v-1588443483,[],0.0,1201 9th Ave,Lincoln Way,37.76579,-122.466431,"[{'label': 'display', 'lat': 37.76578986565052...",1599.0,94122.0,US,San Francisco,CA,United States,"[1201 9th Ave (Lincoln Way), San Francisco, CA...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Nobody here,[],57756.0,https://www.grubhub.com/restaurant/park-gyros-...,grubhub,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_grubhub_20180129.png,486224657.0,PARK
2,4d40d13acb84b60c78c489ab,Palmyra,"[{'id': '4bf58dd8d48988d1c0941735', 'name': 'M...",1.0,v-1588443483,[],0.0,700 Haight St,Pierce St,37.771749,-122.433825,"[{'label': 'display', 'lat': 37.77174872018897...",1513.0,94117.0,US,San Francisco,CA,United States,"[700 Haight St (Pierce St), San Francisco, CA ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Nobody here,[],268063.0,https://www.grubhub.com/restaurant/palmyra-700...,grubhub,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_grubhub_20180129.png,100010971.0,PARK
3,5b4e4222ea1e44002ce09019,The Argentum Project,"[{'id': '4bf58dd8d48988d16a941735', 'name': 'B...",0.0,v-1588443483,[],0.0,47 6th St,,37.781688,-122.409234,"[{'label': 'display', 'lat': 37.781688, 'lng':...",381.0,94103.0,US,San Francisco,CA,United States,"[47 6th St, San Francisco, CA 94103, United St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Nobody here,[],1048383.0,https://www.grubhub.com/restaurant/the-argentu...,grubhub,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_grubhub_20180129.png,,TENDERLOIN
4,4aab1d37f964a520fc5820e3,Estia Greek Restaurant,"[{'id': '4bf58dd8d48988d10e941735', 'name': 'G...",0.0,v-1588443483,[],0.0,1224 Grant Ave,Columbuse Ave,37.798302,-122.407221,"[{'label': 'display', 'lat': 37.798302, 'lng':...",1671.0,94133.0,US,San Francisco,CA,United States,"[1224 Grant Ave (Columbuse Ave), San Francisco...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Nobody here,[],,,,,,,,TENDERLOIN
5,501e259ee4b067f6ad3265f2,Gyros On Wheels,"[{'id': '4bf58dd8d48988d1cb941735', 'name': 'F...",0.0,v-1588443483,[],0.0,,,37.769045,-122.414337,"[{'label': 'display', 'lat': 37.76904478501817...",1659.0,94103.0,US,San Francisco,CA,United States,"[San Francisco, CA 94103, United States]",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Nobody here,[],,,,,,,,TENDERLOIN
6,4f3203e819833175d609e567,O'Mythos Greek Tavern,"[{'id': '4bf58dd8d48988d10e941735', 'name': 'G...",0.0,v-1588443483,[],0.0,2424 Van Ness Ave,,37.797843,-122.423728,"[{'label': 'display', 'lat': 37.797843, 'lng':...",1838.0,94109.0,US,San Francisco,CA,United States,"[2424 Van Ness Ave, San Francisco, CA 94109, U...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Nobody here,[],,,,,,,,TENDERLOIN
7,49fb9750f964a5205f6e1fe3,Eden's Mediterranean Turkish & Greek Restaurant,"[{'id': '4bf58dd8d48988d1c0941735', 'name': 'M...",0.0,v-1588443483,[],0.0,552 Jones St,at Geary St,37.786612,-122.413159,"[{'label': 'display', 'lat': 37.78661222741304...",307.0,94102.0,US,San Francisco,CA,United States,"[552 Jones St (at Geary St), San Francisco, CA...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Nobody here,[],,,,,,,,TENDERLOIN
8,4a79ddb7f964a520d6e71fe3,Krivaar Cafe,"[{'id': '4bf58dd8d48988d10e941735', 'name': 'G...",0.0,v-1588443483,[],0.0,475 Pine St,,37.791608,-122.403672,"[{'label': 'display', 'lat': 37.79160800000000...",1161.0,94104.0,US,San Francisco,CA,United States,"[475 Pine St, San Francisco, CA 94104, United ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Nobody here,[],,,,,,,,TENDERLOIN
9,5904e16c396de02ff8c46d88,Troy,"[{'id': '4bf58dd8d48988d10e941735', 'name': 'G...",0.0,v-1588443483,[],0.0,2226 Polk St,,37.797329,-122.42207,"[{'label': 'display', 'lat': 37.79732921841157...",1714.0,94109.0,US,San Francisco,CA,United States,"[2226 Polk St, San Francisco, CA 94109, United...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Nobody here,[],,,,,,,,TENDERLOIN


We sent one request for each of the six Neighborhoods and appended each result to the previous ones. As a result we have a single data frame with the Greek restaurants found around these 6 Neighborhoods. Because the rows are appended with ignore_index=True, the index is simply renumbered; this does not remove duplicates, so a venue that lies within the search radius of two centroids would be listed once for each of them.

Now let's clean up our DataFrame a little.

In [154]:
# Keep only the columns needed for the analysis and give them friendlier names
df_result = df_result[['name', 'location.lat', 'location.lng', 'Neighborhood']]
df_result = df_result.rename(columns={'name':'Restaurant Name', 'location.lat':'latitude', 'location.lng':'longitude'})
df_result

Unnamed: 0,Restaurant Name,latitude,longitude,Neighborhood
0,Souvla,37.774577,-122.437809,PARK
1,Park Gyros,37.76579,-122.466431,PARK
2,Palmyra,37.771749,-122.433825,PARK
3,The Argentum Project,37.781688,-122.409234,TENDERLOIN
4,Estia Greek Restaurant,37.798302,-122.407221,TENDERLOIN
5,Gyros On Wheels,37.769045,-122.414337,TENDERLOIN
6,O'Mythos Greek Tavern,37.797843,-122.423728,TENDERLOIN
7,Eden's Mediterranean Turkish & Greek Restaurant,37.786612,-122.413159,TENDERLOIN
8,Krivaar Cafe,37.791608,-122.403672,TENDERLOIN
9,Troy,37.797329,-122.42207,TENDERLOIN
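
Each search is centred on a Neighborhood centroid with a radius of 1500, yet the raw results above list a `location.distance` of 1599.0 for Park Gyros. A minimal sketch for re-checking such distances with the haversine formula follows; the function name `haversine_m` and the Earth-radius constant are assumptions of this example, and the coordinates are copied from the tables above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres, an approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

park_centroid = (37.76444227049715, -122.44833303758082)
park_gyros = (37.76579, -122.466431)

d = haversine_m(*park_centroid, *park_gyros)
print(round(d), "m from the PARK centroid; inside 1500 m:", d <= 1500)
```

A check like this could be used to drop venues that lie outside the intended radius before counting competitors.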


Let's put the restaurants on our map to visualize them.

In [0]:
greek_map = san_map

In [163]:
for lat, lng in zip(df_result['latitude'], df_result['longitude']):
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        color='yellow',
        fill=True,
        fill_color='blue',
        fill_opacity=0.6).add_to(greek_map)

greek_map

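Most of the Greek restaurants listed above are around Tenderloin, so it does not meet the **easy competition** requirement. For the rest of the research I need to exclude the Neighborhood "Tenderloin" from my DataFrame. Let's do it!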

In [172]:
safe_dis_df.drop(safe_dis_df[safe_dis_df.Neighborhood == 'TENDERLOIN'].index, inplace=True)
safe_dis_df.reset_index(inplace=True)
safe_dis_df

Unnamed: 0,index,Neighborhood,Count,centroid_lon,centroid_lat
0,0,PARK,8699,-122.448333,37.764442
1,1,RICHMOND,8922,-122.478983,37.778167
2,3,TARAVAL,11325,-122.483012,37.736183
3,4,INGLESIDE,11594,-122.431617,37.727905
4,5,BAYVIEW,14303,-122.389887,37.737144


Now we proceed to the last block of the study: popularity.
This block is very similar to the previous one.

All we need to do is send requests to the Foursquare API again, but with a different category ID. This time we will use a category ID for "food-related" venues.


In [254]:
#safe_dis_df.drop(['index'], axis=1, inplace=True)
# categoryID = 4d4b7105d754a06374d81259
safe_dis_df
print(safe_dis_df['centroid_lat'][0])

37.76444227049715


In [272]:
radius = 1500
# The category ID for all "food-related" venues is 4d4b7105d754a06374d81259; it was taken from the Foursquare website:
# https://developer.foursquare.com/docs/build-with-foursquare/categories/
# NOTE: the request below sends categoryId=4bf58dd8d48988d14e941735, which is not that ID; the outputs shown were produced with it
# LIMIT (100) from the earlier cell is reused here
url_template = 'https://api.foursquare.com/v2/venues/search?categoryId=4bf58dd8d48988d14e941735&client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'

df_result_final = pd.DataFrame()

for i in range(len(safe_dis_df)):
    url = url_template.format(CLIENT_ID, CLIENT_SECRET, safe_dis_df['centroid_lat'][i], safe_dis_df['centroid_lon'][i], VERSION, radius, LIMIT)
    venues_new = requests.get(url).json()['response']['venues']
    df_all_rest = pd.json_normalize(venues_new)
    df_all_rest['Neighborhood'] = safe_dis_df['Neighborhood'][i]
    # DataFrame.append() was removed in pandas 2.0; pd.concat() does the same job
    df_result_final = pd.concat([df_result_final, df_all_rest], ignore_index=True)
    print(df_result_final)

df_result_final

                          id  ... Neighborhood
0   3fd66200f964a520f4f01ee3  ...         PARK
1   44646408f964a52026331fe3  ...         PARK
2   4c05b74b8f8fa5939e55f20d  ...         PARK
3   4a789bbbf964a52004e61fe3  ...         PARK
4   413e4b80f964a520501c1fe3  ...         PARK
5   50ab8755e4b0869279a80872  ...         PARK
6   56e5c70d498e302d2a32f323  ...         PARK
7   44e36c53f964a5203e371fe3  ...         PARK
8   52531f14498ea5d7b23ab29d  ...         PARK
9   53b37277498eeeca1f0ba730  ...         PARK
10  5952ab570d2be71b97207852  ...         PARK
11  55b31d83498e5ab2eee81351  ...         PARK
12  5111a558e4b0e457054d5a0a  ...         PARK
13  552d9a9a498ef6abfdae677a  ...         PARK
14  4bbd5b33078095218c93da91  ...         PARK
15  457afa25f964a520ec3e1fe3  ...         PARK
16  4aa08ca7f964a520064020e3  ...         PARK
17  4e87adb55c5c9a0ba0e01c10  ...         PARK
18  4b6a41f2f964a5208ecf2be3  ...         PARK
19  506b557be4b0c4c151cfe675  ...         PARK
20  511f3376e

Unnamed: 0,id,name,categories,verified,referralId,venueChains,hasPerk,location.address,location.crossStreet,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,stats.tipCount,stats.usersCount,stats.checkinsCount,stats.visitsCount,delivery.id,delivery.url,delivery.provider.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.icon.name,beenHere.count,beenHere.lastCheckinExpiredAt,beenHere.marked,beenHere.unconfirmedCount,hereNow.count,hereNow.summary,hereNow.groups,venuePage.id,Neighborhood,location.neighborhood
0,3fd66200f964a520f4f01ee3,Crepes on Cole,"[{'id': '4bf58dd8d48988d143941735', 'name': 'B...",0.0,v-1588452137,[],0.0,100 Carl St,at Cole,37.765858,-122.450037,"[{'label': 'display', 'lat': 37.76585766840731...",217.0,94117,US,San Francisco,CA,United States,"[100 Carl St (at Cole), San Francisco, CA 9411...",0.0,0.0,0.0,0.0,745126,https://www.grubhub.com/restaurant/crepes-on-c...,grubhub,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_grubhub_20180129.png,0.0,0.0,0.0,0.0,0.0,Nobody here,[],,PARK,
1,44646408f964a52026331fe3,Nopa,"[{'id': '4bf58dd8d48988d157941735', 'name': 'N...",1.0,v-1588452137,[],0.0,560 Divisadero St,at Hayes St,37.774888,-122.437532,"[{'label': 'display', 'lat': 37.774888, 'lng':...",1501.0,94117,US,San Francisco,CA,United States,"[560 Divisadero St (at Hayes St), San Francisc...",0.0,0.0,0.0,0.0,,,,,,,0.0,0.0,0.0,0.0,0.0,Nobody here,[],50071951,PARK,
2,4c05b74b8f8fa5939e55f20d,Matt & Jess Kitchen,"[{'id': '4bf58dd8d48988d14e941735', 'name': 'A...",0.0,v-1588452137,[],0.0,288 Grand View Ave,,37.753622,-122.441885,"[{'label': 'display', 'lat': 37.753622, 'lng':...",1331.0,94114,US,San Francisco,CA,United States,"[288 Grand View Ave, San Francisco, CA 94114, ...",0.0,0.0,0.0,0.0,,,,,,,0.0,0.0,0.0,0.0,0.0,Nobody here,[],,PARK,
3,4a789bbbf964a52004e61fe3,Starbelly,"[{'id': '4bf58dd8d48988d157941735', 'name': 'N...",1.0,v-1588452137,[],0.0,3583 16th St,at Market St.,37.764074,-122.432563,"[{'label': 'display', 'lat': 37.7640744, 'lng'...",1388.0,94114,US,San Francisco,CA,United States,"[3583 16th St (at Market St.), San Francisco, ...",0.0,0.0,0.0,0.0,,,,,,,0.0,0.0,0.0,0.0,0.0,Nobody here,[],105306618,PARK,
4,413e4b80f964a520501c1fe3,Harvey's,"[{'id': '4bf58dd8d48988d14e941735', 'name': 'A...",1.0,v-1588452137,[],0.0,500 Castro St,at 18th St,37.760829,-122.435116,"[{'label': 'display', 'lat': 37.76082906854015...",1230.0,94114,US,San Francisco,CA,United States,"[500 Castro St (at 18th St), San Francisco, CA...",0.0,0.0,0.0,0.0,1056497,https://www.grubhub.com/restaurant/harveys-500...,grubhub,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_grubhub_20180129.png,0.0,0.0,0.0,0.0,0.0,Nobody here,[],,PARK,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
58,4f56b36a6b7406c0ea010e9b,Radio Africa & Kitchen,"[{'id': '4bf58dd8d48988d1c8941735', 'name': 'A...",0.0,v-1588452138,[],0.0,4800 3rd St,Oakdale Ave,37.734826,-122.390764,"[{'label': 'display', 'lat': 37.73482577157674...",269.0,94124,US,San Francisco,CA,United States,"[4800 3rd St (Oakdale Ave), San Francisco, CA ...",0.0,0.0,0.0,0.0,,,,,,,0.0,0.0,0.0,0.0,0.0,Nobody here,[],,BAYVIEW,
59,4a58ec3ff964a52030b81fe3,Bonanza Restaurant,"[{'id': '4bf58dd8d48988d14e941735', 'name': 'A...",0.0,v-1588452138,[],0.0,16 Toland St,Evans,37.746917,-122.396546,"[{'label': 'display', 'lat': 37.74691699999999...",1235.0,94124,US,San Francisco,CA,United States,"[16 Toland St (Evans), San Francisco, CA 94124...",0.0,0.0,0.0,0.0,,,,,,,0.0,0.0,0.0,0.0,0.0,Nobody here,[],,BAYVIEW,
60,509c60cde4b078734779358d,Corner Cafe,"[{'id': '4bf58dd8d48988d14e941735', 'name': 'A...",0.0,v-1588452138,[],0.0,5800 3rd St,,37.725240,-122.394492,"[{'label': 'display', 'lat': 37.72524, 'lng': ...",1385.0,94124,US,San Francisco,CA,United States,"[5800 3rd St, San Francisco, CA 94124, United ...",0.0,0.0,0.0,0.0,,,,,,,0.0,0.0,0.0,0.0,0.0,Nobody here,[],,BAYVIEW,
61,57ddf3c1498e666bd77106df,KitchenBeard's Pop-Up,"[{'id': '4bf58dd8d48988d14e941735', 'name': 'A...",0.0,v-1588452138,[],0.0,,,37.744169,-122.385910,"[{'label': 'display', 'lat': 37.744169, 'lng':...",856.0,,US,San Francisco,CA,United States,"[San Francisco, CA, United States]",0.0,0.0,0.0,0.0,,,,,,,0.0,0.0,0.0,0.0,0.0,Nobody here,[],,BAYVIEW,


In [273]:
# Drop the columns that are not needed for the analysis
unused_columns = ['id','categories','verified','referralId','venueChains','hasPerk',
                  'location.address','location.labeledLatLngs','location.distance','location.postalCode',
                  'location.cc','location.city','location.state','location.country','location.formattedAddress',
                  'stats.tipCount','stats.usersCount','stats.checkinsCount','stats.visitsCount',
                  'delivery.id','delivery.url','delivery.provider.name','delivery.provider.icon.prefix',
                  'delivery.provider.icon.sizes','delivery.provider.icon.name',
                  'beenHere.count','beenHere.lastCheckinExpiredAt','beenHere.marked','beenHere.unconfirmedCount',
                  'hereNow.count','hereNow.summary','hereNow.groups','location.crossStreet']
df_result_final.drop(unused_columns, axis=1, inplace=True)
df_result_final.rename(columns={'name':'Restaurant Name', 'location.lat':'latitude', 'location.lng':'longitude'}, inplace=True)
df_result_final

Unnamed: 0,Restaurant Name,latitude,longitude,venuePage.id,Neighborhood,location.neighborhood
0,Crepes on Cole,37.765858,-122.450037,,PARK,
1,Nopa,37.774888,-122.437532,50071951,PARK,
2,Matt & Jess Kitchen,37.753622,-122.441885,,PARK,
3,Starbelly,37.764074,-122.432563,105306618,PARK,
4,Harvey's,37.760829,-122.435116,,PARK,
...,...,...,...,...,...,...
58,Radio Africa & Kitchen,37.734826,-122.390764,,BAYVIEW,
59,Bonanza Restaurant,37.746917,-122.396546,,BAYVIEW,
60,Corner Cafe,37.725240,-122.394492,,BAYVIEW,
61,KitchenBeard's Pop-Up,37.744169,-122.385910,,BAYVIEW,



Let's count the number of restaurants in each Neighborhood.

In [275]:
# Count the venues found for each Neighborhood, most popular first
df_total_rest = df_result_final.groupby('Neighborhood').size().reset_index(name='Count')
df_total_rest = df_total_rest.sort_values(by='Count', ascending=False).reset_index(drop=True)
df_total_rest

Unnamed: 0,Neighborhood,Count
0,PARK,30
1,RICHMOND,11
2,TARAVAL,10
3,INGLESIDE,7
4,BAYVIEW,5



Let's visualize this data on a folium map.

In [277]:

final_map = folium.Map(location=san_location, zoom_start=13)
folium.GeoJson(
    gdf,
).add_to(final_map)
# Label each Neighborhood at its centroid
for lat, lng, label in zip(centroids_df.centroid_lat, centroids_df.centroid_lon, centroids_df.Neighborhood):
    folium.Marker(
        [lat, lng-0.0025],
        icon=folium.DivIcon(label)
    ).add_to(final_map)
# Mark every venue found in the popularity search
for lat, lng in zip(df_result_final['latitude'], df_result_final['longitude']):
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        color='yellow',
        fill=True,
        fill_color='blue',
        fill_opacity=0.6).add_to(final_map)
final_map