## <center> Opening a Chinese Restaurant in DC </center>

<center> In this notebook I extract and explain data from sources such as dc.gov and Foursquare to determine the best place to start a business, specifically a Chinese restaurant. </center>

### Data Defined:

Below I import the 2019 crime database from the https://dc.gov/ website, clean it up and display it. We can then factor this data into our decision about where to put our new business.

In [1]:
#Import to read dataframe
import types
import numpy as np # library to handle data in a vectorized manner
import pandas as pd
from botocore.client import Config
import ibm_boto3

In [2]:
# The code was removed by Watson Studio for sharing.

### Let's take a look at the data in the database

In [3]:
df_data_1 = pd.read_csv(body)
df_data_1.head()

Unnamed: 0,CCN,OFFENSE,BLOCK,WARD,ANC,DISTRICT,PSA,NEIGHBORHOOD_CLUSTER,BLOCK_GROUP,LATITUDE,LONGITUDE,OBJECTID,OCTO_RECORD_ID
0,17084415,HOMICIDE,130 - 199 BLOCK OF IRVINGTON STREET SW,8,8D,7.0,708.0,Cluster 39,010900 2,38.820461,-77.010375,305273937,17084415-01
1,18208996,THEFT/OTHER,2400 BLOCK OF MARKET STREET NE,5,5C,5.0,503.0,Cluster 24,009000 1,38.920536,-76.952663,305329181,18208996-01
2,18204218,THEFT F/AUTO,900 - 999 BLOCK OF G STREET NW,2,2C,2.0,209.0,Cluster 8,005800 1,38.89831,-77.024958,305331558,18204218-01
3,19005282,ROBBERY,4800 - 4899 BLOCK OF CENTRAL AVENUE NE,7,7C,6.0,608.0,Cluster 33,007804 3,38.890393,-76.933411,305332216,19005282-01
4,19005286,BURGLARY,4700 - 4798 BLOCK OF EASTERN AVENUE NE,5,5B,4.0,405.0,Cluster 20,009503 1,38.94692,-76.979005,305332217,19005286-01


#### I am going to sort the crimes by ward. DC is split into 8 wards, so grouping by ward lets us determine how many crimes happen in each one.

In [4]:
#Create new dataframe with incidents by ward
df_incidents = df_data_1.groupby(['WARD']).size().reset_index(name="Count")

In [5]:
df_incidents

Unnamed: 0,WARD,Count
0,1,1996
1,2,2998
2,3,920
3,4,1087
4,5,2152
5,6,2338
6,7,1635
7,8,1255
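
The per-ward tally above is a plain `groupby(...).size()`. A minimal sketch on made-up rows (the toy values below are illustrative, not taken from the dc.gov file) shows the same pattern and one way to pull out the busiest ward programmatically rather than reading it off the table:

```python
import pandas as pd

# Toy incident records; the WARD and OFFENSE values are invented for illustration.
toy = pd.DataFrame({
    "WARD": [2, 2, 5, 2, 6, 5],
    "OFFENSE": ["THEFT/OTHER", "ROBBERY", "BURGLARY",
                "THEFT F/AUTO", "ROBBERY", "THEFT/OTHER"],
})

# Count rows per ward, mirroring the df_incidents cell above.
toy_counts = toy.groupby("WARD").size().reset_index(name="Count")

# Select the row with the largest count.
busiest = toy_counts.loc[toy_counts["Count"].idxmax()]
print(int(busiest["WARD"]), int(busiest["Count"]))
```

Note that these are raw counts, not rates: dividing by ward population would be needed before comparing wards of different sizes.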


### As we can see, ward 2 had the highest number of reported crimes in 2019. Now we're going to import folium and visualize the counts on a map.

In [6]:
!conda install -c conda-forge folium=0.5.0 --yes #install folium
import folium

Solving environment: done

# All requested packages already installed.



In [7]:
#Put everything into a map
# download the DC wards GeoJSON file; it defines the ward boundaries
!wget --quiet "http://data.codefordc.org/dataset/a9512704-4ece-47cd-b3c0-402d28609364/resource/8ca2fd50-06cc-497f-89f9-a7937ff3650d/download/washington-dc-wards-2012.geojson"
    
print('GeoJSON file downloaded!')

GeoJSON file downloaded!


In [8]:
world_geo = r'washington-dc-wards-2012.geojson' # geojson file to get the borders of the wards


# Washington DC latitude and longitude values
latitude = 38.91
longitude = -77.04

# create a map centered on DC and shade each ward by its incident count
world_map = folium.Map(location=[latitude, longitude], zoom_start=12)
world_map.choropleth(
    geo_data=world_geo,
    data=df_incidents,
    columns=['WARD', 'Count'],
    key_on='feature.properties.WARD',
    #threshold_scale=threshold_scale,
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Crime Rates in Washington DC',
    reset=True
)
world_map

### Explanation of Data

As you can see above, the map is split into DC's 8 wards. Now that we know where crimes are generally concentrated, we can incorporate Foursquare data to provide average reviews of restaurants and businesses within these wards, as well as the type of each business (e.g. restaurant, office building, hotel).

## <center> Utilizing Foursquare to find restaurants in each ward </center>

### First, let's put our Foursquare credentials here

In [9]:
import requests # library to handle requests
from pandas.io.json import json_normalize # transforms a JSON structure into a pandas dataframe
# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
CLIENT_ID = '1YZGINGMDDORH40PP2QN0WWPUDY5JXBOT5TL3MNKCU4NG0SN' # your Foursquare ID
CLIENT_SECRET = 'KMSKN0NBS15INTGFPDQC4QMYE3NVJED0XT0C5FAB0KPGJHBM' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 1YZGINGMDDORH40PP2QN0WWPUDY5JXBOT5TL3MNKCU4NG0SN
CLIENT_SECRET:KMSKN0NBS15INTGFPDQC4QMYE3NVJED0XT0C5FAB0KPGJHBM


### Below we use the coordinates of the center of DC to search for nearby businesses. We put them in a URL to query Foursquare.

In [10]:
# Washington DC latitude and longitude values
latitude = 38.91
longitude = -77.04

LIMIT = 300 # limit of number of venues returned by Foursquare API
radius = 3500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url # display URL




'https://api.foursquare.com/v2/venues/explore?&client_id=1YZGINGMDDORH40PP2QN0WWPUDY5JXBOT5TL3MNKCU4NG0SN&client_secret=KMSKN0NBS15INTGFPDQC4QMYE3NVJED0XT0C5FAB0KPGJHBM&v=20180604&ll=38.91,-77.04&radius=3500&limit=300'
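
The URL above is assembled with `str.format`, which leaves a stray `&` after the `?` and embeds the credentials in the notebook. As a hedged alternative, the sketch below builds the same kind of query string with `urllib.parse.urlencode` and reads the credentials from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `build_explore_url` are assumptions made for this example, not part of the original notebook.

```python
import os
from urllib.parse import urlencode

# Hypothetical environment variable names; any secret store would work.
client_id = os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID")
client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET")


def build_explore_url(lat, lng, radius, limit, version="20180604"):
    """Build a venues/explore URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # the comma is percent-encoded as %2C
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


example_url = build_explore_url(38.91, -77.04, 3500, 300)
```

Keeping the secret out of the cell also keeps it out of any printed URL or shared copy of the notebook.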

### This section will provide the response of the query in JSON format, which will be cleaned up into usable data

In [11]:
results = requests.get(url).json() # send the query and parse the JSON response into a dict

In [12]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # flattened rows use the 'venue.categories' key instead
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
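
To make the behavior of the helper above concrete, here is a small self-contained sketch that repeats the function and calls it on toy rows. The rows are plain dictionaries with invented values; in the notebook the function receives pandas rows, which support the same key lookup.

```python
def get_category_type(row):
    """Return the name of the first category, or None if the list is empty."""
    try:
        categories_list = row["categories"]
    except KeyError:
        categories_list = row["venue.categories"]
    if len(categories_list) == 0:
        return None
    return categories_list[0]["name"]


# Toy rows shaped like the two layouts the function handles (values invented).
row_flat = {"venue.categories": [{"name": "Thai Restaurant"}]}
row_short = {"categories": [{"name": "Park"}, {"name": "Plaza"}]}
row_empty = {"categories": []}

first = get_category_type(row_flat)
second = get_category_type(row_short)
third = get_category_type(row_empty)
```

Only the first category is kept, so a venue tagged with several categories is reduced to its primary one.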



### Below we normalize the JSON to pull out the information necessary to answer our questions

In [13]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.location.postalCode']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng,postalCode
0,Duke's Grocery,Gastropub,38.910187,-77.038262,20036
1,Komi,Greek Restaurant,38.910058,-77.038231,20036
2,Keegan Theatre,Theater,38.910346,-77.039933,20036
3,Little Serow,Thai Restaurant,38.910135,-77.038357,20036
4,Dupont Circle,Park,38.909704,-77.043783,20036


In [14]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centered on DC
x = -1
venues_map.choropleth(
    geo_data=world_geo,
    data=df_incidents,
    columns=['WARD', 'Count'],
    key_on='feature.properties.WARD',
    #threshold_scale=threshold_scale,
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Crime Rates in Washington DC',
    reset=True
)
# add the businesses as blue circle markers
for lat, lng, name in zip(nearby_venues.lat, nearby_venues.lng, nearby_venues.name):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=str(name.replace("'", "").replace('"', "")), 
        fill = False,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

### As you can see above, the businesses form some visible clusters. We can home in on these areas with k-means clustering to find a promising location for a Chinese restaurant. We begin by importing the libraries that handle clustering.

In [15]:
import matplotlib.pyplot as plt # plotting library
# backend for rendering plots within the browser
%matplotlib inline 
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans 
from sklearn.datasets import make_blobs

### Let's quickly check the size of our dataframe

In [16]:
print(nearby_venues.shape)

(100, 5)


### Group the businesses by postal code

In [17]:
nearby_venues.groupby('postalCode').count()

Unnamed: 0_level_0,name,categories,lat,lng
postalCode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
20001,15,15,15,15
20002,1,1,1,1
20004,1,1,1,1
20005,13,13,13,13
20006,7,7,7,7
20007,3,3,3,3
20008,1,1,1,1
20009,31,31,31,31
20009-4307,1,1,1,1
20036,15,15,15,15


### Let's find out how many unique business categories we have

In [18]:
print('There are {} unique categories.'.format(len(nearby_venues['categories'].unique())))

There are 53 unique categories.


### Time to analyze each postal code to find its most popular business categories

In [19]:
# one hot encoding
dc_onehot = pd.get_dummies(nearby_venues[['categories']], prefix="", prefix_sep="")

# add postal code column back to dataframe
dc_onehot['postalCode'] = nearby_venues['postalCode'] 

# move postalcode column to the first column
fixed_columns = [dc_onehot.columns[-1]] + list(dc_onehot.columns[:-1])
dc_onehot = dc_onehot[fixed_columns]

dc_onehot.head()

Unnamed: 0,postalCode,Afghan Restaurant,American Restaurant,Art Gallery,Art Museum,Bakery,Beer Bar,Beer Garden,Bookstore,Cocktail Bar,...,Seafood Restaurant,Spanish Restaurant,Steakhouse,Thai Restaurant,Theater,Trail,Whisky Bar,Wine Bar,Wine Shop,Yoga Studio
0,20036,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,20036,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,20036,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
3,20036,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
4,20036,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Next, let's group rows by postal code and take the mean of the frequency of occurrence of each category

In [20]:
dc_grouped = dc_onehot.groupby('postalCode').mean().reset_index()
dc_grouped

Unnamed: 0,postalCode,Afghan Restaurant,American Restaurant,Art Gallery,Art Museum,Bakery,Beer Bar,Beer Garden,Bookstore,Cocktail Bar,...,Seafood Restaurant,Spanish Restaurant,Steakhouse,Thai Restaurant,Theater,Trail,Whisky Bar,Wine Bar,Wine Shop,Yoga Studio
0,20001,0.0,0.066667,0.0,0.0,0.066667,0.0,0.066667,0.0,0.066667,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0
1,20002,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,20004,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,20005,0.0,0.076923,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,...,0.076923,0.0,0.076923,0.0,0.076923,0.0,0.0,0.0,0.0,0.0
4,20006,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,20007,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,20008,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
7,20009,0.032258,0.064516,0.032258,0.032258,0.0,0.0,0.0,0.0,0.064516,...,0.0,0.0,0.0,0.032258,0.0,0.0,0.032258,0.032258,0.0,0.032258
8,20009-4307,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,20036,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,...,0.0,0.066667,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0
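
To show why the mean of one-hot columns is a frequency, here is a small sketch on invented data. Each venue contributes a 1 in exactly one category column, so the per-group mean is that category's share of the venues in the postal code, and every row of the result sums to 1.

```python
import pandas as pd

# Toy venues; postal codes and categories are invented for illustration.
toy = pd.DataFrame({
    "postalCode": ["20036", "20036", "20036", "20009"],
    "categories": ["Bookstore", "Bookstore", "Park", "Coffee Shop"],
})

# One column per category, 1.0 where the venue belongs to it.
onehot = pd.get_dummies(toy[["categories"]], prefix="", prefix_sep="", dtype=float)
onehot.insert(0, "postalCode", toy["postalCode"])

# Mean of the 0/1 columns per postal code = share of venues in each category.
freq = onehot.groupby("postalCode").mean()
```

In this toy data two of the three venues in `20036` are bookstores, so the `Bookstore` frequency for that postal code is two thirds. A postal code with a single venue always gets a frequency of 1.0 for that venue's category, which is worth keeping in mind when reading the real table.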


### Now let's show the top 5 most common businesses in each postal code

In [21]:
num_top_venues = 5

for hood in dc_grouped['postalCode']:
    print("----"+hood+"----")
    temp = dc_grouped[dc_grouped['postalCode'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----20001----
                         venue  freq
0           Italian Restaurant  0.07
1               Ice Cream Shop  0.07
2               Sandwich Place  0.07
3                Movie Theater  0.07
4  Eastern European Restaurant  0.07


----20002----
                venue  freq
0  Spanish Restaurant   1.0
1   Afghan Restaurant   0.0
2          Restaurant   0.0
3  Israeli Restaurant   0.0
4  Italian Restaurant   0.0


----20004----
                venue  freq
0               Hotel   1.0
1   Afghan Restaurant   0.0
2          Restaurant   0.0
3  Israeli Restaurant   0.0
4  Italian Restaurant   0.0


----20005----
                 venue  freq
0                Hotel  0.23
1          Coffee Shop  0.15
2         Cycle Studio  0.08
3  American Restaurant  0.08
4             Beer Bar  0.08


----20006----
                 venue  freq
0                Hotel  0.29
1          Coffee Shop  0.29
2            Hotel Bar  0.14
3  American Restaurant  0.14
4    Indian Restaurant  0.14


----20007----


### I will now sort below to display the top ten venues for each postal code

In [22]:
# function to sort in descending order
def return_most_common_venues(row, num_top_venues): 
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [23]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['postalCode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['postalCode'] = dc_grouped['postalCode']

for ind in np.arange(dc_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dc_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,postalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,20001,Sandwich Place,Wine Shop,Eastern European Restaurant,Mexican Restaurant,Movie Theater,Italian Restaurant,Ice Cream Shop,Coffee Shop,Cocktail Bar,Latin American Restaurant
1,20002,Spanish Restaurant,Yoga Studio,Cycle Studio,Gym,Grocery Store,Greek Restaurant,Government Building,Gastropub,French Restaurant,Farmers Market
2,20004,Hotel,Yoga Studio,Gym,Grocery Store,Greek Restaurant,Government Building,Gastropub,French Restaurant,Farmers Market,Falafel Restaurant
3,20005,Hotel,Coffee Shop,Pizza Place,Steakhouse,Salon / Barbershop,Cycle Studio,Beer Bar,Seafood Restaurant,Theater,American Restaurant
4,20006,Coffee Shop,Hotel,Hotel Bar,American Restaurant,Indian Restaurant,Eastern European Restaurant,Gym,Grocery Store,Greek Restaurant,Government Building
5,20007,Hotel,Falafel Restaurant,Deli / Bodega,Yoga Studio,Gym,Grocery Store,Greek Restaurant,Government Building,Gastropub,French Restaurant
6,20008,Trail,Yoga Studio,Cycle Studio,Gym,Grocery Store,Greek Restaurant,Government Building,Gastropub,French Restaurant,Farmers Market
7,20009,Ice Cream Shop,Coffee Shop,Grocery Store,American Restaurant,Cocktail Bar,Yoga Studio,Mediterranean Restaurant,Art Gallery,Art Museum,Cycle Studio
8,20009-4307,Seafood Restaurant,Yoga Studio,Cycle Studio,Gym,Grocery Store,Greek Restaurant,Government Building,Gastropub,French Restaurant,Farmers Market
9,20036,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
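
In the table above, postal codes with a single venue (such as 20002) still list ten "most common" venues, because categories with a frequency of zero are used to fill the remaining slots. As a hedged variant of `return_most_common_venues`, the sketch below drops zero-frequency categories first. The function name `top_nonzero_venues` and the toy row are assumptions made for this example.

```python
import pandas as pd


def top_nonzero_venues(row, num_top_venues):
    """Return up to num_top_venues category names whose frequency is above zero."""
    freqs = row.iloc[1:].astype(float)  # skip the leading postalCode entry
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    return list(freqs.index[:num_top_venues])


# Toy row with invented frequencies.
toy_row = pd.Series({
    "postalCode": "20002",
    "Gym": 0.0,
    "Spanish Restaurant": 0.75,
    "Hotel": 0.25,
    "Yoga Studio": 0.0,
})
top = top_nonzero_venues(toy_row, 10)
```

The result can be shorter than `num_top_venues`, so a table built from it would need padding (for example with `None`) to keep the columns aligned.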


### Time to Cluster Neighborhoods!

I will run k-means to group the postal codes into 10 clusters, which will make it easier to visualize business concentration.

In [24]:
# set number of clusters
kclusters = 10

dc_grouped_clustering = dc_grouped.drop(columns='postalCode')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dc_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 6, 3, 7, 7, 9, 4, 1, 0, 1], dtype=int32)
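
For readers new to k-means, here is a minimal sketch on invented frequency vectors with two obvious groups. It uses the same `KMeans` class as the cell above; the data, the choice of two clusters, and the explicit `n_init=10` are assumptions made for the illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented category-frequency vectors for six postal codes over three categories.
X = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
    [0.0, 0.2, 0.8],
])

# Two clusters: the first three rows are similar, as are the last three.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = model.labels_
```

When the number of clusters approaches the number of rows, as with 10 clusters here, most clusters end up holding a single postal code, so a smaller value may give groupings that are easier to interpret.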

Let's combine the cluster labels with the top 10 venues of each postal code in a new dataframe

In [25]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

dc_merged = nearby_venues

# join the sorted venues (with cluster labels) onto nearby_venues so each business keeps its latitude/longitude
dc_merged = dc_merged.join(neighborhoods_venues_sorted.set_index('postalCode'), on='postalCode')



In [26]:
dc_merged

Unnamed: 0,name,categories,lat,lng,postalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Duke's Grocery,Gastropub,38.910187,-77.038262,20036,1.0,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
1,Komi,Greek Restaurant,38.910058,-77.038231,20036,1.0,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
2,Keegan Theatre,Theater,38.910346,-77.039933,20036,1.0,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
3,Little Serow,Thai Restaurant,38.910135,-77.038357,20036,1.0,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
4,Dupont Circle,Park,38.909704,-77.043783,20036,1.0,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
5,Kramerbooks & Afterwords Cafe,Bookstore,38.910756,-77.043880,20036,1.0,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
6,Dupont Circle FRESHFARM Market,Farmers Market,38.910974,-77.044795,20036,1.0,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
7,sweetgreen,Salad Place,38.910449,-77.044244,20036,1.0,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
8,Second Story Books,Bookstore,38.909488,-77.045102,20036,1.0,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
9,Dolcezza Artisanal Gelato,Ice Cream Shop,38.912786,-77.045628,20009,1.0,Ice Cream Shop,Coffee Shop,Grocery Store,American Restaurant,Cocktail Bar,Yoga Studio,Mediterranean Restaurant,Art Gallery,Art Museum,Cycle Studio


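Convert cluster labels from float to int. Venues whose postal code has no cluster label (NaN) are assigned 0, so they appear alongside cluster 0.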

In [27]:
# rows without a matched postal code have NaN labels; they are set to 0 here
dc_merged['Cluster Labels'] = dc_merged['Cluster Labels'].fillna(0).astype(int)
dc_merged

Unnamed: 0,name,categories,lat,lng,postalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Duke's Grocery,Gastropub,38.910187,-77.038262,20036,1,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
1,Komi,Greek Restaurant,38.910058,-77.038231,20036,1,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
2,Keegan Theatre,Theater,38.910346,-77.039933,20036,1,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
3,Little Serow,Thai Restaurant,38.910135,-77.038357,20036,1,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
4,Dupont Circle,Park,38.909704,-77.043783,20036,1,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
5,Kramerbooks & Afterwords Cafe,Bookstore,38.910756,-77.043880,20036,1,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
6,Dupont Circle FRESHFARM Market,Farmers Market,38.910974,-77.044795,20036,1,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
7,sweetgreen,Salad Place,38.910449,-77.044244,20036,1,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
8,Second Story Books,Bookstore,38.909488,-77.045102,20036,1,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
9,Dolcezza Artisanal Gelato,Ice Cream Shop,38.912786,-77.045628,20009,1,Ice Cream Shop,Coffee Shop,Grocery Store,American Restaurant,Cocktail Bar,Yoga Studio,Mediterranean Restaurant,Art Gallery,Art Museum,Cycle Studio


### Time to visualize the clusters and see our dataset

In [28]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
map_clusters.choropleth(
    geo_data=world_geo,
    data=df_incidents,
    columns=['WARD', 'Count'],
    key_on='feature.properties.WARD',
    #threshold_scale=threshold_scale,
    fill_color='YlOrRd', 
    fill_opacity=0.6, 
    line_opacity=0.2,
    legend_name='Crime Rates in Washington DC',
    reset=True
)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dc_merged['lat'], dc_merged['lng'], dc_merged['categories'], dc_merged['Cluster Labels']): #Changed dc_merged[2] to categories instead of postal code
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### The map above suggests that the businesses returned by Foursquare are concentrated in the wards with the most reported crimes, although this alone does not show that one causes the other. Below I will display the businesses within each cluster and dive into the information to help us determine the best place to put our restaurant.

In [29]:
dc_merged.loc[dc_merged['Cluster Labels'] == 0, dc_merged.columns[[1] + list(range(5, dc_merged.shape[1]))]]

Unnamed: 0,categories,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
31,Seafood Restaurant,0,Seafood Restaurant,Yoga Studio,Cycle Studio,Gym,Grocery Store,Greek Restaurant,Government Building,Gastropub,French Restaurant,Farmers Market
78,Park,0,,,,,,,,,,


In [30]:
dc_merged.loc[dc_merged['Cluster Labels'] == 1, dc_merged.columns[[4] + list(range(5, dc_merged.shape[1]))]]

Unnamed: 0,postalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,20036,1,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
1,20036,1,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
2,20036,1,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
3,20036,1,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
4,20036,1,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
5,20036,1,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
6,20036,1,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
7,20036,1,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
8,20036,1,Bookstore,Park,Spanish Restaurant,Israeli Restaurant,Greek Restaurant,Pizza Place,Salad Place,Convenience Store,Sandwich Place,Thai Restaurant
9,20009,1,Ice Cream Shop,Coffee Shop,Grocery Store,American Restaurant,Cocktail Bar,Yoga Studio,Mediterranean Restaurant,Art Gallery,Art Museum,Cycle Studio


In [31]:
dc_merged.loc[dc_merged['Cluster Labels'] == 2, dc_merged.columns[[4] + list(range(5, dc_merged.shape[1]))]]

Unnamed: 0,postalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
85,20052,2,College Quad,Yoga Studio,Deli / Bodega,Gym,Grocery Store,Greek Restaurant,Government Building,Gastropub,French Restaurant,Farmers Market


In [32]:
dc_merged.loc[dc_merged['Cluster Labels'] == 3, dc_merged.columns[[4] + list(range(5, dc_merged.shape[1]))]]

Unnamed: 0,postalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
92,20004,3,Hotel,Yoga Studio,Gym,Grocery Store,Greek Restaurant,Government Building,Gastropub,French Restaurant,Farmers Market,Falafel Restaurant


In [33]:
dc_merged.loc[dc_merged['Cluster Labels'] == 4, dc_merged.columns[[4] + list(range(5, dc_merged.shape[1]))]]

Unnamed: 0,postalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
25,20008,4,Trail,Yoga Studio,Cycle Studio,Gym,Grocery Store,Greek Restaurant,Government Building,Gastropub,French Restaurant,Farmers Market


In [34]:
dc_merged.loc[dc_merged['Cluster Labels'] == 5, dc_merged.columns[[4] + list(range(5, dc_merged.shape[1]))]]

Unnamed: 0,postalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
53,20071,5,Coffee Shop,Yoga Studio,Deli / Bodega,Gym,Grocery Store,Greek Restaurant,Government Building,Gastropub,French Restaurant,Farmers Market


In [35]:
dc_merged.loc[dc_merged['Cluster Labels'] == 6, dc_merged.columns[[4] + list(range(5, dc_merged.shape[1]))]]

Unnamed: 0,postalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,20002,6,Spanish Restaurant,Yoga Studio,Cycle Studio,Gym,Grocery Store,Greek Restaurant,Government Building,Gastropub,French Restaurant,Farmers Market


### Cluster 7 below looks promising. It includes three separate zip codes (20005, 20006 and 20037) where hotels are the first or second most common venue. Hotel guests usually have no means to cook a meal and need to eat out.

In [36]:
dc_merged.loc[dc_merged['Cluster Labels'] == 7, dc_merged.columns[list(range(2, dc_merged.shape[1]))]]

Unnamed: 0,lat,lng,postalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,38.910029,-77.032013,20005,7,Hotel,Coffee Shop,Pizza Place,Steakhouse,Salon / Barbershop,Cycle Studio,Beer Bar,Seafood Restaurant,Theater,American Restaurant
15,38.907861,-77.032158,20005,7,Hotel,Coffee Shop,Pizza Place,Steakhouse,Salon / Barbershop,Cycle Studio,Beer Bar,Seafood Restaurant,Theater,American Restaurant
18,38.909956,-77.031782,20005,7,Hotel,Coffee Shop,Pizza Place,Steakhouse,Salon / Barbershop,Cycle Studio,Beer Bar,Seafood Restaurant,Theater,American Restaurant
23,38.908047,-77.032911,20005,7,Hotel,Coffee Shop,Pizza Place,Steakhouse,Salon / Barbershop,Cycle Studio,Beer Bar,Seafood Restaurant,Theater,American Restaurant
26,38.905236,-77.034082,20005,7,Hotel,Coffee Shop,Pizza Place,Steakhouse,Salon / Barbershop,Cycle Studio,Beer Bar,Seafood Restaurant,Theater,American Restaurant
28,38.90851,-77.031709,20005,7,Hotel,Coffee Shop,Pizza Place,Steakhouse,Salon / Barbershop,Cycle Studio,Beer Bar,Seafood Restaurant,Theater,American Restaurant
30,38.905065,-77.048079,20037,7,Hotel,Indian Restaurant,French Restaurant,Salad Place,New American Restaurant,Gym,Deli / Bodega,Grocery Store,Greek Restaurant,Government Building
45,38.901041,-77.04168,20006,7,Coffee Shop,Hotel,Hotel Bar,American Restaurant,Indian Restaurant,Eastern European Restaurant,Gym,Grocery Store,Greek Restaurant,Government Building
46,38.90051,-77.036885,20006,7,Coffee Shop,Hotel,Hotel Bar,American Restaurant,Indian Restaurant,Eastern European Restaurant,Gym,Grocery Store,Greek Restaurant,Government Building
47,38.905701,-77.051075,20037,7,Hotel,Indian Restaurant,French Restaurant,Salad Place,New American Restaurant,Gym,Deli / Bodega,Grocery Store,Greek Restaurant,Government Building


### Judging by the latitude and longitude values above, the area around cluster 7 is a good place to research nearby businesses in more detail and see whether there is potential. I will use the Foursquare explore endpoint below to check.

In [37]:
lat_7 = 38.90
long_7 = -77.03
radius = 500 #500 meter radius
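
The query point above is chosen by eye. As a hedged alternative, the sketch below averages the coordinates of the ten cluster 7 rows displayed in the previous table to get a centroid. It uses only those displayed rows, so it is an approximation if the cluster contains more venues than the output shows.

```python
# Coordinates copied from the ten cluster 7 rows displayed above.
cluster_7_points = [
    (38.910029, -77.032013),
    (38.907861, -77.032158),
    (38.909956, -77.031782),
    (38.908047, -77.032911),
    (38.905236, -77.034082),
    (38.90851, -77.031709),
    (38.905065, -77.048079),
    (38.901041, -77.04168),
    (38.90051, -77.036885),
    (38.905701, -77.051075),
]

# Simple arithmetic mean; adequate for points this close together.
centroid_lat = sum(lat for lat, _ in cluster_7_points) / len(cluster_7_points)
centroid_lng = sum(lng for _, lng in cluster_7_points) / len(cluster_7_points)
```

With the full dataframe available, the equivalent would be `dc_merged.loc[dc_merged['Cluster Labels'] == 7, ['lat', 'lng']].mean()`.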

In [38]:
url_7 = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat_7, long_7, VERSION, radius, LIMIT)
url_7

'https://api.foursquare.com/v2/venues/explore?client_id=1YZGINGMDDORH40PP2QN0WWPUDY5JXBOT5TL3MNKCU4NG0SN&client_secret=KMSKN0NBS15INTGFPDQC4QMYE3NVJED0XT0C5FAB0KPGJHBM&ll=38.9,-77.03&v=20180604&radius=500&limit=300'

#### Send GET request and examine results

In [39]:
results = requests.get(url_7).json()
'There are {} businesses in cluster 7'.format(len(results['response']['groups'][0]['items']))

'There are 100 businesses in cluster 7'

#### Get relevant part of JSON

In [40]:
items = results['response']['groups'][0]['items']
items[0]

{'reasons': {'count': 0,
  'items': [{'reasonName': 'globalInteractionReason',
    'summary': 'This spot is popular',
    'type': 'general'}]},
 'referralId': 'e-0-4b66f0bff964a52065312be3-0',
 'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/arts_entertainment/museum_art_',
     'suffix': '.png'},
    'id': '4bf58dd8d48988d18f941735',
    'name': 'Art Museum',
    'pluralName': 'Art Museums',
    'primary': True,
    'shortName': 'Art Museum'}],
  'id': '4b66f0bff964a52065312be3',
  'location': {'address': '1250 New York Ave NW',
   'cc': 'US',
   'city': 'Washington',
   'country': 'United States',
   'crossStreet': 'at 13th St NW',
   'distance': 63,
   'formattedAddress': ['1250 New York Ave NW (at 13th St NW)',
    'Washington, D.C. 20005',
    'United States'],
   'labeledLatLngs': [{'label': 'display',
     'lat': 38.89988597562643,
     'lng': -77.02927604118754}],
   'lat': 38.89988597562643,
   'lng': -77.02927604118754,
   'postalCode': '2

#### Process JSON and convert it to a clean dataframe

In [41]:
dataframe = json_normalize(items) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories'] + [col for col in dataframe.columns if col.startswith('venue.location.')] + ['venue.id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# filter the category for each row
dataframe_filtered['venue.categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean columns
dataframe_filtered.columns = [col.split('.')[-1] for col in dataframe_filtered.columns]
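
The helper `get_category_type` used above is defined elsewhere in the notebook, outside this section. The following is only a sketch of what such a helper might do, under the assumption that each row holds the list of category dictionaries shown in the JSON above; the name `first_category_name` is hypothetical.

```python
def first_category_name(row):
    """Return the name of the first category in a venue row, or None if absent."""
    # Assumption: the category list sits under one of these two keys,
    # depending on whether the columns have been renamed yet
    categories_list = row.get("venue.categories") or row.get("categories") or []
    if not categories_list:
        return None
    return categories_list[0]["name"]


# Illustrative row shaped like the first venue in the JSON output
sample_row = {
    "venue.name": "National Museum of Women in the Arts",
    "venue.categories": [{"name": "Art Museum", "primary": True}],
}
print(first_category_name(sample_row))
```

Because it only relies on `.get`, the same function accepts a plain dictionary or a pandas row passed in by `DataFrame.apply(..., axis=1)`.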

In [42]:
dataframe_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,National Museum of Women in the Arts,Art Museum,1250 New York Ave NW,US,Washington,United States,at 13th St NW,63,"[1250 New York Ave NW (at 13th St NW), Washing...","[{'label': 'display', 'lat': 38.89988597562643...",38.899886,-77.029276,,20005,D.C.,4b66f0bff964a52065312be3
1,Compass Coffee,Coffee Shop,1401 I St NW,US,Washington,United States,at 14th St NW,256,"[1401 I St NW (at 14th St NW), Washington, D.C...","[{'label': 'display', 'lat': 38.90152219803539...",38.901522,-77.032230,,20005,D.C.,5a3a7dde234724761e0eeb30
2,Mastro's Steakhouse,Steakhouse,600 13th Street NW,US,Washington,United States,,248,"[600 13th Street NW, Washington, D.C. 20005, U...","[{'label': 'display', 'lat': 38.8977735, 'lng'...",38.897773,-77.029810,,20005,D.C.,554984c2498eed234444931d
3,Ocean Prime,Seafood Restaurant,1341 G St NW,US,Washington,United States,,207,"[1341 G St NW, Washington, D.C. 20005, United ...","[{'label': 'display', 'lat': 38.89857728095185...",38.898577,-77.031544,,20005,D.C.,57e3d227498e2064353c039c
4,Swing's Coffee,Coffee Shop,640 14th St NW,US,Washington,United States,at G St NW,275,"[640 14th St NW (at G St NW), Washington, D.C....","[{'label': 'display', 'lat': 38.89807332447161...",38.898073,-77.031991,,20005,D.C.,5894e87d76b8b24a3400c1ff
5,Woodward Takeout Food,Sandwich Place,1426 H St NW,US,Washington,United States,,245,"[1426 H St NW, Washington, D.C. 20005, United ...","[{'label': 'display', 'lat': 38.899989, 'lng':...",38.899989,-77.032830,,20005,D.C.,54245c57498e5efd4a646cb9
6,Buredo,Sushi Restaurant,825 14th St NW,US,Washington,United States,at I St NW,176,"[825 14th St NW (at I St NW), Washington, D.C....","[{'label': 'display', 'lat': 38.90090266508559...",38.900903,-77.031675,Downtown-Penn Quarter-Chinatown,20005,D.C.,55916c95498e4c955f392181
7,District Taco,Taco Place,1309 F St. NW,US,Washington,United States,at 13th St NW,267,"[1309 F St. NW (at 13th St NW), Washington, D....","[{'label': 'display', 'lat': 38.897597, 'lng':...",38.897597,-77.030156,Downtown-Penn Quarter-Chinatown,20004,D.C.,4fa1622fe4b03a1030b13c5b
8,Dangerously Delicious Pie Truck,Food Truck,,US,Washington,United States,See Twitter for current location,195,"[See Twitter for current location, Washington,...","[{'label': 'display', 'lat': 38.90170350427239...",38.901704,-77.030564,,,D.C.,4ccafaffc9b846882ad5b2c3
9,Fruitive,Juice Bar,1094 Palmer Aly NW,US,Washington,United States,,268,"[1094 Palmer Aly NW, Washington, D.C. 20001, U...","[{'label': 'display', 'lat': 38.900321, 'lng':...",38.900321,-77.026929,,20001,D.C.,5661d923498e05208d892307


### Let's find out how many different types of businesses there are around cluster 7 and see if there are any Chinese restaurants!

In [43]:
dataframe_filtered.groupby('categories').count()

categories,name,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
American Restaurant,4,4,4,4,4,3,4,4,4,4,4,0,4,4,4
Art Museum,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1
Arts & Crafts Store,1,0,1,1,1,0,1,1,1,1,1,0,0,1,1
Asian Restaurant,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1
Bakery,2,2,2,2,2,2,2,2,2,2,2,0,2,2,2
Bar,2,2,2,2,2,0,2,2,2,2,2,0,2,2,2
Belgian Restaurant,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1
Brewery,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1
Burger Joint,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1
Camera Store,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1
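
Scanning a long grouped table by eye makes it easy to miss a category. A possible alternative, sketched here on a small made-up stand-in for `dataframe_filtered` (the rows and the search pattern are assumptions for illustration), is to count categories with `value_counts` and filter by a case-insensitive pattern:

```python
import pandas as pd

# Small stand-in for dataframe_filtered; these rows are illustrative, not real results
sample = pd.DataFrame({
    "name": ["Venue A", "Venue B", "Venue C", "Venue D"],
    "categories": ["Coffee Shop", "Asian Restaurant", "Coffee Shop", "Chinese Restaurant"],
})

# Number of venues per category, most common first
category_counts = sample["categories"].value_counts()

# Case-insensitive match on labels a Chinese restaurant might be filed under
pattern = "chinese|asian|dim sum|szechuan"
matches = sample[sample["categories"].str.contains(pattern, case=False, na=False)]

print(category_counts)
print(matches)
```

Matching on a pattern rather than one exact label also catches venues filed under a broader or narrower category than expected.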


### Only one Asian restaurant! Below I will pull up specific information about it

In [44]:
dataframe_filtered.loc[dataframe_filtered['categories'] == 'Asian Restaurant']

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
61,Momofuku CCDC,Asian Restaurant,1090 I St NW,US,Washington,United States,11th St,292,"[1090 I St NW (11th St), Washington, D.C. 2000...","[{'label': 'display', 'lat': 38.90058120718301...",38.900581,-77.026708,,20001,D.C.,562c0479498e2c64fb092ae9


### Looks like Momofuku CCDC is the only Asian restaurant in this cluster. Below I will visualize all businesses, with Momofuku as the red marker.

In [45]:
cluster_map = folium.Map(location=[lat_7, long_7], zoom_start=15) # generate map centred on the cluster 7 coordinates

# add venues to the map: red circle markers for Asian restaurants, blue for everything else
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    marker_color = 'red' if label == 'Asian Restaurant' else 'blue'
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        color=marker_color,
        fill_color=marker_color,
        fill_opacity=0.6
    ).add_to(cluster_map)

# display map
cluster_map

### Finally, let's take a closer look at Momofuku and see if we can compete with them!

In [46]:
venue_id = '562c0479498e2c64fb092ae9' # ID of Momofuku CCDC
url_fuku = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
url_fuku

'https://api.foursquare.com/v2/venues/562c0479498e2c64fb092ae9?client_id=1YZGINGMDDORH40PP2QN0WWPUDY5JXBOT5TL3MNKCU4NG0SN&client_secret=KMSKN0NBS15INTGFPDQC4QMYE3NVJED0XT0C5FAB0KPGJHBM&v=20180604'

### Send GET request for result and get the overall rating

In [47]:
result_fuku = requests.get(url_fuku).json()

#returns overall rating
try:
    print(result_fuku['response']['venue']['rating'])
except KeyError: # the 'rating' key is missing when a venue has no rating
    print('This venue has not been rated yet.')

7.5
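
To judge how close a candidate site would sit to Momofuku, the great-circle distance between two coordinates can be estimated with the haversine formula. This is a minimal sketch: the function name `haversine_m` is hypothetical, and the candidate coordinates are an assumed approximate point near McPherson Square rather than a surveyed location.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lng1, lat2, lng2):
    """Approximate great-circle distance in meters between two lat/lng points."""
    earth_radius_m = 6371000  # mean Earth radius
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


momofuku = (38.900581, -77.026708)  # lat/lng from the Momofuku CCDC row above
candidate = (38.9020, -77.0335)     # assumption: approximate point near McPherson Square

distance_m = haversine_m(momofuku[0], momofuku[1], candidate[0], candidate[1])
print('Candidate site is about {:.0f} m from Momofuku'.format(distance_m))
```

The same function can be applied to every row of `dataframe_filtered` to rank candidate sites by their distance from existing restaurants.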


### 7.5 out of 10 is a pretty high rating! We should not start the business too close to it, and should instead put it closer to McPherson Square. Since Momofuku is the only Asian restaurant that turned up in this cluster, we should have a good chance of succeeding with a restaurant there. Thanks for tuning in!