## <center> DC Crime Rates Database Explained </center>

### Data Defined:

Below I import the 2019 crime database from the https://dc.gov/ website, clean it up, and display it. We can then factor this data into our decision about the best place to put our new business.

In [66]:
#Import to read dataframe
import types
import numpy as np # library to handle data in a vectorized manner
import pandas as pd
from botocore.client import Config
import ibm_boto3

In [3]:
# The code was removed by Watson Studio for sharing.

In [4]:
df_data_1 = pd.read_csv(body)
df_data_1.head()

Unnamed: 0,CCN,OFFENSE,BLOCK,WARD,ANC,DISTRICT,PSA,NEIGHBORHOOD_CLUSTER,BLOCK_GROUP,LATITUDE,LONGITUDE,OBJECTID,OCTO_RECORD_ID
0,17084415,HOMICIDE,130 - 199 BLOCK OF IRVINGTON STREET SW,8,8D,7.0,708.0,Cluster 39,010900 2,38.820461,-77.010375,305273937,17084415-01
1,18208996,THEFT/OTHER,2400 BLOCK OF MARKET STREET NE,5,5C,5.0,503.0,Cluster 24,009000 1,38.920536,-76.952663,305329181,18208996-01
2,18204218,THEFT F/AUTO,900 - 999 BLOCK OF G STREET NW,2,2C,2.0,209.0,Cluster 8,005800 1,38.89831,-77.024958,305331558,18204218-01
3,19005282,ROBBERY,4800 - 4899 BLOCK OF CENTRAL AVENUE NE,7,7C,6.0,608.0,Cluster 33,007804 3,38.890393,-76.933411,305332216,19005282-01
4,19005286,BURGLARY,4700 - 4798 BLOCK OF EASTERN AVENUE NE,5,5B,4.0,405.0,Cluster 20,009503 1,38.94692,-76.979005,305332217,19005286-01


#### I am going to sort the crimes by ward. DC is split into 8 wards, so grouping the data this way lets us determine how many crimes happened in each ward.

In [5]:
#Create new dataframe with incidents by ward
df_incidents = df_data_1.groupby(['WARD']).size().reset_index(name="Count")

In [6]:
df_incidents

Unnamed: 0,WARD,Count
0,1,1996
1,2,2998
2,3,920
3,4,1087
4,5,2152
5,6,2338
6,7,1635
7,8,1255
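
As a quick check on the table above, the per-ward counts can be ranked and expressed as shares of the total. This is a minimal sketch that re-enters the counts shown above by hand; the names `ward_counts`, `share`, and `ranked` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Incident counts per ward, copied from the table above
ward_counts = pd.DataFrame({
    "WARD": [1, 2, 3, 4, 5, 6, 7, 8],
    "Count": [1996, 2998, 920, 1087, 2152, 2338, 1635, 1255],
})

# Share of all recorded incidents that fall in each ward
ward_counts["share"] = ward_counts["Count"] / ward_counts["Count"].sum()

# Rank wards from most to fewest incidents
ranked = ward_counts.sort_values("Count", ascending=False).reset_index(drop=True)
top_ward = int(ranked.loc[0, "WARD"])
print(ranked)
print("Ward with the most incidents:", top_ward)
```

Note that these are raw counts, so a ward with more residents or visitors can rank higher without being riskier per person.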


##### As we can see, ward 2 had the highest number of crimes in 2019. Next, we import folium and map the counts by ward.

In [7]:
!conda install -c conda-forge folium=0.5.0 --yes #install folium
import folium


In [8]:
#Put everything into a map
# download the DC wards GeoJSON file (needed to define the ward boundaries)
!wget --quiet "http://data.codefordc.org/dataset/a9512704-4ece-47cd-b3c0-402d28609364/resource/8ca2fd50-06cc-497f-89f9-a7937ff3650d/download/washington-dc-wards-2012.geojson"
    
print('GeoJSON file downloaded!')

GeoJSON file downloaded!


In [9]:
world_geo = r'washington-dc-wards-2012.geojson' # geojson file to get the borders of the wards


# Washington DC latitude and longitude values
latitude = 38.91
longitude = -77.04

# create a map centered on Washington DC and shade each ward by its incident count
world_map = folium.Map(location=[latitude, longitude], zoom_start=12)
world_map.choropleth(
    geo_data=world_geo,
    data=df_incidents,
    columns=['WARD', 'Count'],
    key_on='feature.properties.WARD',
    #threshold_scale=threshold_scale,
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Crime Rates in Washington DC',
    reset=True
)
world_map

### Explanation of Data

As you can see above, the data is split into the 8 wards of DC. Now that we know roughly where crime is concentrated, we can incorporate the Foursquare data, which provides average reviews of restaurants and businesses within these areas as well as the type of business (e.g. restaurant, office building, hotel).

## <center> Utilizing Foursquare to find restaurants in each ward </center>

### First, let's put our Foursquare credentials here

In [10]:
import requests # library to handle requests
from pandas.io.json import json_normalize # transforms nested JSON into a flat pandas dataframe
# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
CLIENT_ID = '1YZGINGMDDORH40PP2QN0WWPUDY5JXBOT5TL3MNKCU4NG0SN' # your Foursquare ID
CLIENT_SECRET = 'KMSKN0NBS15INTGFPDQC4QMYE3NVJED0XT0C5FAB0KPGJHBM' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 1YZGINGMDDORH40PP2QN0WWPUDY5JXBOT5TL3MNKCU4NG0SN
CLIENT_SECRET:KMSKN0NBS15INTGFPDQC4QMYE3NVJED0XT0C5FAB0KPGJHBM


### Below we use the coordinates of the center of DC to find nearby businesses. We put them in a URL to query Foursquare.

In [184]:
# Washington DC latitude and longitude values
latitude = 38.91
longitude = -77.04

LIMIT = 300 # limit of number of venues returned by Foursquare API
radius = 3500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url # display URL




'https://api.foursquare.com/v2/venues/explore?&client_id=1YZGINGMDDORH40PP2QN0WWPUDY5JXBOT5TL3MNKCU4NG0SN&client_secret=KMSKN0NBS15INTGFPDQC4QMYE3NVJED0XT0C5FAB0KPGJHBM&v=20180604&ll=38.91,-77.04&radius=3500&limit=300'
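
The same query string can also be assembled from a dictionary of parameters, which avoids the stray `&` after `?` and makes it easier to keep credentials out of the source. This is a sketch under assumptions: the environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `build_explore_url` are hypothetical; the endpoint and parameter names are the ones used in the cell above.

```python
import os
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the venues/explore URL used above from its parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in "ll" readable instead of encoding it as %2C
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params, safe=",")

# Hypothetical environment variable names; placeholders are used if they are unset
client_id = os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID")
client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET")

url = build_explore_url(client_id, client_secret, "20180604", 38.91, -77.04, 3500, 300)
print(url.replace(client_secret, "***"))  # avoid echoing the secret
```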

### This section retrieves the query response in JSON format, which will then be cleaned up into usable data

In [185]:
results = requests.get(url).json() #results will display the query request response
results

{'meta': {'code': 200, 'requestId': '5d111a5453159300399b9eb7'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-522a9a0511d2b2f9a85cedb4-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/gastropub_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d155941735',
         'name': 'Gastropub',
         'pluralName': 'Gastropubs',
         'primary': True,
         'shortName': 'Gastropub'}],
       'id': '522a9a0511d2b2f9a85cedb4',
       'location': {'address': '1513 17th St NW',
        'cc': 'US',
        'city': 'Washington',
        'country': 'United States',
        'crossStreet': 'btwn Church St & P St NW',
        'distance': 151,
        'formattedAddress': ['1513 17th St NW (btwn Church St & P St NW)',
         'Washington, D.C. 20036',
         'Un

In [186]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # flattened rows store the list under 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
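
To see what this helper returns, it can be applied to a few hand-built rows. The sample dictionaries below are illustrative stand-ins for rows of the flattened response, trimmed to the fields the function reads; the function is repeated here, with a narrower `except KeyError`, so the snippet runs on its own.

```python
def get_category_type(row):
    """Return the name of a venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Illustrative rows, trimmed to the fields the function reads
flattened_row = {'venue.categories': [{'name': 'Gastropub', 'primary': True}]}
plain_row = {'categories': [{'name': 'Theater'}]}
empty_row = {'venue.categories': []}

print(get_category_type(flattened_row))
print(get_category_type(plain_row))
print(get_category_type(empty_row))
```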



### Below we normalize the JSON to pull out the information necessary to answer our questions

In [187]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.location.postalCode']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng,postalCode
0,Duke's Grocery,Gastropub,38.910187,-77.038262,20036
1,Keegan Theatre,Theater,38.910346,-77.039933,20036
2,Komi,Greek Restaurant,38.910058,-77.038231,20036
3,Little Serow,Thai Restaurant,38.910135,-77.038357,20036
4,Dupont Circle,Park,38.909704,-77.043783,20036


In [188]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centered on DC
venues_map.choropleth(
    geo_data=world_geo,
    data=df_incidents,
    columns=['WARD', 'Count'],
    key_on='feature.properties.WARD',
    #threshold_scale=threshold_scale,
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Crime Rates in Washington DC',
    reset=True
)
# add the businesses as blue circle markers
for lat, lng, name in zip(nearby_venues.lat, nearby_venues.lng, nearby_venues.name):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=str(name.replace("'", "").replace('"', "")), 
        fill = False,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

### As you can see above, there are some clusters of Asian businesses. We can home in on these areas by using k-means clustering to find a good location for a Chinese restaurant that is well placed to attract business. We will begin by importing the libraries that handle clustering.

In [189]:
import matplotlib.pyplot as plt # plotting library
# backend for rendering plots within the browser
%matplotlib inline 
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans 
from sklearn.datasets import make_blobs

### Let's quickly check the size of our dataframe

In [190]:
print(nearby_venues.shape)


(100, 5)


In [191]:
nearby_venues.groupby('postalCode').count()

postalCode,name,categories,lat,lng
20001,15,15,15,15
20002,1,1,1,1
20004,1,1,1,1
20005,13,13,13,13
20006,7,7,7,7
20007,3,3,3,3
20008,1,1,1,1
20009,31,31,31,31
20009-4307,1,1,1,1
20036,15,15,15,15
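
One detail in the table above: `20009-4307` is a ZIP+4 code and is counted separately from `20009`. If the two should be treated as one area, one possible approach is to keep only the first five digits before grouping. This is a sketch on toy data, not a step the original notebook performs; the frame `venues` and the column name `zip5` are assumptions.

```python
import pandas as pd

# Toy stand-in for nearby_venues with a mix of 5-digit and ZIP+4 codes
venues = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "postalCode": ["20009", "20009-4307", "20036", None],
})

# Keep the leading five digits; missing postal codes stay missing
venues["zip5"] = venues["postalCode"].str.slice(0, 5)

# groupby drops rows whose key is missing, so venue "D" is not counted
counts = venues.groupby("zip5")["name"].count()
print(counts)
```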


### Let's find out how many unique business categories we have

In [195]:
print('There are {} unique categories.'.format(len(nearby_venues['categories'].unique())))

There are 52 unique categories.


### Time to analyze the types of businesses in each postal code

In [197]:
# one hot encoding
dc_onehot = pd.get_dummies(nearby_venues[['categories']], prefix="", prefix_sep="")

# add postal code column back to dataframe
dc_onehot['postalCode'] = nearby_venues['postalCode'] 

# move postalcode column to the first column
fixed_columns = [dc_onehot.columns[-1]] + list(dc_onehot.columns[:-1])
dc_onehot = dc_onehot[fixed_columns]

dc_onehot.head()

Unnamed: 0,postalCode,Afghan Restaurant,American Restaurant,Art Gallery,Art Museum,Bakery,Beer Bar,Beer Garden,Bookstore,Cocktail Bar,...,Sandwich Place,Seafood Restaurant,Spanish Restaurant,Steakhouse,Thai Restaurant,Theater,Trail,Whisky Bar,Wine Bar,Yoga Studio
0,20036,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,20036,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
2,20036,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,20036,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
4,20036,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Next, let's group rows by postal code and take the mean of the frequency of occurrence of each category

In [199]:
dc_grouped = dc_onehot.groupby('postalCode').mean().reset_index()
dc_grouped

Unnamed: 0,postalCode,Afghan Restaurant,American Restaurant,Art Gallery,Art Museum,Bakery,Beer Bar,Beer Garden,Bookstore,Cocktail Bar,...,Sandwich Place,Seafood Restaurant,Spanish Restaurant,Steakhouse,Thai Restaurant,Theater,Trail,Whisky Bar,Wine Bar,Yoga Studio
0,20001,0.0,0.133333,0.0,0.0,0.066667,0.0,0.066667,0.0,0.066667,...,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0
1,20002,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,20004,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,20005,0.0,0.076923,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,...,0.0,0.076923,0.0,0.076923,0.0,0.076923,0.0,0.0,0.0,0.0
4,20006,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,20007,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,20008,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
7,20009,0.032258,0.064516,0.032258,0.032258,0.0,0.0,0.0,0.0,0.064516,...,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.032258,0.032258,0.032258
8,20009-4307,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,20036,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,...,0.066667,0.0,0.066667,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0
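
Why the mean gives a frequency: each one-hot row has a single 1 in its category's column, so averaging the rows of a postal code yields the fraction of its venues that fall in each category. A small sketch on made-up data (the venue rows and the names `toy`, `onehot`, and `grouped` are illustrative):

```python
import pandas as pd

# Made-up venues: three in one postal code, one in another
toy = pd.DataFrame({
    "postalCode": ["20001", "20001", "20001", "20002"],
    "categories": ["Hotel", "Hotel", "Bakery", "Bakery"],
})

# One-hot encode the category, then put the postal code back as the first column
onehot = pd.get_dummies(toy[["categories"]], prefix="", prefix_sep="").astype(int)
onehot.insert(0, "postalCode", toy["postalCode"])

# Mean of the 0/1 columns = share of venues per category in each postal code
grouped = onehot.groupby("postalCode").mean().reset_index()
print(grouped)
```

A postal code with a single venue, like `20002` here, always gets a frequency of 1.0 for that venue's category, which is why several rows in the table above contain a lone 1.0.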


### Now let's show the top 5 most common businesses in each postal code

In [201]:
num_top_venues = 5

for hood in dc_grouped['postalCode']:
    print("----"+hood+"----")
    temp = dc_grouped[dc_grouped['postalCode'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----20001----
                         venue  freq
0          American Restaurant  0.13
1               Sandwich Place  0.07
2                 Cocktail Bar  0.07
3           Italian Restaurant  0.07
4  Eastern European Restaurant  0.07


----20002----
                venue  freq
0  Spanish Restaurant   1.0
1   Afghan Restaurant   0.0
2         Pizza Place   0.0
3   Indian Restaurant   0.0
4  Israeli Restaurant   0.0


----20004----
                venue  freq
0               Hotel   1.0
1      Ice Cream Shop   0.0
2   Indian Restaurant   0.0
3  Israeli Restaurant   0.0
4  Italian Restaurant   0.0


----20005----
                venue  freq
0               Hotel  0.23
1         Coffee Shop  0.15
2  Salon / Barbershop  0.08
3         Pizza Place  0.08
4        Cycle Studio  0.08


----20006----
                 venue  freq
0          Coffee Shop  0.29
1                Hotel  0.29
2            Hotel Bar  0.14
3    Indian Restaurant  0.14
4  American Restaurant  0.14


----20007----
      

### I will now sort below to display the top ten venues for each postal code

In [204]:
# function to sort in descending order
def return_most_common_venues(row, num_top_venues): 
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
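
As a usage sketch, the function can be tried on a single hand-built row shaped like one row of `dc_grouped`. The values below are illustrative; the function is repeated so the snippet runs on its own.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the leading postalCode entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Illustrative row: a postal code followed by category frequencies
row = pd.Series({"postalCode": "20006", "Coffee Shop": 0.29, "Hotel Bar": 0.14, "Park": 0.0})
top_two = return_most_common_venues(row, 2)
print(list(top_two))
```

Categories with equal frequency (for example the many 0.0 entries in a sparse row) are not ranked in any meaningful order, so the lower positions of a top-ten list should be read with care.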

In [205]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['postalCode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # beyond 'st', 'nd', 'rd' fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['postalCode'] = dc_grouped['postalCode']

for ind in np.arange(dc_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dc_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,postalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,20001,American Restaurant,Cocktail Bar,Sandwich Place,Eastern European Restaurant,Mexican Restaurant,Movie Theater,Italian Restaurant,Ice Cream Shop,Coffee Shop,Latin American Restaurant
1,20002,Spanish Restaurant,Yoga Studio,Cycle Studio,Grocery Store,Greek Restaurant,Government Building,Gastropub,French Restaurant,Farmers Market,Falafel Restaurant
2,20004,Hotel,Wine Bar,Gym,Grocery Store,Greek Restaurant,Government Building,Gastropub,French Restaurant,Farmers Market,Falafel Restaurant
3,20005,Hotel,Coffee Shop,Seafood Restaurant,Pizza Place,Salon / Barbershop,Cycle Studio,Beer Bar,Steakhouse,Theater,American Restaurant
4,20006,Hotel,Coffee Shop,American Restaurant,Hotel Bar,Indian Restaurant,Eastern European Restaurant,Grocery Store,Greek Restaurant,Government Building,Gastropub
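
The column labels above rely on a three-element suffix list with a fallback to `th`, which is enough for ten columns. If more columns were ever needed, a small helper could compute the suffix directly. This is a sketch; `ordinal` is a hypothetical helper name, not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['postalCode'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```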


### Time to Cluster Neighborhoods!

I will be running k-means to cluster the postal codes into 6 clusters

In [208]:
# set number of clusters
kclusters = 6

dc_grouped_clustering = dc_grouped.drop('postalCode', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dc_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 4, 1, 1, 1, 1, 0, 1, 5, 1], dtype=int32)
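
A quick tally of the labels shown above makes the imbalance visible: most of the ten displayed rows land in a single cluster. This sketch re-enters the printed labels by hand; the names `labels` and `cluster_sizes` are illustrative.

```python
from collections import Counter

# Labels copied from the kmeans.labels_[0:10] output above
labels = [1, 4, 1, 1, 1, 1, 0, 1, 5, 1]

# Number of rows assigned to each cluster, largest cluster first
cluster_sizes = Counter(labels)
for cluster, size in cluster_sizes.most_common():
    print("cluster {}: {} row(s)".format(cluster, size))
```

With so few rows per cluster, the grouping is best treated as exploratory rather than as strong evidence about any one postal code.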

Let's add the cluster labels and combine everything into a new dataframe with the top 10 venues for each postal code

In [211]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

dc_merged = nearby_venues

# join the cluster labels and top venues onto each venue by postal code (keeps latitude/longitude)
dc_merged = dc_merged.join(neighborhoods_venues_sorted.set_index('postalCode'), on='postalCode')



In [212]:
dc_merged

Unnamed: 0,name,categories,lat,lng,postalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Duke's Grocery,Gastropub,38.910187,-77.038262,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant
1,Keegan Theatre,Theater,38.910346,-77.039933,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant
2,Komi,Greek Restaurant,38.910058,-77.038231,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant
3,Little Serow,Thai Restaurant,38.910135,-77.038357,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant
4,Dupont Circle,Park,38.909704,-77.043783,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant
5,Dupont Circle FRESHFARM Market,Farmers Market,38.910974,-77.044795,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant
6,Kramerbooks & Afterwords Cafe,Bookstore,38.910756,-77.043880,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant
7,sweetgreen,Salad Place,38.910449,-77.044244,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant
8,Second Story Books,Bookstore,38.909488,-77.045102,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant
9,CAVA,Mediterranean Restaurant,38.906639,-77.042132,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant


Fill missing cluster labels with 0. Note that the labels remain floats, because the integer conversion is commented out.

In [217]:
dc_merged['Cluster Labels'].fillna(0, inplace=True)
#dc_merged['Cluster Labels'] = dc_merged['Cluster Labels'].astype(int)
dc_merged

Unnamed: 0,name,categories,lat,lng,postalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Duke's Grocery,Gastropub,38.910187,-77.038262,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant
1,Keegan Theatre,Theater,38.910346,-77.039933,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant
2,Komi,Greek Restaurant,38.910058,-77.038231,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant
3,Little Serow,Thai Restaurant,38.910135,-77.038357,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant
4,Dupont Circle,Park,38.909704,-77.043783,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant
5,Dupont Circle FRESHFARM Market,Farmers Market,38.910974,-77.044795,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant
6,Kramerbooks & Afterwords Cafe,Bookstore,38.910756,-77.043880,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant
7,sweetgreen,Salad Place,38.910449,-77.044244,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant
8,Second Story Books,Bookstore,38.909488,-77.045102,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant
9,CAVA,Mediterranean Restaurant,38.906639,-77.042132,20036,1.0,Bookstore,Salad Place,Gastropub,Farmers Market,Israeli Restaurant,Convenience Store,Mediterranean Restaurant,Park,Pizza Place,Greek Restaurant
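
After the join, `Cluster Labels` holds floats (a column that contained NaN cannot be a plain integer column), and a float such as `1.0` cannot index a Python list. A minimal sketch of casting each label before looking up its color follows. The palette values and the helper `color_for` are illustrative, not the notebook's `rainbow` list, and the sketch indexes by the label itself rather than `cluster-1`.

```python
import math

# Illustrative palette with one color per cluster (6 clusters in the notebook)
palette = ['#8000ff', '#1996f3', '#4df3ce', '#b2f396', '#ff964f', '#ff0000']

def color_for(label, default='#808080'):
    """Map a float cluster label to a palette color; missing labels get gray."""
    if label is None or math.isnan(label):
        return default
    return palette[int(label) % len(palette)]

labels = [1.0, 4.0, float('nan'), 0.0]
print([color_for(label) for label in labels])
```

Giving unmatched venues their own color keeps them distinguishable from cluster 0, which filling NaN with 0 does not.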


### Time to visualize the clusters and see our dataset

In [213]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dc_merged['lat'], dc_merged['lng'], dc_merged['postalCode'], dc_merged['Cluster Labels']):
    cluster = int(cluster)  # labels are floats after the join; list indices must be integers
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters