##### This project is part of IBM's capstone project.
## Introduction

The aim of this notebook is to segment and cluster the neighborhoods of Los Angeles, USA. We use the Foursquare API to explore each neighborhood, find its most common venue categories, and then group the neighborhoods into clusters based on those categories.

Before we get the data and start exploring it, let's import the libraries we will need.

In [26]:
import numpy as np 
import pandas as pd
from bs4 import BeautifulSoup
import requests

We scrape the table of communities and their ZIP codes from laalmanac.com.

In [27]:
data = requests.get('http://www.laalmanac.com/communications/cm02_communities.php').text
soup = BeautifulSoup(data, 'html.parser')

In [28]:
postalCodeList = []
boroughList = []

In [29]:
for row in soup.find('table').find_all('tr'):
    cells = row.find_all('td')
    if len(cells) > 1:  # both the community cell and the ZIP code cell are read below
        postalCodeList.append(cells[1].text)
        boroughList.append(cells[0].text)
       

In [30]:
ps=list(map(lambda x:x.split(',')[0],postalCodeList))
ps


['93510',
 '91301',
 '91376',
 '91390',
 '91801',
 '91804',
 '91802',
 '91001',
 '91003',
 '91210',
 '91006',
 '91066',
 '91331',
 '90019',
 '90701',
 '90702',
 '90044',
 '90039',
 '90704',
 '91746',
 '91010',
 '90008',
 '91706',
 '91746',
 '90049',
 '90077',
 '90201',
 '90202',
 '90096',
 '91307',
 '90201',
 '90202',
 '90706',
 '90707',
 '90803',
 '90853',
 '90077',
 '90210',
 '90209',
 '90639',
 '90807',
 '91350',
 '91304',
 '90033',
 '91008',
 '90049',
 '91501',
 '91522',
 '91504',
 '91503',
 '91521',
 '91522',
 '91797',
 '90006',
 '91302',
 '91125',
 '90840',
 '91330',
 '90506',
 '91303',
 '91305',
 '91386',
 '91351',
 '90745',
 '90747',
 '90895',
 '90749',
 '91310',
 '91383',
 '90704',
 '90272',
 '90067',
 '90703',
 '91724',
 '91311',
 '91313',
 '90064',
 '90012',
 '90030',
 '90189',
 '93551',
 '90063',
 '91711',
 '90040',
 '90091',
 '90023',
 '90220',
 '90223',
 '91301',
 '90019',
 '91722',
 '90008',
 '93544',
 '90201',
 '90066',
 '90231',
 '90065',
 '93536',
 '91765',
 '90810',


In [31]:
postalCodeList = list(map(int, ps)) 
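A cell on the source page can list several ZIP codes separated by commas, and the code above keeps only the first one. A minimal sketch of the same step with whitespace stripping and a validity check follows; `first_zip` is a hypothetical helper, not part of the notebook.

```python
def first_zip(cell_text):
    """Return the first ZIP code in a comma-separated cell as an int, or None."""
    first = cell_text.split(',')[0].strip()
    # A US ZIP code is exactly five digits; anything else is treated as missing.
    if len(first) == 5 and first.isdecimal():
        return int(first)
    return None

print(first_zip('91801, 91803'))
print(first_zip(' 90019 '))
print(first_zip(''))
```

Returning `None` for malformed cells avoids the `ValueError` that `int('')` would raise inside `map(int, ps)`.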

The next task is to combine the two lists, postal codes and community names, into a *pandas* dataframe with one column per list.

In [32]:
la_df=pd.DataFrame({"PostalCode": postalCodeList,
                           "Borough": boroughList})
la_df.head()

Unnamed: 0,PostalCode,Borough
0,93510,Acton
1,91301,Agoura Hills
2,91376,Agoura Hills (PO Boxes)
3,91390,Agua Dulce
4,91801,Alhambra


In [33]:
la_df.shape

(643, 2)

In [34]:
# keep only the communities that belong to the City of Los Angeles
df1 = la_df[la_df['Borough'].str.contains("(Los Angeles)", regex=False)].reset_index(drop=True)
df1.head()

Unnamed: 0,PostalCode,Borough
0,91331,Arleta (Los Angeles)
1,90019,Arlington Heights (Los Angeles)
2,90039,Atwater Village (Los Angeles)
3,90008,Baldwin Hills (Los Angeles)
4,90049,Bel Air Estates (Los Angeles)
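The `regex=False` argument matters here because parentheses are grouping characters in a regular expression. The sketch below shows the difference using only the standard `re` module; the second community name is an illustrative value chosen for the example.

```python
import re

pattern = "(Los Angeles)"
names = ["Arleta (Los Angeles)", "East Los Angeles"]

# As a regex, the parentheses form a group, so the bare words also match.
regex_hits = [n for n in names if re.search(pattern, n)]

# As a literal substring (the regex=False behaviour), the parentheses must be present.
literal_hits = [n for n in names if pattern in n]

# re.escape gives the regex equivalent of the literal match.
escaped_hits = [n for n in names if re.search(re.escape(pattern), n)]

print(regex_hits)
print(literal_hits)
print(escaped_hits)
```

With a regex match, a community whose name merely contains the words "Los Angeles" would be kept by mistake.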


Similarly, we load a second dataset that maps US ZIP codes to latitude and longitude.

In [35]:
url="https://gist.githubusercontent.com/senning/58a8c82e0c97712eabbe4700ce2187a1/raw/3e78d6cfb3542dc520570d07648721924cca8b3d/US%2520Zip%2520Codes%2520from%25202016%2520Government%2520Data"
df2 = pd.read_csv(url)
df2.head()

Unnamed: 0,ZIP,LAT,LNG
0,601,18.180555,-66.749961
1,602,18.361945,-67.175597
2,603,18.455183,-67.119887
3,606,18.158345,-66.932911
4,610,18.295366,-67.125135


In [36]:
df2.rename(columns={"ZIP": "PostalCode"}, inplace=True)
df2.head()

Unnamed: 0,PostalCode,LAT,LNG
0,601,18.180555,-66.749961
1,602,18.361945,-67.175597
2,603,18.455183,-67.119887
3,606,18.158345,-66.932911
4,610,18.295366,-67.125135


In [37]:
df2['PostalCode'] = df2['PostalCode'].astype(int)  # astype returns a new Series, so assign it back
df2['PostalCode']

0          601
1          602
2          603
3          606
4          610
5          612
6          616
7          617
8          622
9          623
10         624
11         627
12         631
13         637
14         638
15         641
16         646
17         647
18         650
19         652
20         653
21         656
22         659
23         660
24         662
25         664
26         667
27         669
28         670
29         674
         ...  
33114    99786
33115    99788
33116    99789
33117    99790
33118    99791
33119    99801
33120    99820
33121    99824
33122    99825
33123    99826
33124    99827
33125    99829
33126    99830
33127    99832
33128    99833
33129    99835
33130    99836
33131    99840
33132    99841
33133    99901
33134    99903
33135    99918
33136    99919
33137    99921
33138    99922
33139    99923
33140    99925
33141    99926
33142    99927
33143    99929
Name: PostalCode, Length: 33144, dtype: int64

In [38]:
la_df3 = df1.merge(df2, on="PostalCode", how="left")
la_df3.rename(columns={"LAT": "Latitude"}, inplace=True)
la_df3.rename(columns={"LNG": "Longitude"}, inplace=True)
la_df3.head()

Unnamed: 0,PostalCode,Borough,Latitude,Longitude
0,91331,Arleta (Los Angeles),34.255442,-118.421314
1,90019,Arlington Heights (Los Angeles),34.049841,-118.33846
2,90039,Atwater Village (Los Angeles),34.111885,-118.261033
3,90008,Baldwin Hills (Los Angeles),34.009552,-118.346724
4,90049,Bel Air Estates (Los Angeles),34.09254,-118.491064


In [39]:
la_df3.isnull().sum()

PostalCode     0
Borough        0
Latitude      45
Longitude     45
dtype: int64
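The 45 missing coordinates come from postal codes that have no match in the ZIP code table, since a left merge keeps every row of the left frame. One way to list those codes is the `indicator` option of `merge`, sketched here on made-up toy frames.

```python
import pandas as pd

# Toy frames with invented values; only the column names follow the notebook.
areas = pd.DataFrame({"PostalCode": [90019, 90030], "Borough": ["A", "B"]})
coords = pd.DataFrame({"PostalCode": [90019], "LAT": [34.05], "LNG": [-118.34]})

# indicator=True adds a "_merge" column saying where each row was found.
merged = areas.merge(coords, on="PostalCode", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```

Inspecting the unmatched codes before calling `dropna()` shows which areas are about to be removed from the analysis.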

#### After cleaning we get our final dataset

In [40]:
la_df3=la_df3.dropna().reset_index(drop=True)

la_df3.drop_duplicates(subset=['PostalCode'], keep='first', inplace=True)

la_df3.shape

(80, 4)

In [41]:
import json # library to handle JSON files
#!conda install -c conda-forge folium=0.5.0 --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0; older versions import it from pandas.io.json)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

In [50]:
#@hidden_cell
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
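Instead of pasting the credentials into the notebook, they could be read from environment variables. This is only a sketch: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are assumptions made for this example.

```python
import os

def load_credentials(env=os.environ):
    """Read Foursquare credentials from a mapping of environment variables."""
    client_id = env.get("FOURSQUARE_CLIENT_ID", "")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET", "")
    # Fail early with a clear message rather than sending empty credentials.
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first")
    return client_id, client_secret
```

In the notebook this would be used as `CLIENT_ID, CLIENT_SECRET = load_credentials()`.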

In [51]:


address = 'Los Angeles'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Los Angeles are {}, {}.'.format(latitude, longitude))



The geographical coordinates of Los Angeles are 34.0536909, -118.2427666.


We use folium to plot a map of the city.

In [52]:
map_la = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough in zip(la_df3['Latitude'], la_df3['Longitude'], la_df3['Borough']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_la)  
    
map_la

#### We now use the Foursquare API to get venue data for each location

In [53]:
radius = 500
LIMIT = 100

venues = []

for lat, long, post, borough in zip(la_df3['Latitude'], la_df3['Longitude'], la_df3['PostalCode'], la_df3['Borough']):
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    for venue in results:
        venues.append((
            post, 
            borough,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))


# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['PostalCode', 'Borough', 'BoroughLatitude', 'BoroughLongitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(1596, 8)


Unnamed: 0,PostalCode,Borough,BoroughLatitude,BoroughLongitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,91331,Arleta (Los Angeles),34.255442,-118.421314,Birreria Apatzingan,34.252693,-118.42536,Mexican Restaurant
1,90019,Arlington Heights (Los Angeles),34.049841,-118.33846,PizzaRev,34.048585,-118.336439,Pizza Place
2,90019,Arlington Heights (Los Angeles),34.049841,-118.33846,Jersey Mike's Subs,34.048449,-118.337419,Sandwich Place
3,90019,Arlington Heights (Los Angeles),34.049841,-118.33846,Planet Fitness,34.047774,-118.338605,Gym / Fitness Center
4,90019,Arlington Heights (Los Angeles),34.049841,-118.33846,Midtown Crossing,34.048047,-118.337077,Shopping Mall
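The loop above indexes straight into the JSON response, so an empty `groups` list or a venue without categories would raise an error. A more defensive parsing step might look like the sketch below. `extract_venues` is a hypothetical helper, the `"Unknown"` placeholder is an assumption, and the sample payload is hand-written to mirror the fields the notebook reads; it is not real API output.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore response dict."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Fall back to a placeholder instead of raising IndexError.
        category = categories[0]["name"] if categories else "Unknown"
        location = venue["location"]
        rows.append((venue["name"], location["lat"], location["lng"], category))
    return rows

# Hand-written payload for illustration only.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cafe A", "location": {"lat": 34.0, "lng": -118.3},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Place B", "location": {"lat": 34.1, "lng": -118.2},
               "categories": []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({"response": {}}))
```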


In [54]:
venues_df.groupby(["PostalCode", "Borough"]).count()

Unnamed: 0_level_0,Unnamed: 1_level_0,BoroughLatitude,BoroughLongitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
PostalCode,Borough,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
90001,"Florence-Graham, South Los Angeles (Los Angeles)",13,13,13,13,13,13
90002,Southeast Los Angeles (Los Angeles),2,2,2,2,2,2
90003,South Los Angeles/Broadway Manchester (Los Angeles),2,2,2,2,2,2
90004,Hancock Park (Los Angeles),51,51,51,51,51,51
90005,Koreatown (Los Angeles),40,40,40,40,40,40
90006,Byzantine-Latino Quarter (Los Angeles),12,12,12,12,12,12
90007,University Park (Los Angeles),20,20,20,20,20,20
90008,Baldwin Hills (Los Angeles),2,2,2,2,2,2
90012,Chinatown (Los Angeles),29,29,29,29,29,29
90013,Downtown Fashion District (Los Angeles),100,100,100,100,100,100


In [55]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 268 unique categories.


#### Now let's analyze each area

In [56]:
# one hot encoding
la_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add the postal code and borough columns back to the dataframe
la_onehot['PostalCode'] = venues_df['PostalCode'] 
la_onehot['Borough'] = venues_df['Borough'] 

# move the postal code and borough columns to the front
fixed_columns = list(la_onehot.columns[-2:]) + list(la_onehot.columns[:-2])
la_onehot = la_onehot[fixed_columns]

print(la_onehot.shape)
la_onehot.head()

(1596, 270)


Unnamed: 0,PostalCode,Borough,ATM,Accessories Store,Airport,Airport Terminal,American Restaurant,Amphitheater,Aquarium,Arcade,...,Video Store,Vietnamese Restaurant,Watch Shop,Waterfront,Weight Loss Center,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Yoshoku Restaurant
0,91331,Arleta (Los Angeles),0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,90019,Arlington Heights (Los Angeles),0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,90019,Arlington Heights (Los Angeles),0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,90019,Arlington Heights (Los Angeles),0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,90019,Arlington Heights (Los Angeles),0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
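The one-hot table is useful because averaging its rows per postal code gives the share of venues in each category. A small sketch on invented toy data, using the same `get_dummies` call as above plus `dtype=int` so the columns are 0/1 integers:

```python
import pandas as pd

# Toy venue list; the values are made up for illustration.
toy = pd.DataFrame({
    "PostalCode": [90019, 90019, 90019, 90019, 91331],
    "VenueCategory": ["Pizza Place", "Pizza Place", "Gym",
                      "Shopping Mall", "Mexican Restaurant"],
})

onehot = pd.get_dummies(toy[["VenueCategory"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "PostalCode", toy["PostalCode"])

# The mean of a 0/1 column within a group is the fraction of rows equal to 1.
freq = onehot.groupby("PostalCode").mean()
print(freq)
```

Each row of `freq` sums to 1, so areas with very different venue counts become comparable before clustering.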


In [57]:


la_grouped = la_onehot.groupby(["PostalCode", "Borough"]).mean().reset_index()

print(la_grouped.shape)

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
areaColumns = ['PostalCode', 'Borough']
freqColumns = []
for ind in np.arange(num_top_venues):
    try:
        freqColumns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions beyond 3rd use the 'th' suffix
        freqColumns.append('{}th Most Common Venue'.format(ind+1))
columns = areaColumns+freqColumns

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['PostalCode'] = la_grouped['PostalCode']
neighborhoods_venues_sorted['Borough'] = la_grouped['Borough']

for ind in np.arange(la_grouped.shape[0]):
    row_categories = la_grouped.iloc[ind, :].iloc[2:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    neighborhoods_venues_sorted.iloc[ind, 2:] = row_categories_sorted.index.values[0:num_top_venues]

# neighborhoods_venues_sorted.sort_values(freqColumns, inplace=True)
print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted



(74, 270)
(74, 12)


Unnamed: 0,PostalCode,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,90001,"Florence-Graham, South Los Angeles (Los Angeles)",Mexican Restaurant,Donut Shop,Pizza Place,Fast Food Restaurant,Pharmacy,Discount Store,Grocery Store,Sandwich Place,Fruit & Vegetable Store,Shoe Store
1,90002,Southeast Los Angeles (Los Angeles),Park,Auto Garage,Diner,Dog Run,Donburi Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Yoshoku Restaurant
2,90003,South Los Angeles/Broadway Manchester (Los Ang...,Fast Food Restaurant,Southern / Soul Food Restaurant,Yoshoku Restaurant,Diner,Flea Market,Fish & Chips Shop,Filipino Restaurant,Farmers Market,Electronics Store,Eastern European Restaurant
3,90004,Hancock Park (Los Angeles),Korean Restaurant,Coffee Shop,Bar,Japanese Restaurant,Cocktail Bar,Bakery,Sandwich Place,Seafood Restaurant,Salon / Barbershop,Smoke Shop
4,90005,Koreatown (Los Angeles),Korean Restaurant,Japanese Restaurant,Bakery,Coffee Shop,Ice Cream Shop,Café,Concert Hall,Steakhouse,Brazilian Restaurant,South American Restaurant
5,90006,Byzantine-Latino Quarter (Los Angeles),Pizza Place,Video Game Store,Spa,Mobile Phone Shop,Food Truck,Bus Station,Diner,Sandwich Place,Cosmetics Shop,Spanish Restaurant
6,90007,University Park (Los Angeles),Coffee Shop,Shipping Store,Ramen Restaurant,Caribbean Restaurant,Food Truck,Korean Restaurant,Big Box Store,Juice Bar,Gastropub,Fraternity House
7,90008,Baldwin Hills (Los Angeles),Clothing Store,Scenic Lookout,Dim Sum Restaurant,Discount Store,Dog Run,Donburi Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Yoshoku Restaurant
8,90012,Chinatown (Los Angeles),Chinese Restaurant,Bakery,Vietnamese Restaurant,Bar,Japanese Restaurant,Café,Brewery,Dim Sum Restaurant,French Restaurant,Recreation Center
9,90013,Downtown Fashion District (Los Angeles),Japanese Restaurant,Sushi Restaurant,Coffee Shop,Ice Cream Shop,Ramen Restaurant,Bar,Gift Shop,Cocktail Bar,Bakery,Brewery
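Areas with only a few venues still receive ten "most common" categories, because sorting fills the remaining slots with categories whose frequency is zero (postal code 90002 has two venues, for example). The sketch below shows one way to keep only categories that actually occur. `top_categories` is a hypothetical helper and the frequencies are illustrative values modelled on that row.

```python
def top_categories(freqs, n=10):
    """Return up to n category names by descending frequency, skipping zeros."""
    nonzero = [(name, f) for name, f in freqs.items() if f > 0]
    # Sort by frequency (highest first), then by name to make ties deterministic.
    nonzero.sort(key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in nonzero[:n]]

row = {"Park": 0.5, "Auto Garage": 0.5, "Diner": 0.0, "Yoshoku Restaurant": 0.0}
print(top_categories(row))
```

The result can be shorter than `n`, which makes sparse areas easy to spot.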


In [93]:

kclusters = 20
la_postal=la_grouped["PostalCode"]
la_grouped_clustering = la_grouped.drop(columns=["PostalCode", "Borough"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(la_grouped_clustering)

# check cluster labels generated for each row in the dataframe
print(kmeans.labels_[0:100])
la_grouped_clustering.insert(0, 'Cluster Labels', kmeans.labels_)
la_grouped_clustering.insert(0, 'PostalCode', la_postal)
la_grouped_clustering



[12  1  7  4  4 12 18 10 18 18 18 18 18 18 18  4 18  9  0 18  5 12 18  8
 12 12 18  0  0 18 12 12 18 18 18 15 18 13 12 18 14 18 18 18 18 18  0 19
 18 12 12 11 18 12  2 18 18 16  6 18 12 12  3 12 17 18 18 12 12 18  0 18
 12 12]


Unnamed: 0,PostalCode,Cluster Labels,ATM,Accessories Store,Airport,Airport Terminal,American Restaurant,Amphitheater,Aquarium,Arcade,...,Video Store,Vietnamese Restaurant,Watch Shop,Waterfront,Weight Loss Center,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Yoshoku Restaurant
0,90001,12,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.00
1,90002,1,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.00
2,90003,7,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.00
3,90004,4,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.000000,0.019608,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.00
4,90005,4,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,0.025000,0.00
5,90006,12,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.00
6,90007,18,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.000000,0.050000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.00
7,90008,10,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.00
8,90012,18,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.000000,...,0.000000,0.103448,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.00
9,90013,18,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,0.010000,...,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.01
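With 74 postal codes and `kclusters = 20`, it is worth checking how many areas land in each cluster. The sketch below counts the labels printed above with `collections.Counter`; the list is copied from that output.

```python
from collections import Counter

labels = [
    12, 1, 7, 4, 4, 12, 18, 10, 18, 18, 18, 18, 18, 18, 18, 4, 18, 9, 0, 18, 5, 12, 18, 8,
    12, 12, 18, 0, 0, 18, 12, 12, 18, 18, 18, 15, 18, 13, 12, 18, 14, 18, 18, 18, 18, 18, 0, 19,
    18, 12, 12, 11, 18, 12, 2, 18, 18, 16, 6, 18, 12, 12, 3, 12, 17, 18, 18, 12, 12, 18, 0, 18,
    12, 12,
]

sizes = Counter(labels)
# Clusters that contain exactly one postal code.
singletons = sorted(label for label, count in sizes.items() if count == 1)

print(sizes.most_common(4))
print(len(singletons))
```

Two clusters hold 50 of the 74 areas and 16 clusters contain a single postal code, which may suggest trying a smaller number of clusters.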


In [94]:
la_temp = la_grouped_clustering.filter(['PostalCode','Cluster Labels'],axis=1)
la_temp

Unnamed: 0,PostalCode,Cluster Labels
0,90001,12
1,90002,1
2,90003,7
3,90004,4
4,90005,4
5,90006,12
6,90007,18
7,90008,10
8,90012,18
9,90013,18


We merge the cluster labels with each area's coordinates and most common venue categories into a single dataset.

In [95]:


# attach coordinates and the top venue categories to each cluster label
la_merged = la_temp.merge(la_df3, on="PostalCode", how="left")
la_merged = la_merged.merge(neighborhoods_venues_sorted.drop(columns=['Borough']), on="PostalCode", how="left")
la_merged




Unnamed: 0,PostalCode,Cluster Labels,Borough,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,90001,12,"Florence-Graham, South Los Angeles (Los Angeles)",33.974027,-118.249509,Mexican Restaurant,Donut Shop,Pizza Place,Fast Food Restaurant,Pharmacy,Discount Store,Grocery Store,Sandwich Place,Fruit & Vegetable Store,Shoe Store
1,90002,1,Southeast Los Angeles (Los Angeles),33.949099,-118.246737,Park,Auto Garage,Diner,Dog Run,Donburi Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Yoshoku Restaurant
2,90003,7,South Los Angeles/Broadway Manchester (Los Ang...,33.964131,-118.272783,Fast Food Restaurant,Southern / Soul Food Restaurant,Yoshoku Restaurant,Diner,Flea Market,Fish & Chips Shop,Filipino Restaurant,Farmers Market,Electronics Store,Eastern European Restaurant
3,90004,4,Hancock Park (Los Angeles),34.076198,-118.310722,Korean Restaurant,Coffee Shop,Bar,Japanese Restaurant,Cocktail Bar,Bakery,Sandwich Place,Seafood Restaurant,Salon / Barbershop,Smoke Shop
4,90005,4,Koreatown (Los Angeles),34.059163,-118.306892,Korean Restaurant,Japanese Restaurant,Bakery,Coffee Shop,Ice Cream Shop,Café,Concert Hall,Steakhouse,Brazilian Restaurant,South American Restaurant
5,90006,12,Byzantine-Latino Quarter (Los Angeles),34.048041,-118.294177,Pizza Place,Video Game Store,Spa,Mobile Phone Shop,Food Truck,Bus Station,Diner,Sandwich Place,Cosmetics Shop,Spanish Restaurant
6,90007,18,University Park (Los Angeles),34.028127,-118.284830,Coffee Shop,Shipping Store,Ramen Restaurant,Caribbean Restaurant,Food Truck,Korean Restaurant,Big Box Store,Juice Bar,Gastropub,Fraternity House
7,90008,10,Baldwin Hills (Los Angeles),34.009552,-118.346724,Clothing Store,Scenic Lookout,Dim Sum Restaurant,Discount Store,Dog Run,Donburi Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Yoshoku Restaurant
8,90012,18,Chinatown (Los Angeles),34.065975,-118.238642,Chinese Restaurant,Bakery,Vietnamese Restaurant,Bar,Japanese Restaurant,Café,Brewery,Dim Sum Restaurant,French Restaurant,Recreation Center
9,90013,18,Downtown Fashion District (Los Angeles),34.045405,-118.240454,Japanese Restaurant,Sushi Restaurant,Coffee Shop,Ice Cream Shop,Ramen Restaurant,Bar,Gift Shop,Cocktail Bar,Bakery,Brewery


### Finally, we plot the clustered areas on a folium map

In [96]:

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, post, bor, cluster in zip(la_merged['Latitude'], la_merged['Longitude'], la_merged['PostalCode'], la_merged['Borough'], la_merged['Cluster Labels']):
    label = folium.Popup('{} ({}): - Cluster {}'.format(bor, post, cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.9).add_to(map_clusters)
       
map_clusters
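Each cluster label is used as an index into a list of colours. Since k-means labels run from 0 to `kclusters - 1`, a palette of exactly `kclusters` entries can be indexed by the label directly. The sketch below builds such a palette with the standard `colorsys` module rather than matplotlib's `rainbow` colormap; `cluster_palette` is a hypothetical helper.

```python
import colorsys

def cluster_palette(k):
    """Return k evenly spaced hex colours, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        # Spread the hues around the colour wheel; saturation and value are fixed.
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(int(r * 255), int(g * 255), int(b * 255)))
    return palette

palette = cluster_palette(20)
print(palette[0], palette[19])
```

A marker for cluster `c` would then use `palette[c]` for both `color` and `fill_color`.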
