## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1.  <a href="#item1">Scrape Wikipedia</a>

2.  <a href="#item2">Get coordinates</a>

3.  <a href="#item3">Clustering</a>
  
    </font>
    </div>

## 1. Scrape Wikipedia <a id='item1'></a>

Let's use the pandas read_html function, which parses the HTML tables on a page into a list of DataFrames.

In [46]:
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library
import matplotlib.cm as cm
import matplotlib.colors as colors

all_tables = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
len(all_tables)

3

That is not a lot of tables, so we can easily inspect them one by one. Starting with index 0, and bingo - the first table is the one we need.

In [2]:
df_0 = all_tables[0]
df_0.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,M1ANot assigned,M2ANot assigned,M3ANorth York(Parkwoods),M4ANorth York(Victoria Village),M5ADowntown Toronto(Regent Park / Harbourfront),M6ANorth York(Lawrence Manor / Lawrence Heights),M7AQueen's Park(Ontario Provincial Government),M8ANot assigned,M9AEtobicoke(Islington Avenue)
1,M1BScarborough(Malvern / Rouge),M2BNot assigned,M3BNorth York(Don Mills)North,M4BEast York(Parkview Hill / Woodbine Gardens),"M5BDowntown Toronto(Garden District, Ryerson)",M6BNorth York(Glencairn),M7BNot assigned,M8BNot assigned,M9BEtobicoke(West Deane Park / Princess Garden...
2,M1CScarborough(Rouge Hill / Port Union / Highl...,M2CNot assigned,M3CNorth York(Don Mills)South(Flemingdon Park),M4CEast York(Woodbine Heights),M5CDowntown Toronto(St. James Town),M6CYork(Humewood-Cedarvale),M7CNot assigned,M8CNot assigned,M9CEtobicoke(Eringate / Bloordale Gardens / Ol...
3,M1EScarborough(Guildwood / Morningside / West ...,M2ENot assigned,M3ENot assigned,M4EEast Toronto(The Beaches),M5EDowntown Toronto(Berczy Park),M6EYork(Caledonia-Fairbanks),M7ENot assigned,M8ENot assigned,M9ENot assigned
4,M1GScarborough(Woburn),M2GNot assigned,M3GNot assigned,M4GEast York(Leaside),M5GDowntown Toronto(Central Bay Street),M6GDowntown Toronto(Christie),M7GNot assigned,M8GNot assigned,M9GNot assigned


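We need to make it a bit nicer:
* Melt the columns into a single column of cells
* Split each cell string into postal code, borough and neighborhood with a regular expression
* Replace ' / ' in column 'Neighborhood' with the required ', '
* Drop whatever we don't need, including the 'Not assigned' boroughs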
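
To make the regular expression step easier to follow, here is a minimal sketch that applies the same pattern to three cell strings copied from the table above. It uses the standard `re` module instead of pandas `str.extract` only to stay self-contained; `split_cell` is a hypothetical helper name, not part of the notebook.

```python
import re

# Same pattern as in the notebook: a 3-character postal code, the borough up
# to the first "(", then an optional parenthesised list of neighborhoods.
CELL_PATTERN = re.compile(r'^(\w{3})([^(]+)\(*([^)]*)\)*')

def split_cell(cell):
    """Return (postal_code, borough, neighborhood) for one table cell."""
    postal_code, borough, neighborhood = CELL_PATTERN.match(cell).groups()
    # Neighborhoods are separated by ' / ' in the source table.
    return postal_code, borough, neighborhood.replace(' / ', ', ')

samples = [
    'M1ANot assigned',
    'M3ANorth York(Parkwoods)',
    'M5ADowntown Toronto(Regent Park / Harbourfront)',
]
parsed = [split_cell(cell) for cell in samples]
for row in parsed:
    print(row)
```

For a 'Not assigned' cell the third group is an empty string, which is why those rows are filtered on the Borough column rather than on Neighborhood.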

In [3]:
df = pd.melt(df_0, value_vars = [0, 1, 2, 3, 4, 5, 6, 7, 8])

df[['PostalCode', 'Borough', 'Neighborhood']]=df['value'].str.extract(r'^(\w{3})([^(]+)\(*([^)]*)\)*')

df.Neighborhood = df.Neighborhood.str.replace(' / ', ', ', regex=False)

df.drop(columns=['variable', 'value'], inplace=True)
df.drop(df[df.Borough == 'Not assigned'].index, inplace=True)
df.reset_index(drop=True, inplace=True)

df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
...,...,...,...
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ..."
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest..."


In [4]:
df.shape

(103, 3)

## 2. Get coordinates <a id='item2'></a>

Unfortunately, geocoder.google() keeps returning REQUEST_DENIED, so I use pgeocode instead - it works just fine.

In [47]:
#!pip install pgeocode
import pgeocode

geolocator = pgeocode.Nominatim('ca')

def get_coord_pgeocode(postal_code):
    g = geolocator.query_postal_code(postal_code)
    return [g.latitude, g.longitude]

df[['Latitude', 'Longitude']] = df.apply(lambda row: get_coord_pgeocode(row.PostalCode), axis=1, result_type='expand')
df

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.8113,-79.1930
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.7878,-79.1564
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.7678,-79.1866
3,M1G,Scarborough,Woburn,43.7712,-79.2144
4,M1H,Scarborough,Cedarbrae,43.7686,-79.2389
...,...,...,...,...,...
98,M9N,York,Weston,43.7068,-79.5170
99,M9P,Etobicoke,Westmount,43.6949,-79.5323
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ...",43.6898,-79.5582
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.7432,-79.5876


## 3. Clustering <a id='item3'></a>

I put my Foursquare credentials into a so-called 'hidden cell'.

In [8]:
# @hidden_cell
CLIENT_ID = '3XTUIK02EJEMFC2IO4YEFSZJ35R1YSPJV2YKMSU2ULOZ0VH5' # your Foursquare ID
CLIENT_SECRET = 'AGJOXDDRMCVNXL331UVTZXTOF4DK4MGT44FMQKOJTNYOU5CE' # your Foursquare Secret

From this point on I borrowed a lot of code from the lab.

In [11]:
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

def getNearbyVenues(postal_codes, latitudes, longitudes, radius=500):
    venues_list=[]
    for postal_code, lat, lng in zip(postal_codes, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        # make the GET request; skip postal codes whose response cannot be parsed
        try:
            results = requests.get(url).json()["response"]['groups'][0]['items']
            # keep only the relevant information for each nearby venue
            venues_list.append([(
                postal_code, 
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name']) for v in results])
        except Exception:
            print('Problem with retrieving venues for PostalCode: ', postal_code)
    # build the dataframe once, after all requests have finished
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['PostalCode', 
                 'PostalCode Latitude', 
                 'PostalCode Longitude', 
                 'Venue', 
                 'Venue Latitude', 
                 'Venue Longitude', 
                 'Venue Category'])
    return nearby_venues
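
The request URL above is assembled with str.format. As an alternative sketch (not the lab's code), the same query string can be built with `urllib.parse.urlencode`, which escapes every parameter value. `build_explore_url` is a hypothetical helper, and the credentials below are placeholders.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Assemble the explore URL; urlencode escapes each parameter value."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# Placeholder credentials, not real ones.
url = build_explore_url('MY_ID', 'MY_SECRET', '20180605', 43.8113, -79.1930)

# Parsing the query string back shows what the server would receive.
query = parse_qs(urlparse(url).query)
print(query['ll'], query['radius'], query['limit'])
```

`requests.get` also accepts a `params` dictionary and performs the same encoding, so the URL does not have to be built by hand at all.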

Get the venues and save the results to a CSV file - Foursquare limits the number of requests, so the saved file lets us reload the data later without calling the API again.
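
The save-then-reload idea can be sketched as a small cache helper: fetch only when the file does not exist yet. This is a minimal illustration with the standard `csv` module; `load_or_fetch` and `fake_fetch` are hypothetical names, and `fake_fetch` merely stands in for the real API call.

```python
import csv
import os
import tempfile

def load_or_fetch(csv_path, fetch_rows, header):
    """Return cached rows if csv_path exists, otherwise fetch and cache them."""
    if os.path.exists(csv_path):
        with open(csv_path, newline='', encoding='utf-8') as f:
            return list(csv.reader(f))[1:]  # skip the header row
    rows = fetch_rows()
    with open(csv_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return rows

calls = []

def fake_fetch():
    # Hypothetical stand-in for getNearbyVenues; records every call.
    calls.append(1)
    return [['M1E', 'Pizza Place'], ['M1G', 'Coffee Shop']]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'venues_cache.csv')
    first = load_or_fetch(path, fake_fetch, ['PostalCode', 'Venue Category'])
    second = load_or_fetch(path, fake_fetch, ['PostalCode', 'Venue Category'])

# The second call is served from the file, so the fetch ran only once.
print(len(calls), first == second)
```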

In [57]:
df_venues = getNearbyVenues(postal_codes=df['PostalCode'],
                            latitudes=df['Latitude'],
                            longitudes=df['Longitude']
                            )
df_venues.to_csv('Venues_of_Toronto.csv')
df_venues['Venue Category'].unique()

Problem with retrieving venues for PostalCode:  M7R


array(['Home Service', 'Fried Chicken Joint', 'Liquor Store',
       'Beer Store', 'Fast Food Restaurant', 'Bank', 'Sandwich Place',
       'Pizza Place', 'Pharmacy', 'Grocery Store', 'Sports Bar',
       'Coffee Shop', 'Bus Line', 'Restaurant', 'Supermarket',
       'Greek Restaurant', 'Donut Shop', 'Thrift / Vintage Store',
       'Convenience Store', 'Breakfast Spot', 'Bus Station',
       'Mexican Restaurant', 'Smoothie Shop', 'Chinese Restaurant',
       'Medical Center', 'Electronics Store', 'Korean BBQ Restaurant',
       'Gaming Cafe', 'Trail', 'Lounge', 'Park', 'Train Station',
       'Department Store', 'Discount Store', 'Light Rail Station',
       'Hockey Arena', 'Intersection', 'Soccer Field', 'Metro Station',
       'Bakery', 'Bistro', 'Ice Cream Shop', 'Gift Shop',
       'General Entertainment', 'Skating Rink', 'College Stadium', 'Café',
       'Event Service', 'Asian Restaurant', 'Auto Garage',
       'Latin American Restaurant', 'Newsagent', 'Badminton Court',
       

In [74]:
df_venues = pd.read_csv('Venues_of_Toronto.csv')
df_venues[['Venue Category', 'PostalCode']].groupby(['Venue Category'], as_index=False).count().sort_values('PostalCode', ascending=False).head(20)

Unnamed: 0,Venue Category,PostalCode
49,Coffee Shop,194
40,Café,87
200,Sandwich Place,87
195,Restaurant,61
182,Pizza Place,57
175,Park,54
130,Japanese Restaurant,50
17,Bank,50
220,Sushi Restaurant,47
16,Bakery,40


I tried to cluster neighborhoods on this wide variety of venue categories, but it didn't work well. Let's collapse every category whose name contains 'Restaurant' into a single 'Restaurant' category and see what happens.
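
Here is a minimal pure-Python sketch of that collapsing rule, using a few category names from the venue list above. `collapse_categories` is a hypothetical helper that mirrors what a substring match on the 'Venue Category' column does.

```python
def collapse_categories(categories, keyword):
    """Replace every category containing `keyword` with `keyword` itself."""
    return [keyword if keyword in name else name for name in categories]

# Category names taken from the venue list above.
categories = ['Greek Restaurant', 'Coffee Shop', 'Fast Food Restaurant',
              'Restaurant', 'Pizza Place', 'Chinese Restaurant']
collapsed = collapse_categories(categories, 'Restaurant')

# Number of distinct categories before and after collapsing.
print(len(set(categories)), '->', len(set(collapsed)))
print(sorted(set(collapsed)))
```

Names without the keyword, such as 'Pizza Place', stay separate categories even though they are arguably restaurants too.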

In [75]:
df_venues = pd.read_csv('Venues_of_Toronto.csv')

def change_venue_category_name(new_name):
    old_len = len(df_venues['Venue Category'].unique())
    df_venues.loc[df_venues['Venue Category'].str.contains(new_name), 'Venue Category'] = new_name
    print('{}: {} / {}'.format(new_name, old_len, len(df_venues['Venue Category'].unique())))

change_venue_category_name('Restaurant')

df_venues[['Venue Category', 'PostalCode']].groupby(['Venue Category'], as_index=False).count().sort_values('PostalCode', ascending=False).head(20)

Restaurant: 244 / 196


Unnamed: 0,Venue Category,PostalCode
158,Restaurant,493
41,Coffee Shop,194
163,Sandwich Place,87
34,Café,87
147,Pizza Place,57
141,Park,54
13,Bank,50
12,Bakery,40
91,Gym,37
101,Hotel,37


In [76]:
df_onehot = pd.get_dummies(df_venues[['Venue Category']], prefix="", prefix_sep="")
df_onehot.insert(0, 'PostalCode', df_venues['PostalCode']) 
df_onehot

Unnamed: 0,PostalCode,Accessories Store,Adult Boutique,Airport,Art Gallery,Arts & Crafts Store,Athletics & Sports,Auto Dealership,Auto Garage,BBQ Joint,...,Train,Train Station,Video Game Store,Video Store,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M1B,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M1E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M1E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M1E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M1E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2045,M9W,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2046,M9W,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2047,M9W,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2048,M9W,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [63]:
df_grouped = df_onehot.groupby('PostalCode').mean().reset_index()
df_grouped

Unnamed: 0,PostalCode,Accessories Store,Adult Boutique,Airport,Art Gallery,Arts & Crafts Store,Athletics & Sports,Auto Dealership,Auto Garage,BBQ Joint,...,Train,Train Station,Video Game Store,Video Store,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M1B,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M1E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M1G,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M1H,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M1J,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
92,M9M,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93,M9P,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
94,M9R,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
95,M9V,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [64]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['PostalCode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
df_venues_sorted = pd.DataFrame(columns=columns)
df_venues_sorted['PostalCode'] = df_grouped['PostalCode']

for ind in np.arange(df_grouped.shape[0]):
    df_venues_sorted.iloc[ind, 1:] = return_most_common_venues(df_grouped.iloc[ind, :], num_top_venues)

df_venues_sorted

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M1B,Home Service,Yoga Studio,Discount Store,Fish & Chips Shop,Field
1,M1E,Restaurant,Pizza Place,Bank,Pharmacy,Coffee Shop
2,M1G,Restaurant,Fish Market,Field,Farmers Market,Event Space
3,M1H,Lounge,Gaming Cafe,Trail,Yoga Studio,Distribution Center
4,M1J,Park,Grocery Store,Yoga Studio,Discount Store,Field
...,...,...,...,...,...,...
92,M9M,Restaurant,Grocery Store,Coffee Shop,Nightclub,Café
93,M9P,Pizza Place,Discount Store,Flea Market,Sandwich Place,Playground
94,M9R,Bank,Pharmacy,Sandwich Place,Bus Line,Pizza Place
95,M9V,Restaurant,Grocery Store,Auto Garage,Sandwich Place,Beer Store


Let's find out which k we need for k-means clustering by comparing the cluster sizes for k from 2 to 9.

In [65]:
df_grouped_clustering = df_grouped.drop(columns='PostalCode')

for kclusters in range(2, 10):
    kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_grouped_clustering)
    values, counts = np.unique(kmeans.labels_, return_counts=True)
    print(dict(zip(values, counts)))

{0: 64, 1: 33}
{0: 46, 1: 17, 2: 34}
{0: 53, 1: 27, 2: 6, 3: 11}
{0: 4, 1: 29, 2: 50, 3: 11, 4: 3}
{0: 30, 1: 45, 2: 2, 3: 5, 4: 14, 5: 1}
{0: 1, 1: 19, 2: 4, 3: 9, 4: 15, 5: 48, 6: 1}
{0: 5, 1: 50, 2: 7, 3: 1, 4: 1, 5: 16, 6: 16, 7: 1}
{0: 22, 1: 16, 2: 1, 3: 3, 4: 47, 5: 1, 6: 1, 7: 1, 8: 5}


I really don't see any gain from choosing k greater than 3: larger values mostly split off clusters that contain only a handful of postal codes.
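
One rough way to back this up is to count, for each k, how many clusters are small. The sizes below are copied from the output above; the threshold of 10 postal codes is my own assumption, not a standard rule.

```python
# Cluster sizes copied from the output above, keyed by k.
cluster_sizes = {
    2: [64, 33],
    3: [46, 17, 34],
    4: [53, 27, 6, 11],
    5: [4, 29, 50, 11, 3],
    6: [30, 45, 2, 5, 14, 1],
    7: [1, 19, 4, 9, 15, 48, 1],
    8: [5, 50, 7, 1, 1, 16, 16, 1],
    9: [22, 16, 1, 3, 47, 1, 1, 1, 5],
}

SMALL = 10  # assumed threshold for a "small" cluster

# For each k, count the clusters with fewer than SMALL postal codes.
small_counts = {k: sum(size < SMALL for size in sizes)
                for k, sizes in cluster_sizes.items()}
for k, n_small in small_counts.items():
    print('k={}: {} cluster(s) with fewer than {} postal codes'.format(k, n_small, SMALL))
```

With this threshold, k=2 and k=3 produce no small clusters, and every step beyond k=3 adds one more.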

In [66]:
kclusters = 3

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_grouped_clustering)

df_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

df_merged = df
df_merged = df_merged.join(df_venues_sorted.set_index('PostalCode'), on='PostalCode')

df_merged

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M1B,Scarborough,"Malvern, Rouge",43.8113,-79.1930,2.0,Home Service,Yoga Studio,Discount Store,Fish & Chips Shop,Field
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.7878,-79.1564,,,,,,
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.7678,-79.1866,0.0,Restaurant,Pizza Place,Bank,Pharmacy,Coffee Shop
3,M1G,Scarborough,Woburn,43.7712,-79.2144,0.0,Restaurant,Fish Market,Field,Farmers Market,Event Space
4,M1H,Scarborough,Cedarbrae,43.7686,-79.2389,2.0,Lounge,Gaming Cafe,Trail,Yoga Studio,Distribution Center
...,...,...,...,...,...,...,...,...,...,...,...
98,M9N,York,Weston,43.7068,-79.5170,,,,,,
99,M9P,Etobicoke,Westmount,43.6949,-79.5323,2.0,Pizza Place,Discount Store,Flea Market,Sandwich Place,Playground
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ...",43.6898,-79.5582,2.0,Bank,Pharmacy,Sandwich Place,Bus Line,Pizza Place
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.7432,-79.5876,0.0,Restaurant,Grocery Store,Auto Garage,Sandwich Place,Beer Store


As we can see above, only 97 postal codes have any venues, so the remaining rows have no cluster label. We need to clean up a bit.
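
The missing labels come from the join: DataFrame.join is a left join by default, so every row of df is kept and postal codes without venues get NaN. A small sketch with a few codes from the table above shows which ones fall out (the subset is illustrative only):

```python
# A few postal codes from the merged table above (illustrative subset).
all_codes = ['M1B', 'M1C', 'M1E', 'M9N', 'M9P']
# Codes for which venues were returned, so a cluster label exists.
codes_with_venues = {'M1B', 'M1E', 'M9P'}

# Codes that would end up with NaN in 'Cluster Labels' after the left join.
missing = [code for code in all_codes if code not in codes_with_venues]
print(missing)
```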

In [67]:
df_merged.dropna(subset=['Cluster Labels'], inplace=True)
df_merged['Cluster Labels'] = df_merged['Cluster Labels'].astype(int)

In [68]:
df_merged.loc[df_merged['Cluster Labels'] == 0, df_merged.columns[[0] + list(range(6, df_merged.shape[1]))]]

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,M1E,Restaurant,Pizza Place,Bank,Pharmacy,Coffee Shop
3,M1G,Restaurant,Fish Market,Field,Farmers Market,Event Space
10,M1P,Bakery,Event Service,Restaurant,Yoga Studio,Field
12,M1S,Restaurant,Skating Rink,Newsagent,Badminton Court,Breakfast Spot
13,M1T,Restaurant,Pizza Place,Pharmacy,Coffee Shop,Fried Chicken Joint
15,M1W,Restaurant,Bank,Coffee Shop,Sandwich Place,Electronics Store
18,M2J,Restaurant,Clothing Store,Coffee Shop,Cosmetics Shop,Mobile Phone Shop
21,M2M,Park,Playground,Restaurant,Yoga Studio,Diner
22,M2N,Restaurant,Coffee Shop,Pizza Place,Grocery Store,Bank
25,M3A,Construction & Landscaping,Restaurant,Food & Drink Shop,Park,Farmers Market


In [69]:
df_merged.loc[df_merged['Cluster Labels'] == 1, df_merged.columns[[0] + list(range(6, df_merged.shape[1]))]]

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
5,M1J,Park,Grocery Store,Yoga Studio,Discount Store,Field
17,M2H,Residential Building (Apartment / Condo),Park,Diner,Field,Farmers Market
20,M2L,Pool,Yoga Studio,Discount Store,Field,Farmers Market
23,M2P,Park,Convenience Store,Yoga Studio,Discount Store,Field
24,M2R,Bookstore,Park,Coffee Shop,Distribution Center,Fish & Chips Shop
26,M3B,Park,Pool,Yoga Studio,Discount Store,Field
27,M3C,Park,Gym,Trail,River,Yoga Studio
31,M3L,Home Service,Park,Pool,Yoga Studio,Diner
40,M4J,Convenience Store,Park,Intersection,Distribution Center,Fish & Chips Shop
44,M4N,Photography Studio,Park,Discount Store,Fish & Chips Shop,Field


In [70]:
df_merged.loc[df_merged['Cluster Labels'] == 2, df_merged.columns[[0] + list(range(6, df_merged.shape[1]))]]

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M1B,Home Service,Yoga Studio,Discount Store,Fish & Chips Shop,Field
4,M1H,Lounge,Gaming Cafe,Trail,Yoga Studio,Distribution Center
6,M1K,Coffee Shop,Convenience Store,Sandwich Place,Light Rail Station,Restaurant
7,M1L,Bakery,Coffee Shop,Intersection,Soccer Field,Bus Line
8,M1M,Ice Cream Shop,Pharmacy,Discount Store,Pizza Place,Sandwich Place
9,M1N,College Stadium,Skating Rink,General Entertainment,Café,Yoga Studio
11,M1R,Auto Garage,Yoga Studio,Distribution Center,Fish Market,Fish & Chips Shop
14,M1V,Pharmacy,Intersection,Discount Store,Fish & Chips Shop,Field
19,M2K,Flower Shop,Park,Golf Driving Range,Trail,Gas Station
30,M3K,Airport,Food Court,Shoe Store,Coffee Shop,Yoga Studio


Finally, let's visualize the resulting clusters

In [71]:
toronto_latitude = 43.6532
toronto_longitude = -79.3832
map_clusters = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_merged['Latitude'], df_merged['Longitude'], df_merged['Neighborhood'], df_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters