## Introduction

I completed an IBM certification on Coursera in which we had to submit a capstone project on segmenting the neighborhoods of Toronto. In this notebook I perform similar tasks to demonstrate the skills I gained through that course.

I will use the Foursquare API to get the venues near each zipcode in Miami, then use a KMeans clustering model to cluster the zipcodes based on their venue data.

## Data Description

This notebook uses data from three different sources:

1. List of zipcodes: I got this list from the Miami-Dade County website; you can get the data [here](https://gis-mdc.opendata.arcgis.com/datasets/fee863cb3da0417fa8b5aaf6b671f8a7_0/data).
2. Latitude and longitude: I got these coordinates using the geopy library.
3. Venue data: I got this data by calling the Foursquare API.

### 1. Importing required Libraries

In [4]:
# Libraries for mathematical operations
import pandas as pd
import numpy as np

# Module for extracting Latitude and Longitude
#!pip install geocoder
#!pip install geopy
import geocoder
from geopy.geocoders import Nominatim

# Map rendering library
#!pip install folium
import folium

# Library to handle requests
import requests

# Transform JSON file into a pandas dataframe
# (json_normalize is exposed at the pandas top level since pandas 1.0)
from pandas import json_normalize

# Sklearn module for k-means clustering
from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

### 2. Importing and Extracting Data

In [5]:
temp_df = pd.read_csv("Zip_Code.csv")

In [6]:
temp_df.head()

Unnamed: 0,OBJECTID,PZIPCODEID,ZIP,ZIPCODE,GlobalID,SHAPE_Length,SHAPE_Area
0,1,33035,33035,33035,{FF9A785D-0F71-4541-9951-9C8D439B42C6},85419.892942,291742800.0
1,2,33010,33010,33010,{FF56A32A-B713-4ACF-BA60-E7A24DAADADC},19851.540063,13170510.0
2,3,33154,33154,33154,{E0EB98AC-FACD-48B0-8A7E-7E4DFD1AFBF0},15339.770689,10036690.0
3,4,33037,33037,33037,{7A7E3954-A67A-4DAC-940F-89D62D9AA090},3500.649627,547077.8
4,5,33116,33116,33116,{8947A873-8AE0-4D30-98A1-E36163E1673C},802.531871,30251.13


Notice that our dataset has a lot of columns we don't need for this project; we only need the 'ZIPCODE' column. So, let's create a new dataframe with just that column.

In [7]:
df = pd.DataFrame(temp_df['ZIPCODE'])

In [9]:
print(df.shape)
df.head()

(88, 1)


Unnamed: 0,ZIPCODE
0,33035
1,33010
2,33154
3,33037
4,33116


Now that we have the list of zipcodes, let's get the latitude and longitude of each one. Before doing that, we have to create a new dataframe to store these coordinates.

In [10]:
column_names = ['Latitude','Longitude']
cor = pd.DataFrame(columns = column_names)
cor

Unnamed: 0,Latitude,Longitude


Use the geopy library to get the latitude and longitude of each Miami zipcode.

In [11]:
# get latitude and longitude for each zipcode
address = df['ZIPCODE']

# create the geocoder once, outside the loop
geolocator = Nominatim(user_agent = "miami-explorer")

count = 0
rows = []
for add in address:
    location = geolocator.geocode(add, country_codes = "US")
    latitude = location.latitude
    longitude = location.longitude
    print('The geographical coordinates of Miami are {}, {}.'.format(latitude, longitude))
    count+=1
    print(count)
    rows.append({'Latitude': latitude, 'Longitude': longitude})

# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
cor = pd.DataFrame(rows, columns = column_names)

The geographical coordinates of Miami are 25.452762826613146, -80.44824438369251.
1
The geographical coordinates of Miami are 25.832095724189735, -80.2778566739543.
2
The geographical coordinates of Miami are 25.882990117365495, -80.12807849051016.
3
The geographical coordinates of Miami are 25.145891892250248, -80.40270233194873.
4
The geographical coordinates of Miami are 25.6713949, -80.3740704.
5
The geographical coordinates of Miami are 25.595498092943803, -80.35993762710132.
6
The geographical coordinates of Miami are 25.8655287, -80.1931817.
7
The geographical coordinates of Miami are 25.6906295, -80.3880968.
8
The geographical coordinates of Miami are 25.817255917096144, -80.13131868998339.
9
The geographical coordinates of Miami are 25.841655683333332, -80.18183353333335.
10
The geographical coordinates of Miami are 25.760052252134507, -80.14302538181724.
11
The geographical coordinates of Miami are 25.85073469777749, -80.23622924616289.
12
The geographical coordinates of Miami are 25.767487853631046, 

In [12]:
cor.head()

Unnamed: 0,Latitude,Longitude
0,25.452763,-80.448244
1,25.832096,-80.277857
2,25.88299,-80.128078
3,25.145892,-80.402702
4,25.671395,-80.37407
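
The geocoding loop above assumes every lookup succeeds. As a minimal sketch (not part of the original notebook), the helper below shows one way to tolerate failed lookups and to pause between requests, since the public Nominatim service asks for roughly one request per second at most. The function name `geocode_zipcodes`, its `geocode_fn` parameter, and the `delay_seconds` default are my own assumptions for illustration; `geocode_fn` is any callable that returns an object with `.latitude` and `.longitude`, or `None` when nothing is found.

```python
import time


def geocode_zipcodes(zipcodes, geocode_fn, delay_seconds=1.0):
    """Return one dict per zipcode with its latitude and longitude.

    geocode_fn (hypothetical parameter) takes a query string and returns
    an object with .latitude and .longitude, or None if nothing was found.
    """
    rows = []
    for i, zipcode in enumerate(zipcodes):
        if i > 0:
            time.sleep(delay_seconds)  # pause between consecutive requests
        location = geocode_fn(str(zipcode))
        if location is None:
            # keep the zipcode, but record that no coordinates were found
            rows.append({'ZIPCODE': zipcode, 'Latitude': None, 'Longitude': None})
        else:
            rows.append({'ZIPCODE': zipcode,
                         'Latitude': location.latitude,
                         'Longitude': location.longitude})
    return rows


# Possible usage with geopy (needs network access, so it is left commented out):
# geolocator = Nominatim(user_agent="miami-explorer")
# rows = geocode_zipcodes(df['ZIPCODE'],
#                         lambda q: geolocator.geocode(q, country_codes="US"))
# cor = pd.DataFrame(rows)
```

Passing the geocoder in as a callable keeps the helper easy to try out with a stand-in function before spending real requests.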


Now we have the latitudes and longitudes. Let's join 'df' and 'cor'.

In [16]:
df = df.join(cor, how='outer')

In [17]:
df

Unnamed: 0,ZIPCODE,Latitude,Longitude
0,33035,25.452763,-80.448244
1,33010,25.832096,-80.277857
2,33154,25.882990,-80.128078
3,33037,25.145892,-80.402702
4,33116,25.671395,-80.374070
...,...,...,...
83,33157,25.606533,-80.349050
84,33033,25.485145,-80.431807
85,33032,25.529617,-80.396254
86,33144,25.763164,-80.309172


Now let's plot these zipcodes on a map using their latitudes and longitudes.

In [20]:
# create map of Miami, centered on the last coordinates geocoded above
map_miami = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, zipcode in zip(df['Latitude'], df['Longitude'], df['ZIPCODE']):
    label = '{}'.format(zipcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,  # attach the zipcode popup to the marker
        color = 'blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_miami)

map_miami

**Folium** is a great visualization library. Feel free to zoom into the above map.

Now that our data is looking good, let's get the nearby venues for each zipcode from the Foursquare API.

To get data from Foursquare, we need a Foursquare developer account and its credentials.

In [21]:
CLIENT_ID = 'MEEOYI5MLORDKBROGLTY1FQZSC5EGNO5EL4OUZYMHYRB113N' # your Foursquare ID
CLIENT_SECRET = 'JJSBHTXOMX5QPBDS54NGRXB2E5OGFAMRPLRNOONLBHIH22IM' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: MEEOYI5MLORDKBROGLTY1FQZSC5EGNO5EL4OUZYMHYRB113N
CLIENT_SECRET:JJSBHTXOMX5QPBDS54NGRXB2E5OGFAMRPLRNOONLBHIH22IM
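
Printing credentials in a shared notebook exposes them to every reader. As a hedged alternative (a sketch, not what this notebook does), the credentials could be read from environment variables and only a masked form printed. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions I made up for this example.

```python
import os


def load_foursquare_credentials(environ=os.environ):
    """Read the client ID and secret from environment variables.

    The variable names are hypothetical; use whatever names you export.
    """
    client_id = environ.get('FOURSQUARE_CLIENT_ID')
    client_secret = environ.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set in the environment')
    return client_id, client_secret


def mask(secret, visible=4):
    """Show only the first few characters of a secret."""
    hidden = max(0, len(secret) - visible)
    return secret[:visible] + '*' * hidden


# Possible usage:
# CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()
# print('CLIENT_ID: ' + mask(CLIENT_ID))
```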


Let's explore the first zipcode in our dataframe.

In [22]:
df.loc[0, 'ZIPCODE'] 

33035

In [28]:
zip_latitude = df.loc[0, 'Latitude'] # zipcode latitude value
zip_longitude = df.loc[0, 'Longitude'] # zipcode longitude value

zip_code = df.loc[0, 'ZIPCODE'] # zipcode value

print('Latitude and longitude values of {} are {}, {}.'.format(zip_code, 
                                                               zip_latitude, 
                                                               zip_longitude))

Latitude and longitude values of 33035 are 25.452762826613146, -80.44824438369251.


Now, let's get the top 100 venues within a radius of 500 meters of '33035'.

In [29]:
LIMIT = 100

radius = 500

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    zip_latitude, 
    zip_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=MEEOYI5MLORDKBROGLTY1FQZSC5EGNO5EL4OUZYMHYRB113N&client_secret=JJSBHTXOMX5QPBDS54NGRXB2E5OGFAMRPLRNOONLBHIH22IM&v=20180605&ll=25.452762826613146,-80.44824438369251&radius=500&limit=100'
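
The URL above is assembled with string formatting. A minimal sketch of the same request URL built with the standard library's `urlencode` is shown below; it percent-encodes the values for us (the comma in `ll` becomes `%2C`, which is equivalent once decoded). The function name `build_explore_url` and the constant `EXPLORE_ENDPOINT` are my own; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

# Same endpoint as in the cell above
EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Build the explore URL from its query parameters (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


# Placeholder credentials, for illustration only
example_url = build_explore_url('ID', 'SECRET', '20180605', 25.45, -80.44)
```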

Our parameters are set, so let's send the request to the API.

In [30]:
import requests
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '6022abc51870aa33068dd693'},
  'headerLocation': 'Keys Gate',
  'headerFullLocation': 'Keys Gate, Homestead',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 2,
  'suggestedBounds': {'ne': {'lat': 25.45726283111315,
    'lng': -80.44326996354867},
   'sw': {'lat': 25.44826282211314, 'lng': -80.45321880383635}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '56991e54498efe6220694bd5',
       'name': 'in a uber!',
       'location': {'lat': 25.45613156647947,
        'lng': -80.44965286586,
        'labeledLatLngs': [{'label': 'display',
          'lat': 25.45613156647947,
          'lng': -80.44965286586}],
        'distance': 400,
        'cc': 'US',
        'city': 'Miami',
        'state': 'FL',
        'country': '

In [31]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # flattened rows use the 'venue.categories' column name instead
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [32]:
from pandas import json_normalize # transform JSON file into a pandas dataframe

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,in a uber!,Moving Target,25.456132,-80.449653
1,La Proveedora Supermarket,Grocery Store,25.450735,-80.452446
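
To make the flattening step more concrete, here is a small sketch that does the same extraction with plain Python dictionaries instead of `json_normalize`. The two sample items are made up; they only mirror the nesting seen in the response above (`venue` → `name`, `categories`, `location`). The helper names are hypothetical.

```python
def first_category_name(venue):
    """Return the name of the first category, or None if there is none."""
    categories = venue.get('categories') or []
    return categories[0]['name'] if categories else None


def venues_to_rows(items):
    """Turn the 'items' list of an explore response into flat rows."""
    rows = []
    for item in items:
        venue = item['venue']
        rows.append({
            'name': venue['name'],
            'categories': first_category_name(venue),
            'lat': venue['location']['lat'],
            'lng': venue['location']['lng'],
        })
    return rows


# Made-up sample data with the same shape as the API response
sample_items = [
    {'venue': {'name': 'Example Grocery',
               'categories': [{'name': 'Grocery Store'}],
               'location': {'lat': 25.45, 'lng': -80.45}}},
    {'venue': {'name': 'Uncategorized Spot',
               'categories': [],
               'location': {'lat': 25.46, 'lng': -80.44}}},
]
sample_rows = venues_to_rows(sample_items)
```

A venue with an empty category list gets `None` rather than raising an error, which matches the behavior of `get_category_type`.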


Now let's get the venues for each zipcode and load that data into the "miami_venues" dataframe.

In [33]:
def getNearbyVenues(zipCode, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for zipCode, lat, lng in zip(zipCode, latitudes, longitudes):
        print(zipCode)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
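        # make the GET request; skip this zipcode if the request fails
        # or the response does not have the expected structure
        try:
            results = requests.get(url).json()["response"]['groups'][0]['items']
        except (requests.RequestException, KeyError, IndexError, ValueError):
            continue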
        # return only relevant information for each nearby venue
        venues_list.append([(
            zipCode, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Zipcode', 
                  'Zipcode Latitude', 
                  'Zipcode Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [34]:
miami_venues = getNearbyVenues(zipCode=df['ZIPCODE'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

33035
33010
33154
33037
33116
33197
33153
33283
33140
33101
33109
33147
33192
33018
33034
33015
33016
33155
33146
33181
33039
33178
33151
33190
33169
33152
33193
33128
33131
33122
33143
33132
33166
33135
33165
33031
33161
33162
33179
33167
33176
33160
33196
33141
33180
33185
33056
33055
33186
33054
33127
33129
33142
33189
33158
33130
33187
33136
33137
33139
33125
33145
33134
33138
33133
33150
33156
33149
33173
33170
33012
33014
33168
33183
33182
33184
33174
33172
33175
33030
33013
33194
33177
33157
33033
33032
33144
33126


In [37]:
print(miami_venues.shape)
miami_venues.head()

(1101, 7)


Unnamed: 0,Zipcode,Zipcode Latitude,Zipcode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,33035,25.452763,-80.448244,in a uber!,25.456132,-80.449653,Moving Target
1,33035,25.452763,-80.448244,La Proveedora Supermarket,25.450735,-80.452446,Grocery Store
2,33010,25.832096,-80.277857,Sedano's,25.830828,-80.274033,Grocery Store
3,33010,25.832096,-80.277857,Panda Animal Clinic,25.830336,-80.274817,Veterinarian
4,33010,25.832096,-80.277857,Sylvestre Pharmacy,25.833344,-80.281573,Pharmacy


### 3. Explore and Prepare Data

In [38]:
miami_venues.groupby('Zipcode').count()

Zipcode,Zipcode Latitude,Zipcode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
33010,7,7,7,7,7,7
33012,17,17,17,17,17,17
33013,4,4,4,4,4,4
33014,4,4,4,4,4,4
33015,14,14,14,14,14,14
...,...,...,...,...,...,...
33193,18,18,18,18,18,18
33194,6,6,6,6,6,6
33196,3,3,3,3,3,3
33197,18,18,18,18,18,18


In [39]:
print('There are {} unique categories.'.format(len(miami_venues['Venue Category'].unique())))

There are 211 unique categories.


In [40]:
# one hot encoding
miami_onehot = pd.get_dummies(miami_venues[['Venue Category']], prefix='', prefix_sep="")

# add zipcode column back to dataframe
miami_onehot['Zipcode'] = miami_venues['Zipcode']

# move zipcode column to the first column
fixed_columns = [miami_onehot.columns[-1]] + list(miami_onehot.columns[:-1])
miami_onehot = miami_onehot[fixed_columns]

miami_onehot.head()

Unnamed: 0,Zipcode,ATM,African Restaurant,American Restaurant,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Weight Loss Center,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,33035,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,33035,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,33010,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,33010,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
4,33010,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [41]:
miami_onehot.shape

(1101, 212)

In [42]:
miami_grouped = miami_onehot.groupby('Zipcode').mean().reset_index()
miami_grouped

Unnamed: 0,Zipcode,ATM,African Restaurant,American Restaurant,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Weight Loss Center,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,33010,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.142857,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0
1,33012,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.058824,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0
2,33013,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0
3,33014,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0
4,33015,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.071429,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
80,33193,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.055556,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0
81,33194,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0
82,33196,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0
83,33197,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.000000,0.055556,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0


In [43]:
miami_grouped.shape

(85, 212)
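
To see what the one-hot encoding and the `groupby(...).mean()` step produce, here is a tiny worked example on made-up data. After one-hot encoding, each venue is a row with a single 1; taking the mean per zipcode therefore gives the share of that zipcode's venues falling in each category, and every row sums to 1. The zipcodes and categories below are illustrative only.

```python
import pandas as pd

# Made-up venues: four in one zipcode, two in another
venues = pd.DataFrame({
    'Zipcode': [33010, 33010, 33010, 33010, 33012, 33012],
    'Venue Category': ['Cuban Restaurant', 'Cuban Restaurant', 'Pharmacy',
                       'Grocery Store', 'Bank', 'Pharmacy'],
})

# one column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Zipcode', venues['Zipcode'])

# mean of the indicators = fraction of venues per category in each zipcode
grouped = onehot.groupby('Zipcode').mean().reset_index()
print(grouped)
```

In this toy data, half of the venues in 33010 are Cuban restaurants, so that cell holds 0.5.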

Let's find the five most frequent venue categories in each zipcode.

In [44]:
num_top_venues = 5

for hood in miami_grouped['Zipcode']:
    print("----"+str(hood)+"----")
    temp = miami_grouped[miami_grouped['Zipcode'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----33010----
                       venue  freq
0           Cuban Restaurant  0.29
1  Latin American Restaurant  0.14
2              Grocery Store  0.14
3               Veterinarian  0.14
4                   Pharmacy  0.14


----33012----
                  venue  freq
0  Fast Food Restaurant  0.18
1     Mobile Phone Shop  0.12
2                  Bank  0.12
3   American Restaurant  0.12
4        Breakfast Spot  0.06


----33013----
                  venue  freq
0      Cuban Restaurant  0.25
1                  Café  0.25
2           Pizza Place  0.25
3  Fast Food Restaurant  0.25
4                   ATM  0.00


----33014----
         venue  freq
0       Bakery  0.25
1   Food Truck  0.25
2  Pizza Place  0.25
3         Bank  0.25
4          ATM  0.00


----33015----
                       venue  freq
0       Gym / Fitness Center  0.14
1                  Pet Store  0.07
2       Fast Food Restaurant  0.07
3  South American Restaurant  0.07
4                 Food Truck  0.07


----33016----


In [45]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [46]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Zipcode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
zipCode_venues_sorted = pd.DataFrame(columns=columns)
zipCode_venues_sorted['Zipcode'] = miami_grouped['Zipcode']

for ind in np.arange(miami_grouped.shape[0]):
    zipCode_venues_sorted.iloc[ind, 1:] = return_most_common_venues(miami_grouped.iloc[ind, :], num_top_venues)

zipCode_venues_sorted.head()

Unnamed: 0,Zipcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,33010,Cuban Restaurant,Fast Food Restaurant,Latin American Restaurant,Veterinarian,Pharmacy,Grocery Store,Yoga Studio,Food Truck,Food Stand,Food Court
1,33012,Fast Food Restaurant,Mobile Phone Shop,American Restaurant,Bank,Miscellaneous Shop,Sandwich Place,Sporting Goods Shop,Mexican Restaurant,Smoothie Shop,Breakfast Spot
2,33013,Cuban Restaurant,Pizza Place,Café,Fast Food Restaurant,Yoga Studio,Financial or Legal Service,French Restaurant,Food Truck,Food Stand,Food Court
3,33014,Food Truck,Pizza Place,Bank,Bakery,Yoga Studio,Fish Market,Fried Chicken Joint,French Restaurant,Food Stand,Food Court
4,33015,Gym / Fitness Center,Liquor Store,Donut Shop,Breakfast Spot,South American Restaurant,Fast Food Restaurant,Park,Coffee Shop,Food Truck,Pet Store
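
One thing to notice in the table above: zipcode 33013 has only four categories with a nonzero frequency (see the printed frequencies earlier), so its 5th to 10th "most common" venues appear to be zero-frequency categories whose order comes only from how the sort breaks ties. The sketch below shows one possible way to keep only categories that actually occur, along with a small helper for the "1st/2nd/3rd" labels. Both function names are hypothetical, and the tie-breaking rule (alphabetical) is my own choice.

```python
def ordinal(n):
    """Return '1st', '2nd', '3rd', '4th', ... for a positive integer."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


def top_categories(freqs, n):
    """Return up to n categories with nonzero frequency, most frequent first.

    freqs maps category name -> frequency; ties are broken alphabetically.
    """
    nonzero = [(cat, f) for cat, f in freqs.items() if f > 0]
    nonzero.sort(key=lambda pair: (-pair[1], pair[0]))
    return [cat for cat, _ in nonzero[:n]]


# Frequencies for 33013 as printed earlier in this notebook
freqs_33013 = {'Cuban Restaurant': 0.25, 'Café': 0.25, 'Pizza Place': 0.25,
               'Fast Food Restaurant': 0.25, 'ATM': 0.0}
top_33013 = top_categories(freqs_33013, 10)
labels = [ordinal(i + 1) + ' Most Common Venue' for i in range(len(top_33013))]
```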


### 4. Building KMeans Model

In [47]:
from sklearn.cluster import KMeans

In [114]:
# set number of clusters
kclusters = 5

miami_grouped_clustering = miami_grouped.drop(columns='Zipcode')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(miami_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 4, 4, 4, 0, 0, 4, 0, 0, 2])
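
The choice of `kclusters = 5` is not justified in this notebook. One common way to sanity-check the number of clusters is to compare silhouette scores across several values of k. The sketch below does this on synthetic data (three well-separated blobs), not on the Miami data; the helper name `silhouette_by_k` is my own. On the real data one would pass `miami_grouped_clustering` instead of `X`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values, random_state=0):
    """Fit KMeans for each k and return {k: silhouette score}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Synthetic stand-in data: three tight blobs far apart from each other
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

scores = silhouette_by_k(X, range(2, 7))
best_k = max(scores, key=scores.get)  # k with the highest silhouette score
```

A higher silhouette score suggests better-separated clusters, but it is only one heuristic; with sparse one-hot frequencies the scores may be close together.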

In [115]:
# add clustering labels
zipCode_venues_sorted['Cluster Labels'] = kmeans.labels_

miami_merged = df

miami_merged = miami_merged.join(zipCode_venues_sorted.set_index('Zipcode'), on='ZIPCODE')

miami_merged.head() # check the last columns!

Unnamed: 0,ZIPCODE,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,33035,25.452763,-80.448244,Grocery Store,Moving Target,Yoga Studio,Financial or Legal Service,Fried Chicken Joint,French Restaurant,Food Truck,Food Stand,Food Court,Food & Drink Shop,0.0
1,33010,25.832096,-80.277857,Cuban Restaurant,Fast Food Restaurant,Latin American Restaurant,Veterinarian,Pharmacy,Grocery Store,Yoga Studio,Food Truck,Food Stand,Food Court,4.0
2,33154,25.88299,-80.128078,Kosher Restaurant,Grocery Store,Food,Jewelry Store,Bank,Ice Cream Shop,Hotel,Hobby Shop,Harbor / Marina,French Restaurant,0.0
3,33037,25.145892,-80.402702,Seafood Restaurant,Pub,American Restaurant,Mexican Restaurant,Athletics & Sports,Fast Food Restaurant,Food Truck,Food Stand,Food Court,Food & Drink Shop,0.0
4,33116,25.671395,-80.37407,Sandwich Place,Japanese Restaurant,Cuban Restaurant,Caribbean Restaurant,Event Space,Hookah Bar,Café,Chinese Restaurant,Cosmetics Shop,Gas Station,0.0


In [116]:
miami_merged.tail()

Unnamed: 0,ZIPCODE,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
83,33157,25.606533,-80.34905,Rental Car Location,Dog Run,Smoke Shop,Bank,Other Repair Shop,Auto Dealership,Seafood Restaurant,Convenience Store,Pizza Place,Financial or Legal Service,0.0
84,33033,25.485145,-80.431807,,,,,,,,,,,
85,33032,25.529617,-80.396254,Speakeasy,Yoga Studio,Frozen Yogurt Shop,French Restaurant,Food Truck,Food Stand,Food Court,Food & Drink Shop,Food,Flower Shop,2.0
86,33144,25.763164,-80.309172,Fast Food Restaurant,Discount Store,Pharmacy,Fried Chicken Joint,Latin American Restaurant,Mobile Phone Shop,Grocery Store,Seafood Restaurant,Spanish Restaurant,Sandwich Place,4.0
87,33126,25.776086,-80.29164,Italian Restaurant,Fast Food Restaurant,Pharmacy,Grocery Store,Department Store,Cuban Restaurant,Paper / Office Supplies Store,Restaurant,Salad Place,Sandwich Place,4.0


In [117]:
miami_merged.shape

(88, 14)

In [118]:
miami_merged.isnull().sum()

ZIPCODE                   0
Latitude                  0
Longitude                 0
1st Most Common Venue     3
2nd Most Common Venue     3
3rd Most Common Venue     3
4th Most Common Venue     3
5th Most Common Venue     3
6th Most Common Venue     3
7th Most Common Venue     3
8th Most Common Venue     3
9th Most Common Venue     3
10th Most Common Venue    3
Cluster Labels            3
dtype: int64

In [119]:
miami_merged = miami_merged.dropna(axis = 0)

In [120]:
miami_merged['10th Most Common Venue'].isnull().sum().sum()

0

In [121]:
miami_merged.shape

(85, 14)
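
The shape drops from 88 to 85 rows because three zipcodes (33033 is visible in the tail above) have no venue rows to join against, so the join leaves their venue and cluster columns empty. Here is a small sketch of that behavior on made-up cluster labels, including how to list the unmatched zipcodes before dropping them.

```python
import pandas as pd

# Three zipcodes, but cluster labels for only two of them (made-up labels)
zipcodes = pd.DataFrame({'ZIPCODE': [33035, 33033, 33032]})
labels = pd.DataFrame({'Zipcode': [33035, 33032], 'Cluster Labels': [0, 2]})

# same join pattern as above: left join on the zipcode
merged = zipcodes.join(labels.set_index('Zipcode'), on='ZIPCODE')

# zipcodes that found no match end up with NaN in the joined columns
missing = merged.loc[merged['Cluster Labels'].isnull(), 'ZIPCODE'].tolist()

# drop them and restore an integer dtype (NaN had forced the column to float)
clean = merged.dropna(subset=['Cluster Labels']).copy()
clean['Cluster Labels'] = clean['Cluster Labels'].astype(int)
```

Checking `missing` before calling `dropna` makes it clear which zipcodes are being excluded from the clustering map.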

In [122]:
miami_merged['Cluster Labels'] = miami_merged['Cluster Labels'].astype(int)

In [123]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [124]:
miami_merged.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 85 entries, 0 to 87
Data columns (total 14 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   ZIPCODE                 85 non-null     int64  
 1   Latitude                85 non-null     float64
 2   Longitude               85 non-null     float64
 3   1st Most Common Venue   85 non-null     object 
 4   2nd Most Common Venue   85 non-null     object 
 5   3rd Most Common Venue   85 non-null     object 
 6   4th Most Common Venue   85 non-null     object 
 7   5th Most Common Venue   85 non-null     object 
 8   6th Most Common Venue   85 non-null     object 
 9   7th Most Common Venue   85 non-null     object 
 10  8th Most Common Venue   85 non-null     object 
 11  9th Most Common Venue   85 non-null     object 
 12  10th Most Common Venue  85 non-null     object 
 13  Cluster Labels          85 non-null     int32  
dtypes: float64(2), int32(1), int64(1), object(10

In [125]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(miami_merged['Latitude'], miami_merged['Longitude'], miami_merged['ZIPCODE'], miami_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters