# Final Project:
## New Facility Location Selection
### by: Jeffrey Dupree

This notebook will scrape neighborhood information from a ZIP-CODES.COM page https://www.zip-codes.com/state/fl.asp#zipcodes to create a dataframe consisting of the Zip Code, the City name, County name and the Zip Code type.

#### Section One: Scrape Tampa, FL ZIP Codes from website

First, we install the necessary libraries.

In [1]:
# If you don't have these packages available, uncomment the appropriate lines below to install them.

import sys
#!{sys.executable} -m pip install beautifulsoup4
#!{sys.executable} -m pip install lxml
#!{sys.executable} -m pip install requests

from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

Next, we need to get the information from the webpage using `requests.get`.

In [2]:
source = requests.get('https://www.zip-codes.com/state/fl.asp#zipcodes').text

Use the BeautifulSoup package to scrape the information from the webpage. I used the lxml parsing method, but you can use any you like.

In [3]:
soup = BeautifulSoup(source, 'lxml')

Find the table using `soup.find` from BeautifulSoup. Uncomment the second line to see the structure and content of the table. The tags are needed for the next steps.

In [4]:
table = soup.find(id="tblZIP")
# print(table.prettify())

Now a pandas dataframe needs to be created. This will require looping through the elements from the table and assigning the to a list. The list can then be made into a dataframe using `pd.DataFrame`. The columns will need header names. I manually assigned these instead of pulling them from the BeautifulSoup object `table`.

In [5]:
table_rows = table.find_all('tr')

res = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [tr.text.strip() for tr in td if tr.text.strip()]
    if row:
        res.append(row)

# Label the columns.
df = pd.DataFrame(res[1:], columns=['Zip_Code','City','County','Type'])

# Remove the text 'Zip Code' from the records in the Zip Code column.
df['Zip_Code'] = df['Zip_Code'].str[-5:]

# Select only the Zip Codes for Tampa, FL.
df = df.loc[df['City'] == "Tampa"]

Next remove the rows where the type is "P.O. Box".

In [6]:
# Remove rows with Type = "P.O. Box" and "Unique", and reset the index to start at 0
df = df[df.Type == 'Standard']
df = df.reset_index(drop=True)

The resulting dataframe looks like this.

In [7]:
df

Unnamed: 0,Zip_Code,City,County,Type
0,33602,Tampa,Hillsborough,Standard
1,33603,Tampa,Hillsborough,Standard
2,33604,Tampa,Hillsborough,Standard
3,33605,Tampa,Hillsborough,Standard
4,33606,Tampa,Hillsborough,Standard
5,33607,Tampa,Hillsborough,Standard
6,33609,Tampa,Hillsborough,Standard
7,33610,Tampa,Hillsborough,Standard
8,33611,Tampa,Hillsborough,Standard
9,33612,Tampa,Hillsborough,Standard


Check the size of the dataframe.

In [8]:
df.shape

(26, 4)

#### Section Two: Geolocate ZIP Codes

In [9]:
# @hidden_cell
user_agent = "JGD_20191006"

In [10]:
import re

# Uncomment next line to install geopy if necessary.
!{sys.executable} -m pip install geopy

from tqdm import tqdm #This will allow a progress bar to show that there is progress being made. This is helpful when an
tqdm.pandas()         #iterative process may take more than a few seconds.

from functools import partial #This will allow multiple arguments to be passed to RateLimiter.

import geopy
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent=user_agent)
from geopy.extra.rate_limiter import RateLimiter #This will get around getting shut down for too many request errors.
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=0.5, max_retries=2, error_wait_seconds=5.0, swallow_exceptions=True, return_value_on_exception=None)
df['location'] = df['Zip_Code'].progress_apply(partial(geocode, country_codes='us'))

df['point'] = df['location'].apply(lambda loc: tuple(loc.point) if loc else None)
df.head()

  from pandas import Panel
  0%|                                                                                           | 0/26 [00:00<?, ?it/s]



100%|██████████████████████████████████████████████████████████████████████████████████| 26/26 [00:27<00:00,  1.04s/it]


Unnamed: 0,Zip_Code,City,County,Type,location,point
0,33602,Tampa,Hillsborough,Standard,"(Tampa Heights, Tampa, Hillsborough County, Fl...","(27.964132, -82.459452, 0.0)"
1,33603,Tampa,Hillsborough,Standard,"(Tampa, Florida, 33603, USA, (27.9824664343196...","(27.9824664343196, -82.4630092025027, 0.0)"
2,33604,Tampa,Hillsborough,Standard,"(Sulphur Springs, Tampa, Hillsborough County, ...","(28.0127051, -82.4665599, 0.0)"
3,33605,Tampa,Hillsborough,Standard,"(East Ybor, Tampa, Hillsborough County, Florid...","(27.96589, -82.4209639, 0.0)"
4,33606,Tampa,Hillsborough,Standard,"(Davis Islands, Tampa, Hillsborough County, Fl...","(27.9368959, -82.4596737, 0.0)"


In [11]:
df[['Latitude','Longitude','3']] = pd.DataFrame(df['point'].tolist(), index=df.index)
df = df.drop(columns=['point','3'])

Now there are latitude and longitude values for each of the postal codes.

In [12]:
df.head()

Unnamed: 0,Zip_Code,City,County,Type,location,Latitude,Longitude
0,33602,Tampa,Hillsborough,Standard,"(Tampa Heights, Tampa, Hillsborough County, Fl...",27.964132,-82.459452
1,33603,Tampa,Hillsborough,Standard,"(Tampa, Florida, 33603, USA, (27.9824664343196...",27.982466,-82.463009
2,33604,Tampa,Hillsborough,Standard,"(Sulphur Springs, Tampa, Hillsborough County, ...",28.012705,-82.46656
3,33605,Tampa,Hillsborough,Standard,"(East Ybor, Tampa, Hillsborough County, Florid...",27.96589,-82.420964
4,33606,Tampa,Hillsborough,Standard,"(Davis Islands, Tampa, Hillsborough County, Fl...",27.936896,-82.459674


#### Section Three

In [13]:
import json # library to handle JSON files

from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# uncomment this line if you haven't completed the Foursquare API lab
#!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

In [14]:
# create map of Tampa using latitude and longitude values
tampa = geolocator.geocode({"state": "fl", "city": "tampa"})
map_tampa = folium.Map(location=[tampa.latitude, tampa.longitude], zoom_start=10)

# add markers to map
for lat, lng, county, city in zip(df['Latitude'], df['Longitude'], df['County'], df['City']):
    label = '{}, {}'.format(county, city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_tampa)  
    
map_tampa

In [15]:
# @hidden_cell
CLIENT_ID = 'MAI43NUPMV0YXXNFKS2XVGUPBMIB5SBO5T5W5FV4ZND2VTJW' # your Foursquare ID
CLIENT_SECRET = 'V1POSAELWQ0NIURPOW2C43LH2FTO5NJ0VGYQXMSD2OGRLEND' # your Foursquare Secret
VERSION = '20180604' # Foursquare API version

Create the url that will query the Foursquare API for the top 100 venues within 500 meters of the location. The cell above assigns the client ID and client secret to variables that will be called below.

In [16]:
search_lat = df.Latitude[0]
search_lon = df.Longitude[0]
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    search_lat, 
    search_lon, 
    radius, 
    LIMIT)


In [17]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5da6ee6f787dba002bebb13f'},
 'response': {'headerLocation': 'Village of Tampa',
  'headerFullLocation': 'Village of Tampa, University',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 10,
  'suggestedBounds': {'ne': {'lat': 27.968632004500005,
    'lng': -82.45436663726198},
   'sw': {'lat': 27.959631995499993, 'lng': -82.46453736273801}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b2fef04f964a520d7f224e3',
       'name': 'YMCA',
       'location': {'address': '110 E Palm Ave',
        'crossStreet': 'Tampa St',
        'lat': 27.962331199942593,
        'lng': -82.4597459295884,
        'labeledLatLngs': [{'label': 'display',
          'lat': 27.962331199942593,
          'lng': -82.4597459295884}],
        'di

In [18]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [19]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,YMCA,Gym / Fitness Center,27.962331,-82.459746
1,Lee's Grocery,Pizza Place,27.964653,-82.455431
2,The Hall On Franklin,Food Court,27.959731,-82.460077
3,Gold Ring Cafe,Spanish Restaurant,27.966619,-82.461129
4,Family Dollar,Discount Store,27.966373,-82.460462


This creates a function for using the Foursquare API to find the nearby venues for all of the boroughs in the dataframe.

In [20]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Zip_Code', 
                  'Zip Latitude', 
                  'Zip Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [21]:
tampa_venues = getNearbyVenues(names=df['Zip_Code'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

33602
33603
33604
33605
33606
33607
33609
33610
33611
33612
33613
33614
33615
33616
33617
33618
33619
33621
33624
33625
33626
33629
33634
33635
33637
33647


In [22]:
print(tampa_venues.shape)
tampa_venues.head()

(591, 7)


Unnamed: 0,Zip_Code,Zip Latitude,Zip Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,33602,27.964132,-82.459452,Lee's Grocery,27.964653,-82.455431,Pizza Place
1,33602,27.964132,-82.459452,YMCA,27.962331,-82.459746,Gym / Fitness Center
2,33602,27.964132,-82.459452,Ulele,27.959821,-82.462889,New American Restaurant
3,33602,27.964132,-82.459452,Water Works Park,27.959167,-82.462802,Park
4,33602,27.964132,-82.459452,Heights Public Market At Tampa Armature Works,27.960934,-82.463941,Market


In [23]:
tampa_venues['Venue Category'].unique()

array(['Pizza Place', 'Gym / Fitness Center', 'New American Restaurant',
       'Park', 'Market', 'Brewery', 'Coffee Shop',
       'Latin American Restaurant', 'Sushi Restaurant', 'Steakhouse',
       'Ice Cream Shop', 'Supermarket', 'Food Court', 'Wine Bar',
       'Sandwich Place', 'Spanish Restaurant', 'Cuban Restaurant',
       'Ramen Restaurant', 'Discount Store', 'Donut Shop',
       'Fast Food Restaurant', 'Breakfast Spot', 'Intersection',
       'Nature Preserve', 'Train Station', 'Seafood Restaurant',
       'Boutique', 'Beach', 'Moving Target', 'Flea Market',
       'Performing Arts Venue', 'Convenience Store', 'Food Truck',
       'Martial Arts Dojo', 'Grocery Store', 'Chinese Restaurant',
       'Burger Joint', 'Playground', 'Pharmacy', 'Sporting Goods Shop',
       'Antique Shop', 'Lawyer', 'Bike Shop', 'Record Shop',
       'Salon / Barbershop', 'Furniture / Home Store',
       'Thrift / Vintage Store', 'Art Gallery', 'Speakeasy',
       'Deli / Bodega', 'Zoo', 'Zoo Exhib

As you can see above, there are several venue categories that could be generally categorized as a 'gym'. There are other venue categories that are not necesarily types of gyms, but might compete with a gym as a place where people go to be active. Another venue that would compete with a gym is 'Military Base'. Military bases have gyms and fitness centers for military members at no cost. We will need to recode these categories with a common category name (i.e., gym).

In [65]:
gym = ['Gym / Fitness Center', 'Park', 'Martial Arts Dojo', 'Gym', 'Pool', 'Tennis Court', 'Disc Golf', 'Volleyball Court',
       'Soccer Field', 'Basketball Court', 'Yoga Studio', 'College Basketball Court', 'College Gym','College Track',
       'Dance Studio', 'Military Base', 'Athletics & Sports', 'Golf Course', 'Baseball Field', 'Trail', 'Hockey Arena',
       'Hockey Field', 'Track', 'Water Park', 'Outdoors & Recreation', 'State / Provincial Park', 'Playground']

In [66]:
tampa_venues['Venue Category'].replace(to_replace =gym, value ="Gym", inplace=True)

In [67]:
tampa_venues.groupby('Zip_Code').count()

Unnamed: 0_level_0,Zip Latitude,Zip Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Zip_Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
33602,43,43,43,43,43,43
33603,21,21,21,21,21,21
33604,42,42,42,42,42,42
33605,13,13,13,13,13,13
33606,65,65,65,65,65,65
33607,4,4,4,4,4,4
33609,17,17,17,17,17,17
33610,12,12,12,12,12,12
33611,7,7,7,7,7,7
33612,23,23,23,23,23,23


In [68]:
print('There are {} uniques categories.'.format(len(tampa_venues['Venue Category'].unique())))

There are 153 uniques categories.


We use one-hot encoding to determine if a venue type exists in a neighborhood. This will create a column for each of the unique categories, and assign a value of 1 if that venue type exists in the neighborhood or 0 otherwise.

In [69]:
# one hot encoding
tampa_onehot = pd.get_dummies(tampa_venues[['Venue Category']], prefix="", prefix_sep="")

# add zip code column back to dataframe
tampa_onehot['Zip_Code'] = tampa_venues['Zip_Code'] 

# move zip code column to the first column
fixed_columns = [tampa_onehot.columns[-1]] + list(tampa_onehot.columns[:-1])
tampa_onehot = tampa_onehot[fixed_columns]

tampa_onehot.head()

Unnamed: 0,Zip_Code,Accessories Store,African Restaurant,American Restaurant,Antique Shop,Arcade,Art Gallery,Asian Restaurant,Assisted Living,Automotive Shop,...,Train Station,Tree,Turkish Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Zoo,Zoo Exhibit
0,33602,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,33602,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,33602,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,33602,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,33602,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [70]:
tampa_onehot.shape

(591, 154)

With the one-hot encoded data, we can determine the frequency with which each venue type occurs in each borough. This results in a dataframe with a column for each unique venue type and a row for each unique borough.

In [71]:
tampa_grouped = tampa_onehot.groupby('Zip_Code').mean().reset_index()
tampa_grouped.head()

Unnamed: 0,Zip_Code,Accessories Store,African Restaurant,American Restaurant,Antique Shop,Arcade,Art Gallery,Asian Restaurant,Assisted Living,Automotive Shop,...,Train Station,Tree,Turkish Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Zoo,Zoo Exhibit
0,33602,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.023256,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0
1,33603,0.0,0.0,0.0,0.047619,0.0,0.047619,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,33604,0.0,0.02381,0.02381,0.0,0.0,0.0,0.02381,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.214286
3,33605,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,33606,0.0,0.0,0.030769,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0


Next we will determine the five most frequent venues within a borough to describe a neighborhood 'type', and group the borough by type symilarity.

In [72]:
num_top_venues = 5

for hood in tampa_grouped['Zip_Code']:
    print("----"+hood+"----")
    temp = tampa_grouped[tampa_grouped['Zip_Code'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----33602----
                     venue  freq
0                      Gym  0.12
1     Fast Food Restaurant  0.05
2                  Brewery  0.05
3  New American Restaurant  0.05
4           Discount Store  0.05


----33603----
                    venue  freq
0            Intersection  0.14
1                     Gym  0.05
2  Thrift / Vintage Store  0.05
3  Furniture / Home Store  0.05
4           Grocery Store  0.05


----33604----
                          venue  freq
0                   Zoo Exhibit  0.21
1                           Gym  0.14
2                   Coffee Shop  0.07
3  Theme Park Ride / Attraction  0.05
4                   Pizza Place  0.05


----33605----
                             venue  freq
0                              Gym  0.23
1           Furniture / Home Store  0.08
2                    Train Station  0.08
3       Construction & Landscaping  0.08
4  Southern / Soul Food Restaurant  0.08


----33606----
         venue  freq
0          Gym  0.20
1        Hotel  

In [73]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [74]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Zip_Code']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
zip_venues_sorted = pd.DataFrame(columns=columns)
zip_venues_sorted['Zip_Code'] = tampa_grouped['Zip_Code']

for ind in np.arange(tampa_grouped.shape[0]):
    zip_venues_sorted.iloc[ind, 1:] = return_most_common_venues(tampa_grouped.iloc[ind, :], num_top_venues)

zip_venues_sorted.head()

Unnamed: 0,Zip_Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,33602,Gym,Fast Food Restaurant,Coffee Shop,New American Restaurant,Discount Store
1,33603,Intersection,Speakeasy,Furniture / Home Store,Pharmacy,Record Shop
2,33604,Zoo Exhibit,Gym,Coffee Shop,Bar,Food
3,33605,Gym,Record Shop,Furniture / Home Store,Bus Station,Construction & Landscaping
4,33606,Gym,Hotel,Coffee Shop,Bar,Café


Now that we can see what the five most common venues are in each Zip Code are, we can eliminate those Zip Codes with 'gym' type venues in the top five.

In [None]:
zip_venues_reduced = 

#### K-means Clustering

Using a k-means clustering, we group the boroughs by similarity of venues available. For this example we chose 5 clusters, but this can be adjusted by setting the `kclusters` variable to the desired number of clusters in the code below.

In [75]:
# set number of clusters
kclusters = 5

tampa_grouped_clustering = tampa_grouped.drop('Zip_Code', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(tampa_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 2, 0, 0, 3, 0])

Each Zip Code is now assigned to one of five clusters, indexed as 0-2.

In [76]:
# add clustering labels
zip_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

tampa_merged = df

# merge tampa_grouped with tampa_data to add latitude/longitude for each neighborhood
tampa_merged = tampa_merged.join(zip_venues_sorted.set_index('Zip_Code'), on='Zip_Code')

tampa_merged.head() # check the last columns!

Unnamed: 0,Zip_Code,City,County,Type,location,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,33602,Tampa,Hillsborough,Standard,"(Tampa Heights, Tampa, Hillsborough County, Fl...",27.964132,-82.459452,0,Gym,Fast Food Restaurant,Coffee Shop,New American Restaurant,Discount Store
1,33603,Tampa,Hillsborough,Standard,"(Tampa, Florida, 33603, USA, (27.9824664343196...",27.982466,-82.463009,0,Intersection,Speakeasy,Furniture / Home Store,Pharmacy,Record Shop
2,33604,Tampa,Hillsborough,Standard,"(Sulphur Springs, Tampa, Hillsborough County, ...",28.012705,-82.46656,0,Zoo Exhibit,Gym,Coffee Shop,Bar,Food
3,33605,Tampa,Hillsborough,Standard,"(East Ybor, Tampa, Hillsborough County, Florid...",27.96589,-82.420964,0,Gym,Record Shop,Furniture / Home Store,Bus Station,Construction & Landscaping
4,33606,Tampa,Hillsborough,Standard,"(Davis Islands, Tampa, Hillsborough County, Fl...",27.936896,-82.459674,0,Gym,Hotel,Coffee Shop,Bar,Café


In [77]:
tampa_merged

Unnamed: 0,Zip_Code,City,County,Type,location,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,33602,Tampa,Hillsborough,Standard,"(Tampa Heights, Tampa, Hillsborough County, Fl...",27.964132,-82.459452,0,Gym,Fast Food Restaurant,Coffee Shop,New American Restaurant,Discount Store
1,33603,Tampa,Hillsborough,Standard,"(Tampa, Florida, 33603, USA, (27.9824664343196...",27.982466,-82.463009,0,Intersection,Speakeasy,Furniture / Home Store,Pharmacy,Record Shop
2,33604,Tampa,Hillsborough,Standard,"(Sulphur Springs, Tampa, Hillsborough County, ...",28.012705,-82.46656,0,Zoo Exhibit,Gym,Coffee Shop,Bar,Food
3,33605,Tampa,Hillsborough,Standard,"(East Ybor, Tampa, Hillsborough County, Florid...",27.96589,-82.420964,0,Gym,Record Shop,Furniture / Home Store,Bus Station,Construction & Landscaping
4,33606,Tampa,Hillsborough,Standard,"(Davis Islands, Tampa, Hillsborough County, Fl...",27.936896,-82.459674,0,Gym,Hotel,Coffee Shop,Bar,Café
5,33607,Tampa,Hillsborough,Standard,"(Tampa, Hillsborough County, Florida, 33607, U...",27.973055,-82.588636,2,Food Truck,Health & Beauty Service,Beach,Harbor / Marina,Zoo Exhibit
6,33609,Tampa,Hillsborough,Standard,"(Palma Ceia, Tampa, Hillsborough County, Flori...",27.944685,-82.538135,0,Gym,Clothing Store,Hotel,Bank,Construction & Landscaping
7,33610,Tampa,Hillsborough,Standard,"(Ybor City, Tampa, Hillsborough County, Florid...",27.977944,-82.442975,0,Discount Store,Pharmacy,Gym,Music Venue,Food Truck
8,33611,Tampa,Hillsborough,Standard,"(Palma Ceia, Tampa, Hillsborough County, Flori...",27.873196,-82.488578,3,Gym,Turkish Restaurant,Grocery Store,Intersection,Seafood Restaurant
9,33612,Tampa,Hillsborough,Standard,"(Sulphur Springs, Tampa, Hillsborough County, ...",28.049509,-82.414625,0,Gym,Hotel,Diner,Pharmacy,Coffee Shop


Visualized on a map, the borough clusters look like this.

In [78]:
# create map
map_clusters = folium.Map(location=[tampa.latitude, tampa.longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(tampa_merged['Latitude'], tampa_merged['Longitude'], tampa_merged['Zip_Code'], tampa_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters