***IBM Capstone Week 4 - Introduction/Business Problem***

An investor in the Miami, Florida area is well aware of the population growth and economic development trends in the area and is exploring the idea of opening a venue that is a combination of a health food store with a fitness center adjacent.  The investor is interested to know which neighborhoods are the most densely populated and might have the most potential for success.  For example, an area such as Coconut Grove appears to have a nice mix of residential and commercial land uses, and the area's population continues to grow with time.    

Data to be evaluated will include the Foursquare database to evaluate existing competition throughout the city's neighborhoods.  Foresquare data will be utilized to locate and evaluate ratings of potential competition.  

Additionally, a list of Miami-area neighborhoods and their approximate GPS coordinates will be utilized from this site:  https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Miami.

***IBM Capstone Week 4 - Data Collection and Cleaning***

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests

*First, evaluate and organize the Miami neighborhood data.*

In [2]:
MiamiListURL="https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Miami"
def getHTMLContent(MiamiListURL):
    html=urlopen(MiamiListURL)
    soup=BeautifulSoup(html, 'html.parser')
    return soup

In [3]:
req=requests.get(MiamiListURL)
soup2=BeautifulSoup(req.content, 'lxml')
table2=soup2.find_all('table')[0]
df=pd.read_html(str(table2))
neighborhoodDF=pd.DataFrame(df[0])
neighborhoodDF

Unnamed: 0,Neighborhood,Demonym,Population2010,Population/Km²,Sub-neighborhoods,Coordinates
0,Allapattah,,54289,4401,,25.815-80.224
1,Arts & Entertainment District,,11033,7948,,25.799-80.190
2,Brickell,Brickellite,31759,14541,West Brickell,25.758-80.193
3,Buena Vista,,9058,3540,Buena Vista East Historic District and Design ...,25.813-80.192
4,Coconut Grove,Grovite,20076,3091,"Center Grove, Northeast Coconut Grove, Southwe...",25.712-80.257
5,Coral Way,,35062,4496,"Coral Gate, Golden Pines, Shenandoah, Historic...",25.750-80.283
6,Design District,,3573,3623,,25.813-80.193
7,Downtown,Downtowner,"71,000 (13,635 CBD only)",10613,"Brickell, Central Business District (CBD), Dow...",25.774-80.193
8,Edgewater,,15005,6675,,25.802-80.190
9,Flagami,,50834,5665,"Alameda, Grapeland Heights, and Fairlawn",25.762-80.316


In [4]:
# Drop "Demonym", as it is irrelevant to this analysis.
MIAneighb=neighborhoodDF.drop('Demonym', axis=1)
MIAneighb.head()

Unnamed: 0,Neighborhood,Population2010,Population/Km²,Sub-neighborhoods,Coordinates
0,Allapattah,54289,4401,,25.815-80.224
1,Arts & Entertainment District,11033,7948,,25.799-80.190
2,Brickell,31759,14541,West Brickell,25.758-80.193
3,Buena Vista,9058,3540,Buena Vista East Historic District and Design ...,25.813-80.192
4,Coconut Grove,20076,3091,"Center Grove, Northeast Coconut Grove, Southwe...",25.712-80.257


In [5]:
# Split "Coorindates" into two separate columns: "Latitude" and "Longitude".  Make sure Lat and Long are float format and Longitude is negative.

# new data frame with split value columns 
coord = MIAneighb["Coordinates"].str.split("-", n = 0, expand = True).astype(float)
  
# making separate Latitude column from new data frame 
MIAneighb["Latitude"]= coord[0] 
  
# making separate Longitude column from new data frame 
MIAneighb["Longitude"]= -1*coord[1] 
  
# Dropping old Coordinates columns 
MIAneighb.drop(columns =["Coordinates"], inplace = True) 
  
# Display revised dataframe
MIAneighb

Unnamed: 0,Neighborhood,Population2010,Population/Km²,Sub-neighborhoods,Latitude,Longitude
0,Allapattah,54289,4401,,25.815,-80.224
1,Arts & Entertainment District,11033,7948,,25.799,-80.19
2,Brickell,31759,14541,West Brickell,25.758,-80.193
3,Buena Vista,9058,3540,Buena Vista East Historic District and Design ...,25.813,-80.192
4,Coconut Grove,20076,3091,"Center Grove, Northeast Coconut Grove, Southwe...",25.712,-80.257
5,Coral Way,35062,4496,"Coral Gate, Golden Pines, Shenandoah, Historic...",25.75,-80.283
6,Design District,3573,3623,,25.813,-80.193
7,Downtown,"71,000 (13,635 CBD only)",10613,"Brickell, Central Business District (CBD), Dow...",25.774,-80.193
8,Edgewater,15005,6675,,25.802,-80.19
9,Flagami,50834,5665,"Alameda, Grapeland Heights, and Fairlawn",25.762,-80.316


In [6]:
# Noted that Health District does not have Lat/Long coordinates, and we should remove Index 25 (sum totals) to avoid confusion.

# Health District (aka Civic Center).  Lat/Long (from Google Maps): 25.790, -80.215
MIAneighb["Latitude"].fillna("25.790", inplace = True)
MIAneighb["Longitude"].fillna("-80.215", inplace = True)

# Drop Row 25
MIAneighb=MIAneighb.drop(index=25, axis=0)
MIAneighb

Unnamed: 0,Neighborhood,Population2010,Population/Km²,Sub-neighborhoods,Latitude,Longitude
0,Allapattah,54289,4401,,25.815,-80.224
1,Arts & Entertainment District,11033,7948,,25.799,-80.19
2,Brickell,31759,14541,West Brickell,25.758,-80.193
3,Buena Vista,9058,3540,Buena Vista East Historic District and Design ...,25.813,-80.192
4,Coconut Grove,20076,3091,"Center Grove, Northeast Coconut Grove, Southwe...",25.712,-80.257
5,Coral Way,35062,4496,"Coral Gate, Golden Pines, Shenandoah, Historic...",25.75,-80.283
6,Design District,3573,3623,,25.813,-80.193
7,Downtown,"71,000 (13,635 CBD only)",10613,"Brickell, Central Business District (CBD), Dow...",25.774,-80.193
8,Edgewater,15005,6675,,25.802,-80.19
9,Flagami,50834,5665,"Alameda, Grapeland Heights, and Fairlawn",25.762,-80.316


In [7]:
# Also, Midtown and the Venetian Islands need population data.  

# Venetian Islands data was extremely conflicting when researched.  Population and land area estimates were very inconsistent.  For the purposes of this exercise,
# we are going to drop Venetian Islands from this data set.
MIAneighb=MIAneighb.drop(index=21, axis=0)
MIAneighb=MIAneighb.reset_index(drop=True)
MIAneighb

Unnamed: 0,Neighborhood,Population2010,Population/Km²,Sub-neighborhoods,Latitude,Longitude
0,Allapattah,54289,4401,,25.815,-80.224
1,Arts & Entertainment District,11033,7948,,25.799,-80.19
2,Brickell,31759,14541,West Brickell,25.758,-80.193
3,Buena Vista,9058,3540,Buena Vista East Historic District and Design ...,25.813,-80.192
4,Coconut Grove,20076,3091,"Center Grove, Northeast Coconut Grove, Southwe...",25.712,-80.257
5,Coral Way,35062,4496,"Coral Gate, Golden Pines, Shenandoah, Historic...",25.75,-80.283
6,Design District,3573,3623,,25.813,-80.193
7,Downtown,"71,000 (13,635 CBD only)",10613,"Brickell, Central Business District (CBD), Dow...",25.774,-80.193
8,Edgewater,15005,6675,,25.802,-80.19
9,Flagami,50834,5665,"Alameda, Grapeland Heights, and Fairlawn",25.762,-80.316


In [8]:
# Midtown was a new development in 2010, so we will use what data we can gather from 
# https://www.point2homes.com/US/Neighborhood/FL/Midtown-Edgewater-Demographics.html and 
# https://www.cpexecutive.com/post/midtown-opportunities-l-l-c-acquires-22-acres-of-land-in-midtown-miami/.  
# The population is approximatley 3,162, and the land area is 56 acres (0.23km2).

MIAneighb.at[16,'Population2010']= 3162
MIAneighb.at[16,'Population/Km²']= 3162/.23
MIAneighb[16:17]

Unnamed: 0,Neighborhood,Population2010,Population/Km²,Sub-neighborhoods,Latitude,Longitude
16,Midtown,3162,13747.8,Edgewater and Wynwood,25.807,-80.193


In [9]:
# Downtown population needs to be cleaned up.
MIAneighb.at[7,'Population2010']= 71000
MIAneighb[7:8]

Unnamed: 0,Neighborhood,Population2010,Population/Km²,Sub-neighborhoods,Latitude,Longitude
7,Downtown,71000,10613,"Brickell, Central Business District (CBD), Dow...",25.774,-80.193


In [10]:
# Virginia Key is approximatley 863 acres (3.49 km2).  Let's insert its population density.
MIAneighb.at[21,'Population/Km²']= 14/3.49
MIAneighb[21:22]

Unnamed: 0,Neighborhood,Population2010,Population/Km²,Sub-neighborhoods,Latitude,Longitude
21,Virginia Key,14,4.01146,,25.736,-80.155


In [11]:
# Let's check the entire table
MIAneighb

Unnamed: 0,Neighborhood,Population2010,Population/Km²,Sub-neighborhoods,Latitude,Longitude
0,Allapattah,54289,4401.0,,25.815,-80.224
1,Arts & Entertainment District,11033,7948.0,,25.799,-80.19
2,Brickell,31759,14541.0,West Brickell,25.758,-80.193
3,Buena Vista,9058,3540.0,Buena Vista East Historic District and Design ...,25.813,-80.192
4,Coconut Grove,20076,3091.0,"Center Grove, Northeast Coconut Grove, Southwe...",25.712,-80.257
5,Coral Way,35062,4496.0,"Coral Gate, Golden Pines, Shenandoah, Historic...",25.75,-80.283
6,Design District,3573,3623.0,,25.813,-80.193
7,Downtown,71000,10613.0,"Brickell, Central Business District (CBD), Dow...",25.774,-80.193
8,Edgewater,15005,6675.0,,25.802,-80.19
9,Flagami,50834,5665.0,"Alameda, Grapeland Heights, and Fairlawn",25.762,-80.316


Going forward, this data set will be used in combination with Foursquare to evaluate population density and nearby competition in each neighborhood.

In [12]:
MIAneighb.dtypes

Neighborhood         object
Population2010       object
Population/Km²       object
Sub-neighborhoods    object
Latitude             object
Longitude            object
dtype: object

In [13]:
# Convert the database to proper data types.
MIAneighb[["Population2010", "Population/Km²"]]=MIAneighb[["Population2010", "Population/Km²"]].astype("int")
MIAneighb[["Latitude", "Longitude"]]=MIAneighb[["Latitude", "Longitude"]].astype("float")
MIAneighb.dtypes

Neighborhood          object
Population2010         int64
Population/Km²         int64
Sub-neighborhoods     object
Latitude             float64
Longitude            float64
dtype: object

In [14]:
import json 

!conda install -c conda-forge geopy --yes

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    ca-certificates-2020.4.5.1 |       hecc5488_0         146 KB  conda-forge
    geopy-1.21.0               |             py_0          58 KB  conda-forge
    openssl-1.1.1g             |       h516909a_0         2.1 MB  conda-forge
    certifi-2020.4.5.1         |   py36h9f0ad1d_0         151 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.50-py_0         conda-forge
    geopy:           1

In [15]:
# The code was removed by Watson Studio for sharing.

Foursquare credentials are saved.


In [16]:
# Use geopy library to start a map of the Miami area and its neighborhoods.

address = 'Miami, Florida'

geolocator = Nominatim(user_agent="mia_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The approximate geograpical coordinates of Miami are {}, {}.'.format(latitude, longitude))

The approximate geograpical coordinates of Miami are 25.7742658, -80.1936589.


In [17]:
map_mia = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(MIAneighb['Latitude'], MIAneighb['Longitude'], MIAneighb['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_mia)  
    
map_mia

In [19]:
# Let's do a cursory search of all the nearby (within 1,000 meters) venues in all the neighborhoods to see what we are dealing with.

def getNearbyVenues(names, latitudes, longitudes, radius=1000, LIMIT=300):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [20]:
mia_venues = getNearbyVenues(names=MIAneighb['Neighborhood'],
                                   latitudes=MIAneighb['Latitude'],
                                   longitudes=MIAneighb['Longitude']
                                  )

Allapattah
Arts & Entertainment District
Brickell
Buena Vista
Coconut Grove
Coral Way
Design District
Downtown
Edgewater
Flagami
Grapeland Heights
Health District
Liberty City
Little Haiti
Little Havana
Lummus Park
Midtown
Overtown
Park West
The Roads
Upper Eastside
Virginia Key
West Flagler
Wynwood


In [21]:
print(mia_venues.shape)
mia_venues.head(15)

(1393, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Allapattah,25.815,-80.224,Club Tipico Dominicano,25.809557,-80.218593,Nightclub
1,Allapattah,25.815,-80.224,Little Caesars Pizza,25.809315,-80.22424,Pizza Place
2,Allapattah,25.815,-80.224,Family Dollar,25.807208,-80.223503,Discount Store
3,Allapattah,25.815,-80.224,Winn-Dixie,25.808179,-80.224911,Grocery Store
4,Allapattah,25.815,-80.224,Charles Hadley Pool,25.819565,-80.216753,Park
5,Allapattah,25.815,-80.224,Redbox,25.808122,-80.224456,Video Store
6,Allapattah,25.815,-80.224,El Presidente Supermarket,25.809744,-80.231959,Food & Drink Shop
7,Allapattah,25.815,-80.224,Kuky estilo Barber shop,25.809158,-80.224014,Cosmetics Shop
8,Allapattah,25.815,-80.224,MDT Metrorail - Earlington Heights Station,25.812449,-80.229974,Light Rail Station
9,Allapattah,25.815,-80.224,Cafeteria Amarilis Dominican Food,25.808877,-80.223655,Spanish Restaurant


In [22]:
# Check to see how many venues returned for each neighborhood.

mia_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Allapattah,15,15,15,15,15,15
Arts & Entertainment District,100,100,100,100,100,100
Brickell,85,85,85,85,85,85
Buena Vista,87,87,87,87,87,87
Coconut Grove,10,10,10,10,10,10
Coral Way,24,24,24,24,24,24
Design District,85,85,85,85,85,85
Downtown,100,100,100,100,100,100
Edgewater,100,100,100,100,100,100
Flagami,26,26,26,26,26,26


In [23]:
# How many unique categories from the returned values?

print('There are {} uniques categories.'.format(len(mia_venues['Venue Category'].unique())))

There are 206 uniques categories.


In [24]:
# Begin analyzing each neighborhood.

# one hot encoding
mia_onehot = pd.get_dummies(mia_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
mia_onehot['Neighborhood'] = mia_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [mia_onehot.columns[-1]] + list(mia_onehot.columns[:-1])
mia_onehot = mia_onehot[fixed_columns]

mia_onehot.head()

Unnamed: 0,Yoga Studio,Airport,Airport Service,American Restaurant,Aquarium,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Tree,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Video Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [25]:
# Double check database size
mia_onehot.shape

(1393, 206)

In [26]:
# Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

mia_grouped = mia_onehot.groupby('Neighborhood').mean().reset_index()
mia_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Airport,Airport Service,American Restaurant,Aquarium,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,...,Tree,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Video Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Allapattah,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0
1,Arts & Entertainment District,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.1,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
2,Brickell,0.058824,0.0,0.0,0.011765,0.0,0.0,0.035294,0.0,0.0,...,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.011765,0.0,0.0
3,Buena Vista,0.0,0.0,0.0,0.011494,0.0,0.011494,0.011494,0.057471,0.011494,...,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.011494,0.0,0.0
4,Coconut Grove,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Coral Way,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Design District,0.0,0.0,0.0,0.011765,0.0,0.011765,0.011765,0.058824,0.011765,...,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.011765,0.0,0.0
7,Downtown,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
8,Edgewater,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.07,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0
9,Flagami,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [27]:
# Examine each neighborhood and its top five most common venues

num_top_venues = 5

for hood in mia_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = mia_grouped[mia_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Allapattah----
               venue  freq
0               Park  0.13
1     Cosmetics Shop  0.13
2  Food & Drink Shop  0.13
3          Nightclub  0.07
4             Bakery  0.07


----Arts & Entertainment District----
            venue  freq
0     Art Gallery  0.10
1      Restaurant  0.06
2  Ice Cream Shop  0.06
3       Juice Bar  0.05
4             Bar  0.04


----Brickell----
                       venue  freq
0                      Hotel  0.11
1         Italian Restaurant  0.07
2                Yoga Studio  0.06
3        Japanese Restaurant  0.04
4  Middle Eastern Restaurant  0.04


----Buena Vista----
                venue  freq
0         Art Gallery  0.06
1  Italian Restaurant  0.05
2         Coffee Shop  0.05
3         Pizza Place  0.03
4                Park  0.03


----Coconut Grove----
           venue  freq
0           Park   0.3
1          Plaza   0.1
2          Trail   0.1
3         Garden   0.1
4  Boat or Ferry   0.1


----Coral Way----
               venue  freq
0      

In [28]:
# Convert this to a pandas dataframe and display the top 10 venues for each neighborhood.

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [29]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = mia_grouped['Neighborhood']

for ind in np.arange(mia_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(mia_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allapattah,Food & Drink Shop,Park,Cosmetics Shop,Home Service,Bakery,Video Store,Pizza Place,Spanish Restaurant,Grocery Store,Discount Store
1,Arts & Entertainment District,Art Gallery,Restaurant,Ice Cream Shop,Juice Bar,Bar,Coffee Shop,Gym,Gym / Fitness Center,Food Truck,Peruvian Restaurant
2,Brickell,Hotel,Italian Restaurant,Yoga Studio,Argentinian Restaurant,Middle Eastern Restaurant,Pizza Place,Japanese Restaurant,Grocery Store,Steakhouse,Shopping Mall
3,Buena Vista,Art Gallery,Italian Restaurant,Coffee Shop,Park,Pizza Place,Café,Gym,Furniture / Home Store,Jewelry Store,Bakery
4,Coconut Grove,Park,American Restaurant,Garden,Trail,Cosmetics Shop,Boat or Ferry,Playground,Plaza,Donut Shop,Fish Market
5,Coral Way,Café,Park,Historic Site,Intersection,Seafood Restaurant,Burger Joint,Martial Arts Dojo,Pharmacy,Spanish Restaurant,Dive Bar
6,Design District,Art Gallery,Coffee Shop,Italian Restaurant,Park,Pizza Place,Café,Gym,Furniture / Home Store,Jewelry Store,Shopping Mall
7,Downtown,Hotel,Seafood Restaurant,Italian Restaurant,Cocktail Bar,Residential Building (Apartment / Condo),Gym,Cosmetics Shop,Peruvian Restaurant,Restaurant,Shopping Mall
8,Edgewater,Art Gallery,Ice Cream Shop,Coffee Shop,Restaurant,Food Truck,Bar,Pizza Place,Gym / Fitness Center,Mexican Restaurant,Peruvian Restaurant
9,Flagami,Liquor Store,Bakery,Seafood Restaurant,Cuban Restaurant,Record Shop,Fast Food Restaurant,Spanish Restaurant,Food Truck,Latin American Restaurant,Other Repair Shop


In [30]:
# Run k-means to cluster the neighborhoods into 5 groups

# set number of clusters
kclusters = 5

mia_grouped_clustering = mia_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(mia_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 2, 2, 2, 3, 2, 2, 2, 2, 1], dtype=int32)

In [31]:
# Create a new dataframe that includes the cluster as well as the Top 10 venues for each neighborhood

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

mia_merged = MIAneighb

# merge mia_grouped with mia_data to add latitude/longitude for each neighborhood
mia_merged = mia_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

mia_merged.head()

Unnamed: 0,Neighborhood,Population2010,Population/Km²,Sub-neighborhoods,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allapattah,54289,4401,,25.815,-80.224,3,Food & Drink Shop,Park,Cosmetics Shop,Home Service,Bakery,Video Store,Pizza Place,Spanish Restaurant,Grocery Store,Discount Store
1,Arts & Entertainment District,11033,7948,,25.799,-80.19,2,Art Gallery,Restaurant,Ice Cream Shop,Juice Bar,Bar,Coffee Shop,Gym,Gym / Fitness Center,Food Truck,Peruvian Restaurant
2,Brickell,31759,14541,West Brickell,25.758,-80.193,2,Hotel,Italian Restaurant,Yoga Studio,Argentinian Restaurant,Middle Eastern Restaurant,Pizza Place,Japanese Restaurant,Grocery Store,Steakhouse,Shopping Mall
3,Buena Vista,9058,3540,Buena Vista East Historic District and Design ...,25.813,-80.192,2,Art Gallery,Italian Restaurant,Coffee Shop,Park,Pizza Place,Café,Gym,Furniture / Home Store,Jewelry Store,Bakery
4,Coconut Grove,20076,3091,"Center Grove, Northeast Coconut Grove, Southwe...",25.712,-80.257,3,Park,American Restaurant,Garden,Trail,Cosmetics Shop,Boat or Ferry,Playground,Plaza,Donut Shop,Fish Market


In [32]:
#Visualize the resulting clusters

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(mia_merged['Latitude'], mia_merged['Longitude'], mia_merged['Neighborhood'], mia_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

*Now we can examine the clusters and review discriminating venue categories that distinguish each*

#### Cluster 1

In [33]:
Cluster1=mia_merged.loc[mia_merged['Cluster Labels'] == 0, mia_merged.columns[[0] + list(range(1, mia_merged.shape[1]))]]
Cluster1

Unnamed: 0,Neighborhood,Population2010,Population/Km²,Sub-neighborhoods,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Liberty City,19725,3733,,25.832,-80.225,0,Home Service,Fried Chicken Joint,Sandwich Place,Discount Store,Seafood Restaurant,Donut Shop,Food,Gym / Fitness Center,Film Studio,Farmers Market


In [34]:
# Find the approximate centroid of the Cluster using the median lat/long coordinates.
Cluster1Loc=pd.DataFrame(Cluster1[['Latitude', 'Longitude']].median(), columns=['Cluster 1'])
print(Cluster1Loc, '\n')

# Find the sum population of the cluster
C1Pop=pd.DataFrame(Cluster1[['Population2010']].sum(), columns=['Cluster 1'])
print(C1Pop, '\n')

# Find the mean population density of all neighborhoods in the cluster
C1PD=pd.DataFrame(Cluster1[['Population/Km²']].mean(), columns=['Cluster 1'])
print(C1PD)

           Cluster 1
Latitude      25.832
Longitude    -80.225 

                Cluster 1
Population2010      19725 

                Cluster 1
Population/Km²     3733.0


In [35]:
C1Sum=pd.concat([Cluster1Loc, C1Pop, C1PD])
C1Sum

Unnamed: 0,Cluster 1
Latitude,25.832
Longitude,-80.225
Population2010,19725.0
Population/Km²,3733.0


Cluster 1 Conclusion:  Based on the data presented, it appears gyms and health clubs are somewhat popular, but not significantly.

#### Cluster 2

In [36]:
Cluster2=mia_merged.loc[mia_merged['Cluster Labels'] == 1, mia_merged.columns[[0] + list(range(1, mia_merged.shape[1]))]]
Cluster2

Unnamed: 0,Neighborhood,Population2010,Population/Km²,Sub-neighborhoods,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Flagami,50834,5665,"Alameda, Grapeland Heights, and Fairlawn",25.762,-80.316,1,Liquor Store,Bakery,Seafood Restaurant,Cuban Restaurant,Record Shop,Fast Food Restaurant,Spanish Restaurant,Food Truck,Latin American Restaurant,Other Repair Shop
14,Little Havana,76163,8423,Riverside and South River Drive Historic District,25.773,-80.215,1,Cuban Restaurant,Smoke Shop,Latin American Restaurant,Pharmacy,Mexican Restaurant,Pizza Place,Spanish Restaurant,Park,Fast Food Restaurant,Ice Cream Shop
22,West Flagler,31407,4428,,25.775,-80.243,1,Latin American Restaurant,Pharmacy,Pizza Place,Asian Restaurant,Bakery,Concert Hall,Gas Station,Plaza,Comfort Food Restaurant,Coffee Shop


In [37]:
# Find the approximate centroid of the Cluster using the median lat/long coordinates.
Cluster2Loc=pd.DataFrame(Cluster2[['Latitude', 'Longitude']].median(), columns=['Cluster 2'])
print(Cluster2Loc, '\n')

# Find the sum population of the cluster
C2Pop=pd.DataFrame(Cluster2[['Population2010']].sum(), columns=['Cluster 2'])
print(C2Pop, '\n')

# Find the mean population density of all neighborhoods in the cluster
C2PD=pd.DataFrame(Cluster2[['Population/Km²']].mean(), columns=['Cluster 2'])
print(C2PD)

           Cluster 2
Latitude      25.773
Longitude    -80.243 

                Cluster 2
Population2010     158404 

                Cluster 2
Population/Km²     6172.0


In [38]:
C2Sum=pd.concat([Cluster2Loc, C2Pop, C2PD])
C2Sum

Unnamed: 0,Cluster 2
Latitude,25.773
Longitude,-80.243
Population2010,158404.0
Population/Km²,6172.0


Cluster 2 Conclusion:  Cluster 2 appears to have popular restaurants, but not necessarily gyms or health clubs or healthy food markets.

#### Cluster 3

In [39]:
Cluster3=mia_merged.loc[mia_merged['Cluster Labels'] == 2, mia_merged.columns[[0] + list(range(1, mia_merged.shape[1]))]]
Cluster3

Unnamed: 0,Neighborhood,Population2010,Population/Km²,Sub-neighborhoods,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Arts & Entertainment District,11033,7948,,25.799,-80.19,2,Art Gallery,Restaurant,Ice Cream Shop,Juice Bar,Bar,Coffee Shop,Gym,Gym / Fitness Center,Food Truck,Peruvian Restaurant
2,Brickell,31759,14541,West Brickell,25.758,-80.193,2,Hotel,Italian Restaurant,Yoga Studio,Argentinian Restaurant,Middle Eastern Restaurant,Pizza Place,Japanese Restaurant,Grocery Store,Steakhouse,Shopping Mall
3,Buena Vista,9058,3540,Buena Vista East Historic District and Design ...,25.813,-80.192,2,Art Gallery,Italian Restaurant,Coffee Shop,Park,Pizza Place,Café,Gym,Furniture / Home Store,Jewelry Store,Bakery
5,Coral Way,35062,4496,"Coral Gate, Golden Pines, Shenandoah, Historic...",25.75,-80.283,2,Café,Park,Historic Site,Intersection,Seafood Restaurant,Burger Joint,Martial Arts Dojo,Pharmacy,Spanish Restaurant,Dive Bar
6,Design District,3573,3623,,25.813,-80.193,2,Art Gallery,Coffee Shop,Italian Restaurant,Park,Pizza Place,Café,Gym,Furniture / Home Store,Jewelry Store,Shopping Mall
7,Downtown,71000,10613,"Brickell, Central Business District (CBD), Dow...",25.774,-80.193,2,Hotel,Seafood Restaurant,Italian Restaurant,Cocktail Bar,Residential Building (Apartment / Condo),Gym,Cosmetics Shop,Peruvian Restaurant,Restaurant,Shopping Mall
8,Edgewater,15005,6675,,25.802,-80.19,2,Art Gallery,Ice Cream Shop,Coffee Shop,Restaurant,Food Truck,Bar,Pizza Place,Gym / Fitness Center,Mexican Restaurant,Peruvian Restaurant
10,Grapeland Heights,14004,4130,,25.792,-80.258,2,Rental Car Location,Hotel,Hotel Pool,Bus Station,Airport Service,Gym / Fitness Center,Gym,Train Station,Gas Station,Marijuana Dispensary
11,Health District,2705,2148,,25.79,-80.215,2,Sandwich Place,Convenience Store,Bakery,Café,Fast Food Restaurant,Light Rail Station,Coffee Shop,Mexican Restaurant,Food & Drink Shop,Latin American Restaurant
13,Little Haiti,29760,3840,Lemon City (aka Little River),25.824,-80.191,2,Gym,Pizza Place,Sushi Restaurant,Italian Restaurant,Event Space,Pharmacy,Caribbean Restaurant,Café,Shopping Mall,Grocery Store


In [40]:
# Find the approximate centroid of the Cluster using the median lat/long coordinates.
Cluster3Loc=pd.DataFrame(Cluster3[['Latitude', 'Longitude']].median(), columns=['Cluster 3'])
print(Cluster3Loc, '\n')

# Find the sum population of the cluster
C3Pop=pd.DataFrame(Cluster3[['Population2010']].sum(), columns=['Cluster 3'])
print(C3Pop, '\n')

# Find the mean population density of all neighborhoods in the cluster
C3PD=pd.DataFrame(Cluster3[['Population/Km²']].mean(), columns=['Cluster 3'])
print(C3PD)

           Cluster 3
Latitude      25.792
Longitude    -80.193 

                Cluster 3
Population2010     267668 

                  Cluster 3
Population/Km²  5671.529412


In [41]:
C3Sum=pd.concat([Cluster3Loc, C3Pop, C3PD])
C3Sum

Unnamed: 0,Cluster 3
Latitude,25.792
Longitude,-80.193
Population2010,267668.0
Population/Km²,5671.529412


Cluster 3 Conclusion:  Based on the data presented, Cluster 3 is the largest cluster by area and population, with several neighborhoods having popular gyms/health clubs.

#### Cluster 4

In [42]:
Cluster4=mia_merged.loc[mia_merged['Cluster Labels'] == 3, mia_merged.columns[[0] + list(range(1, mia_merged.shape[1]))]]
Cluster4

Unnamed: 0,Neighborhood,Population2010,Population/Km²,Sub-neighborhoods,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allapattah,54289,4401,,25.815,-80.224,3,Food & Drink Shop,Park,Cosmetics Shop,Home Service,Bakery,Video Store,Pizza Place,Spanish Restaurant,Grocery Store,Discount Store
4,Coconut Grove,20076,3091,"Center Grove, Northeast Coconut Grove, Southwe...",25.712,-80.257,3,Park,American Restaurant,Garden,Trail,Cosmetics Shop,Boat or Ferry,Playground,Plaza,Donut Shop,Fish Market


In [43]:
# Find the approximate centroid of the Cluster using the median lat/long coordinates.
Cluster4Loc=pd.DataFrame(Cluster4[['Latitude', 'Longitude']].median(), columns=['Cluster 4'])
print(Cluster4Loc, '\n')

# Find the sum population of the cluster
C4Pop=pd.DataFrame(Cluster4[['Population2010']].sum(), columns=['Cluster 4'])
print(C4Pop, '\n')

# Find the mean population density of all neighborhoods in the cluster
C4PD=pd.DataFrame(Cluster4[['Population/Km²']].mean(), columns=['Cluster 4'])
print(C4PD)

           Cluster 4
Latitude     25.7635
Longitude   -80.2405 

                Cluster 4
Population2010      74365 

                Cluster 4
Population/Km²     3746.0


In [44]:
C4Sum=pd.concat([Cluster4Loc, C4Pop, C4PD])
C4Sum

Unnamed: 0,Cluster 4
Latitude,25.7635
Longitude,-80.2405
Population2010,74365.0
Population/Km²,3746.0


Cluster 4 Conclusions:  Although gyms and health clubs do not explicitly appear to be very popular in this cluster, it was noted that parks, gardens, boat, trails, and playgrounds are quite popular.  It was also noted that Coconut Grove has a reasonably large population density.  This could be a cluster area with an active population that enjoys health-oriented and outdoor activities.

#### Cluster 5

In [45]:
Cluster5=mia_merged.loc[mia_merged['Cluster Labels'] == 4, mia_merged.columns[[0] + list(range(1, mia_merged.shape[1]))]]
Cluster5

Unnamed: 0,Neighborhood,Population2010,Population/Km²,Sub-neighborhoods,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,Virginia Key,14,4,,25.736,-80.155,4,Food,Dive Bar,Moving Target,Park,Cafeteria,Deli / Bodega,Electronics Store,Flea Market,Cupcake Shop,Fish Market


In [46]:
# Find the approximate centroid of the Cluster using the median lat/long coordinates.
Cluster5Loc=pd.DataFrame(Cluster5[['Latitude', 'Longitude']].median(), columns=['Cluster 5'])
print(Cluster5Loc, '\n')

# Find the sum population of the cluster
C5Pop=pd.DataFrame(Cluster5[['Population2010']].sum(), columns=['Cluster 5'])
print(C5Pop, '\n')

# Find the mean population density of all neighborhoods in the cluster
C5PD=pd.DataFrame(Cluster5[['Population/Km²']].mean(), columns=['Cluster 5'])
print(C5PD)

           Cluster 5
Latitude      25.736
Longitude    -80.155 

                Cluster 5
Population2010         14 

                Cluster 5
Population/Km²        4.0


In [47]:
C5Sum=pd.concat([Cluster5Loc, C5Pop, C5PD])
C5Sum

Unnamed: 0,Cluster 5
Latitude,25.736
Longitude,-80.155
Population2010,14.0
Population/Km²,4.0


Cluster 5 Conclusions:  Virginia Key is not a very densely populated area of Miami.

#### Summarize the clusters

In [48]:
ClusterSumDF=pd.concat([C1Sum, C2Sum, C3Sum, C4Sum, C5Sum], axis=1)
ClusterSumDF

Unnamed: 0,Cluster 1,Cluster 2,Cluster 3,Cluster 4,Cluster 5
Latitude,25.832,25.773,25.792,25.7635,25.736
Longitude,-80.225,-80.243,-80.193,-80.2405,-80.155
Population2010,19725.0,158404.0,267668.0,74365.0,14.0
Population/Km²,3733.0,6172.0,5671.529412,3746.0,4.0
