<a href="https://colab.research.google.com/github/oonid/Coursera_Capstone/blob/main/The_Battle_of_Neighborhoods/The_Battle_of_Neighbourhoods.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# The Battle of Neighbourhoods

Applied Data Science Capstone. Coursera. IBM.


# Analyzing Medical Facilities in New York City Neighbourhoods

Author: oon arfiandwi

Date: 7th March, 2021

Repository: [The Battle of Neighbourhoods](https://github.com/oonid/Coursera_Capstone/tree/main/The_Battle_of_Neighborhoods)

## Table of Contents

1. Understanding New York Dataset with Boroughs and Neighbourhoods
2. Working with Foursquare API to Get Medical Center Categories Data
3. Build and Map Nearby Venues
4. Analyze Each Neighbourhood
5. Cluster Neighbourhoods
6. Examine Clusters



# Introduction/Business Problem

Since 2020, the pandemic has changed many things in our lives, especially anything related to health and medicine.

For anyone planning to move to a new city, one of the first questions is how far a place is from medical facilities and what types of medical facilities are available in the neighbourhood.

In this capstone project, I analyze medical venues in the neighbourhoods of New York City, using supporting information from Foursquare data.

This project should be valuable for anyone who lives or works in the neighbourhoods of New York City, or plans to.

# Data

Here is the data used in this capstone project.

*   New York City boroughs and neighbourhoods data. The data was already available through the course.

*   The Foursquare API explore endpoint, used to get medical venues for every borough listed in the New York City dataset.

Note that Foursquare data for [medical categories](https://developer.foursquare.com/docs/build-with-foursquare/categories/) is limited when queried for each neighbourhood in the New York City data. The data exploration therefore starts at the borough level to collect the venues; once collection is finished, each venue is joined to its nearest neighbourhood using a minimum-distance calculation.
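The borough-first, nearest-neighbourhood-second idea can be sketched on toy data. The snippet below is only an illustration: the coordinates, the names `toy_neighbourhoods` and `nearest_neighbourhood`, and the scalar `haversine_km` helper are assumptions made for the example, not the notebook's actual data or code.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, earth_radius=6371.0):
    """Great-circle distance in kilometres between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return earth_radius * 2 * math.asin(math.sqrt(a))

# hypothetical neighbourhood centres within one borough: (name, lat, lon)
toy_neighbourhoods = [
    ("North", 40.90, -73.85),
    ("South", 40.80, -73.85),
]

def nearest_neighbourhood(venue_lat, venue_lon, neighbourhoods):
    """Return the name of the neighbourhood centre closest to the venue."""
    return min(
        neighbourhoods,
        key=lambda n: haversine_km(venue_lat, venue_lon, n[1], n[2]),
    )[0]

# a venue at latitude 40.88 is closer to "North" (40.90) than to "South" (40.80)
print(nearest_neighbourhood(40.88, -73.85, toy_neighbourhoods))
```

Restricting the candidates to one borough keeps the number of distance calculations small and avoids matching a venue to a neighbourhood across a borough boundary.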

In [1]:
# Libraries

import pandas as pd
import numpy as np
import requests
import io
import json

# from pandas.io.json import json_normalize  # deprecated import path, use the one below
from pandas import json_normalize  # transform JSON into a pandas dataframe

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

# import k-means from clustering stage
from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# Foursquare CLIENT_ID and CLIENT_SECRET are entered via getpass so the input is not echoed
from getpass import getpass

print('Libraries imported.')

Libraries imported.


# 1. Understanding New York Dataset with Boroughs and Neighbourhoods

#### Get geographical coordinates of New York City

In [2]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
ny_latitude = location.latitude
ny_longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'
    .format(ny_latitude, ny_longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


#### Get and load New York Neighbourhoods data

In [3]:
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
print('Data downloaded!')

with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

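# construct neighbourhoods from features key
neighbourhoods_data = newyork_data['features']

# len(newyork_data) would count the top-level JSON keys, not boroughs,
# so count the distinct borough names instead
borough_names = {f['properties']['borough'] for f in neighbourhoods_data}
print('Got {} New York boroughs data'.format(len(borough_names)))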

Data downloaded!
Got 5 New York boroughs data


#### Transform data into a pandas dataframe

In [4]:
# define the dataframe columns
column_names = ['Borough', 'Neighbourhood', 'Latitude', 'Longitude']

# collect one row per neighbourhood
# (DataFrame.append was removed in pandas 2.0, so build a list first)
rows = []
for data in neighbourhoods_data:
    borough = data['properties']['borough']
    neighbourhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighbourhood_latlon = data['geometry']['coordinates']
    neighbourhood_lat = neighbourhood_latlon[1]
    neighbourhood_lon = neighbourhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighbourhood': neighbourhood_name,
                 'Latitude': neighbourhood_lat,
                 'Longitude': neighbourhood_lon})

# instantiate the dataframe
neighbourhoods = pd.DataFrame(rows, columns=column_names)

neighbourhoods

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
...,...,...,...,...
301,Manhattan,Hudson Yards,40.756658,-74.000111
302,Queens,Hammels,40.587338,-73.805530
303,Queens,Bayswater,40.611322,-73.765968
304,Queens,Queensbridge,40.756091,-73.945631


Medical center data in Foursquare is not complete for every neighbourhood, which is why we focus on the borough level.
Later in the process we map each venue back to its nearest neighbourhood by calculating the distance.

In [5]:
# define the dataframe columns
column_names = ['Borough', 'Latitude', 'Longitude']

# one geocoder instance is enough for all boroughs
geolocator = Nominatim(user_agent="ny_explorer")

# collect one row per borough
# (DataFrame.append was removed in pandas 2.0, so build a list first)
rows = []
for borough in neighbourhoods['Borough'].unique():
    address = '{}, NY'.format(borough)

    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    print('The geographical coordinates of {} are {}, {}.'
        .format(address, latitude, longitude))

    rows.append({'Borough': borough, 'Latitude': latitude,
                 'Longitude': longitude})

# instantiate the dataframe
boroughs = pd.DataFrame(rows, columns=column_names)

boroughs

The geographical coordinates of Bronx, NY are 40.8466508, -73.8785937.
The geographical coordinates of Manhattan, NY are 40.7896239, -73.9598939.
The geographical coordinates of Brooklyn, NY are 40.6501038, -73.9495823.
The geographical coordinates of Queens, NY are 40.7498243, -73.7976337.
The geographical coordinates of Staten Island, NY are 40.5834557, -74.1496048.


Unnamed: 0,Borough,Latitude,Longitude
0,Bronx,40.846651,-73.878594
1,Manhattan,40.789624,-73.959894
2,Brooklyn,40.650104,-73.949582
3,Queens,40.749824,-73.797634
4,Staten Island,40.583456,-74.149605


#### Create a map of New York with boroughs superimposed on top.

In [6]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[ny_latitude, ny_longitude], zoom_start=10)

# add markers to map
for lat, lng, borough in \
        zip(boroughs['Latitude'], boroughs['Longitude'], boroughs['Borough']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

# 2. Working with Foursquare API to Get Medical Center Categories Data

#### Define Foursquare Credentials and Version


In [7]:
CLIENT_ID = getpass('Foursquare ID: ') # your Foursquare ID
CLIENT_SECRET = getpass('Foursquare Secret: ') # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
# print('CLIENT_SECRET:' + CLIENT_SECRET)

Foursquare ID: ··········
Foursquare Secret: ··········
Your credentials:
CLIENT_ID: I2ZHJ20LEPMTBLGLXWJVRBAG00MW3BQFDK2OA3QYMEFX41YG


In [8]:
# query the medical category around each borough centre
LIMIT = 100  # maximum is 100 results per query (one query per borough)
radius = 5000  # meters; raised until every borough returned 100 venues

url = 'https://api.foursquare.com/v2/venues/explore'

results = {}
for lat, lng, borough in \
        zip(boroughs['Latitude'], boroughs['Longitude'], boroughs['Borough']):

    # let requests build and encode the query string
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': LIMIT,
        'categoryId': '4bf58dd8d48988d104941735',  # Medical Center Categories
    }
    results[borough] = requests.get(url, params=params).json()

print('save results from {} boroughs.'.format(boroughs.shape[0]))

save results from 5 boroughs.
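
Before the responses are parsed, it can help to confirm that each request succeeded. A minimal sketch, assuming the response layout used later in this notebook (`response` → `groups` → `items`) plus a `meta.code` status field; the helper name `extract_items` and the sample payloads are hypothetical.

```python
def extract_items(payload):
    """Return the venue items from an explore-style payload, or raise on failure."""
    code = payload.get('meta', {}).get('code')
    if code != 200:
        raise ValueError('request failed with code {}'.format(code))
    groups = payload.get('response', {}).get('groups', [])
    # an empty result set is valid: return an empty list instead of failing
    return groups[0]['items'] if groups else []

# hand-made payloads standing in for real API responses
ok_payload = {
    'meta': {'code': 200},
    'response': {'groups': [{'items': [{'venue': {'name': 'Toy Clinic'}}]}]},
}
bad_payload = {'meta': {'code': 403}}

print(len(extract_items(ok_payload)))
try:
    extract_items(bad_payload)
except ValueError as err:
    print(err)
```

Failing early here gives a clearer message than a `KeyError` raised later while indexing into a response that has no `groups`.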


The `get_category_type` function is borrowed from the Foursquare lab; all the venue information sits under the _items_ key of the response.

In [9]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
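
To see what `get_category_type` receives, it may help to flatten one hand-made item. The item below is a made-up stand-in for a Foursquare result, not real data; it only shows how `json_normalize` turns nested keys into dotted column names while leaving the `categories` list intact.

```python
from pandas import json_normalize

# made-up item mimicking the nested layout of an explore result
items = [{
    'venue': {
        'name': 'Toy Clinic',
        'categories': [{'name': "Doctor's Office"}],
        'location': {'lat': 40.75, 'lng': -73.82, 'address': '1 Example St'},
    }
}]

flat = json_normalize(items)  # nested keys become 'venue.location.lat', ...
print(sorted(flat.columns))

# the category name is the 'name' of the first entry in the list
first_row = flat.iloc[0]
print(first_row['venue.categories'][0]['name'])
```

This is why the function looks up `'venue.categories'` on a flattened row and then takes the first list entry.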

# 3. Build and Map Nearby Venues

Now we are ready to clean the json and structure it into a _pandas_ dataframe.

In [10]:
nearby_venues = {}
for lat, lng, borough in \
        zip(boroughs['Latitude'], boroughs['Longitude'], boroughs['Borough']):

    venues = results[borough]['response']['groups'][0]['items']
    nearby_venues[borough] = json_normalize(venues) # flatten JSON

    # filter columns
    filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat',
                        'venue.location.lng', 'venue.location.address']
    nearby_venues[borough] = nearby_venues[borough].loc[:, filtered_columns]

    # filter the category for each row
    nearby_venues[borough]['venue.categories'] = \
        nearby_venues[borough].apply(get_category_type, axis=1)

    # clean columns
    nearby_venues[borough].columns = \
        [col.split(".")[-1] for col in nearby_venues[borough].columns]

    print('{} venues in {} were returned by Foursquare.'
        .format(nearby_venues[borough].shape[0], borough))

# check on one of the boroughs
print('Nearby venues on Queens')
nearby_venues['Queens']


100 venues in Bronx were returned by Foursquare.
100 venues in Manhattan were returned by Foursquare.
100 venues in Brooklyn were returned by Foursquare.
100 venues in Queens were returned by Foursquare.
100 venues in Staten Island were returned by Foursquare.
Nearby venues on Queens


Unnamed: 0,name,categories,lat,lng,address
0,NewYork-Presbyterian Queens,Hospital,40.747248,-73.825336,56-45 Main St
1,Ozanam Hall of Queens Nursing Home,Medical Center,40.758543,-73.782521,4241 201st St
2,Main Street Radiology,Medical Center,40.757215,-73.782714,44-01 Francis Lewis Blvd
3,Better Sight Vision Center,Doctor's Office,40.755320,-73.828285,4202 Main St
4,NewYork-Presbyterian Medical Group Queens - Pe...,Medical Center,40.738964,-73.805264,163-03 Horace Harding Expy Ste 5
...,...,...,...,...,...
95,Cohen's Fashion Optical,Optical Shop,40.778680,-73.777957,211-51 26th Ave
96,Sanford B Ratner (Sanford Ratner),Doctor's Office,40.721639,-73.759194,20615 Hillside Ave
97,Whitestone Physical Therapy of Queens,Doctor's Office,40.788689,-73.813148,150-12 14th Ave
98,NewYork-Presbyterian Medical Group Queens - Ca...,Doctor's Office,40.788687,-73.813929,14-02 150th St


In [11]:
nearby_maps = {}

for borough_lat, borough_lng, borough in \
        zip(boroughs['Latitude'], boroughs['Longitude'], boroughs['Borough']):

    nearby_maps[borough] = folium.Map(location=[borough_lat, borough_lng],
                                       zoom_start=12)

    # add markers to map
    for lat, lng, label in \
            zip(nearby_venues[borough]['lat'], nearby_venues[borough]['lng'],
                nearby_venues[borough]['name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(nearby_maps[borough])  
    print("Show Top 100 from {} total number of medical places in {} (5 km)"
        .format(results[borough]['response']['totalResults'], borough))


Show Top 100 from 182 total number of medical places in Bronx (5 km)
Show Top 100 from 274 total number of medical places in Manhattan (5 km)
Show Top 100 from 215 total number of medical places in Brooklyn (5 km)
Show Top 100 from 181 total number of medical places in Queens (5 km)
Show Top 100 from 160 total number of medical places in Staten Island (5 km)


Manhattan has the most medical venues within 5 km of its centre (274 in total).

In [12]:
borough = 'Manhattan'
print("Showing Top 100 medical places in {}".format(borough))
nearby_maps[borough]

Showing Top 100 medical places in Manhattan


#### Calculate the minimum distance between Venues and Neighbourhoods

In [14]:
# vectorized haversine function
def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """
    Slightly modified version of http://stackoverflow.com/a/29546836/2901002

    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees or in radians).
    With the default earth_radius of 6371 the result is in kilometres.

    All (lat, lon) coordinates must have numeric dtypes and be of equal length.

    """
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])

    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2

    return earth_radius * 2 * np.arcsin(np.sqrt(a))
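
A quick sanity check of the formula: along the equator, one degree of longitude should be roughly 111.2 km for an Earth radius of 6371 km (6371 × π / 180). The sketch below restates the function compactly so that it runs on its own, and also passes arrays to show the vectorized use; the test points are arbitrary.

```python
import numpy as np

def haversine(lat1, lon1, lat2, lon2, earth_radius=6371):
    """Great-circle distance (km by default) for inputs in decimal degrees."""
    lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
    a = np.sin((lat2 - lat1) / 2.0) ** 2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2
    return earth_radius * 2 * np.arcsin(np.sqrt(a))

# scalar inputs: one degree of longitude on the equator
one_degree = haversine(0.0, 0.0, 0.0, 1.0)
print(round(float(one_degree), 2))

# array inputs of equal length: distances are computed element-wise
lats = np.array([0.0, 0.0])
lons = np.array([1.0, 2.0])
distances = haversine(np.zeros(2), np.zeros(2), lats, lons)
print(distances.shape)
```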



For every venue, calculate the distance to each neighbourhood in the same borough and keep the nearest one.

In [15]:
# collect the nearest-neighbourhood rows per borough
# (DataFrame.append was removed in pandas 2.0, so concatenate at the end)
nearest_rows = []

for borough in boroughs['Borough']:
    medical_in_borough = nearby_venues[borough]
    neighbourhoods_by_borough = neighbourhoods[neighbourhoods['Borough'] ==
                                               borough]
    # cross join: pair every venue with every neighbourhood of the borough
    df_merge = medical_in_borough.merge(neighbourhoods_by_borough, how='cross')
    # calculate the distance for every row and add it as a new column
    df_merge['distance'] = \
        df_merge.apply(lambda d: haversine(d['lat'], d['lng'],
                                           d['Latitude'], d['Longitude']),
                       axis=1)
    # per venue (name + lat + lng, as one name can occur at several
    # locations), keep the row with the minimum distance
    nearest = df_merge.groupby(['name', 'lat', 'lng'],
                               sort=False)['distance'].idxmin()
    nearest_rows.append(df_merge.loc[nearest])

medical_venues = pd.concat(nearest_rows)

# cleanup
medical_venues.drop(['Latitude', 'Longitude', 'distance'], axis=1, inplace=True)
# displayed with a fresh index; the result is not assigned, so later cells
# still see the original cross-join row labels
medical_venues.reset_index(drop=True)

Unnamed: 0,name,categories,lat,lng,address,Borough,Neighbourhood
0,Montefiore Medical Pavillion,Medical Center,40.880135,-73.878712,3400 Bainbridge Ave,Bronx,Norwood
1,Montefiore Medical Center (Albert Einstein Col...,Medical Center,40.880879,-73.880010,2532 Grand Concourse,Bronx,Norwood
2,Isabella Geriatric,Medical Center,40.854553,-73.927845,515 Audubon Ave,Bronx,Morris Heights
3,Bronx VA Medical Center,Hospital,40.867172,-73.905960,130 W Kingsbridge Rd,Bronx,Kingsbridge Heights
4,Bronx Lebanon - Fulton Division,Medical Center,40.831548,-73.902859,1276 Fulton Ave,Bronx,Claremont Village
...,...,...,...,...,...,...,...
495,"NYC Health + Hospitals/Gotham Health, Mariners...",Hospital,40.625953,-74.157035,2040 Forest Ave,Staten Island,Graniteville
496,Physicians First Messages / Medical Answering ...,Office,40.596117,-74.094927,1282 Richmond Rd,Staten Island,Old Town
497,Staten Island Dental Group,Dentist's Office,40.603145,-74.140400,979 Willowbrook Rd,Staten Island,Willowbrook
498,New Dorp MRI,Medical Lab,40.575391,-74.118723,,Staten Island,New Dorp


#### Create a map of medical venues in New York, labelled with their nearest neighbourhoods.

In [16]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[ny_latitude, ny_longitude], zoom_start=10)

# add markers to map
for name, lat, lng, borough, neighbourhood in \
        zip(medical_venues['name'], 
            medical_venues['lat'], medical_venues['lng'],
            medical_venues['Borough'], medical_venues['Neighbourhood']):
    label = '{}, {}'.format(name, neighbourhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

# 4. Analyze Each Neighbourhood

In [17]:
# one hot encoding
medical_onehot = pd.get_dummies(medical_venues[['categories']],
                                prefix="", prefix_sep="")

# add neighborhood column back to dataframe
medical_onehot['Neighbourhood'] = medical_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [medical_onehot.columns[-1]] + list(medical_onehot.columns[:-1])
medical_onehot = medical_onehot[fixed_columns]

medical_onehot

Unnamed: 0,Neighbourhood,Coworking Space,Daycare,Dentist's Office,Doctor's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Hospital,Medical Center,Medical Lab,Medical School,Office,Optical Shop,Pharmacy,Physical Therapist,Sandwich Place,Veterinarian
7,Norwood,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
59,Norwood,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
118,Morris Heights,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
207,Kingsbridge Heights,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
253,Claremont Village,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6011,Graniteville,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
6078,Old Town,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
6167,Willowbrook,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6188,New Dorp,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0


In [18]:
medical_grouped = medical_onehot.groupby('Neighbourhood').mean().reset_index()
medical_grouped

Unnamed: 0,Neighbourhood,Coworking Space,Daycare,Dentist's Office,Doctor's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Hospital,Medical Center,Medical Lab,Medical School,Office,Optical Shop,Pharmacy,Physical Therapist,Sandwich Place,Veterinarian
0,Allerton,0.0,0.0,0.000000,0.500000,0.0,0.0,0.0,0.0,0.000000,0.500000,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0
1,Auburndale,0.0,0.0,0.000000,0.769231,0.0,0.0,0.0,0.0,0.000000,0.230769,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0
2,Bay Terrace,0.0,0.0,0.000000,0.714286,0.0,0.0,0.0,0.0,0.142857,0.000000,0.0,0.000000,0.0,0.142857,0.0,0.0,0.0,0.0
3,Baychester,0.0,0.0,0.000000,1.000000,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0
4,Bayside,0.0,0.0,0.000000,0.625000,0.0,0.0,0.0,0.0,0.125000,0.250000,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
117,Williamsbridge,0.0,0.0,0.000000,1.000000,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0
118,Willowbrook,0.0,0.0,0.166667,0.500000,0.0,0.0,0.0,0.0,0.000000,0.333333,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0
119,Windsor Terrace,0.0,0.0,0.000000,1.000000,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0
120,Wingate,0.0,0.0,0.000000,0.285714,0.0,0.0,0.0,0.0,0.428571,0.142857,0.0,0.142857,0.0,0.000000,0.0,0.0,0.0,0.0
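
The grouped table holds, for each neighbourhood, the share of its venues that falls into each category. A toy illustration of the same two steps (one-hot encoding, then a per-neighbourhood mean) follows; the three venues and their neighbourhood labels are invented for the example.

```python
import pandas as pd

# invented venues: two in neighbourhood 'A', one in 'B'
toy = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'B'],
    'categories': ['Hospital', "Doctor's Office", 'Hospital'],
})

# one column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(toy[['categories']], prefix='', prefix_sep='', dtype=float)
onehot.insert(0, 'Neighbourhood', toy['Neighbourhood'])

# the mean of the indicator columns is the category frequency per neighbourhood
grouped = onehot.groupby('Neighbourhood').mean().reset_index()
print(grouped)
```

Because every venue has exactly one category, the frequencies in each row sum to 1.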


#### Let's print each neighbourhood along with its top 5 most common venues, then put the top venues into a dataframe


In [19]:
num_top_venues = 5

for hood in medical_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = medical_grouped[medical_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Allerton----
                venue  freq
0      Medical Center   0.5
1     Doctor's Office   0.5
2         Medical Lab   0.0
3      Sandwich Place   0.0
4  Physical Therapist   0.0


----Auburndale----
                venue  freq
0     Doctor's Office  0.77
1      Medical Center  0.23
2         Medical Lab  0.00
3      Sandwich Place  0.00
4  Physical Therapist  0.00


----Bay Terrace----
             venue  freq
0  Doctor's Office  0.71
1     Optical Shop  0.14
2         Hospital  0.14
3  Coworking Space  0.00
4      Medical Lab  0.00


----Baychester----
                venue  freq
0     Doctor's Office   1.0
1     Coworking Space   0.0
2         Medical Lab   0.0
3      Sandwich Place   0.0
4  Physical Therapist   0.0


----Bayside----
             venue  freq
0  Doctor's Office  0.62
1   Medical Center  0.25
2         Hospital  0.12
3      Medical Lab  0.00
4   Sandwich Place  0.00


----Bedford Park----
                venue  freq
0     Doctor's Office   1.0
1     Coworking Sp

First, let's write a function to sort the venues in descending order.

In [20]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [21]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = medical_grouped['Neighbourhood']

for ind in np.arange(medical_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = \
        return_most_common_venues(medical_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allerton,Doctor's Office,Medical Center,Veterinarian,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
1,Auburndale,Doctor's Office,Medical Center,Veterinarian,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
2,Bay Terrace,Doctor's Office,Hospital,Optical Shop,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Veterinarian
3,Baychester,Doctor's Office,Veterinarian,Sandwich Place,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Hospital
4,Bayside,Doctor's Office,Medical Center,Hospital,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Veterinarian
...,...,...,...,...,...,...,...,...,...,...,...
117,Williamsbridge,Doctor's Office,Veterinarian,Sandwich Place,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Hospital
118,Willowbrook,Doctor's Office,Medical Center,Dentist's Office,Veterinarian,Home Service,Daycare,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
119,Windsor Terrace,Doctor's Office,Veterinarian,Sandwich Place,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Hospital
120,Wingate,Hospital,Doctor's Office,Medical School,Medical Center,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service
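
One caveat when reading this table: categories with a frequency of 0 still fill the lower ranks. Allerton, for example, only has Doctor's Office and Medical Center venues (0.5 each in the grouped table), yet Veterinarian is listed as its 3rd most common venue. A possible variant, sketched here on a hand-made row, keeps only categories that actually occur and pads the rest with a placeholder; `top_present_categories` and the `'-'` placeholder are illustrative choices, not part of the notebook.

```python
import pandas as pd

def top_present_categories(row, num_top_venues, placeholder='-'):
    """Rank only categories with a non-zero frequency; pad the remaining slots."""
    freqs = row.iloc[1:].astype(float)          # skip the 'Neighbourhood' entry
    present = freqs[freqs > 0]
    # a stable sort is requested so that ties are ordered deterministically
    ranked = present.sort_values(ascending=False, kind='stable')
    names = list(ranked.index[:num_top_venues])
    return names + [placeholder] * (num_top_venues - len(names))

# hand-made row shaped like one line of the grouped table
row = pd.Series({
    'Neighbourhood': 'Toyville',
    "Doctor's Office": 0.25,
    'Hospital': 0.75,
    'Veterinarian': 0.0,
})

print(top_present_categories(row, 3))
```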


# 5. Cluster Neighbourhoods


Run _k_-means to group the neighbourhoods into 5 clusters.


In [22]:
# set number of clusters
kclusters = 5

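medical_grouped_clustering = medical_grouped.drop(columns='Neighbourhood')

# run k-means clustering
# (n_init=10 was the scikit-learn default before 1.4; set explicitly for stable results)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(medical_grouped_clustering)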

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 0, 0, 2, 0, 2, 2, 2, 3, 2], dtype=int32)
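
The number of clusters is fixed at 5 here. One common way to sanity-check a value of k is to compare the k-means inertia across several candidates and look for the point where it stops dropping sharply. The sketch below does this on synthetic points with three obvious groups, not on the notebook's data, so the numbers say nothing about New York.

```python
import numpy as np
from sklearn.cluster import KMeans

# synthetic data: three well-separated groups of 30 points each
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centres])

# inertia = sum of squared distances of points to their closest centroid
inertias = []
for k in range(1, 6):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(points)
    inertias.append(model.inertia_)

for k, inertia in zip(range(1, 6), inertias):
    print(k, round(inertia, 1))
```

On data like this the inertia falls steeply up to k = 3 and only marginally afterwards; applying the same loop to `medical_grouped_clustering` would be one way to check whether 5 is a reasonable choice.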

In [23]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

medical_merged = medical_venues

# join the cluster label and the top venue categories of each neighbourhood onto every venue
medical_merged = medical_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

medical_merged # check the last columns!

Unnamed: 0,name,categories,lat,lng,address,Borough,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Montefiore Medical Pavillion,Medical Center,40.880135,-73.878712,3400 Bainbridge Ave,Bronx,Norwood,4,Doctor's Office,Medical Center,Hospital,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Veterinarian
59,Montefiore Medical Center (Albert Einstein Col...,Medical Center,40.880879,-73.880010,2532 Grand Concourse,Bronx,Norwood,4,Doctor's Office,Medical Center,Hospital,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Veterinarian
118,Isabella Geriatric,Medical Center,40.854553,-73.927845,515 Audubon Ave,Bronx,Morris Heights,4,Doctor's Office,Medical Center,Veterinarian,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
207,Bronx VA Medical Center,Hospital,40.867172,-73.905960,130 W Kingsbridge Rd,Bronx,Kingsbridge Heights,4,Hospital,Doctor's Office,Medical Center,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Veterinarian
253,Bronx Lebanon - Fulton Division,Medical Center,40.831548,-73.902859,1276 Fulton Ave,Bronx,Claremont Village,4,Doctor's Office,Medical Center,Veterinarian,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6011,"NYC Health + Hospitals/Gotham Health, Mariners...",Hospital,40.625953,-74.157035,2040 Forest Ave,Staten Island,Graniteville,3,Hospital,Doctor's Office,Sandwich Place,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Veterinarian
6078,Physicians First Messages / Medical Answering ...,Office,40.596117,-74.094927,1282 Richmond Rd,Staten Island,Old Town,3,Office,Veterinarian,Home Service,Daycare,Dentist's Office,Doctor's Office,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
6167,Staten Island Dental Group,Dentist's Office,40.603145,-74.140400,979 Willowbrook Rd,Staten Island,Willowbrook,4,Doctor's Office,Medical Center,Dentist's Office,Veterinarian,Home Service,Daycare,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
6188,New Dorp MRI,Medical Lab,40.575391,-74.118723,,Staten Island,New Dorp,4,Doctor's Office,Medical Center,Medical Lab,Hospital,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service


#### Finally, let's visualize the resulting clusters


In [24]:
# create map
map_clusters = folium.Map(location=[ny_latitude, ny_longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map (rainbow[cluster-1]: cluster 0 takes the last colour, red)
markers_colors = []
for lat, lon, poi, cluster in \
        zip(medical_merged['lat'], medical_merged['lng'],
            medical_merged['Neighbourhood'], medical_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# 6. Examine Clusters

Now, examine each cluster and determine the discriminating venue categories that distinguish each cluster.
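
One way to find the discriminating categories is to average the category frequencies of the neighbourhoods in each cluster and read off the largest value per cluster. The sketch below uses an invented frequency table and invented labels; in the notebook the corresponding inputs would be `medical_grouped` and `kmeans.labels_`.

```python
import pandas as pd

# invented category frequencies for four neighbourhoods
freqs = pd.DataFrame({
    "Doctor's Office": [1.0, 0.8, 0.1, 0.0],
    'Medical Center':  [0.0, 0.2, 0.9, 1.0],
}, index=['N1', 'N2', 'N3', 'N4'])

# invented cluster labels, one per neighbourhood
labels = pd.Series([0, 0, 1, 1], index=freqs.index, name='cluster')

# mean frequency of every category within each cluster
profile = freqs.groupby(labels).mean()

# the category with the highest mean frequency characterises the cluster
dominant = profile.idxmax(axis=1)
print(dominant.to_dict())
```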

#### Cluster 0 (red): Doctor's Office

In [25]:
cluster0 = medical_merged.loc[medical_merged['Cluster Labels'] == 0, medical_merged.columns[[1] + list(range(5, medical_merged.shape[1]))]]
print(cluster0.shape)
cluster0

(194, 14)


Unnamed: 0,categories,Borough,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
398,Hospital,Bronx,Spuyten Duyvil,0,Doctor's Office,Hospital,Sandwich Place,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Veterinarian
790,Medical Center,Bronx,Pelham Parkway,0,Doctor's Office,Medical Center,Veterinarian,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
1070,Medical Center,Bronx,Westchester Square,0,Doctor's Office,Medical Center,Veterinarian,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
1185,Medical Center,Bronx,Pelham Gardens,0,Doctor's Office,Medical Center,Veterinarian,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
1433,Doctor's Office,Bronx,Parkchester,0,Doctor's Office,Hospital,Sandwich Place,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Veterinarian
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5492,Doctor's Office,Staten Island,Castleton Corners,0,Doctor's Office,Medical Center,Veterinarian,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
5687,Doctor's Office,Staten Island,Eltingville,0,Doctor's Office,Hospital,Medical Center,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Veterinarian
5768,Hospital,Staten Island,Bay Terrace,0,Doctor's Office,Hospital,Optical Shop,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Veterinarian
5876,Hospital,Staten Island,Eltingville,0,Doctor's Office,Hospital,Medical Center,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Veterinarian


#### Cluster 1 (purple): Medical Center

In [26]:
cluster1 = medical_merged.loc[medical_merged['Cluster Labels'] == 1, medical_merged.columns[[1] + list(range(5, medical_merged.shape[1]))]]
print(cluster1.shape)
cluster1

(43, 14)


Unnamed: 0,categories,Borough,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
355,Medical Center,Bronx,Unionport,1,Medical Center,Veterinarian,Home Service,Daycare,Dentist's Office,Doctor's Office,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
433,Medical Center,Bronx,West Farms,1,Medical Center,Veterinarian,Home Service,Daycare,Dentist's Office,Doctor's Office,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
1272,Medical Center,Bronx,Morrisania,1,Medical Center,Veterinarian,Home Service,Daycare,Dentist's Office,Doctor's Office,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
1550,Medical Center,Bronx,Concourse,1,Medical Center,Veterinarian,Home Service,Daycare,Dentist's Office,Doctor's Office,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
2601,Medical Center,Bronx,Co-op City,1,Medical Center,Veterinarian,Home Service,Daycare,Dentist's Office,Doctor's Office,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
436,Medical Center,Manhattan,Tudor City,1,Medical Center,Coworking Space,Home Service,Daycare,Dentist's Office,Doctor's Office,Eye Doctor,Fire Station,Health & Beauty Service,Veterinarian
756,Coworking Space,Manhattan,Tudor City,1,Medical Center,Coworking Space,Home Service,Daycare,Dentist's Office,Doctor's Office,Eye Doctor,Fire Station,Health & Beauty Service,Veterinarian
774,Medical Center,Manhattan,Clinton,1,Medical Center,Veterinarian,Home Service,Daycare,Dentist's Office,Doctor's Office,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
905,Medical Center,Manhattan,Manhattan Valley,1,Medical Center,Veterinarian,Home Service,Daycare,Dentist's Office,Doctor's Office,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
972,Medical Center,Manhattan,Upper West Side,1,Medical Center,Hospital,Home Service,Daycare,Dentist's Office,Doctor's Office,Eye Doctor,Fire Station,Health & Beauty Service,Veterinarian
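
The selection used for each cluster combines a boolean row filter on `Cluster Labels` with a positional column pick: the column at position 1 plus every column from position 5 onward. The following is a minimal sketch of that pattern on a toy frame. The `toy` data, its column layout, and the `select_cluster` helper are assumptions made for illustration and are not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for medical_merged; the column layout here is hypothetical.
toy = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "categories": ["Medical Center", "Hospital", "Medical Center", "Doctor's Office"],
    "lat": [40.1, 40.2, 40.3, 40.4],
    "lng": [-73.1, -73.2, -73.3, -73.4],
    "distance": [120, 340, 90, 410],
    "Borough": ["Bronx", "Bronx", "Manhattan", "Queens"],
    "Neighbourhood": ["Unionport", "Belmont", "Clinton", "Astoria"],
    "Cluster Labels": [1, 3, 1, 0],
    "1st Most Common Venue": ["Medical Center", "Hospital", "Medical Center", "Doctor's Office"],
})


def select_cluster(df, label, keep_first=1, start=5):
    """Return the rows of one cluster, keeping column `keep_first` plus all columns from `start` on."""
    positions = [keep_first] + list(range(start, df.shape[1]))
    return df.loc[df["Cluster Labels"] == label, df.columns[positions]]


cluster1_toy = select_cluster(toy, 1)
print(cluster1_toy.shape)
print(list(cluster1_toy.columns))
```

With a helper like this, the four near-identical cells could be written as `select_cluster(medical_merged, 1)`, `select_cluster(medical_merged, 2)`, and so on, provided `medical_merged` has the column order the original cells rely on.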


#### Cluster 2 (blue): Doctor's Office (Veterinarian)

In [27]:
cluster2 = medical_merged.loc[medical_merged['Cluster Labels'] == 2, medical_merged.columns[[1] + list(range(5, medical_merged.shape[1]))]]
print(cluster2.shape)
cluster2

(55, 14)


Unnamed: 0,categories,Borough,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2821,Doctor's Office,Bronx,University Heights,2,Doctor's Office,Veterinarian,Sandwich Place,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Hospital
3081,Doctor's Office,Bronx,University Heights,2,Doctor's Office,Veterinarian,Sandwich Place,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Hospital
3246,Doctor's Office,Bronx,Longwood,2,Doctor's Office,Veterinarian,Sandwich Place,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Hospital
3350,Doctor's Office,Bronx,Longwood,2,Doctor's Office,Veterinarian,Sandwich Place,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Hospital
3392,Doctor's Office,Bronx,Bedford Park,2,Doctor's Office,Veterinarian,Sandwich Place,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Hospital
3613,Doctor's Office,Bronx,Soundview,2,Doctor's Office,Veterinarian,Sandwich Place,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Hospital
3714,Doctor's Office,Bronx,Longwood,2,Doctor's Office,Veterinarian,Sandwich Place,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Hospital
3965,Doctor's Office,Bronx,University Heights,2,Doctor's Office,Veterinarian,Sandwich Place,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Hospital
4217,Doctor's Office,Bronx,Kingsbridge,2,Doctor's Office,Veterinarian,Sandwich Place,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Hospital
4303,Doctor's Office,Bronx,Castle Hill,2,Doctor's Office,Veterinarian,Sandwich Place,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Hospital


#### Cluster 3 (green): Hospital (Doctor's Office)

In [28]:
cluster3 = medical_merged.loc[medical_merged['Cluster Labels'] == 3, medical_merged.columns[[1] + list(range(5, medical_merged.shape[1]))]]
print(cluster3.shape)
cluster3

(25, 14)


Unnamed: 0,categories,Borough,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1245,Doctor's Office,Bronx,Bronxdale,3,Hospital,Doctor's Office,Sandwich Place,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Veterinarian
2077,Hospital,Bronx,Bronxdale,3,Hospital,Doctor's Office,Sandwich Place,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Veterinarian
4401,Hospital,Bronx,Belmont,3,Hospital,Sandwich Place,Daycare,Dentist's Office,Doctor's Office,Eye Doctor,Fire Station,Health & Beauty Service,Home Service,Veterinarian
1129,Medical Center,Manhattan,Yorkville,3,Hospital,Doctor's Office,Medical Center,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Veterinarian
1769,Hospital,Manhattan,Yorkville,3,Hospital,Doctor's Office,Medical Center,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Veterinarian
2409,Doctor's Office,Manhattan,Yorkville,3,Hospital,Doctor's Office,Medical Center,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Veterinarian
2489,Doctor's Office,Manhattan,Yorkville,3,Hospital,Doctor's Office,Medical Center,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Veterinarian
3569,Hospital,Manhattan,Yorkville,3,Hospital,Doctor's Office,Medical Center,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Veterinarian
475,Hospital,Brooklyn,Wingate,3,Hospital,Doctor's Office,Medical School,Medical Center,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service
685,Medical Center,Brooklyn,Wingate,3,Hospital,Doctor's Office,Medical School,Medical Center,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service


#### Cluster 4 (orange): Doctor's Office/Hospital/Medical Center

In [29]:
cluster4 = medical_merged.loc[medical_merged['Cluster Labels'] == 4, medical_merged.columns[[1] + list(range(5, medical_merged.shape[1]))]]
print(cluster4.shape)
cluster4

(183, 14)


Unnamed: 0,categories,Borough,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Medical Center,Bronx,Norwood,4,Doctor's Office,Medical Center,Hospital,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Veterinarian
59,Medical Center,Bronx,Norwood,4,Doctor's Office,Medical Center,Hospital,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Veterinarian
118,Medical Center,Bronx,Morris Heights,4,Doctor's Office,Medical Center,Veterinarian,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
207,Hospital,Bronx,Kingsbridge Heights,4,Hospital,Doctor's Office,Medical Center,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Veterinarian
253,Medical Center,Bronx,Claremont Village,4,Doctor's Office,Medical Center,Veterinarian,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4802,Hospital,Staten Island,New Dorp,4,Doctor's Office,Medical Center,Medical Lab,Hospital,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service
5575,Doctor's Office,Staten Island,Dongan Hills,4,Doctor's Office,Medical Center,Veterinarian,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
5936,Home Service,Staten Island,New Dorp,4,Doctor's Office,Medical Center,Medical Lab,Hospital,Home Service,Daycare,Dentist's Office,Eye Doctor,Fire Station,Health & Beauty Service
6167,Dentist's Office,Staten Island,Willowbrook,4,Doctor's Office,Medical Center,Dentist's Office,Veterinarian,Home Service,Daycare,Eye Doctor,Fire Station,Health & Beauty Service,Hospital
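
The cluster headings name each cluster after its dominant venue category. One way to check such labels is to count the rows per cluster and take the most frequent value of `1st Most Common Venue`. The sketch below does this on a handful of rows copied from the tables above. The `sample` frame is only an illustrative subset and `summarize_clusters` is a hypothetical helper, so the counts are not the real cluster sizes.

```python
import pandas as pd

# Illustrative subset: a few rows copied from the cluster tables, not the full data.
sample = pd.DataFrame({
    "Neighbourhood": ["Unionport", "West Farms", "Longwood", "Bronxdale", "Yorkville", "Norwood"],
    "Cluster Labels": [1, 1, 2, 3, 3, 4],
    "1st Most Common Venue": [
        "Medical Center", "Medical Center", "Doctor's Office",
        "Hospital", "Hospital", "Doctor's Office",
    ],
})


def summarize_clusters(df, column="1st Most Common Venue"):
    """Per cluster: number of rows and the most frequent value of `column`."""
    grouped = df.groupby("Cluster Labels")[column]
    return pd.DataFrame({
        "rows": grouped.size(),
        # mode() can return ties; take the first (alphabetically smallest) value.
        "top_venue": grouped.agg(lambda s: s.mode().iloc[0]),
    })


summary = summarize_clusters(sample)
print(summary)
```

Applied to `medical_merged`, the same call would count venue rows rather than unique neighbourhoods, because a neighbourhood appears once for every venue assigned to it.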
