# Capstone Project - Residential Area for Education
### Applied Data Science Capstone by IBM/Coursera


## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>



In this project we will try to find the locations in Miami that offer the best education options. Specifically, this report is targeted at stakeholders interested in selecting a **residence location** in **Miami**, Florida, USA with the maximum number of **educational institutions** nearby.

There are many educational institutions in Miami, and we will try to detect the **locations that have the most options for education**.

We will use data science techniques to identify the few most promising neighborhoods based on this criterion. The advantages of each area will then be clearly expressed so that the best possible final location can be chosen by stakeholders.

## Data <a name="data"></a>


Based on the definition of our problem, the factor that will influence our decision is:
* The number of existing educational institutions in each neighborhood

The following data sources are needed to extract or generate the required information:

* Neighborhoods in Miami and their coordinates, extracted from the following web page:
  https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Miami


* The number of educational institutions in every neighborhood, with their type and location, obtained using the **Foursquare API**

Before we get the data and start exploring it, let's download all the dependencies that we will need.


In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

!pip install geopy

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a flat pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


<a id='item1'></a>


## 1. Download and Explore Dataset


The Wikipedia list yields 24 Miami neighborhoods with coordinates. In order to segment the neighborhoods and explore them, we need a dataset that contains the neighborhoods as well as the latitude and longitude coordinates of each neighborhood.



#### Load and explore the data


Next, let's load the data.


In [2]:
from bs4 import BeautifulSoup
import requests

url = 'https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Miami'

r = requests.get(url)

soup = BeautifulSoup(r.content, 'html.parser')  # name the parser explicitly to avoid a warning


In [3]:

table = soup.find('table')
df = pd.read_html(str(table))[0]

In [4]:
df.head()

Unnamed: 0,Neighborhood,Demonym,Population2010,Population/Km²,Sub-neighborhoods,Coordinates
0,Allapattah,,54289,4401,,25.815-80.224
1,Arts & Entertainment District,,11033,7948,,25.799-80.190
2,Brickell,Brickellite,31759,14541,West Brickell,25.758-80.193
3,Buena Vista,,9058,3540,Buena Vista East Historic District and Design ...,25.813-80.192
4,Coconut Grove,Grovite,20076,3091,"Center Grove, Northeast Coconut Grove, Southwe...",25.712-80.257


Remove the columns that are not required.

In [5]:
df.drop(["Demonym","Population2010","Population/Km²","Sub-neighborhoods"], axis = 1, inplace=True) 

Remove the rows that have no coordinate value.

In [6]:
df.drop(df[df['Coordinates'].isnull()].index, inplace=True)

Split the coordinates into Latitude and Longitude columns.

In [7]:
df[['Latitude','Longitude']] = df['Coordinates'].str.split('-',expand=True)

Remove the Coordinates column.

In [8]:
df.drop(["Coordinates"], axis = 1, inplace=True) 

While splitting the column we lost the `-` sign, so add it back to the longitude.

In [9]:
df['Longitude'] = '-' + df['Longitude']
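
The split-and-restore steps above work because every longitude in this table is negative. As a hedged alternative, the sketch below parses a coordinate string such as `25.815-80.224` with a regular expression so that the minus sign is never lost. The helper name `parse_coordinates` is hypothetical, and the pattern assumes a latitude followed directly by a negative longitude, which holds for Miami but not for every city.

```python
import re

# Latitude (optionally signed) followed immediately by a negative longitude.
# Assumption: the longitude always carries a leading minus sign, as in this table.
COORD_PATTERN = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*(-\d+(?:\.\d+)?)\s*$")


def parse_coordinates(text):
    """Return (latitude, longitude) as floats, or None if the text does not match."""
    match = COORD_PATTERN.match(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


print(parse_coordinates("25.815-80.224"))
```

Applied to the dataframe, the same pattern could be passed to `Series.str.extract`, which would replace the split, the sign fix and the column drop in one step.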

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 24 entries, 0 to 24
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Neighborhood  24 non-null     object
 1   Latitude      24 non-null     object
 2   Longitude     24 non-null     object
dtypes: object(3)
memory usage: 768.0+ bytes


Convert the Latitude and Longitude columns to float.

In [11]:
df[['Latitude', 'Longitude']] = df[['Latitude', 'Longitude']].apply(pd.to_numeric)

In [12]:
neighborhoods = df

Let's take a quick look at the data.


In [13]:
neighborhoods.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Allapattah,25.815,-80.224
1,Arts & Entertainment District,25.799,-80.19
2,Brickell,25.758,-80.193
3,Buena Vista,25.813,-80.192
4,Coconut Grove,25.712,-80.257


Confirm the number of neighborhoods in the dataframe.


In [14]:
print('The dataframe has {} neighborhoods.'.format(neighborhoods.shape[0]))

The dataframe has 24 neighborhoods.


#### Use geopy library to get the latitude and longitude values of Miami City.


In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>ny_explorer</em>, as shown below.


In [15]:
address = 'Miami, FL'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Miami are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Miami are 25.7742658, -80.1936589.


#### Create a map of Miami with neighborhoods superimposed on top.


In [16]:
# create map of Miami using latitude and longitude values
map_miami = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_miami)  
    
map_miami

In [17]:
#neighborhoods.info()
neighborhoods.head(40)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Allapattah,25.815,-80.224
1,Arts & Entertainment District,25.799,-80.19
2,Brickell,25.758,-80.193
3,Buena Vista,25.813,-80.192
4,Coconut Grove,25.712,-80.257
5,Coral Way,25.75,-80.283
6,Design District,25.813,-80.193
7,Downtown,25.774,-80.193
8,Edgewater,25.802,-80.19
9,Flagami,25.762,-80.316


Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.


#### Define Foursquare Credentials and Version


In [51]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET:
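
Hard-coding credentials in a notebook makes them easy to share by accident. One possible approach, sketched here under assumptions, is to read them from environment variables and print only a masked form. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, not part of the Foursquare API.

```python
import os


def load_foursquare_credentials(environ=os.environ):
    """Read the credentials from a mapping of environment variables.

    The variable names are an assumption made for this sketch.
    """
    client_id = environ.get("FOURSQUARE_CLIENT_ID", "")
    client_secret = environ.get("FOURSQUARE_CLIENT_SECRET", "")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first."
        )
    return client_id, client_secret


def mask(secret, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Demonstration with a plain dict standing in for the real environment.
demo_env = {"FOURSQUARE_CLIENT_ID": "ABCDEFGH", "FOURSQUARE_CLIENT_SECRET": "12345678"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(demo_id))
print("CLIENT_SECRET:", mask(demo_secret))
```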


#### Let's explore the first neighborhood in our dataframe.


Get the first neighborhood's name.


In [19]:
neighborhoods.loc[0, 'Neighborhood']

'Allapattah'

Get the neighborhood's latitude and longitude values.


In [20]:
neighborhood_latitude = round(neighborhoods.loc[0, 'Latitude'],3) # neighborhood latitude value
neighborhood_longitude = round(neighborhoods.loc[0, 'Longitude'],3) # neighborhood longitude value

neighborhood_name = neighborhoods.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Allapattah are 25.815, -80.224.


#### Now, let's get up to 100 school venues within a radius of 1500 meters of Allapattah.


First, let's create the GET request URL. Name your URL **url**.


In [21]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1500 # define radius
categoryId='4bf58dd8d48988d13b941735' # category for school
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    categoryId,
    radius, 
    LIMIT)


url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=QBDVQDIAMM0I0Q3QVMOL31R3XQRX0BPV5OOSEIILFNFXDGEJ&client_secret=JPGFSDCCFB1POQDRPVMJOCGOBWJQOLEGD24GUTMDAI3SPM1N&v=20180605&ll=25.815,-80.224&categoryId=4bf58dd8d48988d13b941735&radius=1500&limit=100'
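
The URL above is assembled with string formatting. As a sketch of an alternative, the standard library's `urlencode` can build the same query string from a dictionary and escape each value. The function name `build_explore_url` and the placeholder credentials are hypothetical; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng,
                      category_id, radius=1500, limit=100):
    """Build the explore URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'categoryId': category_id,
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


# Placeholder credentials, for illustration only.
example_url = build_explore_url('MY_ID', 'MY_SECRET', '20180605',
                                25.815, -80.224, '4bf58dd8d48988d13b941735')

# Parse the URL back to confirm that the parameters survive the round trip.
query = parse_qs(urlparse(example_url).query)
print(query['ll'], query['radius'], query['limit'])
```

With `requests`, the same dictionary could instead be passed as `params=` to `requests.get`, which performs the encoding itself.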

Send the GET request and examine the results.


In [22]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f93bb134e36ac59a5a54865'},
 'response': {'headerLocation': 'Model City',
  'headerFullLocation': 'Model City, Miami',
  'headerLocationGranularity': 'neighborhood',
  'query': 'school',
  'totalResults': 7,
  'suggestedBounds': {'ne': {'lat': 25.828500013500015,
    'lng': -80.2090313978883},
   'sw': {'lat': 25.801499986499987, 'lng': -80.23896860211171}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c3a53500a71c9b6d01844c9',
       'name': 'Miami Jackson Senior High School',
       'location': {'address': '1751 NW 36th St',
        'lat': 25.809993097862588,
        'lng': -80.22548431119802,
        'labeledLatLngs': [{'label': 'display',
          'lat': 25.809993097862588,
          'lng': -80.22548431119802},
         {'lab

From the Foursquare lab in the previous module, we know that all the information is in the _items_ key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.


In [23]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
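
To make the behaviour of the category lookup concrete, here is a minimal sketch that applies the same idea to plain dictionaries instead of dataframe rows. The helper `first_category_name` and the second sample record are hypothetical; the first record mirrors a venue from the response shown earlier.

```python
def first_category_name(venue):
    """Return the name of the first category of a venue dict, or None."""
    categories = venue.get('categories') or []
    if not categories:
        return None
    return categories[0].get('name')


sample_venues = [
    {'name': 'Miami Jackson Senior High School',
     'categories': [{'name': 'High School'}]},
    # Hypothetical record: a venue that carries no category at all.
    {'name': 'Venue without a category', 'categories': []},
]

category_names = [first_category_name(venue) for venue in sample_venues]
print(category_names)
```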

Now we are ready to clean the json and structure it into a _pandas_ dataframe.


In [24]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,categories,lat,lng
0,Miami Jackson Senior High School,High School,25.809993,-80.225484
1,Allapattah Middle School,College Academic Building,25.817823,-80.218966
2,Lenora Braynon Smith Elementary School,Elementary School,25.818559,-80.217238
3,Brownsville Middle School,Middle School,25.819448,-80.235542
4,Earlington Heights Elementary,Elementary School,25.818344,-80.232717


And how many venues were returned by Foursquare?


In [25]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

7 venues were returned by Foursquare.
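
As a sanity check on the 1500 meter radius, the distance between the neighborhood coordinates and a returned venue can be estimated with the haversine formula. The sketch below uses the Allapattah coordinates and the Miami Jackson Senior High School location from the output above; the function name `haversine_m` is hypothetical and the Earth radius is the usual mean-radius approximation.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Allapattah center and Miami Jackson Senior High School, from the outputs above.
distance = haversine_m(25.815, -80.224, 25.809993, -80.225484)
print('Distance: {:.0f} m, within radius: {}'.format(distance, distance <= 1500))
```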


## Methodology <a name="methodology"></a>

In this project we direct our efforts at detecting areas of Miami that have a high density of educational institutions.

In the first step we collected the required **data: location and type (category) of every educational institution** within 1500 meters of a neighborhood's coordinates, demonstrated above for Allapattah. We also **identified educational institution types** (according to Foursquare categorization).

The second step in our analysis is the calculation and exploration of '**educational institution density**' across different areas of Miami - we will use **heatmaps** to identify a few promising areas with a high number of educational institutions in general.

Finally, we will present a map of all neighborhoods and create clusters (using **k-means clustering**) of those locations to identify neighborhoods that are strong candidate locations for stakeholders.

## Analysis <a name="analysis"></a>


Let's perform some basic exploratory data analysis and derive some additional information from our raw data. Let's explore each neighborhood in Miami.

## 2. Explore Neighborhoods in Miami


#### Let's create a function to repeat the same process for all the neighborhoods in Miami


In [26]:
def getNearbyVenues(names, latitudes, longitudes, radius=1500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            categoryId
        )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
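
The function above indexes `v['venue']['categories'][0]` directly, which would raise an error for a venue that has no category. The following sketch shows one way the row-building step could tolerate such records. The helper `extract_venue_rows` and the second sample item are hypothetical; the item layout follows the Foursquare response shown earlier.

```python
def extract_venue_rows(neighborhood, lat, lng, items):
    """Turn the 'items' list of one response into flat tuples, one per venue."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        rows.append((
            neighborhood,
            lat,
            lng,
            venue.get('name'),
            location.get('lat'),
            location.get('lng'),
            # Fall back to None instead of failing when no category is present.
            categories[0]['name'] if categories else None,
        ))
    return rows


sample_items = [
    {'venue': {'name': 'Miami Jackson Senior High School',
               'location': {'lat': 25.809993, 'lng': -80.225484},
               'categories': [{'name': 'High School'}]}},
    # Hypothetical venue without categories, to exercise the fallback.
    {'venue': {'name': 'Uncategorized venue',
               'location': {'lat': 25.81, 'lng': -80.22},
               'categories': []}},
]

rows = extract_venue_rows('Allapattah', 25.815, -80.224, sample_items)
for row in rows:
    print(row)
```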

#### Now write the code to run the above function on each neighborhood and create a new dataframe called miami_venues.


In [27]:
neighborhoods

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Allapattah,25.815,-80.224
1,Arts & Entertainment District,25.799,-80.19
2,Brickell,25.758,-80.193
3,Buena Vista,25.813,-80.192
4,Coconut Grove,25.712,-80.257
5,Coral Way,25.75,-80.283
6,Design District,25.813,-80.193
7,Downtown,25.774,-80.193
8,Edgewater,25.802,-80.19
9,Flagami,25.762,-80.316


In [28]:
miami_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude'])

Allapattah
Arts & Entertainment District
Brickell
Buena Vista
Coconut Grove
Coral Way
Design District
Downtown
Edgewater
Flagami
Grapeland Heights
Liberty City
Little Haiti
Little Havana
Lummus Park
Midtown
Overtown
Park West
The Roads
Upper Eastside
Venetian Islands
Virginia Key
West Flagler
Wynwood


In [29]:
neighborhoods['Latitude']

0     25.815
1     25.799
2     25.758
3     25.813
4     25.712
5     25.750
6     25.813
7     25.774
8     25.802
9     25.762
10    25.792
12    25.832
13    25.824
14    25.773
15    25.777
16    25.807
17    25.787
18    25.785
19    25.756
20    25.830
21    25.791
22    25.736
23    25.775
24    25.804
Name: Latitude, dtype: float64

#### Let's check the size of the resulting dataframe


In [30]:
print(miami_venues.shape)
miami_venues.head()

(342, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Allapattah,25.815,-80.224,Miami Jackson Senior High School,25.809993,-80.225484,High School
1,Allapattah,25.815,-80.224,Allapattah Middle School,25.817823,-80.218966,College Academic Building
2,Allapattah,25.815,-80.224,Lenora Braynon Smith Elementary School,25.818559,-80.217238,Elementary School
3,Allapattah,25.815,-80.224,Brownsville Middle School,25.819448,-80.235542,Middle School
4,Allapattah,25.815,-80.224,Earlington Heights Elementary,25.818344,-80.232717,Elementary School


Let's check how many venues were returned for each neighborhood


In [31]:
miami_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Allapattah,7,7,7,7,7,7
Arts & Entertainment District,20,20,20,20,20,20
Brickell,25,25,25,25,25,25
Buena Vista,17,17,17,17,17,17
Coconut Grove,5,5,5,5,5,5
Coral Way,6,6,6,6,6,6
Design District,18,18,18,18,18,18
Downtown,37,37,37,37,37,37
Edgewater,24,24,24,24,24,24
Flagami,9,9,9,9,9,9
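
Since the business problem asks for the neighborhoods with the most educational institutions, these counts can be ranked directly. The sketch below does so for the ten neighborhoods visible in the table above; the remaining neighborhoods are not shown in this output and are therefore left out.

```python
# Venue counts copied from the table above (only the rows shown there).
venue_counts = {
    'Allapattah': 7,
    'Arts & Entertainment District': 20,
    'Brickell': 25,
    'Buena Vista': 17,
    'Coconut Grove': 5,
    'Coral Way': 6,
    'Design District': 18,
    'Downtown': 37,
    'Edgewater': 24,
    'Flagami': 9,
}

# Sort by count, largest first.
ranked = sorted(venue_counts.items(), key=lambda pair: pair[1], reverse=True)
top_three = ranked[:3]

for name, count in top_three:
    print('{}: {}'.format(name, count))
```

On the `miami_venues` dataframe itself, `miami_venues['Neighborhood'].value_counts()` would give the same ranking for all neighborhoods.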


#### Let's find out how many unique categories can be curated from all the returned venues


In [32]:
print('There are {} unique categories.'.format(len(miami_venues['Venue Category'].unique())))

There are 28 unique categories.


<a id='item3'></a>


## 3. Analyze Each Neighborhood


In [33]:
# one hot encoding
miami_onehot = pd.get_dummies(miami_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
miami_onehot['Neighborhood'] = miami_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [miami_onehot.columns[-1]] + list(miami_onehot.columns[:-1])
miami_onehot = miami_onehot[fixed_columns]

miami_onehot.head()

Unnamed: 0,Neighborhood,Adult Education Center,Art Gallery,Business Service,Church,College Academic Building,College Arts Building,College Lab,Daycare,Driving School,Elementary School,General College & University,Golf Course,High School,Language School,Middle School,Miscellaneous Shop,Music School,Nursery School,Playground,Preschool,Private School,Religious School,School,Student Center,Swim School,Trade School,University,Zoo
0,Allapattah,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Allapattah,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Allapattah,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Allapattah,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Allapattah,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.


In [34]:
miami_onehot.shape

(342, 29)

#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category


In [35]:
miami_grouped = miami_onehot.groupby('Neighborhood').mean().reset_index()
miami_grouped

Unnamed: 0,Neighborhood,Adult Education Center,Art Gallery,Business Service,Church,College Academic Building,College Arts Building,College Lab,Daycare,Driving School,Elementary School,General College & University,Golf Course,High School,Language School,Middle School,Miscellaneous Shop,Music School,Nursery School,Playground,Preschool,Private School,Religious School,School,Student Center,Swim School,Trade School,University,Zoo
0,Allapattah,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.571429,0.0,0.0,0.142857,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Arts & Entertainment District,0.05,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.1,0.1,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.05,0.0
2,Brickell,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.12,0.04,0.0,0.04,0.04,0.0,0.2,0.08,0.0,0.2,0.04,0.04,0.04,0.0,0.0
3,Buena Vista,0.0,0.058824,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.176471,0.0,0.0,0.176471,0.058824,0.0,0.058824,0.0,0.0,0.0,0.058824,0.0,0.0,0.352941,0.0,0.0,0.0,0.0,0.0
4,Coconut Grove,0.0,0.0,0.2,0.2,0.4,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Coral Way,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0
6,Design District,0.0,0.055556,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.222222,0.0,0.0,0.166667,0.055556,0.0,0.055556,0.0,0.0,0.0,0.055556,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0
7,Downtown,0.027027,0.0,0.0,0.027027,0.0,0.0,0.0,0.027027,0.0,0.135135,0.054054,0.0,0.027027,0.108108,0.027027,0.0,0.054054,0.027027,0.0,0.081081,0.054054,0.0,0.27027,0.027027,0.027027,0.027027,0.0,0.0
8,Edgewater,0.041667,0.041667,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.166667,0.0,0.0,0.125,0.083333,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.416667,0.0,0.0,0.0,0.041667,0.0
9,Flagami,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.111111,0.0,0.0,0.0,0.0,0.111111,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.111111,0.0,0.0
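
Taking the mean of one-hot columns per neighborhood is the same as computing each category's share of that neighborhood's venues. The sketch below illustrates this for Allapattah with plain Python. The first five categories come from the venue listing above; the last two are an assumption, inferred from the Elementary School value of 0.571429 (4 out of 7).

```python
from collections import Counter

# Categories of the seven Allapattah venues. The final two entries are
# inferred from the grouped table, not read from a listed venue.
allapattah_categories = [
    'High School',
    'College Academic Building',
    'Elementary School',
    'Middle School',
    'Elementary School',
    'Elementary School',
    'Elementary School',
]


def category_frequencies(categories):
    """Share of each category among the venues of one neighborhood."""
    counts = Counter(categories)
    total = len(categories)
    return {category: count / total for category, count in counts.items()}


frequencies = category_frequencies(allapattah_categories)
for category, share in sorted(frequencies.items(), key=lambda pair: -pair[1]):
    print('{:<28}{:.6f}'.format(category, share))
```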


#### Let's confirm the new size


In [36]:
miami_grouped.shape

(24, 29)

#### Let's print each neighborhood along with the top 5 most common venues


In [37]:
num_top_venues = 5

for hood in miami_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = miami_grouped[miami_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Allapattah----
                       venue  freq
0          Elementary School  0.57
1              Middle School  0.14
2  College Academic Building  0.14
3                High School  0.14
4               Music School  0.00


----Arts & Entertainment District----
                    venue  freq
0                  School  0.40
1       Elementary School  0.20
2             High School  0.10
3         Language School  0.10
4  Adult Education Center  0.05


----Brickell----
                    venue  freq
0                  School  0.20
1               Preschool  0.20
2         Language School  0.12
3          Private School  0.08
4  Adult Education Center  0.04


----Buena Vista----
                venue  freq
0              School  0.35
1         High School  0.18
2   Elementary School  0.18
3  Miscellaneous Shop  0.06
4         Art Gallery  0.06


----Coconut Grove----
                       venue  freq
0  College Academic Building   0.4
1           Business Service   0.2
2        

#### Let's put that into a _pandas_ dataframe


First, let's write a function to sort the venues in descending order.


In [38]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.


In [39]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = miami_grouped['Neighborhood']

for ind in np.arange(miami_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(miami_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allapattah,Elementary School,High School,College Academic Building,Middle School,Zoo,Art Gallery,Business Service,Church,College Arts Building,College Lab
1,Arts & Entertainment District,School,Elementary School,Language School,High School,Miscellaneous Shop,Art Gallery,University,Adult Education Center,Preschool,Playground
2,Brickell,School,Preschool,Language School,Private School,Nursery School,Church,Elementary School,General College & University,Middle School,Music School
3,Buena Vista,School,Elementary School,High School,Language School,Art Gallery,College Arts Building,Preschool,Miscellaneous Shop,Golf Course,Business Service
4,Coconut Grove,College Academic Building,Business Service,Church,Elementary School,Zoo,University,Art Gallery,College Arts Building,College Lab,Daycare
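
The column names above rely on a three-element suffix list plus an exception for everything else, which is fine for 10 columns but would label an 11th or 21st column incorrectly. A small helper, sketched here with the hypothetical name `ordinal`, covers the general case.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```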


<a id='item4'></a>


## 4. Cluster Neighborhoods


Run _k_-means to cluster the neighborhoods into 5 clusters.


In [40]:
# set number of clusters
kclusters = 5

miami_grouped_clustering = miami_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(miami_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 1, 1, 0, 1, 1, 1, 1, 1])
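
To show what the clustering step does internally, here is a minimal pure-Python sketch of Lloyd's algorithm, the iteration behind k-means. It is an illustration under simplifying assumptions, not a reproduction of the `KMeans` run above: it uses only two of the 28 features (the Elementary School and School frequencies of six neighborhoods from the grouped table), two clusters, and hand-picked starting centroids instead of scikit-learn's initialisation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points, initial_centroids, iterations=10):
    """Alternate assignment and update steps for a fixed number of iterations."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(point, centroids[k]))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


names = ['Allapattah', 'Coconut Grove', 'Arts & Entertainment District',
         'Brickell', 'Buena Vista', 'Coral Way']
# (Elementary School frequency, School frequency) from the grouped table.
points = [(0.571429, 0.0), (0.2, 0.0), (0.2, 0.4),
          (0.04, 0.2), (0.176471, 0.352941), (0.333333, 0.5)]

# Starting centroids chosen by hand: Allapattah and Arts & Entertainment District.
labels, centroids = lloyd_kmeans(points, [points[0], points[2]])
for name, label in zip(names, labels):
    print(label, name)
```

In this reduced example, neighborhoods whose venues are mostly tagged with the generic School category end up separated from those without it.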

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.


In [41]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

miami_merged = neighborhoods

# join the cluster labels and top venues onto each neighborhood's latitude/longitude
miami_merged = miami_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

miami_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allapattah,25.815,-80.224,0,Elementary School,High School,College Academic Building,Middle School,Zoo,Art Gallery,Business Service,Church,College Arts Building,College Lab
1,Arts & Entertainment District,25.799,-80.19,1,School,Elementary School,Language School,High School,Miscellaneous Shop,Art Gallery,University,Adult Education Center,Preschool,Playground
2,Brickell,25.758,-80.193,1,School,Preschool,Language School,Private School,Nursery School,Church,Elementary School,General College & University,Middle School,Music School
3,Buena Vista,25.813,-80.192,1,School,Elementary School,High School,Language School,Art Gallery,College Arts Building,Preschool,Miscellaneous Shop,Golf Course,Business Service
4,Coconut Grove,25.712,-80.257,0,College Academic Building,Business Service,Church,Elementary School,Zoo,University,Art Gallery,College Arts Building,College Lab,Daycare


In [42]:
miami_merged.head(30)

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allapattah,25.815,-80.224,0,Elementary School,High School,College Academic Building,Middle School,Zoo,Art Gallery,Business Service,Church,College Arts Building,College Lab
1,Arts & Entertainment District,25.799,-80.19,1,School,Elementary School,Language School,High School,Miscellaneous Shop,Art Gallery,University,Adult Education Center,Preschool,Playground
2,Brickell,25.758,-80.193,1,School,Preschool,Language School,Private School,Nursery School,Church,Elementary School,General College & University,Middle School,Music School
3,Buena Vista,25.813,-80.192,1,School,Elementary School,High School,Language School,Art Gallery,College Arts Building,Preschool,Miscellaneous Shop,Golf Course,Business Service
4,Coconut Grove,25.712,-80.257,0,College Academic Building,Business Service,Church,Elementary School,Zoo,University,Art Gallery,College Arts Building,College Lab,Daycare
5,Coral Way,25.75,-80.283,1,School,Elementary School,Nursery School,Zoo,High School,Art Gallery,Business Service,Church,College Academic Building,College Arts Building
6,Design District,25.813,-80.193,1,School,Elementary School,High School,Language School,Art Gallery,College Arts Building,Preschool,Miscellaneous Shop,Golf Course,Business Service
7,Downtown,25.774,-80.193,1,School,Elementary School,Language School,Preschool,Music School,General College & University,Private School,Church,Daycare,High School
8,Edgewater,25.802,-80.19,1,School,Elementary School,High School,Language School,Miscellaneous Shop,Art Gallery,College Arts Building,University,Adult Education Center,Playground
9,Flagami,25.762,-80.316,1,School,Music School,Trade School,Daycare,Middle School,Elementary School,Zoo,Golf Course,Art Gallery,Business Service


In [43]:

miami_merged.drop(miami_merged[miami_merged['Cluster Labels'].isnull()].index, inplace=True)

In [44]:
miami_merged.head(30)

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allapattah,25.815,-80.224,0,Elementary School,High School,College Academic Building,Middle School,Zoo,Art Gallery,Business Service,Church,College Arts Building,College Lab
1,Arts & Entertainment District,25.799,-80.19,1,School,Elementary School,Language School,High School,Miscellaneous Shop,Art Gallery,University,Adult Education Center,Preschool,Playground
2,Brickell,25.758,-80.193,1,School,Preschool,Language School,Private School,Nursery School,Church,Elementary School,General College & University,Middle School,Music School
3,Buena Vista,25.813,-80.192,1,School,Elementary School,High School,Language School,Art Gallery,College Arts Building,Preschool,Miscellaneous Shop,Golf Course,Business Service
4,Coconut Grove,25.712,-80.257,0,College Academic Building,Business Service,Church,Elementary School,Zoo,University,Art Gallery,College Arts Building,College Lab,Daycare
5,Coral Way,25.75,-80.283,1,School,Elementary School,Nursery School,Zoo,High School,Art Gallery,Business Service,Church,College Academic Building,College Arts Building
6,Design District,25.813,-80.193,1,School,Elementary School,High School,Language School,Art Gallery,College Arts Building,Preschool,Miscellaneous Shop,Golf Course,Business Service
7,Downtown,25.774,-80.193,1,School,Elementary School,Language School,Preschool,Music School,General College & University,Private School,Church,Daycare,High School
8,Edgewater,25.802,-80.19,1,School,Elementary School,High School,Language School,Miscellaneous Shop,Art Gallery,College Arts Building,University,Adult Education Center,Playground
9,Flagami,25.762,-80.316,1,School,Music School,Trade School,Daycare,Middle School,Elementary School,Zoo,Golf Course,Art Gallery,Business Service


Finally, let's visualize the resulting clusters


In [45]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(miami_merged['Latitude'], miami_merged['Longitude'], miami_merged['Neighborhood'], miami_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
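
The map cell indexes its color list with `int(cluster)-1`, so label 0 wraps around to the last color through Python's negative indexing. A minimal sketch of an alternative is shown here: build one color per label and index it by the label directly. It uses the standard-library `colorsys` module rather than the matplotlib colormap from the cell above, and `cluster_palette` is a hypothetical helper name, not something defined in this notebook.

```python
import colorsys

def cluster_palette(n_clusters):
    """Return one hex color per cluster label 0..n_clusters-1, evenly spaced in hue."""
    palette = []
    for label in range(n_clusters):
        hue = label / n_clusters  # evenly spaced hues in [0, 1)
        r, g, b = colorsys.hsv_to_rgb(hue, 0.9, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

# Five clusters, matching the five cluster outputs in this notebook.
palette = cluster_palette(5)
for label, hex_color in enumerate(palette):
    print(label, hex_color)
```

With such a palette, a marker for a given row could take `color=palette[int(cluster)]`, which keeps the label-to-color mapping easy to read off a legend.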

<a id='item5'></a>


## 5. Examine Clusters


Now we can examine each cluster and determine the venue categories that distinguish it from the others. Based on these defining categories, each cluster can then be given a descriptive name.
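
The cells in this section select columns by position, and their outputs show `Latitude` followed by the 2nd through 10th venue columns, so the neighborhood name and the most common venue are not displayed. A possible alternative, sketched on a two-row stand-in for `miami_merged`, selects columns by name instead. The helper `cluster_view` is hypothetical, and the column name `1st Most Common Venue` is assumed from the naming pattern of the other venue columns.

```python
import pandas as pd

# Tiny stand-in for miami_merged: two rows copied from the table above,
# trimmed to the first two venue columns to keep the example short.
miami_merged = pd.DataFrame({
    'Neighborhood': ['Brickell', 'Coconut Grove'],
    'Latitude': [25.758, 25.712],
    'Longitude': [-80.193, -80.257],
    'Cluster Labels': [1, 0],
    '1st Most Common Venue': ['School', 'College Academic Building'],  # assumed column name
    '2nd Most Common Venue': ['Preschool', 'Business Service'],
})

def cluster_view(df, label):
    """Return the neighborhood name plus every '... Most Common Venue' column for one cluster."""
    venue_cols = [c for c in df.columns if c.endswith('Most Common Venue')]
    return df.loc[df['Cluster Labels'] == label, ['Neighborhood'] + venue_cols]

print(cluster_view(miami_merged, 1))
```

Selecting by name keeps the view stable if columns are added or reordered, which positional indices do not guarantee.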


#### Cluster 1


In [46]:
miami_merged.loc[miami_merged['Cluster Labels'] == 0, miami_merged.columns[[1] + list(range(5, miami_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,25.815,High School,College Academic Building,Middle School,Zoo,Art Gallery,Business Service,Church,College Arts Building,College Lab
4,25.712,Business Service,Church,Elementary School,Zoo,University,Art Gallery,College Arts Building,College Lab,Daycare


#### Cluster 2


In [47]:
miami_merged.loc[miami_merged['Cluster Labels'] == 1, miami_merged.columns[[1] + list(range(5, miami_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,25.799,Elementary School,Language School,High School,Miscellaneous Shop,Art Gallery,University,Adult Education Center,Preschool,Playground
2,25.758,Preschool,Language School,Private School,Nursery School,Church,Elementary School,General College & University,Middle School,Music School
3,25.813,Elementary School,High School,Language School,Art Gallery,College Arts Building,Preschool,Miscellaneous Shop,Golf Course,Business Service
5,25.75,Elementary School,Nursery School,Zoo,High School,Art Gallery,Business Service,Church,College Academic Building,College Arts Building
6,25.813,Elementary School,High School,Language School,Art Gallery,College Arts Building,Preschool,Miscellaneous Shop,Golf Course,Business Service
7,25.774,Elementary School,Language School,Preschool,Music School,General College & University,Private School,Church,Daycare,High School
8,25.802,Elementary School,High School,Language School,Miscellaneous Shop,Art Gallery,College Arts Building,University,Adult Education Center,Playground
9,25.762,Music School,Trade School,Daycare,Middle School,Elementary School,Zoo,Golf Course,Art Gallery,Business Service
12,25.832,School,High School,Zoo,Art Gallery,Business Service,Church,College Academic Building,College Arts Building,College Lab
13,25.824,Elementary School,High School,College Arts Building,Preschool,Middle School,Zoo,Golf Course,Art Gallery,Business Service


#### Cluster 3


In [48]:
miami_merged.loc[miami_merged['Cluster Labels'] == 2, miami_merged.columns[[1] + list(range(5, miami_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,25.736,High School,Zoo,University,Art Gallery,Business Service,Church,College Academic Building,College Arts Building,Daycare


#### Cluster 4


In [49]:
miami_merged.loc[miami_merged['Cluster Labels'] == 3, miami_merged.columns[[1] + list(range(5, miami_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,25.792,Golf Course,Zoo,University,Art Gallery,Business Service,Church,College Academic Building,College Arts Building,College Lab


#### Cluster 5


In [50]:
miami_merged.loc[miami_merged['Cluster Labels'] == 4, miami_merged.columns[[1] + list(range(5, miami_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,25.791,University,Art Gallery,Business Service,Church,College Academic Building,College Arts Building,College Lab,Daycare,Driving School
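
As a quick check on how the neighborhoods are spread across clusters, the row indices printed in the five outputs above can be tallied. This is only a sketch: the mapping below is transcribed by hand from those outputs, and it counts neighborhoods per cluster, not individual institutions.

```python
from collections import Counter

# Cluster label of each neighborhood row, keyed by the row index printed in the
# five outputs above (rows that appear in no output are omitted).
cluster_by_row = {
    0: 0, 4: 0,
    1: 1, 2: 1, 3: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 12: 1, 13: 1,
    22: 2,
    10: 3,
    21: 4,
}

sizes = Counter(cluster_by_row.values())
for label in sorted(sizes):
    # The notebook headings are 1-based ("Cluster 1" is label 0).
    print(f"Cluster {label + 1}: {sizes[label]} neighborhoods")
```

In the real notebook the same tally could presumably be obtained with `miami_merged['Cluster Labels'].value_counts()`.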


## Results and Discussion <a name="results"></a>

Our analysis shows that although Miami has a large number of educational institutes, their density decreases when moving away from the city center. The highest concentration was detected near the coast, especially in the northern part of the city, so we focused our attention on those northern areas. Cluster-wise, Cluster 2 does exceptionally well in terms of the number of educational institutes.
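
One way to put numbers on "moving away from the city center" is to compute each neighborhood's great-circle distance from a reference point. The sketch below uses the haversine formula with coordinates copied from the neighborhood table in this notebook. Treating the Downtown row as the city center is an assumption made for illustration, and `haversine_km` is a hypothetical helper, not part of the original analysis.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Coordinates copied from the neighborhood table earlier in this notebook.
neighborhoods = {
    'Brickell': (25.758, -80.193),
    'Coconut Grove': (25.712, -80.257),
    'Edgewater': (25.802, -80.19),
    'Flagami': (25.762, -80.316),
}
center = (25.774, -80.193)  # Downtown row; using it as "city center" is an assumption

# Pair each neighborhood with its distance and list the nearest first.
distances = {name: haversine_km(*center, *coords) for name, coords in neighborhoods.items()}
for name in sorted(distances, key=distances.get):
    print(f"{name}: {distances[name]:.1f} km from Downtown")
```

Joining such distances with per-neighborhood venue counts would let the density claim be checked directly instead of read off the map.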


Considering the data and exploring the map, we can see that the area near the Midtown neighborhood is the most suitable for this requirement.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify areas of Miami with a high number of educational institutions, in order to help stakeholders narrow down the search for an optimal residence location. By analyzing the distribution of educational institutions from Foursquare data and clustering the neighborhoods, we have identified the neighborhoods that offer the most options for education.

Considering the data and the map, the area near the Midtown neighborhood is the most suitable for this requirement.
