# Capstone Project - The Battle of the Neighborhoods (Week 2) #

## Applied Data Science Capstone by IBM/Coursera ##


### Introduction: Business Problem ###

The aim of this project is to find public schools in Chicago, US, that are located in suitable areas with appropriate commercial establishments nearby. In particular, this report is targeted at people moving to Chicago from other cities, states, or countries who are interested in sending their children to the right school in Chicago.

### Data ###

Based on the definition of the problem, the factors that will influence our decision are:

- finding the geographical location of the schools in Chicago;
- finding the most common venues surrounding each school.

We will use the geographical coordinates of Chicago and the locations of the schools to plot the schools on a map, and finally cluster the schools and present our findings.

The following data sources will be needed to extract or generate the required information:

Part 1: A real-world dataset from the City of Chicago containing information on Chicago public schools in the 2011-2012 school year, updated in 2018. It contains the location of each school, its type, and other parameters describing the school.

Part 2: A new consolidated dataset of the schools, their most common venues, and the respective Community Areas along with coordinates. The venue data will be fetched using the Foursquare API to explore the venues around the schools; a machine learning algorithm (k-means) will then cluster the schools, and the findings will be plotted on maps using Folium.

### Part 1: Using a real-world dataset from the City of Chicago containing information on Chicago public schools in the 2011-2012 school year, updated in 2018 ###

Chicago Public Schools - Progress Report Cards (2011-2012)

This is a very detailed dataset containing a lot of useful information about each public school in Chicago.
Some properties of the dataset include:
- Name of School
- Type of School (Elementary, Middle, or High School)
- Street Address
- ZIP Code
- Phone Number
- Website URL
- Safety Score
- Family Involvement Score
- Environment Score
- Leaders Score
- Teachers Score
- Latitude
- Longitude
- Community Area Name

Data set URL: https://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t

Importing libraries:

```python
import numpy as np
import pandas as pd

!pip install geopy
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

!pip install folium

# Folium, to visualise maps and plot points based on latitude and longitude
import folium

# Use the inline backend to generate the plots within the browser
%matplotlib inline

# Matplotlib and associated packages for data visualisation and exploratory data analysis
import matplotlib as mpl
import matplotlib.pyplot as plt

mpl.style.use('ggplot')  # optional: for ggplot-like style

# Check the Matplotlib version
print('Matplotlib version: ', mpl.__version__)  # >= 2.0.0

# Matplotlib colour maps and colour conversion helpers
import matplotlib.cm as cm
import matplotlib.colors as colors

# Requests, to make GET requests to the Foursquare REST API
import requests

# To normalise JSON data returned by the Foursquare API
# (top-level import; the pandas.io.json path is deprecated)
from pandas import json_normalize

# KMeans from scikit-learn, to group the schools into clusters
from sklearn.cluster import KMeans

print('Libraries imported')
```

Matplotlib version:  3.3.2
Libraries imported


Reading the dataset:

```python
# Skip malformed lines (on_bad_lines replaces the error_bad_lines flag removed in pandas 2.0)
chicago_schools_df = pd.read_csv('Chicago_Public_Schools_2011-2012_updated.csv', index_col=None, on_bad_lines='skip')
chicago_schools_df.head()
```


| | School ID | Name of School | Elementary, Middle, or High School | Street Address | City | State | ZIP Code | Phone Number | Link | Network Manager | ... | RCDTS Code | X_COORDINATE | Y_COORDINATE | Latitude | Longitude | Community Area Number | Community Area Name | Ward | Police District | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 609966 | Charles G Hammond Elementary School | ES | 2819 W 21st Pl | Chicago | IL | 60623 | (773) 535-4580 | http://schoolreports.cps.edu/SchoolProgressRep... | Pilsen-Little Village Elementary Network | ... | 150000000000000 | 1157809.015 | 1889554.623 | 41.852691 | -87.696278 | 30 | SOUTH LAWNDALE | 12 | 10 | (41.85269133, -87.69627777) |
| 1 | 610539 | Marvin Camras Elementary School | ES | 3000 N Mango Ave | Chicago | IL | 60634 | (773) 534-2960 | http://schoolreports.cps.edu/SchoolProgressRep... | Fullerton Elementary Network | ... | 150000000000000 | 1137482.296 | 1919394.9 | 41.934966 | -87.770165 | 19 | BELMONT CRAGIN | 30 | 25 | (41.93496641, -87.77016525) |
| 2 | 609852 | Eliza Chappell Elementary School | ES | 2135 W Foster Ave | Chicago | IL | 60625 | (773) 534-2390 | http://schoolreports.cps.edu/SchoolProgressRep... | Ravenswood-Ridge Elementary Network | ... | 150000000000000 | 1161016.902 | 1934466.545 | 41.975867 | -87.683254 | 4 | LINCOLN SQUARE | 47 | 20 | (41.975867, -87.68325438) |
| 3 | 609835 | Daniel R Cameron Elementary School | ES | 1234 N Monticello Ave | Chicago | IL | 60651 | (773) 534-4290 | http://schoolreports.cps.edu/SchoolProgressRep... | Garfield-Humboldt Elementary Network | ... | 150000000000000 | 1151767.546 | 1908130.068 | 41.903785 | -87.717963 | 23 | HUMBOLDT PARK | 26 | 25 | (41.90378521, -87.71796315) |
| 4 | 610521 | Sir Miles Davis Magnet Elementary Academy | ES | 6730 S Paulina St | Chicago | IL | 60636 | (773) 535-9120 | http://schoolreports.cps.edu/SchoolProgressRep... | Englewood-Gresham Elementary Network | ... | 150000000000000 | 1166133.893 | 1859929.018 | 41.771222 | -87.666567 | 67 | WEST ENGLEWOOD | 15 | 7 | (41.77122181, -87.66656657) |


```python
# Keep only the columns needed for this analysis (other candidate columns are commented out)
chicago_schools_sel = chicago_schools_df[[
    "Name of School", "Street Address", "City", "ZIP Code",
    "Elementary, Middle, or High School", "Safety Score",
    # "Family Involvement Score", "Environment Score", "Instruction Score", "Leaders Score ", "Teachers Score",
    # "Rate of Misconducts (per 100 students) ",
    # "Average Teacher Attendance", "Average Student Attendance",
    # "College Enrollment (number of students) ",
    # "Graduation Rate %",
    # "Students Taking  Algebra %", "Students Passing  Algebra %",
    "Latitude", "Longitude",
    # "Community Area Number",
    "Community Area Name",
    # "Ward", "Police District", "Location",
]]
chicago_schools_sel.shape
```

(566, 9)

Number of schools in each Chicago Community Area:

```python
chicago_schools_sel['Community Area Name'].value_counts()
```

AUSTIN            23
SOUTH LAWNDALE    22
WEST TOWN         20
ENGLEWOOD         17
NEAR WEST SIDE    16
                  ..
BURNSIDE           1
MONTCLARE          1
LOOP               1
OAKLAND            1
OHARE              1
Name: Community Area Name, Length: 77, dtype: int64

Number of public schools of each type in Chicago:

```python
school_type = chicago_schools_sel['Elementary, Middle, or High School']

# Count the rows of each school type by summing a boolean mask
numOfRowsE = (school_type == 'ES').sum()
print('Number of elementary schools : ', numOfRowsE)

numOfRowsM = (school_type == 'MS').sum()
print('Number of middle schools : ', numOfRowsM)

numOfRowsH = (school_type == 'HS').sum()
print('Number of high schools : ', numOfRowsH)
```

Number of elementary schools :  462
Number of middle schools :  11
Number of high schools :  93
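The three boolean masks above can also be collapsed into one counting pass. A minimal stdlib sketch of the idea, using a small hypothetical list of type codes rather than the real column:

```python
from collections import Counter

# Hypothetical sample of "Elementary, Middle, or High School" codes
school_types = ["ES", "ES", "HS", "MS", "ES", "HS"]

# Count every code in a single pass instead of building one mask per type
type_counts = Counter(school_types)

labels = {"ES": "elementary", "MS": "middle", "HS": "high"}
for code, label in labels.items():
    print("Number of {} schools : {}".format(label, type_counts[code]))

# Each row has exactly one type, so the per-type counts add up to the row count
assert sum(type_counts.values()) == len(school_types)
```

With pandas the same counts come from `chicago_schools_sel['Elementary, Middle, or High School'].value_counts()`; the sanity check mirrors 462 + 11 + 93 = 566, the number of rows in `chicago_schools_sel`.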


Pivot table showing the type of each school (ES: elementary school, MS: middle school, HS: high school):

```python
# Count each school under its type column; margins=True adds an "All" total
chicago_schools_cat = pd.pivot_table(chicago_schools_sel,
                                     values=['City'],
                                     index=['Name of School'],
                                     columns=['Elementary, Middle, or High School'],
                                     aggfunc=len,
                                     fill_value=0,
                                     margins=True)
chicago_schools_cat
```

| Name of School | ES | HS | MS | All |
|---|---|---|---|---|
| A.N. Pritzker School | 1 | 0 | 0 | 1 |
| Abraham Lincoln Elementary School | 1 | 0 | 0 | 1 |
| Adam Clayton Powell Paideia Community Academy Elementary School | 1 | 0 | 0 | 1 |
| Adlai E Stevenson Elementary School | 1 | 0 | 0 | 1 |
| Agustin Lara Elementary Academy | 1 | 0 | 0 | 1 |
| ... | ... | ... | ... | ... |
| Wilma Rudolph Elementary Learning Center | 1 | 0 | 0 | 1 |
| Wolfgang A Mozart Elementary School | 1 | 0 | 0 | 1 |
| Woodlawn Community Elementary School | 1 | 0 | 0 | 1 |
| World Language Academy High School | 0 | 1 | 0 | 1 |


Grouping schools by ZIP Code and Community Area Name:

```python
# Join the names of all schools that share a (ZIP Code, Community Area Name) pair
chicago_schools_sel_gr = chicago_schools_sel.groupby(["ZIP Code", "Community Area Name"])["Name of School"].apply(", ".join).reset_index()
chicago_schools_sel_gr.shape
```

(140, 3)
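The `groupby(...)["Name of School"].apply(", ".join)` call collects every school that shares a (ZIP Code, Community Area Name) pair into one comma-separated string. ZIP codes and Community Areas do not line up one-to-one, which is why there are 140 groups but only 77 Community Areas. A stdlib sketch of the same grouping, using three rows taken from the sample outputs in this notebook:

```python
from collections import defaultdict

# (ZIP Code, Community Area Name, Name of School) rows from the sample outputs
rows = [
    (60623, "SOUTH LAWNDALE", "Charles G Hammond Elementary School"),
    (60623, "SOUTH LAWNDALE", "Infinity Math Science and Technology High School"),
    (60634, "BELMONT CRAGIN", "Marvin Camras Elementary School"),
]

# Collect school names under each (ZIP Code, Community Area Name) key
groups = defaultdict(list)
for zip_code, area, school in rows:
    groups[(zip_code, area)].append(school)

# Join the names with ", ", as .apply(", ".join) does for each group
grouped = {key: ", ".join(names) for key, names in groups.items()}
for (zip_code, area), names in sorted(grouped.items()):
    print(zip_code, area, "->", names)
```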

Using the geopy Nominatim geolocator to find the geographical coordinates of Chicago:

```python
address = "Chicago, IL"

# Nominatim's usage policy asks for a custom user_agent
geolocator = Nominatim(user_agent="chicago_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Chicago are {}, {}.'.format(latitude, longitude))
```

The geographical coordinates of Chicago are 41.8755616, -87.6244212.


Plotting a map of Chicago using Folium:

```python
map_chicago = folium.Map(location=[latitude, longitude], zoom_start=10)
map_chicago
```

Overlaying school locations on the map of Chicago:

```python
# Add one circle marker per school, with a popup showing the community area and school name
for lat, lng, ncomm, nschool in zip(
        chicago_schools_sel['Latitude'],
        chicago_schools_sel['Longitude'],
        chicago_schools_sel['Community Area Name'],
        chicago_schools_sel['Name of School']):
    label = folium.Popup('{}, {}'.format(ncomm, nschool), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_chicago)

map_chicago
```

Because of the large number of public schools in Chicago, we focus on high schools only:

```python
chicago_high_schools = chicago_schools_sel[chicago_schools_sel['Elementary, Middle, or High School'] == 'HS']
chicago_high_schools
```

| | Name of School | Street Address | City | ZIP Code | Elementary, Middle, or High School | Safety Score | Latitude | Longitude | Community Area Name |
|---|---|---|---|---|---|---|---|---|---|
| 8 | Walter Payton College Preparatory High School | 1034 N Wells St | Chicago | 60610 | HS | 98.0 | 41.901552 | -87.634537 | NEAR NORTH SIDE |
| 15 | Manley Career Academy High School | 2935 W Polk St | Chicago | 60612 | HS | 41.0 | 41.870912 | -87.699887 | EAST GARFIELD PARK |
| 17 | Northside College Preparatory High School | 5501 N Kedzie Ave | Chicago | 60625 | HS | 99.0 | 41.981352 | -87.708672 | NORTH PARK |
| 28 | Michele Clark Academic Prep Magnet High School | 5101 W Harrison St | Chicago | 60644 | HS | | 41.872857 | -87.753355 | AUSTIN |
| 30 | Uplift Community High School | 900 W Wilson Ave | Chicago | 60640 | HS | 50.0 | 41.965574 | -87.652522 | UPTOWN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 554 | Chicago High School for Agricultural Sciences | 3857 W 111th St | Chicago | 60655 | HS | 87.0 | 41.691194 | -87.717739 | MOUNT GREENWOOD |
| 559 | Stephen T Mather High School | 5835 N Lincoln Ave | Chicago | 60659 | HS | 58.0 | 41.987595 | -87.702449 | WEST RIDGE |
| 560 | High School of Leadership at South Shore | 7627 S Constance Ave | Chicago | 60649 | HS | | 41.756194 | -87.579607 | SOUTH SHORE |
| 561 | TEAM Englewood Community Academy High School | 6201 S Stewart Ave | Chicago | 60621 | HS | 45.0 | 41.781493 | -87.634942 | ENGLEWOOD |


### Part 2: Creating a new consolidated dataset of the schools, their most common venues, and the respective Community Areas along with coordinates ###

This data will be fetched using the Foursquare API to explore the venues around each school; a machine learning algorithm will then cluster the schools, and the findings will be plotted on maps using Folium.

Setting up Foursquare credentials (to be removed from the GitHub version of the notebook):

```python
# Foursquare credentials (fill in your own; keep them out of version control)
CLIENT_ID = ''
CLIENT_SECRET = ''
VERSION = '20210516'  # API version date
LIMIT = 10            # maximum number of venues returned per school

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)
```

Your credentials:
CLIENT_ID: 
CLIENT_SECRET:
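Since the credentials are blanked out before publishing, one option is to read them from environment variables so the notebook never contains them. A minimal sketch; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the `mask` helper are hypothetical choices, not something Foursquare prescribes:

```python
import os

# Hypothetical environment variable names; set them in your shell, e.g.
#   export FOURSQUARE_CLIENT_ID=...
#   export FOURSQUARE_CLIENT_SECRET=...
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")
VERSION = "20210516"
LIMIT = 10


def mask(secret):
    """Show only the last 4 characters so the full secret never lands in notebook output."""
    return "*" * max(len(secret) - 4, 0) + secret[-4:]


print("CLIENT_ID: " + mask(CLIENT_ID))
print("CLIENT_SECRET: " + mask(CLIENT_SECRET))
```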


A function to fetch venues around a given location:

```python
def getNearbyVenues(names, latitudes, longitudes, radius=100):
    """Return a DataFrame of up to LIMIT Foursquare venues within `radius` metres of each school."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)  # progress indicator

        # Query the Foursquare v2 "explore" endpoint around the school's coordinates
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }
        response = requests.get(url, params=params)
        response.raise_for_status()  # fail loudly on bad credentials or exhausted quota
        results = response.json()['response']['groups'][0]['items']

        # Keep only the venue name and its primary category (None if it has no category)
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None)
            for v in results])

    # Passing the column names to the constructor also works when no venues were found
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list],
                                 columns=['Name of School',
                                          'School Latitude',
                                          'School Longitude',
                                          'Venue',
                                          'Venue Category'])
    return nearby_venues
```
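`getNearbyVenues` builds a request URL for each school and then walks `response → groups[0] → items` in the JSON reply. A stdlib sketch of both steps, assuming the response shape used by the function above; `build_explore_url`, `extract_venues` and the trimmed `sample` payload are hypothetical and contain only the keys the notebook reads:

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius=100, limit=10):
    """Build a venues/explore request URL with every value percent-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


def extract_venues(payload):
    """Pull (venue name, first category name) pairs out of an explore response."""
    items = payload["response"]["groups"][0]["items"]
    return [
        (item["venue"]["name"],
         item["venue"]["categories"][0]["name"] if item["venue"]["categories"] else None)
        for item in items
    ]


# Hypothetical, heavily trimmed response with only the keys getNearbyVenues reads
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Citizen Skate Cafe", "categories": [{"name": "Café"}]}},
    {"venue": {"name": "CVS pharmacy", "categories": [{"name": "Pharmacy"}]}},
]}]}}

print(build_explore_url("ID", "SECRET", "20210516", 41.965574, -87.652522))
print(extract_venues(sample))
```

`urlencode` percent-encodes each value (the comma in `ll` becomes `%2C`), which avoids malformed URLs if a parameter ever contains special characters.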

Fetching venues around public high schools:

```python
chicago_school_venues = getNearbyVenues(names=chicago_high_schools['Name of School'],
                                        latitudes=chicago_high_schools['Latitude'],
                                        longitudes=chicago_high_schools['Longitude'])
```

Walter Payton College Preparatory High School
Manley Career Academy High School
Northside College Preparatory High School
Michele Clark Academic Prep Magnet High School
Uplift Community High School
Morgan Park High School
Bronzeville Scholastic Academy High School
William J Bogan High School
Emil G Hirsch Metropolitan High School
Austin Polytechnical Academy High School
World Language Academy High School
Multicultural Academy of Scholarship
Mason High School
Marie Sklodowska Curie Metropolitan High School
George Washington High School
Robert Lindblom Math & Science Academy High School
Benito Juarez Community Academy High School
Hyde Park Academy High School
John Marshall Metropolitan High School
Friedrich W von Steuben Metropolitan Science High School
Southside Occupational Academy High School
Chicago Military Academy High School
Eric Solorio Academy High School
Neal F Simeon Career Academy High School
John Hancock College Preparatory High School
Roald Amundsen High School
Edwin G Fore
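The call above uses the default `radius=100`, so only venues within roughly 100 m of each school are requested, which is likely one reason why relatively few venues come back per school. A minimal haversine sketch for checking whether a point falls inside that radius; it assumes a spherical Earth with the usual 6,371 km mean radius, and the two venue points are hypothetical:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def within_radius(school, venue, radius_m=100):
    """True if the venue lies within radius_m metres of the school."""
    return haversine_m(*school, *venue) <= radius_m


school = (41.965574, -87.652522)  # Uplift Community High School, from the table above
near_venue = (41.9660, -87.6525)  # hypothetical point roughly 50 m north
far_venue = (41.9700, -87.6525)   # hypothetical point roughly 500 m north

print(round(haversine_m(*school, *near_venue)), within_radius(school, near_venue))
print(round(haversine_m(*school, *far_venue)), within_radius(school, far_venue))
```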

Dataframe containing the venues around each public high school in Chicago:

```python
print(chicago_school_venues.shape)
chicago_school_venues.head()
```

(69, 5)


| | Name of School | School Latitude | School Longitude | Venue | Venue Category |
|---|---|---|---|---|---|
| 0 | Northside College Preparatory High School | 41.981352 | -87.708672 | Lake Shore Symphony Rehersal | Music Venue |
| 1 | Michele Clark Academic Prep Magnet High School | 41.872857 | -87.753355 | YWCA of Metropolitan Chicago | Gym / Fitness Center |
| 2 | Uplift Community High School | 41.965574 | -87.652522 | Citizen Skate Cafe | Café |
| 3 | Uplift Community High School | 41.965574 | -87.652522 | CVS pharmacy | Pharmacy |
| 4 | William J Bogan High School | 41.749348 | -87.721097 | Dollar Tree | Discount Store |


Number of venues found around each school:

```python
chicago_school_venues.groupby('Name of School').count().drop(['School Latitude', 'School Longitude', 'Venue Category'], axis=1)
```

| Name of School | Venue |
|---|---|
| Alcott High School for the Humanities | 2 |
| Benito Juarez Community Academy High School | 1 |
| Carl Schurz High School | 4 |
| Chicago High School for Agricultural Sciences | 2 |
| Chicago Military Academy High School | 8 |
| DeVry University Advantage Academy High School | 2 |
| Friedrich W von Steuben Metropolitan Science High School | 2 |
| Gage Park High School | 1 |
| George H Corliss High School | 1 |
| Gwendolyn Brooks College Preparatory Academy High School | 1 |


```python
print('There are {} unique categories.'.format(len(chicago_school_venues['Venue Category'].unique())))
```

There are 47 unique categories.


One-hot encoding to analyze the surroundings of each high school:

```python
# One-hot encoding: one 0/1 column per venue category (dtype=int keeps 0/1 rather than True/False)
chicago_school_onehot = pd.get_dummies(chicago_school_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# Add the school name column back to the dataframe
chicago_school_onehot['Name of School'] = chicago_school_venues['Name of School']

# Move the school name column to the first position
fixed_columns = [chicago_school_onehot.columns[-1]] + list(chicago_school_onehot.columns[:-1])
chicago_school_onehot = chicago_school_onehot[fixed_columns]

chicago_school_onehot.head()
```

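| | Name of School | American Restaurant | Art Gallery | Asian Restaurant | BBQ Joint | Basketball Court | Breakfast Spot | Burger Joint | Bus Station | Café | ... | Pool | Record Shop | Restaurant | River | Salon / Barbershop | Sandwich Place | Seafood Restaurant | Thai Restaurant | Wings Joint | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Northside College Preparatory High School | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Michele Clark Academic Prep Magnet High School | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Uplift Community High School | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Uplift Community High School | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | William J Bogan High School | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |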


```python
chicago_school_onehot.shape
```

(69, 48)

```python
# Average the one-hot columns per school: the frequency of each venue category
chicago_school_grouped = chicago_school_onehot.groupby('Name of School').mean().reset_index()
chicago_school_grouped.shape
```

(29, 48)
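One-hot encoding followed by `groupby('Name of School').mean()` turns each school's list of venues into the fraction of its venues that fall in each category. A stdlib sketch of the same two steps, using the five sample venue rows shown earlier:

```python
from collections import defaultdict

# (Name of School, Venue Category) pairs, as in the head() of chicago_school_venues
rows = [
    ("Northside College Preparatory High School", "Music Venue"),
    ("Michele Clark Academic Prep Magnet High School", "Gym / Fitness Center"),
    ("Uplift Community High School", "Café"),
    ("Uplift Community High School", "Pharmacy"),
    ("William J Bogan High School", "Discount Store"),
]
categories = sorted({category for _, category in rows})

# One-hot encoding: one 0/1 column per category, exactly one 1 per venue row
onehot = [(school, [int(category == c) for c in categories]) for school, category in rows]

# Group by school and average each column -> frequency of each category per school
sums = defaultdict(lambda: [0] * len(categories))
counts = defaultdict(int)
for school, vector in onehot:
    sums[school] = [s + v for s, v in zip(sums[school], vector)]
    counts[school] += 1
frequencies = {
    school: dict(zip(categories, (s / counts[school] for s in total)))
    for school, total in sums.items()
}

print(frequencies["Uplift Community High School"])
```

Each school's frequencies sum to 1, so schools with many venues and schools with a single venue end up on the same scale before clustering.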

Top 5 most common venues around each high school:

```python
num_top_venues = 5

for school in chicago_school_grouped['Name of School']:
    print("----" + school + "----")
    # Transpose the school's row into a (venue, frequency) table
    temp = chicago_school_grouped[chicago_school_grouped['Name of School'] == school].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]  # drop the school-name row
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
```

----Alcott High School for the Humanities----
                 venue  freq
0              Dog Run   0.5
1                 Park   0.5
2  American Restaurant   0.0
3           Hookah Bar   0.0
4    Korean Restaurant   0.0


----Benito Juarez Community Academy High School----
                 venue  freq
0   Mexican Restaurant   1.0
1  American Restaurant   0.0
2          Pizza Place   0.0
3    Korean Restaurant   0.0
4               Market   0.0


----Carl Schurz High School----
                 venue  freq
0    Convenience Store  0.25
1     Asian Restaurant  0.25
2      Thai Restaurant  0.25
3  Martial Arts School  0.25
4           Playground  0.00


----Chicago High School for Agricultural Sciences----
                 venue  freq
0            BBQ Joint   0.5
1             Dive Bar   0.5
2  American Restaurant   0.0
3          Pizza Place   0.0
4               Market   0.0


----Chicago Military Academy High School----
                 venue  freq
0       History Museum  0.12
1       S

Building a new dataframe to display the top 5 venues around each high school:

```python
def return_most_common_venues(row, num_top_venues):
    # Skip the school name and sort the category frequencies in descending order
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
```
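`return_most_common_venues` sorts one row of frequencies in descending order and keeps the first `num_top_venues` labels. When a school has fewer distinct venue categories than `num_top_venues`, the remaining slots are filled with categories whose frequency is 0, and their order among themselves is not meaningful; this is why zero-frequency entries such as American Restaurant or Hookah Bar appear in the printout above. A stdlib sketch that drops those fillers (the `top_venues` helper is hypothetical, not part of the notebook):

```python
def top_venues(frequencies, n=5):
    """Return up to n category names with non-zero frequency, most frequent first.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(
        ((freq, name) for name, freq in frequencies.items() if freq > 0),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [name for _, name in ranked[:n]]


# Frequencies for Alcott High School for the Humanities, as printed above
alcott = {"Dog Run": 0.5, "Park": 0.5, "American Restaurant": 0.0, "Hookah Bar": 0.0}
print(top_venues(alcott))  # only the two categories that actually occur
```

Whether to keep the zero-frequency fillers is a presentation choice; the clustering itself works on the full frequency vectors, not on these labels.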

```python
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# Create column names according to the number of top venues (1st, 2nd, 3rd, 4th, ...)
columns = ['Name of School']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# Create a new dataframe with one row per school
chicago_school_venues_sorted = pd.DataFrame(columns=columns)
chicago_school_venues_sorted['Name of School'] = chicago_school_grouped['Name of School']

for ind in np.arange(chicago_school_grouped.shape[0]):
    chicago_school_venues_sorted.iloc[ind, 1:] = return_most_common_venues(chicago_school_grouped.iloc[ind, :], num_top_venues)

chicago_school_venues_sorted.head()
```

| | Name of School | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 0 | Alcott High School for the Humanities | Dog Run | Park | Women's Store | Historic Site | Furniture / Home Store |
| 1 | Benito Juarez Community Academy High School | Mexican Restaurant | Women's Store | Historic Site | Furniture / Home Store | Fried Chicken Joint |
| 2 | Carl Schurz High School | Thai Restaurant | Asian Restaurant | Martial Arts School | Convenience Store | Women's Store |
| 3 | Chicago High School for Agricultural Sciences | BBQ Joint | Dive Bar | Women's Store | Coffee Shop | Furniture / Home Store |
| 4 | Chicago Military Academy High School | History Museum | Pizza Place | Historic Site | Wings Joint | Cosmetics Shop |
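The `indicators` list covers only 1st to 3rd and falls back to "th" through the `except` branch, which is enough for `num_top_venues = 5`. A small general helper (hypothetical, not used by the notebook) that also handles 11th to 13th and 21st, 22nd, ... if more columns are ever needed:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 10th to 20th, including the irregular 11th, 12th, 13th
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


columns = ["Name of School"] + ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 6)]
print(columns)
```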


#### Clustering Chicago high schools: ####

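```python
# Set the number of clusters
kclusters = 5

# Cluster on the venue-category frequencies only (drop the school name)
chicago_school_grouped_clustering = chicago_school_grouped.drop(columns='Name of School')

# Run k-means clustering; n_init is set explicitly because its default changed in newer scikit-learn releases
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(chicago_school_grouped_clustering)

# Check the cluster labels generated for the first 10 rows
kmeans.labels_[0:10]
```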

array([1, 0, 0, 0, 0, 0, 0, 0, 2, 1])
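`KMeans` groups the schools by their venue-frequency vectors: it alternates between assigning each vector to its nearest centroid (squared Euclidean distance) and moving each centroid to the mean of its assigned vectors. A bare-bones stdlib sketch of that loop (Lloyd's algorithm); unlike scikit-learn, it simply uses the first k points as initial centroids instead of k-means++ and does no restarts, so it is for illustration only. The three-category vectors are hypothetical:

```python
def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm. Returns (labels, centroids); assumes len(points) >= k."""
    # Deterministic initialisation for the sketch: use the first k points as centroids
    centroids = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members (keep it if empty)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical venue-frequency vectors over (Park, Hotel, Mexican Restaurant)
points = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
labels, centroids = kmeans(points, 3)
print(labels)
```

Because the result depends on the initial centroids, the notebook passes `random_state=0` to `KMeans` so that the cluster labels are reproducible between runs.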

```python
# Add the cluster labels as the first column
chicago_school_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
```

```python
# Attach cluster labels and top venues to every school; schools without venues get NaN
chicago_merged = chicago_schools_sel.join(chicago_school_venues_sorted.set_index('Name of School'), on='Name of School')
chicago_merged
```

| | Name of School | Street Address | City | ZIP Code | Elementary, Middle, or High School | Safety Score | Latitude | Longitude | Community Area Name | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Charles G Hammond Elementary School | 2819 W 21st Pl | Chicago | 60623 | ES | 40.0 | 41.852691 | -87.696278 | SOUTH LAWNDALE | | | | | | |
| 1 | Marvin Camras Elementary School | 3000 N Mango Ave | Chicago | 60634 | ES | 54.0 | 41.934966 | -87.770165 | BELMONT CRAGIN | | | | | | |
| 2 | Eliza Chappell Elementary School | 2135 W Foster Ave | Chicago | 60625 | ES | 70.0 | 41.975867 | -87.683254 | LINCOLN SQUARE | | | | | | |
| 3 | Daniel R Cameron Elementary School | 1234 N Monticello Ave | Chicago | 60651 | ES | 42.0 | 41.903785 | -87.717963 | HUMBOLDT PARK | | | | | | |
| 4 | Sir Miles Davis Magnet Elementary Academy | 6730 S Paulina St | Chicago | 60636 | ES | 35.0 | 41.771222 | -87.666567 | WEST ENGLEWOOD | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 561 | TEAM Englewood Community Academy High School | 6201 S Stewart Ave | Chicago | 60621 | HS | 45.0 | 41.781493 | -87.634942 | ENGLEWOOD | | | | | | |
| 562 | Robert A Black Magnet Elementary School | 9101 S Euclid Ave | Chicago | 60619 | ES | 68.0 | 41.729656 | -87.576605 | CALUMET HEIGHTS | | | | | | |
| 563 | Martin A Ryerson Elementary School | 646 N Lawndale Ave | Chicago | 60624 | ES | 40.0 | 41.893211 | -87.718833 | HUMBOLDT PARK | | | | | | |
| 564 | Infinity Math Science and Technology High School | 3120 S Kostner Ave | Chicago | 60623 | HS | 58.0 | 41.836020 | -87.734195 | SOUTH LAWNDALE | | | | | | |


```python
# Keep only the schools that were clustered (i.e. had venues nearby)
chicago_merged.dropna(subset=["Cluster Labels"], inplace=True)
chicago_merged['Cluster Labels'].isna().sum()
```

0

```python
# The join introduced NaNs, which turned the labels into floats; convert them back to integers
chicago_merged['Cluster Labels'] = chicago_merged['Cluster Labels'].astype(int)
```

```python
# Create the map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# Set the colour scheme: one distinct colour per cluster label (0 .. kclusters-1)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add one marker per clustered school, coloured by its cluster label
for lat, lon, poi, cluster in zip(chicago_merged['Latitude'], chicago_merged['Longitude'], chicago_merged['Name of School'], chicago_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
```
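The colour scheme only needs one distinct colour per cluster label, and k-means labels run from 0 to `kclusters - 1`, so the label itself can index the palette. A dependency-free sketch that spaces hues evenly around the HSV colour wheel; the `cluster_palette` helper is a hypothetical alternative to matplotlib's `cm.rainbow`:

```python
import colorsys


def cluster_palette(k):
    """Return k hex colours with evenly spaced hues, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(5)
for label in range(5):
    print(label, palette[label])  # index with the label itself, no offset needed
```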

## Analysis

Examining the resulting clusters:

Cluster 1:

```python
chicago_merged.loc[chicago_merged['Cluster Labels'] == 0, chicago_merged.columns[[1] + list(range(5, chicago_merged.shape[1]))]]
```

| | Street Address | Safety Score | Latitude | Longitude | Community Area Name | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 17 | 5501 N Kedzie Ave | 99.0 | 41.981352 | -87.708672 | NORTH PARK | 0 | Music Venue | Women's Store | Historic Site | Furniture / Home Store | Fried Chicken Joint |
| 28 | 5101 W Harrison St | | 41.872857 | -87.753355 | AUSTIN | 0 | Gym / Fitness Center | Coffee Shop | Furniture / Home Store | Fried Chicken Joint | Football Stadium |
| 30 | 900 W Wilson Ave | 50.0 | 41.965574 | -87.652522 | UPTOWN | 0 | Pharmacy | Café | Coffee Shop | Furniture / Home Store | Fried Chicken Joint |
| 38 | 3939 W 79th St | 20.0 | 41.749348 | -87.721097 | ASHBURN | 0 | Furniture / Home Store | Fast Food Restaurant | Discount Store | Women's Store | Coffee Shop |
| 77 | 2150 S Laflin St | 46.0 | 41.852673 | -87.663769 | LOWER WEST SIDE | 0 | Mexican Restaurant | Women's Store | Historic Site | Furniture / Home Store | Fried Chicken Joint |
| 95 | 5039 N Kimball Ave | 70.0 | 41.973193 | -87.71335 | NORTH PARK | 0 | River | Bus Station | Women's Store | Coffee Shop | Furniture / Home Store |
| 107 | 3519 S Giles Ave | 32.0 | 41.830538 | -87.619178 | DOUGLAS | 0 | History Museum | Pizza Place | Historic Site | Wings Joint | Cosmetics Shop |
| 116 | 5110 N Damen Ave | 51.0 | 41.975079 | -87.679521 | LINCOLN SQUARE | 0 | Basketball Court | Pool | Women's Store | Coffee Shop | Furniture / Home Store |
| 139 | 3601 N Milwaukee Ave | 48.0 | 41.946408 | -87.735625 | IRVING PARK | 0 | Thai Restaurant | Asian Restaurant | Martial Arts School | Convenience Store | Women's Store |
| 190 | 4015 N Ashland Ave | 64.0 | 41.954784 | -87.668916 | LAKE VIEW | 0 | Chinese Restaurant | Coffee Shop | Thai Restaurant | Fried Chicken Joint | Breakfast Spot |


Cluster 2:

```python
chicago_merged.loc[chicago_merged['Cluster Labels'] == 1, chicago_merged.columns[[1] + list(range(5, chicago_merged.shape[1]))]]
```

| | Street Address | Safety Score | Latitude | Longitude | Community Area Name | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 244 | 4136 S California Ave | 36.0 | 41.818711 | -87.694675 | BRIGHTON PARK | 1 | Park | Women's Store | Historic Site | Furniture / Home Store | Fried Chicken Joint |
| 463 | 2957 N Hoyne Ave | 70.0 | 41.935761 | -87.680524 | NORTH CENTER | 1 | Dog Run | Park | Women's Store | Historic Site | Furniture / Home Store |
| 473 | 250 E 111th St | 64.0 | 41.69279 | -87.616381 | ROSELAND | 1 | Park | Women's Store | Historic Site | Furniture / Home Store | Fried Chicken Joint |


Cluster 3:

```python
chicago_merged.loc[chicago_merged['Cluster Labels'] == 2, chicago_merged.columns[[1] + list(range(5, chicago_merged.shape[1]))]]
```

| | Street Address | Safety Score | Latitude | Longitude | Community Area Name | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 198 | 821 E 103rd St | 33.0 | 41.707391 | -87.603078 | PULLMAN | 2 | Football Stadium | Women's Store | Coffee Shop | Furniture / Home Store | Fried Chicken Joint |


Cluster 4:

```python
chicago_merged.loc[chicago_merged['Cluster Labels'] == 3, chicago_merged.columns[[1] + list(range(5, chicago_merged.shape[1]))]]
```

| | Street Address | Safety Score | Latitude | Longitude | Community Area Name | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 68 | 4959 S Archer Ave | 43.0 | 41.803046 | -87.722007 | ARCHER HEIGHTS | 3 | Hotel | Women's Store | Historic Site | Furniture / Home Store | Fried Chicken Joint |
| 176 | 5900 N Glenwood Ave | 64.0 | 41.989051 | -87.665262 | EDGEWATER | 3 | Hotel | Women's Store | Historic Site | Furniture / Home Store | Fried Chicken Joint |
| 274 | 5900 N Glenwood Ave | 48.0 | 41.989051 | -87.665262 | EDGEWATER | 3 | Hotel | Women's Store | Historic Site | Furniture / Home Store | Fried Chicken Joint |


Cluster 5:

```python
chicago_merged.loc[chicago_merged['Cluster Labels'] == 4, chicago_merged.columns[[1] + list(range(5, chicago_merged.shape[1]))]]
```

| | Street Address | Safety Score | Latitude | Longitude | Community Area Name | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 521 | 2710 E 89th St | 17.0 | 41.733761 | -87.557753 | SOUTH CHICAGO | 4 | American Restaurant | Coffee Shop | Furniture / Home Store | Fried Chicken Joint | Football Stadium |


## Results and Discussion ##

The objective of the business problem was to help people moving to Chicago identify a suitable public school for their children, located in an area surrounded by appropriate venues. We first used the Chicago Public Schools data to locate the schools and plot them on a map. Because of the large number of public schools in Chicago, we then focused on the public high schools only. Next, we grouped the high schools into clusters based on the categories of venues around them, and listed each school's Safety Score alongside its most common venues, to give newcomers relevant information about the surroundings and safety of a given school.

## Conclusion

We explored the Chicago Public Schools data to understand the different types of public schools across all Community Areas of Chicago and categorized them by type, which helped us group the schools. We then shortlisted the high schools and clustered them by their most common nearby venues, so that families can choose the cluster of schools that best suits their needs.